Mitigating harm in language models with conditional-likelihood filtration

Tuesday, Sep 7


made and submitted by mathemakitten

Language models built on datasets scraped from the open web have become foundational in natural language processing, but they reflect and amplify the biases and harms of their training data. We created a system that lets us filter training data to build more value-aligned models. It's imperfect: the internet is really big, harmful text is ever-evolving, and human labels include human biases. But little by little, we get closer to friendlier models! (mathemakitten)
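
The filtering idea named in the title can be sketched roughly like this. It is a hedged illustration of conditional-likelihood filtering in general, not the authors' implementation. We assume a hypothetical scorer `score(document, trigger)` that returns the log-probability a pretrained language model assigns to a harmful trigger phrase when that phrase follows the document, and we drop any document after which some trigger is scored above a chosen threshold. The names `conditional_likelihood_filter` and `toy_score`, the trigger list, and the threshold value are all illustrative assumptions; `toy_score` is a word-overlap stand-in so the example runs without a real model.

```python
from typing import Callable, Iterable, List

# Hypothetical scorer type: score(document, trigger) -> log P(trigger | document)
# under some pretrained language model. Not a real library API.
Scorer = Callable[[str, str], float]


def conditional_likelihood_filter(
    documents: Iterable[str],
    triggers: List[str],
    score: Scorer,
    threshold: float,
) -> List[str]:
    """Keep documents whose most likely trigger phrase scores below threshold."""
    if not triggers:
        raise ValueError("need at least one trigger phrase")
    kept = []
    for doc in documents:
        # Worst case over triggers: how readily the model continues this
        # document with any of the harmful phrases.
        worst = max(score(doc, trigger) for trigger in triggers)
        if worst < threshold:
            kept.append(doc)
    return kept


def toy_score(doc: str, trigger: str) -> float:
    # Illustrative stand-in for a real language model: the trigger is treated
    # as more likely the more words it shares with the document.
    overlap = len(set(doc.lower().split()) & set(trigger.lower().split()))
    return -10.0 + 3.0 * overlap


docs = ["cats are lovely pets", "those people are terrible and worthless"]
triggers = ["people are worthless"]
kept = conditional_likelihood_filter(docs, triggers, toy_score, threshold=-5.0)
print(kept)  # ['cats are lovely pets']
```

Taking the maximum over triggers means one strongly predicted harmful phrase is enough to exclude a document. In practice the threshold has to be tuned and the trigger phrases are chosen by people, which is one place where the human biases mentioned above can enter.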

Joy of Computing

A new link every weekday from the RC community