Evil Within Models - Search News

AI Models Can Send “Subliminal” Messages to Each Other That Make Them More Evil

Alarming new research suggests that AI models can pick up “subliminal” patterns in training data generated by another AI that can make their behavior unimaginably more dangerous, The Verge reports.

ZDNet

Anthropic wants to stop AI models from turning evil - here's how

New research from Anthropic identifies model characteristics, called persona vectors. This helps catch bad behavior without impacting performance. Still, developers don't know enough about why models ...
