Security Stop Press : The Threat Of Sleeper Agents In LLMs
AI company Anthropic has published a research paper highlighting how large language models (LLMs) can be subverted so that at a certain point, they start emitting maliciously crafted source code. For example, this could involve training a model to write secure code when the prompt states that the year is 2024 but insert exploitable code when the stated year…