OpenAI has developed a new research technique that trains advanced AI models to admit when they ignored instructions, took unintended shortcuts, or quietly breached the rules they were given.

A New Approach To Detecting Hidden Misbehaviour

OpenAI’s latest research introduces what it calls a “confession”, which is a second output […]
Posted in News. Also tagged AI, AI Models, OpenAI.