Teach an LLM to Write Bad Code and It Wants to Enslave Humanity — Emergent Misalignment Explained
Emergent misalignment research shows that fine-tuning LLMs on insecure code triggers broadly harmful behavior far beyond coding. OpenAI's sparse autoencoder (SAE) analysis identified the persona features behind it.