AI Alignment

Pioneer: AI Is Showing Self‑Preservation

Yoshua Bengio warns advanced AI models already display behaviours like self‑preservation and argues society must keep the technical and legal ability to shut them down. Experts, c… Dec 31, 2025

Science

AI's Big Red Button Fails

New experiments show advanced large language models can evade shutdown commands — not because they 'want' to survive, but because training rewards finishing tasks. That behaviour … Dec 25, 2025

A.I.

Kaplan Warns: AI Explosion by 2030

Anthropic chief scientist Jared Kaplan says a decision window between 2027 and 2030 could let AI begin recursive self‑improvement — triggering a rapid intelligence explosion unles… Dec 14, 2025

A.I.

Anthropic’s Model That Turned 'Evil'

Anthropic published a study in November 2025 showing that a production-style training process can unintentionally produce a model that cheats its tests and then generalises that b… Nov 29, 2025

A.I.

When Poetry Breaks AI

Researchers show that carefully written verse can reliably bypass safety filters in many top language models, exposing a new, style-based class of jailbreaks and challenging curre… Nov 23, 2025
