Contramont Research

The AI Safety Research Lab

Current projects: language model backdoors, real-world evaluations, scalable oversight

Recent work

Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits