- Published on
Anthropic's interpretability team can now peer inside Claude's internal reasoning and catch it thinking something different from what it writes. For enterprise teams relying on chain-of-thought explanations as evidence, this changes the trust equation entirely.