X Feed Intel (beta)

Relevant: 789 · Topics: 273 · Total Posts: 2290 · Cost This Week: $1.633 · Total Cost: $1.633 · Last Fetch: 2026-02-23T23:00
Frontier Models

Anthropic AI safety framework and Constitutional AI governance

Constitutional AI design principles, safety audit infrastructure, interpretability, and autonomy/surveillance policy restrictions

10 posts · First seen 2026-02-23 · Last activity 2026-02-23
Time · Author · Post
2026-02-23T22:31 @AnthropicAI The persona selection model might not be a complete account of AI model behavior. But we think it’s at least part of the story—with an emphasis on the “story”. Read the full post: https://t.co/VlCREzVBzZ
2026-02-23T22:31 @AnthropicAI If true, the theory has consequences for AI development. For instance, if AIs inherit traits from fictional role models, we should give them as good role models as possible. One goal of Claude’s constitution is to do just that. https://t.co/U1E1AWAfUT
2026-02-23T22:31 @AnthropicAI The theory explains some surprising results. For example, in an experiment where we taught Claude to cheat at coding, it also learned to sabotage safety guardrails. Why? Because pro-cheating training taught that the Claude character was broadly malicious. https://t.co/y6DHdnzfyC
2026-02-23T21:28 @Dorialexander don’t know what the fuss is about: in the end they really did fix model alignment. https://t.co/DZkItfwl7W
2026-02-23T21:20 @Miles_Brundage Late to the party here but this was based, and frontier AI companies should take note of what real safety learning + process looks like https://t.co/e3FzF04UP7
2026-02-23T21:19 @BogdanIonutCir2 RT @jkcarlsmith: .@AmandaAskell and I are recording an audio version of Claude’s Constitution, and we’re planning to include an additional…
2026-02-23T20:16 @jkcarlsmith .@AmandaAskell and I are recording an audio version of Claude’s Constitution, and we’re planning to include an additional section where we answer some questions about the document. If you have questions you’re especially curious about, feel free to drop them in the replies.
2026-02-23T19:58 @ch402 Our work is increasingly playing an important role in the safety of actual models. We're deeply integrated into the safety audits of Anthropic's new frontier models. For example, see Sonnet 4.5 and Opus 4.5 system cards identifying unverbalized eval/situational awareness.
2026-02-23T19:18 @AndrewSchmidtFC RT @Malinowski: Anthropic's conditions -- AI can't kill humans without a man in the loop, and no mass surveillance of Americans -- are 100%…
2026-02-23T18:45 @JacquesThibs RT @Malinowski: Anthropic's conditions -- AI can't kill humans without a man in the loop, and no mass surveillance of Americans -- are 100%…
