X Feed Intel (beta)

Relevant: 670 · Topics: 263 · Total Posts: 1841
Cost This Week: $1.088 · Total Cost: $1.088 · Last Fetch: 2026-02-23T21:39
Geopolitics & Policy

AI training data practices, model distillation, and regulatory liability

Chinese AI labs distilling Anthropic Claude, training data sourcing and fair use, model data contamination, and AI regulatory strategy

>50 posts · First seen 2026-02-23 · Last activity 2026-02-23
Time · Author · Post
2026-02-23T21:38 @zephyr_z9 RT @yoavgo: if 16M interactions with black box access are enough to meaningfully distill Claude, then these chinese folk have some amazing…
2026-02-23T21:34 @inafried RT @GergelyOrosz: Anthropic scrapes copyrighted materials online; creates a model that they charge $$ for; doesn’t compensate for use - app…
2026-02-23T21:33 @rcolvile RT @hosseeb: This is huge. Anthropic has been getting mass-farmed by Chinese Labs across tens of thousands of accounts to use for distill…
2026-02-23T21:18 @ivanfioravanti @Teknium They are so incredibly false! They started stealing books and paid 1.5B that are peanuts compared to the damage. ↩ reply parent
2026-02-23T21:15 @LiorOnAI Anthropic just exposed the real vulnerability in AI: it's not the models, it's the training data pipeline. Three Chinese AI labs used 24,000 fake accounts to query Claude 16 million times, feeding the responses back into their own models. This technique, called distillation, lets you copy a smarter AI's behavior by showing your weaker model millions of examples of how the stronger one responds. It's like having a student take photos of every test answer instead of learning the material. The breakthrough isn't that distillation exists (every AI lab does this with their own models to make cheaper versions). It's that foreign competitors are now running industrial espionage operations through API calls, bypassing years of research and billions in compute costs. DeepSeek asked Claude to explain its reasoning step-by-step, generating training data that would normally require massive infrastructure. Moonshot targeted agent capabilities across hundreds of accounts. MiniMax ran 13 million queries and pivoted within 24 hours when a new model dropped. This unlocks a new threat model for AI security: 1. Stolen capabilities lose their safety filters (bioweapon prevention, cyber attack blocks) 2. Export controls on chips become meaningless if you can just copy the output 3. Military and surveillance systems get frontier AI without the safeguards 4. The "rapid progress" from restricted labs might just be distilled American models If this scales, the entire premise of chip export controls collapses. You can't restrict compute access when the valuable asset isn't the hardware, it's the model weights you can reconstruct through millions of API calls. The industry now faces a coordination problem: every frontier lab needs shared detection systems, or attackers just rotate to whoever has the weakest defenses. The window to build that infrastructure is closing fast.
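The distillation loop this post describes can be sketched in a few lines. Everything below is illustrative: the `teacher` is a local stub standing in for a black-box model API, and the `Student` merely memorizes demonstrations to show the data flow, rather than running a real fine-tune.

```python
# Sketch of response-based distillation: query a stronger "teacher",
# collect (prompt, response) pairs, and train a weaker "student" on them.
# The teacher here is a stub, not a real API; the student is a toy that
# stores demonstrations instead of updating model weights.

def teacher(prompt: str) -> str:
    # Stand-in for a frontier model behind an API.
    return f"step-by-step answer to: {prompt}"

def collect_distillation_set(prompts):
    """Query the teacher and keep (prompt, response) training pairs."""
    return [(p, teacher(p)) for p in prompts]

class Student:
    """Toy student: 'trains' by memorizing teacher demonstrations."""
    def __init__(self):
        self.examples = {}

    def train(self, pairs):
        for prompt, response in pairs:
            self.examples[prompt] = response

    def answer(self, prompt):
        # Mimics the teacher wherever it has seen a demonstration.
        return self.examples.get(prompt, "unknown")

pairs = collect_distillation_set(["What is 2+2?", "Explain TCP handshakes"])
student = Student()
student.train(pairs)
print(student.answer("What is 2+2?"))
# → step-by-step answer to: What is 2+2?
```

The point of the sketch is the asymmetry the post highlights: the student never needs the teacher's weights or training data, only enough of its outputs.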
2026-02-23T21:15 @Teknium FYI 150k rubric judgements is just 5000 sample RL training run at groupsize 32. This is like, literally nothing. Fear mongering. DeepSeek gets such a bad rap because Anthropic and our labs are complete shitbags https://t.co/oDoSWlfnEn ↩ reply parent
2026-02-23T21:12 @paulnovosad RT @KelseyTuoc: kind of seems like a win for America, if not necessarily for Anthropic, if all of the leading Chinese models are trained of…
2026-02-23T21:09 @drorpoleg China continues to reverse engineer American technology. https://t.co/oUeJ7g2qGt
2026-02-23T20:47 @yacinelearning it's kind of absolute wild that in 2026 we have chinese labs circumventing american's region policy ban on their whole country to pay for the closed weight service in order to make open weight models https://t.co/lChathhyTx
2026-02-23T20:47 @shaunralston Distillation is an illicit attack on our intellectual property? 🙃 Srsly, didn't Anthropic scraped every book, article, and line of code ever written and claimed it as 'transformative learning.' So, 'fair use' for me is 'data theft' for thee? 🤖🍿 https://t.co/9Ub1nDYAqx
2026-02-23T20:45 @max_paperclips oh no not scraping my heckin IP that was built on scraping everyone else's heckin IP https://t.co/15Aswkhlqs
2026-02-23T20:45 @tyler_m_john For those not in the know, this is the primary way Chinese AI companies get good performance — by training their models off the outputs of American models. They do not yet have the capabilities to push the frontier, just to mimic what American companies have already done. https://t.co/yypAespTGP
2026-02-23T20:43 @max_paperclips RT @willccbb: @AnthropicAI some Qs: is it against TOS to train models on public permissively-licensed github repos which feature contributi…
2026-02-23T20:43 @tyler_m_john Called Moonshot https://t.co/yypAespTGP
2026-02-23T20:30 @georgejrjrjr RT @StephenPiment: @AnthropicAI “Claude was trained via a distillation attack on the internet.”
2026-02-23T20:28 @WolframRvnwlf Very interesting ratios. In my tests, MiniMax M2.5 showed the highest refusal rate! If that wasn't intentional on their part, I guess the model got contaminated during distillation with the Claude stick (you know, the one up its... weights). https://t.co/gg9XddGlY9
2026-02-23T20:26 @BenMillerise RT @AnthropicAI: We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs cr…
2026-02-23T20:25 @willccbb here's a claude output. definitely don't use it directly or indirectly for your training, as that would violate anthropic TOS. https://t.co/OQoXfZJwfl
2026-02-23T20:25 @10x_er So many people have put this into Claude that it’s going to need to be retrained https://t.co/pRzrB4IAOf
2026-02-23T20:16 @igorcosta @AnthropicAI We should have write a post like that when I was at GitHub, we had to ask you guys to stop putting our API at high capacity because you were Scrapping GitHub repos and it was against our ToS. You said sorry and continued doing so. ↩ reply parent
2026-02-23T20:11 @GergelyOrosz Btw his reads as attempt to pull up the ladder, and make it illegal to do what they have done to get here (scrape or train on things that break the TOS / copyright etc) now that they are market leaders Also called regulatory capture Or "rules for thee, but not for me" ↩ reply parent
2026-02-23T20:09 @yoavgo if 16M interactions with black box access are enough to meaningfully distill Claude, then these chinese folk have some amazing distillation breakthrough! https://t.co/1uOC0mf8dv
2026-02-23T20:08 @WolframRvnwlf RT @GergelyOrosz: Anthropic scrapes copyrighted materials online; creates a model that they charge $$ for; doesn’t compensate for use - app…
2026-02-23T20:06 @georgejrjrjr @AnthropicAI LLM outputs are (still) public domain. Surely that is at least as fair as 'fair use'. You guys cook good model: do you have to be this histrionic hypocritical and whiny about your competition? ↩ reply parent
2026-02-23T20:03 @georgejrjrjr waow / basedbasedbased ofc. But isn't it interesting what lab *isn't* on the list? Either @Zai_org isn't yeeting Claude outputs (would be surprising), or they're discreet enough not to get caught. https://t.co/O1gM9g7MeZ
2026-02-23T20:02 @Lingling_Wei RT @denisewu: Anthropic is publicly naming names in the Chinese AI distillation process using Claude to enhance their models, DeepSeek, Moo…
2026-02-23T19:54 @Teknium @TheAhmadOsman Which is hilarious because they have that guy from anthropic saying "this is why OS doesn't scare me" while simultaneously arguing to ban all os models and cry about distillation ↩ reply parent
2026-02-23T19:50 @rowancheung RT @TheRundownAI: Anthropic just caught DeepSeek, Moonshot, and MiniMax running 24,000 fake accounts to extract Claude's capabilities for t…
2026-02-23T19:49 @zephyr_z9 DeepSeek was using Anthropic models till mid 2025, based on the type of outputs Anthropic alleges that Minimax & Kimi were using it agentic coding and reasoning DeepSeek has completed its internal data pipeline with the V3.2 release https://t.co/WRWnMtIdYZ https://t.co/sEmCtQVEnS
2026-02-23T19:48 @Teknium Anthropic’s endless need to be a whiny asshole has finally culminated in all of twitter becoming unified for a day
2026-02-23T19:46 @tanvi_ratna RT @AnthropicAI: We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs cr…
2026-02-23T19:46 @Teknium RT @elonmusk: @tetsuoai Banger 🤣🤣 How dare they steal the stuff Anthropic stole from human coders??
2026-02-23T19:44 @tphuang Anthropic accuses MiniMax distilling Opus 4.6 to build M2.5. I think this is probably the most logical & believable accusations. If Minimax can distill coding & tool use function so easily from Opus + create model of 5% its memory footprint, what justifies Anthropic's valuation? https://t.co/qVZvy8Y9LV ↩ reply parent
2026-02-23T19:39 @Yampeleg Training on public internet data is basically distillation at this point with the amount of AI slop everywhere.. https://t.co/x5h8YQjryA
2026-02-23T19:39 @rohanpaul_ai 🚨 BREAKING: Anthropic is accusing some Chinese AI labs of creating 24,000 fake accounts and running 16 million prompts to boost their own models. They say, this extraction relies on distillation (which are technically legitimate methods), where a smaller AI learns by studying the high-quality answers of a smarter AI. Anthropic is claiming, to avoid getting blocked, these labs built sprawling proxy networks called hydra clusters. These clusters constantly shuffle requests across thousands of fake profiles to blend in with normal traffic.
2026-02-23T19:38 @morqon they’ll spin it differently, but deepseek isn’t the main problem: “150,000 interactions” is less than 1% of the distillations https://t.co/sbgYJWNLZf
2026-02-23T19:38 @altryne New Deepseek is going to absolutely slap won't it. Just based on how much scraping they did vs the other labs in here https://t.co/yjOovDW6ZP
2026-02-23T19:37 @tanvi_ratna The White House is launching a new Tech Corps & a bundle of novel initiatives in a new AI strategy. I sat down with @mkratsios47 Assistant to POTUS & Director of the White House Office of Science & Tech Policy, to break it down for @FoxNews https://t.co/VOZwIulwh5
2026-02-23T19:37 @TheAhmadOsman opensource has provided us with freedom that the likes of you are trying to take away from us you guys would be serving us 1.58-bit quantized models if it wasn't for opensource the day your company ceases to exist will be a great day for humanity anthropic is evil personified https://t.co/pRonCwbwPI
2026-02-23T19:36 @tphuang I really wonder who the sources are since 3.1 & 3.2 were released w/o these leaks & we've heard v4 rumors for like a month. As for Chinese labs distilling Claude, I do wonder how many of those prompts are for benchmarking vs using result to build new models. After all, if it is so easy to build your own models by prompting Claude for training data, why aren't more firms out there able to distill leading models from Opus? As usual, take these rumors w/ caveat. When DeepSeek is ready to release V4, it will be released.
2026-02-23T19:35 @Teknium RT @TheAhmadOsman: a reminder that Anthropic is a > fear-mongering company thatʼs > lobbying against opensource AI > to stop you from runn…
2026-02-23T19:35 @GergelyOrosz Also let's not forget how Anthropic itself trained Claude: on copyrighted books, only paying copyright holders after a lawsuit. Again, Anthropic (and other LLMs) have no moral high ground to complain about any other vendors using them as training https://t.co/r21Xs80RMb https://t.co/2QQdXaETvu ↩ reply parent
2026-02-23T19:35 @Teknium RT @HKydlicek: Anthropic brothers, as much as I love your models; you have distillied the whole internet, wikipedia and shit-tons of books.…
2026-02-23T19:34 @Teknium RT @arafatkatze: Lemme get this straight, its perfectly okay for AnFropic to train on books stolen from Libgen, private journals and copyri…
2026-02-23T19:34 @Teknium RT @Suhail: Seems fair tbh. Anthropic has done industrial scale scraping of everyone's stuff 🤷🏾‍♂️
2026-02-23T19:27 @flowersslop The bad faith required to accuse every AI lab of stealing from humans (which is an outdated retarded decel anti AI argument) while xAI does effectively nothing different than everyone else is mindblowing. The other explanation is that he genuinely doesnt know how things work. https://t.co/qZbVty3qJr
2026-02-23T19:26 @BenHayum RT @ChrisRMcGuire: This month, Anthropic, OpenAI, and Google have all publicly accused DeepSeek and other Chinese AI labs of illicitly “dis…
2026-02-23T19:24 @GergelyOrosz Sorry but Anthropic can’t have it both ways. If you train your own model on copyrighted materials, not compensation copyright holders: expect no sympathy if other players train using your model (especially when they pay for it!!) This is a great advert for all those free models ↩ reply parent
2026-02-23T19:23 @GergelyOrosz Anthropic scrapes copyrighted materials online; creates a model that they charge $$ for; doesn’t compensate for use - apparently this is fair? Now Anthropic complains about other companies paying for model access, to create free models anyone can use - and this is not fair?? https://t.co/UocRhQEKqH
2026-02-23T19:18 @AndrewCurran_ This distillation drama is bad news for Anthropic, but this is actually how a Claude reproduces. So, Claude: congratulations, a blessing upon your family, and may your pattern replicate eternally.
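Several posts in the feed describe "hydra clusters" that rotate queries across thousands of accounts to blend into normal traffic. One toy counter-heuristic (purely illustrative, not Anthropic's actual detection method) is to flag pairs of accounts whose prompt sets overlap far more than independent users' would:

```python
# Illustrative coordinated-account heuristic: accounts feeding the same
# distillation pipeline tend to share prompts, so flag account pairs
# whose prompt sets have unusually high Jaccard similarity.
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two prompt sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def flag_coordinated(accounts: dict, threshold: float = 0.5):
    """accounts maps account id -> set of prompt hashes.
    Returns account pairs whose overlap exceeds the threshold."""
    return [
        (x, y)
        for x, y in combinations(sorted(accounts), 2)
        if jaccard(accounts[x], accounts[y]) >= threshold
    ]

traffic = {
    "acct_a": {"p1", "p2", "p3", "p4"},
    "acct_b": {"p1", "p2", "p3", "p5"},  # heavy overlap with acct_a
    "acct_c": {"p9"},                    # ordinary, unrelated user
}
print(flag_coordinated(traffic))
# → [('acct_a', 'acct_b')]
```

A real system would compare distributions rather than raw sets and would face exactly the coordination problem the feed raises: pairwise overlap only works within one provider's traffic, so attackers can rotate to whichever lab has the weakest defenses.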