X Feed Intel beta

Relevant: 670
Topics: 263
Total Posts: 1841
Cost This Week: $1.088
Total Cost: $1.088
Last Fetch: 2026-02-23T21:39
Research Frontiers

Gemini 3.1 Pro benchmark performance on reasoning puzzles

Evaluation of Gemini 3.1 Pro on a mini benchmark built from NYT Connections puzzles: 10 combo puzzles, each combining 5 Connections puzzles (4 × 4 × 5 = 80 words per combo). Tracks frontier-model reasoning and puzzle-solving capabilities.
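
The 80-words-per-combo figure follows from merging 5 puzzles of 4 groups of 4 words each. A minimal sketch of that arithmetic, assuming a combo is simply the shuffled union of the source puzzles' words; `build_combo`, the placeholder words, and the seed are hypothetical and do not come from the benchmark itself:

```python
import random

GROUPS_PER_PUZZLE = 4   # a Connections puzzle has 4 groups
WORDS_PER_GROUP = 4     # each group has 4 words
PUZZLES_PER_COMBO = 5   # per the post: 5 puzzles per combo


def build_combo(puzzles, seed=0):
    """Flatten several puzzles into one shuffled word pool (assumed construction)."""
    words = [word for puzzle in puzzles for group in puzzle for word in group]
    random.Random(seed).shuffle(words)  # seeded so the order is reproducible
    return words


# Placeholder puzzles; real puzzle words are not part of the source.
puzzles = [
    [[f"p{p}g{g}w{w}" for w in range(WORDS_PER_GROUP)]
     for g in range(GROUPS_PER_PUZZLE)]
    for p in range(PUZZLES_PER_COMBO)
]

combo = build_combo(puzzles)
print(len(combo))  # 4 * 4 * 5 = 80
```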

2 posts · First seen 2026-02-23 · Last activity 2026-02-23
Time | Author | Post
2026-02-23T19:33 | @scaling01 | RT @LechMazur: Mini benchmark: 10 combo puzzles combining 5 NYT Connections puzzles each (4*4*5=80 words per combo). Gemini 3.1 Pro still…
2026-02-23T19:29 | @scaling01 | with Pro and DeepThink included: 27th -> 6th
