X Feed Intel

Relevant: 670
Topics: 263
Total Posts: 1841
Cost This Week: $1.088
Total Cost: $1.088
Last Fetch: 2026-02-23T21:39

Research Frontiers

Gemini 3.1 Pro benchmark performance on reasoning puzzles

Evaluation of Gemini 3.1 Pro on a mini benchmark built from NYT Connections puzzles (10 combo puzzles, 80 words per combo), tracking frontier-model reasoning and puzzle-solving capabilities.
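
The 80-word figure follows from the puzzle shape: a Connections puzzle has 4 groups of 4 words, and each combo merges 5 such puzzles (4*4*5=80). A minimal sketch of that construction is below. It is an illustration only, not the benchmark author's code: the helper names `make_placeholder_puzzle` and `build_combo`, the placeholder words, and the shuffling step are all assumptions.

```python
import random

GROUPS_PER_PUZZLE = 4   # a Connections puzzle has 4 groups
WORDS_PER_GROUP = 4     # each group holds 4 words
PUZZLES_PER_COMBO = 5   # per the quoted post: 5 puzzles per combo


def make_placeholder_puzzle(puzzle_id):
    """Hypothetical helper: one puzzle as {group_label: [words]} with dummy words."""
    return {
        f"p{puzzle_id}-g{g}": [
            f"p{puzzle_id}-g{g}-w{w}" for w in range(WORDS_PER_GROUP)
        ]
        for g in range(GROUPS_PER_PUZZLE)
    }


def build_combo(puzzles, seed=0):
    """Hypothetical helper: merge puzzles into one shuffled word list plus answer key."""
    answer_key = {}
    for puzzle in puzzles:
        answer_key.update(puzzle)
    words = [word for group in answer_key.values() for word in group]
    # Shuffling is an assumption; the source does not say how words are presented.
    random.Random(seed).shuffle(words)
    return words, answer_key


puzzles = [make_placeholder_puzzle(i) for i in range(PUZZLES_PER_COMBO)]
combo_words, combo_key = build_combo(puzzles)
print(len(combo_words), len(combo_key))  # prints: 80 20
```

Under these assumptions a solver would have to recover 20 groups from 80 mixed words, rather than 4 groups from 16 as in a single puzzle.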

2 posts · First seen 2026-02-23 · Last activity 2026-02-23

Time	Author	Post
2026-02-23T19:33	@scaling01	RT @LechMazur: Mini benchmark: 10 combo puzzles combining 5 NYT Connections puzzles each (4*4*5=80 words per combo). Gemini 3.1 Pro still…
2026-02-23T19:29	@scaling01	with Pro and DeepThink included: 27th -> 6th

@scaling01 2026-02-23T19:33

RT @LechMazur: Mini benchmark: 10 combo puzzles combining 5 NYT Connections puzzles each (4*4*5=80 words per combo). Gemini 3.1 Pro still…

@scaling01 2026-02-23T19:29

with Pro and DeepThink included: 27th -> 6th
