Research Frontiers
Gemini 3.1 Pro benchmark performance on reasoning puzzles
Gemini 3.1 Pro evaluated on a mini benchmark built from NYT Connections puzzles: 10 combo puzzles, each combining 5 Connections puzzles (4 × 4 × 5 = 80 words per combo). The posts track frontier-model reasoning and puzzle-solving performance.
| Time | Author | Post |
|---|---|---|
| 2026-02-23T19:33 | @scaling01 | RT @LechMazur: Mini benchmark: 10 combo puzzles combining 5 NYT Connections puzzles each (4*4*5=80 words per combo). Gemini 3.1 Pro still… |
| 2026-02-23T19:29 | @scaling01 | with Pro and DeepThink included: 27th -> 6th |
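The combo arithmetic quoted above (4 words per group, 4 groups per puzzle, 5 puzzles per combo) can be sketched as follows. This is only an illustration of how such a combo might be assembled, not the benchmark author's code: the function names, the placeholder words, and the shuffling step are all assumptions, and scoring is left out because the posts do not describe it.

```python
import random

# Sizes taken from the post: 4 words per group, 4 groups per puzzle,
# 5 puzzles per combo, so 4 * 4 * 5 = 80 words per combo.
WORDS_PER_GROUP = 4
GROUPS_PER_PUZZLE = 4
PUZZLES_PER_COMBO = 5


def build_combo(puzzles, seed=0):
    """Merge several Connections-style puzzles into one combo (hypothetical helper).

    Each puzzle is a list of groups; each group is a list of words.
    Returns (shuffled_words, answer_groups).
    """
    if len(puzzles) != PUZZLES_PER_COMBO:
        raise ValueError(f"expected {PUZZLES_PER_COMBO} puzzles, got {len(puzzles)}")
    for puzzle in puzzles:
        if len(puzzle) != GROUPS_PER_PUZZLE:
            raise ValueError("each puzzle needs exactly 4 groups")
        if any(len(group) != WORDS_PER_GROUP for group in puzzle):
            raise ValueError("each group needs exactly 4 words")

    # Flatten: the answer key is every group from every puzzle.
    groups = [list(group) for puzzle in puzzles for group in puzzle]
    words = [word for group in groups for word in group]
    if len(set(words)) != len(words):
        raise ValueError("duplicate words across puzzles make the combo ambiguous")

    # Shuffle with a fixed seed so the same combo can be regenerated.
    random.Random(seed).shuffle(words)
    return words, groups


def make_placeholder_puzzles():
    """Synthetic stand-in data (assumption): real puzzles would use actual words."""
    return [
        [[f"p{p}g{g}w{w}" for w in range(WORDS_PER_GROUP)]
         for g in range(GROUPS_PER_PUZZLE)]
        for p in range(PUZZLES_PER_COMBO)
    ]


if __name__ == "__main__":
    words, groups = build_combo(make_placeholder_puzzles())
    print(len(words), "words in", len(groups), "groups")
```

With 20 hidden groups instead of 4, a solver has to partition 80 words at once, which is presumably what makes the combo format harder than a single puzzle.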