Research
6 min read
AI Benchmarks in 2026: What Still Matters
Every time a lab ships a new model, the announcement arrives with a table full of scores. GPQA Diamond: 87.6. SWE-bench
Papers and experiments worth reading — with context, not hype.
Every week, arXiv absorbs hundreds of AI papers. Most are incremental. Occasionally, a cluster appears that feels less like research and more