"benchmarks" - ThursdAI Episodes

Episode #156 Feb 19, 2026

Claude 4 vs GPT-5: The Benchmark Battle Nobody Expected

Anthropic drops Claude 4 just one week after the GPT-5 launch. We do a head-to-head comparison on coding, reasoning, creative writing, and real-world tasks. The results might surprise you.

Dario Amodei, Simon Willison
Claude 4, Anthropic, GPT-5

Episode #155 Feb 12, 2026

GPT-5 First Impressions: What Changed and What Didn't

OpenAI finally ships GPT-5 and the AI world reacts. We break down the benchmarks, the new reasoning capabilities, the multimodal upgrades, and whether it lives up to the hype. Plus: what this means fo

Sam Altman, Andrej Karpathy
GPT-5, OpenAI, benchmarks

Episode #154 Feb 19, 2026

ThursdAI - Feb 19 - Gemini 3.1 Pro Drops LIVE, Sonnet 4.6 Closes Gap, OpenClaw Goes to OpenAI

Gemini 3.1 Pro dropped LIVE during the show! Also: Sonnet 4.6 closes the gap with Opus, and OpenClaw founder announces move to OpenAI. 2hr deep-dive covering the biggest model drops of the week.

Nisten Tahiraj, Yam Peleg, Wolfram Ravenwolf, LDJ, +1 more
Gemini, Anthropic, OpenAI

Episode #151 Jan 29, 2026

ThursdAI - Jan 29 - Genie3 Is Here, Clawd Rebrands, Kimi K2.5 Surprises, Chrome Goes Agentic

Google launches Genie3 world model. Claude gets rebranded as "Clawd". Moonshot AI's Kimi K2.5 surprises the benchmarks. Chrome adds built-in AI agent features. Plus more AI news from th

Nisten Tahiraj, Yam Peleg, LDJ, Ryan Carson
Google, Anthropic, Moonshot AI

Episode #143 Dec 4, 2025

ThursdAI - Dec 4 - DeepSeek V3.2 Goes Gold Medal, Mistral Returns to Apache 2.0

DeepSeek V3.2 achieves a gold medal on competitive programming benchmarks. Mistral announces a return to Apache 2.0 licensing for its models. The open-source AI movement accelerates.

Nisten Tahiraj, Yam Peleg, Wolfram Ravenwolf, Clem Delangue
DeepSeek, Mistral, open source
