Results for "benchmarks"

5 episodes matching "benchmarks"

Episode #156 Feb 19, 2026
Claude 4 vs GPT-5: The Benchmark Battle Nobody Expected
Anthropic drops Claude 4 just one week after the GPT-5 launch. We do a head-to-head comparison on coding, reasoning, creative writing, and real-world tasks. The results might surprise you.
Dario Amodei, Simon Willison | Claude 4, Anthropic, GPT-5
Episode #155 Feb 12, 2026
GPT-5 First Impressions: What Changed and What Didn't
OpenAI finally ships GPT-5 and the AI world reacts. We break down the benchmarks, the new reasoning capabilities, the multimodal upgrades, and whether it lives up to the hype. Plus: what this means fo
Sam Altman, Andrej Karpathy | GPT-5, OpenAI, benchmarks
Episode #154 Feb 19, 2026
ThursdAI - Feb 19 - Gemini 3.1 Pro Drops LIVE, Sonnet 4.6 Closes Gap, OpenClaw Goes to OpenAI
Gemini 3.1 Pro dropped LIVE during the show! Also: Sonnet 4.6 closes the gap with Opus, and OpenClaw founder announces move to OpenAI. 2hr deep-dive covering the biggest model drops of the week.
Nisten Tahiraj, Yam Peleg, Wolfram Ravenwolf, LDJ +1 more | Gemini, Anthropic, OpenAI
Episode #151 Jan 29, 2026
ThursdAI - Jan 29 - Genie3 Is Here, Clawd Rebrands, Kimi K2.5 Surprises, Chrome Goes Agentic
Google launches Genie3 world model. Claude gets rebranded as "Clawd". Moonshot AI's Kimi K2.5 surprises the benchmarks. Chrome adds built-in AI agent features. Plus more AI news from th
Nisten Tahiraj, Yam Peleg, LDJ, Ryan Carson | Google, Anthropic, Moonshot AI
Episode #143 Dec 4, 2025
ThursdAI - Dec 4 - DeepSeek V3.2 Goes Gold Medal, Mistral Returns to Apache 2.0
DeepSeek V3.2 achieves a gold medal on competitive programming benchmarks. Mistral announces a return to Apache 2.0 licensing for its models. The open-source AI movement accelerates.
Nisten Tahiraj, Yam Peleg, Wolfram Ravenwolf, Clem Delangue | DeepSeek, Mistral, open source