👾 The AI Coding Showdown: 6 Platforms, 1 App, Shocking Results
New model releases from OpenAI (GPT-5 and gpt-oss), Anthropic (Opus 4.1), and Alibaba (Qwen3 Flash series). 6 AI coding platforms tested, here are the results.
Welcome to The Agent Roundup
So overall, if we’re given the time, I believe in human ingenuity. I think we’ll get this right. I think also, humans are infinitely adaptable. I mean, look where we are today. Our brains were evolved for a hunter-gatherer lifestyle and we’re in modern civilisation. The difference here is, it’s going to be 10 times bigger than the Industrial Revolution, and maybe 10 times faster.
This week’s topics:
OpenAI releases GPT-5 and the open-weight gpt-oss models
Anthropic further enhances Claude with Opus 4.1
Alibaba leading open models with enhanced Qwen3
Test results of 6 vibe coding AI agent platforms
Plus market updates, trending AI tools, community highlights, and more
AI Agent News Roundup
💥 Breakthroughs
OpenAI GPT-5 & OSS
Source: OpenAI
GPT-5 is available now in 5 different variants:
GPT-5 and GPT-5 Pro: The flagship models. Pro comes with extended reasoning.
GPT-5 Mini: Designed for lighter-weight reasoning tasks, with reduced latency and cost.
GPT-5 Chat: Designed for conversations.
GPT-5 Nano: The fastest variant, for real-time applications.
OpenAI also released two open-weight models:
gpt-oss-120b: For general-purpose use cases, optimized to run on a single H100 GPU with at least 80GB of VRAM.
gpt-oss-20b: For lower-latency use cases, needs at least 16GB of GPU VRAM.
Claude Opus 4.1
Source: Anthropic
Anthropic released Claude Opus 4.1 with agentic, coding, and reasoning upgrades.
Performance improved to 74.5% on the SWE-bench Verified benchmark, up from 72.5% with Claude Opus 4.
Research and data analysis skills improved, with particular gains in detail tracking and agentic search.
The upgrade specifically targets real-world coding scenarios, in-depth research, and data analysis tasks requiring high attention to detail, in line with the growing demand for more precise AI-driven development and research tools.
Flash series of Qwen3
Source: Alibaba
Alibaba released:
Qwen3-Coder-Flash: 30B parameter model with a massive 1M token context window.
Qwen3-Coder-480B-A35B-Instruct.
Qwen3-2507 in two variants: Qwen3-Instruct-2507 and Qwen3-Thinking-2507.
Capabilities:
AI-powered analysis and generation across entire codebases in a single prompt.
Significant improvements in instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.
The releases position Alibaba competitively in the coding AI space.
📈 More Market Updates
🇺🇸 Domino's: Approximately 80% of North American phone orders are now processed through an AI assistant featuring localized speech patterns, powered by text-to-speech technology from San Francisco-based Rime Labs.
🇺🇸 Meta: $65 billion allocated for AI initiatives in 2025, targeting nearly 1 gigawatt of computing capacity supported by over 1.3 million GPUs.
🇺🇸 Microsoft: Over $30 billion in capital expenditures planned for current quarter to scale cloud platform and expand AI-focused data center infrastructure.
🇺🇸 Amazon: More than $100 billion committed to AWS AI development, described by CEO Andy Jassy as "once-in-a-lifetime" market opportunity.
🇨🇦 Shopify: New commerce-focused features including checkout integration tools, global product search capabilities, and universal cart functionality for AI agents.
An AI scheduling assistant that lives up to the hype.
Skej is an AI scheduling assistant that works just like a human. You can CC Skej on any email, and watch it book all your meetings. Skej handles scheduling, rescheduling, and event reminders. Imagine life with a 24/7 assistant who responds so naturally, you’ll forget it’s AI.
I Put 6 Vibe Coding Apps to The Ultimate Test – The Results Shocked Me

My vibe-coded investment portfolio tracker
Can AI really replace developers? I put this question to the ultimate test by building the same investment portfolio tracker across 6 different AI coding platforms. What I discovered will change how you think about AI-powered development.
🎯 The Challenge: One App, Six Platforms
I tasked each platform with building a comprehensive investment portfolio tracker with these exact requirements:
The Benchmark App:
Real-time data integration for Bitcoin, Ethereum, Nasdaq stocks, global equities, and ETFs
Interactive dashboard with account overview and 12-month performance tracking
Dual visualization via line charts (portfolio value over time) and pie charts (asset allocation)
Transaction management system with buy/sell functionality
Cash account integration with automatic balance updates
Complete transaction history in chronological order
Mobile-responsive design with modern dark theme
Each platform got the identical prompt. No technical specifications. No second chances.
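To make the transaction and cash-account requirements concrete, here is a minimal sketch of the core bookkeeping such a tracker needs: buys reduce cash, sells add to it, history stays chronological, and allocation feeds the pie chart. It is not the code any platform generated. The names `Transaction`, `Portfolio`, `record`, and `allocation` are hypothetical, and prices are passed in by hand rather than fetched from a real-time data source.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Transaction:
    timestamp: datetime
    side: str        # "buy" or "sell"
    symbol: str
    quantity: float
    price: float     # price per unit, in the account currency


@dataclass
class Portfolio:
    # Hypothetical model: one cash account plus holdings keyed by symbol.
    cash: float
    holdings: dict[str, float] = field(default_factory=dict)
    history: list[Transaction] = field(default_factory=list)

    def record(self, tx: Transaction) -> None:
        """Apply a buy or sell and update the cash balance automatically."""
        amount = tx.quantity * tx.price
        held = self.holdings.get(tx.symbol, 0.0)
        if tx.side == "buy":
            if amount > self.cash:
                raise ValueError("insufficient cash")
            self.cash -= amount
            self.holdings[tx.symbol] = held + tx.quantity
        elif tx.side == "sell":
            if tx.quantity > held:
                raise ValueError("insufficient holdings")
            self.cash += amount
            self.holdings[tx.symbol] = held - tx.quantity
        else:
            raise ValueError(f"unknown side: {tx.side}")
        # Keep the transaction history in chronological order.
        self.history.append(tx)
        self.history.sort(key=lambda t: t.timestamp)

    def allocation(self, prices: dict[str, float]) -> dict[str, float]:
        """Return each position's share of total value (pie chart data)."""
        values = {s: q * prices[s] for s, q in self.holdings.items() if q > 0}
        values["CASH"] = self.cash
        total = sum(values.values())
        if total == 0:
            return {}
        return {name: value / total for name, value in values.items()}
```

Even this small sketch has to reject overdrafts and overselling, which is the kind of edge case a one-shot prompt tends to leave untested.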
🏆 The Results That Shocked Me
1st Place: Caffeine AI (7.7/10) 🥇
The dark horse that came out of nowhere! This platform absolutely nailed the user experience with fast generation, solid reliability, and a mostly functional app after the initial prompt.

Pros
Simple and easy-to-use interface
Entire app runs on blockchain which makes it more secure
All-in-one (persistent storage, authentication, one-click deployment)
Cons
Alpha version, not yet released to general public
Lacks some features (code editing/export, file upload)
UI design quality a bit behind competitors
2nd Place: Firebase Studio (7.2/10) 🥈
Google's offering brought moderate results after the initial prompt. A few features were working but needed more prompts to fix the remaining issues.

Pros
For beginners and advanced users
Full IDE support with Gemini models
Great customization and available features
Cons
More complex for beginners
3rd Place: v0 (6.3/10) 🥉
With Vercel's version, I needed three attempts and manual bug fixing to run the preview. UI quality is high but needs more work to make it fully functional.

Pros
Easy to use
Good app quality
Rich features (code export, database integrations, deployment on Vercel)
Cons
Very simple code editor
Slow code generation
4th Place: Replit (6.0/10)
The coding giant delivered solid fundamentals but lacked the polish for production-ready apps. After the initial prompt, none of the features worked but the app dashboard looked great.

Pros
Ease of use
Supported features (file uploads, prompt enhancer, deployment, database)
Great UI design in apps
Cons
Very slow code generation
Vendor lock-in to Replit
None of the prompted features worked
5th Place: Lovable (5.5/10)
Despite the charming name, this platform struggled with prompt alignment and complex feature integration. It produced the simplest app of all the tested platforms.

Pros
Easy to use and has all the necessary features
Cons
Doesn’t allow code editing or export in the free version
Slow code generation and low app quality
Apps are always public in free mode
6th Place: Bolt.new (FAIL)
The platform that made me almost rage quit. Complete breakdown on core functionality - couldn't spin up the app.

No rating. Consumed all the tokens for the day without a visual result.
📊 The Score Breakdown
Here's how each platform performed across critical developer metrics:
Metric | Replit | Firebase Studio | Lovable | Bolt.new | v0 | Caffeine AI |
---|---|---|---|---|---|---|
Ease of Use | 8 | 6 | 7 | ❌ | 7 | 9 |
Speed of Generation | 4 | 7 | 5 | ❌ | 5 | 9 |
App Quality | 6 | 6 | 5 | ❌ | 7 | 7 |
Supported Features | 7 | 9 | 7 | ❌ | 7 | 6 |
Customization/Flexibility | 5 | 8 | 4 | ❌ | 6 | 6 |
Scalability/Reliability | 6 | 7 | 5 | ❌ | 6 | 9 |
Total Average | 6.0 | 7.2 | 5.5 | FAIL | 6.3 | 7.7 |
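The totals appear to be plain unweighted means of the six metric scores. The sketch below recomputes them as a check, assigning each column to a platform by matching its total to the rating quoted in the results section (an inference, since the scraped table header carries no names). Bolt.new is left out because it received no rating. The names `SCORES` and `total_average` are illustrative.

```python
# Scores copied from the breakdown table, in row order: Ease of Use,
# Speed of Generation, App Quality, Supported Features,
# Customization/Flexibility, Scalability/Reliability.
# Column-to-platform mapping is inferred from the per-platform ratings.
SCORES = {
    "Replit": [8, 4, 6, 7, 5, 6],
    "Firebase Studio": [6, 7, 6, 9, 8, 7],
    "Lovable": [7, 5, 5, 7, 4, 5],
    "v0": [7, 5, 7, 7, 6, 6],
    "Caffeine AI": [9, 9, 7, 6, 6, 9],
}


def total_average(scores: list[int]) -> float:
    """Unweighted mean of the metric scores, rounded to one decimal."""
    return round(sum(scores) / len(scores), 1)


# Rank platforms from highest to lowest total average.
ranking = sorted(SCORES, key=lambda name: total_average(SCORES[name]), reverse=True)

for place, name in enumerate(ranking, start=1):
    print(f"{place}. {name}: {total_average(SCORES[name])}")
```

Because every metric carries equal weight, a platform that is very fast but light on features can outrank one with richer features. Weighting the metrics for your own priorities could change the order.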
💡 Key Takeaways
🔹 Features: Most platforms offer similar features. Firebase Studio stands out by letting you edit the code in a web IDE.
🔹 The Reliability Gap: Security is nonexistent in most of the apps. Upgrading an app after it's live is difficult without losing user data. Caffeine AI is a promising exception thanks to its decentralized architecture, integrated authentication system, and data persistence.
🔹 Great for Rapid Prototyping and MVPs: AI coding tools are rapidly evolving, but they’re not ready to replace seasoned developers. However, they’re powerful for rapid prototyping and MVP development.
Tool Spotlight
Qwen-Image: A 20B MMDiT image foundation model that delivers significant advances in complex text rendering and precise image editing, especially strong at creating stunning graphic posters with native text.
Genie 3: A general-purpose world model that can generate an unprecedented diversity of interactive environments from text prompts, creating dynamic worlds navigable in real time at 24 frames per second at 720p resolution.
Eleven Music: ElevenLabs' latest AI audio innovation, featuring a studio-grade AI engine built from high-quality stems and delivering 44.1kHz audio that can closely follow lyrics, key, and BPM across complex, multi-layered prompts.
Jules: Google's asynchronous, agent-based coding tool powered by Gemini 2.5 Pro. It integrates with GitHub, clones codebases into Google Cloud virtual machines, and is now officially out of beta after thousands of developers completed tens of thousands of tasks.
MiniMax Speech 2.5: MiniMax's next-generation speech generation model, featuring significantly enhanced multilingual performance, more realistic voice cloning, and expanded language support for over 40 languages.
Community Highlights
Happy to be there 🔥
— Tobias Wupperfeld (@tobiaswup)
1:41 AM • Aug 5, 2025
Here's my current vibe coding debugging framework.
When AI gives you code that doesn't work... again and again and your vibe coding is failing, you're missing one or more of these 7 things:
— Tobias Wupperfeld (@tobiaswup)
1:41 PM • Aug 10, 2025
Made this flappy bird clone today with one single prompt.
First version using a Chinese model (GLM-4.5 by Z.ai).
Second version using the new mighty GPT-5 by @OpenAI.
Same prompt, different outputs. Which version do you like more?
— Tobias Wupperfeld (@tobiaswup)
12:16 AM • Aug 9, 2025
More Resources
Blog: In-depth articles on AI workflows and practical strategies for growth
AI Tool Collection: Discover and compare validated AI solutions
Consultancy: Explore AI potential or make your team AI-fit
Agency: Production-ready AI implementation services
See you next time!
Tobias