Top AI Models Fail Enterprise Tests!

Welcome back, AI enthusiast.

IBM just released a benchmark where every frontier AI model scores below 50% on enterprise IT tasks.

Not one model cracked the halfway mark on real Kubernetes incident diagnosis. Is your "production-ready" AI actually ready for production?

Today in the “Incredible AI” world:

Frontier AI Models Struggle With ITBench
YouTube Just Unveiled Custom AI Feed
Perplexity Quietly Solved AI Bottlenecks
Make Your Voice AI Sound More Human
5 Super-Useful AI Tools You Should Try
Top AI-Generated Image of the Day

Read Time: 4 minutes

- LATEST DEVELOPMENTS -

Frontier AI Models Score Below 50% on IT Tasks:

The smartest AI models on the planet just got a reality check. IBM and Artificial Analysis built a benchmark for real enterprise IT work, and every single frontier model failed to cross the halfway mark.

Things to Know:

The Benchmark Explained: IBM and Artificial Analysis launched ITBench-AA to test AI agents on real Site Reliability Engineering tasks. Models must diagnose live Kubernetes incidents by reading logs, tracing dependencies, and finding root causes.
The Scoreboard: Claude Opus 4.7 leads the pack at just 47%, with GPT-5.5 close behind at 46%. No frontier model broke 50%, making this one of the least cracked benchmarks out there right now.
More Turns, Worse Results: Gemini 3.1 Pro Preview took 83 turns per task and still only scored 30%. Gemma 4 31B finished in 58 turns and beat it at 37%. Overthinking in AI, turns out, is just as costly as overthinking in people.
The Cost Angle: Open-weight models are punching above their price tag here. Gemma 4 31B scores 37% at just $0.14 per task, while the top-scorer Claude Opus 4.7 costs $5.38 per task. The gap in performance barely justifies that price gap.
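
To put that price gap in numbers, here is a quick back-of-the-envelope sketch using only the scores and per-task costs quoted above. The "cost per point" metric is my own rough framing, not something the benchmark reports.

```python
# Scores (%) and cost per task (USD) as quoted above for ITBench-AA.
results = {
    "Claude Opus 4.7": {"score": 47, "cost": 5.38},
    "Gemma 4 31B": {"score": 37, "cost": 0.14},
}


def cost_per_point(entry):
    """Dollars spent per percentage point of score (an illustrative metric)."""
    return entry["cost"] / entry["score"]


opus = results["Claude Opus 4.7"]
gemma = results["Gemma 4 31B"]

price_ratio = opus["cost"] / gemma["cost"]  # how many times more expensive per task
score_gap = opus["score"] - gemma["score"]  # percentage points gained for that price

print(f"Price ratio: {price_ratio:.1f}x")
print(f"Score gap: {score_gap} points")
for name, entry in results.items():
    print(f"{name}: ${cost_per_point(entry):.4f} per point")
```

In other words, the top scorer costs roughly 38 times more per task for a 10-point lead.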

Enterprise AI is not as ready as the hype suggests. Real-world IT work is messy, multi-step, and unforgiving, and right now no model handles it with confidence. The race to 50% just got a lot more interesting.

YouTube’s AI Now Builds Your Feed for You:

YouTube just handed the remote control back to you, sort of. Instead of guessing what the algorithm wants, you can now just tell YouTube what you want. Wild concept, honestly.

Things to Know:

How It Works: You type a prompt on the Home tab, and YouTube generates a custom feed around it. You can describe a mood, a topic, or just ask for something new beyond your usual mix.
Where to Find It: A new chip labeled "Your Custom Feed" sits right next to the Home button at the top of the page. Tap it, type, and your personalized feed appears ready to scroll.
Who Gets It: The feature is rolling out now for signed-in U.S. viewers on both mobile and desktop. Your search and watch history must be enabled in account settings for the chip to show up.
Save and Edit Anytime: You can pin your custom feed as a saved chip and tweak the prompt whenever the vibe shifts. No need to rebuild from scratch every time something feels off.

The algorithm has been making choices for you since 2005, so this is a pretty big shift. Whether YouTube's AI actually nails your prompts is the real test. For now, it is worth trying just to see how well it knows you.

Talk to your AI tools the way you'd talk to a colleague.

You don't send a colleague a three-word brief. You explain the context, the constraints, what you've already tried. But typing all that into ChatGPT takes forever — so you don't.

Wispr Flow lets you speak your prompts instead. Talk through your thinking naturally and get clean, paste-ready text. No filler words. No cleanup. Just detailed prompts that actually get you useful answers on the first try.

Millions of users worldwide. Works system-wide on Mac, Windows, and iPhone.

Try Wispr Flow free

Perplexity Open-Sources a Faster Tokenizer:

Nobody talks about tokenization at parties, but it quietly slows down AI inference more than you think. Perplexity rebuilt theirs from scratch, cut production CPU utilization by 5 to 6x, and then just gave it away for free.

Things to Know:

The Problem: Small AI models like rerankers finish GPU work in single-digit milliseconds. CPU-side tokenization runs first on every input, making it the unexpected bottleneck nobody optimized.
The Fix: Perplexity rewrote the Unigram tokenizer in Rust using a double-array trie structure. It removed all heap allocations from the hot path, dropping latency from 349 µs to around 63 µs.
The Results: The new encoder is 5x faster than the Hugging Face tokenizers crate and 2x faster than SentencePiece in C++. In production, it cut CPU utilization by 5 to 6x and shaved double-digit milliseconds off reranker latency.
It Is Open Source: The code lives in Perplexity's pplx-garden repository on GitHub under an MIT license. Any team running embedding models or rerankers at scale can plug this in today.
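
If you have never seen what a Unigram tokenizer actually does, here is a minimal sketch of the idea: pick the split of a string into vocabulary pieces with the highest total log-probability, using a Viterbi-style pass. This is not Perplexity's code. The vocabulary and probabilities are made up, and a plain Python dict stands in for the double-array trie their Rust version uses for lookups.

```python
import math

# Toy vocabulary: piece -> log-probability. All values are invented for illustration.
VOCAB = {
    "uni": math.log(0.08),
    "gram": math.log(0.12),
    "un": math.log(0.10),
    "igram": math.log(0.05),
    "u": math.log(0.01),
    "n": math.log(0.01),
    "i": math.log(0.01),
    "g": math.log(0.01),
    "r": math.log(0.01),
    "a": math.log(0.01),
    "m": math.log(0.01),
}
MAX_PIECE_LEN = max(len(piece) for piece in VOCAB)


def unigram_segment(text):
    """Return the highest-scoring split of text into vocabulary pieces."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i] = best log-prob of text[:i]
    back = [0] * (n + 1)          # back[i] = start index of the last piece
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_PIECE_LEN), end):
            piece = text[start:end]
            if piece in VOCAB and best[start] > -math.inf:
                score = best[start] + VOCAB[piece]
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end to recover the pieces.
    pieces = []
    pos = n
    while pos > 0:
        pieces.append(text[back[pos]:pos])
        pos = back[pos]
    return pieces[::-1]


print(unigram_segment("unigram"))  # ['uni', 'gram']
```

The inner lookup (`piece in VOCAB`) runs for every candidate substring of every input, which is why the data structure behind it and the allocations around it matter so much at scale.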

This is one of those unglamorous infrastructure wins that actually matters at scale. Perplexity did the hard optimization work and then open-sourced it instead of sitting on it. That is a good look for the whole AI community.

- TRY THIS -

How to Add Emotional Intelligence to Voice AI:

Most voice AI feels robotic not because it sounds bad, but because it responds the same way regardless of how you are feeling. You could be frustrated, confused, or excited and it delivers the same flat, scripted reply every time. That disconnect is what makes AI voice experiences feel cold.

Hume AI is a developer toolkit that lets you build voice interfaces capable of detecting and responding to human emotions in real time. Its main product, EVI, is a speech-to-speech AI that reads emotional cues from your voice and adjusts its tone and responses accordingly.

Here's how to get started:

Sign up and open the playground: Create a free Hume account and head to the EVI playground. You can test the full experience without writing a single line of code first. Pick a character, start a call, and see how EVI responds to changes in your tone and mood in real time.
Set up your API key and choose your LLM: Once you are ready to build, grab your API key from the dashboard. EVI works with any major language model including Claude, GPT, Gemini, and Llama, so you are not locked into one provider. Swap models without changing your integration.
Install the SDK for your stack: Hume has SDKs for React, TypeScript, Python, Swift, and .NET. Install the package, connect your API key, and EVI is live inside your app. The first response typically streams back in around 300 milliseconds.
Design your voice and configure your agent: Choose from Hume's library of expressive voices, clone a voice from an audio sample, or describe the voice you want in plain language and let the system generate it. From there, define your agent's personality, instructions, and the tools it can access, like booking systems, order lookups, or knowledge bases.
Go live and access conversation data: Every conversation logs a full transcript with timestamps and emotion data attached. You can use this to understand how users are feeling during interactions and refine your agent's behavior over time.
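
To give a feel for that last step, here is a rough sketch of how you might summarize emotion data from a logged conversation. The record layout below (the `speaker`, `text`, and `emotions` fields and the emotion names) is a hypothetical example I made up, not Hume's actual export schema, so check their docs for the real field names before adapting it.

```python
from collections import defaultdict

# Hypothetical transcript records; field names and scores are illustrative only.
conversation = [
    {"t": 0.0, "speaker": "user", "text": "My order never arrived.",
     "emotions": {"frustration": 0.5, "calm": 0.25}},
    {"t": 2.1, "speaker": "agent", "text": "I am sorry, let me check.",
     "emotions": {}},
    {"t": 6.4, "speaker": "user", "text": "Okay, thanks for looking.",
     "emotions": {"frustration": 0.25, "calm": 0.5}},
]


def average_user_emotions(turns):
    """Average each emotion score across the user's turns only."""
    totals = defaultdict(float)
    user_turns = 0
    for turn in turns:
        if turn["speaker"] != "user":
            continue
        user_turns += 1
        for name, score in turn["emotions"].items():
            totals[name] += score
    if user_turns == 0:
        return {}
    return {name: total / user_turns for name, total in totals.items()}


summary = average_user_emotions(conversation)
for name, score in sorted(summary.items()):
    print(f"{name}: {score:.3f}")
```

A summary like this, tracked per conversation, is one simple way to spot where users tend to get frustrated and which agent instructions need tuning.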

If you are building a product that involves voice, whether that is a customer support agent, a learning app, or an AI companion, EVI adds a layer of emotional awareness that most voice APIs simply do not have.

👉 Try Hume AI from here.

The LA Mayor Odds Are Moving. June 2 Is the Deadline.

Bass at 68%, Pratt at 27% — and $21M already trading on the outcome. The best trades happen before the picture gets clear. Get in now and get $10 free to start.

Trade Before June 2

Trade responsibly.

- BEST AI TOOLS -

Xembly: Automate meeting scheduling, note-taking, and task tracking with an AI executive assistant.

Creatopy AI: Generate ad banners and marketing visuals using AI-assisted design tools.

Verbatik: Convert text into natural-sounding voiceovers across multiple languages.

Sana Labs: Deliver personalized AI-powered learning experiences for teams and organizations.

SuperAI: Automate data labeling and model training pipelines with human-in-the-loop AI.

- DAILY POLL AND RESULT -

Today’s Poll:

Q) Is AI creating unrealistic productivity expectations?

Vote and find out the result tomorrow.

Yesterday’s Result:

Q) Should AI help plan marketing campaigns?

A) Yes, data optimization - 83% 👑
B) No, creativity diluted - 17%

The Best New App for Reading Newsletters:

Reading newsletters in the inbox is frustrating - it is noisy and easy to lose control of subscriptions.

That’s where Meco comes in as a savior.

Meco is a distraction-free app for reading newsletters outside the inbox. The app is packed with features designed to supercharge your learning from your favorite writers.

With Meco, you can enjoy your newsletters in an app built for reading, while giving your inbox space to breathe.

Add your newsletters in seconds and liberate your inbox today!

👉 Try the Meco App for FREE.

- AI IMAGE OF THE DAY -

The Rock Whale Serpent Hovers Over Cities:

Prompt:

"A colossal whale-serpent hybrid monster with a whale's head and a long serpent body and trunk, entire body made of green rocks crackling with red electricity, a long sweeping tail, floating high above a city skyline, dwarfing every building below, rainbow-colored liquid pooling on the ground, full body shot, ultra detailed illustration, ultra high definition."

Try this prompt in any decent AI image generator and let me know your result.

- DO YOURSELF A FAVOUR -

If this email landed in your Promotions or Spam tab, move it to your Primary inbox. Takes 2 seconds, and you will never miss an edition again.

I put this together every day so you do not have to spend your morning hunting down AI news yourself. Would be a shame if it ended up buried under discount codes and bank alerts.
