Multi-Agent Testing Is Here

Two weeks ago, I introduced you guys to Oswald, a voice agent I built for a coffee shop problem I kept hearing about across the Bay Area. He started as a single-prompt agent: one set of instructions trying to handle everything at once.
This week Oswald got smarter.
I rebuilt him inside ElevenLabs as a full multi-agent workflow. Instead of one prompt doing everything, Oswald now has a dedicated agent for each part of the conversation: one for hours and location, one for the menu, one for catering, and one for events. Oswald has gone from a single-prompt agent to a composable multi-agent workflow that is ready for production!
Oswald's new brain inside ElevenLabs
Once the workflow was built I pulled it straight into Bluejay, and this is where things got really interesting.
All I had to do was copy and paste my ElevenLabs API key into Bluejay. Bluejay instantly synced my agents, pulled in the entire workflow, and grabbed every sub-agent system prompt automatically. From there I just gave it a name and jumped straight into simulation. No manual setup. No rebuilding anything. Just paste, sync, and go.
Once I was inside Bluejay, I headed straight to the Simulations tab and clicked Generate Simulation. I selected the option to generate from workflow and named it Oswald Simulation. Just like that, Bluejay mapped out every single path through Oswald's workflow, covering every transfer condition and every routing scenario, and automatically generated digital humans to test each one.
When failures came up, Bluejay AI told me exactly which node was breaking and why. I didn't have to dig through logs or figure it out myself. I simply typed into Bluejay AI:
"Make changes to fix the failing nodes."
Bluejay AI went through every single failure, made the changes, and handed me the fix. Once I accepted it, Bluejay pushed the updated workflow straight back to ElevenLabs. The best part: I never had to leave the platform.
Oswald went from a single-prompt agent to a production-ready multi-agent system, tested and deployed, all in one place. And what makes this really exciting is that this is just the beginning!
Multi-agent workflow testing is now live inside Bluejay starting with ElevenLabs, and every other provider on the platform is coming soon. No more manually checking transfer conditions. No more guessing which node broke. Bluejay handles all of it, and it is only going to keep growing from here.
As a recap:
- Bluejay is a testing and monitoring platform for Conversational AI agents. Companies ranging from Fortune 10 enterprises to fast-growth startups in Silicon Valley use Bluejay to make sure their voice and text agents work in production (monitoring) and development (testing) environments.
- Our team, now seven strong, works around the clock to make sure your agent behaves when talking to customers.
- This newsletter is 100% human-written. It always has been, and it always will be. Ask yourself what you are consuming. If the writer hasn't read it, why should you?

Announcements
Here's what happened at Bluejay last week:
- The latest Skywatch episode just dropped featuring HappyRobot!
- The team pushed 105,456 lines of code this week to make Conversational AI more reliable!
- We signed a lease on our new office, an entire floor in SOMA / Yerba Buena! More on this soon.
Skywatch Podcast Episode
Building the AWS of AI Work: HappyRobot CEO Pablo Palafox on Deploying AI Agents Across Enterprise
This week Rohan sat down with Pablo Palafox, co-founder and CEO of HappyRobot, to talk about what it actually takes to deploy voice agents at enterprise scale.
From starting in logistics to going horizontal across industries, Pablo shares the real story behind building one of the fastest growing agentic infrastructure companies out there.
The full episode is available now on YouTube and Spotify. 😄
Feature Spotlight: Multi-Agent Workflow Testing
If you are serious about building voice agents, you are probably already thinking beyond a single system prompt. Multi-agent workflows are how real production agents get built: each stage of a conversation is handled by a dedicated agent that hands the conversation forward when it is done.
The hard part has always been testing them!
How do you know every transfer condition works? How do you catch what breaks before a real customer does?
Bluejay now supports full multi-agent workflow testing, starting with ElevenLabs. Connect your API key and Bluejay pulls your entire workflow in. From there you can run simulations that automatically test every unique path, use Bluejay AI to fix the nodes that are failing, and push your updated workflow back to ElevenLabs with one click.
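To make "every unique path" concrete, here is a toy sketch of what enumerating paths through a workflow graph looks like. It is not Bluejay's implementation. The "greeter" entry node, the transfer-condition strings, and the menu-to-catering edge are hypothetical and exist only for illustration. The four sub-agents mirror Oswald's. Each path found by the depth-first search is one route a simulated caller could take, so it is one scenario worth testing.

```python
from typing import Dict, List, Tuple

# Hypothetical workflow: node -> list of (transfer_condition, next_node) edges.
# Sub-agent names mirror Oswald's; "greeter" and the conditions are made up.
Workflow = Dict[str, List[Tuple[str, str]]]

OSWALD: Workflow = {
    "greeter": [
        ("asks about hours or location", "hours_location"),
        ("asks about the menu", "menu"),
        ("asks about catering", "catering"),
        ("asks about events", "events"),
    ],
    "menu": [("wants to order for a group", "catering")],
    "hours_location": [],
    "catering": [],
    "events": [],
}


def enumerate_paths(workflow: Workflow, start: str) -> List[List[str]]:
    """Return every simple path from start until no further transfer is possible."""
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        extended = False
        for _condition, nxt in workflow.get(node, []):
            if nxt in path:  # skip revisits so cycles cannot loop forever
                continue
            extended = True
            dfs(nxt, path + [nxt])
        if not extended:  # dead end or only cyclic edges left: a complete path
            paths.append(path)

    dfs(start, [start])
    return paths


if __name__ == "__main__":
    for p in enumerate_paths(OSWALD, "greeter"):
        print(" -> ".join(p))
```

Even this small graph yields four distinct paths, including one (greeter -> menu -> catering) that only appears because of a transfer between sub-agents. That kind of path is easy to miss when testing by hand.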
ElevenLabs workflow appearing inside Bluejay
Bluejay pushing the updated workflow back to ElevenLabs 😮
ElevenLabs is just the first provider. Support for all other providers is coming soon!
That's all for now. I'll see you next time!
Azfar Khan
Storyteller @ Bluejay
