Built in a Vibe – A Multi-LLM App

I recently explored the emerging practice of vibe coding—the concept of turning ideas into running software using nothing but natural language. It’s a shift in mindset from traditional programming toward conversational building. Less “code every line,” more “describe what you want.”

To test this approach, I created an account on Firebase Studio, Google’s AI-powered development environment, and started with a single prompt:

“An app that allows a query to be directed at 3 LLMs and shows answers from all 3 in one screen.”

This is something I had been thinking about for some time: it would be nice to compare responses from different Large Language Models (LLMs) in one place, evaluate the similarities and differences, and not take a single LLM's word for it. Vibe coding was my chance to “prototype” this idea.

That’s all! I entered the prompt and waited.

What came back in seconds was a complete scaffold of an app built end-to-end (frontend and backend) without my writing or editing a single line of code. I then refined my prompts, added and deleted features, changed configuration, and made many requests for refinement, all via prompts. Some of these contradicted earlier requests. The AI did not squeak. It came back with frontend and backend components that met all those demands.

The speed was surreal. The entire layout, flow, backend calls, configuration, and critique logic were scaffolded instantly. I could run and test everything in one go. It was like watching your idea materialize in real time. It was pure vibe coding in action.

What AI Built

In a matter of minutes, Firebase Studio generated what it called “LLM Showdown”: a sleek, interactive app designed to showcase query responses from multiple AI models on one screen.

Here’s what was included in the generated app:

  • Prompt Input: A simple text field where users can enter any query (pre-selected questions for now)
  • Query 3 LLMs: By default, it was configured to send the prompt to Gemini, GPT-4o, and Claude, but there was room to swap in other models.
  • Display Responses: Answers from the three LLMs shown side by side in a clean, unified layout.
  • Consensus Generator + Critique: An AI-generated summary and side-by-side critique of all three responses.
  • API Key Configuration Screen: It prompted me to add a Google API key (I used my Gemini key) to access AI-powered critique features.

Firebase Studio also styled the UI with a split-screen layout, clean typography, and subtle transitions. Stack: TypeScript + Next.js + Tailwind CSS. This wasn't just code generation; it was a complete, interactive prototype. Layout, components, integrations, and styling were already wired up. There were no syntax bugs, no misaligned elements, no unlinked APIs.
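The core pattern behind the “Query 3 LLMs” feature can be sketched in the app's own stack. The code below is a hedged illustration, not the generated app's actual code: `askModel` is a stand-in for a real provider call (Gemini, GPT-4o, Claude) that would use an API key, and all names are my own.

```typescript
// Illustrative sketch: fan one prompt out to several model backends in
// parallel and collect every answer, tolerating individual failures.

type ModelResult =
  | { model: string; answer: string }
  | { model: string; error: string };

async function askModel(model: string, prompt: string): Promise<string> {
  // Placeholder: a real implementation would POST to the provider's API.
  return `[${model}] response to: ${prompt}`;
}

async function fanOut(models: string[], prompt: string): Promise<ModelResult[]> {
  // Promise.allSettled waits for every call, so one provider outage
  // doesn't blank the whole screen.
  const settled = await Promise.allSettled(models.map((m) => askModel(m, prompt)));
  return settled.map((s, i) =>
    s.status === "fulfilled"
      ? { model: models[i], answer: s.value }
      : { model: models[i], error: String(s.reason) }
  );
}

// Example: query three models side by side.
fanOut(["gemini", "gpt-4o", "claude"], "What is vibe coding?")
  .then((results) => results.forEach((r) => console.log(r)));
```

Using `Promise.allSettled` rather than `Promise.all` matters here: a single failing provider should degrade to an error card, not kill all three columns.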

Testing the App

Once the app was generated, I asked it to answer this question in real time:

“What is the biggest risk of using Agentic AI in healthcare? List what you think is the most urgent risk”

With just a single click, the app fetched responses from all three models.

Google Gemini, OpenAI GPT-4o, and Anthropic Claude each gave their responses, displayed side by side.

These were followed by a concise consensus answer, along with automated critiques of each response by Gemini, which I had requested as an add-on feature. The critique logic itself was generated by Gemini, explaining what each LLM got right or overlooked.
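A consensus-plus-critique step like this can be framed as a single meta-prompt built from the three answers and sent to one “judge” model (Gemini, in my setup). This is a hypothetical sketch; the function name and prompt wording are my own invention, not the generated code.

```typescript
// Illustrative only: fold the question and each model's answer into one
// prompt for a single judge model, which returns a consensus answer plus
// a per-model critique.
function buildCritiquePrompt(
  question: string,
  answers: Record<string, string>
): string {
  const sections = Object.entries(answers)
    .map(([model, text]) => `--- Answer from ${model} ---\n${text}`)
    .join("\n\n");
  return [
    `Question: ${question}`,
    `Here are ${Object.keys(answers).length} answers from different models.`,
    sections,
    "1) Write a short consensus answer.",
    "2) For each model, note what it got right and what it overlooked.",
  ].join("\n\n");
}

// The resulting string would be sent to the judge model's API.
console.log(
  buildCritiquePrompt("What is the biggest risk of agentic AI in healthcare?", {
    gemini: "Loss of human oversight.",
    "gpt-4o": "Unsafe autonomous actions.",
    claude: "Accountability gaps.",
  })
);
```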

I prompted the app to generate a configuration tab, where users could input their favorite LLMs to be used for responses and critiques. It obliged, creating both the frontend and the backend needed to operationalize this feature.
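The configuration such a tab persists can be as small as a typed record plus a validation pass. The field names below are assumptions for illustration, not what Firebase Studio actually generated.

```typescript
// Hypothetical shape for the model-selection config the tab could store.
interface ShowdownConfig {
  responders: string[]; // models queried for side-by-side answers
  critic: string;       // model used for the consensus + critique
  apiKey: string;       // key for the critique provider
}

// Return a list of human-readable problems; an empty list means usable.
function validateConfig(cfg: ShowdownConfig): string[] {
  const errors: string[] = [];
  if (cfg.responders.length === 0) errors.push("pick at least one responder model");
  if (!cfg.critic) errors.push("pick a critic model");
  if (!cfg.apiKey) errors.push("an API key is required for critiques");
  return errors;
}

// Example: a complete config produces no errors.
console.log(
  validateConfig({
    responders: ["gemini", "gpt-4o", "claude"],
    critic: "gemini",
    apiKey: "...",
  })
);
```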

What Does Not Work Well

While the speed of building mock and functional prototypes is impressive, turning the design cycle on its head, deployment and “productionization” still take effort, ingenuity, and multiple cycles of debugging and testing. Debugging AI-written wrappers for external LLM calls can be challenging: it requires the ability to read the generated code, understand its structure, and fix problems. In that regard, turning this into a real, launchable app still demands software engineering knowledge and best practices. That said, vibe coding provides an unbelievable head start compared to starting from scratch and writing every single line of code.

What the Hype Is About

The most striking part of this experiment wasn't just what the app could do; it was how little I had to do to get there. The entire build experience was reduced to natural language prompts with refinement, a few clicks, and a handful of simple configuration changes.

This is what vibe coding can unlock. It bypasses boilerplate, config files, routing headaches, and data-fetching quirks. Instead, it focuses your energy on flow and feedback. You describe what you want, preview it in minutes, and refine as needed.

After building this app, I figured I would search for existing apps like it, which I was quite sure existed. Indeed they do. As a quick comparison, platforms like Poe and ChatLLM offer multi-LLM output as a product, similar to the result of my experiment, but they're more limited in customization and tend to be paywalled.

Sure, not every auto-generated app will be production-ready. But for idea validation, prototypes, and early testing, vibe coding feels like a superpower. What once took weeks can now take just a couple of hours and one cup of coffee.

Related Tools

Although I used Google's Firebase Studio for this project, the broader vibe coding ecosystem includes various tools:

  • Replit
  • Cursor
  • Bolt
  • Supabase Studio (backend builder with minimal code)
  • Appsmith (drag-and-drop AI dashboards)
  • And more…

Not to mention all the major LLMs that can help with vibe coding on their own. All these platforms have their strengths and weaknesses and there are many good articles that compare them, so I will not get into it here.

Summary: From Idea to Prototype

This project reminded me of something simple yet powerful: sometimes, the best way to build is to just describe it.

With vibe coding, your input becomes the blueprint, the PRD. The app frames itself, taking you from “I have an idea” to “I have a prototype” in record time. That momentum is everything, especially for fast-moving teams, solo devs, or anyone who wants to validate a concept. And all of a sudden, solo startups don't seem that far-fetched.

Is the result perfect? No. Is it polished for production? No. But it's real, it's immediate, and it allows you to “touch and feel” your idea, which is often exactly what you need when developing prototypes or proofs of concept. Consider this: it took me about the same time to write this article as it took to build a prototype of this app.

So that’s the promise. I will end with an original quote:

“In the age of AI, genius might be 99% inspiration”