Is Acedly running its own model, or is it using the same foundation models I'd pay for directly?

The same foundation models. Acedly routes between GPT, Claude, Gemini, DeepSeek, and Qwen at the frontier tier based on the question type detected from the transcript. The intelligence comes from the model vendors; Acedly's value is the wiring around them — sub-200ms latency, OS-level audio capture, screen-share exclusion, persistent grounding, and per-turn routing — that a chat UI structurally can't reach.

Why can't I just open ChatGPT or Claude in another window during the interview?

Three structural constraints. First, the chat UI is a normal application window, so it shows up in screen-share, the dock, Alt-Tab, and the cursor list. Second, you have to type the question to prompt it, which adds about 6 seconds of round-trip on every turn — past the 250ms threshold where conversation cadence breaks. Third, it doesn't capture the interviewer's audio, so you're transcribing the question yourself, eyes on the keyboard rather than on the call.

Is the model output meaningfully different between Acedly and a raw chat?

For the same model on the same prompt at the same temperature, no — it's the same output. The difference is the *prompt*, the *grounding*, and the *latency budget*. Acedly's system prompt is tuned to the round type, the grounding includes your résumé and JD without re-pasting, and the latency budget is one round trip rather than five. The model is the same; the question it answers is sharper.

What about latency on Acedly itself — isn't it constrained by the model vendor's API too?

Yes, and that's why the per-turn routing matters. The model with the lowest first-token latency on a coding question in May 2026 is not the same model that has the lowest latency on a behavioural question. Acedly's routing prefers the model with the best latency-times-quality tradeoff for the detected round type, refreshed as vendor performance shifts. The median end-to-end across all rounds in our production telemetry is ~98ms; the 95th percentile is under 200ms.

Does the interviewer see anything that would suggest I'm using AI?

Not from screen sharing — Acedly is excluded from window-capture APIs at the OS level. The risk is the same risk that exists with any prep aid: if you read model output verbatim, the cadence and the substance will not match how you sound when answering on your own. The product is built to discourage this — short responses on the overlay, latency low enough to keep you in the flow of the call, grounding in your own résumé so the answer sounds like you.

Can I just use a foundation-model chat for prep and Acedly for the live round?

Yes, and this is the workflow most of our power users settle into. A frontier chat is the strongest tool for pre-interview prep — rehearsing behavioural stories, thinking through system-design problems, working through a take-home. Acedly is the right tool for the live, on-the-clock round. The two are not in competition; they cover different stages of the same workflow.

What if a foundation model's API is down during my interview?

Multi-model routing is also a reliability layer. If GPT's API is degraded, the router falls through to Claude or Gemini for the next turn; if all the cloud vendors are degraded, Acedly maintains a local fallback for transcript and structure that keeps the overlay usable for a recall-and-structure assist even without inference. The honest claim is not that we never have outages — every model vendor has outages — but that no single vendor's outage takes the product down.

Comparison13 min read

Acedly AI vs Foundation Models (GPT, Claude, Gemini): Why a Specialized Interview AI Beats a Raw LLM

Why a real-time interview copilot beats a raw foundation model — latency, OS-level audio capture, screen-share exclusion, résumé grounding, multi-model routing, and the interview-specific prompting that a generic chat window can't reproduce.

Acedly AI

Editorial Team

Published May 15, 2026

The honest case for using a raw foundation model

Most candidates considering this question are already paying $20 a month for a frontier chat assistant they trust on every other task in their life. The case for not adding a second subscription is real and worth stating clearly.

The foundation models — GPT-5, Claude 4.7, Gemini 2.5, the DeepSeek and Qwen frontier releases — are smarter at the median question than any wrapper sitting on top of them. They write better code than they did a year ago, they reason about system design with fewer obvious gaps, and they have larger context windows than they had at this time last year. For substance, they are the strongest tools in the room.

A raw chat window also has zero coordination cost. You already know the keyboard shortcut to open it, you already trust how it phrases things, and you already know its failure modes. Adding a new product to your interview-day surface area is friction; the question is whether the friction is worth it.

For some rounds — and we'll name them at the end of this page — the honest answer is no. The chat window is enough.

Where a raw foundation model breaks down in a live interview

For the rounds where the answer is yes, the failure modes are mechanical, not philosophical. Foundation models lose to a specialized interview AI on five specific constraints that are hard to fix from outside the chat UI.

1. End-to-end latency from question to first token

Interviews happen at human conversation speed. The natural pause between an interviewer finishing their question and a candidate beginning to answer is about 250 milliseconds. Past that, the silence becomes audible and the candidate visibly falls behind.

A raw foundation-model chat workflow looks like this in steady state:

Interviewer finishes question. (t = 0ms)
Candidate Cmd-Tab to the chat window. (~400ms including human reaction)
Candidate types or pastes the question. Typing is the slow path; even fast typists take ~3 seconds for a 15-word question. (t = 3,500ms)
Model thinks. Frontier-model time-to-first-token on a short prompt is ~600–1,200ms depending on the day. (t = 4,500ms)
Candidate reads the first sentence of the answer, paraphrases it, and starts speaking. (t = 6,500ms)

The 6.5-second total budget is roughly 25× the conversational threshold. The interviewer has long since noticed.

Acedly's path collapses this to a single round trip:

Interviewer finishes question. (t = 0ms)
Audio transcription happens in real time during the question; end-of-utterance detection fires the model at the moment of the question's natural pause. (t = +30ms speech-to-text overhead)
Model returns the first answer token. Median end-to-end on Acedly is ~98ms; the 95th percentile is under 200ms. (t = ~130ms)
Candidate reads the first line and starts speaking. (t = ~600ms total)

The difference is not a percentage. It's an order of magnitude.

This is the constraint that's hardest to fix from a chat window. Every major foundation-model UI ships as a normal application window — visible in the macOS dock, visible in the Windows taskbar, visible in Alt-Tab and Cmd-Tab, and crucially, visible when the candidate shares their screen.

For technical rounds where the recruiter asks the candidate to share their entire screen — common at Meta, Google, and most coding panels — having a foundation-model chat window open is the same as having the answer written on a sticky note attached to the candidate's monitor. The recruiter sees it the moment the share starts.

Workarounds exist (run the chat on a separate device, share only a single window, hide the chat window behind the IDE) but each adds coordination tax and each has a failure mode where the chat surfaces accidentally — a notification, an Alt-Tab miscue, the cursor drifting onto the wrong monitor.

Acedly's overlay is excluded from window-capture APIs at the OS level: NSWindowSharingNone on macOS, SetWindowDisplayAffinity(WDA_EXCLUDEFROMCAPTURE) on Windows. The overlay is not in the dock, not in the taskbar, not in Alt-Tab, not in Activity Monitor under a recognisable brand, and not in any window-capture frame buffer the meeting client could possibly send. It is structurally invisible, not just visually small.

3. Audio capture and turn detection

The interviewer asks the question. In a raw foundation-model workflow, the candidate has to type the question into the chat to get an answer. Voice-to-text inside the chat UI exists for some vendors but is single-speaker — it captures the candidate's microphone, not the interviewer's audio through the meeting client.

Acedly subscribes to system audio at the OS level, captures the loopback audio that includes the interviewer's voice through the meeting client, and runs streaming speech-to-text with end-of-utterance detection so the model fires at the moment the question is actually complete. The candidate types nothing.

The downstream effect is significant: the candidate's hands are free during the question, so they can take notes, scroll through their own résumé on a second monitor, or simply maintain eye contact with the interviewer. The hands-free property is what makes the workflow not look like a candidate using a tool.

4. Grounding in the candidate's own résumé, JD, and knowledge base

A foundation-model chat that hasn't been primed will produce generic answers to behavioural questions. "Tell me about a time you led a difficult project" returns a smooth, content-empty STAR story that mentions no specific technology, no real team, no actual numbers. The follow-up question — which every credible interviewer asks — exposes the genericness immediately.

You can prime a chat by pasting your résumé and the JD into the conversation before the interview starts. This works, but every new conversation requires re-priming, and most candidates underestimate how much context drift the model accumulates across a 45-minute round. By question six, the chat has forgotten which company you applied to.

Acedly's grounding is persistent and structural. Your résumé, the JD, and any knowledge-base documents you've uploaded are part of the system context for every model call, refreshed at each turn. When the recruiter asks a behavioural question, the copilot surfaces your specific project from your résumé, in your voice. The grounding is what makes the answer defensible in the follow-up.

5. Multi-model routing

A coding round wants a model that's good at reasoning under tight constraints at low latency. A behavioural round wants a model that's good at structure and brevity. A system-design round wants a model that holds a long context window and produces a tree of trade-offs. A case interview wants a model that's good at structured reasoning under ambiguity.

No single foundation-model chat does all of these well. Locking into one — even the strongest — means accepting that some rounds get the wrong model. The performance gap between the right and wrong model on a specific round can be larger than the gap between the strongest and weakest frontier models on the average task.

Acedly routes between GPT, Claude, Gemini, DeepSeek, and Qwen based on the question type detected from the transcript. You don't pick the model; the system picks per turn. The user-visible effect is that the model never feels mismatched to the round.

Side-by-side on the constraints that actually matter

Acedly vs raw foundation-model chat (GPT, Claude, Gemini, DeepSeek)

Feature	Acedly	Raw foundation-model chat
Median end-to-end latency	~98ms	~6,500ms (type-the-question path)
Hidden from screen sharing	Yes — OS-level capture exclusion	No — normal window, visible on share
Hands-free during the question	Yes — audio capture at OS level	No — type or paste to prompt
Grounded in résumé and JD by default	Yes, persistent across turns	Only if you re-prime each conversation
Multi-model routing	Auto, per question type	Single model, manual switch
Coding-sandbox screen reading	Reads Coderpad / HackerRank / LeetCode	Manual copy-paste from the editor
Pricing surface	Flat plan, $69 / month or one-time	Per-vendor subscription stack
Setup time before a round	Open, go	Re-paste résumé and JD, reset context

The latency column is the most important and the most under-reported in the discourse. A foundation-model chat can produce a stronger answer than a wrapper if you give it enough time; the workflow simply does not give you enough time inside a live interview.

When a raw foundation-model chat is actually the better choice

There are three cases where we recommend skipping the specialized tool and using a chat window directly.

Pre-interview prep, not the round itself. Before the interview, when you're rehearsing a behavioural story or thinking through a system-design approach, the latency tax doesn't exist and the screen-share constraint doesn't apply. A frontier-model chat is genuinely the strongest tool for this work — its raw reasoning is at its sharpest when you have time to iterate.

Async screening (HireVue and similar). These are recorded, asynchronous video rounds where you have prep time before each prompt. A real-time copilot adds no value in this format; rehearsal with a frontier-model chat does. See our AI interview pillar for the full async preparation guide.

Long-form take-home assignments. A take-home is a multi-hour piece of work where the model's raw reasoning matters more than per-turn latency. Sit with a chat window, work through the problem deliberately, ship your own implementation. The same chat is also useful afterwards as a code-review pass on your submission.

For the live, on-the-clock round with a real recruiter on the other end, the specialized tool is in a different category. For everything else, the chat window you already pay for is fine.