Why Jarvis does not have a chat window — MingLLM Blog

A chat UI is the wrong default for a voice-first agent. We went looking for a better one.

The first prototype of Jarvis had a chat window. It was a mistake.

We spent six weeks watching beta users and kept seeing the same pattern: the user would say something to Jarvis, then look at the window, then read the response, then type a follow-up, then look at the window, then ... wait, why were they typing?

A chat window is a surface that rewards pecking. When one is open, your hands go to the keyboard, and voice input becomes an option rather than the default. You have rebuilt the assistant-as-chatbot in exactly the posture we were trying to escape.

So we removed the window.

What replaced it

Jarvis now uses three overlays and no window.

1. A transient caption — like a subtitle overlay — that appears for ~1.8 seconds when Jarvis speaks.

2. The Orb, a small always-visible status indicator in the menu bar that pulses gently while Jarvis is listening or thinking.

3. A receipt drawer — a collapsed list of actions Jarvis has taken — that slides in only when you ask for it.

That is the whole UI. No chat history. No conversation list. No typing affordance by default.
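
To make the three surfaces concrete, here is a minimal sketch of the state they might share. It is not Jarvis's actual code: the names Surfaces, OrbState, speak, visible_caption, and orb_pulsing are hypothetical, and the clock is passed in as a plain number so the behavior is easy to follow. Only the ~1.8 second caption lifetime and the "pulses while listening or thinking" rule come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

CAPTION_SECONDS = 1.8  # approximate caption lifetime given in the post


class OrbState(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    THINKING = "thinking"


@dataclass
class Surfaces:
    """Hypothetical state for the caption, the Orb, and the receipt drawer."""

    orb: OrbState = OrbState.IDLE
    caption: Optional[str] = None
    caption_shown_at: float = 0.0
    drawer_open: bool = False  # the drawer stays collapsed until asked for

    def speak(self, text: str, now: float) -> None:
        # Show the spoken text as a caption and remember when it appeared.
        self.caption = text
        self.caption_shown_at = now

    def visible_caption(self, now: float) -> Optional[str]:
        # The caption is transient: it disappears once its lifetime is over.
        if self.caption is not None and now - self.caption_shown_at < CAPTION_SECONDS:
            return self.caption
        return None

    def orb_pulsing(self) -> bool:
        # The Orb is always visible but pulses only while listening or thinking.
        return self.orb in (OrbState.LISTENING, OrbState.THINKING)
```

Note what the sketch leaves out: there is no list of past messages anywhere in the state, so nothing exists to scroll back through.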

What we learned

Removing the window forced us to answer questions we had been avoiding. "How does the user correct a misheard command?" turned out to have a much better answer than "scroll up and retype." (Answer: Jarvis volunteers the interpretation it is about to act on, and the user can say "no, I meant ..." during a 2-second grace period.)
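
The grace period can be sketched as a small state holder. This is an illustration under stated assumptions, not the shipped implementation: GraceWindow, announce, correct, and tick are hypothetical names, the clock is injected as a number, and restarting the window after a correction is our guess at a sensible behavior rather than something described above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

GRACE_PERIOD_S = 2.0  # from the post: the user has 2 seconds to object


@dataclass
class PendingAction:
    interpretation: str  # what Jarvis says it is about to do
    announced_at: float  # clock reading when the interpretation was spoken


class GraceWindow:
    """Holds one announced action until the grace period passes (hypothetical)."""

    def __init__(self, execute: Callable[[str], None]) -> None:
        self._execute = execute
        self._pending: Optional[PendingAction] = None

    def announce(self, interpretation: str, now: float) -> None:
        # Speak the interpretation, then wait instead of acting immediately.
        self._pending = PendingAction(interpretation, now)

    def correct(self, new_interpretation: str, now: float) -> bool:
        # "No, I meant ..." only counts while the grace period is still open.
        if self._pending is None or now - self._pending.announced_at >= GRACE_PERIOD_S:
            return False
        # Assumption: a correction restarts the window, so the corrected
        # reading can itself be corrected before Jarvis acts on it.
        self._pending = PendingAction(new_interpretation, now)
        return True

    def tick(self, now: float) -> None:
        # Called periodically; commits the action once the window has closed.
        if self._pending is not None and now - self._pending.announced_at >= GRACE_PERIOD_S:
            action, self._pending = self._pending, None
            self._execute(action.interpretation)
```

In this sketch nothing runs until tick sees that the window has closed, which is the property that matters: the correction happens by voice, before the action, rather than by retyping after it.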

"How does the user see what Jarvis did?" stopped being "reread the chat" and became "open the receipt drawer." The receipt drawer is more scannable than a chat log because it is organized by action, not by turn.
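
As a rough sketch of what "organized by action" could mean in data terms, the drawer below stores one record per thing Jarvis did and never stores conversational turns at all. Receipt, ReceiptDrawer, the verb/target/status fields, and the newest-first ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Receipt:
    verb: str             # what Jarvis did, e.g. "sent"
    target: str           # what it acted on, e.g. "email to Dana"
    status: str = "done"  # "done" or "failed"


@dataclass
class ReceiptDrawer:
    """Hypothetical drawer: one entry per action, no conversational turns."""

    receipts: List[Receipt] = field(default_factory=list)
    is_open: bool = False  # collapsed until the user asks for it

    def record(self, receipt: Receipt) -> None:
        self.receipts.append(receipt)

    def render(self) -> List[str]:
        # Nothing is shown while the drawer is collapsed.
        if not self.is_open:
            return []
        # Assumption: newest action first, one scannable line per action.
        return [f"{r.verb} {r.target} ({r.status})" for r in reversed(self.receipts)]
```

A short usage example:

```python
drawer = ReceiptDrawer()
drawer.record(Receipt("sent", "email to Dana"))
drawer.record(Receipt("created", "reminder for 5pm", status="failed"))
drawer.is_open = True  # the user asked to see what Jarvis did
for line in drawer.render():
    print(line)
```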

The voice-first principle is not just about input. It is about which posture the product rewards and what it expects of your body. Jarvis expects the posture of someone looking at the world, not at a screen.