MingLLM
Private beta · On-device by default · Apple Silicon native
New paper
Against Scale. A 4B model on a single laptop, with $0 in cloud compute, matches frontier browser agents on bounded tasks.
Read paper
The Future

An agent for every surface.

Voice on your desktop. A browser that browses for you. A coder that ships. One intelligence, three surfaces, running on your hardware.

Meet the family
01
Product 01 — Voice
Jarvis
A voice-first macOS assistant that actually acts. Talk to your machine; it drives your apps, reads your data, writes your code, and finishes the work. Current baseline: jarvis:saturday-4b. Next target: Gemma 4 27B MoE.
Get Jarvis
02
Product 02 — Web
Tensor
A Chrome extension that doesn't just read your tabs — it uses them. Install it once, open a side panel, tell it what you want done. It opens the pages, fills the forms, reads across tabs, and finishes the flow.
Add Tensor to Chrome
03
Research
Open research
Three papers on what it takes to build a local agent that actually works — memory, surfaces, and the loop that trains it on your laptop.
Paper I · Memory
The Orb Knows
MingLLM Research · 2026
A persistent, self-consolidating memory system for local agents. Writes while the agent acts; compresses and links on idle — the way sleep consolidates a day.
Paper II · Architecture
One Model, Many Surfaces
MingLLM Research · 2026
A single base model drives voice (Jarvis), web (Tensor), and code (Tensor Code). One mind, three hands — specialization by small adapters, not separate models.
Paper III · Position
Against Scale
MingLLM Research · 2026
A position paper. On bounded agent tasks, expert iteration on a 4B model — run on one laptop over a weekend — matches frontier cloud models. The moat is trajectories, not parameters.
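Expert iteration, as the position paper frames it, alternates between letting the model attempt tasks and training it only on the attempts that succeeded. The sketch below is a toy illustration under loud assumptions: the "policy" is a table of action weights rather than a 4B model, the tasks in `TASKS` are hypothetical, and "fine-tuning" is a simple upweighting of successful actions. It shows the shape of the loop, not MingLLM's training code.

```python
import random

# Hypothetical toy setup: each task is solved by exactly one action.
# A real agent produces multi-step trajectories scored by a task checker.
TASKS = {"fill_form": "submit", "search": "query", "checkout": "pay"}
ACTIONS = ["submit", "query", "pay", "scroll"]


def make_policy():
    """Uniform starting policy: one weight per (task, action) pair."""
    return {task: {a: 1.0 for a in ACTIONS} for task in TASKS}


def sample_action(policy, task, rng):
    weights = policy[task]
    return rng.choices(list(weights), weights=list(weights.values()))[0]


def pass_rate(policy, rng, trials=200):
    """Fraction of sampled attempts that solve their task."""
    hits = sum(
        sample_action(policy, task, rng) == goal
        for _ in range(trials)
        for task, goal in TASKS.items()
    )
    return hits / (trials * len(TASKS))


def expert_iteration(policy, rng, cycles=3, samples=20):
    for _ in range(cycles):
        # 1. Roll out the current policy; keep only attempts the checker accepts.
        successes = []
        for task, goal in TASKS.items():
            for _ in range(samples):
                action = sample_action(policy, task, rng)
                if action == goal:
                    successes.append((task, action))
        # 2. "Fine-tune" on the filtered set (toy stand-in: upweight each success).
        for task, action in successes:
            policy[task][action] += 1.0
    return policy


if __name__ == "__main__":
    rng = random.Random(0)
    policy = make_policy()
    print(f"before: {pass_rate(policy, rng):.2f}")
    expert_iteration(policy, rng)
    print(f"after:  {pass_rate(policy, rng):.2f}")
```

A uniform policy solves about a quarter of attempts here (one right action out of four). Because only verified successes are ever reinforced, each cycle trains on better data than the last, which is the compounding effect the paper argues for.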
04
Results
The numbers
94.9% · E2E task pass rate (was 15.3%)
End-to-end browser task pass rate on 59 real-world tasks. The base Gemma-4 model, before any fine-tuning, passed 15.3%; after iterative self-improvement, jarvis-gemma-v2-FINAL reached 94.9%.
WebArena-lite
Performance on the WebArena-lite benchmark (35 tasks), which measures the ability to navigate and complete multi-step web tasks in a controlled environment.
Best RL cycle (train)
Training pass rate of the best reinforcement-learning cycle (cycle 3), showing consistent improvement across expert-iteration loops.
Held-out pass rate
Pass rate on a held-out test set of 10 tasks never seen during training, showing that the model generalizes beyond its training distribution.
6.53% · Selective risk (α = 0.05)
Selective risk at the α = 0.05 significance level, from conformal prediction: when the model does answer, it is wrong only 6.53% of the time.
4B · Parameters (fits on a Mac)
Total parameter count of the base model (Gemma-4 4B), small enough to run on consumer Apple Silicon hardware with the MLX framework.
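Model size vs. performance
End-to-end browser task pass rate. Fewer parameters means it runs on your Mac. The GPT-4 and Claude 3.5 parameter counts are unofficial estimates; neither vendor has disclosed them.
GPT-4 (1.8T, cloud): 96%
Claude 3.5 (175B, cloud): 93%
Jarvis ★ (4B, local): 95%
Llama 3.1 8B (8B): 72%
Gemma 4 Base (4B): 15%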
05
How it works
Hear. Plan. Act. Remember.
Every request you make runs the same four steps. The loop closes back on itself — each action sharpens the next.
01
Hear
Voice · text · screen
Jarvis captures what you said — or what's on your screen. Transcription and intent parsing happen on-device.
02
Plan
saturday-4b · router
The base model decomposes the request and the router picks the right surface: Jarvis, Tensor, or Tensor Code.
03
Act
Voice · web · code
The chosen surface executes — drives apps, clicks through the browser, edits and runs code. You see the work as it happens.
04
Remember
The orb · consolidates on idle
Events become episodes; episodes become facts. The next request starts with everything the last one learned.
Memory feeds the next loop: Hear → Plan → Act → Remember.
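The Remember step (events become episodes, episodes become facts), together with Jarvis memory being described as a local SQLite database, suggests a two-speed design: cheap appends while the agent acts, heavier consolidation on idle. Below is a minimal sketch of that idea using Python's built-in `sqlite3`. The table layout, the one-episode-per-session grouping, and the rule that a `key=value` line seen in two or more episodes becomes a fact are all illustrative assumptions, not the orb's actual schema or consolidation logic.

```python
import sqlite3
from collections import Counter

# Hypothetical schema; the real memory layout is not described on this page.
SCHEMA = """
CREATE TABLE events   (id INTEGER PRIMARY KEY, session TEXT, text TEXT,
                       consolidated INTEGER DEFAULT 0);
CREATE TABLE episodes (id INTEGER PRIMARY KEY, session TEXT, summary TEXT);
CREATE TABLE facts    (key TEXT PRIMARY KEY, value TEXT, support INTEGER);
"""


def open_memory(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def record_event(db, session, text):
    """Hot path: a single append while the agent acts."""
    db.execute("INSERT INTO events (session, text) VALUES (?, ?)", (session, text))


def consolidate(db, min_support=2):
    """Idle path: fold raw events into episodes, then promote repeated statements to facts."""
    # Step 1: one episode per session, built from that session's unconsolidated events.
    rows = db.execute(
        "SELECT session, group_concat(text, '; ') FROM events "
        "WHERE consolidated = 0 GROUP BY session"
    ).fetchall()
    db.executemany("INSERT INTO episodes (session, summary) VALUES (?, ?)", rows)
    db.execute("UPDATE events SET consolidated = 1 WHERE consolidated = 0")

    # Step 2: a "key=value" statement found in enough distinct episodes becomes a fact.
    counts = Counter()
    for (summary,) in db.execute("SELECT summary FROM episodes").fetchall():
        counts.update({part for part in summary.split("; ") if "=" in part})
    for statement, n in counts.items():
        if n >= min_support:
            key, value = statement.split("=", 1)
            db.execute(
                "INSERT OR REPLACE INTO facts (key, value, support) VALUES (?, ?, ?)",
                (key, value, n),
            )
    db.commit()
```

A real consolidator would summarize and link episodes with the model itself; the string matching here only stands in for that step, but the split between a fast write path and an idle-time pass is the point.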
06
FAQ
Questions, answered
What MingLLM is, what ships, and what's next.
MingLLM builds local AI agents that run on your own hardware. Three products share one intelligence: Jarvis (voice-first macOS assistant), Tensor (Chrome extension that uses the web for you), and Tensor Code (a coding companion CLI).
Tensor is a Chrome extension. Install it once, open the side panel, and tell it what you need done on the web. It reads, clicks, fills, and synthesizes: multi-tab flows without babysitting. Your browsing stays on your machine.
Jarvis: Apple Silicon (M1+), macOS 14+, 16GB RAM. Tensor: any Chromium-based browser (Chrome, Edge, Brave, Arc). Tensor Code runs anywhere a terminal runs — your keys, your models.
The self-training loop runs in five stages: (1) Interview: the agent asks you about a new capability; (2) Synth: it generates candidate training data; (3) Critic: a stronger model filters the candidates for quality; (4) MLX: a LoRA adapter is trained on-device; (5) Hot-swap: the new adapter loads without a restart. Think of it as an orb that keeps getting sharper the more you use it.
The production baseline is jarvis:saturday-4b, declared on 2026-04-12. It is replaced only if a challenger beats it on real agentic flows. The next training target is Gemma 4 27B MoE.
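The five-stage loop and the rule that a challenger must beat the baseline can be read together as one guarded round of self-training. The sketch below wires the stages together as plain Python callables. Every function name is hypothetical, the stages are injected stubs, and `train_lora` stands in for an on-device LoRA fine-tune (for example with MLX) without assuming any particular MLX API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Adapter:
    """Hypothetical handle for a trained LoRA adapter."""
    name: str
    examples: List[str] = field(default_factory=list)


def self_training_round(
    interview: Callable[[], str],
    synth: Callable[[str], List[str]],
    critic: Callable[[str], bool],
    train_lora: Callable[[List[str]], Adapter],
    evaluate: Callable[[Adapter], float],
    baseline: Adapter,
) -> Adapter:
    """Run one round and return whichever adapter should be live afterwards."""
    capability = interview()                        # (1) Interview: ask about a new capability
    candidates = synth(capability)                  # (2) Synth: generate candidate training data
    kept = [ex for ex in candidates if critic(ex)]  # (3) Critic: keep only high-quality examples
    if not kept:
        return baseline                             # nothing survived the critic
    challenger = train_lora(kept)                   # (4) MLX: train a LoRA on-device (stubbed)
    # (5) Hot-swap, gated: the challenger goes live only if it strictly beats the baseline.
    return challenger if evaluate(challenger) > evaluate(baseline) else baseline


if __name__ == "__main__":
    base = Adapter("saturday-4b")
    live = self_training_round(
        interview=lambda: "book appointments",
        synth=lambda cap: [f"{cap}: good example", f"{cap}: noisy example"],
        critic=lambda ex: "good" in ex,
        train_lora=lambda exs: Adapter("challenger", exs),
        evaluate=lambda a: len(a.examples),  # toy score standing in for agentic evals
        baseline=base,
    )
    print(live.name)
```

Returning the baseline on a tie or an empty critic pass keeps the live model stable by default; a round only changes anything when the evaluation clearly favors the new adapter.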
On-device by default. Jarvis memory is a local SQLite database. Tensor keeps your browsing on your machine. Cloud calls are opt-in and visible — every outbound request is logged in the side panel.
Conformal selective generation (CSG) is a framework that gives the model a mathematically grounded "I don't know": sample multiple drafts, measure their agreement, and abstain when uncertain, with calibrated coverage guarantees. It pairs naturally with the agent loop, so Jarvis stops before doing something it isn't confident about.
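The sample, agree, abstain recipe can be sketched in a few lines. This is a simplified illustration, not the CSG method itself: agreement is the share of drafts matching the majority answer, and the abstention threshold is the lowest one whose empirical selective risk (the error rate among answers actually given) on a labeled calibration set stays at or below a target α. Real conformal procedures add a finite-sample correction to turn that into a guarantee; this sketch omits it. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def agreement(drafts: Sequence[str]) -> Tuple[str, float]:
    """Majority answer among the drafts, and the fraction of drafts that agree with it."""
    answer, votes = Counter(drafts).most_common(1)[0]
    return answer, votes / len(drafts)


def calibrate_threshold(scores: List[float], correct: List[bool], alpha: float) -> float:
    """Lowest agreement threshold whose empirical selective risk is <= alpha.

    Uncorrected empirical rule: no finite-sample conformal guarantee.
    """
    for t in sorted(set(scores)):
        answered = [ok for s, ok in zip(scores, correct) if s >= t]
        risk = 1 - sum(answered) / len(answered)  # error rate among answered items
        if risk <= alpha:
            return t
    return float("inf")  # no threshold is safe enough: always abstain


def answer_or_abstain(drafts: Sequence[str], threshold: float) -> Optional[str]:
    answer, score = agreement(drafts)
    return answer if score >= threshold else None  # None means "I don't know"


if __name__ == "__main__":
    # Hypothetical calibration data: agreement score and whether the majority answer was right.
    scores = [1.0, 1.0, 0.8, 0.6, 0.4, 0.4]
    correct = [True, True, True, False, True, False]
    t = calibrate_threshold(scores, correct, alpha=0.3)
    print(t)                                          # 0.6
    print(answer_or_abstain(["a", "a", "b", "a"], t))  # a (agreement 0.75)
    print(answer_or_abstain(["a", "b", "c"], t))       # None (agreement 1/3)
```

Scanning thresholds from low to high returns the one that answers the most calibration items while keeping their error rate under α, which is the coverage-versus-risk trade-off the selective-risk number in the results reports.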
Private beta now. Drop your email below. Access rolls out in waves — developers, researchers, and design engineers go first.
07
Voices
Early reactions
Notes from private-beta users. Anonymized.
Told Jarvis "clear my inbox, draft replies to the important ones, book the dentist." Came back to coffee. It had done it all — replies ready for review, appointment on the calendar.
Beta user — Research engineer
Tensor finished an entire vendor-onboarding flow — 23 pages of forms, dropdowns, file uploads — without a single error. Never left my machine. It doesn't feel like an extension anymore, it feels like the browser itself is smart.
Beta user — Ops lead
A 4B model that hits 95% on real browser tasks, trained on a laptop, running on a laptop. The expert-iteration loop is the first thing I've seen that actually compounds.
Beta user — ML researcher
Tensor Code is a claude-code-shaped thing that's mine. It has my context, my keys, my rules, and it ships without asking for permission every three seconds.
Beta user — Design engineer
08
Compare
The family
Three agents. One intelligence. Pick the surface you need.
Feature | Jarvis | Tensor | Tensor Code
Surface | Voice · macOS | Chrome extension | CLI / terminal
Voice input
Browser automation | via Tensor
Code authoring | via Tensor Code
Terminal execution
Calendar / Mail / Notes
Background tasks
Self-training loop
Persistent memory
Data leaves device | Opt-in | Opt-in | Opt-in
Where it runs | Apple Silicon | Any Chromium browser | Any Unix shell
Model | saturday-4b | Local + remote | Your choice
Status | Private beta | Private beta | Beta
09
Milestones
Roadmap
From research to production — our path to building local AI that matters.
MLX Framework
Gemma 4B
Apple Silicon
Conformal Prediction
Expert Iteration
WebKit Automation
SwiftUI
Chrome Extension API
Q1 2026 — Shipped
Foundation research
Conformal selective generation (CSG) and agentic fine-tuning research. A single fine-tuned 4B model takes Jarvis from 15% → 95% on end-to-end browser tasks; the training pipeline is published.
Shipped
Q2 2026 — Now
Jarvis + Tensor in private beta
Voice-first macOS agent (Jarvis) and Chrome extension (Tensor) both in private beta. Self-training loop live on Jarvis. Tensor Code CLI shipped for developers.
Beta
Q3 2026 — Next
Public launch · bigger brain
Jarvis and Tensor open to anyone. Jarvis upgrades to a larger MoE base for harder multi-step tasks. Tensor gets a research mode that spans 20+ tabs.
Upcoming
Q4 2026 — Horizon
Unified agent fabric
One intelligence across voice, browser, and terminal. Shared context, shared memory, shared tools. Optional Mingeta for Ray-Ban Display — suggestions routed through Telegram to bypass Meta's HUD SDK gate.
Vision
Flexibility
Bring your own model.
Gemma, Llama, Qwen, Phi, Mistral — anything that runs in MLX. Swap the base, the fine-tune, even the provider. Your hardware, your models, your rules.
Careers
Join Us
We're building the future of local AI. Small team, massive ambition. If you're passionate about making AI run everywhere, we want to hear from you.
Apply Now · See open roles
Waitlist
Early access.
Jarvis, Tensor, Tensor Code. No spam — we write when we ship.
Get Jarvis Now
Available Soon on macOS
Join Waitlist
Jarvis · voice

A voice on your desk that actually does things.

You talk. It listens, thinks, and drives the apps on your machine. No wake-word theater. No "I can't help with that." It finishes the job.

What it sounds like

"clear my inbox and draft replies to the important ones"
"book the dentist for next Tuesday afternoon"
"find the cheapest flight to Tokyo next Friday"
"kill the dev server, I want to rebase"

What it can reach

Calendar
Your day, rearranged by asking.
Mail
Reads, triages, drafts in your voice.
Files
Finds, opens, edits — without you clicking.
Browser
Hands the task to Tensor and comes back.
Terminal
Runs shell, ships code via Tensor Code.
Background
Long jobs happen. You go live your life.

How it gets sharper

Jarvis watches how you work. When it notices a gap, it asks. What it learns becomes part of it — no restart, no cloud round-trip.

Runs where

Apple Silicon. On-device by default. The cloud is optional, and when it's on, you see it.

Tensor · web

A browser that browses for you.

A Chrome extension you install once. Ask it to do the thing — find it, click through it, fill the form, read the tabs, finish the flow. You don't watch. You check back.

What to hand it

"fill this onboarding flow with my info"
"research the top 3 CRMs under $50/seat, summarize"
"apply to every role on this careers page that matches my resume"
"book whichever flight is cheapest before 6pm"

Why it works

Zero-shot forms
Any site. No per-site config.
Multi-tab research
Reads across tabs, synthesizes, cites.
Side panel
Watches alongside; never blocks you.
Local
Your browsing stays on your machine.

Install where

Chrome. Edge. Brave. Arc. Anything Chromium. One extension, a plasma dot in your toolbar.

Tensor Code · terminal

A coder in your terminal that actually ships.

You type what you want. It reads the repo, plans, edits, runs the tests, fixes what it breaks, and tells you what it did. Proactive by default — it doesn't idle.

What to ask

"fix the failing tests"
"wire a Stripe checkout into the /pricing page"
"migrate this to React 19, keep everything green"
"the staging build is broken — look at the error, fix it, push"

Proactive by default

After the first instruction, it keeps moving. Reads, tries, fails, fixes, ships. You steer when you want to; you don't have to babysit.

Gets to know your repo

Every session adds to a local memory of how your codebase works — patterns it's seen, conventions it's respected, what the tests actually check. The next session starts where this one left off.

Install where

Anywhere a terminal runs. macOS, Linux, WSL. Your keys, your models, your rules.