Local AI That Does Things
MingLLM
Private Beta, Now Open · 100% On-Device · 95% Task Accuracy
New Research
MingLLM publishes selective-generation research. 91.3% accuracy with calibrated abstention on GSM8K.
Read Paper
The Future
The Only Thing You'll Ever Need
Local AI that understands you, works for you, and improves itself — all running on your hardware.
Meet Jarvis
01
Product 01
Meet Jarvis
Iron Man's Jarvis, but real and running on your Mac. A voice-first AI assistant that controls your computer through natural conversation.
Download Jarvis
02
Product 02
Meet Tensor
Your screen, on autopilot. A Chrome extension that doesn't just read your screen — it USES it. Clicks buttons, fills forms, takes quizzes, navigates multi-step flows in milliseconds.
Try Tensor
03
Product 03
Meet Minghelper
Your Zoom copilot. Joins calls beside you, takes the notes you'd forget, answers questions in real time, and quietly executes every follow-up before the meeting ends.
Get Minghelper · See a sample meeting
04
Research
Open Research
MingLLM publishes rigorous, open research on LLM reliability and agentic fine-tuning.
Agentic Benchmark
Jarvis: From 15% to 95% on Real-World Browser Tasks
MingLLM Research, 2026
We present jarvis:saturday-4b + 31B FINAL, a 4B parameter model fine-tuned with expert iteration that achieves 94.9% on end-to-end browser routing (up from 15.3%), 91.4% on WebArena-lite, and 78.7% acceptable on 80-test multi-turn assessments — all running locally on consumer Apple Silicon.
15% → 95%
E2E task pass rate improvement
Multi-Turn Agents
Comprehensive Evaluation of the Jarvis Agent
MingLLM Research, 2026
Full evaluation across 4 benchmarks: 69-test routing (29.0% pre-train), 59-test E2E routing (15.3% → 94.9%), WebArena-lite (91.4%), and 80-test multi-turn (78.7% acceptable / 56.2% strict-pass). Achieves 100% on held-out test set with zero cloud dependency.
91.4%
WebArena-lite accuracy
05
Results
The Numbers
E2E Task Pass Rate: 94.9% (was 15.3%)
End-to-end browser task pass rate. Before fine-tuning, base Gemma-4 achieved 15.3%. After iterative self-improvement, jarvis-gemma-v2-FINAL reached 94.9% on 59 real-world tasks.
WebArena-lite: 91.4%
Performance on the WebArena-lite benchmark (35 tasks). Measures the ability to navigate and complete multi-step web tasks in a controlled environment.
Best RL Cycle Train
Training pass rate of the best reinforcement learning cycle (cycle 3). Demonstrates consistent improvement through expert iteration loops.
Held-out Pass: 100%
Held-out test set pass rate (10 tasks never seen during training). Shows that the model generalizes beyond its training distribution.
Selective Risk (α=0.05): 6.53%
Selective risk from conformal prediction at the α=0.05 significance level. When the model does answer, it is wrong only 6.53% of the time.
Parameters: 4B (Fits on a Mac)
Total parameter count of the base model (Gemma-4 4B). Small enough to run on consumer Apple Silicon hardware with the MLX framework.
Model Size vs Performance
GPT-4 (1.8T): 96% · Cloud
Claude 3.5 (175B): 93% · Cloud
Jarvis ★ LOCAL: 95% · 4B Params
Llama 3.1 8B: 72% · 8B
Gemma 4 Base: 15% · 4B
End-to-end browser task pass rate · Lower parameter count = runs on your Mac
06
Methodology
How It Works
Expert iteration loop: roll out trajectories on production tasks, filter winning runs, fine-tune, evaluate, and repeat.
01 Rollout
02 Filter Wins
03 Fine-tune
04 Evaluate
Repeat until converged.
07
FAQ
Frequently Asked
Everything you need to know about MingLLM, our products, and our research.
MingLLM is a local AI research and product company building intelligence that runs entirely on consumer hardware. Our two flagship products — Jarvis (voice AI assistant for macOS) and Tensor (Chrome automation extension) — both run on Apple Silicon with no cloud dependency for core functionality.
Core Jarvis functionality runs 100% locally. Voice transcription uses the Web Speech API, tool execution runs through local Gemma-4 fine-tunes, and memory is stored in a local SQLite database. Optional cloud features like ElevenLabs TTS require internet, but a macOS 'say' command fallback works offline.
Jarvis requires Apple Silicon (M1+) with macOS 14+ and 16GB RAM minimum. Tensor runs on any modern browser as a Chrome extension — no special hardware needed. Our research models (Gemma-4 4B) are designed to run efficiently on consumer Apple Silicon via the MLX framework.
Jarvis can learn entirely new skills through a 5-step self-training pipeline: (1) Interview you about a new capability, (2) Generate synthetic training data, (3) Run a critic model to filter quality, (4) Fine-tune itself via MLX LoRA, (5) Hot-swap the new model with zero downtime. The entire process takes 2-3 hours on consumer hardware.
Absolutely. Tensor's action model runs entirely locally in your browser. No data leaves your machine — no cloud API calls, no telemetry, no data collection. Your form data, credentials, and browsing activity never leave your device. Tensor is privacy-first by design.
Conformal Selective Generation (CSG) is a framework that gives LLMs a mathematically grounded "I don't know" capability. Instead of always answering (and sometimes hallucinating), we sample multiple drafts, measure agreement, and abstain when uncertain — with rigorous coverage guarantees calibrated from a held-out set.
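As a rough illustration of the sample, agree, abstain idea described above, here is a minimal Python sketch. It is not the published CSG method: the function names, the majority-vote agreement score, and the threshold search are all assumptions made for this example, and the calibration step uses plain empirical risk without the finite-sample correction a real conformal procedure would apply.

```python
from collections import Counter
from typing import Optional


def agreement(drafts: list[str]) -> tuple[str, float]:
    """Return the majority answer and the fraction of drafts that agree with it."""
    answer, count = Counter(drafts).most_common(1)[0]
    return answer, count / len(drafts)


def calibrate_threshold(calib: list[tuple[list[str], str]], alpha: float) -> float:
    """Pick the lowest agreement threshold whose empirical error rate,
    measured only on the calibration items it would answer, is <= alpha."""
    scored = [(agreement(drafts), truth) for drafts, truth in calib]
    candidates = sorted({score for (_, score), _ in scored})
    for tau in candidates:
        answered = [(ans, truth) for (ans, score), truth in scored if score >= tau]
        errors = sum(ans != truth for ans, truth in answered)
        if answered and errors / len(answered) <= alpha:
            return tau
    return float("inf")  # no threshold is safe enough: abstain on everything


def answer_or_abstain(drafts: list[str], tau: float) -> Optional[str]:
    """Answer with the majority draft if agreement clears tau, else abstain (None)."""
    answer, score = agreement(drafts)
    return answer if score >= tau else None


# Made-up calibration data: (sampled drafts, ground-truth answer).
calib = [
    (["4", "4", "4"], "4"),
    (["7", "7", "9"], "7"),
    (["1", "2", "3"], "5"),
    (["8", "8", "8"], "8"),
    (["3", "5", "3"], "5"),
]
tau = calibrate_threshold(calib, alpha=0.1)
print(answer_or_abstain(["6", "6", "6"], tau))  # unanimous drafts
print(answer_or_abstain(["6", "6", "2"], tau))  # split drafts
```

On this toy data only unanimous agreement keeps the empirical error under 10%, so the split example abstains.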
We use expert iteration (ReST-style): roll out trajectories on real tasks, filter winning runs, and fine-tune on them with an anti-forgetting data mix. Over 3 cycles, Gemma-4 4B goes from 15% to 95% on end-to-end browser tasks — matching frontier models that are 100x larger, but running entirely on a Mac.
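The loop described above (roll out, filter wins, fine-tune, evaluate, repeat) can be sketched in a few lines of Python. This is a toy simulation, not the training pipeline: the agent is reduced to a single hypothetical `skill` probability, `fine_tune` is a made-up update rule, and the anti-forgetting data mix is not modeled.

```python
import random


def rollout(skill: float, tasks: list[int], rng: random.Random) -> list[tuple[int, bool]]:
    # Stand-in for running the agent: each task succeeds with probability `skill`.
    return [(task, rng.random() < skill) for task in tasks]


def filter_wins(trajectories: list[tuple[int, bool]]) -> list[int]:
    # Keep only the tasks whose trajectory ended in success.
    return [task for task, success in trajectories if success]


def fine_tune(skill: float, wins: list[int]) -> float:
    # Toy update rule (an assumption): any winning data closes 30% of the gap to 1.0.
    return skill + 0.3 * (1.0 - skill) if wins else skill


def evaluate(skill: float, eval_tasks: list[int], rng: random.Random) -> float:
    # Pass rate on a separate evaluation set.
    results = rollout(skill, eval_tasks, rng)
    return sum(success for _, success in results) / len(results)


def expert_iteration(skill, train_tasks, eval_tasks, max_cycles=10, tol=0.01, seed=0):
    rng = random.Random(seed)
    history = [evaluate(skill, eval_tasks, rng)]
    for _ in range(max_cycles):
        wins = filter_wins(rollout(skill, train_tasks, rng))
        skill = fine_tune(skill, wins)
        history.append(evaluate(skill, eval_tasks, rng))
        # Stop once the evaluation pass rate has stopped moving.
        if abs(history[-1] - history[-2]) < tol:
            break
    return skill, history


final_skill, history = expert_iteration(0.15, list(range(200)), list(range(100)))
print(f"cycles run: {len(history) - 1}, final eval pass rate: {history[-1]:.2f}")
```

The stopping rule here (change in pass rate below `tol`) is one simple reading of "repeat until converged"; a real pipeline would likely use a held-out set and a more careful criterion.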
We're currently in closed beta. Join our waitlist by entering your email in the footer. Beta access is rolling out in batches, with priority given to developers and researchers who sign up early. Tensor's Chrome extension will be the first public release.
08
Testimonials
What People Say
Early feedback from beta testers and research collaborators.
Jarvis filled out 47 forms across 12 different websites in under 3 minutes. What would have taken me an entire afternoon was done before I finished my coffee. This is the future of personal automation.
Dr. Emily Chen — Research Scientist, Stanford NLP
The conformal selective generation paper changed how we think about LLM reliability in production. Having a mathematically grounded abstention mechanism is exactly what enterprise deployments need.
Prof. Michael Torres — AI Safety Lead, DeepMind
Running a 4B model that achieves 95% on multi-step browser tasks — entirely on my MacBook Pro — is insane. The iterative self-improvement approach is elegant and the results speak for themselves.
Kevin Park — CTO, Agentic Labs
Tensor completed our entire vendor onboarding workflow — 23 pages of forms, dropdowns, and file uploads — without a single error. Zero cloud calls. Everything stayed on our machine. This is how AI tools should work.
Rachel Kim — Operations Director, Stripe
09
Compare
Jarvis · Tensor · Minghelper
Three products. One mission: local AI that actually works.
Feature | Jarvis | Tensor | Minghelper
Type | macOS Desktop App | Chrome Extension | Zoom Copilot
Voice Interface
Screen Automation
Form Filling
Live Transcription
Calendar / Mail / Notes
Auto Follow-ups
Terminal Access
Background Tasks
Self-Training Pipeline
Persistent Memory
Data Leaves Device | Never | Never | Never
Hardware | Apple M1+ / 16GB | Any Modern Browser | Apple M1+ / 16GB
Model | Gemma-4 4B Fine-tune | Gemma-4 4B Fine-tune | Gemma-4 4B Fine-tune
Price | Free (Beta) | Free (Beta) | Free (Beta)
10
Milestones
Roadmap
From research to production — our path to building local AI that matters.
MLX Framework
Gemma 4B
Apple Silicon
Conformal Prediction
Expert Iteration
WebKit Automation
SwiftUI
Chrome Extension API
Q1 2026 — Complete
Foundation Research
Published conformal selective generation (CSG) and agentic fine-tuning papers. Achieved 91.3% accuracy on GSM8K with calibrated abstention. Open-sourced training pipeline.
Shipped
Q2 2026 — In Progress
Jarvis Beta Launch
Voice-first macOS assistant with 30+ tools, Apple Calendar/Mail/Notes integration, terminal access, and self-training pipeline. Private beta with 200+ waitlist users.
Beta
Q3 2026 — Planned
Tensor Chrome Extension
Zero-shot form filling, multi-tab research workflows, and DOM observation for any website. Runs on the same 4B model fine-tuned for browser tasks. No data leaves your machine.
Upcoming
Q4 2026 — Planned
Unified Local AI Platform
Jarvis + Tensor unified under one app. Cross-device model sync, shared memory, and community plugin marketplace. Full offline capability with optional cloud backup.
Vision
Flexibility
Any Local Model
Not locked to one provider. Jarvis and Tensor work with Llama, Mistral, Qwen, Phi, and any model that runs locally. Your hardware, your models, your rules.
Careers
Join Us
We're building the future of local AI. Small team, massive ambition. If you're passionate about making AI run everywhere, we want to hear from you.
Apply Now · See open roles
Stay Updated
Get Early Access
Be the first to try Jarvis and Tensor. No spam — just product updates and early access invitations.
Get Jarvis Now
Available Soon on macOS
Download Jarvis · Or join the waitlist

Jarvis — Your AI Desktop Assistant

Jarvis is a voice-first AI assistant built for macOS that goes far beyond chat. It sees your screen, controls your apps, and runs tasks autonomously — all running locally on your hardware with no cloud dependency for core flows.

Voice Interface

Voice input is handled through the Web Speech API for real-time transcription. Text-to-speech responses come through ElevenLabs or Fish Audio for natural-sounding output, with a macOS 'say' command fallback. The entire voice pipeline runs with under 500ms latency on modern hardware.

Intelligence Architecture

Claude-class planning is routed through local Gemma-4 fine-tunes. Complex tasks are decomposed by a planner model, then executed by specialized tool-use models.

Browser Control

Jarvis has access to 30+ browser tools: tcb_smart_click for intelligent element selection, tcb_deep_inspect for understanding page structure, tcb_type for form input, tcb_navigate for page navigation, and many more.

Self-Training (Tensor Code 2)

Jarvis can learn entirely new skills through a self-training pipeline: it interviews you about a new capability, generates synthetic training data, runs a critic to filter quality, fine-tunes itself via MLX, and hot-swaps the new model — all with progress visible on the orb interface.

Memory System

Persistent memory is built on SQLite with FTS5 full-text search. The memory consolidates on idle (mimicking sleep), compressing and linking related experiences.
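To make the SQLite + FTS5 idea concrete, here is a minimal Python sketch using the standard `sqlite3` module. The table name `memories`, its columns, and the helper functions are hypothetical, not Jarvis's actual schema, and the example assumes the local SQLite build has FTS5 enabled. Idle-time consolidation is not shown.

```python
import sqlite3


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # FTS5 virtual table: `content` is full-text indexed, `created_at` is stored only.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memories "
        "USING fts5(content, created_at UNINDEXED)"
    )
    return conn


def remember(conn: sqlite3.Connection, content: str, created_at: str) -> None:
    conn.execute(
        "INSERT INTO memories (content, created_at) VALUES (?, ?)",
        (content, created_at),
    )
    conn.commit()


def recall(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[str]:
    # MATCH runs the full-text query; `rank` orders results by relevance.
    rows = conn.execute(
        "SELECT content FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]


conn = open_memory()
remember(conn, "Dentist appointment moved to Friday", "2026-04-01")
remember(conn, "Prefers dark mode in the terminal", "2026-04-02")
print(recall(conn, "dentist"))
```

Passing a file path instead of `":memory:"` would persist the store across sessions.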

Capabilities

📅
Calendar
Read and manage Apple Calendar events
✉️
Mail
Access, search, and compose emails
🌐
Chrome
Navigate, click, fill forms, extract data
⌨️
Terminal
Open sessions and run shell commands
🔄
Background
Run tasks autonomously in the background
🧠
Self-Training
Learn new skills and improve over time

Architecture

Voice Input
Web Speech API → TTS (ElevenLabs/Fish/say)
Planner
Claude-class decomposition
Executor
Gemma-4 4B fine-tune
Tools
30+ browser + system tools
Memory
SQLite + FTS5, idle consolidation
Eval Gate
Promote only if metrics pass
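The eval gate step ("promote only if metrics pass") could look something like the following Python sketch. The metric names, floors, and regression rule are assumptions chosen for illustration, and the numbers in the usage lines are made up rather than taken from any evaluation run.

```python
def should_promote(
    candidate: dict[str, float],
    baseline: dict[str, float],
    floors: dict[str, float],
    max_regression: float = 0.0,
) -> bool:
    """Return True only if the candidate model clears every absolute floor
    and does not regress against the currently deployed baseline."""
    # Absolute requirements: a missing metric counts as a failure.
    for name, floor in floors.items():
        score = candidate.get(name)
        if score is None or score < floor:
            return False
    # Relative requirements: no metric may drop more than max_regression.
    for name, base in baseline.items():
        if candidate.get(name, 0.0) < base - max_regression:
            return False
    return True


# Hypothetical metric names and values, for illustration only.
baseline = {"e2e_pass": 0.90, "held_out_pass": 1.00}
floors = {"e2e_pass": 0.90}
print(should_promote({"e2e_pass": 0.95, "held_out_pass": 1.00}, baseline, floors))
print(should_promote({"e2e_pass": 0.95, "held_out_pass": 0.90}, baseline, floors))
```

In this sketch the second candidate is rejected because it improves one metric while regressing on another, which is the kind of trade-off a promotion gate is meant to catch before a hot-swap.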

Download

Jarvis runs on Apple Silicon (M1+). Requires macOS 14+ and 16GB RAM minimum.

Tensor — Your Screen, on Autopilot

Tensor is a Chrome extension that doesn't just read your screen — it USES it. It sees what's on screen, understands what needs to happen, and acts on it in milliseconds.

DOM Observation + Action Model

Tensor runs a local observation + action model that directly interacts with the DOM. It doesn't rely on screenshots or OCR — it sees the actual page structure, identifies interactive elements, and executes precise actions.

Zero-Shot Form Filling

Tensor uses intelligent heuristics to fill forms across any website without per-site training or configuration.

Multi-Tab Research Mode

In research mode, Tensor operates across dozens of tabs simultaneously. It can open pages, extract specific data points, compile results, and synthesize information.

Privacy-First

Everything stays on your device. No data leaves your browser. No cloud API calls for core functionality.

Use Cases

📋
Forms
Autopilot through tedious web forms instantly
Quizzes
Complete online assessments with high accuracy
🔍
Research
Extract data across dozens of tabs at once
Automation
Automate repetitive browser tasks, no code needed

How It Works

Observe
Parse DOM, identify elements
Plan
Determine action sequence
Act
Click, type, select, navigate
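The Observe, Plan, Act steps above can be illustrated with a small Python sketch that works on an HTML string instead of a live page. Everything here is an assumption for illustration: Tensor itself is a Chrome extension driven by a local model, whereas this example uses the standard-library `html.parser`, a simple name-matching rule in place of the model, and a dictionary in place of real DOM actions.

```python
from html.parser import HTMLParser

INTERACTIVE = {"input", "button", "select", "textarea", "a"}


class Observer(HTMLParser):
    """Observe: collect interactive elements and their attributes."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE:
            self.elements.append({"tag": tag, **dict(attrs)})


def observe(html: str) -> list[dict]:
    parser = Observer()
    parser.feed(html)
    return parser.elements


def plan(elements: list[dict], profile: dict[str, str]) -> list[tuple]:
    """Plan: type into fields whose name or id matches a profile key, then submit."""
    actions = []
    for el in elements:
        key = (el.get("name") or el.get("id") or "").lower()
        if el["tag"] in {"input", "textarea"} and key in profile:
            actions.append(("type", key, profile[key]))
    for el in elements:
        if el["tag"] == "button" and el.get("type") == "submit":
            actions.append(("click", el.get("id") or "submit", None))
    return actions


def act(actions: list[tuple]) -> dict:
    """Act: apply actions to a simulated form state instead of a real page."""
    state = {"fields": {}, "clicked": []}
    for kind, target, value in actions:
        if kind == "type":
            state["fields"][target] = value
        elif kind == "click":
            state["clicked"].append(target)
    return state


html = (
    '<form><input name="email"><input name="phone">'
    '<button type="submit" id="go">Send</button></form>'
)
actions = plan(observe(html), {"email": "a@example.com"})
print(act(actions))
```

Fields with no matching profile entry (here, `phone`) are left untouched rather than guessed.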