MingLLM Weekly Changelog (Backend Focus)
February 9-12, 2026
Coverage: chat, picker, stream lambda, judge, canvas, leaderboard, signin, tutorial
Status: shipped + deployed
Monday, February 9, 2026
Backend foundation and contract expansion
Router: Action contract expanded and normalized
- Consolidated the POST router contract so chat, sharing, presets, and history management all resolve through explicit action branches.
- Kept error surfaces deterministic for malformed payloads (bad_request, missing_conversation_id, missing_messages) so frontend recovery paths can branch cleanly.
- Ensured the same CORS envelope is applied on success and failure responses to avoid browser-level ambiguity when upstream calls fail.
- Added stronger request-shape checks for slot arrays, stream flags, and judge-only payload variants.
History: Conversation persistence stabilized in DynamoDB snapshot flow
- Hardened write path for store_history snapshots with normalized role/content filtering and bounded title derivation.
- Improved list hydration path so user threads are recovered by prefix convention and sorted by activity timestamp.
- Added safer delete semantics for thread teardown without blocking the active UI shell.
- Improved fallback loading behavior so empty remote sync no longer wipes usable local history.
Picker: Custom mode made the practical default
- Re-centered compare setup around custom provider/model slots, instead of requiring users to fight preset defaults.
- Preserved slot choices across roundtrips so cards stay stable through re-runs and background state updates.
- Aligned card readiness markers with actual stream lifecycle state, not just optimistic request dispatch.
- Reduced accidental hidden coupling between picker controls and auto-judge mode toggles.
Share: Thread sharing and retrieval contracts tightened
- Maintained explicit share_create and share_get routes with deterministic 404 behavior for missing shares.
- Protected share titles with bounded lengths and server-side defaults for incomplete client payloads.
- Kept shared-message shape aligned with first-party conversation storage to simplify UI rendering paths.
Canvas: Initial execution pipeline hardened
- Standardized canvas command surface (Run, Copy, Clear, Close) and tab-scoped document storage.
- Improved extraction of code payloads from streamed model content so runnable HTML reaches preview consistently.
- Prepared no-code fallback path for responses that should not execute, preventing blank-canvas dead states.
State: Conversation lifecycle and history sync safeguards
- Added stronger queueing around delayed history sync so message appends are batched and not lost under bursty sends.
- Improved remote/local merge behavior to avoid destructive overwrite when remote returns temporarily empty responses.
- Hydration logic now favors URL-selected conversation IDs while still preserving empty-draft landing behavior.
- Expanded delete-all flow to recreate a fresh draft thread immediately after destructive actions, preventing blank shells.
Tuesday, February 10, 2026
Streaming architecture and judge synchronization
Streaming: Moved from full-dump behavior to incremental chunk delivery
- Primary response path now renders incrementally so users see movement immediately, instead of waiting for full completion dumps.
- First-token policy emits a very small early chunk so the response becomes visible quickly.
- Chunk stitching logic was tightened to reduce spacing/punctuation artifacts across chunk boundaries.
- Provider errors now preserve partial text where possible, rather than discarding all progress.
Judge Gate: MingJudge now waits for active model streams to finish
- Auto-judge sequencing no longer races ahead when one or more model slots are still streaming.
- Final arbitration waits for terminal candidate states, reducing premature winner selection.
- Judge-only payload path remains available for explicit arbitration requests, but default compare flow now prioritizes completeness.
- Final judge answer now renders instantly once the decision is made.
Fallback: Warm fallback windows remain hidden until swap criteria are met
- Fallback requests can pre-run in background, but hidden-answer policy prevents leakage before swap.
- Swap logic now checks primary stream progress and candidate validity before replacing output.
- Fallback retry scheduling honors attempt caps and provider-rotation constraints.
- Telemetry captures whether a final card came from primary stream or fallback stream.
Runtime: Stream timeout handling and non-stream emergency retry
- Streaming path keeps a deadline budget; if exhausted, runtime now emits partial output and marks truncation.
- If stream parsing fails near deadline, runtime attempts a bounded non-stream retry before surfacing failure.
- Error payloads were normalized to upstream_unavailable with consistent detail transport.
- History persistence continues when fallback succeeds, preserving conversation continuity.
Typing: Adaptive output pacing strategy introduced
- First chunk can animate slowly while network latency is still unknown.
- When model completion is known, typing speed jumps to instant so finished answers are not artificially delayed.
- Chunk-arrival cadence is estimated and used to adjust the live typing slope while stream is active.
- Judge output bypasses slow typing and commits immediately when arbitration is final.
Stream Lambda: Dedicated streaming endpoint behavior clarified
- Streaming endpoint now rejects non-stream action payloads explicitly with unsupported_action_on_stream_endpoint.
- Provider/model parsing supports prefix contracts and fallback defaults in one place.
- OpenAI GPT-5 compatibility path uses responses API fallback when chat-completions mismatch is detected.
- Output writer now emits raw text chunks directly for low-latency client rendering and simpler parser contracts.
Streaming + Timeout Tuning (Detailed)
How streaming works now, with exact timeout and retry behavior
Flow: End-to-end streaming path after this week’s changes
- Client sends streaming requests with explicit timeout fields (timeout_s, request_timeout_s, req_timeout_s) and high token ceiling hints.
- Stream transport accepts SSE, NDJSON, and plain-text envelopes, then normalizes delta extraction into a single token stream for the UI typer.
- Provider adapters emit incremental text immediately, not full dumps at the end.
- If stream body is missing, client gracefully degrades to plain text body handling and still completes the response lifecycle.
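A minimal sketch of the delta-normalization step described above. The envelope shapes are assumptions for illustration only: SSE lines are taken to look like "data: {json}", NDJSON lines are bare JSON objects, both carrying a hypothetical "delta" field, and "[DONE]" is an assumed end-of-stream sentinel. The real transport may use different field names.

```python
import json
from typing import Optional

def extract_delta(line: str) -> Optional[str]:
    """Return the text delta carried by one transport line, or None if it has none."""
    body = line
    if line.lstrip().startswith("data:"):
        # SSE framing: strip the field prefix and keep the payload.
        body = line.lstrip()[len("data:"):].strip()
        if body == "[DONE]":  # assumed end-of-stream sentinel
            return None
    if not body.strip():
        return None  # blank keep-alive line or frame separator
    if body.lstrip().startswith("{"):
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            return body  # looks like JSON but is not: treat as plain text
        delta = payload.get("delta")  # hypothetical field name
        return delta if isinstance(delta, str) and delta else None
    return body  # plain-text envelope passes through unchanged

lines = ['data: {"delta": "Hi"}', "", '{"delta": " there"}', "!", "data: [DONE]"]
text = "".join(d for d in map(extract_delta, lines) if d)
```

Whatever the envelope, the UI typer only ever sees a sequence of plain strings, which is what keeps the client parser contract simple.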
First Chunk: How first paint is forced early
- Backend chunk batching uses phased thresholds (first, second, next) and punctuation-aware flushing so users see content quickly.
- Client typer starts immediately on first delta and keeps a small lead buffer to avoid stutter while new chunks arrive.
- Long upstream deltas are split into smaller pieces before rendering to avoid frame hitches and giant single-frame DOM writes.
- When stream completes, typer fast-forwards to final text so completion is immediate rather than artificially delayed.
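One way phased thresholds with punctuation-aware flushing could look, as a sketch. The threshold values (8, 24, 64 characters), the punctuation set, and the half-threshold rule are all assumptions; the changelog names the phases (first, second, next) but not their sizes.

```python
from typing import Optional

PUNCTUATION = (".", "!", "?", ",", ";", ":")  # assumed flush boundaries

class ChunkBatcher:
    """Buffers upstream text and releases it in growing chunks."""

    def __init__(self, first: int = 8, second: int = 24, nxt: int = 64):
        self.thresholds = (first, second, nxt)  # hypothetical sizes, in characters
        self.flushed = 0
        self.buffer = ""

    def push(self, delta: str) -> Optional[str]:
        """Add upstream text; return a chunk if one is ready to send."""
        self.buffer += delta
        limit = self.thresholds[min(self.flushed, 2)]
        at_boundary = self.buffer.rstrip().endswith(PUNCTUATION)
        # Flush on size, or early at punctuation once half the threshold is buffered.
        if len(self.buffer) >= limit or (at_boundary and len(self.buffer) >= limit // 2):
            return self.flush()
        return None

    def flush(self) -> Optional[str]:
        """Release whatever is buffered (also used at end of stream)."""
        if not self.buffer:
            return None
        chunk, self.buffer = self.buffer, ""
        self.flushed += 1
        return chunk
```

The small first threshold favors time-to-first-paint; later thresholds grow so that steady-state streaming sends fewer, larger writes.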
Backend Timeout Model: Server-side stream budgeting and deadline clamp
- Requested timeout is parsed from timeout_s/request_timeout_s/req_timeout_s and clamped to 10s min / 840s max.
- Effective stream budget is derived against configured stream timeout ceiling, then bounded again inside min/max guards.
- Hard stream deadline is set with an output margin (stream_deadline = now + max(2.0, budget - 0.8)) so responses can serialize before gateway hard limits.
- If deadline is reached mid-stream, backend returns partial text (trimmed/sanitized) instead of failing the whole response.
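The budgeting rules above can be sketched as follows. The 10s/840s clamp and the deadline formula come from the notes; the field precedence order, the default when no field is present, and the function names are assumptions.

```python
import time
from typing import Optional

MIN_TIMEOUT_S = 10.0
MAX_TIMEOUT_S = 840.0
TIMEOUT_FIELDS = ("timeout_s", "request_timeout_s", "req_timeout_s")

def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))

def requested_timeout(body: dict, default: float = MAX_TIMEOUT_S) -> float:
    """First parseable timeout field wins (precedence order is an assumption)."""
    for field in TIMEOUT_FIELDS:
        try:
            return clamp(float(body.get(field)), MIN_TIMEOUT_S, MAX_TIMEOUT_S)
        except (TypeError, ValueError):
            continue  # field missing or not numeric: try the next alias
    return default  # assumed default when the client sends no timeout

def stream_budget(requested_s: float, stream_ceiling_s: float) -> float:
    """Cap by the configured stream ceiling, then re-apply the min/max guards."""
    return clamp(min(requested_s, stream_ceiling_s), MIN_TIMEOUT_S, MAX_TIMEOUT_S)

def stream_deadline(budget_s: float, now: Optional[float] = None) -> float:
    """Leave a 0.8s output margin, but never less than 2.0s of stream time."""
    now = time.monotonic() if now is None else now
    return now + max(2.0, budget_s - 0.8)
```

With a 600s client request and a hypothetical 300s stream ceiling, the budget is 300s and the deadline lands 299.2s after the start.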
Fallback Timeout Model: Non-stream salvage behavior on stream failure
- If streaming path throws, backend attempts bounded non-stream fallback before returning error.
- Fallback timeout is derived from remaining stream budget and clamped by requested timeout and stream ceiling.
- Fallback is skipped when remaining budget is too small (<1s) to avoid guaranteed late failures.
- Fallback responses still return in chunked format so UI keeps normal streaming semantics and history persistence.
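A small sketch of the fallback budget decision, assuming the deadline and clock are plain floats in seconds. The function name and signature are hypothetical; the 1s skip threshold and the clamping inputs come from the notes above.

```python
from typing import Optional

MIN_FALLBACK_BUDGET_S = 1.0  # below this, the fallback is skipped

def fallback_timeout(deadline: float, now: float,
                     requested_s: float, stream_ceiling_s: float) -> Optional[float]:
    """Timeout for the non-stream retry, or None when there is no time left to try."""
    remaining = deadline - now
    if remaining < MIN_FALLBACK_BUDGET_S:
        return None  # a retry this late would be a guaranteed late failure
    # Never exceed what is left, what the client asked for, or the stream ceiling.
    return min(remaining, requested_s, stream_ceiling_s)
```

Returning None rather than a tiny timeout keeps the caller's branch explicit: surface the stream error immediately instead of starting a request that cannot finish.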
Client Timeout Matrix: Current runtime values in this release
- Per-request timeout sent from client: 600s (REQUEST_TIMEOUT_SECONDS = 600).
- Primary card client timeout: 600000ms (Anthropic also 600000ms in this build).
- Fallback stream hard timeout: 22000ms.
- Fallback orchestration windows: arm at 2500ms, swap check at 9500ms, hard window at 24000ms.
Retry Strategy: Retries, backoff, and retry-after handling
- Stream fetch layer retries up to 4 attempts on retryable upstream classes (notably 429/502/503/504 and equivalent payload signals).
- Backoff uses parsed retry-after where available, with computed fallback delay otherwise.
- Grok-specific rate-limit retry in compare flow performs bounded local retries (2 max) with jittered delays.
- Retries preserve runtime state so card UI can surface “retrying/rate limited” instead of jumping directly to hard error.
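A sketch of the retry loop under stated assumptions: the notes do not say how the fallback delay is computed, so capped exponential backoff with a little jitter is used here as a stand-in. The send callable, the lowercase header key, and the delay constants are hypothetical. The production client is not Python; this only illustrates the control flow.

```python
import random
from typing import Callable, Optional, Tuple

RETRYABLE_STATUSES = {429, 502, 503, 504}
MAX_ATTEMPTS = 4

def parse_retry_after(value: Optional[str]) -> Optional[float]:
    """Parse the delta-seconds form of retry-after; the HTTP-date form is ignored here."""
    if value is None:
        return None
    try:
        seconds = float(value.strip())
    except ValueError:
        return None
    return seconds if seconds >= 0 else None

def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 0.5, cap: float = 8.0) -> float:
    parsed = parse_retry_after(retry_after)
    if parsed is not None:
        return min(parsed, cap)  # honor the server hint, within a cap
    # Assumed fallback: exponential backoff plus up to 0.25s of jitter.
    return min(cap, base * 2 ** (attempt - 1)) + random.uniform(0, 0.25)

def fetch_with_retry(send: Callable[[], Tuple[int, dict, str]],
                     sleep: Callable[[float], None]) -> Tuple[int, str, int]:
    """send() returns (status, headers, body); returns (status, body, attempts used)."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, headers, body = send()
        if status not in RETRYABLE_STATUSES or attempt == MAX_ATTEMPTS:
            return status, body, attempt
        sleep(retry_delay(attempt, headers.get("retry-after")))
```

Injecting sleep makes the loop testable without real waiting, and returning the attempt count gives the card UI something to show while a retry is pending.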
Compare Backup: Status of warm fallback path in this build
- Parallel warm-swap fallback engine remains implemented with strict controls (max parallel fallback requests, attempt caps, failed-provider suppression).
- Current shipping flag keeps backup auto-swap disabled by default (COMPARE_BACKUP_ENABLED = false) while primary streaming reliability is stabilized.
- When enabled, fallback results stay hidden until swap policy conditions pass, preventing answer leakage before replacement.
- Timeout reason metadata (none/slow/error) is captured per card for analytics and post-run diagnostics.
Judge Streaming: Why judge no longer races early
- Judge wait gating now holds arbitration until active candidate streams settle to terminal states.
- This avoids winner selection based on partial candidates when a slower provider is still streaming useful content.
- Once judge result is final, output is rendered instantly and not throttled by slow typer animation.
- Fallback/local judge reasoning remains as guarded safety path when strict judge output is unavailable.
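The gating rule reduces to a check that every candidate slot has reached a terminal state. A minimal sketch; the state names and the slot-state mapping are assumptions, not the runtime's actual identifiers.

```python
from typing import Dict

TERMINAL_STATES = {"done", "error", "timeout"}  # assumed state names

def ready_for_judge(slot_states: Dict[str, str]) -> bool:
    """True only when there is at least one slot and none is still streaming."""
    return bool(slot_states) and all(
        state in TERMINAL_STATES for state in slot_states.values()
    )

# One slot is still streaming, so arbitration must wait.
waiting = ready_for_judge({"slot_a": "done", "slot_b": "streaming"})
# Every slot settled (even if one failed), so the judge can run.
settled = ready_for_judge({"slot_a": "done", "slot_b": "error"})
```

Treating error and timeout as terminal matters: otherwise one failed provider would block arbitration forever.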
Wednesday, February 11, 2026
Reliability hardening: 504, CORS, provider and auth paths
Reliability: 504 timeout handling and retry behavior tightened
- Adjusted upstream timeout budgeting to reduce runaway retry storms when gateway 504 errors repeat.
- Added stronger guardrails around fallback scheduling so new retry attempts are not scheduled indefinitely.
- Surfaced clearer request error detail to frontend debugging surfaces for faster triage.
- Reduced cases where repeated timeout loops made the UI unresponsive.
CORS: Origin header duplication corrected for Lambda URL path
- Resolved duplicate Access-Control-Allow-Origin emission that produced browser hard failures.
- Normalized response path so one origin value is emitted for allowed sites and omitted otherwise.
- Maintained OPTIONS handling for preflight while preserving explicit content-type behavior on stream responses.
- Reduced false-negative network errors caused by browser CORS policy rejection of otherwise 200 responses.
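A sketch of single-origin emission, assuming a Lambda-style response dict. The allow-list entry and helper names are hypothetical. The key property is that the header is set in exactly one place, with one value, on both success and failure responses.

```python
from typing import Dict, Optional

ALLOWED_ORIGINS = {"https://app.example.com"}  # hypothetical allow-list

def cors_headers(request_origin: Optional[str]) -> Dict[str, str]:
    """Emit exactly one Access-Control-Allow-Origin for allowed origins, none otherwise."""
    headers = {"Vary": "Origin"}  # the response differs per origin, so caches must key on it
    if request_origin in ALLOWED_ORIGINS:
        headers["Access-Control-Allow-Origin"] = request_origin
    return headers

def respond(status: int, body: str, request_origin: Optional[str]) -> dict:
    """Build success and error responses through the same CORS envelope."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json", **cors_headers(request_origin)},
        "body": body,
    }
```

Because headers live in a dict, a second writer overwrites rather than appends, which rules out duplicates within the handler itself. Any layer in front of the handler that also adds CORS headers would still need to be checked separately.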
xAI: Grok provider path refreshed for current model family and rate limits
- Provider parsing now maps xai aliases into grok consistently across routes.
- Model normalization was aligned with active Grok model names and token caps.
- Rate-limit failures (429 Too Many Requests) now bubble up with explicit upstream context.
- Fallback logic avoids silently pretending success when provider quota is exhausted.
Secrets: Credential retrieval and provider key caching improved
- Consolidated Secrets Manager lookups by provider and cached resolved key values to reduce repeated fetch overhead.
- Preferred-key extraction supports provider-specific key names plus generic api_key fallbacks.
- Startup no longer depends on every optional provider key if that provider is not selected for the current request.
Console: STOP warning and MingLLM banner logging deduplicated
- Console warning sequence now emits once per boot instead of repeating every few seconds.
- Retained clear anti-phishing STOP text and build id visibility without noisy log spam.
- Ensured banner appears again only on true fresh load boundaries.
Errors: History hydration failure modes clarified
- Improved behavior around list_conversations, get_conversation, and shared-thread lookup when server returns empty or not-found.
- Front-end hydration path now degrades safely to local conversations rather than collapsing the entire chat view.
- Protected selected-thread behavior while background sync is retrying.
Auth/Routes: Missing token and bad-route diagnostics improved
- Clarified behavior for Missing Authentication Token responses by separating route-not-found failures from authorization failures.
- Improved malformed action/request responses so 400/404 classes are easier to differentiate in browser console traces.
- Reduced ambiguity between upstream 500 errors and local request-shape errors during hydration calls.
- Added better fallback-to-local posture when authenticated fetch routes return non-terminal failures.
Thursday, February 12, 2026
Runtime polish, canvas reliability, and release rollout
Typing UX: Final-pass pacing tuned for realism without delay tax
- Kept early streaming visually legible with slower initial cadence while chunks are still sparse.
- When a model reaches done state, remaining text now resolves instantly to avoid artificial waiting.
- Spacing rules were adjusted so chunk joins no longer inject extra spaces while still preserving word boundaries where needed.
- Judge final card is now instant-commit by default.
Canvas: Preview pipeline repaired and empty-state behavior upgraded
- Fixed broken canvas render path where valid HTML was not appearing in preview.
- Added reliable runtime capture for iframe errors and surfaced them in the canvas terminal panel.
- When no runnable code exists, canvas now shows an animated, slightly opaque MingLLM idle state instead of a blank pane.
- Preserved code editor contents and tab metadata in local storage for cross-reload continuity.
Notifications: Cross-chat completion toasts added for background runs
- If a user starts a new thread while another run is still active, completion is now announced with a toast.
- Toast interaction supports jumping directly to the finished conversation context.
- Notification settings were wired through top-level preferences to reduce surprise behavior.
Navigation: Standalone changelog route shipped across all core surfaces
- Added dedicated changelog destination (/changelog and /changelog.html) and linked from main sidebar/header.
- Added changelog entry points in tutorial, music, picker, leaderboard, and sign-in pages.
- Added sitemap entry for changelog discoverability and search indexing consistency.
- Added route fallback redirect behavior when static index fallback serves /changelog.
Deploy: Scramble + deploy flow completed and verified in production
- Regenerated production-dist HTML with scramble pipeline for changelog and index assets.
- Uploaded both pretty route object and html route object to S3 with no-cache headers.
- Issued CloudFront invalidation and verified route health using direct HTTP checks.
- Confirmed live navigation from application sidebar into standalone changelog page.
Release Governance: Cross-page change accounting made explicit
- Documented shipped changes by page and backend component in a single weekly artifact for fast operational review.
- Aligned sidebar and top-nav links so release notes are discoverable from core product surfaces.
- Added route-level checks to verify both /changelog and /changelog.html stay healthy after deploy.
- Expanded changelog content style to include system behavior, not only UI text-level diffs.
Backend Deep Dive (What changed under the hood)
API layer, storage, stream runtime, and deployment details
API Surface: Main Lambda route map and responsibilities
- action=list_conversations: returns normalized conversation list for sidebar hydration.
- action=store_history: writes snapshot with title, counts, and normalized message payloads.
- action=get_conversation: loads private or shared thread, returning consistent message arrays.
- action=delete_conversation: removes snapshot records and returns deterministic delete status.
- action=chat_multi: executes multi-slot compare calls and can emit NDJSON stream updates.
- action=judge_only: runs strict arbitration on provided candidates and returns final answer plus reason.
- action=list_models/list_presets/save_preset/delete_preset: powers picker customization.
Storage: DynamoDB write/read strategy for conversation durability
- Snapshots are persisted with bounded payload shape and safe defaults to prevent malformed history entries.
- Title derivation uses first-user-message fallback when explicit title is missing.
- Load path supports both direct conversation rows and shared rows with backward compatibility checks.
- Delete flow is explicit and returns deleted-count for UI reconciliation.
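A sketch of the snapshot normalization and title fallback described above. The allowed role set, the 80-character bound, and the default title are assumptions; the notes only say that messages are filtered to normalized role/content pairs and that titles are bounded with a first-user-message fallback.

```python
from typing import List, Optional

ALLOWED_ROLES = {"user", "assistant", "system"}  # assumed role set
MAX_TITLE_LEN = 80                                # assumed bound
DEFAULT_TITLE = "New conversation"                # assumed server-side default

def normalize_messages(messages: list) -> List[dict]:
    """Keep only well-formed role/content pairs; drop everything else."""
    clean = []
    for message in messages:
        if not isinstance(message, dict):
            continue
        role, content = message.get("role"), message.get("content")
        if role in ALLOWED_ROLES and isinstance(content, str) and content.strip():
            clean.append({"role": role, "content": content})
    return clean

def derive_title(messages: List[dict], explicit_title: Optional[str] = None) -> str:
    """Prefer the explicit title, then the first user message, then a default."""
    candidate = (explicit_title or "").strip()
    if not candidate:
        candidate = next(
            (m["content"] for m in messages if m["role"] == "user"), DEFAULT_TITLE
        )
    # Collapse whitespace runs so multi-line prompts still make a one-line title.
    return " ".join(candidate.split())[:MAX_TITLE_LEN]
```

Running normalize_messages before derive_title means the title logic never has to defend against malformed entries.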
Streaming Core: Primary stream + bounded fallback execution model
- Primary provider starts immediately and emits incremental chunks for visible progress.
- Fallback provider can run in parallel under strict conditions but remains hidden until swap policy passes.
- Swap policy checks include: primary chunk presence, fallback validity, attempt ceilings, and per-provider failure history.
- If stream terminates unexpectedly, bounded non-stream retry path attempts to salvage completion inside remaining budget.
Providers: Adapter normalization across OpenAI, Anthropic, Grok, Gemini, DeepSeek
- Provider parsing accepts model prefixes (openai:model, grok:model) and normalizes aliases like xai to grok.
- OpenAI path can pivot between chat-completions streaming and responses API for model compatibility mismatches.
- Anthropic, DeepSeek, and Grok paths stream through compatible SSE parsing to unify chunk handling.
- Token ceilings are clamped by provider/model constraints to reduce invalid-request failures.
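Prefix parsing with alias normalization might look like the sketch below. The provider list comes from the heading above and the xai alias from the notes; the default provider and the placeholder model name are assumptions. The unsupported_provider error name matches the error taxonomy listed elsewhere in these notes.

```python
from typing import Tuple

PROVIDER_ALIASES = {"xai": "grok"}
KNOWN_PROVIDERS = {"openai", "anthropic", "grok", "gemini", "deepseek"}
DEFAULT_PROVIDER = "openai"  # assumed default when no prefix is given

def parse_model(spec: str) -> Tuple[str, str]:
    """Split 'provider:model' into (provider, model), normalizing aliases."""
    provider, sep, model = spec.strip().partition(":")
    if not sep:
        return DEFAULT_PROVIDER, spec.strip()  # bare model name, no prefix
    provider = provider.strip().lower()
    provider = PROVIDER_ALIASES.get(provider, provider)
    if provider not in KNOWN_PROVIDERS:
        raise ValueError("unsupported_provider")
    return provider, model.strip()

# "some-model" is a placeholder, not a real model identifier.
provider, model = parse_model("xai:some-model")
```

Doing alias resolution in one function is what lets every route agree on the provider name.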
CORS/Auth: Cross-origin and identity behavior cleanup
- Single-origin CORS emission restored for allowed frontends, preventing duplicate-origin browser rejection.
- Preflight OPTIONS flow remains explicit with controlled method/header allow lists.
- Signed-request checks and auth-aware routes now fail fast with deterministic status codes.
- OAuth redirect and token callback routes remain isolated from stream/chat routes.
Observability: Error detail and debug signaling improved
- Normalized upstream error payloads help distinguish timeout, truncated, and unsupported-provider classes.
- Stream path now records source attribution (primary vs fallback) for final answers.
- Reduced redundant console noise while preserving explicit STOP warning and build-id visibility.
Deployment: Static route deployment and cache invalidation workflow
- Source pages are maintained in prod_*.html and transformed to dist_*.html with deterministic scramble step.
- Both route forms are uploaded: pretty path object (for /changelog) and html object (for /changelog.html).
- CloudFront invalidation is run after upload to force edge refresh of changed routes.
- Route health is validated with direct curl -I checks and browser navigation checks.
How The Leap Works (Backend + Runtime Flow)
Scheduling, API sequence, live progress, scoring, and UI integration
Step 1: Drop scheduling in Pacific Time
- Weekly drop target is computed as Saturday 12:00 PM PT.
- Monthly drop target is computed as the first Saturday of the month at 12:00 PM PT.
- PT/PDT transitions are handled by zone-aware date math (America/Los_Angeles) so countdowns stay correct across DST boundaries.
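The drop-target math can be sketched with the standard-library zoneinfo module (Python 3.9+). Function names are hypothetical, and the actual client is not Python; the point is that the target is built as a wall-clock time inside America/Los_Angeles, so the UTC offset follows DST automatically.

```python
from datetime import date, datetime, timedelta
from zoneinfo import ZoneInfo

PT = ZoneInfo("America/Los_Angeles")
SATURDAY = 5  # datetime.weekday(): Monday is 0

def _noon_pt(day: date) -> datetime:
    # Constructing the wall-clock time in the zone lets zoneinfo pick PST or PDT.
    return datetime(day.year, day.month, day.day, 12, 0, tzinfo=PT)

def next_weekly_drop(now: datetime) -> datetime:
    """Next Saturday 12:00 PT strictly after `now` (any timezone-aware datetime)."""
    local = now.astimezone(PT)
    day = local.date() + timedelta(days=(SATURDAY - local.weekday()) % 7)
    if _noon_pt(day) <= local:
        day += timedelta(days=7)  # this week's drop has already happened
    return _noon_pt(day)

def first_saturday_drop(year: int, month: int) -> datetime:
    """Monthly drop: first Saturday of the month at 12:00 PT."""
    first = date(year, month, 1)
    return _noon_pt(first + timedelta(days=(SATURDAY - first.weekday()) % 7))
```

For example, the drop after Saturday, March 7, 2026 falls on March 14, after the US DST change on March 8, so its offset is UTC-7 rather than UTC-8 while the displayed time stays 12:00 PM.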
Step 2: Client requests newest run id
- The Leap client first calls GET /runs/latest?kind=weekly|monthly&include_incomplete=true.
- If dynamic API is unavailable, client falls back to static objects at /runs/live/{kind} and /runs/latest/{kind}.
- Cache-busting query values are appended on fetch to avoid stale edge/browser data.
Step 3: Live run state and progress polling
- For active runs, client calls GET /runs/{id} and displays status, scored count, failed count, and total tasks.
- Polling runs every 5 seconds while status is not terminal.
- Pace and ETA are calculated from scored progress and elapsed runtime to estimate completion time in PT.
- Live leader preview can show current champion before final aggregation completes.
Step 4: Final leaderboard payload retrieval
- Once run status is completed, client requests GET /runs/{id}/leaderboard.
- Payload is adapted into a normalized weekly-run shape with overall metrics and per-category metrics.
- If dynamic endpoint is absent, static latest-run payload is adapted from /runs/latest/{kind}.
- If no valid models are present yet, UI keeps progress state and explicit not-ready messaging.
Step 5: Ranking and tie-break behavior
- Overall rank ordering prioritizes overall score descending, then accuracy descending, then coverage descending, then latency ascending.
- Category winner logic (coding, writing, school/general) uses category score first, then category accuracy, then overall score.
- A top-three group extraction chooses unique leaders across coding/writing/general to avoid duplicate picks.
- These leaders are used to inform compare defaults and highlight strongest models by task shape.
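The tie-break order maps naturally onto tuple sort keys. In this sketch the row field names (score, accuracy, coverage, latency_ms, categories) and the order in which categories pick their leader are assumptions; the comparison order itself follows the rules above, with the school category standing in for the general group.

```python
from typing import Dict, List

def rank_overall(rows: List[dict]) -> List[dict]:
    """Score, accuracy, coverage descending; latency ascending as the last tie-break."""
    return sorted(rows, key=lambda r: (-r["score"], -r["accuracy"],
                                       -r["coverage"], r["latency_ms"]))

def rank_category(rows: List[dict], category: str) -> List[dict]:
    """Category score, then category accuracy, then overall score."""
    return sorted(rows, key=lambda r: (-r["categories"][category]["score"],
                                       -r["categories"][category]["accuracy"],
                                       -r["score"]))

def unique_leaders(rows: List[dict],
                   categories=("coding", "writing", "school")) -> Dict[str, str]:
    """Pick the best not-yet-picked model per category to avoid duplicate leaders."""
    picked: Dict[str, str] = {}
    for category in categories:
        for row in rank_category(rows, category):
            if row["model"] not in picked.values():
                picked[category] = row["model"]
                break
    return picked

def _row(model, score, latency_ms, coding, writing, school):
    cats = {name: {"score": s, "accuracy": s / 100}
            for name, s in (("coding", coding), ("writing", writing), ("school", school))}
    return {"model": model, "score": score, "accuracy": 0.9, "coverage": 1.0,
            "latency_ms": latency_ms, "categories": cats}

rows = [_row("a", 90, 800, 95, 90, 92), _row("b", 90, 500, 80, 85, 70),
        _row("c", 85, 400, 70, 60, 88)]
order = [r["model"] for r in rank_overall(rows)]
leaders = unique_leaders(rows)
```

Models a and b tie on score, accuracy, and coverage, so the lower latency puts b first overall; model a leads every category but is only picked once, which leaves writing and school to b and c.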
Step 6: Result caching and resilience fallback
- Successful leaderboard payloads are cached in local storage per run-kind key (mingllm:leaderboard:weekly/monthly).
- If live fetch fails, cached data can be used as controlled fallback so The Leap page remains functional.
- Cache is cleared when live run is active to avoid mixing in-progress status with stale final boards.
Step 7: How Leap data feeds product UX
- The Leap surface is not only a static leaderboard page; it informs model confidence framing across compare and evaluation workflows.
- Drop cadence gives predictable checkpoints for users who want fresh benchmark snapshots rather than ad hoc score drift.
- Live progress mode improves trust during long benchmark runs by exposing completion metrics and ETA.
- Download links for JSON/CSV payloads are exposed when backend includes downloadable artifacts.
API Contract Appendix (Extended)
Request envelopes, response semantics, stream behavior, and error taxonomy
Envelope: Canonical request body shape for chat and compare
- Primary request contract supports messages, model, history, streaming, and optional compare-slot payloads.
- Multi-slot compare contracts carry slot metadata and runtime controls, including fallback-related flags for internal orchestration.
- Judge mode can be requested explicitly by action, model prefix, or dedicated judge-only route.
- Input-size guards are enforced for image bytes, file text, and user prompt content to avoid runaway payload failures.
Validation: Defensive checks at router entry
- Invalid JSON, malformed message arrays, or missing required IDs return deterministic 400-class responses.
- Unsupported media types are rejected early with explicit content-type requirement messaging.
- Signed request and auth-aware routes perform upfront authorization checks before downstream processing.
- Request-body filtering rejects unknown high-risk shapes without crashing the whole invocation.
Response: Consistent JSON and stream response contracts
- Standard JSON responses return ok and route-specific data fields for simple action calls.
- Compare-streaming path can emit NDJSON updates for progressive slot state delivery.
- Stream endpoint uses chunked text output and finishes with deterministic terminal semantics.
- Error responses preserve CORS envelope to prevent browser-side silent drops.
Errors: Operational error classes exposed to client
- upstream_unavailable: provider call failed due to timeout/rate limit/network or provider-side error.
- upstream_truncated: stream budget reached and output ended before full completion.
- unsupported_provider/unsupported_feature: invalid provider/model usage path.
- missing_conversation_id/missing_messages: storage and thread operations missing required inputs.
History: Conversation and share storage contract details
- Snapshots persist normalized role/content messages with bounded title and count metadata.
- List results return user-facing conversation summaries suitable for sidebar rendering without loading full message bodies.
- Shared conversations have dedicated create/get operations and explicit 404 semantics when absent.
- Delete operations return deleted-count metrics for optimistic UI reconciliation.
Model Registry: Provider/version discovery and preset storage
- list_models exposes provider registry payload used by picker UI to populate valid options.
- list_presets, save_preset, and delete_preset maintain user compare-team templates.
- Version normalization maps ambiguous user inputs into provider-safe model identifiers.
- Preset save path validates name and slot shape before persistence to prevent corrupted compare layouts.
CORS: Cross-origin contract expectations
- Exactly one Access-Control-Allow-Origin value is emitted for accepted origins.
- Preflight contract advertises supported methods and headers consistently across JSON and stream routes.
- Access-Control-Expose-Headers includes live-data signal headers used by frontend runtime.
- Credentialed requests remain enabled for auth-bound flows while preserving explicit origin checks.
Security: Secrets and provider key handling
- Provider API keys are fetched from Secrets Manager and cached in-memory for subsequent invocations.
- Secret extraction supports provider-specific keys and safe generic fallback keys.
- Provider calls fail fast when keys are unavailable instead of silently routing to a wrong provider.
- Auth-sensitive routes and share routes keep explicit checks separated from stream transport logic.
Leap Internals (Expanded Runtime Notes)
Data model, state transitions, scoring math, and fallback topology
Data Model: Core payload objects used by The Leap
- LatestRunPayload: carries run id, run kind (weekly/monthly), status hints, and display label.
- RunDetailsPayload: carries status, progress counters, timestamps, and live leaderboard snapshot fields.
- LeaderboardPayload: carries final ranking rows, per-category metrics, and optional download links.
- Static fallback payloads are adapted into the same in-memory shape so UI components do not fork heavily by source.
State Machine: Run lifecycle in UI
- Boot state: clear old run state, fetch newest run pointer, and determine dynamic vs static source path.
- Running state: poll progress every 5 seconds and render live cards with completion/ETA indicators.
- Completed state: fetch leaderboard payload and render sortable ranking table plus champions.
- Error/degraded state: show fallback cached results or explicit unavailable messaging when no cache exists.
Progress Math: How pace and ETA are estimated
- Effective completion uses scored count when available, falling back to completed count otherwise.
- Pace is computed as effective completed items per minute over elapsed run time.
- ETA derives from remaining items divided by observed pace and is displayed in PT.
- All displayed timestamps are zone-normalized through PT conversion utilities for consistency.
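The pace and ETA arithmetic, as a sketch. Parameter names and the decision to return None when no pace can be observed yet are assumptions; the formulas follow the bullets above (pace = effective completed items per minute, ETA = remaining items / pace).

```python
from typing import Optional, Tuple

def estimate_progress(total: int, scored: Optional[int], completed: int,
                      elapsed_s: float) -> Optional[Tuple[float, float]]:
    """Return (items per minute, minutes remaining), or None if pace is unknown."""
    # Prefer the scored count; fall back to the completed count when it is absent.
    effective = scored if scored is not None else completed
    if effective <= 0 or elapsed_s <= 0:
        return None  # nothing finished yet, so there is no pace to extrapolate
    pace_per_min = effective / (elapsed_s / 60.0)
    remaining = max(0, total - effective)
    return pace_per_min, remaining / pace_per_min

# 50 of 200 tasks scored after 10 minutes: 5 per minute, 30 minutes to go.
pace, eta_min = estimate_progress(total=200, scored=50, completed=60, elapsed_s=600)
```

The returned ETA is a duration; converting "now + ETA" into a displayed PT timestamp is a separate step handled by the zone-aware utilities mentioned above.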
Ranking: Champion and category winner derivation
- Overall champion comes from sorted rows by score, then accuracy, then coverage, then lower latency.
- Category winners apply category-specific score and accuracy before using overall score as tie-break.
- General group uses school category metrics as proxy signal for broad baseline competence.
- Top-three unique picks ensure category highlights are not all dominated by one repeated model row.
Fallback: Dynamic API to static artifact downgrade flow
- When dynamic routes return 404/unavailable signals, client marks dynamic API as missing and switches to static artifacts.
- Static live route (/runs/live/{kind}) supports in-progress UI even without dynamic API host.
- Static latest route (/runs/latest/{kind}) provides final leaderboard data when available.
- Fallback switch is sticky during that page lifecycle to prevent oscillation and request thrash.
Caching: Local resilience strategy
- Final leaderboard payloads are saved by run kind in local storage for rapid cold-start and outage resilience.
- Cache entries include both run data and metadata (status, n-per-category, downloads).
- When run status is in-progress, stale cache for that kind is removed to avoid stale-finished illusion.
- Cache fallback can be disabled in production by environment flags when strict live-only behavior is desired.
Product Coupling: How Leap affects compare behavior
- Leap top performers by group are surfaced as candidate guidance for compare defaults and experimentation flows.
- Category-specific winners make it easier to choose slots by task shape (coding vs writing vs general).
- Leaderboard freshness signals help users decide when to update a previously saved compare preset.
- This creates a loop: production usage informs benchmarks, and benchmarks inform better production defaults.
UX Guarantees: What Leap promises at runtime
- PT-correct drop labeling for both weekly and monthly tracks.
- Readable live-progress view while runs are still scoring.
- Clear not-ready messaging when latest run exists but final leaderboard is not yet available.
- Deterministic fallback behavior so page remains useful during backend maintenance windows.
Operations + Deployment Notes
Build pipeline, cache behavior, verification checklist, and known limits
Build: Changelog artifact production flow
- Source page is edited in prod_changelog.html for readable maintenance.
- Deploy variant is generated as dist_changelog.html through scramble_html.py.
- Route mirror changelog.html is kept in workspace for parity testing and local checks.
- Build id seed from current commit is applied when generating scrambled distribution output.
Publish: S3 object strategy for route compatibility
- Pretty route object s3://.../changelog is uploaded for clean URL navigation.
- HTML route object s3://.../changelog.html is uploaded for direct file path compatibility.
- Both objects use cache-control: no-cache to reduce stale-edge persistence during rapid iteration.
- CloudFront invalidation targets route-specific paths for faster propagation and lower blast radius.
Verify: Release validation checklist used this week
- HTTP status checks on both route forms (/changelog and /changelog.html).
- Content checks for expected anchor headings and update counters after invalidation.
- Navigation checks from main app sidebar into changelog destination.
- Anchor integrity checks for timeline buttons vs section ids to guarantee in-page navigation.
Incidents: Reliability incidents addressed in this release train
- Repeated 504 timeout bursts on API gateway paths under high retry pressure.
- Duplicate CORS origin headers causing browser policy hard-fail despite otherwise successful backend responses.
- xAI rate-limit upstream failures surfacing as user-visible provider errors in compare cards.
- Canvas blank-render failures when runnable code extraction failed or preview state desynced.
Guardrails: Operational protections kept in place
- Bounded stream deadlines and fallback budgets to avoid infinite retry chains.
- Input size limits on chat/file/image payloads to protect runtime memory and response latencies.
- Provider-specific token clamping to reduce invalid request patterns and quota waste.
- Deterministic error envelopes to keep frontend retry logic predictable under failure conditions.
Next: Known limits and follow-up work
- Add broader per-provider circuit-breaker metrics to proactively disable unstable providers during active incidents.
- Move more route health checks into automated synthetic monitoring to catch CORS/config regressions earlier.
- Add deeper stream chunk integrity tests for punctuation/spacing edge cases across multilingual responses.
- Expand The Leap public payload metadata with run provenance and benchmark-version identifiers.
| Surface | Coverage this week | Status |
| --- | --- | --- |
| / (main chat) | Incremental streaming, first-chunk visibility, adaptive typing speed, judge wait gating, chunk spacing normalization, STOP dedupe, background completion toasts | Updated |
| streamApi() client transport | Unified parsing across SSE/NDJSON/plain payloads, 4-attempt retry strategy, retry-after aware backoff, robust delta extraction | Updated |
| lambda_function.py (core API) | Action routing hardening, conversation snapshot behavior, share endpoints, judge path tightening, fallback budget handling, unified error envelopes | Updated |
| stream_lambda/index.mjs | Provider/model normalization, key-cache usage, streamified response path, OpenAI Responses fallback for non-chat models, Grok adapter updates | Updated |
| Timeout controls (front + back) | Client request timeout 600s, server clamp 10-840s, stream deadline margining, fallback timeout bounded by remaining budget, per-card local timeout handling | Updated |
| /runs/latest?kind=* | Primary Leap run-pointer route used to locate latest weekly/monthly run ids including in-progress runs | Updated |
| /runs/{id} | Live run status/pacing source for progress cards, ETA calculations, and terminal-state transition logic | Updated |
| /runs/{id}/leaderboard | Final normalized leaderboard payload with category metrics and optional artifact download links | Updated |
| /runs/live/{kind} (static fallback) | Static live-state fallback used when dynamic API host is unavailable | Updated |
| /runs/latest/{kind} (static fallback) | Static final leaderboard payload fallback for S3/CloudFront-only deployments | Updated |
| /pick | Custom-mode-first behavior, slot persistence, judge-ready card state alignment, fallback visibility tuning | Updated |
| /leaderboard (The Leap) | Dynamic and static run fetching, live progress polling, PT-correct drop labels, cache fallback strategy, ranking/tie-break normalization | Updated |
| /tutorial | Added changelog discovery links and reinforced compare/judge operational guidance references | Updated |
| /music | Cross-surface changelog navigation for release visibility | Updated |
| /signin | Added direct changelog pill entry for release transparency before login | Updated |
| /faq | Cross-linked changelog discovery path for product transparency from support/documentation entry points | Updated |
| /changelog | Standalone route, long-form backend notes, Leap architecture section, route fallback compatibility, sitemap inclusion | New + Updated |
| /changelog.html | Scrambled deploy mirror of changelog with no-cache headers for rapid weekly update propagation | Updated |
| sitemap.xml | Added changelog URL for crawler discovery and index consistency | Updated |
| scramble_html.py | Used to produce deterministic distribution HTML variant with build-seeded transforms for production upload | Used |
| cdk/config/prod_deploy_scripts.txt | Includes changelog-specific scramble/upload/invalidation commands for repeatable release workflow | Updated |