Current Status

Idle

session active

Session

14m 32s

Tokens used

8,421

Context load

62%

Agents active

3 / 5

Recent Transcript

You: What's the weather like today?

Gwen: It's 72 and sunny in Palo Alto - a perfect California afternoon.

You: Set a reminder for 3pm.

Gwen: Done. Reminder set for 3pm to review the design draft.

[14:32:01] ASR stream opened - sample rate 16kHz

[14:32:03] Fast Planner cache miss - routing to GwenBrain

[14:32:05] GwenBrain plan generated in 212ms

[14:32:06] Kokoro audio pipeline initialized

[14:32:08] Executor dispatched: weather_api

[14:32:10] High ambient noise detected on mic

[14:32:12] Context budget at 62% - 8,421/16,384 tokens

[14:32:15] ASR confidence: 0.89 - partial match
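The log entries above share one shape: a bracketed `HH:MM:SS` timestamp followed by a free-text message. A minimal sketch of splitting such lines into fields might look like the following; the names `LogEntry` and `parse_log_line` are hypothetical and not part of the Gwen HUD itself.

```python
import re
from datetime import time
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    """One HUD log line (hypothetical structure, for illustration only)."""
    timestamp: time
    message: str


# Matches "[HH:MM:SS] message", the shape of the log lines shown above.
_LOG_LINE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s+(.*)$")


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None when the line is not a log line."""
    match = _LOG_LINE.match(line.strip())
    if match is None:
        return None
    hours, minutes, seconds, message = match.groups()
    return LogEntry(time(int(hours), int(minutes), int(seconds)), message)


entry = parse_log_line("[14:32:05] GwenBrain plan generated in 212ms")
```

Returning `None` for non-matching lines lets a caller skip headings and transcript text mixed into the same stream.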

Spec Simulator v0.4

Gwen HUD

High-fidelity interface spec for the Gwen voice assistant - notch controls, memory allocation, and latency benchmarking, live in your browser.

01 Resource Allocation

Memory & Context Budget

Token allocation across static cache and dynamic session context

8,420 / 16,384 tokens

~8.2 GB / 16 GB RAM

Chart legend: Static Cache, Dynamic Context (axis ticks at 25%, 50%, 75%, 100%)
Static cache: 4,800 tokens (59% of 8K)
Dynamic context: 3,620 tokens (44% of 8K)
Available: ~7.8 GB RAM
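The budget figures above can be checked with a little arithmetic. The sketch below assumes that "8K" means 8,192 tokens, so that each pool gets half of the 16,384-token window. The page does not state this, and the class name `ContextBudget` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextBudget:
    """Token budget split into a static cache and a dynamic session context."""
    static_tokens: int
    dynamic_tokens: int
    total_tokens: int = 16_384
    pool_tokens: int = 8_192  # assumption: "8K" on the page means 8,192

    @property
    def used(self) -> int:
        """Tokens consumed across both pools."""
        return self.static_tokens + self.dynamic_tokens

    @property
    def remaining(self) -> int:
        """Tokens still free in the full window."""
        return self.total_tokens - self.used

    def pool_share(self, tokens: int) -> int:
        """Share of one 8K pool, rounded to a whole percent."""
        return round(100 * tokens / self.pool_tokens)


budget = ContextBudget(static_tokens=4_800, dynamic_tokens=3_620)
print(budget.used)                               # 8420
print(budget.pool_share(budget.static_tokens))   # 59
print(budget.pool_share(budget.dynamic_tokens))  # 44
```

Under that assumption the two pools add up to the 8,420 tokens shown in the panel, and the rounded shares match the 59% and 44% labels.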
02 Performance Tuning

Latency Sandbox

Adjust each stage to simulate and compare cold vs warm latency

ASR Conversion
1.8s
1.8s
3.2s
Cold: 1.8s
Target: 45ms
Mid
Fast Planner Lag
400ms
400ms
1.2s
Cold: 400ms
Target: 12ms
Mid
GwenBrain Smart Plan
3.5s
3.5s
6.5s
Cold: 3.5s
Target: 85ms
Mid
Kokoro First Audio
2.2s
2.2s
4.0s
Cold: 2.2s
Target: 35ms
Mid
Executor Actions
600ms
600ms
1.5s
Cold: 600ms
Target: 18ms
Mid

Cold Launch Estimate

15.0s

First-time initialization - all models loading from scratch

Warm Target

<195ms

Hot cache - models resident; sum of the five per-stage targets, each under 100ms
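The warm figure can be reproduced by adding the five per-stage targets listed in the sandbox. The sketch below assumes the stages run one after another, so that their latencies simply add. The names `STAGES_MS` and `total_ms` are hypothetical. The displayed 15.0s cold estimate is not the plain sum of the five cold values (8.5s), so it presumably includes costs not itemized here, and the sketch does not try to reproduce it.

```python
# Per-stage latencies in milliseconds, copied from the sandbox above:
# stage name -> (cold value, warm target).
STAGES_MS = {
    "ASR Conversion": (1800, 45),
    "Fast Planner Lag": (400, 12),
    "GwenBrain Smart Plan": (3500, 85),
    "Kokoro First Audio": (2200, 35),
    "Executor Actions": (600, 18),
}


def total_ms(column: int) -> int:
    """Sum one column (0 = cold, 1 = warm target), assuming serial stages."""
    return sum(values[column] for values in STAGES_MS.values())


cold_sum_ms = total_ms(0)
warm_sum_ms = total_ms(1)

# Ratio of cold value to warm target for each stage.
speedups = {name: cold / warm for name, (cold, warm) in STAGES_MS.items()}

print(cold_sum_ms)  # 8500
print(warm_sum_ms)  # 195
```

The warm sum of 195ms matches the displayed warm target, and every individual stage target stays below 100ms.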

Legend: Current value, Warm target
Scale: 0ms - 7.0s

Modeled for 16GB unified memory architecture