Chrome AI Prompt Chat Playground
Test the experimental Chrome built-in Prompt API (LanguageModel) directly in your browser. Send prompts to Gemini Nano on-device, configure a system prompt and sampling parameters, watch responses stream in real time, and monitor model download and context window usage — no server required.
Readme
What is the Prompt API?
The Prompt API is an experimental web platform proposal from the W3C Web Machine Learning Community Group that exposes a browser-provided large language model to JavaScript through a `window.LanguageModel` interface. Pages create a session with `LanguageModel.create()`, optionally configure it with a system prompt, and then call `prompt()` or `promptStreaming()` to get a response.
Unlike calling a hosted LLM API, the model runs on the user's device. That means inputs never leave the machine, the page works offline once the model is cached, and there are no per-request costs. In Chrome the underlying model is Gemini Nano, but the API is intentionally model-agnostic so other browsers can plug in different implementations.
A session is stateful: it tracks the conversation history within a context window measured in tokens. When the window fills up, the oldest non-system messages are evicted automatically (a `contextoverflow` event fires), and you can inspect `session.contextUsage` and `session.contextWindow` at any time to see how much space is left.
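The lifecycle described above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the names (`LanguageModel.create`, `prompt`, `contextUsage`, `contextWindow`) follow this page's description of the experimental API and may differ between Chrome releases.

```javascript
// Minimal sketch of the Prompt API session lifecycle described above.
// API names follow this page's description of the experimental surface.
async function askOnDevice(question) {
  if (typeof LanguageModel === "undefined") {
    throw new Error("Prompt API not available in this browser");
  }
  // Create a stateful session, optionally seeded with a system prompt.
  const session = await LanguageModel.create({
    initialPrompts: [{ role: "system", content: "Answer concisely." }],
  });
  const answer = await session.prompt(question);
  // The session tracks how much of its token context window is used.
  console.log(`context: ${session.contextUsage} / ${session.contextWindow}`);
  session.destroy();
  return answer;
}
```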
Tool description
This playground is an interactive chat surface wired to `window.LanguageModel`. It lets you set a system prompt, send messages, and watch the model stream tokens back in real time. A progress bar shows model availability and download progress, and a token-usage bar reports how much of the session's context window is being consumed.
The session is created lazily on first send and reused across messages until you change the system prompt, at which point it is destroyed and a fresh one is created with the new instructions.
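The lazy create-and-invalidate pattern described above looks roughly like this. It is a sketch under the assumption that `LanguageModel.create`, `initialPrompts`, and `destroy()` behave as described on this page; the hypothetical `getSession` helper is illustrative, not part of the API.

```javascript
// Sketch of the lazy-session pattern: a session is created on first use
// and reused until the system prompt changes, at which point the old one
// is destroyed (dropping its history) and a fresh session is created.
let session = null;
let activeSystemPrompt = null;

async function getSession(systemPrompt) {
  // A changed system prompt invalidates the cached session.
  if (session && systemPrompt !== activeSystemPrompt) {
    session.destroy();
    session = null;
  }
  if (!session) {
    activeSystemPrompt = systemPrompt;
    session = await LanguageModel.create(
      systemPrompt
        ? { initialPrompts: [{ role: "system", content: systemPrompt }] }
        : {}
    );
  }
  return session;
}
```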
Features
- Streaming responses — uses `promptStreaming()` so tokens appear in the chat as they are produced.
- System prompt editor — define a persistent role or behavior that conditions every response.
- Stop button — abort an in-flight prompt with an `AbortController` without tearing down the session.
- Live context usage — visualizes `contextUsage` against `contextWindow` so you can see when the conversation is about to overflow.
- Availability and download progress — surfaces `availability()` state and `downloadprogress` events while the model is being fetched.
Use cases
- Trying prompts locally — iterate on system prompts and few-shot patterns without paying for a cloud API.
- Testing on-device AI feasibility — verify that the Prompt API is available, see how big the context window is, and benchmark response speed on your hardware.
- Privacy-sensitive drafting — brainstorm or rephrase text that should not be sent to a third-party server.
Requirements
- A browser that implements the Prompt API. Chrome 138+ exposes it experimentally; in older versions you may need to enable it via
chrome://flags/#prompt-api-for-gemini-nanoand have the on-device model downloaded. - A secure context (HTTPS or
localhost). - A device that meets the model's hardware requirements (sufficient disk space, RAM, and a supported GPU/CPU). On unsupported devices
availability()returnsunavailable. - Initial bandwidth to download the model. Subsequent sessions reuse the cached model.
How it works
- On mount, the tool checks `typeof window.LanguageModel`. If absent, a warning replaces the chat input.
- `LanguageModel.availability()` reports one of `available`, `downloadable`, `downloading`, or `unavailable`. The result is shown in the progress bar.
- The first time you press Send, the tool calls `LanguageModel.create()` with a `monitor` that streams `downloadprogress` events to the UI. If a system prompt is set, it is passed via `initialPrompts: [{ role: "system", content: ... }]`.
- The user message is sent through `session.promptStreaming(text, { signal })`. The returned `ReadableStream<string>` is consumed chunk by chunk and appended to the assistant message.
- After each response, `session.contextUsage` and `session.contextWindow` are read and reflected in the token-usage bar.
- Pressing Stop calls `controller.abort()`, which rejects the in-flight stream with an `AbortError` while leaving the session alive for the next prompt.
- Editing the system prompt invalidates the cached session: the existing one is `destroy()`-ed and the next send creates a new session with the updated instructions.
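The send/stop steps above can be condensed into a short sketch. The `promptStreaming` signature, the abort behavior, and the `downloadprogress` event shape follow this page's description of the experimental API and are assumptions, not a guaranteed surface.

```javascript
// Condensed sketch of the send/stop path described in the steps above.
let controller = null;

async function send(session, text, onChunk) {
  controller = new AbortController();
  try {
    // promptStreaming() yields string chunks; for-await consumes them
    // one by one so the UI can append tokens as they arrive.
    const stream = session.promptStreaming(text, { signal: controller.signal });
    for await (const chunk of stream) onChunk(chunk);
  } catch (err) {
    // An abort is the expected result of pressing Stop; rethrow anything else.
    if (err.name !== "AbortError") throw err;
  }
}

function stop() {
  // Aborts the in-flight stream but leaves the session usable.
  if (controller) controller.abort();
}
```

In the real tool the session would come from `LanguageModel.create()` with a `monitor` forwarding `downloadprogress` events to the progress bar; the sketch takes the session as a parameter to stay focused on the streaming loop.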
Options explained
- System prompt — a free-form instruction passed as the first `system`-role message. It conditions all subsequent turns. Leaving it blank creates a session without a system message.
- Send / Stop — Send submits the input as a `user` message. Stop aborts the streaming response without deleting prior messages.
- Token usage — `used / total` tokens for the current session. When `used` approaches `total`, older user/assistant pairs will be evicted on the next prompt.
- Model status — combined readout of `availability()` and `downloadprogress`. While the model downloads, the bar animates and shows a percentage.
Limitations
- The Prompt API is experimental. Method names, options (e.g. `inputUsage` vs `contextUsage`), and event semantics may change between Chrome releases.
- Output quality, factuality, and instruction-following depend entirely on the browser-provided model and are not guaranteed.
- The context window is small compared to hosted LLMs. Long conversations will overflow and silently drop the earliest turns.
- The API is not exposed to web workers and may require Permissions Policy (`language-model`) delegation on cross-origin iframes.
- This tool intentionally does not expose `temperature`, `topK`, tool use, multimodal inputs, structured output (`responseConstraint`), or session cloning. They are part of the spec but kept out of the playground to stay focused on plain chat.
FAQ
Why does it say the API is unsupported?
`window.LanguageModel` is undefined in your browser. Try the latest Chrome on desktop and, if needed, enable the Prompt API flag and wait for the on-device model to download via `chrome://components`.
Why is the first response slow?
The first call may trigger a model download (watch the progress bar) and a session-creation step. Later prompts reuse the same session and start streaming almost immediately.
Does my prompt leave the device?
No. The model runs locally. Your text is not sent to any server by this tool.
What happens when the context fills up?
The session fires a `contextoverflow` event and evicts the oldest non-system turns to make room. The system prompt is preserved.
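Overflow can be observed with an ordinary event listener, sketched below. The event name and the `contextUsage`/`contextWindow` fields follow this page's description of the experimental API; the `watchOverflow` helper is hypothetical.

```javascript
// Sketch: react when the session evicts old turns to free context space.
// Event and property names follow this page's description and may change.
function watchOverflow(session, onOverflow) {
  session.addEventListener("contextoverflow", () => {
    // Fired after the oldest non-system turns were evicted to make room.
    onOverflow(session.contextUsage, session.contextWindow);
  });
}
```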
Why did changing the system prompt clear the response style mid-conversation?
Editing the system prompt destroys the current session and creates a new one on the next send. The new session has no memory of previous turns.