Ollama cURL Request Generator
Generate cURL commands for Ollama API endpoints. Configure model, prompt, temperature, and other parameters for generate, chat, and embeddings requests.
What is the Ollama API?
Ollama is an open-source tool for running large language models (LLMs) locally on your machine. It provides a REST API that accepts HTTP requests, allowing you to interact with models like Llama 3, Mistral, Gemma, and many others directly from your terminal or application code. The API follows a simple JSON-based request/response pattern and supports text generation, multi-turn chat conversations, and text embeddings.
cURL is a common way to test and interact with the Ollama API. However, writing the command by hand, with the right headers and a correctly formatted JSON body, is tedious and error-prone, especially when tuning model options like temperature and top-k sampling.
Tool description
This tool generates ready-to-use cURL commands for Ollama API endpoints. Select an endpoint, configure your model and parameters, and get a properly formatted cURL command instantly. The generated command includes the necessary headers, JSON body structure, and model options, so you can paste it straight into your terminal.
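As a rough illustration of what such a generator does, the sketch below assembles a cURL command from an endpoint name and a JSON body and prints it without running it. The function name `ollama_curl` and the `OLLAMA_BASE_URL` variable are hypothetical names chosen for this example, not part of Ollama or of this tool, and the body is assumed to contain no single quotes.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a cURL command for an Ollama endpoint.
#   $1 = endpoint name (generate, chat, or embeddings)
#   $2 = JSON request body (assumed to contain no single quotes)
# OLLAMA_BASE_URL is an illustrative variable, not an official Ollama setting.
ollama_curl() {
  endpoint=$1
  body=$2
  base_url=${OLLAMA_BASE_URL:-http://localhost:11434}
  printf 'curl -X POST "%s/api/%s" \\\n' "$base_url" "$endpoint"
  printf '  -H "Content-Type: application/json" \\\n'
  printf "  -d '%s'\n" "$body"
}

# Usage: prints a three-line command that can be pasted into a terminal.
ollama_curl generate '{"model": "llama3", "prompt": "Hello", "stream": false}'
```

Printing the command instead of executing it keeps the helper safe to run on a machine without an Ollama server.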
Examples
Basic text generation:
curl -X POST "http://localhost:11434/api/generate" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3",
"prompt": "Explain quantum computing in simple terms",
"stream": true
}'
Chat with system prompt and custom temperature:
curl -X POST "http://localhost:11434/api/chat" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3",
"messages": [
{ "role": "system", "content": "You are a helpful coding assistant." },
{ "role": "user", "content": "Write a Python function to reverse a string" }
],
"stream": false,
"options": {
"temperature": 0.3
}
}'
Generate embeddings:
curl -X POST "http://localhost:11434/api/embeddings" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3",
"prompt": "The quick brown fox jumps over the lazy dog",
"stream": false
}'
Features
- Supports all three main Ollama endpoints: /api/generate, /api/chat, and /api/embeddings
- Configurable model options: temperature, top-p, top-k, max tokens, repeat penalty, and seed
- System prompt support for generate and chat endpoints
- JSON response format option for structured output
- Download generated command as a .sh file
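A downloaded .sh file is simply the generated command saved as a script. The sketch below shows one way to produce an equivalent file by hand; the file name `ollama-request.sh` and its location are assumptions made for this example, since the name the tool uses for its download is not stated here.

```shell
#!/bin/sh
# Hypothetical file name and location, chosen for illustration only.
script="${TMPDIR:-/tmp}/ollama-request.sh"

# Write the command into the script. The quoted 'EOF' delimiter keeps the
# shell from expanding anything inside the here-document.
cat > "$script" <<'EOF'
#!/bin/sh
curl -X POST "http://localhost:11434/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "prompt": "Explain quantum computing in simple terms",
    "stream": false
  }'
EOF

# Make the script executable so it can be run directly later.
chmod +x "$script"
```

Running the saved script requires a local Ollama server with the named model already pulled.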
Options explained
| Option | Description | Default | Range |
|---|---|---|---|
| Temperature | Controls randomness of output. Lower values produce more focused text, higher values increase creativity. | 0.7 | 0–2 |
| Top P | Nucleus sampling threshold. The model considers tokens whose cumulative probability reaches this value. | 0.9 | 0–1 |
| Top K | Limits token selection to the K most likely candidates at each step. | 40 | 1–100 |
| Max tokens | Maximum number of tokens to generate in the response. Set to -1 for unlimited. | 128 | -1–4096 |
| Repeat penalty | Penalizes repeated tokens. Values above 1.0 discourage repetition. | 1.1 | 0–2 |
| Seed | Fixed seed for reproducible output. Leave empty for random results. | — | Any integer |
| Response format | Set to JSON to force the model to return valid JSON output. | None | None / JSON |
| Stream | When enabled, the response is streamed token by token. Disable to receive the full response at once. | On | On / Off |
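In a request body, the sampling options above go inside the `options` object under Ollama's own field names, while response format and streaming are top-level fields. The sketch below uses the defaults from the table plus an arbitrary seed of 42; that the tool emits exactly these fields is an assumption, and the prompt is only a placeholder.

```shell
#!/bin/sh
# Sketch: the table's options written as Ollama API fields.
#   Temperature -> temperature      Top P -> top_p
#   Top K -> top_k                  Max tokens -> num_predict
#   Repeat penalty -> repeat_penalty  Seed -> seed
# "format" and "stream" sit at the top level, outside "options".
payload=$(cat <<'EOF'
{
  "model": "llama3",
  "prompt": "List three prime numbers as JSON",
  "format": "json",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 128,
    "repeat_penalty": 1.1,
    "seed": 42
  }
}
EOF
)

# Print the body for inspection; nothing is sent.
printf '%s\n' "$payload"

# To send it (needs a running Ollama server with the model pulled):
# curl -X POST "http://localhost:11434/api/generate" \
#   -H "Content-Type: application/json" -d "$payload"
```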
Use cases
- Quickly prototyping and testing Ollama API calls from the terminal without manually writing JSON
- Generating cURL commands to share with teammates or include in documentation
- Experimenting with different model parameters to find optimal settings for your use case
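For parameter experiments, a small loop can produce one request body per setting so the results are easy to compare. The sketch below only prints the bodies; the function name `temperature_sweep`, the prompt, and the chosen temperature values are illustrative assumptions. Fixing the seed keeps temperature as the only setting that changes between requests.

```shell
#!/bin/sh
# Hypothetical helper: print one /api/generate request body per temperature.
# Nothing is sent; pipe each line into curl -d @- to actually run it.
temperature_sweep() {
  for temperature in 0.2 0.7 1.2; do
    printf '{"model": "llama3", "prompt": "Name a color", "stream": false, "options": {"temperature": %s, "seed": 42}}\n' "$temperature"
  done
}

temperature_sweep
```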