Step 1: Create and export an API key
Before you begin, create an API key in the Fireworks dashboard. Click Create API key and store it in a safe location. Once you have your API key, export it as an environment variable in your terminal:

- macOS / Linux

export FIREWORKS_API_KEY="your_api_key_here"

- Windows

setx FIREWORKS_API_KEY "your_api_key_here"
Step 2: Make your first Serverless API call
- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
Install the Fireworks Python SDK:

The SDK is currently in alpha. Use the --pre flag when installing to get the latest version.

pip install --pre fireworks-ai

Then make your first Serverless API call:

from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{
        "role": "user",
        "content": "Say hello in Spanish",
    }],
)

print(response.choices[0].message.content)
Fireworks provides an OpenAI-compatible endpoint. Install the OpenAI Python SDK:

pip install openai

Then make your first Serverless API call:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{
        "role": "user",
        "content": "Say hello in Spanish",
    }],
)

print(response.choices[0].message.content)
Fireworks provides an Anthropic-compatible endpoint. Install the Anthropic Python SDK:

pip install anthropic

Then make your first Serverless API call:

import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Say hello in Spanish",
    }],
)

print(response.content[0].text)
Fireworks provides an OpenAI-compatible endpoint. Install the OpenAI JavaScript / TypeScript SDK:

npm install openai

Then make your first Serverless API call:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3p1",
  messages: [
    {
      role: "user",
      content: "Say hello in Spanish",
    },
  ],
});

console.log(response.choices[0].message.content);
Fireworks provides an Anthropic-compatible endpoint. Install the Anthropic JavaScript / TypeScript SDK:

npm install @anthropic-ai/sdk

Then make your first Serverless API call:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const response = await client.messages.create({
  model: "accounts/fireworks/models/deepseek-v3p1",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: "Say hello in Spanish",
    },
  ],
});

console.log(response.content[0].text);
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/deepseek-v3p1",
    "messages": [
      {
        "role": "user",
        "content": "Say hello in Spanish"
      }
    ]
  }'
The model responds with a greeting such as:

"¡Hola!"
Common use cases
Streaming responses
Stream responses token by token for a better user experience:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
from fireworks import Fireworks

client = Fireworks()

stream = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{"role": "user", "content": "Tell me a short story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

stream = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{"role": "user", "content": "Tell me a short story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

with client.messages.stream(
    model="accounts/fireworks/models/deepseek-v3p1",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a short story"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const stream = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3p1",
  messages: [{ role: "user", content: "Tell me a short story" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const stream = client.messages.stream({
  model: "accounts/fireworks/models/deepseek-v3p1",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Tell me a short story" }],
});

stream.on("text", (text) => {
  process.stdout.write(text);
});

await stream.finalMessage();
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/deepseek-v3p1",
    "messages": [
      {
        "role": "user",
        "content": "Tell me a short story"
      }
    ],
    "stream": true
  }'
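When streaming, you often want the complete text once the stream finishes as well as the incremental display. A minimal sketch of a helper for this (`collect_stream` is our own name, not part of any SDK), written against the OpenAI-style chunk shape used in the examples above:

```python
def collect_stream(chunks):
    """Print each streamed token as it arrives and return the full text.

    Works with any iterable of OpenAI-style chunks, i.e. objects where
    chunk.choices[0].delta.content is the incremental text (or None).
    """
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    return "".join(parts)
```

Pass it the `stream` object from the Python examples above; the Anthropic SDK's `stream.text_stream` yields plain strings and can be joined the same way.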
Function calling
Connect your models to external tools and APIs:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2-instruct-0905",
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, e.g. San Francisco",
                        }
                    },
                    "required": ["location"],
                },
            },
        },
    ],
)

print(response.choices[0].message.tool_calls)
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2-instruct-0905",
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, e.g. San Francisco",
                        }
                    },
                    "required": ["location"],
                },
            },
        },
    ],
)

print(response.choices[0].message.tool_calls)
import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/kimi-k2-instruct-0905",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. San Francisco",
                    }
                },
                "required": ["location"],
            },
        },
    ],
)

for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City name, e.g. San Francisco",
          },
        },
        required: ["location"],
      },
    },
  },
];

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/kimi-k2-instruct-0905",
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: tools,
});

console.log(response.choices[0].message.tool_calls);
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const response = await client.messages.create({
  model: "accounts/fireworks/models/kimi-k2-instruct-0905",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: [
    {
      name: "get_weather",
      description: "Get the current weather for a location",
      input_schema: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City name, e.g. San Francisco",
          },
        },
        required: ["location"],
      },
    },
  ],
});

for (const block of response.content) {
  if (block.type === "tool_use") {
    console.log(`Tool: ${block.name}, Input:`, block.input);
  }
}
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/kimi-k2-instruct-0905",
    "messages": [
      {
        "role": "user",
        "content": "What'\''s the weather in Paris?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City name, e.g. San Francisco"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'
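The examples above stop at printing tool_calls, but in a real loop you execute the tool yourself and send its result back so the model can finish answering. A hedged sketch of that second half using plain dicts in the OpenAI-compatible chat format (the `get_weather` stub and the `build_tool_result_messages` helper are our own names):

```python
import json

def get_weather(location):
    # Illustrative stub: swap in a real weather lookup here.
    return {"location": location, "temperature_c": 18, "condition": "cloudy"}

def build_tool_result_messages(assistant_message):
    """Run each requested tool and build the follow-up messages to append."""
    messages = [assistant_message]
    for call in assistant_message["tool_calls"]:
        args = json.loads(call["function"]["arguments"])
        result = get_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return messages
```

Append these messages to the original conversation and call chat.completions.create again to get the model's final answer. (SDK responses are objects rather than dicts, so access attributes or convert as needed.)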
Structured outputs (JSON mode)
Get reliable JSON responses that match your schema:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from: John is 30 years old",
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "number"},
                },
                "required": ["name", "age"],
            },
        },
    },
)

print(response.choices[0].message.content)
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from: John is 30 years old",
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "age": {"type": "number"}},
                "required": ["name", "age"],
            },
        },
    },
)

print(response.choices[0].message.content)
import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    max_tokens=1024,
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "number"},
                },
                "required": ["name", "age"],
            },
        }
    },
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from: John is 30 years old",
        }
    ],
)

print(response.content[0].text)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3p1",
  messages: [
    {
      role: "user",
      content: "Extract the name and age from: John is 30 years old",
    },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "person",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "number" },
        },
        required: ["name", "age"],
      },
    },
  },
});

console.log(response.choices[0].message.content);
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const response = await client.messages.create({
  model: "accounts/fireworks/models/deepseek-v3p1",
  max_tokens: 1024,
  output_config: {
    format: {
      type: "json_schema",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "number" },
        },
        required: ["name", "age"],
      },
    },
  },
  messages: [
    {
      role: "user",
      content: "Extract the name and age from: John is 30 years old",
    },
  ],
});

console.log(response.content[0].text);
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/deepseek-v3p1",
    "messages": [
      {
        "role": "user",
        "content": "Extract the name and age from: John is 30 years old"
      }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "person",
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "age": {"type": "number"}
          },
          "required": ["name", "age"]
        }
      }
    }
  }'
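The structured output arrives as a JSON string in the message content, so parse it and check the fields before relying on it downstream. A minimal sketch (`parse_person` is our own helper, matching the person schema above):

```python
import json

def parse_person(content):
    """Parse the model's JSON string and verify it matches the person schema."""
    person = json.loads(content)
    if not isinstance(person.get("name"), str):
        raise ValueError(f"missing or invalid 'name': {person!r}")
    if not isinstance(person.get("age"), (int, float)):
        raise ValueError(f"missing or invalid 'age': {person!r}")
    return person
```

For larger schemas, a validation library such as pydantic or jsonschema does this more thoroughly.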
Reasoning
Some models support reasoning, where the model shows its thought process before giving the final answer:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p2",
    messages=[
        {"role": "user", "content": "What is 25 * 37? Show your work."}
    ],
    reasoning_effort="medium",
)

msg = response.choices[0].message
if msg.reasoning_content:
    print("Reasoning:", msg.reasoning_content)
print("Answer:", msg.content)
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p2",
    messages=[
        {"role": "user", "content": "What is 25 * 37? Show your work."}
    ],
    extra_body={"reasoning_effort": "medium"},
)

msg = response.choices[0].message

# Reasoning content is returned in a separate field
reasoning = getattr(msg, "reasoning_content", None)
if reasoning is None and hasattr(msg, "model_extra"):
    reasoning = msg.model_extra.get("reasoning_content")

if reasoning:
    print("Reasoning:", reasoning)
print("Answer:", msg.content)
The Anthropic SDK uses the thinking parameter to enable reasoning:

import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/deepseek-v3p2",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 4096},
    messages=[
        {"role": "user", "content": "What is 25 * 37? Show your work."}
    ],
)

for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3p2",
  messages: [
    { role: "user", content: "What is 25 * 37? Show your work." },
  ],
  reasoning_effort: "medium",
});

const msg = response.choices[0].message;
if (msg.reasoning_content) {
  console.log("Reasoning:", msg.reasoning_content);
}
console.log("Answer:", msg.content);
The Anthropic SDK uses the thinking parameter to enable reasoning:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const response = await client.messages.create({
  model: "accounts/fireworks/models/deepseek-v3p2",
  max_tokens: 16000,
  thinking: { type: "enabled", budget_tokens: 4096 },
  messages: [
    { role: "user", content: "What is 25 * 37? Show your work." },
  ],
});

for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("Thinking:", block.thinking);
  } else if (block.type === "text") {
    console.log("Answer:", block.text);
  }
}
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/deepseek-v3p2",
    "messages": [
      {
        "role": "user",
        "content": "What is 25 * 37? Show your work."
      }
    ],
    "reasoning_effort": "medium"
  }'
Vision models
Analyze images with vision-language models:

- Python (Fireworks SDK)
- Python (OpenAI SDK)
- Python (Anthropic SDK)
- JavaScript (OpenAI SDK)
- JavaScript (Anthropic SDK)
- curl
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen2p5-vl-32b-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen2p5-vl-32b-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
The Anthropic SDK uses its native image format with type: "image" and a source object:

import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/qwen2p5-vl-32b-instruct",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png",
                    },
                },
            ],
        }
    ],
)

for block in response.content:
    if block.type == "text":
        print(block.text)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/qwen2p5-vl-32b-instruct",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: {
            url: "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png",
          },
        },
      ],
    },
  ],
});

console.log(response.choices[0].message.content);
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference",
});

const response = await client.messages.create({
  model: "accounts/fireworks/models/qwen2p5-vl-32b-instruct",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image",
          source: {
            type: "url",
            url: "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png",
          },
        },
      ],
    },
  ],
});

for (const block of response.content) {
  if (block.type === "text") {
    console.log(block.text);
  }
}
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/qwen2p5-vl-32b-instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What'\''s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png"
            }
          }
        ]
      }
    ]
  }'
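The examples above pass a public image URL. To send a local file instead, encode it as a base64 data URL. A hedged sketch (the helper name and default media type are our own choices):

```python
import base64

def image_to_data_url(path, media_type="image/png"):
    """Encode a local image file as a data URL for the image_url field."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{media_type};base64,{encoded}"
```

Use the result as the image_url.url value in the OpenAI-compatible format. The Anthropic format takes base64 differently: a source object of the form {"type": "base64", "media_type": ..., "data": encoded}.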
Serverless model lifecycle
Serverless models are managed by the Fireworks team and may be updated or deprecated as new models are released. We provide at least 2 weeks advance notice before removing any model, with longer notice periods for popular models based on usage. For production workloads requiring long-term model stability, we recommend using on-demand deployments, which give you full control over model versions and updates.

Make sure to add a payment method to access higher rate limits up to 6,000 RPM. Without a payment method, you’re limited to 10 RPM.
The 6,000 RPM figure is the maximum ceiling enforced by our spike arrest policy. Your actual limit scales dynamically with sustained usage, so short-lived spikes may be throttled below that cap. For predictable throughput needs, consider on-demand deployments or requesting a rate review.
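Because the limit scales dynamically, production clients should treat HTTP 429 responses as retryable. A hedged sketch of exponential backoff with full jitter (`with_backoff` and the status_code attribute convention are our own; the exception types raised by each SDK vary):

```python
import random
import time

def with_backoff(send, max_retries=5, base_delay=1.0):
    """Call send(); on a 429 rate-limit error, wait with full jitter and retry."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except Exception as exc:
            if getattr(exc, "status_code", None) != 429 or attempt == max_retries:
                raise
            # Full jitter: sleep between 0 and base_delay * 2**attempt seconds.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
```

Wrap any request in it, e.g. with_backoff(lambda: client.chat.completions.create(...)).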
Next steps
Ready to scale to production, explore other modalities, or customize your models?

Deploy and autoscale on Dedicated GPUs
Deploy on dedicated GPUs with high performance, fast autoscaling, and minimal cold starts
Fine-tune Models
Improve model quality with supervised and reinforcement learning
Speech to Text
Real-time or batch audio transcription
Embeddings & Reranking
Use embeddings & reranking in search & context retrieval
Batch Inference
Run async inference jobs at scale, faster and cheaper
Browse 100+ Models
Explore all available models across modalities
API Reference
Complete API documentation