2025-11-24

Evaluator Improvements, Kimi K2 Thinking on Serverless, and New API Endpoints

Improved Evaluator Creation Experience

The evaluator creation workflow has been significantly enhanced with GitHub template integration. You can now:

Fork evaluator templates directly from GitHub repositories
Browse and preview templates before using them
Create evaluators with a streamlined save dialog
View evaluators in a new sortable and paginated table

MLOps & Observability Integrations

New documentation for integrating Fireworks with MLOps and observability tools:

Weights & Biases (W&B) integration for experiment tracking during fine-tuning
MLflow integration for model management and experiment logging

✨ New Models

Kimi K2 Thinking is now available in the Model Library
KAT Dev 32B is now available in the Model Library
KAT Dev 72B Exp is now available in the Model Library

☁️ Serverless

Kimi K2 Thinking is now available on serverless

📚 New REST API Endpoints

New REST API endpoints are now available for managing Reinforcement Fine-Tuning Steps and deployments:

Bug Fixes & Minor Improvements

Deployment Region Selector: Added GPU accelerator hints to the region selector, with Global set as default for optimal availability (Web App)
Preference Fine-Tuning (DPO): Added to the Fine-Tuning page for training models with human preference data (Web App)
Redeem Credits: Credit code redemption is now available to all users from the Billing page (Web App)
Model Library Search: Improved fuzzy search with hybrid matching for better model discovery (Web App)
Cogito Models: Added Cogito namespace to the Model Library for easier discovery (Web App)
Custom Model Editing: You can now edit display name and description inline on custom model detail pages (Web App)
Loss Curve Charts: Fixed an issue where loss curves were not updating in real-time during fine-tuning jobs (Web App)
Deployment Shapes: Fixed deployment shape selection for fine-tuned models (PEFT and live-merge) (Web App)
Usage Charts: Fixed replica calculation in multi-series usage charts (Web App)
Session Management: Removed auto-logout on inactivity for improved user experience (Web App)
Onboarding: Updated onboarding survey with improved profile and questionnaire flow (Web App)
Fine-Tuning Form: Max context length now defaults to and is capped by the selected base model’s context length (Web App)
Secrets for Evaluators: Added documentation for using secrets in evaluators to securely call external services (Docs)
Region Selection: Deprecated regions are now filtered from deployment options (Web App)
Playground: Embedding and reranker models are now filtered from playground model selection (Web App)
LoRA Rank: Updated valid LoRA rank range to 4-32 in documentation (Docs)
SFT Documentation: Added documentation for batch size, learning rate warmup, and gradient accumulation settings (Docs)
Direct Routing: Added OpenAI SDK code examples for direct routing (Docs)
Recommended Models: Updated model recommendations with migration guidance from Claude, GPT, and Gemini (Docs)

2025-11-12

☀️ Sunsetting Build SDK

The Build SDK is being deprecated in favor of a new Python SDK generated directly from our REST API. The new SDK is more up-to-date, flexible, and continuously synchronized with our REST API. Please note that the last version of the Build SDK will be 0.19.20, and the new SDK will start at 1.0.0. Python package managers will not automatically update to the new SDK, so you will need to manually update your dependencies and refactor your code.Existing codebases using the Build SDK will continue to function as before and will not be affected unless you choose to upgrade to the new SDK version.The new SDK replaces the Build SDK’s LLM and Dataset classes with REST API-aligned methods. If you upgrade to version 1.0.0 or later, you will need to migrate your code.

🚀 Improved RFT Experience

We’ve drastically improved the RFT experience with better reliability, developer-friendly SDK for hooking up your existing agents, support for multi-turn training, better observability in our Web App, and better overall developer experience.See Reinforcement Fine-Tuning for more details.

2025-08-22

Supervised Fine-Tuning

We now support supervised fine tuning with separate thinking traces for reasoning models (e.g. DeepSeek R1, GPT OSS, Qwen3 Thinking etc) that ensures training-inference consistency. An example including thinking traces would look like:

  {
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}, 
      {"role": "assistant", "content": "Paris.", "reasoning_content": "The user is asking about the capital city of France, it should be Paris."}
    ]
  }
  {
    "messages": [
      {"role": "user", "content": "What is 1+1?"},
      {"role": "assistant", "content": "2", "weight": 0, "reasoning_content": "The user is asking about the result of 1+1, the answer is 2."},
      {"role": "user", "content": "Now what is 2+2?"},
      {"role": "assistant", "content": "4", "reasoning_content": "The user is asking about the result of 2+2, the answer should be 4."}
    ]
  }

We are also properly supporting multi-turn fine tuning (with or without thinking traces) for GPT OSS model family that ensures training-inference consistency.

2025-08-10

Supervised Fine-Tuning

We now support Qwen3 MoE model (Qwen3 dense models are already supported) and GPT OSS models for supervised fine-tuning. GPT OSS model fine tunning support is single-turn without thinking traces at the moment.

2025-07-29

🎨 Vision-Language Model Fine-Tuning

You can now fine-tune Vision-Language Models (VLMs) on Fireworks AI using the Qwen 2.5 VL model family. This extends our Supervised Fine-tuning V2 platform to support multimodal training with both images and text data.Supported models:

Qwen 2.5 VL 3B Instruct
Qwen 2.5 VL 7B Instruct
Qwen 2.5 VL 32B Instruct
Qwen 2.5 VL 72B Instruct

Features:

Fine-tune on datasets containing both images and text in JSONL format with base64-encoded images
Support for up to 64K context length during training
Built on the same Supervised Fine-tuning V2 infrastructure as text models

See the VLM fine-tuning documentation for setup instructions and dataset formatting requirements.

🔧 Build SDK: Deployment Configuration Application Requirement

The Build SDK now requires you to call .apply() to apply any deployment configurations to Fireworks when using deployment_type="on-demand" or deployment_type="on-demand-lora". This change ensures explicit control over when deployments are created and helps prevent accidental deployment creation.Key changes:

.apply() is now required for on-demand and on-demand-lora deployments
Serverless deployments do not require .apply() calls
If you do not call .apply(), you are expected to set up the deployment through the deployment page at https://app.fireworks.ai/dashboard/deployments

Migration guide:

Add llm.apply() after creating LLM instances with deployment_type="on-demand" or deployment_type="on-demand-lora"
No changes needed for serverless deployments
See updated documentation for examples and best practices

This change improves deployment management and provides better control over resource creation.

This applies to Python SDK version >=0.19.14.

2025-07-23

🚀 Bring Your Own Rollout and Reward Development for Reinforcement Learning

You can now develop your own custom rollout and reward functionality while using Fireworks to manage the training and deployment of your models. This gives you full control over your reinforcement learning workflows while leveraging Fireworks’ infrastructure for model training and deployment.See the new LLM.reinforcement_step() method and ReinforcementStep class for usage examples and details.

2025-07-16

Supervised Fine-Tuning V2

We now support Llama 4 MoE model supervised fine-tuning (Llama 4 Scout, Llama 4 Maverick, Text only).

2025-07-10

🏗️ Build SDK `LLM` Deployment Logic Refactor

Based on early feedback from users and internal testing, we’ve refactored the LLM class deployment logic in the Build SDK to make it easier to understand.Key changes:

The id parameter is now required when deployment_type is "on-demand"
The base_id parameter is now required when deployment_type is "on-demand-lora"
The deployment_display_name parameter is now optional and defaults to the filename where the LLM was instantiated

A new deployment will be created if a deployment with the same id does not exist. Otherwise, the existing deployment will be reused.

2025-07-02

🚀 Support for Responses API in Python SDK

You can now use the Responses API in the Python SDK. This is useful if you want to use the Responses API in your own applications.See the Responses API guide for usage examples and details.

2025-07-01

Support for LinkedIn authentication

You can now log in to Fireworks using your LinkedIn account. This is useful if you already have a LinkedIn account and want to use it to log in to Fireworks.To log in with LinkedIn, go to the Fireworks login page and click the “Continue with LinkedIn” button.You can also log in with LinkedIn from the CLI using the firectl login command.How it works:

Fireworks uses your LinkedIn primary email address for account identification
You can switch between different Fireworks accounts by changing your LinkedIn primary email
See our LinkedIn authentication FAQ for detailed instructions on managing email addresses

2025-06-30

Support for GitHub authentication

You can now log in to Fireworks using your GitHub account. This is useful if you already have a GitHub account and want to use it to log in to Fireworks.To log in with GitHub, go to the Fireworks login page and click the “Continue with GitHub” button.You can also log in with GitHub from the CLI using the firectl login command.

🚨 Document Inlining Deprecation

Document Inlining has been deprecated and is no longer available on the Fireworks platform. This feature allowed LLMs to process images and PDFs through the chat completions API by appending #transform=inline to document URLs.Migration recommendations:

For image processing: Use Vision Language Models (VLMs) like Qwen2.5-VL 32B Instruct
For PDF processing: Use dedicated PDF processing libraries combined with text-based LLMs
For structured extraction: Leverage our structured responses capabilities

For assistance with migration, please contact our support team or visit our Discord community.

2025-06-24

🎯 Build SDK: Reward-kit integration for evaluator development

The Build SDK now natively integrates with reward-kit to simplify evaluator development for Reinforcement Fine-Tuning (RFT). You can now create custom evaluators in Python with automatic dependency management and seamless deployment to Fireworks infrastructure.Key features:

Native reward-kit integration for evaluator development
Automatic packaging of dependencies from pyproject.toml or requirements.txt
Local testing capabilities before deployment
Direct integration with Fireworks datasets and evaluation jobs
Support for third-party libraries and complex evaluation logic

See our Developing Evaluators guide to get started with your first evaluator in minutes.

Added new Responses API for advanced conversational workflows and integrations

Continue conversations across multiple turns using the previous_response_id parameter to maintain context without resending full history
Stream responses in real time as they are generated for responsive applications
Control response storage with the store parameter—choose whether responses are retrievable by ID or ephemeral

See the Response API guide for usage examples and details.

2025-06-13

Supervised Fine-Tuning V2

Supervised Fine-Tuning V2 released.Key features:

Supports Qwen 2/2.5/3 series, Phi 4, Gemma 3, the Llama 3 family, Deepseek V2, V3, R1
Longer context window up to full context length of the supported models
Multi-turn function calling fine-tuning
Quantization aware training

More details in the blogpost.

Reinforcement Fine-Tuning (RFT)

Reinforcement Fine-Tuning released. Train expert models to surpass closed source frontier models through verifiable reward. More details in blospost.

2025-05-20

Diarization and batch processing support added to audio inference

See our blog post for details.

2025-05-19

🚀 Easier & faster LoRA fine-tune deployments on Fireworks

You can now deploy a LoRA fine-tune with a single command and get speeds that approximately match the base model:

firectl create deployment "accounts/fireworks/models/<MODEL_ID of lora model>"

Previously, this involved two distinct steps, and the resulting deployment was slower than the base model:

Create a deployment using firectl create deployment "accounts/fireworks/models/<MODEL_ID of base model>" --enable-addons
Then deploy the addon to the deployment: firectl load-lora <MODEL_ID> --deployment <DEPLOYMENT_ID>

For more information, see our deployment documentation.

This change is for dedicated deployments with a single LoRA. You can still deploy multiple LoRAs on a deployment or deploy LoRA(s) on some Serverless models as described in the documentation.

​Evaluator Improvements, Kimi K2 Thinking on Serverless, and New API Endpoints

​Improved Evaluator Creation Experience

​MLOps & Observability Integrations

​✨ New Models

​☁️ Serverless

​📚 New REST API Endpoints

​☀️ Sunsetting Build SDK

​🚀 Improved RFT Experience

​Supervised Fine-Tuning

​Supervised Fine-Tuning

​🎨 Vision-Language Model Fine-Tuning

​🔧 Build SDK: Deployment Configuration Application Requirement

​🚀 Bring Your Own Rollout and Reward Development for Reinforcement Learning

​Supervised Fine-Tuning V2

​🏗️ Build SDK LLM Deployment Logic Refactor

​🚀 Support for Responses API in Python SDK

​Support for LinkedIn authentication

​Support for GitHub authentication

​🚨 Document Inlining Deprecation

​🎯 Build SDK: Reward-kit integration for evaluator development

​Added new Responses API for advanced conversational workflows and integrations

​Supervised Fine-Tuning V2

​Reinforcement Fine-Tuning (RFT)

​Diarization and batch processing support added to audio inference

​🚀 Easier & faster LoRA fine-tune deployments on Fireworks

Evaluator Improvements, Kimi K2 Thinking on Serverless, and New API Endpoints

Improved Evaluator Creation Experience

MLOps & Observability Integrations

✨ New Models

☁️ Serverless

📚 New REST API Endpoints

☀️ Sunsetting Build SDK

🚀 Improved RFT Experience

Supervised Fine-Tuning

Supervised Fine-Tuning

🎨 Vision-Language Model Fine-Tuning

🔧 Build SDK: Deployment Configuration Application Requirement

🚀 Bring Your Own Rollout and Reward Development for Reinforcement Learning

Supervised Fine-Tuning V2

🏗️ Build SDK `LLM` Deployment Logic Refactor

🚀 Support for Responses API in Python SDK

Support for LinkedIn authentication

Support for GitHub authentication

🚨 Document Inlining Deprecation

🎯 Build SDK: Reward-kit integration for evaluator development

Added new Responses API for advanced conversational workflows and integrations

Supervised Fine-Tuning V2

Reinforcement Fine-Tuning (RFT)

Diarization and batch processing support added to audio inference

🚀 Easier & faster LoRA fine-tune deployments on Fireworks