## Two ways to fine-tune
Fireworks offers two fundamentally different approaches to fine-tuning. Choose the one that fits your needs:

### Managed Fine-Tuning

Give Fireworks your data and configuration. The platform handles scheduling, training, checkpointing, and model output. No custom code is required. Best for teams that want fast results with standard objectives (SFT, DPO, RFT).
### Training SDK (Tinker compatible)

Write custom Python training loops. You control the loss function, optimizer step, checkpointing, and weight sync, while Fireworks handles the distributed GPU infrastructure. Best for research teams that need custom objectives, full-parameter tuning, or inference-in-the-loop evaluation.
| | Managed Fine-Tuning | Training SDK |
|---|---|---|
| Control | Configuration-driven | Full Python loop control |
| Objectives | Built-in SFT, DPO, RFT | Any custom loss function |
| Tuning method | LoRA | Full-parameter or LoRA |
| Inference during training | Not available | Hotload + sample mid-training |
| Interface | UI, firectl, REST API | Python SDK |
| Best for | Production fine-tuning with standard methods | Research, custom RL, hybrid losses |
## When to use SFT vs. RFT
In supervised fine-tuning, you provide a dataset of labeled examples of “good” outputs. In reinforcement fine-tuning, you provide a grader function that scores the model’s outputs, and the model is iteratively trained to produce outputs that maximize that score.

Supervised fine-tuning (SFT) works well for many common scenarios, especially when:

- You have a sizable dataset (~1,000+ examples) with high-quality, ground-truth labels.
- The dataset covers most possible input scenarios.
- Tasks are relatively straightforward, such as:
- Classification
- Content extraction

Reinforcement fine-tuning (RFT) is the better choice when:

- Your dataset is small.
- You lack ground-truth outputs (a.k.a. “golden generations”).
- The task requires multi-step reasoning.
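In RFT, the grader is just a function that maps a model output (plus the prompt's known answer) to a score. As a minimal sketch of what such a grader might look like, here is a plain-Python example; the function name, signature, and scoring bands are illustrative, not a Fireworks API:

```python
def grade(model_output: str, expected_answer: str) -> float:
    """Score a model output in [0, 1]: full credit for an exact match on
    the final line, partial credit if the answer appears anywhere."""
    stripped = model_output.strip()
    final_line = stripped.splitlines()[-1] if stripped else ""
    if final_line.strip() == expected_answer.strip():
        return 1.0   # exact match on the final line
    if expected_answer.strip() in model_output:
        return 0.5   # answer present but not cleanly formatted
    return 0.0       # no credit

print(grade("Let me think...\n42", "42"))  # → 1.0
```

Because the grader only needs the output and a reference, it can encode anything checkable in code: exact answers, unit tests, format constraints, or partial-credit rubrics.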
## When to use the Training SDK instead
Move from managed fine-tuning to the Training SDK when you need:

- Custom loss functions — hybrid GRPO + DPO, custom reward shaping, or any non-standard objective
- Full-parameter tuning — update all model weights instead of a LoRA adapter
- Inference-in-the-loop evaluation — hotload checkpoints onto a serving deployment and sample mid-training
- Per-step control — custom gradient accumulation, dynamic learning rate schedules, or algorithm research
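To make "custom loss functions" concrete, the core of a GRPO-style objective is the group-normalized advantage: each rollout's reward is centered and scaled by the statistics of its sampling group. A self-contained sketch in plain Python (the numbers are illustrative; in a real SDK loop these advantages would feed your custom loss rather than be printed):

```python
import math

def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: normalize each rollout's reward by the
    mean and standard deviation of its sampling group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    if std == 0.0:
        return [0.0] * len(rewards)  # zero-variance group: no learning signal
    return [(r - mean) / std for r in rewards]

# Four rollouts of one prompt: two correct (reward 1.0), two wrong (0.0).
print(group_advantages([1.0, 0.0, 1.0, 0.0]))  # → [1.0, -1.0, 1.0, -1.0]
```

A hybrid objective would combine a term like this with, say, a DPO-style preference loss in the same backward pass, which is exactly the kind of composition the managed flow cannot express.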
## Detailed capability comparison
| Capability | Managed RFT | Training SDK |
|---|---|---|
| Launch training | CLI or UI | Python script |
| Loss functions | `grpo`, `dapo`, `gspo-token` (built-in) | Any custom loss via `forward_backward_custom` |
| Training loop | Fully managed | You write the loop |
| Per-step diagnostics | Dashboard (reward, loss, rollouts) | Full Python access to all metrics |
| Zero-variance filtering | Automatic | You implement |
| Checkpoint management | Automatic | You control via `save_weights_for_sampler_ext` |
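The zero-variance filtering row deserves a note: under group-normalized advantages, a prompt whose rollouts all received the same reward contributes no gradient signal, so SDK users typically drop those groups before the backward pass. A minimal sketch of that filter (the data shapes here are illustrative):

```python
def filter_zero_variance(groups: dict[str, list[float]]) -> dict[str, list[float]]:
    """Keep only prompt groups whose rollout rewards differ, i.e. groups
    that yield a nonzero advantage under group normalization."""
    return {
        prompt: rewards
        for prompt, rewards in groups.items()
        if max(rewards) != min(rewards)  # identical rewards => zero variance
    }

rollouts = {
    "prompt_a": [1.0, 0.0, 1.0],  # mixed rewards: keep
    "prompt_b": [1.0, 1.0, 1.0],  # all correct: drop
    "prompt_c": [0.0, 0.0, 0.0],  # all wrong: drop
}
print(filter_zero_variance(rollouts))  # only prompt_a survives
```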
## Migrating from managed flow to SDK
If you’ve been using managed RFT and want more control — custom loss functions, richer diagnostics, or algorithm experimentation — the Training SDK lets you implement your own training loop while keeping the same GPU infrastructure.

A streamlined migration path from managed RFT jobs to Training SDK scripts is on the roadmap. In the meantime, you can replicate a managed RFT configuration in the SDK by matching the loss method, learning rate, and rollout parameters in your custom loop.
## MoE models and Routing Replay
For Mixture-of-Experts (MoE) models like Kimi K2 (384 experts), training stability benefits from Routing Replay — caching the expert routing assignments from the reference policy’s forward pass and replaying them during the training forward pass. This ensures that the same experts process the same tokens in both the reference and policy models, reducing gradient noise from routing changes. Routing Replay is available in the Training SDK via the `loss_fn_inputs` mechanism — you can pass routing matrices from the reference forward pass into the training datum.
Routing Replay support in the managed RFT flow has not been confirmed. If you are training MoE models and need routing stability guarantees, use the Training SDK where you have full control over the forward pass inputs.
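The idea behind Routing Replay can be sketched independently of any framework: record the top-k expert indices the router chose on the reference forward pass, then reuse that assignment on the training pass instead of re-running the router. A toy illustration in plain Python (the router here is a stand-in for an MoE gating layer, not the SDK's `loss_fn_inputs` interface):

```python
def route(token_scores: list[list[float]], top_k: int = 2) -> list[list[int]]:
    """Pick the top_k expert indices per token from router scores."""
    return [
        sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:top_k]
        for scores in token_scores
    ]

# Reference policy's router scores for 2 tokens over 4 experts.
ref_scores = [[0.1, 0.7, 0.2, 0.0],
              [0.4, 0.1, 0.1, 0.4]]
cached_routing = route(ref_scores)  # cache on the reference pass

# Training pass: the policy's router scores have drifted, but we REPLAY
# the cached assignment so the same experts process the same tokens.
policy_scores = [[0.3, 0.2, 0.5, 0.0],
                 [0.1, 0.6, 0.2, 0.1]]
replayed = cached_routing           # replay instead of route(policy_scores)
print(replayed)
```

In a real MoE layer the replayed indices would override the gating decision inside the training forward pass; everything else (expert weights, combine step) runs as usual.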