
InfraConfig

GPU, region, and training shape settings. Wraps TrainerJobConfig fields:

```python
from training.utils import InfraConfig

infra = InfraConfig(
    training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200",
    ref_training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200-forward",
)
```
Use training_shape_id for every launched trainer in the cookbook. In normal usage, this is the only shape-specific value you set. In most cases, pass the full shared path accounts/fireworks/trainingShapes/<shape>. The fireworks account is the public shared shape catalog. Add ref_training_shape_id when the recipe also launches a reference trainer.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| training_shape_id | str \| None | None | Required full training-shape ID for the policy trainer, typically accounts/fireworks/trainingShapes/&lt;shape&gt;. The cookbook resolves the versioned reference for you and auto-populates shape-owned infra. |
| ref_training_shape_id | str \| None | None | Optional full training-shape ID for the reference trainer, also typically under accounts/fireworks/trainingShapes/&lt;shape&gt;. When unset, rl_loop skips reference-model provisioning. |
| region | str \| None | None | Region override |
| trainer_timeout_s | float | 3600 | Timeout for trainer provisioning / readiness waits |
| extra_args | list[str] \| None | None | Extra trainer arguments |
accelerator_type, accelerator_count, node_count, and custom_image_tag are internal development fields automatically configured by the training shape. They are not user-configurable.
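The optional fields compose with the shape in the same constructor. A sketch of overriding the region and timeout alongside the required shape; the region string and the extra trainer flag below are illustrative stand-ins, not documented defaults:

```python
from training.utils import InfraConfig

# Only training_shape_id is required in normal usage; the rest are
# illustrative overrides.
infra = InfraConfig(
    training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200",
    region="us-east1",                  # example region override
    trainer_timeout_s=7200,             # extend the provisioning/readiness wait
    extra_args=["--log-level=debug"],   # hypothetical extra trainer argument
)
```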

DeployConfig

Deployment settings for sampling and weight sync. Wraps DeploymentConfig fields:

```python
from training.utils import DeployConfig

deploy_cfg = DeployConfig(
    deployment_id="grpo-serving",
    tokenizer_model="Qwen/Qwen3-8B",
)
```
When deployment_shape is set (the recommended path), the shape owns deployment hardware and serving configuration.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| deployment_id | str \| None | None | Deployment identifier. If unset, the cookbook auto-derives one from the base model name. |
| tokenizer_model | str \| None | None | HuggingFace model name for client-side tokenization. Required for RL sampling. |
| deployment_shape | str \| None | None | Deployment shape resource name. When set, the shape owns GPU type and serving config. |
| deployment_region | str \| None | None | Region override for the deployment |
| hot_load_bucket_type | str | "FW_HOSTED" | Weight-sync storage backend |
| deployment_timeout_s | float | 5400 | Timeout for deployment provisioning / readiness waits |
| deployment_extra_args | list[str] \| None | None | Extra serving arguments |
| sample_timeout | int | 600 | HTTP read timeout for sampling completions |
| disable_speculative_decoding | bool | True | Disable speculative decoding for hotload compatibility |
| extra_values | dict[str, str] \| None | None | Extra deployment Helm values |
| replica_count | int \| None | None | If set, pin the deployment to a fixed replica count (sets both min and max). |
deployment_accelerator_type is an internal development field automatically configured by the deployment shape. It is not user-configurable.
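Since setting deployment_shape is the recommended path, a sketch of a config that lets the shape own the hardware choices. The shape resource name below is an assumption that mirrors the training-shape naming convention; substitute the real shape path from your catalog:

```python
from training.utils import DeployConfig

deploy_cfg = DeployConfig(
    deployment_id="grpo-serving",
    tokenizer_model="Qwen/Qwen3-8B",
    # Hypothetical shape resource name; when set, the shape owns
    # GPU type and serving configuration.
    deployment_shape="accounts/fireworks/deploymentShapes/qwen3-8b-serving",
    replica_count=1,  # pin min and max replicas to a single replica
)
```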

WeightSyncConfig

Checkpoint and weight-sync intervals:

```python
from training.utils import WeightSyncConfig

weight_sync = WeightSyncConfig(
    weight_sync_interval=1,
    dcp_save_interval=10,
)
```
dcp_save_interval defaults to 0 (off), so by default no DCP checkpoints are saved and training cannot be resumed. If you need checkpoint-based resume, explicitly set dcp_save_interval to a positive value (e.g. dcp_save_interval=50).
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| dcp_save_interval | int | 0 | Save DCP checkpoints for resume every N steps. 0 disables DCP saves. Set to a positive value to enable resume. |
| weight_sync_interval | int | 1 | Save + sync sampler weights every N optimizer steps. 0 disables weight sync. |
| dcp_timeout | int | 2700 | Timeout for DCP save/load operations |
| first_checkpoint_type | str | "base" | First sampler checkpoint type passed to WeightSyncer |
| weight_sync_before_training | bool | False | Save a base checkpoint and hotload it before the first training step |
| weight_sync_timeout | int | 600 | Timeout for hotload_and_wait |
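Because dcp_save_interval defaults to 0, a config that actually supports resume must enable it explicitly. A minimal sketch; the interval of 50 is an illustrative choice, not a recommended default:

```python
from training.utils import WeightSyncConfig

weight_sync = WeightSyncConfig(
    weight_sync_interval=1,  # sync sampler weights every optimizer step
    dcp_save_interval=50,    # save a resumable DCP checkpoint every 50 steps
)
```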

WandBConfig

Weights & Biases logging settings:

```python
from training.utils import WandBConfig

wandb = WandBConfig(
    entity="my-team",
    project="grpo-experiment",
    run_name="qwen3-8b-v1",
)
```
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| entity | str \| None | None | W&B team or user name |
| project | str \| None | None | W&B project name |
| run_name | str \| None | None | Run name (auto-generated if omitted) |

ReconnectableClient

Blocking convenience wrapper around FiretitanTrainingClient. All cookbook recipes use this as their training client — it dispatches each call and blocks until the result is ready or the timeout expires. Failures propagate to the caller so the training loop can crash cleanly and resume from the last DCP checkpoint.
```python
from training.utils import ReconnectableClient

client = ReconnectableClient(
    rlor_mgr=rlor_mgr,
    job_id=endpoint.job_id,
    base_model="accounts/fireworks/models/qwen3-8b",
    lora_rank=0,
    fw_api_key=api_key,
)

result = client.forward_backward_custom(datums, loss_fn)
client.optim_step(tinker.AdamParams(...))
```
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| rlor_mgr | TrainerJobManager | required | Manager used to connect to the trainer |
| job_id | str | required | RLOR trainer job ID |
| base_model | str | required | Base model name |
| lora_rank | int | 0 | LoRA rank (0 for full-parameter) |
| fw_api_key | str \| None | None | Fireworks API key (falls back to FIREWORKS_API_KEY env var) |
| default_timeout | int | 600 | Timeout in seconds for forward/backward/optim calls |
| endpoint | TrainerServiceEndpoint \| None | None | Pre-resolved endpoint (skips wait_for_existing on init) |
Properties:
| Property | Type | Description |
| --- | --- | --- |
| inner | FiretitanTrainingClient | The underlying SDK client (for advanced use) |
| endpoint | TrainerServiceEndpoint | The resolved trainer endpoint (base_url, job_id, job_name) |
| job_id | str | The trainer job ID |
Methods:
| Method | Description |
| --- | --- |
| forward(data, loss_fn) | Forward pass, blocks until complete |
| forward_backward(data, loss_fn, loss_fn_config) | Forward + backward pass |
| forward_backward_custom(data, loss_fn) | Forward + backward with custom loss function |
| optim_step(params, grad_accumulation_normalization) | Optimizer step |
| save_state(name, timeout) | Save DCP checkpoint (default timeout: 2700s) |
| load_state_with_optimizer(path, timeout) | Load DCP checkpoint (default timeout: 2700s) |
| save_weights_for_sampler_ext(name, checkpoint_type, timeout) | Save sampler checkpoint for promotion |
| resolve_checkpoint_path(name, source_job_id) | Resolve cross-job checkpoint path |
| list_checkpoints() | List available DCP checkpoints |
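Putting the blocking calls together, a minimal training-loop sketch: each call blocks until its result is ready, and a periodic save_state gives the loop a DCP checkpoint to resume from after a crash. The step counts, checkpoint naming, and the get_batch / loss_fn / adam_params names are hypothetical stand-ins for recipe-specific code:

```python
# Sketch only: assumes `client` is a constructed ReconnectableClient and that
# `get_batch`, `loss_fn`, and `adam_params` come from the surrounding recipe.
num_steps = 100   # hypothetical step budget
save_every = 50   # matches a dcp_save_interval of 50

for step in range(num_steps):
    datums = get_batch(step)                                  # recipe-specific data loading
    result = client.forward_backward_custom(datums, loss_fn)  # blocks until complete
    client.optim_step(adam_params)                            # blocks until the step applies

    if save_every and (step + 1) % save_every == 0:
        # Blocking DCP save; a crash after this point can resume from it.
        client.save_state(name=f"step-{step + 1}")
```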

Checkpoint utilities

For checkpointing, resume, and promote — see the dedicated Checkpoints and Resume page.

Gradient accumulation normalization

Recipe configs expose grad_accumulation_normalization, which is passed to optim_step(...):
```python
client.optim_step(adam_params, grad_accumulation_normalization="num_loss_tokens")
```
See Loss Functions for how to choose the mode and avoid double-normalization.

Recipe defaults

| Recipe | Default | Rationale |
| --- | --- | --- |
| SFT | None | The SFT loss is already normalized client-side. |
| GRPO / RL | "num_loss_tokens" | RL losses use server-side per-token normalization by default. |
| DPO | None | The DPO loss is already normalized client-side. |
| ORPO | None | The ORPO loss is already normalized client-side. |
The cookbook reference documents the config surface and defaults. The conceptual guidance for loss reduction vs. server-side normalization now lives in Loss Functions.