# Developing Evaluators

The Build SDK natively integrates
[reward-kit](https://github.com/fw-ai-external/reward-kit) so that you can
develop Evaluators for reinforcement fine-tuning ([RFT](/fine-tuning/reinforcement-fine-tuning-models)) in Python.

<Note>
  The reward-kit functionality is available as an optional dependency. You'll need to install `fireworks-ai[reward-kit]` to use the evaluator features described in this guide.
</Note>

## Prerequisites

Install the Fireworks Build SDK with pip, including the reward-kit optional dependency needed for developing evaluators. The quotes keep shells such as zsh from interpreting the square brackets:

```bash theme={null}
pip install --upgrade "fireworks-ai[reward-kit]"
```

Make sure to set the `FIREWORKS_API_KEY` environment variable to your Fireworks API key:

```bash theme={null}
export FIREWORKS_API_KEY=<API_KEY>
```

You can create an API key in the [Fireworks AI web UI](https://app.fireworks.ai/settings/users/api-keys) or by installing the [firectl](/tools-sdks/firectl/firectl) CLI tool and running:

```bash theme={null}
firectl signin
firectl api-key create --key-name <Your-Key-Name>
```
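Before running the examples, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch using only the standard library; the helper names `api_key_is_set` and `require_api_key` are hypothetical and not part of the Build SDK.

```python
import os


def api_key_is_set(env=None):
    """Return True if FIREWORKS_API_KEY is present and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get("FIREWORKS_API_KEY", "").strip())


def require_api_key(env=None):
    """Return the API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; not part of the Build SDK.
    """
    env = os.environ if env is None else env
    key = env.get("FIREWORKS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "FIREWORKS_API_KEY is not set; export it before running the examples."
        )
    return key


if __name__ == "__main__":
    # Only reports whether the key is set; the key itself is never printed.
    print("FIREWORKS_API_KEY set:", api_key_is_set())
```

Calling `require_api_key()` at the top of a script turns a missing key into an immediate, readable error.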

## Your first evaluator

For this tutorial, we'll create a new project using `uv`.

```shell theme={null}
$ uv init
Initialized project `my-project`
$ uv add "fireworks-ai[reward-kit]"
```

You should now have a project with a `pyproject.toml` file and a `uv.lock` file.

```shell theme={null}
% tree
.
├── main.py
├── pyproject.toml
├── README.md
└── uv.lock

1 directory, 4 files
```

To create your first evaluator, create a new file at `my_first_evaluator/main.py`:

<Warning>
  Each evaluator must live in its own directory. The Build SDK recursively
  packages every file in the directory that contains the imported reward
  function, including files in its subdirectories.
</Warning>

```shell theme={null}
mkdir -p my_first_evaluator
touch my_first_evaluator/main.py
```

Add the following code to `my_first_evaluator/main.py`:

```python my_first_evaluator/main.py theme={null}
from fireworks import reward_function


@reward_function(id="my-first-evaluator")
def evaluate(messages, **kwargs):
    """
    This is a simple reward function that returns a score of 1.0 if the message contains the word "fireworks" and 0.0 otherwise.
    """
    # Extract the content from the messages structure
    content = messages[0]["content"]
    score = 1.0 if "fireworks" in content else 0.0

    return {"score": score}
```

To test your evaluator locally, call the function directly. Replace the contents of the `main.py` at the project root (not `my_first_evaluator/main.py`) with the following code:

```python main.py theme={null}
from my_first_evaluator.main import evaluate

print(evaluate(messages=[{"role": "user", "content": "Hello, world!"}]))
print(evaluate(messages=[{"role": "user", "content": "Hello, fireworks!"}]))
```

Let's run the script and see what happens:

```shell theme={null}
% uv run python main.py
score=0.0 is_score_valid=True reason=None metrics={} step_outputs=None error=None
score=1.0 is_score_valid=True reason=None metrics={} step_outputs=None error=None
```

You should see that the first message returns a score of 0.0 and the second message returns a score of 1.0, showing that our evaluator is working as expected.
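Printing results works for a first check, but a small table of cases makes the check repeatable. The sketch below uses a plain function, `keyword_score`, as a hypothetical stand-in that mirrors the scoring logic of `evaluate` without the `@reward_function` decorator, so it runs without the SDK installed. It also shows that the substring check is case-sensitive.

```python
def keyword_score(messages, keyword="fireworks"):
    """Stand-in for the tutorial's scoring logic (no SDK decorator).

    Returns 1.0 if the first message's content contains `keyword`, else 0.0.
    """
    content = messages[0]["content"]
    return 1.0 if keyword in content else 0.0


# (input text, expected score) pairs
CASES = [
    ("Hello, world!", 0.0),
    ("Hello, fireworks!", 1.0),
    ("Hello, Fireworks!", 0.0),  # capitalized: the `in` check is case-sensitive
]


def run_cases():
    """Assert every case and return how many were checked."""
    for text, expected in CASES:
        got = keyword_score([{"role": "user", "content": text}])
        assert got == expected, f"{text!r}: expected {expected}, got {got}"
    return len(CASES)


if __name__ == "__main__":
    print(f"{run_cases()} cases passed")
```

If capitalized matches should also score 1.0, comparing against `content.lower()` is one option.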

### Evaluating on a dataset

Now that we've created and tested our first evaluator, we can run it on
Fireworks infrastructure against a dataset uploaded to Fireworks.

To do this, we'll create a
[Dataset](/tools-sdks/python-client/sdk-reference#dataset) object and call
`create_evaluation_job`. Create a new file called `run_first_evaluator.py` at
the root of your project and add the following code:

```python run_first_evaluator.py theme={null}
from my_first_evaluator.main import evaluate
from fireworks import Dataset
import random

dataset = Dataset.from_list(
    data=[
        {"messages": [{"role": "user", "content": "Hello, fireworks!" if random.random() < 0.5 else "Hello, world!"}]}
        for _ in range(100)
    ]
)

job = dataset.create_evaluation_job(evaluate)
print(job.url)
job.wait_for_completion()
print(job.output_dataset.url)
```

Let's run the script and see what happens:

```shell theme={null}
% uv run python run_first_evaluator.py
https://app.fireworks.ai/dashboard/evaluation-jobs/wqfvyv90yzfv9q95
```

When the script first runs, you should see a URL for the evaluation job. You can
go to the URL to see the evaluation job in the Fireworks AI web UI.

<Frame caption="Running evaluation job in the UI">
  <img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/tools-sdks/python-client/assets/my-first-evaluator-running.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=75f9ae915e93da04c6721644b798c3fd" alt="Running evaluation job" width="3000" height="1712" data-path="tools-sdks/python-client/assets/my-first-evaluator-running.png" />
</Frame>

After some time, the evaluation job will be completed and you should see a URL
for the output dataset. You can go to the URL to see the results in the
Fireworks AI web UI.

<Frame caption="Completed evaluation job in the UI">
  <img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/tools-sdks/python-client/assets/my-first-evaluator-completed.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=c18e9c813708f3e6058229f099e675e8" alt="Completed evaluation job" width="3000" height="1712" data-path="tools-sdks/python-client/assets/my-first-evaluator-completed.png" />
</Frame>

The script blocks in `job.wait_for_completion()` until the job finishes, then
prints the URL for the output dataset as its second line:

```shell focus={3} theme={null}
% uv run python run_first_evaluator.py 
https://app.fireworks.ai/dashboard/evaluation-jobs/wqfvyv90yzfv9q95
https://app.fireworks.ai/dashboard/datasets/2025-06-24-18-16-54-101727
```

You can go to the URL to see the output dataset in the Fireworks AI web UI.

<Frame caption="Results in the UI">
  <img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/tools-sdks/python-client/assets/my-first-evaluator-results.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=335bc6a6fe86e5595def194a9d59a986" alt="Result dataset" width="3000" height="1712" data-path="tools-sdks/python-client/assets/my-first-evaluator-results.png" />
</Frame>
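`run_first_evaluator.py` builds its rows with unseeded `random.random()` calls, so every run uploads a different mix of phrases. A reproducible variant could look like the sketch below, which uses only the standard library; `build_rows` and `write_jsonl` are hypothetical helper names, and the row shape is copied from the script above.

```python
import json
import random


def build_rows(n=100, seed=0):
    """Build n rows in the tutorial's shape, using a seeded generator."""
    rng = random.Random(seed)  # private generator: same seed, same rows
    return [
        {
            "messages": [
                {
                    "role": "user",
                    "content": "Hello, fireworks!" if rng.random() < 0.5 else "Hello, world!",
                }
            ]
        }
        for _ in range(n)
    ]


def write_jsonl(rows, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


if __name__ == "__main__":
    rows = build_rows()
    hits = sum("fireworks" in r["messages"][0]["content"] for r in rows)
    print(f"{hits} of {len(rows)} rows mention fireworks")
```

The list returned by `build_rows()` could be passed as `data=` to `Dataset.from_list`, in place of the list comprehension in `run_first_evaluator.py`.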

## Creating your second evaluator

Next, we'll create a more complex evaluator that imports a third-party library to calculate the score. First, add the `textblob` library to the project:

```shell theme={null}
% uv add textblob
```

The Build SDK automatically picks up dependencies declared in the
`pyproject.toml` or `requirements.txt` files in your project. Alternatively,
you can pass the requirements as a list of strings, written as they would
appear in a `requirements.txt` file, directly to the `@reward_function`
decorator.

Now, let's create a new evaluator under `my_second_evaluator/main.py`:

```shell theme={null}
% mkdir -p my_second_evaluator
% touch my_second_evaluator/main.py
```

Add the following code to `my_second_evaluator/main.py`:

```python my_second_evaluator/main.py theme={null}
from fireworks import reward_function
from textblob import TextBlob


@reward_function(id="my-second-evaluator")
def evaluate(messages, **kwargs):
    """
    This is a reward function that demonstrates the use of third-party dependencies.
    It returns a normalized score between 0 and 1.0 based on the sentiment polarity of the message.
    """
    # Extract the content from the messages structure
    content = messages[0]["content"]

    # Use the third-party dependency (TextBlob) for sentiment analysis
    blob = TextBlob(content)
    sentiment_score = blob.sentiment.polarity  # type: ignore

    # Normalize sentiment score from [-1, 1] to [0, 1]
    # sentiment_score ranges from -1 to 1, so we add 1 to get [0, 2], then divide by 2 to get [0, 1]
    normalized_score = (sentiment_score + 1) / 2

    # Ensure the score is clamped between 0 and 1
    normalized_score = max(0.0, min(1.0, normalized_score))

    # Return the format expected by the framework
    return {"score": normalized_score}

```
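The normalization step can be checked in isolation, without TextBlob or the SDK. This minimal sketch extracts the same arithmetic into a hypothetical helper, `normalize_polarity`, and maps a polarity in [-1, 1] to a score in [0, 1], clamping anything outside that range.

```python
def normalize_polarity(polarity):
    """Map a sentiment polarity in [-1, 1] to a score in [0, 1].

    Mirrors the arithmetic in my_second_evaluator/main.py:
    shift by 1, halve, then clamp to [0, 1].
    """
    scaled = (polarity + 1) / 2
    return max(0.0, min(1.0, scaled))


if __name__ == "__main__":
    for p in (-1.0, 0.0, 1.0):
        print(p, "->", normalize_polarity(p))
```

A fully negative phrase maps to 0.0, a neutral one to 0.5, and a fully positive one to 1.0. The clamp only matters if the input ever falls outside [-1, 1].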

Download the
[random\_phrases.jsonl](https://storage.googleapis.com/fireworks-public/tutorial/random_phrases.jsonl)
file and save it to the root of your project, so that the layout looks like
this:

```shell highlight={9} theme={null}
% tree -I "__pycache__"
.
├── main.py
├── my_first_evaluator
│   └── main.py
├── my_second_evaluator
│   └── main.py
├── pyproject.toml
├── random_phrases.jsonl
├── README.md
├── run_first_evaluator.py
└── uv.lock
```
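Both evaluators read `messages[0]["content"]`, so it can be worth confirming that every row of a JSONL file has that shape before creating a job. The sketch below assumes that `random_phrases.jsonl` uses the same `messages` structure as the first dataset; `check_jsonl` is a hypothetical helper, not an SDK function.

```python
import json
from pathlib import Path


def check_jsonl(path):
    """Return a list of (line_number, problem) for rows the evaluators could not read."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            messages = row.get("messages") if isinstance(row, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append((lineno, "missing or empty 'messages' list"))
            elif not isinstance(messages[0], dict) or "content" not in messages[0]:
                problems.append((lineno, "first message has no 'content'"))
    return problems


if __name__ == "__main__":
    path = Path("random_phrases.jsonl")
    if path.exists():
        for lineno, problem in check_jsonl(path):
            print(f"line {lineno}: {problem}")
```

An empty result means every non-blank line parsed and exposes the field the evaluator reads.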

Create a new file called `run_second_evaluator.py` at the root of your project and add the following code:

```python run_second_evaluator.py theme={null}
from fireworks import Dataset
from my_second_evaluator.main import evaluate

dataset = Dataset.from_file("random_phrases.jsonl")
job = dataset.create_evaluation_job(evaluate)
print(job.url)
job.wait_for_completion()
print(job.output_dataset.url)
```

Once the script is done running, you can click on the URL for the evaluation job and see the results in the Fireworks AI web UI.

<Frame caption="Results of the second evaluator in the UI">
  <img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/tools-sdks/python-client/assets/my-second-evaluator-completed.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=b07ae13c28464819135519678bf12545" alt="Result dataset" width="3000" height="1712" data-path="tools-sdks/python-client/assets/my-second-evaluator-completed.png" />
</Frame>

🎉 Congratulations! You've now created and evaluated your first two evaluators.
If you have any questions, please reach out to us on
[Discord](https://discord.gg/fireworks-ai).
