
# Performance optimization

> Guidelines for optimizing performance and benchmarking Fireworks.ai deployments.

## Performance improvement

**Q: What are the techniques to improve performance?**

To optimize model performance, consider the following techniques:

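1. **Quantization**: Serve the model at lower numerical precision (for example, 8-bit instead of 16-bit weights) to reduce memory use and computation. Validate output quality after quantizing.
2. **Check model type**: Determine whether the model uses **GQA** (Grouped Query Attention) or **MQA** (Multi-Query Attention). Both share key/value heads across query heads, which shrinks the KV cache and leaves room for larger batches and longer contexts.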
3. **Increase batch size** to improve throughput.
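These techniques interact through GPU memory. The following is a rough sketch that assumes a standard decoder-only transformer. Weight memory scales with bytes per parameter, which quantization lowers. The KV cache scales with the number of KV heads, which GQA and MQA lower. Whatever memory remains bounds the batch size. `ModelShape`, the helper functions, and the example dimensions are hypothetical, and the estimate ignores activations and runtime overhead.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelShape:
    """Hypothetical transformer dimensions; take real values from your model's config."""
    num_params: float  # total parameter count
    num_layers: int
    num_kv_heads: int  # equals query heads for MHA, fewer for GQA, 1 for MQA
    head_dim: int


def weight_bytes(shape: ModelShape, bytes_per_param: float) -> float:
    """Memory for the weights alone; quantizing 16-bit -> 8-bit halves bytes_per_param."""
    return shape.num_params * bytes_per_param


def kv_cache_bytes(shape: ModelShape, seq_len: int, batch_size: int,
                   bytes_per_elem: float = 2.0) -> float:
    """Keys + values (factor 2) stored for every layer, KV head, token, and sequence."""
    return (2 * shape.num_layers * shape.num_kv_heads * shape.head_dim
            * seq_len * batch_size * bytes_per_elem)


def max_batch_size(shape: ModelShape, memory_bytes: float, bytes_per_param: float,
                   seq_len: int, kv_bytes_per_elem: float = 2.0) -> int:
    """Full-length sequences that fit after the weights (ignores activations/overhead)."""
    free = memory_bytes - weight_bytes(shape, bytes_per_param)
    per_sequence = kv_cache_bytes(shape, seq_len, 1, kv_bytes_per_elem)
    return max(0, int(free // per_sequence))


if __name__ == "__main__":
    # Illustrative 7B-parameter, 32-layer model on a hypothetical 80 GiB budget.
    for name, kv_heads in [("MHA", 32), ("GQA", 8), ("MQA", 1)]:
        shape = ModelShape(num_params=7e9, num_layers=32,
                           num_kv_heads=kv_heads, head_dim=128)
        for label, bpp in [("16-bit", 2.0), ("8-bit", 1.0)]:
            print(name, label, "max batch @ 4096 tokens:",
                  max_batch_size(shape, 80 * GIB, bpp, seq_len=4096))
```

With these made-up dimensions, moving from 32 KV heads to 8 cuts the per-sequence KV cache by 4x. This is why a GQA or MQA model can usually sustain a larger batch on the same hardware.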

***

## Benchmarking

**Q: How can we benchmark?**

There are multiple ways to benchmark your deployment’s performance:

* Use our [open-source load-testing tool](https://github.com/fw-ai/benchmark)
* Develop custom performance testing scripts
* Integrate with monitoring tools to track metrics
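A custom script can be quite small. The following is a minimal sketch using only the Python standard library. `send_request` is a placeholder that you would replace with a real call to your deployment; here, a sleep stands in for it. The harness keeps up to `concurrency` requests in flight and reports nearest-rank latency percentiles and overall throughput. `run_benchmark` and `percentile` are hypothetical helpers, not part of the load-testing tool linked above.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def percentile(sorted_values: List[float], pct: float) -> float:
    """Nearest-rank percentile of an ascending list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def run_benchmark(send_request: Callable[[], None], num_requests: int,
                  concurrency: int = 1) -> Dict[str, float]:
    """Issue num_requests calls with up to `concurrency` in flight; summarize results."""
    def timed_call(_: int) -> float:
        start = time.perf_counter()
        send_request()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(num_requests)))
    wall_time = time.perf_counter() - wall_start

    return {
        "p50_s": percentile(latencies, 50),
        "p95_s": percentile(latencies, 95),
        "p99_s": percentile(latencies, 99),
        "mean_s": statistics.fmean(latencies),
        "requests_per_s": num_requests / wall_time,
    }


if __name__ == "__main__":
    def fake_request() -> None:
        # Placeholder: replace with a real request to your deployment.
        time.sleep(0.05)

    print(run_benchmark(fake_request, num_requests=40, concurrency=8))
```

Sweeping `concurrency` and watching p95/p99 latency against `requests_per_s` shows where added load stops buying throughput and starts costing latency.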

***

## Model latency

**Q: What’s the latency for small, medium, and large LLMs?**

Model latency and performance depend on various factors:

* **Input/output prompt lengths**
* **Model quantization**
* **Model sharding**
* **Disaggregated prefill** (running prompt prefill and token decoding on separate hardware)
* **Hardware configuration**
* **Multiple layers of caching**
* **Fire optimizations**
* **LoRA adapters** (Low-Rank Adaptation)
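Prompt and output lengths affect latency differently, so it helps to split a streamed response into two parts. Time to first token (TTFT) is dominated by prefill and grows with prompt length. Time per output token (TPOT) is paid once per generated token. The following is a minimal sketch, assuming you record a timestamp when the request is sent and when each token arrives. The function names and timestamps are hypothetical and do not reflect measured Fireworks numbers.

```python
from typing import Dict, Sequence


def decompose_latency(request_start: float,
                      token_times: Sequence[float]) -> Dict[str, float]:
    """Split one streamed response into TTFT and mean time per output token.

    request_start: time the request was sent (seconds).
    token_times:   arrival time of each output token, in order.
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")
    # Prefill plus queueing and network time; grows with prompt length.
    ttft = token_times[0] - request_start
    n = len(token_times)
    # Mean gap between consecutive tokens during decoding.
    tpot = (token_times[-1] - token_times[0]) / (n - 1) if n > 1 else 0.0
    return {"ttft_s": ttft, "tpot_s": tpot, "total_s": token_times[-1] - request_start}


def estimate_total_latency(ttft_s: float, tpot_s: float, output_tokens: int) -> float:
    """total ≈ TTFT + (output_tokens - 1) * TPOT: each extra token adds a decode step."""
    return ttft_s + max(output_tokens - 1, 0) * tpot_s


if __name__ == "__main__":
    # Synthetic timestamps for illustration only.
    m = decompose_latency(0.0, [0.5, 0.55, 0.6, 0.65])
    print(m)
    print(estimate_total_latency(m["ttft_s"], m["tpot_s"], output_tokens=200))
```

Under this approximation, shortening the prompt mainly lowers TTFT, while capping output length lowers the decode term linearly.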

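Because latency depends on all of these factors, there is no single figure per model size. Our team specializes in tuning performance for specific workloads: we work with you to understand your traffic patterns and create customized deployment templates that maximize performance for your use case.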

***

## Performance factors

**Q: What factors affect model latency and performance?**

Key factors that impact latency and performance include:

* **Model architecture and size**
* **Hardware configuration**
* **Network conditions**
* **Request patterns**
* **Batch size settings**
* **Caching implementation**

***

## Best practices

**Q: What are the best practices for optimizing performance?**

For optimal performance, follow these recommendations:

1. **Choose an appropriate model size** for your specific use case.
2. **Implement batching strategies** to improve efficiency.
3. **Use quantization** where applicable to reduce computational load.
4. **Monitor and adjust scaling parameters** to meet demand.
5. **Optimize prompt lengths** to reduce processing time.
6. **Implement caching** to minimize repeated calculations.
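One caching layer you control directly is on the client. The following is a minimal sketch of an LRU cache keyed on the full request payload. It is a client-side technique and is separate from any caching the serving stack performs. `ResponseCache`, the payload fields, and the `compute` callback are hypothetical. Only reuse responses for requests where an identical answer is acceptable (for example, deterministic settings such as temperature 0).

```python
import hashlib
import json
from collections import OrderedDict
from typing import Any, Callable, Dict


class ResponseCache:
    """LRU cache of responses, keyed on a canonical hash of the request payload."""

    def __init__(self, max_entries: int = 1024) -> None:
        self.max_entries = max_entries
        self._store: "OrderedDict[str, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(payload: Dict[str, Any]) -> str:
        # sort_keys makes payloads with the same content hash identically.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_compute(self, payload: Dict[str, Any],
                       compute: Callable[[Dict[str, Any]], Any]) -> Any:
        key = self._key(payload)
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        self.misses += 1
        result = compute(payload)  # e.g. the real API call, supplied by the caller
        self._store[key] = result
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
        return result


if __name__ == "__main__":
    cache = ResponseCache(max_entries=2)
    fake_call = lambda p: f"echo: {p['prompt']}"  # stand-in for a real request
    for _ in range(3):
        cache.get_or_compute({"prompt": "hello", "temperature": 0}, fake_call)
    print(cache.hits, cache.misses)
```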

***

## Additional resources

* **Discord Community**: [discord.gg/fireworks-ai](https://discord.gg/fireworks-ai)
* **Email Support**: [inquiries@fireworks.ai](mailto:inquiries@fireworks.ai)
* **Documentation**: [Fireworks.ai docs](https://fireworks.ai/docs)
