Recipes¶

This section collects end-to-end recipes for the most common tasks. Each recipe is a small, complete program with a one-line summary of what it does, the full source, and a discussion of the trade-offs.

Performance tuning

Measure tokens per second, find the bottleneck, and tune n_threads, n_gpu_layers, batch size, and sampler chain to maximise throughput on your hardware.
Choosing a model

Quant size vs. accuracy vs. speed vs. memory. A short guide to picking the right GGUF for the job.
Building a chatbot

From the 80-line REPL to a deployable agent: state machines, tool calls, history trimming, and session persistence.
Building a RAG pipeline

Embed → store → retrieve → re-rank → answer. The full end-to-end pattern.

Reading order¶

The recipes are independent — pick the one that matches your current task. If you are new to llama-crab, the Performance tuning page is a good starting point because it teaches you how to measure before you optimise.

flowchart LR
    A[Quickstart] --> B[Performance tuning]
    A --> C[Choosing a model]
    A --> D[Chatbot]
    A --> E[RAG]