Features
This section documents every public feature of the llama-crab API.
Each page goes end-to-end: the why, the how, the pitfalls, and a
complete runnable example.
- Text completion: Plain prompt → text. Stop sequences, log probabilities, streaming, best-of-N, and FIM (fill-in-the-middle) for code.
- Chat & tool calling: Role-based messages, 14 built-in Jinja2 templates, a Jinja2 subset renderer, the tool-call parser for ChatML / Mistral / Llama 3 / Functionary, and incremental parsing.
- Multimodal: Pair a text GGUF with an `mmproj` projector, decode local images into `MtmdBitmap`, evaluate multimodal chunks, and continue generation with the normal sampler chain.
- Embeddings & reranking: Mean / CLS / Last pooling, L2-normalised vectors, semantic search, and the cross-encoder `Llama::rerank` helper.
- Grammars: Convert a JSON Schema 2020-12 document into a GBNF grammar and use the grammar sampler to force the model to emit only valid output. The most reliable way to get structured data out of a model.
- Speculative decoding: The `PromptLookupDecoding` draft model (no extra weights) and the `DraftModel` trait for plugging in your own draft. The `speculative_decode` free function drives the verify step.
- Stateful chat: Multi-turn chat with a growing history. Template auto-detection, history trimming strategies, and session persistence.
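To give a feel for why a prompt-lookup draft needs no extra weights, here is a minimal, library-independent sketch of the general technique: look for an earlier occurrence of the last few tokens and propose whatever followed them as the draft. The function names `prompt_lookup_draft` and `accepted_prefix_len` are hypothetical and are not part of the llama-crab API; the crate's own `PromptLookupDecoding` and `speculative_decode` may work differently.

```rust
/// Propose up to `max_draft` tokens by finding the most recent earlier
/// occurrence of the trailing n-gram and copying the tokens that followed it.
/// Hypothetical helper for illustration only.
fn prompt_lookup_draft(tokens: &[u32], ngram: usize, max_draft: usize) -> Vec<u32> {
    if ngram == 0 || tokens.len() <= ngram {
        return Vec::new();
    }
    let suffix_start = tokens.len() - ngram;
    let suffix = &tokens[suffix_start..];
    // Scan candidate start positions from newest to oldest, excluding the
    // position of the suffix itself.
    for start in (0..suffix_start).rev() {
        if &tokens[start..start + ngram] == suffix {
            let from = start + ngram;
            let to = (from + max_draft).min(tokens.len());
            return tokens[from..to].to_vec();
        }
    }
    Vec::new()
}

/// Greedy verification: count the leading draft tokens that agree with the
/// tokens the target model produced at the same positions.
fn accepted_prefix_len(draft: &[u32], target: &[u32]) -> usize {
    draft.iter().zip(target).take_while(|(d, t)| d == t).count()
}

fn main() {
    // Made-up token ids: the sequence ends with [5, 6], which also appears
    // at the start, so the tokens after that first occurrence become the draft.
    let tokens = [5, 6, 7, 8, 9, 5, 6];
    let draft = prompt_lookup_draft(&tokens, 2, 3);
    // Pretend the target model agreed with the first two draft tokens only.
    let target = [7, 8, 1];
    let accepted = accepted_prefix_len(&draft, &target);
    println!("draft = {:?}, accepted = {}", draft, accepted);
}
```

The draft is cheap because it only scans tokens already in the context; the target model still verifies every proposed token, so a bad draft costs speed rather than correctness.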
Picking the right feature for the job
flowchart TD
Q{What are you building?}
Q -->|Code completion| A[Text completion]
Q -->|Chatbot / agent| B[Chat & tool calling]
Q -->|Image / audio Q&A| C[Multimodal]
Q -->|Search / clustering| D[Embeddings & reranking]
Q -->|Extracting structured data| E[Grammars]
Q -->|Long-form generation| F[Speculative decoding]
Q -->|Persistent assistant| G[Stateful chat]
If you're not sure where to start, the quickstart example
walks through plain completion, chat, FIM and embeddings in a single
~80-line program.
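For readers weighing the embeddings feature, the following sketch shows what mean pooling, L2 normalisation, and similarity ranking amount to in plain Rust. It deliberately uses no llama-crab calls: the helper names (`mean_pool`, `l2_normalize`, `dot`, `rank`) and the tiny two-dimensional vectors are assumptions made for illustration, and the crate's own pooling and search helpers may differ.

```rust
/// Mean-pool per-token embeddings (one row per token) into a single vector.
fn mean_pool(token_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let dim = token_embeddings.first().map_or(0, |row| row.len());
    let mut pooled = vec![0.0f32; dim];
    for row in token_embeddings {
        for (acc, value) in pooled.iter_mut().zip(row) {
            *acc += *value;
        }
    }
    let count = token_embeddings.len().max(1) as f32;
    pooled.iter_mut().for_each(|v| *v /= count);
    pooled
}

/// Scale a vector to unit L2 norm; a zero vector is returned unchanged.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

/// Dot product; equals cosine similarity when both inputs are unit length.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Return document indices sorted from most to least similar to the query.
fn rank(query: &[f32], docs: &[Vec<f32>]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..docs.len()).collect();
    order.sort_by(|&i, &j| dot(query, &docs[j]).total_cmp(&dot(query, &docs[i])));
    order
}

fn main() {
    // Hypothetical token embeddings for a two-token query.
    let query = l2_normalize(&mean_pool(&[vec![1.0, 0.0], vec![1.0, 2.0]]));
    // Hypothetical, already-pooled document vectors.
    let docs = vec![l2_normalize(&[0.0, 1.0]), l2_normalize(&[2.0, 2.0])];
    println!("ranking: {:?}", rank(&query, &docs));
}
```

Normalising once at indexing time means each search reduces to a dot product per document, which is why L2-normalised vectors are the usual choice for semantic search.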