Examples
The repository ships with 14 self-contained example crates in
examples/, one per public feature. Each is a standalone Cargo
crate you can copy from.
- quickstart: The smallest end-to-end program (load, tokenize, complete, chat, FIM). ~80 lines, fully annotated.
- simple: One-shot text completion with a custom sampler chain.
- streaming: Token-by-token output through the high-level callback API.
- chat: A two-turn chat using a built-in template.
- stateful_chat: Interactive multi-turn REPL with /clear, /save, and EOF handling.
- vision: Multimodal image + text with the high-level MtmdContext API.
- mtmd: Lower-level mtmd.h API: bitmap → chunks → eval.
- embeddings: Embedding extraction with L2 normalisation.
- embedding_search: BGE-small + cosine ranking over a small corpus.
- reranker: Bi-encoder reranker demo.
- tools: ToolDefinition + five ToolParser formats.
- structured: JSON-Schema → GBNF → constrained JSON output.
- speculative: PromptLookupDecoding draft decoding.
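The embedding examples above rest on two small pieces of arithmetic: scaling a vector to unit L2 norm and ranking documents by cosine similarity. The following is a minimal std-only sketch of that arithmetic, not the code the example crates use. The function names `l2_normalize`, `dot`, and `rank_by_cosine` are hypothetical and are not part of the llama-crab API.

```rust
/// Scale `v` in place to unit L2 norm. An all-zero vector is left unchanged.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Dot product; equals cosine similarity when both inputs are unit-length.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Return corpus indices ordered from most to least similar to `query`.
/// (Hypothetical helper for illustration only.)
fn rank_by_cosine(query: &[f32], corpus: &[Vec<f32>]) -> Vec<usize> {
    let mut q = query.to_vec();
    l2_normalize(&mut q);
    let mut scored: Vec<(usize, f32)> = corpus
        .iter()
        .enumerate()
        .map(|(i, doc)| {
            let mut d = doc.clone();
            l2_normalize(&mut d);
            (i, dot(&q, &d))
        })
        .collect();
    // Sort descending by score; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().map(|(i, _)| i).collect()
}
```

Normalising once and then taking dot products is a common design choice, because it avoids recomputing norms for every query-document pair.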
One-command runner
Every example is wrapped by examples/run.sh, which downloads the
right model on first run and is idempotent afterwards:
./examples/run.sh quickstart # ~400 MB — text only, smallest demo
./examples/run.sh chat # same model — interactive REPL
./examples/run.sh stateful_chat # multi-turn REPL with /clear, /save
./examples/run.sh embeddings # ~30 MB — BGE-small embedding
./examples/run.sh embedding_search # BGE-small + cosine ranking
./examples/run.sh reranker # bi-encoder scoring
./examples/run.sh vision gemma4 # ~5 GB — vision + text chat
./examples/run.sh vision lfm-vl # ~1 GB — smaller vision model
./examples/run.sh mtmd gemma4 # raw mtmd.h API
./examples/run.sh tools # function calling
./examples/run.sh structured # JSON-schema grammar
./examples/run.sh speculative # prompt-lookup draft decoding
Without arguments, the script lists every available example.
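The "download on first run, idempotent afterwards" behaviour can be pictured as a check-then-fetch step. The sketch below expresses that idea in Rust under stated assumptions: it is not a transcription of examples/run.sh, the name `ensure_model` is hypothetical, and the actual download is left to a caller-supplied closure.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Ensure `dest` exists, calling `fetch` only when it is missing.
/// Returns Ok(true) if `fetch` ran, Ok(false) if the file was already there.
/// (Hypothetical helper; the real script's logic may differ.)
fn ensure_model<F>(dest: &Path, fetch: F) -> io::Result<bool>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    if dest.exists() {
        return Ok(false);
    }
    if let Some(parent) = dest.parent() {
        fs::create_dir_all(parent)?;
    }
    // Write to a temporary sibling first so an interrupted download
    // never leaves a truncated file at `dest`.
    let tmp = dest.with_extension("part");
    fetch(&tmp)?;
    fs::rename(&tmp, dest)?;
    Ok(true)
}
```

Calling `ensure_model` twice with the same destination runs the closure only the first time, which is what makes repeated invocations cheap.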
Full table
| Example | Model | Size | What it shows |
|---|---|---|---|
| quickstart | Qwen2.5-0.5B-Instruct-GGUF | ~400 MB | Load → tokenize → complete → chat → FIM |
| simple | any text GGUF | varies | Plain text completion |
| streaming | same as quickstart | ~400 MB | High-level token-by-token output |
| chat | instruct GGUF | varies | One-shot chat with a builtin template |
| stateful_chat | same as quickstart | ~400 MB | REPL with growing history, /clear, /save |
| vision | Gemma 4 or LFM2.5-VL + mmproj | ~1–5 GB | High-level MtmdContext vision chat |
| mtmd | Gemma 4 + mmproj | ~5 GB | Raw mtmd.h API: bitmap → chunks → eval |
| embeddings | bge-small-en-v1.5-gguf | ~30 MB | Embedding extraction + L2 norm |
| embedding_search | bge-small-en-v1.5-gguf | ~30 MB | Semantic search with cosine ranking |
| reranker | embedding GGUF | varies | Bi-encoder ranking by cosine similarity |
| tools | tool-aware instruct GGUF | varies | ToolDefinition + 5 ToolParser formats |
| structured | any text GGUF | varies | json_schema_grammar() + JSON parsing |
| speculative | any text GGUF | varies | prompt-lookup n-gram draft |
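For readers unfamiliar with the "prompt-lookup n-gram draft" idea in the speculative row, here is a minimal sketch of the drafting step: take the last few tokens, look for an earlier occurrence of the same n-gram in the token history, and propose whatever followed it as the draft. This is an illustration of the general technique only. The function `prompt_lookup_draft` and its parameters are hypothetical and do not describe how the crate's PromptLookupDecoding is implemented.

```rust
/// Propose up to `max_draft` tokens by matching the last `ngram` tokens of
/// `tokens` against an earlier position in the same sequence.
/// Returns an empty vector when no earlier match exists.
fn prompt_lookup_draft(tokens: &[u32], ngram: usize, max_draft: usize) -> Vec<u32> {
    if ngram == 0 || tokens.len() <= ngram {
        return Vec::new();
    }
    let suffix_start = tokens.len() - ngram;
    let suffix = &tokens[suffix_start..];
    // Scan candidate start positions from most recent to oldest,
    // skipping the suffix itself.
    for start in (0..suffix_start).rev() {
        if &tokens[start..start + ngram] == suffix {
            let from = start + ngram;
            let to = (from + max_draft).min(tokens.len());
            return tokens[from..to].to_vec();
        }
    }
    Vec::new()
}
```

In a full speculative loop the drafted tokens would then be verified by the model in one batch, with any rejected tail discarded; that verification step is outside this sketch.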
Passing a different model
Every example accepts the GGUF path as its first positional argument.
Vision examples take three arguments: <text.gguf> <mmproj.gguf> <image>.
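Reading a model path from the first positional argument might look like the sketch below. It assumes a fallback to a default path when no argument is given, which the examples themselves may or may not do; `model_path_from_args` and `DEFAULT_MODEL` are hypothetical names introduced for illustration.

```rust
use std::path::PathBuf;

/// Hypothetical fallback used when no path is given on the command line.
const DEFAULT_MODEL: &str = "models/your.gguf";

/// Pick the model path from an argument list shaped like `std::env::args()`:
/// element 0 is the program name, element 1 the first positional argument.
fn model_path_from_args<I>(args: I) -> PathBuf
where
    I: IntoIterator<Item = String>,
{
    args.into_iter()
        .nth(1)
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from(DEFAULT_MODEL))
}
```

Inside `main` this would be called as `model_path_from_args(std::env::args())`. Taking any iterator of strings rather than reading the process arguments directly keeps the function easy to test.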
Adding a new example
The boilerplate for a new example crate is ~15 lines:
Cargo.toml:

    [package]
    name = "my_example"
    version.workspace = true
    edition.workspace = true
    rust-version.workspace = true
    publish = false

    [[bin]]
    name = "run_my_example"
    path = "src/main.rs"

    [dependencies]
    llama-crab = { path = "../../llama-crab", version = "0.1.0" }
    anyhow = "1"

src/main.rs:

    use anyhow::Result;
    use llama_crab::{Llama, LlamaParams};

    fn main() -> Result<()> {
        let mut llama = Llama::load(LlamaParams::new("models/your.gguf"))?;
        let resp = llama.create_completion("Hello!", 32)?;
        print!("{}", resp.text);
        Ok(())
    }

Then add examples/my_example to the members = [...] list in the
root Cargo.toml and a row to the table on this page.
Where to next?
- Quickstart — the smallest end-to-end program.
- Streaming — the most common request from app developers.
- Vision (mtmd) — if you want to feed images to a model.
- Chatbot recipe — when a single example isn't enough and you need to wire a full agent.