# Crate layout
The llama-crab workspace contains two library crates and one
binary crate. This page is the map you need to navigate the source
tree.
```text
llama-crab/
├── llama-crab-sys/        # Raw FFI (bindgen + CMake)
├── llama-crab/            # 100% safe Rust API
│   ├── backend            # LlamaBackend + NumaStrategy
│   ├── model              # LlamaModel + LlamaModelParams
│   ├── context            # LlamaContext + params + embeddings + session
│   ├── batch              # LlamaBatch
│   ├── sampling           # LlamaSampler + SamplerChain (17 strategies)
│   ├── chat               # ChatMessage + templates + tool calling
│   ├── speculative        # PromptLookupDecoding + speculative_decode
│   ├── multimodal         # MtmdContext + MtmdBitmap (feature mtmd)
│   ├── cache              # RamCache + DiskCache
│   ├── json_schema        # JSON-Schema → GBNF
│   ├── high_level         # Llama orchestrator + create_completion
│   ├── error              # LlamaError enum
│   └── log                # tracing integration
└── llama-crab-server/     # HTTP binary built on top of llama-crab
```
## llama-crab-sys
The low-level FFI crate. It contains:

- The `wrapper.h` header that selects the llama.cpp C API to expose.
- The `build.rs` that runs `bindgen` against the wrapper and `cmake` against the bundled `llama.cpp/` source tree.
- The generated `bindings.rs` (do not edit: it is regenerated on every build).
- A handful of safe wrappers for the most-used FFI calls, so consumers of the safe crate can stay in safe Rust.
Most applications should depend on llama-crab instead. Use
llama-crab-sys only when you need direct access to a llama.cpp
symbol that the safe crate does not (yet) wrap.
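The final item in the list above describes the usual split between a raw binding and a safe wrapper around it. The sketch below illustrates that pattern in general terms; it is not code from `llama-crab-sys`. `raw_count_non_whitespace` is a hypothetical stand-in for a bindgen-generated `extern "C"` function (implemented in Rust so the example compiles without llama.cpp), and `count_non_whitespace` is a hypothetical safe wrapper over it.

```rust
/// Hypothetical raw binding in the shape bindgen emits for a C function:
/// raw pointers in, C-style status code out. Implemented in Rust here only
/// so the sketch compiles without llama.cpp.
///
/// # Safety
/// `ptr` must point to `len` readable bytes and `out` must be writable.
pub unsafe fn raw_count_non_whitespace(ptr: *const u8, len: usize, out: *mut usize) -> i32 {
    if ptr.is_null() || out.is_null() {
        return -1; // C-style error code
    }
    // SAFETY: the caller guarantees `ptr` is valid for `len` bytes.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    let count = bytes.iter().filter(|b| !b.is_ascii_whitespace()).count();
    // SAFETY: the caller guarantees `out` is writable.
    unsafe { *out = count };
    0
}

/// Hypothetical safe wrapper: borrowing `text` guarantees the pointer stays
/// valid for the call, and the status code becomes a `Result`.
pub fn count_non_whitespace(text: &str) -> Result<usize, String> {
    let mut out = 0usize;
    // SAFETY: `text` is live for `text.len()` bytes and `out` is a writable local.
    let status = unsafe { raw_count_non_whitespace(text.as_ptr(), text.len(), &mut out) };
    if status == 0 {
        Ok(out)
    } else {
        Err(format!("raw call failed with status {status}"))
    }
}
```

The wrapper upholds the raw function's preconditions once, at the boundary, so callers of the safe API never write `unsafe` themselves.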
## llama-crab
The safe high-level API. Each module is documented on docs.rs/llama-crab; this page gives the high-level responsibilities.
| Module | Responsibility |
|---|---|
| `backend` | `LlamaBackend`, the GGML backend handle, NUMA placement. |
| `model` | `LlamaModel`: the loaded weights + tokenizer + metadata. |
| `context` | `LlamaContext`: the KV cache + the forward-pass driver. |
| `batch` | `LlamaBatch`: the typed batch builder for decode. |
| `sampling` | `LlamaSampler` + `SamplerChain`: 17 sampling strategies. |
| `chat` | `ChatMessage`, `Role`, `BuiltinTemplate`, `render_builtin`, the tool-call parser. |
| `speculative` | `PromptLookupDecoding`, `DraftModel` trait, `speculative_decode`. |
| `multimodal` (feature `mtmd`) | `MtmdContext`, `MtmdBitmap`, `MtmdInputText`, `chunks.eval`. |
| `cache` | `RamCache`, `DiskCache` (feature `disk-cache`), the `Cache` trait. |
| `json_schema` | The JSON-Schema → GBNF converter. |
| `high_level` | `Llama`, the orchestrator that owns the model + context + sampler state. |
| `error` | `LlamaError`: the single error type for the safe API. |
| `log` | `tracing` integration: info/warn logs around model load. |
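The `speculative` row names prompt-lookup decoding, which drafts tokens without a second model: it finds an earlier occurrence of the most recent n-gram in the token sequence and proposes the tokens that followed it, which the main model then verifies. Below is a minimal sketch of the drafting step only, assuming token IDs are `i32` (llama.cpp's `llama_token` is an `int32_t`). `prompt_lookup_draft` and its parameters are hypothetical; the crate's `PromptLookupDecoding` may use a different matching policy.

```rust
/// Hypothetical drafting step of prompt-lookup decoding.
///
/// Takes the last `ngram` tokens of `tokens`, searches for the most recent
/// earlier occurrence of that n-gram, and returns up to `max_draft` tokens
/// that followed it. Returns an empty draft when nothing matches.
pub fn prompt_lookup_draft(tokens: &[i32], ngram: usize, max_draft: usize) -> Vec<i32> {
    if ngram == 0 || tokens.len() <= ngram {
        return Vec::new();
    }
    let tail_start = tokens.len() - ngram;
    let tail = &tokens[tail_start..];
    // Scan candidate start positions right to left, excluding the tail itself,
    // so the most recent match wins.
    for start in (0..tail_start).rev() {
        if &tokens[start..start + ngram] == tail {
            let from = start + ngram;
            let to = (from + max_draft).min(tokens.len());
            return tokens[from..to].to_vec();
        }
    }
    Vec::new()
}
```

Drafting is cheap because it is only a search over tokens already in the context; the cost of a bad draft is bounded by the verification pass rejecting it.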
### The `Llama` orchestrator
`Llama` is the most-used type in the safe API. It owns:

- A `LlamaBackend` guard (initialised on `Llama::load`).
- A `LlamaModel` (the weights + tokenizer).
- A `LlamaContext` (the KV cache).
- A `SamplerChain` (greedy by default, configurable through `Llama::create_*_with_sampler`).
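Because one value owns all of these handles, teardown order matters: in llama.cpp a context must be freed before the model it was created from, and the backend is shut down last. Rust drops struct fields in declaration order, so one way to encode this is to declare fields in the reverse of construction order. The sketch uses hypothetical `Handle` stand-ins that record when they are dropped; it demonstrates the language rule, not `Llama`'s actual field layout.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a native handle; records its name when dropped.
struct Handle {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Handle {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Hypothetical orchestrator layout. Fields drop top to bottom, so the
/// context is released before the model and the backend guard goes last.
struct Orchestrator {
    sampler: Handle,
    context: Handle,
    model: Handle,
    backend: Handle,
}

fn build(log: &Rc<RefCell<Vec<&'static str>>>) -> Orchestrator {
    let h = |name: &'static str| Handle { name, log: Rc::clone(log) };
    // Struct-expression fields are evaluated in the order written:
    // backend, then model, then context, then sampler.
    Orchestrator {
        backend: h("backend"),
        model: h("model"),
        context: h("context"),
        sampler: h("sampler"),
    }
}
```

Construction order and drop order are controlled separately: the struct expression decides the first, the field declarations decide the second.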
The high-level methods (`create_completion`, `create_chat_completion`, `embed`, `rerank`, `complete_infill`) hide the loop illustrated in the architecture guide behind a single function call.
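Conceptually, the generation loop behind those methods is: run a forward pass, pick the next token, append it, and stop at an end-of-sequence token or a length limit. The sketch shows that loop with greedy (argmax) selection, matching the default sampler described above. `generate_greedy` is hypothetical and `forward` is a hypothetical closure standing in for decode plus logits lookup; a real implementation decodes only the newly added token and reuses the KV cache held by `LlamaContext`, whereas this sketch passes the whole sequence each step for simplicity.

```rust
/// Hypothetical greedy generation loop.
///
/// `forward` returns one logit per vocabulary entry for the next position.
/// Generation stops when `eos` is selected or after `max_new` tokens.
pub fn generate_greedy<F>(prompt: &[i32], eos: i32, max_new: usize, mut forward: F) -> Vec<i32>
where
    F: FnMut(&[i32]) -> Vec<f32>,
{
    let mut tokens = prompt.to_vec();
    let mut generated = Vec::new();
    for _ in 0..max_new {
        let logits = forward(&tokens);
        // Greedy sampling: take the index of the largest logit.
        let next = logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i as i32)
            .expect("forward pass returned no logits");
        if next == eos {
            break; // the end-of-sequence token is not emitted
        }
        tokens.push(next);
        generated.push(next);
    }
    generated
}
```

For example, a toy `forward` that always puts the highest logit on "last token + 1" turns the prompt `[0]` with `eos = 4` into `[1, 2, 3]`.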
## llama-crab-server
A thin HTTP binary built on top of the safe API. It keeps inference inside the Rust binding and uses a worker thread that owns the model and context. See the server guide for the runtime shape and the API surface.
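Keeping the model and context on a single worker thread means request handlers never share them: they send work over a channel and wait for the reply. Below is a std-only sketch of that shape, assuming one job is processed at a time and each job carries its own reply channel. `Job`, `spawn_worker`, and the string-uppercasing stand-in for inference are hypothetical; this page does not describe the server's actual request types or channels.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical request: a prompt plus a channel for the reply.
pub struct Job {
    pub prompt: String,
    pub reply: mpsc::Sender<String>,
}

/// Spawn a worker that owns the (stand-in) inference state for its lifetime.
pub fn spawn_worker() -> (mpsc::Sender<Job>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Job>();
    let handle = thread::spawn(move || {
        // Stand-in for model + context state; it never leaves this thread.
        let mut requests_served = 0u64;
        // The loop ends once every Sender<Job> has been dropped.
        for job in rx {
            requests_served += 1;
            let answer = format!("#{requests_served}: {}", job.prompt.to_uppercase());
            // Ignore send errors: the caller may have stopped waiting.
            let _ = job.reply.send(answer);
        }
    });
    (tx, handle)
}
```

Because only the worker touches the state, it needs no `Mutex`; the channel serialises requests instead.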
## Examples
The `examples/` directory contains 14 self-contained Cargo crates that exercise every public feature. Each one is a `[[bin]]` of its own crate and can be copied into another project without modification. See the examples index for a summary table.
## Integration tests
The `llama-crab/tests/` directory contains the same examples in test form. They skip cleanly when the model is not on disk, so a fresh clone can build and run the test binary without having downloaded the model.
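Rust's built-in test harness has no runtime "skip" status, so a common way to skip cleanly is to check for the model file first and return early with a message. A minimal sketch, assuming a hypothetical helper `model_or_skip` and a hypothetical model path; the real tests may locate their models differently.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: return the path if the model file exists, otherwise
/// print why the test is being skipped and return `None`.
pub fn model_or_skip(path: impl AsRef<Path>) -> Option<PathBuf> {
    let path = path.as_ref();
    if path.is_file() {
        Some(path.to_path_buf())
    } else {
        eprintln!("skipping: model not found at {}", path.display());
        None
    }
}

#[test]
fn text_generation_skips_without_model() {
    // Hypothetical path; the real tests may resolve model locations differently.
    let Some(model_path) = model_or_skip("models/gemma-4-text.gguf") else {
        return; // reported as passed, with the skip message on stderr
    };
    // Load the model from `model_path` and run generation here.
    assert!(model_path.is_file());
}
```

An early-returning test is reported as passed rather than ignored, so the stderr message is the only sign that it was skipped.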
| Test | Model | What it verifies |
|---|---|---|
| `gemma4_text.rs` | Gemma 4 (text-only) | Text generation, no vision. |
| `gemma4_vision.rs` | Gemma 4 + mmproj + test image | The high-level `MtmdContext` API. |
| `lfm_vl_vision.rs` | LFM2.5-VL + mmproj + test image | Multimodal on a smaller model. |
## Where to next?
- Architecture — the data flow inside a single forward pass.
- Cargo features — what each feature toggles.
- API on docs.rs — the auto-generated rustdoc.