Cargo features¶
llama-crab exposes a rich set of Cargo features so you only compile
what you actually need. The default feature set covers the most
common cases — CPU via OpenMP, plus Metal on Apple Silicon — but
production binaries should always pin the exact feature combination
that matches the target environment.
Default features¶
Expands to:
This is enough to run every example and most chatbots on a laptop. For production you almost always want to be explicit:
[dependencies]
llama-crab = { version = "0.1", default-features = false, features = ["metal", "openmp"] }
Feature matrix¶
Compute backends¶
| Feature | What it adds | Notes |
|---|---|---|
openmp |
CPU backend with OpenMP. | Default. |
metal |
Apple GPU backend. | Default on aarch64-apple-darwin. |
cuda |
NVIDIA CUDA backend. | Mutually exclusive with cuda-no-vmm. |
cuda-no-vmm |
CUDA without virtual memory management. | Use on systems where CUDA VMM is restricted. |
vulkan |
Vulkan / SPIR-V backend. | Works on most GPUs (NVIDIA, AMD, Intel, Apple). |
rocm |
AMD ROCm/HIP backend. | Requires a recent ROCm toolchain. |
opencl |
OpenCL backend, primarily for Android Adreno and Arm64 devices. | Requires OpenCL headers and an ICD loader. |
kleidiai |
KleidiAI CPU kernels for Arm mobile targets. | Pairs with openmp or opencl. |
dynamic-link |
Links llama.cpp as a shared object instead of static. | Cuts build time; requires a prebuilt libllama.so/.dylib/.dll. |
dynamic-backends |
Loads GGML backends dynamically. | Useful for plugin architectures. |
system-ggml |
Uses a system GGML installation instead of the bundled copy. | Skips the GGML build step. |
Optional subsystems¶
| Feature | What it adds |
|---|---|
mtmd |
Multimodal support through mtmd.h; enables image and audio helpers. Required for vision. |
common |
Builds llama.cpp's common utilities used by chat and grammar helpers. Required for JSON-Schema → GBNF and the grammar sampler. |
llguidance |
Enables the llguidance sampler integration. Faster and more flexible than the GBNF sampler for complex grammars. |
hf-tokenizer |
Enables Hugging Face tokenizers crate integration. Use when you load a model from a tokenizer.json instead of the GGUF-embedded tokenizer. |
disk-cache |
Enables the persistent sled-backed prompt cache. |
Mobile / Android-only¶
| Feature | What it adds |
|---|---|
shared-stdcxx |
Uses c++_shared for Android builds. |
static-stdcxx |
Uses c++_static for Android builds (the historical default). |
These two are mutually exclusive. If neither is set, Android keeps
the legacy c++_static behaviour.
Recommended combinations¶
Detecting which features are active¶
The compiled LlamaBackend exposes a few capability probes you can
call at runtime:
use llama_crab::LlamaBackend;
let backend = LlamaBackend::init()?;
println!("GPU offload : {}", backend.supports_gpu_offload());
println!("mmap : {}", backend.supports_mmap());
println!("mlock : {}", backend.supports_mlock());
println!("RPC : {}", backend.supports_rpc());
These are particularly useful for diagnostics in binaries that ship to multiple targets.
What about default features in CI?¶
CI pins the feature combinations it actually exercises:
| CI matrix row | Features |
|---|---|
linux-cpu |
openmp |
linux-cuda |
cuda, openmp |
linux-vulkan |
vulkan, openmp |
linux-rocm |
rocm, openmp |
macos-metal |
metal, openmp |
macos-cpu |
openmp |
windows-cpu |
openmp |
This guarantees that the code paths each backend exposes keep working release after release. If you want a backend to be officially supported in CI, open an issue and propose the matrix addition.
Where to next?¶
- Mobile distribution — the iOS / Android
recipes and the
MobilePresetdefaults. - Backends & GPU offload — how to pick a
backend and how
n_gpu_layersworks. - Cargo features reference — the same table, with the long-form description of each feature.