Cargo features¶

The canonical list of Cargo features, with the long-form description of each one. See the getting started guide for a shorter, task-oriented overview.

Default features¶

[dependencies]
llama-crab = "0.1"

Expands to:

features = ["openmp"]
# On `aarch64-apple-darwin`, also enables "metal".
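
The platform rule above can be restated as a small function. This is only an illustrative sketch: `default_features` is a hypothetical helper, not part of llama-crab, and it simply encodes the rule described here (always `openmp`, plus `metal` on `aarch64-apple-darwin`).

```rust
/// Hypothetical helper (not part of llama-crab): returns the features the
/// default set resolves to for a given target triple, per the rule above.
fn default_features(target: &str) -> Vec<&'static str> {
    let mut features = vec!["openmp"];
    // Apple Silicon macOS additionally enables the Metal backend.
    if target == "aarch64-apple-darwin" {
        features.push("metal");
    }
    features
}
```

Calling `default_features("x86_64-unknown-linux-gnu")` yields only `openmp`, while the Apple Silicon triple yields `openmp` followed by `metal`.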

Compute backends¶

Feature	Description
`openmp`	CPU backend with OpenMP. Enabled by default.
`metal`	Apple Metal backend. Enabled by default on `aarch64-apple-darwin`.
`cuda`	NVIDIA CUDA backend.
`cuda-no-vmm`	NVIDIA CUDA backend without virtual memory management.
`vulkan`	Vulkan / SPIR-V backend.
`rocm`	AMD ROCm / HIP backend.
`opencl`	OpenCL backend, primarily for Android Adreno and Arm64 devices.
`kleidiai`	KleidiAI CPU kernels for Arm mobile targets.
`dynamic-link`	Links llama.cpp as a shared library instead of statically.
`dynamic-backends`	Loads GGML backends dynamically.
`system-ggml`	Uses a system GGML installation instead of the bundled copy.

Optional subsystems¶

Feature	Description
`mtmd`	Multimodal support through `mtmd.h`; enables image and audio helpers. Required for vision.
`common`	Builds llama.cpp's `common` utilities used by chat and grammar helpers. Required for JSON-Schema → GBNF and the `grammar` sampler.
`llguidance`	Enables the `llguidance` sampler integration. Faster and more flexible than the GBNF sampler for complex grammars.
`hf-tokenizer`	Enables Hugging Face `tokenizers` crate integration. Use when the tokenizer is loaded from a `tokenizer.json` file instead of the one embedded in the GGUF.
`disk-cache`	Enables the persistent `sled`-backed prompt cache.

Mobile / Android-only¶

Feature	Description
`shared-stdcxx`	Uses `c++_shared` for Android builds.
`static-stdcxx`	Uses `c++_static` for Android builds. The historical default.

These two are mutually exclusive. If neither is set, Android keeps the legacy c++_static behaviour.

Mutually exclusive groups¶

Group	Pick at most one
CUDA variant	`cuda`, `cuda-no-vmm`
Android C++ runtime	`shared-stdcxx`, `static-stdcxx`

The crate's build.rs fails the build with an explicit error if two features from the same group are enabled together.
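
A check of this kind could look roughly like the following. This is a sketch, not the crate's actual build.rs: the function names are hypothetical, and the only external fact it relies on is that Cargo exposes each enabled feature to build scripts as a `CARGO_FEATURE_<NAME>` environment variable (upper-cased, with `-` replaced by `_`).

```rust
/// Mutually exclusive groups, copied from the table above.
const EXCLUSIVE_GROUPS: &[(&str, &[&str])] = &[
    ("CUDA variant", &["cuda", "cuda-no-vmm"]),
    ("Android C++ runtime", &["shared-stdcxx", "static-stdcxx"]),
];

/// Name of the environment variable Cargo sets for an enabled feature
/// when it runs a build script.
fn feature_env_var(feature: &str) -> String {
    format!("CARGO_FEATURE_{}", feature.to_uppercase().replace('-', "_"))
}

/// Hypothetical check: `is_enabled` reports whether a feature is on.
/// Returns an error naming the first group with more than one feature enabled.
fn check_exclusive(is_enabled: impl Fn(&str) -> bool) -> Result<(), String> {
    for (group, features) in EXCLUSIVE_GROUPS {
        let active: Vec<&str> = features
            .iter()
            .copied()
            .filter(|&f| is_enabled(f))
            .collect();
        if active.len() > 1 {
            return Err(format!(
                "features {} are mutually exclusive ({}); enable at most one",
                active.join(", "),
                group
            ));
        }
    }
    Ok(())
}

/// In a build script, the lookup would read Cargo's environment variables.
fn check_from_env() -> Result<(), String> {
    check_exclusive(|f| std::env::var_os(feature_env_var(f)).is_some())
}
```

Taking the lookup as a closure keeps the rule testable outside a build script. A real build.rs would call something like `check_from_env()` and panic with the returned message.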

Recommended combinations¶

Apple Silicon (macOS):

[dependencies]
llama-crab = { version = "0.1", default-features = false, features = ["metal", "openmp"] }

Linux + NVIDIA:

[dependencies]
llama-crab = { version = "0.1", default-features = false, features = ["cuda", "openmp"] }

Linux + AMD:

[dependencies]
llama-crab = { version = "0.1", default-features = false, features = ["rocm", "openmp"] }

Cross-vendor (Vulkan):

[dependencies]
llama-crab = { version = "0.1", default-features = false, features = ["vulkan", "openmp"] }

CPU only:

[dependencies]
llama-crab = { version = "0.1", default-features = false, features = ["openmp"] }

iOS app:

[dependencies]
llama-crab = { version = "0.1", default-features = false, features = ["metal"] }

Build with the dedicated profile:

cargo build --profile release-perf --target aarch64-apple-ios \
    --no-default-features --features metal

Android (Snapdragon / Adreno):

[dependencies]
llama-crab = { version = "0.1", default-features = false, features = ["openmp", "kleidiai", "shared-stdcxx"] }

Build with the size-optimised profile:

cargo build --profile release-size --target aarch64-linux-android \
    --no-default-features --features openmp,kleidiai,shared-stdcxx

Vision-language workload:

[dependencies]
llama-crab = { version = "0.1", default-features = false, features = ["metal", "openmp", "mtmd"] }

Detecting which features are active¶

The compiled LlamaBackend exposes a few capability probes you can call at runtime:

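use llama_crab::LlamaBackend;

// Assumes the error type returned by `init()` implements `std::error::Error`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialise the backend once, then query what this build supports.
    let backend = LlamaBackend::init()?;
    println!("GPU offload : {}", backend.supports_gpu_offload());
    println!("mmap        : {}", backend.supports_mmap());
    println!("mlock       : {}", backend.supports_mlock());
    println!("RPC         : {}", backend.supports_rpc());
    Ok(())
}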
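
One way to use such a probe is to decide how many layers to request for GPU offload before loading a model. The sketch below is self-contained and does not call llama-crab: `pick_gpu_layers` is a hypothetical helper, and its boolean argument stands in for the value returned by `backend.supports_gpu_offload()`.

```rust
/// Hypothetical helper (not part of llama-crab): returns the number of
/// layers to offload. `gpu_offload` stands in for the result of
/// `backend.supports_gpu_offload()`.
fn pick_gpu_layers(gpu_offload: bool, requested: u32) -> u32 {
    if gpu_offload {
        // The build reports GPU offload support: honour the request.
        requested
    } else {
        // No offload support reported: keep every layer on the CPU.
        0
    }
}
```

Falling back to zero layers lets the same binary run on machines where the probe reports no GPU offload, assuming the caller treats the result as an upper bound on offloaded layers.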

What about default features in CI?¶

CI pins the feature combinations it actually exercises:

CI matrix row	Features
`linux-cpu`	`openmp`
`linux-cuda`	`cuda`, `openmp`
`linux-vulkan`	`vulkan`, `openmp`
`linux-rocm`	`rocm`, `openmp`
`macos-metal`	`metal`, `openmp`
`macos-cpu`	`openmp`
`windows-cpu`	`openmp`

If you want a backend to be officially supported in CI, open an issue and propose the matrix addition.
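
To reproduce a matrix row locally, the feature list can be turned into `cargo build` arguments. The following is a sketch under assumptions: `cargo_build_args` is a hypothetical helper, and passing `--no-default-features` is a guess at how "pinning" might be done, not a description of the project's actual CI scripts.

```rust
/// CI matrix rows and their features, copied from the table above.
const CI_MATRIX: &[(&str, &[&str])] = &[
    ("linux-cpu", &["openmp"]),
    ("linux-cuda", &["cuda", "openmp"]),
    ("linux-vulkan", &["vulkan", "openmp"]),
    ("linux-rocm", &["rocm", "openmp"]),
    ("macos-metal", &["metal", "openmp"]),
    ("macos-cpu", &["openmp"]),
    ("windows-cpu", &["openmp"]),
];

/// Hypothetical helper: arguments for a `cargo build` that enables exactly
/// the listed features and nothing from the default set.
fn cargo_build_args(features: &[&str]) -> Vec<String> {
    vec![
        "build".to_string(),
        "--no-default-features".to_string(),
        "--features".to_string(),
        // Cargo accepts a comma-separated feature list.
        features.join(","),
    ]
}

/// Looks up a matrix row by name and builds its argument list.
fn args_for_row(row: &str) -> Option<Vec<String>> {
    CI_MATRIX
        .iter()
        .find(|(name, _)| *name == row)
        .map(|(_, features)| cargo_build_args(features))
}
```

The returned vector can be handed to `std::process::Command::new("cargo").args(...)`. An unknown row name gives `None` instead of silently building with defaults.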

Where to next?¶

Cargo features (getting started) — the task-oriented overview.
Backends & GPU offload — the runtime configuration.
Mobile distribution — the iOS and Android recipes.