embeddings — Embedding extraction¶
A minimal example that loads an embedding GGUF, tokenizes text, runs
a single forward pass with with_embeddings(true) and prints the
L2-norm and a small preview of the resulting vector.
Run¶
Downloads bge-small-en-v1.5-q4_k_m.gguf (~30 MB).
What it does¶
use llama_crab::{Llama, LlamaParams};
let mut llama = Llama::load(
LlamaParams::new("models/bge-small-en-v1.5-q4_k_m.gguf")
.with_n_ctx(512)
.with_embeddings(true),
)?;
let text = "Hello, world!";
let embedding = llama.embed(text, true)?; // true = L2-normalize
let norm = embedding.iter().map(|v| v * v).sum::<f32>().sqrt();
println!("dim={}", embedding.len());
println!("l2_norm={norm:.6}");
The L2-normalised vector has norm = 1.0 (within float precision),
so the dot product of two vectors equals their cosine similarity.
Expected output¶
text: Hello, world!
embedding_dim: 384
embedding_l2_norm: 1.000000
embedding_preview: [0.012345, -0.006789, ...]
Pooling type¶
BGE / GTE / E5 expect CLS pooling — the first token (BOS) is the summary. Use:
use llama_crab::context::params::PoolingType;
use llama_crab::{Llama, LlamaParams};
let mut llama = Llama::load(
LlamaParams::new("bge-small-en-v1.5-q4_k_m.gguf")
.with_n_ctx(512)
.with_embeddings(true)
.with_pooling_type(PoolingType::Cls),
)?;
| Pooling | When to use it |
|---|---|
PoolingType::None |
Token-level embeddings (no pooling). |
PoolingType::Mean |
Default. Sentence-transformers-style models. |
PoolingType::Cls |
BGE / GTE / E5. |
PoolingType::Last |
Last non-pad token. |
PoolingType::Rank |
Cross-encoder rerankers. |
Batch embeddings¶
For multi-document workloads, use embed_texts:
let texts = vec!["Rust is memory-safe.", "Python is dynamic."];
let embeddings = llama.embed_texts(&texts, true)?;
println!("dim={}", embeddings[0].len());
Common variations¶
Pitfalls¶
- Forgot
with_embeddings(true)—embedpanics with "embedding mode is not enabled". - Wrong pooling type — similarity is
NaNor close to zero across all pairs. BGE / GTE / E5 wantCls; sentence-transformers wantMean. - Comparing unnormalised vectors — similarity scores look
wrong. Pass
normalize = truetoembed.
Full source¶
examples/embeddings/src/main.rs.
Where to next?¶
- Semantic search — cosine ranking over a small corpus.
- Reranker — bi-encoder ranking demo.
- Embeddings & reranking guide — the full reference.
- RAG recipe — embeddings in a retrieval pipeline.