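structured: Constrained JSON output
The example converts a JSON Schema to a GBNF grammar and uses that grammar to force the model to emit only JSON of a specific shape. Every sampled token is checked against the grammar, so the output stays valid regardless of model size or prompt phrasing. It parses as complete JSON provided generation is not cut off (for example by a token limit) before the grammar completes.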
Run
The first positional argument is the path to any text GGUF.
What it does
```rust
use llama_crab::high_level::completion::json_schema_grammar;
use llama_crab::high_level::completion::CompletionOptions;
use llama_crab::sampling::{LlamaSampler, SamplerChain};
use llama_crab::{Llama, LlamaParams};
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The shape the output must follow.
    let schema = json!({
        "type": "object",
        "properties": {
            "name": { "type": "string" },
            "age": { "type": "integer" }
        },
        "required": ["name", "age"]
    });

    // Convert the schema to a GBNF grammar string.
    let grammar_text = json_schema_grammar(&schema).unwrap();

    let mut llama = Llama::load(LlamaParams::new("model.gguf").with_n_ctx(1024))?;

    // The grammar sampler masks invalid tokens; greedy picks from what is left.
    let grammar = unsafe { LlamaSampler::grammar(llama.model(), &grammar_text, "root")? };
    let greedy = LlamaSampler::greedy()?;
    let mut sampler = SamplerChain::new()
        .add_sampler(grammar)
        .add_sampler(greedy)
        .build();

    let resp = llama.create_completion_with_sampler(
        "Generate a fictional person as JSON: ",
        CompletionOptions::new(32),
        &mut sampler,
    )?;
    println!("{}", resp.text);
    Ok(())
}
```
Expected output
The model can emit any name or age; the shape is fixed.
How it works
```mermaid
flowchart LR
    A[JSON Schema] -->|json_schema_grammar| B[GBNF string]
    B --> C[LlamaSampler::grammar]
    D[Logits] --> C
    C -->|mask out invalid tokens| E[Sampled token]
    E --> F[Always valid output]
```
The grammar sampler runs after the other filtering samplers in the chain and directly before the final selection sampler. It looks at the output generated so far, computes the set of tokens that would keep that output valid against the grammar, and sets the logits of all other tokens to -inf. The final sampler in the chain (greedy in this example) then picks from the masked distribution.
The result: the model cannot emit a token that would break the grammar.
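The masking step can be illustrated with a small, self-contained sketch. This is not llama_crab code: the four-token vocabulary, the `allowed` flags, and the helper names `mask_logits` and `greedy_pick` are all made up for illustration, and a real grammar sampler derives the allowed set from the grammar state rather than from a hard-coded array.

```rust
/// Set the logit of every disallowed token to negative infinity.
/// `allowed[i]` is true when token `i` keeps the output valid.
fn mask_logits(logits: &mut [f32], allowed: &[bool]) {
    for (logit, &ok) in logits.iter_mut().zip(allowed) {
        if !ok {
            *logit = f32::NEG_INFINITY;
        }
    }
}

/// Greedy selection: index of the largest logit that is not masked,
/// or None when every token has been masked out.
fn greedy_pick(logits: &[f32]) -> Option<usize> {
    let mut best: Option<usize> = None;
    for (i, &logit) in logits.iter().enumerate() {
        if logit == f32::NEG_INFINITY {
            continue; // masked by the grammar
        }
        match best {
            Some(b) if logits[b] >= logit => {}
            _ => best = Some(i),
        }
    }
    best
}

fn main() {
    // Hypothetical toy vocabulary; real vocabularies are far larger.
    let vocab = ["hello", "{", "\"name\"", "}"];
    // The model prefers "hello", which is not valid JSON here.
    let mut logits = [3.0_f32, 1.5, 0.2, -1.0];
    // At the start of a JSON object only "{" keeps the output valid.
    let allowed = [false, true, false, false];

    mask_logits(&mut logits, &allowed);
    let pick = greedy_pick(&logits).expect("the grammar allows at least one token");
    println!("picked {:?}", vocab[pick]);
}
```

Even though the unmasked logits favour "hello", the only token left to pick after masking is "{".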
Supported JSON-Schema features
The converter understands a useful subset of JSON Schema 2020-12:
| Feature | Status |
|---|---|
| type: object with properties, required, additionalProperties | ✅ |
| type: array with items, prefixItems, minItems, maxItems | ✅ |
| type: string with minLength, maxLength, pattern | ✅ |
| type: integer / number with minimum, maximum, … | ✅ |
| type: boolean, null | ✅ |
| enum, const | ✅ |
| format: date-time, email, uri, uuid | ✅ |
| oneOf, anyOf, allOf | ✅ |
| $ref (local #/definitions/...) | ✅ |
| definitions, $defs | ✅ |
See the grammars guide for the full table.
A more complex schema
Suppose you want a list of people, each with a name, age, and optional email:
```rust
let schema = json!({
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": { "type": "string" },
            "age": { "type": "integer", "minimum": 0 },
            "email": { "type": "string", "format": "email" }
        },
        // "email" is not listed here, so it stays optional.
        "required": ["name", "age"]
    },
    "minItems": 1,
    "maxItems": 5
});
```
Generating the grammar and building the sampler chain work the same way as in the first example. The output is then always an array of one to five valid person objects.
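To give an intuition for how bounds such as minItems and maxItems can end up in a grammar, here is a sketch that expands them into a rule made of required items followed by nested optional groups. The function `bounded_array_rule` is hypothetical and written only for this illustration. It is not the llama_crab converter, whose actual output may look different, and it leaves out whitespace rules and the minItems = 0 case.

```rust
/// Build a GBNF-style rule body for an array holding between `min` and
/// `max` occurrences of `item`. Illustration only: covers 1 <= min <= max.
fn bounded_array_rule(item: &str, min: usize, max: usize) -> String {
    assert!(min >= 1 && min <= max, "this sketch only covers 1 <= min <= max");

    // Required part: the first item, then (min - 1) comma-prefixed items.
    let mut rule = format!("\"[\" {}", item);
    for _ in 1..min {
        rule.push_str(&format!(" \",\" {}", item));
    }

    // Optional part: (max - min) optional items, nested from the inside
    // out so that item n + 1 can only appear when item n is present.
    let mut tail = String::new();
    for _ in min..max {
        tail = format!(" (\",\" {}{})?", item, tail);
    }
    rule.push_str(&tail);

    rule.push_str(" \"]\"");
    rule
}

fn main() {
    // The bounds used in the schema above: one to five person objects.
    println!("root ::= {}", bounded_array_rule("person", 1, 5));
}
```

Because each extra item sits inside the optional group of the previous one, a grammar of this form cannot accept a sixth element or an empty array.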
Combining with sampling
The grammar sampler is an addition to the chain, not a replacement for it. A typical chain is:
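```rust
// `grammar` and `greedy` are built as in the first example.
let mut sampler = SamplerChain::new()
    .temp(0.7)            // temperature
    .top_p(0.9, 1)        // nucleus
    .add_sampler(grammar) // grammar: after all other filters
    .add_sampler(greedy)  // final selection from the masked distribution
    .build();
```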
Putting the grammar after the other filters, directly before the final selection sampler, ensures every token the chain emits keeps the output valid.
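The following toy pipeline sketches why the final selection step cannot choose a masked token. It is a simplified model of a sampler chain, not the llama_crab implementation: the helper names, logits, and allowed set are assumptions made for illustration, and top-p is left out to keep it short. The point it shows is that a logit of -inf becomes a probability of exactly zero under softmax.

```rust
/// Divide every logit by the temperature (assumed to be > 0).
fn apply_temperature(logits: &mut [f32], temperature: f32) {
    for logit in logits.iter_mut() {
        *logit /= temperature;
    }
}

/// Mask every token the grammar disallows.
fn apply_grammar_mask(logits: &mut [f32], allowed: &[bool]) {
    for (logit, &ok) in logits.iter_mut().zip(allowed) {
        if !ok {
            *logit = f32::NEG_INFINITY;
        }
    }
}

/// Numerically stable softmax. Assumes at least one logit is finite.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    // Hypothetical logits for a four-token vocabulary.
    let mut logits = vec![2.0_f32, 1.0, 0.5, -0.5];
    // Assume the grammar currently allows only tokens 1 and 2.
    let allowed = [false, true, true, false];

    // Same order as the chain above: other filters first, grammar after them.
    apply_temperature(&mut logits, 0.7);
    apply_grammar_mask(&mut logits, &allowed);

    // The selection step sees zero probability for every masked token.
    let probs = softmax(&logits);
    for (i, p) in probs.iter().enumerate() {
        println!("token {}: p = {:.3}", i, p);
    }
}
```

Whatever selection strategy runs last, greedy or random, it only ever sees non-zero probability on tokens the grammar allows.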
Common pitfalls
| Pitfall | What goes wrong | Fix |
|---|---|---|
| common feature not enabled | LlamaSampler::grammar is not in scope. | Add features = ["common"] to the dependency. |
| Schema with no type keyword | Grammar is unconstrained. | Add type: object (or whatever the root is). |
| Grammar sampler runs before another filtering sampler | The second sampler picks an invalid token. | Put the grammar sampler after all other filtering samplers, directly before the final selection sampler. |
| Model is too small | Output is valid but semantically off. | Increase model size or improve the prompt. |
Full source
examples/structured/src/main.rs.
Where to next?
- Grammars guide: the underlying safe API.
- Server structured output: the HTTP response_format field.
- Tool calling: when the structured output is a function call, use the chat pipeline instead.