stateful_chat — Interactive REPL¶
A multi-turn chat REPL that grows the conversation history on every
turn. Supports /clear, /save, and EOF to quit. The model is
loaded once and reused; only the message list grows.
Run¶
Downloads Qwen2.5-0.5B-Instruct-GGUF (~400 MB).
Commands¶
| Command | Action |
|---|---|
/exit |
Quit (also /quit, /q, or Ctrl+D). |
/clear |
Reset history (keeps the system message). |
/save |
Print the conversation as JSON. |
| anything else | Sent as a user message. |
What it does¶
use llama_crab::chat::BuiltinTemplate;
use llama_crab::high_level::chat_completion::{create_chat_completion_with, ChatMessage};
use llama_crab::{Llama, LlamaParams, Role};
let mut llama = Llama::load(LlamaParams::new("model.gguf").with_n_ctx(4096))?;
let mut history: Vec<ChatMessage> = vec![
ChatMessage::new(Role::System,
"You are a helpful, concise assistant. Always reply in English, in under 2 sentences."),
];
// On every user turn:
history.push(ChatMessage::new(Role::User, "What is Rust?"));
let resp = create_chat_completion_with(
&mut llama, &history, BuiltinTemplate::ChatMl, &[], 128,
)?;
history.push(ChatMessage::new(Role::Assistant, resp.content));
The history is the entire context. Each high-level call re-sends
and re-evaluates the rendered history. Prompt-cache/session APIs
are manual tools for lower-level loops; they are not used
automatically by create_chat_completion_with.
Expected output¶
🦀 llama-crab interactive chat
model : models/qwen2.5-0.5b-instruct-q4_k_m.gguf
commands: /exit /clear /save
> What is Rust?
(0.81s)
assistant> Rust is a memory-safe systems programming language.
> /save
[
{ "role": "system", ... },
{ "role": "user", "content": "What is Rust?" },
{ "role": "assistant", "content": "Rust is a ..." }
]
Command parsing¶
The REPL parses the input line and dispatches:
flowchart TD
A[Read line] --> B{Starts with /?}
B -- no --> C[history.push user]
C --> D[create_chat_completion_with]
D --> E[history.push assistant]
B -- yes --> F{Which command?}
F -- /clear --> G[Reset history]
F -- /save --> H[Print history as JSON]
F -- /exit --> I[Break loop]
F -- else --> J[Show help]
Trimming history¶
The context size caps the number of tokens the model can see. When
the history grows past n_ctx, the REPL trims the oldest turns
(keeping the system message):
const MAX_HISTORY_TURNS: usize = 40;
if history.len() > MAX_HISTORY_TURNS {
let system = history[0].clone();
history = std::iter::once(system)
.chain(history.into_iter().skip(1).rev().take(MAX_HISTORY_TURNS).rev())
.collect();
}
For a smarter strategy, see the stateful chat guide.
Persisting the session¶
The /save command serialises the history to JSON. The REPL does
not auto-load it on the next start, but you can add a --load <file>
flag and call:
let raw = std::fs::read_to_string(path)?;
let history: Vec<ChatMessage> = serde_json::from_str(&raw)?;
Full source¶
examples/stateful_chat/src/main.rs.
Where to next?¶
- Chatbot recipe — turn this REPL into a deployable agent.
- Caching & session state — skip re-evaluating previous turns.
- Tool calling — add function calls to the loop.