Skip to content

Export Formats

Export a trained model to safetensors, ONNX, or GGUF for deployment.


Quick start

# Export to safetensors (always available, no extra install)
xlmtec export output/run1 --format safetensors --output exported/

# Export to ONNX with fp16 quantisation
xlmtec export output/run1 --format onnx --quantize fp16 --output exported/

# Export to GGUF with 4-bit quantisation
xlmtec export output/run1 --format gguf --quantize q4_0 --output exported/

# Preview without writing any files
xlmtec export output/run1 --format safetensors --dry-run

Supported formats

Format Install required Quantise options Best for
safetensors none Safe storage, fast loading
onnx pip install xlmtec[onnx] fp32, fp16, int8 Cross-platform inference, ONNX Runtime
gguf pip install xlmtec[gguf] + llama.cpp q4_0, q4_1, q8_0, f16 llama.cpp / LM Studio / Ollama

Installing extras

# ONNX export
pip install "xlmtec[onnx]"

# GGUF export
pip install "xlmtec[gguf]"
# Also requires llama.cpp — clone and build, then pass --llama-cpp-dir
git clone https://github.com/ggerganov/llama.cpp

Options

Flag Default Description
--format required safetensors, onnx, or gguf
--output exported/ Directory to write exported files
--quantize none Quantisation type (format-specific)
--llama-cpp-dir auto-detect Path to llama.cpp repo root (GGUF only)
--dry-run off Validate only — do not write files

Dry run

--dry-run validates your model directory and options without loading the model or writing any files. Useful to confirm everything is correct before a long export:

xlmtec export output/run1 --format onnx --quantize fp16 --dry-run
┌─ Export Plan ─────────────────────────────────────┐
│  Source   : output/run1                           │
│  Format   : onnx                                  │
│  Quantise : fp16                                  │
│  Output   : exported/                             │
│  Deps     : optimum ✓  onnx ✓  onnxruntime ✓     │
└───────────────────────────────────────────────────┘
Remove --dry-run to export.

GGUF workflow

GGUF export calls llama.cpp's convert_hf_to_gguf.py script:

# 1. Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# 2. Export with xlmtec
xlmtec export output/run1 \
  --format gguf \
  --quantize q4_0 \
  --llama-cpp-dir ~/llama.cpp \
  --output exported/

The resulting .gguf file can be loaded directly by LM Studio, Ollama, or llama.cpp's main binary.


Merge adapter before export

If your run used LoRA/QLoRA, merge the adapter into the base model first:

xlmtec merge output/run1 --output merged/
xlmtec export merged/ --format gguf --quantize q4_0 --output exported/