The end-to-end open-source platform for on-device LLM deployment. Train with custom Triton kernels, export to CoreML, run on Apple Neural Engine.
From training to deployment in minutes, not months.
LoRA/QLoRA fine-tuning with custom Triton kernels. 8x faster training on consumer GPUs.
Convert to CoreML with ANE-optimized LUT quantization. 4-bit, 6-bit, and 8-bit support.
Run on Apple Neural Engine. Not CPU — NPU. Real-time inference on iPhone and iPad.
Everything you need to train and deploy LLMs on-device, nothing you don't.
Hand-optimized RMSNorm (8x), SwiGLU (5x), and RoPE (2x) kernels for dramatically faster training.
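For reference, the operation the RMSNorm kernel fuses is simple to state. A pure-Python sketch of the math (this is only a reference implementation, not the Triton kernel itself):

```python
import math

def rmsnorm(x, weight, eps=1e-6):
    """Reference RMSNorm: scale each element by the reciprocal of the
    vector's root-mean-square. The Triton kernel fuses the reduction,
    rsqrt, and scaling into a single GPU pass; this version only shows
    the math being fused."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]

hidden = [1.0, 2.0, 3.0, 4.0]
normed = rmsnorm(hidden, [1.0] * 4)  # output has unit mean square
```

The speedup comes from fusion: a naive implementation launches separate kernels for the square, mean, rsqrt, and multiply, each reading and writing the full tensor.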
Full support for OLoRA, PiSSA, DoRA, RSLoRA, and LoftQ initialization — go beyond standard LoRA.
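All of these variants share the same underlying low-rank update: a frozen weight matrix W is augmented with a scaled product of two small matrices, W + (alpha/r)·B·A. A toy sketch of the merge step (illustrative only; the names below are not Nimbo's internals):

```python
def merge_lora(W, A, B, r, alpha):
    """Merge a LoRA adapter into a frozen weight matrix.

    W: (out, in) base weights; B: (out, r); A: (r, in).
    The merged weight is W + (alpha / r) * B @ A. Variants such as
    PiSSA, OLoRA, or LoftQ change how A and B are initialized (and
    RSLoRA changes the scaling), not this basic shape."""
    scale = alpha / r
    merged = [row[:] for row in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            delta = sum(B[i][k] * A[k][j] for k in range(r))
            merged[i][j] += scale * delta
    return merged

W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]  # (out, r) with r=1
A = [[0.0, 2.0]]    # (r, in)
merged = merge_lora(W, A, B, r=1, alpha=2)
# merged == [[1.0, 4.0], [0.0, 1.0]]
```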
ANE-optimized export pipeline with LUT quantization. Deploy directly to Apple Neural Engine.
~50 MB install vs. 500 MB+ alternatives. Minimal dependencies, maximum speed.
Instruction tuning with masked loss — train on completions only for cleaner, more focused models.
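Masked completion loss is conventionally implemented by setting prompt-token labels to an ignore index (typically -100) so the cross-entropy loss averages only over response tokens. A minimal sketch of the masking step (illustrative, not Nimbo's internals):

```python
IGNORE_INDEX = -100  # conventional cross-entropy ignore index

def mask_prompt_labels(token_ids, prompt_len):
    """Copy labels from the token ids, masking every token before the
    response so the loss is computed on completions only."""
    return [IGNORE_INDEX if i < prompt_len else tok
            for i, tok in enumerate(token_ids)]

# Prompt occupies the first 3 tokens; only the last 2 contribute to loss.
labels = mask_prompt_labels([101, 7592, 2088, 999, 102], prompt_len=3)
# labels == [-100, -100, -100, 999, 102]
```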
Built-in W&B integration, early stopping, memory monitoring, and checkpoint management.
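Early stopping in trainers of this kind typically tracks the best validation loss and halts after a fixed patience window with no improvement. A minimal sketch of that logic (a generic pattern, not Nimbo's actual implementation):

```python
class EarlyStopping:
    """Stop training after `patience` evaluations without improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, val_loss):
        """Record a validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience

stopper = EarlyStopping(patience=2)
history = [2.0, 1.5, 1.6, 1.7]  # loss stops improving after epoch 2
stops = [stopper.step(loss) for loss in history]
# stops == [False, False, False, True]
```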
From training to on-device inference — clean APIs at every step.
```python
from nimbo import Nimbo, LoRAConfig, TrainingConfig

# Initialize trainer with model and dataset
trainer = Nimbo(
    base_model_name="meta-llama/Llama-3.2-1B",
    dataset="your_dataset.jsonl",
    lora_config=LoRAConfig(r=16, lora_alpha=32),
    training_config=TrainingConfig(
        learning_rate=2e-4,
        num_train_epochs=3,
        train_on_responses_only=True,  # Masked loss on completions
    ),
    use_triton_kernels=True,  # 8x faster RMSNorm, SwiGLU, RoPE
)

trainer.train()
trainer.save()  # Merged model → ./nimbo_output/final_merged
```
```python
from nimbo.export.coreml import convert_hf_to_coreml

# Convert merged model to CoreML with LUT quantization
result = convert_hf_to_coreml(
    model_id="./nimbo_output/final_merged",
    output_dir="./coreml_output",
    lut_bits=4,        # 4-bit LUT quantization for ANE
    context_length=512,
    split_model=True,  # Split: embeddings, decoder, lm_head
)

# Output: .mlpackage files + meta.yaml + tokenizer
print(result.output_paths)
```
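LUT (lookup-table) quantization maps each weight to the nearest of 2^bits table entries and stores only the small indices plus the table; with `lut_bits=4` that is 16 entries. A toy sketch of the encode/decode round trip using a uniform table (real palettization, as in CoreML, fits the table to the weight distribution, e.g. with k-means; this only illustrates the idea):

```python
def lut_quantize(weights, bits=4):
    """Quantize weights to indices into a uniform lookup table.

    Returns (indices, table). A production LUT quantizer learns the
    table from the weights; a uniform table keeps this self-contained."""
    n = 2 ** bits
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (n - 1)
    table = [lo + i * step for i in range(n)]
    indices = [min(range(n), key=lambda i: abs(w - table[i])) for w in weights]
    return indices, table

def lut_dequantize(indices, table):
    """Recover approximate weights by table lookup."""
    return [table[i] for i in indices]

w = [-0.30, -0.01, 0.12, 0.30]
idx, table = lut_quantize(w, bits=4)
restored = lut_dequantize(idx, table)
```

Each index fits in 4 bits, so the decoder weights shrink roughly 4x versus fp16 while the 16-entry table adds negligible overhead.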
```swift
import NimboCore

// Load model from Files app or bundle
let manager = InferenceManager()
try await manager.loadModel(from: modelURL)

// Generate with streaming tokens
try await manager.generate(
    prompt: "Explain quantum computing",
    maxTokens: 512,
    temperature: 0.7
) { token in
    print(token, terminator: "")
}

// Runs on Apple Neural Engine — not CPU
```
Train, export, and deploy popular open-source LLMs.
NimboChat — a production-ready SwiftUI chat app powered by on-device inference.
A fully featured iOS chat application that demonstrates real-time LLM inference on Apple Neural Engine. Built with SwiftUI, powered by NimboCore.
See how Nimbo compares to other fine-tuning frameworks.
| Feature | Nimbo | Transformers | Unsloth |
|---|---|---|---|
| Install size | ~50 MB | 500 MB+ | 200 MB+ |
| Dependencies | Minimal | Heavy | Moderate |
| CoreML export | ✓ Built-in | ✗ | ✗ |
| On-device sample app | ✓ NimboChat | ✗ | ✗ |
| NPU support | ✓ Apple ANE | ✗ | ✗ |
| Custom Triton kernels | ✓ 8x speedup | ✗ | ✓ |
| Learning curve | Low | Steep | Moderate |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 |
Get started with Nimbo in minutes. Open source, Apache 2.0 licensed, community driven.