Open Source · Apache 2.0 License

Fine-tune. Export.
Deploy on NPU.

The end-to-end open-source platform for on-device LLM deployment. Train with custom Triton kernels, export to CoreML, run on Apple Neural Engine.

$ pip install nimbo
$ nimbo train --model meta-llama/Llama-3.2-1B
$ nimbo export --format coreml --quantize lut_4bit

Three steps to on-device AI

From training to deployment in minutes, not months.

Step 1

Train

LoRA/QLoRA fine-tuning with custom Triton kernels. 8x faster training on consumer GPUs.

📦
Step 2

Export

Convert to CoreML with ANE-optimized LUT quantization. 4-bit, 6-bit, and 8-bit support.

📱
Step 3

Deploy

Run on Apple Neural Engine. Not CPU — NPU. Real-time inference on iPhone and iPad.

8x
Faster training with custom Triton kernels

4-bit
LUT quantization for on-device deployment

3 lines
To start fine-tuning

NPU
Apple Neural Engine, not CPU

Built for performance

Everything you need to train and deploy LLMs on-device, nothing you don't.

Custom Triton Kernels

Hand-optimized RMSNorm (8x), SwiGLU (5x), and RoPE (2x) kernels for dramatically faster training.
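The computation those RMSNorm kernels fuse is simple to state in plain array code. A minimal NumPy reference (illustrative only; the actual speedup comes from Nimbo's fused Triton implementation, which this sketch does not reproduce):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Reference RMSNorm: x / sqrt(mean(x^2) + eps) * weight.

    A fused kernel computes this in a single pass over the hidden
    dimension instead of several separate array operations.
    """
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

hidden = np.array([[1.0, 2.0, 3.0, 4.0]])
weight = np.ones(4)
out = rms_norm(hidden, weight)   # normalized: mean of out**2 is ~1
```

The fused version wins by reading each row once and keeping the reduction in registers; the math itself is unchanged.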

🔧

Advanced LoRA Variants

Full support for OLoRA, PiSSA, DoRA, RSLoRA, and LoftQ initialization — go beyond standard LoRA.
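All of these variants share the same low-rank update at their core: the frozen weight W is augmented with a scaled product of two small matrices, W + (alpha/r)·B·A, and they differ mainly in how A and B are initialized. A toy NumPy sketch of the forward pass (an illustration of the math, not Nimbo's internals):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4          # hidden size, LoRA rank, scaling alpha

W = rng.normal(size=(d, d))    # frozen pretrained weight
A = rng.normal(size=(r, d))    # trainable down-projection
B = np.zeros((d, r))           # trainable up-projection, zero-initialized

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Base path plus the low-rank update, scaled by alpha / r
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(1, d))
# With B zero-initialized (standard LoRA), the adapter is a no-op at step 0
assert np.allclose(lora_forward(x), x @ W.T)
```

Variants such as PiSSA and OLoRA replace the zero/random initialization with decompositions of W itself so the adapter starts closer to the pretrained weights.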

🍏

CoreML Export

ANE-optimized export pipeline with LUT quantization. Deploy directly to Apple Neural Engine.
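LUT (lookup-table) quantization stores each weight as an index into a small palette; with 4 bits the palette has 16 entries. A simplified sketch of the idea with a uniformly spaced palette (for illustration only; production palettization, such as coremltools', typically learns the palette with k-means):

```python
import numpy as np

def lut_quantize(w: np.ndarray, bits: int = 4):
    """Map each weight to the nearest entry of a 2**bits palette."""
    palette = np.linspace(w.min(), w.max(), 2 ** bits)
    idx = np.abs(w[..., None] - palette).argmin(axis=-1)  # per-weight index
    return idx.astype(np.uint8), palette

def lut_dequantize(idx: np.ndarray, palette: np.ndarray) -> np.ndarray:
    return palette[idx]

w = np.random.default_rng(1).normal(size=(64,))
idx, palette = lut_quantize(w, bits=4)    # 64 four-bit indices + 16 floats
w_hat = lut_dequantize(idx, palette)
```

The storage win is what matters on-device: 64 float32 weights become 64 four-bit indices plus a 16-entry table, and the ANE can dequantize through the table at inference time.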

🔥

Minimal & Fast

~50 MB install vs 500 MB+ alternatives. Minimal dependencies, maximum speed.

🎯

Response-Only Training

Instruction tuning with masked loss — train on completions only for cleaner, more focused models.
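In practice, response-only training means setting the prompt tokens' labels to the ignore index (-100 in the PyTorch/Hugging Face convention) so the cross-entropy loss counts only completion tokens. A standalone sketch of that masking step (the helper name is hypothetical, not Nimbo's API):

```python
def mask_prompt_labels(input_ids, prompt_len, ignore_index=-100):
    """Copy input_ids into labels, masking the prompt so loss is
    computed on the response tokens only."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = ignore_index
    return labels

# Tokens 0-3 are the instruction, the rest is the response
labels = mask_prompt_labels([101, 7, 8, 9, 42, 43, 44], prompt_len=4)
# → [-100, -100, -100, -100, 42, 43, 44]
```

Positions labeled -100 are skipped by the loss, so gradients come only from the model's own responses.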

📈

Production Callbacks

Built-in W&B integration, early stopping, memory monitoring, and checkpoint management.
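Early stopping is the simplest of these to sketch: track the best eval loss and stop after N evaluations without improvement. A minimal standalone version of the technique (Nimbo's actual callback signatures may differ):

```python
class EarlyStopping:
    """Stop training after `patience` evals without eval-loss improvement."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, eval_loss: float) -> bool:
        """Record one eval result; return True when training should stop."""
        if eval_loss < self.best - self.min_delta:
            self.best = eval_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience

stopper = EarlyStopping(patience=2)
losses = [1.0, 0.8, 0.9, 0.85]   # one improvement, then two bad evals
stops = [stopper.step(l) for l in losses]
# stops → [False, False, False, True]
```

Pairing this with checkpoint management means the run halts on the stop signal and you keep the checkpoint from the best eval, not the last one.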

Simple by design

From training to on-device inference — clean APIs at every step.

from nimbo import Nimbo, LoRAConfig, TrainingConfig

# Initialize trainer with model and dataset
trainer = Nimbo(
    base_model_name="meta-llama/Llama-3.2-1B",
    dataset="your_dataset.jsonl",
    lora_config=LoRAConfig(r=16, lora_alpha=32),
    training_config=TrainingConfig(
        learning_rate=2e-4,
        num_train_epochs=3,
        train_on_responses_only=True,  # Masked loss on completions
    ),
    use_triton_kernels=True,          # 8x faster RMSNorm, SwiGLU, RoPE
)

trainer.train()
trainer.save()   # Merged model → ./nimbo_output/final_merged

from nimbo.export.coreml import convert_hf_to_coreml

# Convert merged model to CoreML with LUT quantization
result = convert_hf_to_coreml(
    model_id="./nimbo_output/final_merged",
    output_dir="./coreml_output",
    lut_bits=4,           # 4-bit LUT quantization for ANE
    context_length=512,
    split_model=True,     # Split: embeddings, decoder, lm_head
)

# Output: .mlpackage files + meta.yaml + tokenizer
print(result.output_paths)

import NimboCore

// Load model from Files app or bundle
let manager = InferenceManager()
try await manager.loadModel(from: modelURL)

// Generate with streaming tokens
try await manager.generate(
    prompt: "Explain quantum computing",
    maxTokens: 512,
    temperature: 0.7
) { token in
    print(token, terminator: "")
}

// Runs on Apple Neural Engine — not CPU

Works with your favorite models

Train, export, and deploy popular open-source LLMs.

LLaMA 3.1 8B
EXAONE 4.0
Phi-3
Qwen2
Mistral 7B
Gemma 2B

See it running on iPhone

NimboChat — a production-ready SwiftUI chat app powered by on-device inference.

📱
NimboChat Screenshot

NimboChat

A fully-featured iOS chat application that demonstrates real-time LLM inference on Apple Neural Engine. Built with SwiftUI, powered by NimboCore.

  • Streaming token generation with real-time display
  • Multiple model support — switch models on the fly
  • Conversation management with persistent history
  • CoreML + Apple Neural Engine inference
  • Files app integration for model loading
  • Dark mode UI, optimized for iPhone
View on GitHub

Why Nimbo?

See how Nimbo compares to other fine-tuning frameworks.

                        Nimbo          Transformers   Unsloth
Install size            ~50 MB         500 MB+        200 MB+
Dependencies            Minimal        Heavy          Moderate
CoreML export           ✓ Built-in
On-device sample app    ✓ NimboChat
NPU support             ✓ Apple ANE
Custom Triton kernels   ✓ 8x speedup
Learning curve          Low            Steep          Moderate
License                 Apache 2.0     Apache 2.0     Apache 2.0

Ready to deploy LLMs
on-device?

Get started with Nimbo in minutes. Open source, Apache 2.0 licensed, and community-driven.