Understanding MLX Swift on Apple Silicon

May 26, 2026 · #Swift #iOS #AI #ML #Silicon

MLX Swift brings Apple’s MLX array framework to Swift, letting developers experiment with model execution and training directly on Apple silicon. This guide explains its architecture, unified-memory model, Swift APIs, LLM tooling, and the boundary between MLX Swift research workflows and Core ML deployment.

1. What MLX Swift Is (and Isn’t)

Before anything else, the positioning matters. The Swift.org announcement by the authors is direct:

“MLX is intended for research and not for production deployment of models in apps.” — David Koski, Awni Hannun, Ronan Collobert, swift.org/blog/mlx-swift

MLX Swift’s stated purpose is to make ML research and experimentation easier on Apple silicon for developers and researchers already working in Swift. It is not a replacement for Core ML, and the two serve different roles:

Core ML is Apple’s production deployment framework — model packaging, hardware-specific optimization (Neural Engine, GPU, CPU), App Store compatibility, versioned model formats, and tooling for model compression and fine-tuning for deployment.
MLX Swift is a researcher-friendly, dynamic array framework. You write ML code directly in Swift, iterate quickly, load models straight from Hugging Face, and experiment without conversion pipelines.

The distinction matters for how you evaluate MLX Swift. For production apps, Core ML is well supported and, as of WWDC24, handles generative models, stateful KV caches, transformer operations, on-device adapters, and model compression, including a demonstrated workflow with Mistral 7B. When building for the App Store, that's your primary framework.

MLX Swift is the right tool when: you’re exploring a model architecture in Swift, prototyping an on-device ML workflow, reproducing a Python MLX experiment in a native language, or building developer tooling and research applications where the production deployment constraints don’t apply yet.

With that framing clear, there’s a lot to explore.

2. What Is MLX, Really?

MLX is an array framework for machine learning research on Apple silicon, developed by Apple’s ML Research team and open-sourced in late 2023. Think of it as a NumPy/JAX-style library designed around Apple Silicon’s unique hardware characteristics.

At its core, MLX provides:

N-dimensional arrays with a NumPy-like API
Automatic differentiation for training
Lazy evaluation — operations build a computation graph and execute only when results are needed
Function transformations: grad(), valueAndGrad(), vmap(), and compile() — in the style of JAX
Unified compute: the same array object lives in unified memory and is accessible to both CPU and GPU

The Python MLX library (mlx) was the original implementation. MLX Swift is a Swift-native API wrapping the same underlying C++ engine, with an explicit goal of API parity with Python MLX. The repository tracks the same version numbering: MLX Swift 0.31.3 corresponds to the same generation as Python MLX 0.31.x.

3. Architecture: Why 82% of a Swift Library Is C++

Clone the repo and inspect the language breakdown: 82.1% C++, 13.7% Swift, with small amounts of C, Metal, Python, and CMake.

This isn’t a surprise once you understand the architecture. MLX Swift is a Swift interface over a C++ engine, bridged through a C layer.

The Layer Cake

┌──────────────────────────────────┐
│         Swift API (MLX Swift)    │  ← What you write
├──────────────────────────────────┤
│         mlx-c (C bindings)       │  ← Bridge layer
├──────────────────────────────────┤
│         MLX (C++ core)           │  ← The actual engine
└──────────────────────────────────┘

MLX (C++ core): The computational engine. Array operations, automatic differentiation, the computation graph, Metal GPU dispatch, memory management. This same core also powers the Python MLX bindings.

mlx-c: A thin C interface over the C++ core. Swift’s C++ interop exists but has limits; a clean C boundary makes the integration predictable and stable across compiler versions.

MLX Swift: The Swift layer. It wraps mlx-c and provides idiomatic Swift types, operator overloading, generics, protocol conformances, and Swift concurrency hooks. The Swift code is relatively thin — its job is ergonomics, not computation.

Both mlx and mlx-c are included as git submodules under Source/Cmlx/. This is why the first required step after cloning is:

git submodule update --init --recursive

Missing this step produces cryptic “file not found” errors deep in the Cmlx directory — one of the most common stumbling blocks for new users.

The Metal Shader Constraint

MLX Swift includes Metal shaders for GPU acceleration, and Metal shaders must be compiled by Xcode. The command-line swift build tool cannot compile .metal files.

Practical consequences:

GPU-accelerated builds require Xcode or xcodebuild
swift build from the terminal does not compile the Metal shaders, so the resulting binary cannot run work on the GPU
On Linux, Metal doesn’t exist; you use the CPU or CUDA backend via CMake instead

4. The Unified Memory Advantage

MLX’s design is explicitly shaped around Apple Silicon’s unified memory architecture, which is worth understanding before diving into code.

Traditional GPU computing has separate memory pools:

CPU RAM ←── PCIe Bus ──→ GPU VRAM (separate pool)

Data moving from CPU preprocessing to GPU inference crosses the PCIe bus — a relatively slow transfer. “Fitting a model in VRAM” is a real constraint, separate from total system RAM.

Apple Silicon works differently:

CPU Cores + GPU Cores
          ↕
    Unified Memory (shared physical pool)

Every compute unit — CPU cores and GPU cores — reads from and writes to the same physical memory. There is no separate VRAM. An array written by the CPU is immediately available to the GPU at zero transfer cost.

Per Apple’s MLX documentation, this means: “arrays live in shared memory. Operations on the CPU and GPU can be performed without transferring data between devices.” MLX’s allocator is designed around this, which is why memory-bound workloads on Apple Silicon often behave differently than on discrete GPU setups.
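To make the quoted behavior concrete, here is a minimal sketch, assuming a Metal-enabled build (Xcode or xcodebuild, per the shader constraint in Section 3) so that a GPU stream is available. It feeds the same two arrays to a GPU stream and a CPU stream through the stream: parameter that MLX operations accept, without requesting any copy. The shapes and tolerances are arbitrary choices for illustration.

```swift
import MLX
import MLXRandom

// Both inputs are allocated once, in unified memory.
let a = MLXRandom.normal([64, 32])
let b = MLXRandom.normal([32, 64])

// The same arrays feed a GPU stream and a CPU stream; no transfer is requested.
let onGPU = matmul(a, b, stream: .gpu)
let onCPU = matmul(a, b, stream: .cpu)
eval(onGPU, onCPU)

// Both devices computed the same product, up to floating-point rounding.
print(allClose(onGPU, onCPU, rtol: 1e-4, atol: 1e-4).item(Bool.self))
```

On a CPU-only build the .gpu stream is unavailable, so restrict such code to Metal-enabled builds.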

For ML experimentation, the practical effects are:

No separate VRAM budget: total unified memory (up to 512GB on M3 Ultra) is the single limit for model + activations + optimizer state
Zero-copy CPU-to-GPU: preprocessing on CPU, inference on GPU, no transfer cost between them
Large model feasibility: quantized models that wouldn’t fit in discrete VRAM often fit comfortably in unified memory

Note that MLX targets the CPU and GPU backends through unified memory. The Neural Engine is not a documented MLX execution target.

5. Package Structure: What’s in the Box

Four MLX Swift library targets cover the common array, neural-network, training, and random-generation workflows:

`MLX` — Core Array Operations

The foundation. The MLXArray type, all arithmetic and linear algebra ops, indexing, slicing, shape manipulation, reductions, and the function transformation primitives:

grad(_:) — automatic differentiation
valueAndGrad(_:) — compute value and gradient in one pass (the Module-aware valueAndGrad(model:_:) used for training is provided by MLXNN)
vmap(_:) — vectorize a function over a batch dimension
compile(_:) — compile a function for repeated execution (equivalent to Python MLX’s mx.compile)

`MLXNN` — Neural Network Layers

Pre-built building blocks: Linear, Conv1d/Conv2d, MultiHeadAttention, RoPE, LayerNorm, RMSNorm, Embedding, LSTM, GRU, and more. These are composable classes that extend the Module base class.

`MLXOptimizers` — Training Optimizers

Gradient descent variants: SGD, Adam, AdamW, Adagrad, RMSProp. Designed to work with MLX’s lazy evaluation graph.

`MLXRandom` — Random Number Generation

Seeded, reproducible random arrays: uniform, normal, Bernoulli, categorical distributions. Used for weight initialization and dropout.
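Reproducibility comes in two flavors: reseeding global random state, or passing an explicit key. The sketch below shows both, assuming MLXRandom.seed(_:) and MLXRandom.key(_:) as exposed by MLXRandom; the seed values and shapes are arbitrary.

```swift
import MLX
import MLXRandom

// Global seed: reseeding replays the same sequence of draws.
MLXRandom.seed(42)
let a = MLXRandom.normal([3])
MLXRandom.seed(42)
let b = MLXRandom.normal([3])

// Explicit key: the draw depends only on the key, not on global state.
let key = MLXRandom.key(7)
let c = MLXRandom.normal([3], key: key)
let d = MLXRandom.normal([3], key: key)

print((a .== b).all().item(Bool.self))  // true
print((c .== d).all().item(Bool.self))  // true
```

Explicit keys are the safer choice when several components draw random numbers, because one component's draws cannot shift another's.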

The package also exports MLXFFT, MLXLinalg, and MLXFast for Fourier transforms, additional linear algebra operations, and optimized primitives.

A typical Package.swift dependency block:

dependencies: [
    .package(url: "https://github.com/ml-explore/mlx-swift", from: "0.31.3")
],
targets: [
    .target(
        name: "YourTarget",
        dependencies: [
            .product(name: "MLX", package: "mlx-swift"),
            .product(name: "MLXNN", package: "mlx-swift"),
            .product(name: "MLXOptimizers", package: "mlx-swift"),
            .product(name: "MLXRandom", package: "mlx-swift"),
        ]
    )
]

6. Installation & Setup (and the Gotchas)

There are three installation paths, each with different tradeoffs.

Method 1: Xcode Package Dependency (Recommended for Apps)

In Xcode: File → Add Package Dependencies → paste https://github.com/ml-explore/mlx-swift.git.

Select the libraries you need and link them to your target. Xcode handles the Metal shader compilation automatically.

The Duplicate Framework Problem

If your app links MLX and you have an embedded framework that also links MLX, you end up with two copies in the same process. Shared global state gets duplicated and behavior becomes unpredictable.

Solutions:

Make your inner framework a static library (not .framework) so it shares the app’s MLX instance
Have the app not link MLX directly and let the framework provide it
Use xcode/MLX.xcodeproj to build MLX as a proper framework and manage the graph manually

Method 2: SwiftPM

Add to Package.swift:

dependencies: [
    .package(url: "https://github.com/ml-explore/mlx-swift", from: "0.31.3")
]

Important: swift build from the terminal does not compile the Metal shaders, so the resulting binary cannot use the GPU. For GPU-accelerated macOS command-line tools, use xcodebuild:

xcodebuild build -scheme YourScheme -destination 'platform=OS X'

Method 3: CMake (Linux / Cross-Platform)

mkdir -p build && cd build

# macOS with Metal (GPU)
cmake .. -G Ninja && ninja

# Linux — CPU backend
cmake -DMLX_BUILD_METAL=OFF .. -G Ninja && ninja

# Linux — CUDA GPU backend
cmake -DMLX_BUILD_METAL=OFF -DMLX_BUILD_CUDA=ON .. -G Ninja && ninja

The Linux CUDA path means MLX Swift is not Apple-hardware-only — Metal is Apple-specific, but the CUDA backend brings GPU acceleration to Linux as well.

Don’t Skip the Submodule Step

For any local build:

git submodule update --init --recursive

The Source/Cmlx/mlx and Source/Cmlx/mlx-c directories are submodules. Without this step, the build fails with errors inside the Cmlx directory tree.

7. Swift API: A Tour with Code

The MLX Swift API closely mirrors Python MLX, by design. Here’s a side-by-side comparison followed by more complete examples.

Basic Array Operations

Python MLX:

import mlx.core as mx

a = mx.array([1.0, 2.0, 3.0])
b = mx.array([4.0, 5.0, 6.0])
c = a + b
mx.eval(c)
print(c)  # [5, 7, 9]

MLX Swift:

import MLX

// Use MLXArray(converting:) for Float/Double literals
let a = MLXArray(converting: [1.0, 2.0, 3.0])
let b = MLXArray(converting: [4.0, 5.0, 6.0])
let c = a + b
eval(c)
print(c)  // [5, 7, 9]

The eval() call is central to understanding MLX. Operations don’t execute immediately — they build a lazy computation graph. Calling eval() flushes it. This lets MLX fuse operations, optimize memory, and batch GPU dispatch efficiently.
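Explicit eval() is not the only trigger: reading values back into Swift also forces evaluation. A minimal sketch, assuming MLXArray's asArray(_:) accessor; the arithmetic is arbitrary.

```swift
import MLX

let a = MLXArray(converting: [1.0, 2.0, 3.0])
let b = MLXArray(converting: [4.0, 5.0, 6.0])

// These lines only record operations in the lazy graph; nothing runs yet.
let c = a + b
let d = c * 2

// Reading values back into Swift forces evaluation implicitly,
// so an explicit eval(d) is not required before asArray or item.
let values = d.asArray(Float.self)
print(values)  // [10.0, 14.0, 18.0]
```

Explicit eval() still matters when you want to control where the work happens, for example at the end of a training step or inside a timed region.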

The official Swift.org tour (from the MLX Swift authors) also shows range-based initialization:

import MLX
import MLXRandom

let r = MLXRandom.normal([2])
print(r)
// array([-0.125875, 0.264235], dtype=float32)

let a = MLXArray(0 ..< 6, [3, 2])
print(a)
// array([[0, 1],
//        [2, 3],
//        [4, 5]], dtype=int32)

// Slice the first two rows
print(a[0 ..< 2])
// array([[0, 1],
//        [2, 3]], dtype=int32)

Automatic Differentiation

func fn(_ x: MLXArray) -> MLXArray {
    x.square()
}

let gradFn = grad(fn)
let x = MLXArray(1.5)
let dfdx = gradFn(x)

// prints 3 (= 2 * 1.5)
print(dfdx)

Defining a Neural Network

Module in MLX Swift is a base class, not a protocol — subclass it to define your model:

import MLX
import MLXNN

class SimpleMLP: Module, UnaryLayer {
    let layer1: Linear
    let layer2: Linear

    init(inputDim: Int, hiddenDim: Int, outputDim: Int) {
        layer1 = Linear(inputDim, hiddenDim)
        layer2 = Linear(hiddenDim, outputDim)
    }

    func callAsFunction(_ x: MLXArray) -> MLXArray {
        var out = layer1(x)
        out = relu(out)
        out = layer2(out)
        return out
    }
}
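A quick way to check a model definition is to inspect its parameter tree and run one forward pass. This sketch repeats SimpleMLP so it stands alone, and assumes parameters().flattened() returns (key, array) pairs with dot-separated keys; the batch of 4 random inputs is arbitrary.

```swift
import MLX
import MLXNN
import MLXRandom

// Same architecture as SimpleMLP above, repeated so this snippet stands alone.
class SimpleMLP: Module, UnaryLayer {
    let layer1: Linear
    let layer2: Linear

    init(inputDim: Int, hiddenDim: Int, outputDim: Int) {
        layer1 = Linear(inputDim, hiddenDim)
        layer2 = Linear(hiddenDim, outputDim)
    }

    func callAsFunction(_ x: MLXArray) -> MLXArray {
        layer2(relu(layer1(x)))
    }
}

let model = SimpleMLP(inputDim: 784, hiddenDim: 256, outputDim: 10)

// Stored layer properties become parameter keys such as "layer1.weight".
let params = model.parameters().flattened()
for (name, array) in params {
    print(name, array.shape)  // e.g. layer1.weight [256, 784]
}

// 784*256 + 256 + 256*10 + 10 parameters in total
let count = params.reduce(0) { $0 + $1.1.size }
print(count)  // 203530

// A batch of 4 inputs produces 4 rows of 10 logits.
let logits = model(MLXRandom.normal([4, 784]))
print(logits.shape)  // [4, 10]
```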

Training Loop

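import MLX
import MLXNN
import MLXOptimizers
import MLXRandom

let model = SimpleMLP(inputDim: 784, hiddenDim: 256, outputDim: 10)
let optimizer = Adam(learningRate: 1e-3)

func lossFunction(_ model: SimpleMLP, _ x: MLXArray, _ y: MLXArray) -> MLXArray {
    let logits = model(x)
    return crossEntropy(logits: logits, targets: y).mean()
}

// Synthetic stand-in for real data: 32 random inputs with integer class labels 0...9
let xBatch = MLXRandom.normal([32, 784])
let yBatch = MLXArray((0 ..< 32).map { Int32($0 % 10) })

// valueAndGrad(model:) returns the loss and the gradients with respect to the model's parameters
let lossAndGrad = valueAndGrad(model: model, lossFunction)

for step in 0 ..< 10 {
    let (loss, grads) = lossAndGrad(model, xBatch, yBatch)

    // Update weights, then evaluate so the lazy graph actually runs each step
    optimizer.update(model: model, gradients: grads)
    eval(model, optimizer)
    print("step \(step): loss \(loss.item(Float.self))")
}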

Compiled Functions

For functions called repeatedly, compile can avoid re-tracing the graph:

let input = MLXRandom.normal([1, 784])  // one input shaped for SimpleMLP
let compiledForward = compile(model.callAsFunction)
let output = compiledForward(input)

By default, a compiled function is recompiled when an input shape changes. MLX Swift also exposes compile(shapeless: true, ...) for functions that can safely operate across varying shapes without shape-dependent graph logic.
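The difference between the default and shapeless modes is easiest to see on a small pure function. In this sketch, scaledTanh is a hypothetical example function; it checks the compiled result against eager execution and then calls the shapeless version with a second shape.

```swift
import MLX

// A pure function of its input with no shape-dependent logic.
func scaledTanh(_ x: MLXArray) -> MLXArray {
    tanh(x * 0.5) * 2
}

// Default mode: traced and optimized per input shape.
let compiled = compile(scaledTanh)

// Shapeless mode: safe here because nothing in scaledTanh depends on the shape.
let compiledShapeless = compile(shapeless: true, scaledTanh)

let x = MLXArray(converting: [-1.0, 0.0, 1.0])
let eager = scaledTanh(x)
print(allClose(compiled(x), eager, atol: 1e-6).item(Bool.self))  // true

// A different shape reuses the shapeless compilation instead of retracing.
let y = MLXArray(converting: [0.5, 1.5, 2.5, 3.5]).reshaped([2, 2])
print(compiledShapeless(y).shape)  // [2, 2]
```

A function that, say, reshapes using a dimension read from its input would not be a candidate for shapeless: true.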

Async Patterns — Architectural Note

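The following sketch illustrates a design pattern rather than an mlx-swift-lm API; TokenSource is a hypothetical stand-in for whatever actually produces tokens.

A common pattern for non-blocking inference is wrapping the model in a Swift actor, which serializes access to its isolated state and protects against concurrent mutation:

// Illustrative pattern: TokenSource is hypothetical, not an mlx-swift-lm type
protocol TokenSource: Sendable {
    func tokens(for prompt: String) -> [String]
}

actor InferenceSession {
    private let source: any TokenSource
    private(set) var requestCount = 0  // mutable state guarded by the actor

    init(source: any TokenSource) { self.source = source }

    func generate(prompt: String) -> AsyncStream<String> {
        requestCount += 1
        let source = self.source
        return AsyncStream { continuation in
            // Yield tokens from a task so callers can consume them as they arrive
            Task {
                for token in source.tokens(for: prompt) { continuation.yield(token) }
                continuation.finish()
            }
        }
    }
}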

Actor isolation protects shared state; it doesn’t by itself determine which thread compute runs on. For actual LLM inference patterns using mlx-swift-lm, see the MLXChatExample source in the examples repository — it demonstrates the ChatSession API with streaming token output.

8. Real Examples: From MNIST to LLMs

The mlx-swift-examples repository contains complete, runnable applications covering the main use cases.

MNISTTrainer

A LeNet-style CNN trained on MNIST, running on both iOS and macOS. The full training loop — data download, forward pass, backpropagation, optimizer step, accuracy tracking — runs on-device. It’s a useful baseline for understanding MLX Swift’s training patterns in a contained setting.

LLMEval

Downloads a language model from Hugging Face (Mistral, Llama, and similar architectures), loads the tokenizer, and generates text from a prompt. The model weights download on first run and are cached locally. This is the starting point for understanding how mlx-swift-lm handles weight loading and autoregressive generation.

MLXChatExample

A chat application (iOS + macOS) supporting both LLMs and VLMs. You can load a vision-language model and ask questions about images you provide — all processed locally. The source code is a practical reference for the mlx-swift-lm ChatSession API and SwiftUI integration.

StableDiffusionExample

Runs Stable Diffusion locally — downloads weights from Hugging Face and generates images from text prompts. If you want to understand how MLX Swift handles multi-step generative loops (the diffusion process), this is the example to study.

llm-tool

A command-line tool for generating text with various LLMs from Hugging Face. Useful for systematic evaluation: you can control model ID, quantization, prompt, and token count, and record consistent measurements for your specific device.

9. Benchmarking Your Own Setup

Generic tokens-per-second tables without a reproducible methodology are not reliable. Rather than publish numbers that may not apply to your hardware or model, here's how to measure your own setup properly.

Using `llm-tool`

The llm-tool CLI in mlx-swift-examples is the right instrument. A repeatable benchmark should record:

Parameter	Example value
Model ID (Hugging Face)	`mlx-community/Mistral-7B-Instruct-v0.3-4bit`
Quantization	`4-bit`
MLX Swift version	`0.31.3`
mlx-swift-lm version	`3.31.3`
Device	`MacBook Pro M3 Max, 48GB`
Prompt token count	`50`
Generated token count	`200`
Measurement	prompt processing tok/s + generation tok/s

Warm up the model first, run the same command multiple times, and report the individual results plus an aggregate. Record relevant conditions such as power mode and whether the device was already warm, since repeated runs can themselves change thermal behavior.
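Reporting individual runs plus an aggregate takes a little bookkeeping. The sketch below is a hypothetical helper (BenchmarkRun, median, and report are not part of llm-tool or mlx-swift-examples). It assumes you have already collected token counts and wall-clock seconds for the prompt and generation phases of each run, whether from what llm-tool reports or from your own timing.

```swift
// Hypothetical record of one benchmark run; not an mlx-swift-examples type.
struct BenchmarkRun {
    let promptTokens: Int
    let generatedTokens: Int
    let promptSeconds: Double
    let generationSeconds: Double

    var promptTokensPerSecond: Double { Double(promptTokens) / promptSeconds }
    var generationTokensPerSecond: Double { Double(generatedTokens) / generationSeconds }
}

// Median is less sensitive than the mean to one cold or throttled run.
func median(_ values: [Double]) -> Double? {
    guard !values.isEmpty else { return nil }
    let sorted = values.sorted()
    let mid = sorted.count / 2
    return sorted.count % 2 == 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

// Print every run, then the aggregate, so readers can see the spread.
func report(_ runs: [BenchmarkRun]) {
    for (i, run) in runs.enumerated() {
        print("run \(i + 1): prompt \(run.promptTokensPerSecond) tok/s, generation \(run.generationTokensPerSecond) tok/s")
    }
    if let p = median(runs.map(\.promptTokensPerSecond)),
       let g = median(runs.map(\.generationTokensPerSecond)) {
        print("median: prompt \(p) tok/s, generation \(g) tok/s")
    }
}
```

Keeping prompt processing and generation separate matters because they stress the hardware differently and are reported as two numbers in the table above.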

This approach produces numbers that are reproducible, attributable, and useful to other developers on the same hardware — far more valuable than a generic table.

10. Porting Models from Python MLX

One of MLX Swift’s explicit goals is easy portability from Python MLX. For LLM and VLM model ports, see the MLXLMCommon porting guide in the dedicated mlx-swift-lm package. The reusable language-model libraries were moved there from mlx-swift-examples.

What Maps Directly

Array ops: mx.matmul → MLX.matmul, mx.softmax → MLX.softmax, etc.
Module definitions: nn.Module subclass → MLXNN.Module subclass
Optimizer usage: optim.Adam → MLXOptimizers.Adam
Weight shapes and naming — if loading from the same safetensors checkpoint
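The mapping is easiest to see on a small module. FeedForward below is a hypothetical example, with the Python MLX original kept in comments. The point to check is that Swift property names become parameter keys, so matching the Python attribute names keeps checkpoint keys aligned. When a checkpoint's keys differ from the Swift names, MLX Swift model code typically remaps them with the @ModuleInfo(key:) property wrapper; see the mlx-swift-lm porting guide for the exact pattern.

```swift
import MLX
import MLXNN

// Python MLX original, for reference:
//   class FeedForward(nn.Module):
//       def __init__(self, dims, hidden):
//           super().__init__()
//           self.up = nn.Linear(dims, hidden)
//           self.down = nn.Linear(hidden, dims)
//       def __call__(self, x):
//           return self.down(nn.gelu(self.up(x)))

class FeedForward: Module, UnaryLayer {
    // Property names become parameter keys ("up.weight", "down.bias", ...),
    // matching the Python attribute names and therefore the checkpoint keys.
    let up: Linear
    let down: Linear

    init(dims: Int, hidden: Int) {
        up = Linear(dims, hidden)
        down = Linear(hidden, dims)
    }

    func callAsFunction(_ x: MLXArray) -> MLXArray {
        down(gelu(up(x)))
    }
}

let ff = FeedForward(dims: 8, hidden: 32)
print(ff.parameters().flattened().map { $0.0 }.sorted())
// ["down.bias", "down.weight", "up.bias", "up.weight"]
```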

Common Gotchas

jit → compile: Python MLX's compilation transform is mx.compile (also usable as a decorator, @mx.compile); there is no jit function as in JAX. The Swift equivalent is MLX.compile(_:).

Array initialization: MLXArray([1.0, 2.0, 3.0]) with a Swift [Double] literal may not resolve as expected. Use MLXArray(converting: [1.0, 2.0, 3.0]) or provide explicit [Float] literals: MLXArray([Float(1.0), 2.0, 3.0]).

valueAndGrad call signature: The Swift API is valueAndGrad(model: model, lossFunction), not the positional form valueAndGrad(model, lossFunction).

Module is a class, not a protocol: Subclass Module, don’t conform to a protocol named Module.

eval() placement: Call eval() after each training step (for example, eval(model, optimizer)) so the lazy graph actually executes instead of accumulating across iterations, and before reading timings or state, since unevaluated work has not run yet.
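Timing is where eval() placement most often goes wrong. This sketch, assuming the Swift standard library's ContinuousClock (macOS 13 or later), contrasts timing a lazy matmul with timing it through eval(); the 1024 by 1024 size is arbitrary.

```swift
import MLX
import MLXRandom

let a = MLXRandom.normal([1024, 1024])
let b = MLXRandom.normal([1024, 1024])
eval(a, b)  // materialize the inputs so their creation is not timed

let clock = ContinuousClock()

// Without eval, this only records a matmul node in the lazy graph.
let graphOnly = clock.measure {
    _ = matmul(a, b)
}

// With eval, the measured interval includes running the matmul to completion.
let withEval = clock.measure {
    let c = matmul(a, b)
    eval(c)
}

print("graph construction only:", graphOnly)
print("including eval:", withEval)
```

The first measurement is not a meaningful benchmark of the operation; it only shows how little work happens before evaluation.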

Loading Weights

MLX Swift loads weights in safetensors format (.safetensors), which is standard for Hugging Face models:

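import MLX
import MLXNN
// weightsURL: a local .safetensors file whose keys match the model's parameters
let weights = try loadArrays(url: weightsURL)
model.update(parameters: ModuleParameters.unflattened(weights))
eval(model)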
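A save-and-reload round trip is a cheap way to confirm that keys line up before trying a large checkpoint. This sketch assumes MLX Swift's save(arrays:url:) counterpart to loadArrays(url:), uses a single Linear layer as the model, and writes a hypothetical linear.safetensors file to the temporary directory.

```swift
import Foundation
import MLX
import MLXNN

let model = Linear(4, 2)

// Flatten the parameter tree ("weight", "bias") and write a safetensors file.
let flat = Dictionary(uniqueKeysWithValues: model.parameters().flattened())
let url = FileManager.default.temporaryDirectory.appendingPathComponent("linear.safetensors")
try save(arrays: flat, url: url)

// Load into a fresh module with the same structure; keys must line up.
let restored = Linear(4, 2)
let weights = try loadArrays(url: url)
restored.update(parameters: ModuleParameters.unflattened(weights))
eval(restored)

print(allClose(restored.weight, model.weight).item(Bool.self))  // true
```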

11. The Ecosystem Around MLX Swift

`mlx-swift-lm` (v3.31.3)

The LLM/VLM inference library. If you’re building anything involving language models in MLX Swift, this is your starting point. It provides model architecture definitions (Llama, Mistral, Phi, Gemma, Qwen, and more), tokenizer loading from Hugging Face, KV-cache management, streaming token generation, and LoRA/full fine-tuning support.

Important note on versioning: mlx-swift-lm is on a 3.x major version line, which introduced breaking changes from earlier versions. It does not share the same version number as mlx-swift itself. When adding it as a dependency, pin to a specific version and read the changelog before upgrading.

`mlx-swift-examples`

Complete, runnable example applications — more useful than documentation for understanding idiomatic MLX Swift patterns. Clone it, run the examples, then read the source.

The `ml-explore` Organization

MLX Swift sits within a broader ecosystem:

mlx (Python) — the original, with the most complete set of documented examples
mlx-data — data loading and preprocessing primitives
mlx-lm — the Python LLM library that mlx-swift-lm mirrors

MLX Swift (0.31.3) tracks the same version numbering as Python MLX. mlx-swift-lm has its own 3.x version line.

12. Limitations & Honest Caveats

Positioned as Research, Not Production

This bears repeating: the official positioning from the authors is that MLX is for research and experimentation, not production deployment in apps. Evaluate it accordingly — it’s a research toolkit, not a drop-in replacement for Core ML.

Metal GPU is Apple-Only

Metal shaders work only on Apple hardware. The CUDA backend (via CMake on Linux) provides GPU acceleration on NVIDIA hardware, and the CPU backend works everywhere. But the primary, optimized path is Apple Silicon + Metal.

`mlx-swift-lm` Breaking Changes

The 3.x major version of mlx-swift-lm introduced breaking API changes. If you’re following tutorials or examples written against older versions, check which version they target.

Still Actively Evolving

At ~420 commits and version 0.31.x, the API surface is maturing but hasn’t reached a 1.0 stability guarantee. Pin your dependency to a specific version in projects where stability matters.
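In SwiftPM, one way to pin is the exact: requirement (Xcode's package dependency rules offer an equivalent "Exact Version" option). Below is a sketch of a minimal manifest pinned to the release this article targets. The package and target name MLXExperiment is hypothetical, and the macOS 14 minimum is an assumption you should match to mlx-swift's own Package.swift.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MLXExperiment",  // hypothetical package name
    platforms: [.macOS(.v14)],  // assumed minimum; match mlx-swift's requirements
    dependencies: [
        // exact: resolves to this release only; upgrade deliberately after reading release notes
        .package(url: "https://github.com/ml-explore/mlx-swift", exact: "0.31.3")
    ],
    targets: [
        .executableTarget(
            name: "MLXExperiment",
            dependencies: [
                .product(name: "MLX", package: "mlx-swift"),
                .product(name: "MLXNN", package: "mlx-swift"),
            ]
        )
    ]
)
```

Remember that building this with swift build still skips the Metal shaders; use xcodebuild for GPU-enabled binaries.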

Python Ecosystem Depth

Python MLX has a larger base of model ports, tutorials, and community examples. MLX Swift users may hit the leading edge sooner and find fewer existing answers to their specific questions.

Shape-Dependent Compilation

By default, compile() recompiles when input shapes change. compile(shapeless: true, ...) can reuse a compiled function across shape changes, but only where the graph does not depend on concrete input shapes.

13. The Bigger Picture

The Research-to-App Pipeline

The most compelling use case for MLX Swift isn’t replacing production inference — it’s collapsing the research-to-prototype loop for Swift developers. If you’re an iOS developer who wants to experiment with a custom model architecture, fine-tune a small LLM on a specific dataset, or build a local-first AI feature to evaluate before committing to a deployment strategy, MLX Swift lets you do all of that in the same language and toolchain you’ll eventually ship in.

The gap between “Python experiment” and “Swift prototype” has always involved a translation step — model format conversion, Core ML integration, shape debugging. MLX Swift removes that for the experimentation phase.

Where It Fits in the Stack

A reasonable way to think about the two frameworks together:

	MLX Swift	Core ML
Primary use	Research, experimentation, prototyping	Production deployment
Model input	Weights from Hugging Face, safetensors	`.mlpackage`, `.mlmodel`
Graph style	Dynamic, imperative	Static, compiled
LLM support	Yes, first-class via mlx-swift-lm	Yes, via stateful Core ML (WWDC24+)
App Store	Not the target	Fully supported
On-device model training or updates	General training workflows	Updates for eligible updatable models
Neural Engine	Not targeted	Yes

They’re complementary. The natural workflow is: experiment in MLX Swift → validate the approach → deploy via Core ML (or evaluate whether MLX Swift can stay in the stack for your specific use case).

Conclusion

MLX Swift is a well-designed Swift-native interface to a serious ML research framework, built around Apple Silicon’s unified memory architecture. Its purpose — as explicitly stated by its authors — is to make ML research and experimentation easier in Swift. Within that scope, it delivers: a clean API, close parity with Python MLX, genuine GPU acceleration via Metal, and a growing set of example applications covering LLMs, VLMs, image generation, and training.

It’s not a production deployment framework and isn’t trying to be. Understand that, and you’ll find it a capable and expressive tool for on-device ML research on Apple hardware.

MLX Swift v0.31.3 · mlx-swift-lm v3.31.3 · Last updated May 2026
