Core ML is Apple’s native inference engine for Apple platforms. ONNX Runtime is Microsoft’s cross-platform inference engine with official Apple Silicon support via the Core ML execution provider. Both can run models on the same hardware, but they take different paths to get there.
This comparison covers the technical differences: how each framework maps operations to Apple Silicon hardware, conversion workflows, operator coverage, performance characteristics, and real-world tradeoffs for shipping a model on macOS or iOS.
Core ML takes a model in its compiled .mlmodelc format and distributes operations across the CPU, GPU, and Neural Engine using a runtime planner. The planner decides which ops run on which processor based on the model’s structure and the capabilities of the device.
    Model (.mlpackage)
      → coremltools converts from PyTorch/TF
      → Model is compiled to .mlmodelc at build time
      → Core ML runtime dispatches to CPU/GPU/ANE

The key advantage: the conversion and compilation step lets Core ML optimize the execution plan, including fusing adjacent operations and selecting compatible compute units.
ONNX Runtime loads a model in ONNX format and executes it through a configurable set of execution providers. On Apple Silicon, the relevant providers are the Core ML execution provider and the default CPU provider.
    ONNX model (.onnx)
      → ONNX Runtime loads the model
      → Session configured with execution providers
      → Each op is dispatched to the first provider that handles it

The typical configuration on Apple Silicon uses the Core ML provider as the primary execution backend, falling back to CPU for ops Core ML does not support.
    import coremltools as ct

    # traced_model is assumed to be a TorchScript module,
    # for example the result of torch.jit.trace(model, example_input).
    model = ct.convert(
        traced_model,
        inputs=[ct.TensorType(shape=(1, 3, 224, 224))],
        minimum_deployment_target=ct.target.iOS18,
    )
    model.save("Model.mlpackage")

The unified coremltools conversion API supports PyTorch and TensorFlow source models. The older ONNX-to-Core ML converter is frozen and no longer maintained. If your source model is ONNX, either export from the original framework for Core ML conversion or run the ONNX model with ONNX Runtime.
Microsoft publishes an official ONNX Runtime Swift package that exposes the Objective-C bindings to Swift:
    dependencies: [
        .package(
            url: "https://github.com/microsoft/onnxruntime-swift-package-manager",
            from: "1.24.2"
        )
    ]

With the package added, create a session that registers the Core ML execution provider:

    import OnnxRuntimeBindings

    let env = try ORTEnv(loggingLevel: .warning)
    let options = try ORTSessionOptions()
    try options.appendExecutionProvider(
        "CoreML",
        providerOptions: ["ModelFormat": "MLProgram"]
    )
    // modelPath is the file-system path of the .onnx model.
    let session = try ORTSession(
        env: env,
        modelPath: modelPath,
        sessionOptions: options
    )

ONNX Runtime loads the .onnx file directly. The Core ML provider converts and compiles supported subgraphs internally, while unsupported nodes fall back to the default CPU provider.
Benchmark results vary by model architecture. General observations:
| Scenario | Core ML | ONNX Runtime (Core ML EP) |
|---|---|---|
| ANE-compatible model (e.g., MobileNet) | Can use ANE | Can delegate supported subgraphs to Core ML |
| GPU-compatible model | Can use GPU | Can delegate supported subgraphs to Core ML |
| Unsupported operations | May require a different export strategy | Uses CPU fallback when a compatible kernel exists |
| Mixed-precision quantization | Yes (FP16, INT8, palettization) | Depends on model and provider support |
| Startup time | Compile before shipping or when loading a model | Core ML EP compilation can add startup cost |
| Binary size | Small runtime footprint (system framework) | Larger (includes ONNX Runtime library + .onnx) |
Core ML has direct access to the Neural Engine planner. For models that fit the ANE’s constraints, Core ML can be faster and more power-efficient than CPU fallback.
ONNX Runtime with the Core ML provider can approach native Core ML performance when most of the graph is delegated. Measure your actual model: unsupported nodes can split the graph into partitions and add overhead.
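To make the partitioning cost concrete, here is a simplified sketch of priority-ordered node assignment. It treats the graph as a linear list of op types, whereas ONNX Runtime partitions on the real graph topology, and the supported-op sets are illustrative placeholders rather than the provider's actual lists. `assign_providers` and `count_partitions` are hypothetical helper names.

```python
from itertools import groupby


def assign_providers(op_types, providers):
    """Assign each node to the first provider, in priority order, that supports it.

    `providers` is a list of (name, supported_op_set) pairs. The last entry
    plays the role of the fallback, like the default CPU provider.
    """
    assignment = []
    for op in op_types:
        for name, supported in providers:
            if op in supported:
                assignment.append(name)
                break
        else:
            raise ValueError(f"no provider supports {op!r}")
    return assignment


def count_partitions(assignment):
    """Count contiguous runs of nodes that share a provider."""
    return sum(1 for _ in groupby(assignment))


# Illustrative op sets only; consult the provider's supported-op lists.
providers = [
    ("CoreML", {"Conv", "Relu"}),
    ("CPU", {"Conv", "Relu", "NonZero"}),
]
ops = ["Conv", "Relu", "NonZero", "Conv", "Relu"]

assignment = assign_providers(ops, providers)
print(assignment)                    # ['CoreML', 'CoreML', 'CPU', 'CoreML', 'CoreML']
print(count_partitions(assignment))  # 3
```

A single unsupported node in the middle of the graph turns one delegated region into two, with a CPU hop between them. That is the pattern worth looking for when delegated performance falls short of native Core ML.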
For models with operations that the Core ML provider cannot delegate, ONNX Runtime can still execute remaining nodes with its default CPU provider. This makes ONNX Runtime more flexible, but mixed-provider execution is not automatically faster.
ONNX Runtime also supports dynamic shapes. The Core ML provider allows dynamic shapes by default, but its documentation warns that they may reduce performance. Benchmark variable-length workloads on target devices.
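One way to benchmark variable-length workloads is a small timing harness like the sketch below. It is framework-agnostic: `run` is any callable that performs one inference, and `benchmark` is a hypothetical helper name. With the ONNX Runtime Python API, `run` would typically wrap a call to `session.run`, which is an assumption about your setup rather than something the harness requires.

```python
import statistics
import time


def benchmark(run, inputs, warmup=3, iterations=20):
    """Return the median latency in milliseconds for each labelled input.

    `inputs` maps a label (for example a sequence length) to the value
    passed to `run`. Warm-up calls are excluded from the measurement so
    that one-time compilation cost does not skew the result.
    """
    results = {}
    for label, value in inputs.items():
        for _ in range(warmup):
            run(value)
        samples = []
        for _ in range(iterations):
            start = time.perf_counter()
            run(value)
            samples.append((time.perf_counter() - start) * 1000.0)
        results[label] = statistics.median(samples)
    return results


# Stand-in workload; replace the lambda with a real inference call.
latencies = benchmark(lambda x: sum(x), {"len=8": [0.0] * 8, "len=512": [0.0] * 512})
for label, ms in latencies.items():
    print(f"{label}: {ms:.4f} ms")
```

Running the harness on the target device with several representative input shapes shows whether dynamic shapes cost more than fixing the shape would.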
Core ML and ONNX use different model representations and operator sets. The coremltools converter handles many common PyTorch and TensorFlow operations, but unsupported operations may require a different export strategy or custom handling.
ONNX Runtime supports a broad range of ONNX operators. If an op is not handled by the Core ML provider, it falls through to the default CPU provider when a compatible CPU kernel exists. Custom operators still require registration.
The ONNX Runtime Core ML provider publishes separate supported-op lists for its NeuralNetwork and MLProgram formats. Support also depends on attributes and shape constraints. Common limitations include:
| Op / Pattern | Core ML provider limitation |
|---|---|
| Convolution and pooling | Some variants and dimensions are unsupported |
| Resize | Only specific combinations of attributes are supported |
| Slice | Several inputs must be constant |
| MatMul / Gemm | Some inputs or attributes are constrained |
| Control flow | Delegation inside If, Loop, and Scan requires EnableOnSubgraphs |
| Dynamic shape inputs | Allowed by default, but may reduce performance |
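Before shipping, it can help to estimate how much of a graph the Core ML provider could take. The sketch below compares a model's op types against a supported set. It is a rough first pass: `delegation_report` and `op_types_from_onnx` are hypothetical helper names, the supported set in the example is a placeholder to be replaced with the published list for your ModelFormat, and the check ignores the attribute and shape constraints described above as well as nodes nested inside control-flow subgraphs.

```python
from collections import Counter


def delegation_report(op_types, supported_ops):
    """Summarize how many nodes have an op type in `supported_ops`."""
    counts = Counter(op_types)
    unsupported = {op: n for op, n in counts.items() if op not in supported_ops}
    total = sum(counts.values())
    delegated = total - sum(unsupported.values())
    return {
        "total": total,
        "delegated": delegated,
        "coverage": delegated / total if total else 0.0,
        "unsupported": unsupported,
    }


def op_types_from_onnx(path):
    """List op types of the top-level graph nodes (requires the `onnx` package)."""
    import onnx  # imported lazily so the report helper works without it

    return [node.op_type for node in onnx.load(path).graph.node]


# Placeholder set for illustration; not the provider's real list.
report = delegation_report(["Conv", "Relu", "Conv", "Loop"], {"Conv", "Relu"})
print(report["coverage"])     # 0.75
print(report["unsupported"])  # {'Loop': 1}
```

A high op-type coverage is necessary but not sufficient. Verbose session logging on the device remains the authoritative source for which nodes were actually assigned to Core ML.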
ONNX Runtime lets you prioritize providers and configure per-provider options:
    let options = try ORTSessionOptions()
    try options.appendExecutionProvider("CoreML", providerOptions: [
        "ModelFormat": "MLProgram",
        "MLComputeUnits": "ALL",
        "RequireStaticInputShapes": "0",
        "EnableOnSubgraphs": "0",
    ])
    // CPU is the default fallback for nodes not assigned to Core ML.

Execution-provider priority matters when you register multiple providers. Each provider claims compatible nodes or subgraphs in priority order. The default CPU provider handles compatible nodes that remain unassigned.
For complex models, set ModelCacheDirectory so the compiled Core ML subgraphs can be reused. Without a cache, Core ML EP compilation can add significant startup time.
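For reference, the same options, including the cache directory, might be passed through the ONNX Runtime Python API as sketched below. This assumes an `onnxruntime` build that includes the Core ML execution provider. `coreml_provider_config` and `create_session` are hypothetical helper names, and the paths are placeholders.

```python
from pathlib import Path


def coreml_provider_config(cache_dir=None, compute_units="ALL"):
    """Build a provider list: Core ML first, default CPU provider as fallback."""
    options = {
        "ModelFormat": "MLProgram",
        "MLComputeUnits": compute_units,
    }
    if cache_dir is not None:
        # Lets the provider reuse compiled Core ML subgraphs across launches.
        options["ModelCacheDirectory"] = str(cache_dir)
    return [("CoreMLExecutionProvider", options), "CPUExecutionProvider"]


def create_session(model_path, cache_dir=None):
    """Create an inference session (requires the `onnxruntime` package)."""
    import onnxruntime as ort  # imported lazily; not needed to build the config

    if cache_dir is not None:
        Path(cache_dir).mkdir(parents=True, exist_ok=True)
    return ort.InferenceSession(
        str(model_path),
        providers=coreml_provider_config(cache_dir),
    )


print(coreml_provider_config(cache_dir="coreml_cache"))
```

Comparing the first and second session-creation times with the cache enabled gives a direct measure of how much startup cost the compilation step adds for a given model.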
| Component | App bundle impact |
|---|---|
| Core ML | Runtime is provided by the operating system |
| ONNX Runtime | Runtime library must be bundled with the app |
| Model file | Varies by model; measure the exported .onnx file or the compiled Core ML model |
Core ML wins on binary size because it is a system framework. ONNX Runtime must be bundled with the app.
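Model size is easy to measure directly. The sketch below handles both cases: an .onnx model is a single file, while .mlpackage and .mlmodelc are directories, so their size is the sum of the files inside. `size_on_disk` and `format_mib` are hypothetical helper names, and the paths in the usage comment are placeholders.

```python
from pathlib import Path


def size_on_disk(path):
    """Return the size in bytes of a file, or of all files under a directory."""
    path = Path(path)
    if path.is_file():
        return path.stat().st_size
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def format_mib(num_bytes):
    """Format a byte count as mebibytes with two decimals."""
    return f"{num_bytes / (1024 * 1024):.2f} MiB"


# Usage with placeholder paths:
# print(format_mib(size_on_disk("model.onnx")))
# print(format_mib(size_on_disk("Model.mlmodelc")))
```

These numbers are sizes on disk before App Store compression, so treat them as an upper bound on what each model adds to the download.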
A practical pattern: use ONNX Runtime as the primary engine for flexibility, with the Core ML execution provider for hardware acceleration of supported subgraphs. Profile the result because graph partitioning and Core ML compilation can outweigh the acceleration benefit for some models.
    import OnnxRuntimeBindings
    import Foundation

    class InferenceService {
        private let env: ORTEnv
        private var session: ORTSession?

        init(modelURL: URL) throws {
            env = try ORTEnv(loggingLevel: .warning)

            // Register Core ML first; the default CPU provider handles the rest.
            let options = try ORTSessionOptions()
            try options.appendExecutionProvider("CoreML", providerOptions: [
                "ModelFormat": "MLProgram",
                "MLComputeUnits": "ALL",
            ])
            session = try ORTSession(
                env: env,
                modelPath: modelURL.path,
                sessionOptions: options
            )
        }

        func run(input: [Float], shape: [NSNumber]) throws -> [Float] {
            guard let session else { throw ServiceError.notInitialized }

            // Copy the input floats into the buffer that backs the tensor.
            let inputData = NSMutableData(
                bytes: input,
                length: input.count * MemoryLayout<Float>.stride
            )
            let inputTensor = try ORTValue(
                tensorData: inputData,
                elementType: .float,
                shape: shape
            )

            // "input" and "output" must match the tensor names in the .onnx graph.
            let outputs = try session.run(
                withInputs: ["input": inputTensor],
                outputNames: Set(["output"]),
                runOptions: nil
            )
            // Bridge NSMutableData to Data so withUnsafeBytes is available.
            guard let outputTensor = outputs["output"],
                  let outputData = try? outputTensor.tensorData() as Data else {
                throw ServiceError.inferenceFailed
            }
            return outputData.withUnsafeBytes { buffer in
                Array(buffer.bindMemory(to: Float.self))
            }
        }

        enum ServiceError: Error {
            case notInitialized
            case inferenceFailed
        }
    }