Auto-Tagging in iOS with an OpenRouter LLM API

I’ve been building an iOS app called Drafts — a place to write and manage social posts before publishing. One problem I kept running into: every time I saved a draft, I had to manually write a title and pick tags. Boring, repetitive, and easy to skip.

So I automated it. Here’s how I built a background analyzer that uses an LLM to generate titles and tags for every draft — fully automatic, non-blocking, with retries and concurrency control.


The Architecture

Three moving parts:

  1. fast_analyze.js — a Vercel serverless function that proxies OpenRouter, handles rate limiting, and returns { title, tags, model } as JSON
  2. OpenRouterService.swift — a Swift actor that calls the endpoint
  3. DraftAnalyzer.swift — a @MainActor class that fetches unanalyzed drafts from SwiftData and runs the analysis pipeline

The Backend: A Vercel Proxy for OpenRouter

I didn’t want to ship an API key inside the app binary. Instead, I put a thin serverless function in front of OpenRouter.

// POST /api/fast_analyze
// Body: { text: "your draft here", model?: "..." }
// Returns: { title: "Short title", tags: ["#tag1", "#tag2"], model: "..." }

const MODEL = "openrouter/free";

export default async function handler(req, res) {
  // rate limiting, validation, etc.
  const { text, model: modelParam } = req.body ?? {};
  const apiKey = process.env.OPENROUTER_API_KEY;

  const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: modelParam ?? MODEL,
      messages: [
        {
          role: "system",
          content: `You analyze social media drafts. Respond with valid JSON only, no markdown.
Format: { "title": "max 6 word title", "tags": ["#tag1", "#tag2", "#tag3"] }`,
        },
        { role: "user", content: `Analyze this draft:\n\n${text}` },
      ],
    }),
  });

  // parse, clean markdown fences, normalize tags
  const completion = await response.json();
  const raw = completion.choices?.[0]?.message?.content ?? "";
  const cleaned = raw.replace(/```(?:json)?/g, "").trim();
  const { title, tags } = JSON.parse(cleaned);
  const modelUsed = completion.model ?? modelParam ?? MODEL;

  return res.status(200).json({ title, tags, model: modelUsed });
}

A few things worth noting:

  • The system prompt is strict: JSON only, no explanation, no code fences. LLMs love to wrap responses in ```json code fences even when you tell them not to, so I strip those defensively on the server side anyway.
  • The model parameter is optional. You can pass any OpenRouter-compatible model string from the client, which makes it easy to experiment without redeploying.
  • Rate limiting is per-IP, in-memory. Simple and enough for a personal app.

The Swift Side: An Actor for Network Calls

import Foundation

// Mirrors the JSON returned by /api/fast_analyze
struct AnalyzeResponse: Decodable {
    let title: String
    let tags: [String]
    let model: String
}

enum OpenRouterError: Error {
    case invalidURL
}

actor OpenRouterService {
    static let shared = OpenRouterService()
    private let baseURL = "https://your-vercel-app.vercel.app"

    func analyze(text: String, model: String? = "minimax/minimax-m2.5:free") async throws -> AnalyzeResponse {
        guard let url = URL(string: "\(baseURL)/api/fast_analyze") else {
            throw OpenRouterError.invalidURL
        }

        var body: [String: String] = ["text": text]
        if let model { body["model"] = model }

        var request = URLRequest(url: url)
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        request.httpBody = try JSONSerialization.data(withJSONObject: body)

        let (data, _) = try await URLSession.shared.data(for: request)
        return try JSONDecoder().decode(AnalyzeResponse.self, from: data)
    }
}

Using actor here is the right call — it serializes access to the service’s internal state and plays well with Swift Concurrency without any manual locking.
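
Calling it from anywhere is then just an awaited method call. Here’s a minimal call-site sketch (the sample text is made up, and the explicit override just illustrates the optional model parameter):

Task {
    do {
        // Uses the default model baked into analyze(text:model:)
        let result = try await OpenRouterService.shared.analyze(text: "Shipping auto-tagging in Drafts today!")
        print(result.title, result.tags)

        // Or pass any OpenRouter model string per call; it goes straight through to the proxy
        let routed = try await OpenRouterService.shared.analyze(
            text: "Shipping auto-tagging in Drafts today!",
            model: "openrouter/free"
        )
        print(routed.model)
    } catch {
        print("Analysis failed: \(error)")
    }
}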


The Analyzer: Concurrency, Retries, and SwiftData

This is where it all comes together. DraftAnalyzer is marked @MainActor because it works with SwiftData’s mainContext, and a ModelContext isn’t thread-safe: it has to be used from the actor it belongs to, which for mainContext is the main actor. Without this, you’d risk crashes or data corruption the moment two tasks touched the context concurrently.

It fetches all drafts missing a title or tags, runs up to 3 concurrent analysis tasks using withTaskGroup, retries failed requests with a growing backoff, and normalizes tags into a consistent slug format. Let’s walk through each part.
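
Before the details, here’s a rough sketch of the shell these pieces live in. The exact shape is an assumption; the only constraint is that run must be callable without await, because the view layer (shown later) fires it and forgets it:

import SwiftData

@MainActor
final class DraftAnalyzer {
    static let shared = DraftAnalyzer()

    // Fire-and-forget entry point; the view modifier calls this without awaiting it.
    func run(context: ModelContext) {
        Task {
            await analyzeAll(context: context)
        }
    }

    private func analyzeAll(context: ModelContext) async {
        // 1. fetch drafts missing a title or tags
        // 2. analyze up to 3 at a time with a task group
        // 3. retry failures, normalize tags, save
    }
}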

Fetching and filtering

let descriptor = FetchDescriptor<Draft>(
    predicate: #Predicate { draft in draft.body != "" },
    sortBy: [SortDescriptor(\.updatedAt, order: .reverse)]
)
let allDrafts = try context.fetch(descriptor)
let drafts = allDrafts.filter { $0.title.isEmpty || $0.tags.isEmpty }

I fetch all non-empty drafts first, then filter in memory. This lets me also normalize existing tags in the same pass — something you can’t do cleanly inside a #Predicate.
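
For reference, the analyzer only touches four fields on the model. A minimal Draft consistent with the code in this post (the real model likely has more properties; this shape is an assumption):

import Foundation
import SwiftData

@Model
final class Draft {
    var title: String = ""
    var body: String = ""
    var tags: [String] = []
    var updatedAt: Date = .now

    init(body: String) {
        self.body = body
        self.updatedAt = .now
    }
}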

Controlled concurrency

await withTaskGroup(of: Void.self) { group in
    let maxConcurrent = 3
    var active = 0

    for draft in drafts {
        if active >= maxConcurrent {
            await group.next()
            active -= 1
        }
        group.addTask {
            await self.analyzeOne(draft: draft, context: context)
        }
        active += 1
    }
}

withTaskGroup doesn’t have a built-in concurrency limit, so I manage it manually. When active hits the cap, I await one completion before adding the next task; anything still in flight when the loop ends is awaited implicitly before withTaskGroup returns. Clean and easy to tune.

Retry with backoff

private func analyzeOne(draft: Draft, context: ModelContext, retries: Int = 2) async {
    for attempt in 1...max(1, retries + 1) {
        do {
            let result = try await OpenRouterService.shared.analyze(text: draft.body)
            draft.title = result.title
            draft.tags = normalizeTags(result.tags)
            try context.save()
            return
        } catch {
            if attempt <= retries {
                try? await Task.sleep(for: .seconds(Double(attempt) * 2))
            }
        }
    }
}

The delay grows with the attempt number, so with the default two retries the waits are 2s and then 4s. Free-tier LLM endpoints can be flaky; this makes the analyzer resilient without hammering the API.

Tag normalization

func normalizeTags(_ tags: [String]) -> [String] {
    tags.compactMap { tag in
        let slug = tag
            .lowercased()
            .replacing(/#/, with: "")
            .replacing(/\s+/, with: "-")
            .replacing(/[^a-z0-9\-]/, with: "")
            .replacing(/\-+/, with: "-")
            .trimmingCharacters(in: CharacterSet(charactersIn: "-"))
        return slug.isEmpty ? nil : "#\(slug)"
    }
}

The LLM returns tags inconsistently: sometimes #SwiftUI, sometimes swift ui, sometimes swift-ui. This pass forces every tag into the same lowercase, hyphen-separated slug format, no matter how the model formatted it.
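
For instance, given the implementation above:

let cleaned = normalizeTags(["#SwiftUI", "swift ui", "Swift-UI!", "   "])
// ["#swiftui", "#swift-ui", "#swift-ui"]; whitespace-only tags are dropped entirely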


Wiring It Into the App

A ViewModifier kicks off the analyzer in onAppear and again whenever the app becomes active:

struct DraftAnalyzerModifier: ViewModifier {
    @Environment(\.scenePhase) private var scenePhase
    let container: ModelContainer

    func body(content: Content) -> some View {
        content
            .onAppear {
                DraftAnalyzer.shared.run(context: container.mainContext)
            }
            .onChange(of: scenePhase) { _, phase in
                if phase == .active {
                    DraftAnalyzer.shared.run(context: container.mainContext)
                }
            }
    }
}

extension View {
    func analyzeDraftsInBackground(container: ModelContainer) -> some View {
        modifier(DraftAnalyzerModifier(container: container))
    }
}

Usage is one line at the root of the app:

ContentView()
    .analyzeDraftsInBackground(container: modelContainer)

Every time you open the app, any draft without a title or tags gets quietly analyzed in the background.


What Model to Use

I’m currently using minimax/minimax-m2.5:free via OpenRouter for this task. It’s fast, free, and good enough for short-form classification. For longer or more complex drafts, openrouter/free will auto-route to whatever free model is available.

For production, you’d want to pin a specific model and possibly pay for it — free tiers can be slow under load.


Takeaways

  • Put the API key on the server, not in the app. A Vercel function costs nothing and keeps your credentials safe.
  • actor is the right tool for shared network services in Swift Concurrency. No locks, no races.
  • Manual concurrency caps with withTaskGroup are simple and effective. Don’t spawn unbounded tasks against a rate-limited API.
  • LLMs are inconsistent formatters. Always sanitize their output — strip code fences, normalize tags, handle empty fields gracefully.

The whole thing is maybe 150 lines of Swift and 80 lines of JavaScript. A small investment for a feature that just works every time you open the app.