Auto-Tagging in iOS with an OpenRouter LLM API

I’ve been building an iOS app called Drafts — a place to write and manage social posts before publishing. One problem I kept running into: every time I saved a draft, I had to manually write a title and pick tags. Boring, repetitive, and easy to skip.

So I automated it. Here’s how I built a background analyzer that uses an LLM to generate titles and tags for every draft — fully automatic, non-blocking, with retries and concurrency control.


The Architecture

Three moving parts:

  1. fast_analyze.js — a Vercel serverless function that proxies OpenRouter, handles rate limiting, and returns { title, tags, model } as JSON
  2. OpenRouterService.swift — a Swift actor that calls the endpoint
  3. DraftAnalyzer.swift — a @MainActor class that fetches unanalyzed drafts from SwiftData and runs the analysis pipeline

The Backend: A Vercel Proxy for OpenRouter

I didn’t want to ship an API key inside the app binary. Instead, I put a thin serverless function in front of OpenRouter.

// POST /api/fast_analyze
// Body: { text: "your draft here", model?: "..." }
// Returns: { title: "Short title", tags: ["#tag1", "#tag2"], model: "..." }

const MODEL = "openrouter/free";

export default async function handler(req, res) {
  // rate limiting, validation, etc.
  const { text, model: modelParam } = req.body ?? {};
  const apiKey = process.env.OPENROUTER_API_KEY;

  const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: modelParam ?? MODEL,
      messages: [
        {
          role: "system",
          content: `You analyze social media drafts. Respond with valid JSON only, no markdown.
Format: { "title": "max 6 word title", "tags": ["#tag1", "#tag2", "#tag3"] }`,
        },
        { role: "user", content: `Analyze this draft:\n\n${text}` },
      ],
    }),
  });

  // parse, clean markdown fences, normalize tags
  const completion = await response.json();
  const raw = completion.choices?.[0]?.message?.content ?? "";
  const cleaned = raw.replace(/```(?:json)?/g, "").trim();
  const { title, tags } = JSON.parse(cleaned);
  const modelUsed = completion.model ?? modelParam ?? MODEL;

  return res.status(200).json({ title, tags, model: modelUsed });
}

A few things worth noting:

  • The system prompt is strict: JSON only, no explanation, no code fences. LLMs love to wrap responses in ```json code fences even when you tell them not to, so I strip those defensively on the server side anyway.
  • The model parameter is optional. You can pass any OpenRouter-compatible model string from the client, which makes it easy to experiment without redeploying.
  • Rate limiting is per-IP, in-memory. Simple and enough for a personal app.

The Swift Side: An Actor for Network Calls

import Foundation

// Mirrors the JSON returned by /api/fast_analyze
struct AnalyzeResponse: Decodable {
    let title: String
    let tags: [String]
    let model: String
}

enum OpenRouterError: Error {
    case invalidURL
}

actor OpenRouterService {
    static let shared = OpenRouterService()
    private let baseURL = "https://your-vercel-app.vercel.app"

    func analyze(text: String, model: String? = "minimax/minimax-m2.5:free") async throws -> AnalyzeResponse {
        guard let url = URL(string: "\(baseURL)/api/fast_analyze") else {
            throw OpenRouterError.invalidURL
        }

        var body: [String: String] = ["text": text]
        if let model { body["model"] = model }

        var request = URLRequest(url: url)
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        request.httpBody = try JSONSerialization.data(withJSONObject: body)

        let (data, _) = try await URLSession.shared.data(for: request)
        return try JSONDecoder().decode(AnalyzeResponse.self, from: data)
    }
}

Using actor here is the right call — it serializes access to the service’s internal state and plays well with Swift Concurrency without any manual locking.
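
Calling it from anywhere is then just an awaited method call. Here’s a minimal call-site sketch (the sample text is made up, and the explicit override just illustrates the optional model parameter):

Task {
    do {
        // Uses the default model baked into analyze(text:model:)
        let result = try await OpenRouterService.shared.analyze(text: "Shipping auto-tagging in Drafts today!")
        print(result.title, result.tags)

        // Or pass any OpenRouter model string per call; it goes straight through to the proxy
        let routed = try await OpenRouterService.shared.analyze(
            text: "Shipping auto-tagging in Drafts today!",
            model: "openrouter/free"
        )
        print(routed.model)
    } catch {
        print("Analysis failed: \(error)")
    }
}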


The Analyzer: Concurrency, Retries, and SwiftData

This is where it all comes together. DraftAnalyzer is marked @MainActor because it works with SwiftData’s mainContext, and a ModelContext isn’t thread-safe: it has to be used from the actor it belongs to, which for mainContext is the main actor. Without this, you’d risk crashes or data corruption the moment two tasks touched the context concurrently.

It fetches all drafts missing a title or tags, runs up to 3 concurrent analysis tasks using withTaskGroup, retries failed requests with a growing backoff, and normalizes tags into a consistent slug format. Let’s walk through each part.
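
Before the details, here’s a rough sketch of the shell these pieces live in. The exact shape is an assumption; the only constraint is that run must be callable without await, because the view layer (shown later) fires it and forgets it:

import SwiftData

@MainActor
final class DraftAnalyzer {
    static let shared = DraftAnalyzer()

    // Fire-and-forget entry point; the view modifier calls this without awaiting it.
    func run(context: ModelContext) {
        Task {
            await analyzeAll(context: context)
        }
    }

    private func analyzeAll(context: ModelContext) async {
        // 1. fetch drafts missing a title or tags
        // 2. analyze up to 3 at a time with a task group
        // 3. retry failures, normalize tags, save
    }
}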

Fetching and filtering

let descriptor = FetchDescriptor<Draft>(
    predicate: #Predicate { draft in draft.body != "" },
    sortBy: [SortDescriptor(\.updatedAt, order: .reverse)]
)
let allDrafts = try context.fetch(descriptor)
let drafts = allDrafts.filter { $0.title.isEmpty || $0.tags.isEmpty }

I fetch all non-empty drafts first, then filter in memory. This lets me also normalize existing tags in the same pass — something you can’t do cleanly inside a #Predicate.
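
For reference, the analyzer only touches four fields on the model. A minimal Draft consistent with the code in this post (the real model likely has more properties; this shape is an assumption):

import Foundation
import SwiftData

@Model
final class Draft {
    var title: String = ""
    var body: String = ""
    var tags: [String] = []
    var updatedAt: Date = .now

    init(body: String) {
        self.body = body
        self.updatedAt = .now
    }
}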

Controlled concurrency

await withTaskGroup(of: Void.self) { group in
    let maxConcurrent = 3
    var active = 0

    for draft in drafts {
        if active >= maxConcurrent {
            await group.next()
            active -= 1
        }
        group.addTask {
            await self.analyzeOne(draft: draft, context: context)
        }
        active += 1
    }
}

withTaskGroup doesn’t have a built-in concurrency limit, so I manage it manually. When active hits the cap, I await one completion before adding the next task; anything still in flight when the loop ends is awaited implicitly before withTaskGroup returns. Clean and easy to tune.

Retry with backoff

private func analyzeOne(draft: Draft, context: ModelContext, retries: Int = 2) async {
    for attempt in 1...max(1, retries + 1) {
        do {
            let result = try await OpenRouterService.shared.analyze(text: draft.body)
            draft.title = result.title
            draft.tags = normalizeTags(result.tags)
            try context.save()
            return
        } catch {
            if attempt <= retries {
                try? await Task.sleep(for: .seconds(Double(attempt) * 2))
            }
        }
    }
}

The delay grows with the attempt number, so with the default two retries the waits are 2s and then 4s. Free-tier LLM endpoints can be flaky; this makes the analyzer resilient without hammering the API.

Tag normalization

func normalizeTags(_ tags: [String]) -> [String] {
    tags.compactMap { tag in
        let slug = tag
            .lowercased()
            .replacing(/#/, with: "")
            .replacing(/\s+/, with: "-")
            .replacing(/[^a-z0-9\-]/, with: "")
            .replacing(/\-+/, with: "-")
            .trimmingCharacters(in: CharacterSet(charactersIn: "-"))
        return slug.isEmpty ? nil : "#\(slug)"
    }
}

The LLM returns tags inconsistently: sometimes #SwiftUI, sometimes swift ui, sometimes swift-ui. This pass forces every tag into the same lowercase, hyphen-separated slug format, no matter how the model formatted it.
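
For instance, given the implementation above:

let cleaned = normalizeTags(["#SwiftUI", "swift ui", "Swift-UI!", "   "])
// ["#swiftui", "#swift-ui", "#swift-ui"]; whitespace-only tags are dropped entirely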


Wiring It Into the App

A ViewModifier kicks off the analyzer in onAppear and again whenever the app becomes active:

struct DraftAnalyzerModifier: ViewModifier {
    @Environment(\.scenePhase) private var scenePhase
    let container: ModelContainer

    func body(content: Content) -> some View {
        content
            .onAppear {
                DraftAnalyzer.shared.run(context: container.mainContext)
            }
            .onChange(of: scenePhase) { _, phase in
                if phase == .active {
                    DraftAnalyzer.shared.run(context: container.mainContext)
                }
            }
    }
}

extension View {
    func analyzeDraftsInBackground(container: ModelContainer) -> some View {
        modifier(DraftAnalyzerModifier(container: container))
    }
}

Usage is one line at the root of the app:

ContentView()
    .analyzeDraftsInBackground(container: modelContainer)

Every time you open the app, any draft without a title or tags gets quietly analyzed in the background.


What Model to Use

I’m currently using minimax/minimax-m2.5:free via OpenRouter for this task. It’s fast, free, and good enough for short-form classification. For longer or more complex drafts, openrouter/free will auto-route to whatever free model is available.

For production, you’d want to pin a specific model and possibly pay for it — free tiers can be slow under load.


Takeaways

  • Put the API key on the server, not in the app. A Vercel function costs nothing and keeps your credentials safe.
  • actor is the right tool for shared network services in Swift Concurrency. No locks, no races.
  • Manual concurrency caps with withTaskGroup are simple and effective. Don’t spawn unbounded tasks against a rate-limited API.
  • LLMs are inconsistent formatters. Always sanitize their output — strip code fences, normalize tags, handle empty fields gracefully.

The whole thing is maybe 150 lines of Swift and 80 lines of JavaScript. A small investment for a feature that just works every time you open the app.