TL;DR

Apfel is a CLI tool that exposes the ~3B parameter language model Apple already shipped on your Mac. One brew install, zero API keys, zero cost. It’s useful for shell one-liners, text summaries, and quick classification tasks, but the 4,096-token context window kills it for anything longer than a page. Think of it as a fast, private scratchpad AI, not a Claude or GPT replacement.

Your Mac Has Been Hiding an LLM

Starting with macOS 26 Tahoe, every Apple Silicon Mac ships with a language model baked into the OS. Apple uses it for Siri, Writing Tools, and system-level features through the FoundationModels framework. But until Apfel showed up on April 3, 2026, there was no clean way for developers to touch it directly.

Arthur-Ficial (the developer behind Apfel) wrapped LanguageModelSession in a Swift 6.3 binary and gave it three interfaces: a standard CLI with stdin/stdout, an OpenAI-compatible HTTP server, and an interactive chat mode. The whole thing went up on Hacker News and hit 513 points, for good reason: it scratches an itch a lot of Mac developers didn’t realize they had.

I’ve been running it for three days. Here’s what I found.

Installation Takes 30 Seconds

brew install apfel

That’s it. No Python environments, no model downloads, no Docker containers. The model is already on your machine. Apfel just gives you a front door.

You need:

  • Apple Silicon Mac (M1 or later)
  • macOS 26 Tahoe
  • Apple Intelligence enabled in System Settings

Quick sanity check after install:

apfel --model-info

This dumps the model’s metadata: parameter count, context window, supported capabilities. If you see an error about Apple Intelligence not being enabled, flip the toggle in System Settings > Apple Intelligence & Siri.

What It Can Actually Do

Apfel runs in five modes: single prompt, streaming, interactive chat, OpenAI-compatible server, and internal benchmarks. The single-prompt mode is where I’ve spent most of my time.

Shell One-Liners

This is Apfel’s sweet spot. Need to remember the find syntax you google every single time?

apfel "find all .log files modified in the last 24 hours and compress them"

It spits back:

find . -name "*.log" -mtime -1 -exec gzip {} \;

Fast, correct, and I didn’t leave my terminal. For awk, sed, jq, xargs, all the commands where I know what I want but can’t remember the flags — Apfel nails it about 80% of the time. The other 20% it gets close enough that a quick edit fixes it.

Text Summarization

Pipe a file in, get a summary out:

cat README.md | apfel "summarize this in 3 bullet points"

Works well for short documents. But remember: 4,096 tokens total, input and output combined. A 3,000-word document eats most of the context window before the model even starts generating. For anything longer than a typical README, you’ll hit the wall.
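Because the 4,096-token limit covers input and output combined, a cheap pre-check before piping a document in can save you from silent truncation. Here’s a minimal sketch using a rough words-to-tokens heuristic (the ~1.3 ratio is my approximation, not Apple’s actual tokenizer):

```python
# Rough token-budget check before piping text into Apfel.
# Heuristic only: English prose averages ~1.3 tokens per word;
# the real tokenizer may count differently.
TOKENS_PER_WORD = 1.3
CONTEXT_WINDOW = 4096

def estimated_tokens(text: str) -> int:
    """Estimate how many tokens a piece of text will consume."""
    return int(len(text.split()) * TOKENS_PER_WORD)

def fits_in_context(text: str, reserve_for_output: int = 512) -> bool:
    """True if the text, plus a reserved output budget, fits the window."""
    return estimated_tokens(text) + reserve_for_output <= CONTEXT_WINDOW
```

In a script, I bail out early when `fits_in_context(open("README.md").read())` is False rather than letting the model work from a clipped document.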

Classification and Extraction

echo "The server returned a 503 after the deployment at 14:32 UTC" | apfel -j "classify: is this a bug report, feature request, or ops incident?"

The -j flag forces JSON output, which makes Apfel usable in scripts. I piped 50 GitHub issue titles through it and got reasonable classifications for about 85% of them. Not production-grade, but solid for quick triage.
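For the GitHub-issue triage run, I wrapped the pipeline in a short script. This is a sketch under two assumptions: that `-j` returns a JSON object (the exact schema isn’t documented, so it falls back to raw text), and that normalizing fuzzy model output onto fixed labels is enough for triage:

```python
import json
import subprocess

LABELS = {"bug report", "feature request", "ops incident"}
PROMPT = "classify: is this a bug report, feature request, or ops incident?"

def normalize_label(raw: str) -> str:
    """Map free-form model output onto one of the expected labels."""
    raw = raw.strip().lower()
    for label in LABELS:
        if label in raw:
            return label
    return "unknown"

def classify(title: str) -> str:
    """Pipe one issue title through `apfel -j` and normalize the result.
    Assumes -j emits a JSON object with the answer in a text field;
    falls back to treating the output as plain text."""
    out = subprocess.run(
        ["apfel", "-j", PROMPT], input=title, capture_output=True, text=True
    ).stdout
    try:
        data = json.loads(out)
        raw = next(iter(data.values())) if isinstance(data, dict) else str(data)
    except json.JSONDecodeError:
        raw = out
    return normalize_label(str(raw))
```

Feed it a file of issue titles, one per line, and you get a label per title; anything the model mangles comes back as "unknown" instead of breaking the script.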

File Attachments

apfel -f error.log "what went wrong here?"

You can attach multiple files with repeated -f flags. The model reads the content and answers questions about it. Again, context window is the limiting factor. A 200-line log file works fine. A 2,000-line one won’t fit.

The OpenAI-Compatible Server

Start it with:

apfel serve

This launches an HTTP server on localhost:11434 that speaks the OpenAI chat completions API. Any tool that works with the OpenAI SDK can point at this endpoint instead.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="apple-local",
    messages=[{"role": "user", "content": "explain git rebase in one sentence"}]
)
print(response.choices[0].message.content)

I tested this with a few tools that support custom OpenAI endpoints. It works, but the small context window means you can’t use it as a drop-in replacement for GPT-4o or Claude in any agent workflow. It’s better suited for lightweight automations: commit message generation, code comment cleanup, that sort of thing.
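For lightweight automations you don’t even need the OpenAI SDK, since the endpoint is plain HTTP. A stdlib-only sketch, assuming the default localhost:11434 address and the standard chat-completions response shape shown above:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "apple-local") -> dict:
    """Assemble an OpenAI-style chat completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str, base_url: str = "http://localhost:11434/v1") -> str:
    """POST to the local Apfel server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Zero dependencies means this drops into any automation script, cron job, or git hook without a virtualenv.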

Where It Falls Short

The “free AI on your Mac” pitch makes it sound bigger than it is. Here’s where you hit the ceiling.

The 4,096-token context window is the biggest problem. That’s input and output combined, roughly 3,000 words total. Claude gives you 200K tokens. GPT-4o gives you 128K. Apfel gives you a sticky note. For anything requiring multi-turn conversation, referencing long documents, or generating long output, you’ll bounce off this wall immediately.

Math doesn’t work well either. I asked it to calculate compound interest and it fumbled the formula. Basic arithmetic is fine, multi-step word problems aren’t. Not surprising for a 3B model, but worth knowing.

Shell one-liners it handles well, but ask for a full Python function with error handling and it’s a coin flip. I wouldn’t trust it for anything beyond short snippets without reviewing every line.

On my M3 Pro MacBook, responses take 2-4 seconds for short prompts. Fine for interactive use, too slow for batch processing hundreds of items. Cloud APIs return faster because datacenter GPUs are just bigger.

And you can’t customize it. No fine-tuning, no system prompts. You get Apple’s base model as-is, safety guardrails and all. It’ll refuse some requests that other local models would handle without complaint.

Apfel vs. Ollama vs. LM Studio

Everyone I’ve talked to asks the same thing: why not just run Ollama?

| Feature | Apfel | Ollama | LM Studio |
|---|---|---|---|
| Setup time | 30 seconds | 2-5 minutes | 5-10 minutes |
| Model download | None (pre-installed) | 2-50 GB per model | 2-50 GB per model |
| Context window | 4,096 tokens | Model-dependent (up to 128K) | Model-dependent |
| Model quality | ~3B Apple model | Your choice (Gemma 4, Llama, Qwen) | Your choice |
| OpenAI-compatible API | Yes | Yes | Yes |
| Privacy | Full on-device | Full on-device | Full on-device |
| Cost | Free | Free | Free (with paid tier) |
| GPU memory needed | Minimal (built into OS) | 4-32 GB depending on model | 4-32 GB depending on model |

Ollama with Gemma 4 26B will blow Apfel away on quality. LM Studio gives you a nice GUI and model management. Both let you run models with 32K-256K context windows.

But both require downloading multi-gigabyte model files and allocating significant GPU memory. Apfel uses a model that’s already on your disk, already loaded by the OS, and uses minimal additional resources. If you just want to convert natural language to a shell command without waiting for a 14 GB download, Apfel wins on friction.

My take: Apfel isn’t competing with Ollama. It’s the calculator app of local AI — always there when you need a quick answer, not the tool you reach for when the task is serious.

The Privacy Story

No telemetry. No analytics. No crash reporting. No phone-home behavior. The only network call Apfel makes is when you explicitly run apfel --update. I verified this by running it with Little Snitch active for two days. Zero outbound connections during normal use.

Every token is generated on your Apple Silicon chip. If you work with sensitive codebases or client data and can’t send prompts to cloud APIs, this is a real advantage. I know devs at consulting firms who’ve been waiting for exactly this kind of tool because their contracts prohibit sending code to third-party APIs.

Practical Workflows I’ve Been Using

The one I reach for most: git commit messages.

git diff --staged | apfel "write a concise commit message for this diff"

Gets the job done for small commits. Falls apart on large diffs that exceed the context window.
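My workaround for big diffs is to truncate the staged diff before it hits the model. A sketch; the word budget is my own guess at what leaves room for output under the 4,096-token window:

```python
import subprocess

# Rough input budget: well under the 4,096-token window,
# leaving headroom for the generated message.
MAX_WORDS = 2500

def truncate_to_budget(diff: str, max_words: int = MAX_WORDS) -> str:
    """Keep the head of the diff; drop the tail once the budget is spent."""
    words = diff.split()
    if len(words) <= max_words:
        return diff
    return " ".join(words[:max_words]) + "\n[diff truncated]"

def commit_message() -> str:
    """Generate a commit message from the (possibly truncated) staged diff."""
    diff = subprocess.run(
        ["git", "diff", "--staged"], capture_output=True, text=True
    ).stdout
    return subprocess.run(
        ["apfel", "write a concise commit message for this diff"],
        input=truncate_to_budget(diff), capture_output=True, text=True,
    ).stdout.strip()
```

Truncating the head of the diff is crude (the interesting change may be at the end), but it turns a hard failure into a usable-most-of-the-time draft.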

I’ve also been piping foreign-language logs through it for quick translations:

echo "Deployment erfolgreich abgeschlossen" | apfel "translate to English"

The Apple model supports multiple languages out of the box. Handy when you’re reading docs or error messages in a language you don’t speak fluently.

For log triage, the JSON mode is the real trick:

tail -50 /var/log/app.log | apfel -j "extract all error messages as a JSON array"

Structured output from unstructured logs, and it’s scriptable.
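Before scripting on that output, I parse it defensively; a 3B model won’t always emit clean JSON, so this sketch salvages the first `[...]` span if the array arrives wrapped in extra prose:

```python
import json

def parse_json_array(raw: str) -> list[str]:
    """Parse `apfel -j` output into a list of strings.
    If the JSON is wrapped in extra prose, salvage the first
    [...] span; return an empty list rather than raising."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        start, end = raw.find("["), raw.rfind("]")
        if start == -1 or end <= start:
            return []
        try:
            data = json.loads(raw[start : end + 1])
        except json.JSONDecodeError:
            return []
    return [str(item) for item in data] if isinstance(data, list) else []
```

Returning an empty list on garbage keeps the surrounding cron job or log-triage script from crashing on one bad generation.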

README drafting works too, if the project is small enough to fit in context:

apfel -f main.go -f go.mod "write a short README for this project"

It produces a reasonable first draft that I then edit. Saves maybe 10 minutes compared to starting from scratch.

Who Should Install This

If you meet all three conditions, Apfel is worth the 30-second install:

  1. You’re on an Apple Silicon Mac running macOS 26
  2. You frequently work in the terminal
  3. You want quick AI assists without leaving your shell or paying for API calls

It won’t replace Claude Code or Cursor for serious coding work, and it won’t replace cloud models for anything requiring real reasoning depth. But as a zero-friction utility that’s always one keystroke away? I’ve been reaching for it multiple times a day.

If you primarily need a local model for longer tasks like writing full functions, analyzing large files, or multi-turn debugging sessions, skip Apfel and set up Ollama with Gemma 4 or Qwen 3.5 instead. The quality gap is massive once you move beyond short prompts.

FAQ

Does Apfel work on Intel Macs?

No. It requires Apple Silicon (M1 or later) because the on-device model runs on the Neural Engine. Intel Macs don’t have the FoundationModels framework.

Can I use Apfel offline?

Yes, completely. The model runs on your hardware with no network dependency. I tested it in airplane mode. Works identically.

Is the Apple on-device model any good compared to GPT-4o or Claude?

For short tasks like shell commands and summaries, it punches above its weight for a 3B model. For anything requiring reasoning, long context, or code generation beyond snippets, cloud models are in a different league. It’s a fast, private utility, not a reasoning engine.

Can I swap in a different model?

No. Apfel specifically wraps Apple’s FoundationModels framework, which only exposes the system model. If you want to run other models locally, use Ollama, LM Studio, or llama.cpp.

Does it work with Cursor, VS Code, or other editors?

The OpenAI-compatible server means any editor extension that supports custom API endpoints could technically use it. But the 4,096-token context window makes it impractical for editor-integrated AI features that need to see your whole file.

Bottom Line

Apfel is the best zero-setup local AI tool I’ve used — because it doesn’t require any setup at all. The model is already on your Mac. Apfel just hands you the keys.

It’s not going to write your app for you. The 3B model and 4K context window make sure of that. But for the small stuff — shell commands you can’t remember, quick summaries, log parsing, text classification — it’s faster than opening a browser and cheaper than any API. Three days in, it’s earned a permanent spot in my terminal workflow. The brew install takes less time than you spent reading this paragraph.