Local AI in Your Terminal: Scripting with Apple’s New fm CLI and MLX

macOS 27 added a quiet but genuinely useful tool: fm, a built-in command that talks to the on-device model behind Apple Intelligence right from your shell. Pair it with the mlx_lm CLI for open Hugging Face models, and your terminal turns into a private, offline AI workbench. This is also the natural sequel to the local transcription pipeline from the last article — we’ll wire the two together.

Two local engines, one terminal

Until now, scripting AI on a Mac meant either calling a paid cloud API or installing a separate runtime. WWDC 2026 changed the default. There are now two genuinely local options that live happily side by side in your shell:

fm — a command-line tool pre-installed with macOS 27 that prompts Apple’s on-device Foundation Model (and, optionally, the Private Cloud Compute model). No install, no API key, no account. Great for everyday text tasks.
mlx_lm — the MLX language-model CLI (a quick pip install away) that runs any open model from the Hugging Face MLX community on your GPU. Use it when you want a specific model, a larger one, or full control.

The distinction matters: fm runs Apple’s model, and you don’t pick the weights. When you need a particular open model, that’s mlx_lm’s job. This article uses each for what it does best.

Before you start

Everything here is beta software (macOS 27 developer beta, June 2026), so command flags may shift before the fall release — run fm --help to confirm. You’ll need:

macOS 27 on an Apple Silicon Mac. fm is preinstalled; if it’s missing, install the Xcode 27 beta.
mlx_lm for the open-model half: pip install mlx-lm. The Part 6 pipeline also calls mlx_whisper from the earlier article (pip install mlx-whisper).
A terminal and basic shell familiarity. The structured-output examples use jq (brew install jq).
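A quick preflight check can save a confusing failure halfway through a script. The sketch below assumes nothing about fm itself; it only asks the shell whether each command is on PATH. The helper name missing_tools is made up for this example.

```shell
#!/usr/bin/env bash
# missing_tools NAME...: print every command that is NOT on PATH, one per line.
# (Hypothetical helper; it only uses the shell built-in `command -v`.)
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example:
#   missing_tools fm mlx_lm.generate jq
# prints nothing when everything this article needs is installed.
```

If the function prints anything, install the listed tools before running the later scripts.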

Part 1 — fm basics

Two modes. Interactive:

fm chat

This opens a conversation with the on-device model. Within a chat you can switch between the on-device and Private Cloud Compute models and save a session to resume later — handy for longer back-and-forths.

For scripting, the workhorse is fm respond, which takes a prompt and prints the answer:

fm respond "Give me three subject-line options for an email announcing a team offsite."

That single line is the foundation of everything that follows.

Part 2 — Structured output you can pipe

Plain text is hard to script against. fm can emit structured JSON if you give it a schema, which you generate with fm schema. Define the shape, then ask for output that conforms to it:

# 1. Build a schema describing the output you want
fm schema object --name ActionItems --string items --array > schema.json

# 2. Ask for output that conforms, then parse it with jq
fm respond "Extract the action items from this note: \
We agreed Maria ships the API by Friday and Tom reviews the designs Monday." \
  --schema schema.json | jq -r '.items[]'

Because the output is valid JSON, you can pipe it straight into jq, a loop, or another tool. This is what turns fm from a chatbot into a scripting primitive.
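To make the "pipe it into a loop" idea concrete, here is a minimal sketch of a loop that turns one item per line (the shape jq -r '.items[]' produces) into a Markdown checklist. The function name to_checklist is an assumption for illustration, not part of fm or jq.

```shell
#!/usr/bin/env bash
# to_checklist: read one item per line on stdin, print a Markdown task list.
# (Hypothetical helper; blank lines are skipped.)
to_checklist() {
    local item
    # The `|| [ -n "$item" ]` part keeps a final line that lacks a newline.
    while IFS= read -r item || [ -n "$item" ]; do
        [ -n "$item" ] && printf -- '- [ ] %s\n' "$item"
    done
    return 0
}

# Example, reusing the schema from above:
#   fm respond "Extract the action items from this note: ..." \
#     --schema schema.json | jq -r '.items[]' | to_checklist
```

Reading with IFS= and -r keeps leading spaces and backslashes in the items intact.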

Part 3 — Pick up where transcription left off

If you followed the earlier article, you have a folder of .txt transcripts produced by MLX Whisper. Now you can summarize them locally, with no cloud round-trip. Inject the file’s contents into the prompt with command substitution:

fm respond "Summarize this meeting transcript in three bullet points, \
then list any action items:

$(cat interview.txt)"
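If you repeat this pattern often, a small helper keeps the quoting in one place. This is only a sketch: build_prompt is a hypothetical name, and it just joins an instruction and a file's contents with a blank line between them, failing early if the file is unreadable.

```shell
#!/usr/bin/env bash
# build_prompt INSTRUCTION FILE: print the instruction, a blank line, then the
# contents of FILE. Returns 1 if FILE cannot be read. (Hypothetical helper.)
build_prompt() {
    local instruction="$1" file="$2"
    [ -r "$file" ] || { echo "cannot read: $file" >&2; return 1; }
    printf '%s\n\n%s\n' "$instruction" "$(cat "$file")"
}

# Example:
#   fm respond "$(build_prompt "Summarize this meeting transcript in three bullet points:" interview.txt)"
```

Keeping the whole substitution inside double quotes means the prompt reaches fm as a single argument, whatever the transcript contains.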

Turn that into a batch script that summarizes every transcript in a folder:

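#!/usr/bin/env bash
set -euo pipefail
shopt -s nullglob    # an empty folder means zero iterations, not a literal "*.txt"

TRANSCRIPTS=~/Transcripts
SUMMARIES=~/Summaries
mkdir -p "$SUMMARIES"

for txt in "$TRANSCRIPTS"/*.txt; do
    name=$(basename "$txt" .txt)
    out="$SUMMARIES/$name.summary.md"
    [ -f "$out" ] && { echo "skip: $name"; continue; }

    echo "summarizing: $name"
    # Write to a temp file first, so a failed run never leaves a partial
    # summary that the skip check above would mistake for a finished one.
    fm respond "Summarize this transcript in 3 bullets, then list action items:

$(cat "$txt")" > "$out.tmp"
    mv "$out.tmp" "$out"
done

echo "done."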

The whole chain — audio in, structured summary out — now runs on your laptop with nothing leaving the machine.

Part 4 — Reach for Private Cloud Compute on the hard ones

The on-device model is fast and free but small. When a transcript is long or the reasoning is demanding, switch to Apple’s Private Cloud Compute model with a single flag. The prompt is then processed remotely (though privately) rather than on your Mac, so it needs a network connection, but you get more capability and still need no API key:

fm respond "Write a detailed summary with decisions, owners, and deadlines:

$(cat all-hands.txt)" --model pcc
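A script can decide between the two models by input size. The sketch below is an assumption-heavy illustration: needs_pcc is a hypothetical helper, and the 12000-byte default is a placeholder, not a documented limit of the on-device model. Tune it against your own transcripts.

```shell
#!/usr/bin/env bash
# needs_pcc FILE [LIMIT_BYTES]: succeed (exit 0) when FILE is larger than the
# limit, fail (exit 1) when it is not, and return 2 if FILE is unreadable.
# The default limit is a made-up placeholder, not a real context-window size.
needs_pcc() {
    local file="$1" limit="${2:-12000}" size
    [ -r "$file" ] || return 2
    size=$(( $(wc -c < "$file") ))   # arithmetic strips the padding macOS wc adds
    [ "$size" -gt "$limit" ]
}

# Example:
#   prompt="Write a detailed summary:
#
#   $(cat all-hands.txt)"
#   if needs_pcc all-hands.txt; then
#       fm respond "$prompt" --model pcc
#   else
#       fm respond "$prompt"
#   fi
```

Byte count is only a rough proxy for tokens, so treat the threshold as a starting point.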

fm also accepts images, so multimodal prompts work the same way:

fm respond "What's shown in this screenshot?" --image screenshot.png --model pcc

Part 5 — When you need a specific model: mlx_lm

fm is deliberately opinionated — you get Apple’s model. When you want a particular open model (a coding-tuned model, a multilingual one, something larger, or just a model you can pin and reproduce), use the mlx_lm CLI. One-shot generation:

mlx_lm.generate \
  --model mlx-community/Qwen3.5-4B-4bit \
  --prompt "Rewrite this commit message to be clearer: 'fix stuff'" \
  --max-tokens 200

Interactive chat with an open model:

mlx_lm.chat --model mlx-community/Qwen3.5-4B-4bit

The first run downloads and caches the model; after that it’s fully offline. Its output pipes like any other command’s, but a file piped into mlx_lm.generate is not appended to --prompt, so inject file contents with command substitution, just as in Part 3:

mlx_lm.generate \
  --model mlx-community/Qwen3.5-4B-4bit \
  --prompt "Write a concise PR description for this diff:

$(cat changes.diff)" \
  --max-tokens 300

A simple rule of thumb: reach for fm first for everyday text tasks (it’s instant and built in), and switch to mlx_lm when the task needs a specific or larger model.
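That rule of thumb can be encoded in a tiny router. This is a sketch, not a recommended interface: the function name ask and the environment variables ASK_MODEL, ASK_MAX_TOKENS, and ASK_DRY_RUN are all invented for this example. It uses fm respond by default and mlx_lm.generate only when a model is named.

```shell
#!/usr/bin/env bash
# ask PROMPT: use fm by default, or mlx_lm.generate when ASK_MODEL is set.
# Set ASK_DRY_RUN=1 to print the command instead of running it.
# (ask and the ASK_* variables are hypothetical names for this sketch.)
ask() {
    local prompt="$1"
    # Rebuild the positional parameters as the command line to run.
    if [ -n "${ASK_MODEL:-}" ]; then
        set -- mlx_lm.generate --model "$ASK_MODEL" \
            --prompt "$prompt" --max-tokens "${ASK_MAX_TOKENS:-200}"
    else
        set -- fm respond "$prompt"
    fi
    if [ -n "${ASK_DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Examples:
#   ask "Give me three subject lines for the offsite email."
#   ASK_MODEL=mlx-community/Qwen3.5-4B-4bit ask "Rewrite this commit message: 'fix stuff'"
```

The dry-run switch is there so you can check which engine a script would call before it spends any time generating.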

Part 6 — The full local pipeline in one script

Here’s the payoff: a single script that takes an audio file and produces a tagged, summarized note — transcription via MLX Whisper, summary via fm, and topic tags via an open mlx_lm model. Three local engines, zero cloud calls.

#!/usr/bin/env bash
set -euo pipefail

AUDIO="${1:?usage: $0 <audio-file>}"   # e.g. ./standup.m4a; exits with a usage hint if omitted
STEM=$(basename "$AUDIO" | sed 's/\.[^.]*$//')
OUT="${STEM}.note.md"

echo "1/3 transcribing with MLX Whisper..."
mlx_whisper "$AUDIO" \
  --model mlx-community/whisper-large-v3-turbo \
  --output-name "$STEM" --output-format txt
TRANSCRIPT=$(cat "${STEM}.txt")

echo "2/3 summarizing with the on-device model (fm)..."
SUMMARY=$(fm respond "Summarize in 3 bullets and list action items:

$TRANSCRIPT")

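echo "3/3 tagging with an open MLX model..."
# Note: depending on the mlx_lm version, mlx_lm.generate may wrap its answer in
# ========== delimiter lines and append speed statistics; strip those from
# $TAGS if they show up in the note.
TAGS=$(mlx_lm.generate \
  --model mlx-community/Qwen3.5-4B-4bit \
  --prompt "Return 3-5 comma-separated topic tags for this text: $TRANSCRIPT" \
  --max-tokens 40)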

{
  echo "# $STEM"
  echo
  echo "**Tags:** $TAGS"
  echo
  echo "## Summary"
  echo "$SUMMARY"
  echo
  echo "## Full transcript"
  echo "$TRANSCRIPT"
} > "$OUT"

echo "wrote $OUT"

Run it with ./note.sh standup.m4a and you get a tidy Markdown note ready to drop into Obsidian, Notion, or a Git repo — generated entirely on your own hardware.
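One thing to check in your own runs is whether the captured mlx_lm.generate output contains only the answer. The filter below is a sketch built on an assumption: that the tool prints the generated text between two lines made of equals signs, followed by statistics. If your version prints the bare text, the filter passes it through unchanged. The name extract_generation is hypothetical.

```shell
#!/usr/bin/env bash
# extract_generation: read captured output on stdin and print only the text
# between the first two delimiter lines (five or more "=" characters).
# If fewer than two delimiters are present, print the non-delimiter lines as-is.
# ASSUMPTION: the delimiter format is a guess about mlx_lm.generate output.
extract_generation() {
    awk '
        /^=====+$/  { fences++; next }             # delimiter: count it, do not print
                    { all = all $0 "\n" }          # every non-delimiter line
        fences == 1 { inside = inside $0 "\n" }    # lines after the first delimiter only
        END         { printf "%s", (fences >= 2 ? inside : all) }
    '
}

# Example:
#   TAGS=$(mlx_lm.generate --model mlx-community/Qwen3.5-4B-4bit \
#            --prompt "Return 3-5 comma-separated topic tags for this text: $TRANSCRIPT" \
#            --max-tokens 40 | extract_generation)
```

Because the fallback path returns the input untouched, the filter is safe to leave in place even if the output format changes.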

Gotchas

It’s beta. fm flags (--model, --schema, --image, --instructions) reflect the macOS 27 beta. Run fm respond --help to confirm current syntax.
--model pcc needs a network connection. Private Cloud Compute is remote (though private); the on-device default and mlx_lm are fully offline after any model download.
Quote carefully and watch input size. Keep $(cat file) inside double quotes so the shell passes the contents to fm as a single argument. Very large files can still blow past the on-device model’s context window; switch to --model pcc or chunk the input.
First-run model downloads. mlx_lm needs network access the first time it uses a given model, so plan for the initial download delay. fm’s PCC path needs a connection on every call.
Memory. Large mlx_lm models can strain RAM. Stay in the 3–8B range on a 16 GB Mac, or use a more aggressively quantized variant.
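For the "chunk the input" advice, a line-based split is the simplest thing that could work. The sketch below is one possible approach, not something fm provides: split_transcript is a hypothetical helper, and the 200-line default is an arbitrary placeholder. It relies only on the standard split and mktemp utilities.

```shell
#!/usr/bin/env bash
# split_transcript FILE [LINES]: split FILE into pieces of at most LINES lines
# inside a fresh temp directory and print the chunk paths in order.
# (Hypothetical helper; the default of 200 lines is an arbitrary placeholder.)
split_transcript() {
    local file="$1" lines="${2:-200}" dir
    [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
    dir=$(mktemp -d) || return 1
    split -l "$lines" "$file" "$dir/chunk."
    printf '%s\n' "$dir"/chunk.*      # glob expands in sorted order: aa, ab, ...
}

# Example: summarize each chunk, then summarize the summaries.
#   split_transcript all-hands.txt 200 | while IFS= read -r chunk; do
#       fm respond "Summarize this part of a transcript in 3 bullets:
#
#   $(cat "$chunk")"
#   done > partial-summaries.md
```

Splitting on line counts can cut a thought in half, so a final pass over the partial summaries helps stitch the result back together.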

Wrapping up

The terminal is where automation actually lives, and as of macOS 27 it has two complementary local AI engines: fm for instant, built-in access to Apple’s model, and mlx_lm for the full open-model ecosystem. Between them you can summarize, extract, classify, and generate in shell scripts, cron jobs, and pipelines without an API key and, as long as you stay off --model pcc, without a single byte leaving your Mac.

Start by running fm chat to feel out the on-device model, then wire fm respond into one script you run often. The transcription-to-note pipeline above is a good first target.

All commands reflect the macOS 27 developer beta as of June 2026 and may change before the public release. Check fm --help and the WWDC 2026 session “Build AI-powered scripts with the fm CLI and Python SDK” for the authoritative reference.
