cited / about
Engineering discipline made visible
A daily AI-engineering news briefing where every summary has a citation, and topics for which no credible source can be found are refused rather than hallucinated. The point isn’t another AI digest — it’s the engineering discipline made visible in the code.
how a briefing is made
Every day a Temporal workflow runs five focused web-search calls — one per topic — in priority-rank order. The first topic gets first pick of the field; each subsequent topic is told which URLs are already taken so it routes around them. Each call is forced through a JSON-schema-shaped tool so the model can’t return free-form prose: for each item it must produce one short summary sentence and one source URL, or refuse the topic outright.
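A minimal sketch of what that schema-constrained output could look like, in TypeScript. The field names (`refused`, `items`, `summary`, `url`) and the validator are illustrative assumptions, not the project's actual schema:

```typescript
type BriefingItem = { summary: string; url: string };
type BriefingOutput = { refused: boolean; items: BriefingItem[] };

// Illustrative JSON Schema the tool call would be forced through. With
// Anthropic's SDK a schema like this goes in `tools[].input_schema`,
// paired with `tool_choice: { type: "tool", name: "..." }` so the model
// cannot answer in free-form prose.
const briefingSchema = {
  type: "object",
  properties: {
    refused: { type: "boolean" },
    items: {
      type: "array",
      items: {
        type: "object",
        properties: {
          summary: { type: "string", maxLength: 280 }, // one short sentence
          url: { type: "string", format: "uri" },      // the citation
        },
        required: ["summary", "url"],
        additionalProperties: false,
      },
    },
  },
  required: ["refused", "items"],
};

// A refusal is exactly zero items with the flag set; anything else must
// carry a URL on every item.
function isValidOutput(out: BriefingOutput): boolean {
  if (out.refused) return out.items.length === 0;
  return out.items.length > 0 && out.items.every(i => i.url.startsWith("http"));
}
```

Forcing the tool call, rather than asking nicely in the prompt, is what makes "summary plus citation, or refusal" the only shapes the model can emit.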
Two providers run on every topic in parallel — Anthropic’s Claude (canonical) and OpenAI’s GPT (comparative). Only the canonical set is published. The comparative set persists for the methodology post that comes after several weeks of running.
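The run-both, publish-one split can be sketched as follows. The `runTopic` helper and the provider runner signatures are hypothetical; the real workflow presumably does this inside a Temporal activity:

```typescript
type TopicResult = { provider: string; items: { summary: string; url: string }[] };
type ProviderRunner = (topic: string) => Promise<TopicResult>;

// Both providers run concurrently. The canonical result is required and
// gets published; a comparative failure must never block the briefing,
// so it degrades to a null archive entry instead.
async function runTopic(
  topic: string,
  runCanonical: ProviderRunner,
  runComparative: ProviderRunner,
): Promise<{ publish: TopicResult; archive: TopicResult | null }> {
  const [canonical, comparative] = await Promise.allSettled([
    runCanonical(topic),
    runComparative(topic),
  ]);
  if (canonical.status === "rejected") throw canonical.reason; // canonical is required
  return {
    publish: canonical.value,
    archive: comparative.status === "fulfilled" ? comparative.value : null,
  };
}
```

`Promise.allSettled` (rather than `Promise.all`) is the load-bearing choice here: the comparative provider is allowed to fail without taking the canonical publish down with it.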
why each pattern is here
- Eval gate before publish. Five blocking checks run between the LLM call and the database write: citation completeness, dead-link probe, recency, URL uniqueness, and source diversity. A run that fails any of them gets `runs.status='failed_eval'` and no briefing row. The worst version of this project is one that demonstrates the failure mode the companion article warns about, on the author’s own domain. That cannot ship.
- Refusal as a first-class outcome. If the LLM finds no credible primary source for a topic in the window, it returns zero items with `refused: true`. The slot is empty on the page — graceful degrade — and the run is logged as refused, not failed. Five real items beat five invented ones.
- Three idempotency layers. A re-trigger of the same date is a no-op: Temporal’s `REJECT_DUPLICATE` policy on the workflow ID catches the parent; `runs.idempotency_key UNIQUE` catches individual children; `briefings (topic, date, provider) UNIQUE` catches the publish step. Cron retries can never double-publish.
- Cost tracking surfaced in the UI. Every LLM call writes to `cost_logs`; the daily total appears in the footer of the briefing. A weekly budget kill-switch aborts new runs before they exceed it. Cost is a feature, not a hidden ops concern.
- Pre-flight evals run in CI. A small fixture set in `evals/fixtures/` exercises every blocking check against known-good and known-bad inputs. `npm test` guards the check logic itself before any deploy.
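A sketch of how a blocking gate like this might be wired. The check names follow the list above, but the `Briefing` shape is hypothetical, and only the three checks that need no I/O are shown (the dead-link probe and the recency check would be async in practice):

```typescript
type Item = { summary: string; url: string };
type Briefing = { items: Item[] };
type Check = { name: string; pass: (b: Briefing) => boolean };

// Three of the five blocking checks, sketched synchronously.
const checks: Check[] = [
  // every item must carry a citation
  { name: "citation_completeness", pass: b => b.items.every(i => i.url.length > 0) },
  // no URL may appear twice in one briefing
  { name: "url_uniqueness", pass: b => new Set(b.items.map(i => i.url)).size === b.items.length },
  // items must not all come from a single host
  { name: "source_diversity", pass: b => new Set(b.items.map(i => new URL(i.url).hostname)).size > 1 },
];

// Any non-empty result means runs.status='failed_eval' and no briefing row.
function runGate(b: Briefing): string[] {
  return checks.filter(c => !c.pass(b)).map(c => c.name);
}
```

Returning the list of failed check names, rather than a bare boolean, is what makes the fixture tests in CI useful: a known-bad input can assert that exactly the right check caught it.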
the methodology post
After cited has run daily for four to six weeks, the comparative dataset (Anthropic vs. OpenAI on the same prompts and the same time windows) becomes the source data for a long-form post about what each web-search API actually gets right and wrong. That post will live at https://www.jeffreyquan.com when it is ready.