How 25 Pinball Machines Talk to a Database (Without Anyone Touching a Keyboard)


It’s a Tuesday night at Midwest Aleworks. Someone — I’ll call her Pinball Tina — just nudged a 4.2 billion score on Iron Maiden. She doesn’t know me. I don’t know her. I’m not even in the building. But within ten minutes, her score is on a public leaderboard, her name is ranked against 329 other players, and a kiosk in the corner of the bar has updated itself to show the new standings.

Nobody touched a keyboard. Nobody pushed a button. Nobody is watching a pipeline. The whole thing is just… running. Has been for 180 days. Through 7 monthly tournaments. Across 25 active machines. With zero gaps.

This article is about the plumbing. The unsexy, beautiful, flawless plumbing.

Heads up: This is a technical article. PostgreSQL, n8n, API contracts, that sort of thing. If those words don’t ring a bell, start with n8n: A Homelab Superpower, But Handle With Care — it’s the prereq.

What you’re about to read isn’t a how-to for building a pinball leaderboard. There’s exactly one of these in the world and Dennis at Level Up Pinball already runs it. What you’re going to get is the design pattern: how to take two ugly, mismatched data sources, glue them together with cron and an opinion, and produce something that quietly works while you sleep.


$ tree pipeline/

The Big Picture

Here’s the architecture, end to end:

High-Level Architecture: Pinball Ingestion Pipeline. Diagram showing how 25 pinball machines send scores through two APIs (Stern and IScored) into n8n workflows, which feed a shared PostgreSQL database, serving production and dev Flask apps, ultimately displaying on the bar kiosk.
End-to-end architecture: 25 machines → two cloud APIs → n8n workflows → shared PostgreSQL → Flask + bar kiosk.

Two data sources at the top. Two completely different APIs run by two completely different organizations who do not talk to each other and have no reason to. n8n in the middle, doing the dirty work. PostgreSQL on the back end, hoarding every score. A small Flask app reading from PostgreSQL and serving the leaderboard at pinball.thedelay.com. And a kiosk — the bulletproof kiosk I wrote about earlier — pulling that page in a locked-down browser so customers can see the standings without me babysitting it.

The data flows in one direction. There’s no admin app for the bar staff to update. There’s no daily export from Dennis’s phone. The machines themselves emit scores. The scores reach my database within minutes. The leaderboard recomputes. The kiosk re-renders. Then everyone goes back to drinking.

What is n8n and why it’s my homelab glue

If you skipped the prereq article, here’s the ten-second version: n8n is a self-hosted, visual workflow automation tool. Drag nodes onto a canvas, wire them together, hit save. Each node does one thing — HTTP request, database query, transform a payload, send an email. The workflow runs on a schedule, on a webhook, or on demand.

Think of it as Zapier you can host yourself. Or, if you’re a sysadmin, think of it as a graph editor on top of cron, curl, jq, psql, and an SMTP client. That’s roughly what’s happening under the hood. The visual editor saves you from writing the bash glue yourself, and from writing the YAML to describe the bash glue, and from writing the playbook to deploy the YAML.

I picked n8n for the homelab for a few specific reasons:

  • Self-hosted. It runs in a Docker container on my own hardware. No data leaves the building unless I explicitly send it somewhere.
  • Community edition is free. Not a trial. Not a freemium upsell. Free with sensible limits.
  • MCP server exists. I can manage workflows from Claude Code without clicking through the UI. That alone saves me hours per month.
  • It has the right primitives. HTTP, SQL, schedule, webhook, code, and a respectable library of pre-built integrations. That covers ~90% of what a homelab needs.

What it isn’t: a programming environment. The Code nodes are real JavaScript, but you don’t get a debugger, you don’t get version history (in the community edition), you don’t get linting. If your workflow grows past a couple dozen nodes you’re going to feel it. The visual editor goes from “elegant” to “spaghetti” somewhere around node 30, and you start wishing you’d just written the whole thing in Python.

The pinball pipeline lives well below that complexity ceiling. The Stern workflow has 14 nodes. The IScored workflow has 8. Both are easy to read at a glance. That’s the right size for n8n.

If you’re new to it: install via Docker, build something tiny, then back up your work religiously. I wrote a whole article on the backup workflow — read that next, before you build anything you’d be sad to lose.

The interesting part isn’t fetching scores. Anyone can fetch scores — it’s an HTTP request. The interesting part is figuring out who scored them. Stern’s API tells you exactly which player set a score because Stern’s machines force you to log into a Stern Insider account before you start playing. IScored doesn’t. IScored lets you type whatever name you want at game start, like a 1990s arcade machine. So one data source has clean identity, and the other is a garbage fire of free-text user input that I have to reconcile against a real player database.

That’s where the work is. Everything else is plumbing.


$ cat workflows/stern-ingestion.json | jq

The Stern Pipeline

Stern is the easy one. Authenticated players, consistent machine IDs, a JSON response that doesn’t change shape from one week to the next. The whole pipeline looks like this:

Stern Ingestion Pipeline: Node-by-Node Workflow. Diagram of an n8n workflow showing dual schedule triggers merging into a shared pipeline that splits into Event Metadata and Score Data paths, merges, recalculates the leaderboard, and includes an error branch to Gmail and Pushinator.
The Stern ingestion workflow. Two parallel schedule triggers feed a single fetch node, which fans out into events and scores paths, then merges and recomputes.

There’s an HTTP node that hits the Stern Insider leaderboard API. The response goes through a Code node that splits it into events metadata and individual scores. The events go to one Postgres upsert, the scores fan out through player + machine upserts, then a “did this player improve?” check, then the score insert. After both paths finish, a final Postgres node recomputes the cached leaderboard and snapshots it to a history table. If anything in that chain fails, an Error Trigger fans out to Gmail and Pushinator so I get yelled at on my phone.

The interesting part is at the front: there are two schedule trigger nodes both pointing at the same downstream pipeline.

Trigger                      Cadence             Active
Schedule Every 60 Minutes    Hourly              24/7 baseline
Peak Hours Every 10 Min      */10 10-23 * * *    10:00 AM — 11:50 PM

That’s the whole “adaptive scheduling” mechanism. Two crons. Same pipe. No state machine, no business-hours sensor, no IF nodes asking “is the bar open right now?” Just two schedules pointed at the same fetch node, doing the polite thing during the day and the lazy thing overnight.

The result: roughly 94 Stern API calls per day. The naive “every minute, all the time” cron would be 1,440. I’d be a jerk to Stern’s API and burn 15x the calls for zero user-facing benefit.

What Stern’s API actually returns

The Stern Insider leaderboard API is a single GET endpoint:

https://api.prd.sternpinball.io/api/v1/portal/leaderboards/?event_code=<code>&event_state=current&format=json

Pass an event code, get the current standings. No auth required for read access (the leaderboards are public-facing). The response looks like this — I’ve trimmed it for brevity, but the shape is real:

{
  "leaderboard": {
    "code": "xKJL-RGnLk-Eudye",
    "name": "MAW APR LEADERBOARD",
    "start_date": "2026-04-01T00:00:00Z",
    "stop_date": "2026-04-30T23:59:59Z",
    "titles": [
      { "title_name": "Iron Maiden", "title_code": "MDN" },
      { "title_name": "Foo Fighters", "title_code": "FOO" }
    ],
    "scores": [
      {
        "username": "PinballTina",
        "title_name": "Iron Maiden",
        "score": 4234567890,
        "initials": "PTN",
        "avatar_path": "https://...",
        "background_color_hex": "#bd93f9",
        "is_all_accesss": false
      }
    ]
  }
}

A few things to notice.

titles is a separate array from scores. The titles array maps machine names (“Iron Maiden”) to internal Stern codes (“MDN”). The scores array references machines by title_name — the human-readable name, not the code. So before I can insert anything into my database, I have to build a lookup map from titles, then join each score against it. That’s the first Code node in the pipeline.

username is the Stern Insider login, not the player’s display name or initials. It’s stable across tournaments. That’s why the Stern player matching problem is not a problem — the username is a real identity. I use it directly as my player_id.

is_all_accesss is misspelled. Three S’s, not two. That’s not my typo — that’s how Stern’s API spells it. My Code node carefully matches the misspelling because if Stern ever fixes it, my code breaks silently and starts returning undefined. I’ll fix it the day they do.
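To make the join concrete, here is a Python sketch of that first Code node’s logic. The production node is JavaScript inside n8n; the function name and output field names here are mine, and the payload shape follows the trimmed response above.

```python
def split_payload(payload: dict) -> tuple[dict, list]:
    """Split a Stern leaderboard response into event metadata and score rows."""
    lb = payload["leaderboard"]

    # titles[] maps human-readable names to Stern codes; build the lookup first,
    # because scores[] references machines by title_name, not code.
    title_map = {t["title_name"]: t["title_code"] for t in lb["titles"]}

    event = {
        "event_code": lb["code"],
        "event_name": lb["name"],
        "start_date": lb["start_date"],
        "stop_date": lb["stop_date"],
    }

    scores = []
    for s in lb["scores"]:
        scores.append({
            "player_id": s["username"],                # stable Stern Insider login
            "machine_id": title_map[s["title_name"]],  # join scores -> titles by name
            "high_score": s["score"],
            "avatar_url": s.get("avatar_path"),
            # Deliberately match Stern's three-S misspelling; the .get() default
            # means we degrade to False instead of crashing if they ever fix it.
            "is_all_access": s.get("is_all_accesss", False),
        })
    return event, scores
```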

The mapping into my schema:

Stern API field                       My DB column
leaderboard.code                      events.event_code
leaderboard.name                      events.event_name
leaderboard.start_date / stop_date    events.start_date / stop_date
titles[].title_name / title_code      machines.machine_name / machine_id
scores[].username                     players.player_id
scores[].score                        high_scores_archive.high_score
scores[].avatar_path                  players.avatar_url

Plus a snapshot of the entire raw response into api_snapshots, so if I ever change the schema and need to re-derive something, the source-of-truth payload is still there.

There is a real story here, and I’ll come back to it in a minute. But the short version is: I built this pattern after I caught myself sketching out something much more “clever,” and the cleverer version was wrong.


$ psql -c 'SELECT * FROM matching_states;'

The IScored Complication

This is the heart of the article. Stern was a warm-up.

IScored Ingestion Pipeline: Node-by-Node Workflow. Diagram of an n8n workflow showing a single 10-minute schedule trigger, machine mapping lookup, player matching branch with auto-match and pending paths, score upsert, and error handling.
The IScored ingestion pipeline. Single 10-minute trigger, machine mapping lookup, then the player matching branch.

IScored is the platform Dennis uses for non-Stern machines. AFM, Theatre of Magic, Funhouse, the older stuff that doesn’t have an Insider connection. Players queue at the kiosk, type in some name, and start playing. Whatever they type becomes their identity. “DAD’S DAY OUT”. “lvl up midwest”. “JOHN” with no further context.

I needed those scores in the same leaderboard as the Stern scores. Same player_id space. Same machines table. Same scoring formula. So when Pinball Tina logs into Stern Insider on Iron Maiden and then walks two machines over and types “TINA” into Funhouse, both scores need to land under the same player record.

That’s the matching problem.

Player Matching Flowchart: IScored to Stern Identity Resolution. Decision tree showing how IScored player names are resolved: check existing mapping, try auto-match via name normalization, or create a pending player for admin review. Three terminal states: auto_matched, manual_matched, and pending.
The player matching decision tree. Three terminal states: auto_matched, manual_matched, pending.

Here’s the algorithm, in order:

  1. Normalize the IScored name. Lowercase it. Strip everything that isn’t alphanumeric. “DAD’S DAY OUT” becomes dadsdayout. “Pinball_Tina!” becomes pinballtina.
  2. Check the existing IScored mapping table. If we’ve seen this exact normalized name before and a human (read: Dennis) has already linked it to a player, use that link. Don’t re-decide.
  3. Try to auto-match against the Stern player table. If the normalized IScored name exactly matches a normalized Stern username, auto-link them and call it auto_matched. This is the cheap win for players who use the same handle on both systems.
  4. Otherwise, create a new pending player. Generate a synthetic player_id (iscored_<normalized_name>), insert a row in the mapping table with status pending, and move on. The score still gets ingested. The player just doesn’t show up in the combined leaderboard until Dennis confirms who they are.

Three terminal states: auto_matched, manual_matched, pending.
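The four steps can be sketched in a few lines of Python. The real version is split between an n8n Code node and SQL; the function names and the in-memory dicts below are stand-ins for the mapping tables.

```python
import re

def normalize(name: str) -> str:
    # Mirrors the SQL normalize_player_name(): strip non-alphanumerics, lowercase.
    return re.sub(r"[^a-zA-Z0-9]", "", name).lower()

def resolve_player(iscored_name: str, mapping: dict, stern_players: dict) -> tuple:
    """mapping caches decisions by exact spelling; stern_players is normalized -> player_id."""
    # Step 2: an existing mapping wins; never re-decide.
    if iscored_name in mapping:
        return mapping[iscored_name]
    # Step 1: normalize the freeform name.
    norm = normalize(iscored_name)
    # Step 3: exact normalized match against a Stern username -> auto-link.
    if norm in stern_players:
        entry = (stern_players[norm], "auto_matched")
    # Step 4: no match -> synthetic player_id, pending human review.
    else:
        entry = (f"iscored_{norm}", "pending")
    mapping[iscored_name] = entry  # decisions stick across future runs
    return entry
```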

The current production split, as of right now: 26 auto-matched, 3 manual-matched, 46 pending out of 75 total IScored player mappings. That’s a 35% auto-match rate, which sounds low until you remember that most of the pending players probably aren’t on Stern Insider at all — they’re IScored-only customers. They’re not failures, they’re a different population.

The normalize-and-match algorithm — SQL and all

The whole IScored matching system rests on one PostgreSQL function:

CREATE OR REPLACE FUNCTION normalize_player_name(name TEXT)
RETURNS TEXT AS $$
BEGIN
    RETURN LOWER(REGEXP_REPLACE(name, '[^a-zA-Z0-9]', '', 'g'));
END;
$$ LANGUAGE plpgsql IMMUTABLE;

Three operations: lowercase, strip everything that isn’t a letter or digit, return. Marked IMMUTABLE so PostgreSQL can use it in expression indexes. That’s important — without IMMUTABLE, the planner refuses to index on normalize_player_name(display_name) and every match query becomes a sequential scan of 330 rows. That’s not slow at this scale, but it would be eventually.

What it does to real names:

Input             Normalized
DAD'S DAY OUT     dadsdayout
Pinball_Tina!     pinballtina
lvl up midwest    lvlupmidwest
LvlUpMidwest      lvlupmidwest
123-XYZ           123xyz

The LvlUpMidwest → lvlupmidwest collapse is the whole point. Stern lets you pick a username with arbitrary capitalization. IScored lets the bartender type whatever they want with arbitrary spacing. After normalization, both forms collide on the same key.

The match query is small:

SELECT player_id
FROM players
WHERE normalize_player_name(display_name) = normalize_player_name($1)
LIMIT 1;

If that returns a row, the IScored name has a Stern twin. The mapping gets written with match_status = 'auto_matched'. If it returns nothing, the mapping gets written with match_status = 'pending' and a synthetic player_id of the form iscored_<normalized_name>.

The mapping is keyed on the original IScored name, not the normalized one. That matters: if the same human types “PINBALL TINA” today and “pinball tina” tomorrow, they both normalize to pinballtina, but they show up as two distinct mapping rows. The matching query handles them both correctly because it normalizes on the fly. The mapping table is essentially a cache of “we’ve seen this exact spelling before; here’s what we decided.”

Once a mapping exists, it sticks. A future ingestion run that sees the same IScored name doesn’t re-evaluate — it just uses the cached player_id. That’s how Dennis’s manual decisions persist. He clicks “link this to player X” once, the row is updated to manual_matched, and every future score under that exact spelling lands under the right player without further intervention.

The remaining 46 pending mappings are mostly regulars who play at the bar but have never created a Stern Insider account. There is nothing to match against, so they sit pending until Dennis confirms them as new IScored-only players. That is not a failure mode; that is the system working correctly.

The pending players need a human in the loop. That human is Dennis.

The admin UI for when auto-matching fails

Dennis has an admin page behind Cloudflare Access. From the outside, it’s a single HTML page served by the Flask app, no JS frameworks, no SPA, nothing fancy. Three sections stacked vertically:

  1. Stats cards at the top. How many auto-matched, how many manual-matched, how many pending. Big numbers, plain font, instant glance.
  2. Pending review table. One row per pending IScored name, with the normalized form and creation timestamp. Three buttons per row: Link to existing player, Confirm as IScored-only, Ignore (rare; for clearly-bad data like test names).
  3. All mappings view. A read-only audit log. Every IScored name ever seen, what it’s currently mapped to, and what status it’s in. Useful when Dennis wants to undo a previous decision.

The link flow is the part that does the actual work. He picks a pending row. Clicks Link. A modal opens with a search box pre-populated with the normalized name. He types or scrolls until he finds the right Stern player. Confirms. The backend hits this update:

UPDATE iscored_player_mapping
SET player_id = $stern_player_id,
    match_status = 'manual_matched'
WHERE iscored_name = $original_name;

That’s the entire link operation. One update statement. The next ingestion run (within 10 minutes) re-attributes any new scores under that name to the correct player. Old scores are also retroactively correct because the leaderboard cache recomputes from high_scores_archive, which already has the new player_id foreign-keyed in.

The Confirm as IScored-only flow is even simpler. The mapping flips from pending to new_player, and the synthetic iscored_<name> player_id stays in place. Their scores show up on the leaderboard under that name from then on. If they later create a Stern account and Dennis wants to merge them, that’s a follow-up manual link.

There is no automation in the admin UI. There are no “smart suggestions” or LLM-powered match recommendations. There’s a search box, a list of player IDs, and Dennis’s brain. That brain knows which regulars play under which name. Software does not. So the software gets out of the way and lets the brain do its thing.

Dennis touches this UI maybe once a tournament. He doesn’t get an alert when there are pending mappings — he just remembers to check it after a busy weekend. The pending count is bounded (at most a few new players per month), the work per row is seconds, and there’s no ambiguity in the workflow. That’s the whole design goal: keep the human’s job small enough that it doesn’t feel like a job.

This is the only manual step in the entire system. The rest of the time, Dennis forgets the admin UI exists. That, again, is the goal.


$ crontab -l

The Responsible Citizen

A new question for you: how often should my workflow poll the Stern API?

The honest answer is “as infrequently as possible without ruining the user experience.” Scores don’t change every minute. The bar isn’t open at 4 AM. Stern is hosting an API on their dime, not mine, and if everyone with a homelab spammed it like a DDoS, they’d add rate limiting and rightfully so.

So I sat down to design the scheduler. And I got really tempted by something cute.

The road not taken. A Code node that checks the current hour. A time-of-day comparison. Maybe a HEAD request to grab a Last-Modified header and short-circuit if nothing changed. Maybe — maybe — a webhook from the bar’s POS system to detect when there are actually customers in the building. We could go bigger: exponential backoff based on how many consecutive runs returned no new scores, with a state variable persisting across executions, and a manual override for when Dennis runs a special tournament…

You know what has kept scheduled jobs running, across the entire internet, at every enterprise on the planet, for decades? Cron. Cron works. Use it.

You know what breaks? Cute n8n workflows with fiddly bits. Code nodes with time-of-day logic that someone (you, at 2 AM, six months from now) has to debug when the bar changes its hours. State variables that get corrupted. Webhook integrations with third-party POS systems that update their API without telling you.

Two crons. Same pipe. Done.

Scheduling Timeline: 24-Hour API Polling View. Timeline showing Stern dual-trigger polling (60-minute baseline all day plus 10-minute peak from 10 AM to midnight) overlaid with IScored’s fixed 10-minute polling: ~94 Stern calls/day vs ~144 IScored calls/day.
24-hour polling timeline. Stern’s dual-trigger pattern (cyan baseline + purple peak) vs IScored’s flat 10-minute interval.

That’s it. That’s the whole “adaptive scheduling system.” A 60-minute baseline cron and a 10-minute peak cron pointed at the same node. If I want to widen the peak window, I edit one cron expression. If I want to drop the peak cadence to 5 minutes, I edit one cron expression. There is nothing to debug because there is nothing clever happening.

IScored, for the record, runs every 10 minutes 24/7. That’s a single fixed trigger — I haven’t bothered to give it the dual-trigger treatment yet. It’s on the backlog. The pattern is right there waiting.

The scheduling math — when to poll and when to chill

If I polled the Stern API every minute, all day, all night, I’d make 1,440 API calls per day per workflow. With the IScored workflow doing the same, that’s 2,880 calls/day for two leaderboards that update at the speed of human pinball. That’s stupid. Let’s do better.

The Stern workflow has two schedule triggers:

Trigger                      Cron-equivalent     Active hours
Schedule Every 60 Minutes    0 * * * *           All day, every day
Peak Hours Every 10 Min      */10 10-23 * * *    10:00 AM — 11:59 PM

Both fire independently. Both call the same fetch node downstream. n8n doesn’t dedupe: if both trigger at exactly 10:00:00 AM, the workflow runs twice. The 10-minute cron fires at minutes 0, 10, 20, 30, 40, 50, and the hourly fires at minute 0, so during peak hours they collide exactly once per hour, at the top of the hour.

The math, generously rounded:

Off-peak window: 10 hours (midnight -> 10am)
  - Hourly cron only -> 10 runs

Peak window: 14 hours (10am -> midnight)
  - Hourly cron: ~14 runs
  - 10-min cron: 6/hr x 14h = 84 runs
  - But: ~14 collisions total (one per peak hour) where both crons fire at the same minute
  - Effective: ~84 distinct polling minutes (the hourly is absorbed by the 10-min during peak)

Total: ~94 calls/day

That’s what the workflow’s own sticky note claims, and it’s close enough to true that I haven’t bothered to argue. Even at the upper bound (~108 if you count every overlap as a separate run), it’s an order of magnitude better than 1,440.
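The arithmetic is easy to sanity-check by counting distinct firing minutes over a day; this throwaway Python snippet does exactly that:

```python
# Minutes of the day when each Stern cron fires, as (hour, minute) pairs.
baseline = {(h, 0) for h in range(24)}                            # 0 * * * *  -> 24 firings
peak = {(h, m) for h in range(10, 24) for m in range(0, 60, 10)}  # */10 10-23 -> 84 firings

total_naive = len(baseline) + len(peak)   # every firing counted, collisions included
total_deduped = len(baseline | peak)      # collisions (top of each peak hour) collapsed

# total_naive is 108, the upper bound; total_deduped is 94, the sticky-note number.
```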

Why this shape and not something else?

Hourly baseline because new scores can land at any hour. Bartenders close out at 1 AM, but I’ve seen scores get processed by Stern’s backend at 3 AM as the system catches up on whatever’s queued. A 60-minute baseline means the worst case for a stale score is 60 minutes — that’s fine for a leaderboard nobody is staring at when the bar is closed.

10-minute peak because that’s the sweet spot between “keeps the kiosk feeling live” and “doesn’t hammer Stern.” A score change shows up in 5 minutes on average, 10 minutes worst case. Fast enough that customers checking the kiosk see their score reflected quickly. Slow enough that Stern’s API doesn’t notice me.

Cron, not state machines. I covered this above. The temptation to build something feedback-driven was real. The reasons not to are stronger. Two crons. Same pipe. Done.

IScored at fixed 10-minute because I haven’t bothered to give it the dual-trigger treatment yet. It’s a single Schedule Every 10 Min trigger, 144 calls/day, every day. That’s on the backlog. The pattern is right there waiting whenever I get tired of the 50 extra daily calls.

If I needed to widen the peak window or speed up the cadence, it’s a single cron expression to edit. */10 9-23 * * * starts the peak at 9 AM. */5 10-23 * * * drops the peak cadence to 5 minutes. No state, no migration, no thinking. That’s the entire reason to build it this way.

One more design call worth naming: there is exactly one ingestion layer feeding both dev and prod. Same database, same scores, different Flask instances reading from the same tables. “But what about data isolation?” What about not hammering Stern’s API twice for a dev environment that is unironically being looked at by zero people? Pragmatism beats purity here. The ingestion is the expensive part. The app is the cheap part. Sharing the expensive part is the whole point.


$ psql -d pinball -c '\d+ high_scores_archive'

The Database

PostgreSQL. Of course PostgreSQL.

Database ERD: Pinball Leaderboard Schema. Entity relationship diagram showing the key tables (high_scores_archive, players, machines, events, iscored_machine_mapping, iscored_player_mapping, leaderboard_cache) with their relationships and the UNIQUE deduplication constraint.
Database schema. high_scores_archive sits at the center; the UNIQUE constraint on (player_id, machine_id, high_score, event_code) is the deduplication mechanism.

The schema is small and unsurprising: players, machines, events (tournaments), high_scores_archive (the source of truth for individual scores), iscored_machine_mapping and iscored_player_mapping (the IScored translation tables), and leaderboard_cache for precomputed standings.

The interesting bits are deduplication and tournament switching.

Deduplication. Every score insert hits this constraint:

UNIQUE (player_id, machine_id, high_score, event_code)

Combined with ON CONFLICT DO NOTHING, that gives me idempotent ingestion: re-running yesterday’s pull is a no-op. The constraint includes the score value, which means a player improving on their own score is a new row, not an update. The model is “append every distinct score” rather than “keep only the best.” It’s slightly more storage in exchange for full history. With 6,673 scores ingested across 6 months, the storage cost is, generously, dust.
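A toy Python model of that constraint shows why improving your own score appends rather than updates. The set key stands in for the UNIQUE constraint, and ingest is a made-up name for illustration:

```python
def ingest(archive: set, row: tuple) -> bool:
    """row = (player_id, machine_id, high_score, event_code).

    Returns True if inserted, False if the row already existed
    (the ON CONFLICT DO NOTHING case)."""
    if row in archive:
        return False
    archive.add(row)
    return True
```

Re-running yesterday’s pull replays the same rows and every ingest returns False; a genuinely higher score changes the key and lands as a new row.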

Tournament switching. Each month is a new event. The active event has is_active = true; the others don’t. The leaderboard cache filters by event. To roll a new month, I update the is_active flag, update the event_code in the workflow, and the next ingestion run starts populating the new tournament. The old scores stay archived under the old event_code — nothing gets deleted, ever.

Scoring. The leaderboard formula is dumb on purpose:

1st place: 100 points
2nd place: 90 points
3rd through 87th: 88 - rank (so 3rd = 85, 4th = 84, ...)
88th and below: 0 points

Same formula in SQL and Python so the cached leaderboard and the live recompute can’t disagree. Boring. Reliable. Easy to explain to Dennis.
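As a sketch, the Python side of that formula fits in a few lines (the function name is mine; the point values are the ones above):

```python
def points_for_rank(rank: int) -> int:
    # 1st = 100, 2nd = 90, 3rd..87th = 88 - rank, 88th and below = 0.
    if rank == 1:
        return 100
    if rank == 2:
        return 90
    if rank <= 87:
        return 88 - rank
    return 0
```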

The full schema with constraints and indexes

The pinball database has six tables that matter, plus a couple of auxiliary ones for snapshots and history. Here are the key ones, columns trimmed to the essentials:

-- Players: every human who has ever scored, Stern or IScored
CREATE TABLE players (
    player_id          VARCHAR(100) PRIMARY KEY,
    display_name       VARCHAR(100),
    normalized_name    VARCHAR(100),  -- materialized for fast match lookups
    avatar_url         TEXT,
    background_color_hex VARCHAR(7),
    is_all_access      BOOLEAN DEFAULT FALSE,
    last_seen          TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_players_normalized ON players(normalized_name);

-- Machines: every active and historical pinball cabinet
CREATE TABLE machines (
    machine_id    VARCHAR(50) PRIMARY KEY,   -- short code: AFM, MDN, FOO, etc.
    machine_name  VARCHAR(100),              -- "Iron Maiden", "Attack from Mars"
    is_active     BOOLEAN DEFAULT TRUE
);

-- Events: monthly tournaments
CREATE TABLE events (
    event_code   VARCHAR(100) PRIMARY KEY,
    event_name   VARCHAR(200),
    start_date   TIMESTAMP WITH TIME ZONE,
    stop_date    TIMESTAMP WITH TIME ZONE,
    location_id  VARCHAR(20),
    event_type   VARCHAR(50),                -- 'STERN_LEAGUE', etc.
    is_active    BOOLEAN DEFAULT FALSE       -- exactly one row should be true
);

-- High_Scores_Archive: every score, ever, append-only
CREATE TABLE high_scores_archive (
    score_id      BIGSERIAL PRIMARY KEY,
    player_id     VARCHAR(100) REFERENCES players(player_id),
    machine_id    VARCHAR(50)  REFERENCES machines(machine_id),
    high_score    BIGINT NOT NULL,
    date_set      TIMESTAMP WITH TIME ZONE,
    event_code    VARCHAR(100) REFERENCES events(event_code),
    score_source  VARCHAR(20),                -- 'API' (Stern) or 'ISCORED'
    is_approved   BOOLEAN DEFAULT TRUE,
    CONSTRAINT unique_score_per_event
        UNIQUE (player_id, machine_id, high_score, event_code)
);
CREATE INDEX idx_hsa_event_player ON high_scores_archive(event_code, player_id);
CREATE INDEX idx_hsa_event_machine ON high_scores_archive(event_code, machine_id);

-- IScored mapping tables
CREATE TABLE iscored_machine_mapping (
    iscored_game_id VARCHAR(20) PRIMARY KEY,
    machine_id      VARCHAR(50) REFERENCES machines(machine_id),
    last_synced     TIMESTAMP WITH TIME ZONE
);

CREATE TABLE iscored_player_mapping (
    iscored_name    VARCHAR(100) PRIMARY KEY,
    normalized_name VARCHAR(100),
    player_id       VARCHAR(100) REFERENCES players(player_id),
    match_status    VARCHAR(20) DEFAULT 'pending',
    created_at      TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE INDEX idx_iscored_normalized ON iscored_player_mapping(normalized_name);
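The `normalized_name` columns imply a normalization function shared by both tables, but the actual rule isn't shown in the schema. A plausible sketch, assuming the rule is "lowercase, strip everything that isn't a letter or digit" (an assumption on my part, not the documented behavior):

```python
import re

def normalize_name(name: str) -> str:
    """Collapse a freeform IScored name for matching against Stern players.

    ASSUMPTION: the real normalization rule isn't shown; this is one
    plausible version (lowercase, keep only ASCII letters and digits).
    """
    return re.sub(r"[^a-z0-9]", "", name.lower())

# normalize_name("Pinball  Tina!") -> "pinballtina"
```

Whatever the real rule is, materializing its output into an indexed column (rather than normalizing at query time) is what makes the match lookups cheap.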

The interesting constraint is unique_score_per_event on high_scores_archive. It includes the score value, which is unusual — most “score table” designs would use (player_id, machine_id, event_code) and update the high score on conflict. I deliberately don’t. Including the score value means a player improving their own score creates a new row, not an update. The model is “append every distinct score” rather than “keep only the best.”

That gives me three things:

  1. Idempotent ingestion. Re-running yesterday’s pull is a no-op. Same player, same machine, same score, same tournament → blocked by the unique constraint, no action taken.
  2. Score progression history. I can answer “show me how this player improved on Iron Maiden over the month” because every score they posted is in the table.
  3. Cheap deduplication. No need for application logic. PostgreSQL refuses the duplicate at the constraint level. ON CONFLICT ON CONSTRAINT unique_score_per_event DO NOTHING and move on.
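To see the idempotency property in isolation, here's a small SQLite sketch of the same pattern (SQLite's `ON CONFLICT ... DO NOTHING` behaves like the Postgres version for this purpose; the table is trimmed to just the constraint columns, and the data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE high_scores_archive (
        player_id  TEXT, machine_id TEXT, high_score INTEGER, event_code TEXT,
        UNIQUE (player_id, machine_id, high_score, event_code)
    )
""")

def ingest(score):
    # Duplicate rows are refused at the constraint level; re-runs are no-ops.
    conn.execute(
        """INSERT INTO high_scores_archive VALUES (?, ?, ?, ?)
           ON CONFLICT (player_id, machine_id, high_score, event_code)
           DO NOTHING""",
        score,
    )

pull = [("tina", "MDN", 4_200_000_000, "2026-01")]
for row in pull * 3:                                  # replay the same pull
    ingest(row)
ingest(("tina", "MDN", 4_500_000_000, "2026-01"))     # improved score = new row

rows = conn.execute("SELECT COUNT(*) FROM high_scores_archive").fetchone()[0]
# rows == 2: one original score, one improvement, replays silently dropped
```

Three inserts of the same score leave one row; a better score on the same machine in the same event adds a second. That's the whole dedup story, with no application logic involved.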

There’s also a leaderboard_cache table I didn’t list — it’s the precomputed standings for the active tournament, recalculated by a stored procedure (update_combined_leaderboard) at the end of every ingestion run. The Flask app reads from it directly. Recomputing it takes well under a second on a database this small, so I don’t bother with incremental updates. Full rebuild every 10 minutes is plenty cheap.

The security sidebar — a vulnerability I caught during testing

Pre-deployment review is the entire reason I work in InfoSec, so let me show what one looks like in practice.

While building the admin endpoints for the IScored UI, I sketched out a small POST endpoint that the n8n workflow could use to push article-style content into the database. Internal-only, of course. n8n is on the same Docker network, so it could just POST directly. Easy.

I named the route /api/save-article, wrote the handler, ran a manual curl against it from my laptop, and watched it accept anonymous unauthenticated input from outside the firewall.

That’s the moment to stop. Not the moment to ship.

The route as I’d written it had three problems, and I want to be precise about them because they’re common mistakes in homelab Flask apps:

  1. No authentication boundary. Cloudflare Access protects most of the site, but this specific path was sitting under /api/* which was being routed publicly. The internal/external distinction wasn’t being enforced — it was being assumed.
  2. No size limit. The handler was happy to accept a multi-megabyte body and try to write it. That’s a denial-of-service vector by accident.
  3. No input validation on identifiers. The article_id field was a free-text string going straight into a SQL query (parameterized, thankfully) and a filesystem path (not parameterized).

The fix wasn’t a single change. It was four:

  1. Move the route under /api/admin/. That namespace is covered by an existing Cloudflare Access policy. External requests get bounced at the edge before they ever reach Flask.
  2. n8n bypasses Cloudflare via internal IP. The workflow hits the Flask container directly on the Docker network. It doesn’t go through the public hostname, so the Cloudflare Access policy doesn’t care about it. That preserves the “internal automation can do its job” use case without re-introducing the public hole.
  3. Add a 1 MB size cap on the request body via Flask’s MAX_CONTENT_LENGTH. Anything bigger gets a 413 before the handler runs.
  4. Validate the identifier. The article_id parameter is now restricted to [a-zA-Z0-9_-]+ via a regex check at the top of the handler. Anything else gets a 400.
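The framework-independent core of fixes 3 and 4 fits in a few lines. This is a sketch, not the real handler (the helper and constant names are mine); in the actual app the size cap lives in Flask's `MAX_CONTENT_LENGTH` config, which rejects oversized bodies before the handler ever runs:

```python
import re

MAX_BODY_BYTES = 1 * 1024 * 1024                 # fix 3: 1 MB cap
ARTICLE_ID_RE = re.compile(r"[A-Za-z0-9_-]+")    # fix 4: identifier allowlist

def check_request(article_id: str, body: bytes) -> int:
    """Return the HTTP status a hardened handler should send: 200, 400, or 413.

    (Illustrative helper; the real route delegates the size check to Flask.)
    """
    if len(body) > MAX_BODY_BYTES:
        return 413   # payload too large, reject before touching disk or DB
    if not ARTICLE_ID_RE.fullmatch(article_id):
        return 400   # path traversal and injection attempts bounce here
    return 200

# check_request("launch-notes_2", b"ok")       -> 200
# check_request("../etc/passwd", b"ok")        -> 400
# check_request("x", b"A" * (2 * 1024 * 1024)) -> 413
```

The allowlist regex is the important part: because the identifier also ends up in a filesystem path, validating it as `[a-zA-Z0-9_-]+` closes the traversal hole that parameterized SQL alone doesn't.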

None of this is novel. Every one of these is in the OWASP top ten in some form. The point of the story isn’t “I invented secure coding.” The point is: this got caught before it shipped because I tested it. A curl from my laptop is not a fancy test rig. It’s two minutes of effort. The cost of finding this bug before deployment was negligible. The cost of finding it after deployment, when something on Reddit linked to my homelab and a bored teenager started posting test data, would have been… a lot more.

If you’re building HTTP endpoints in a homelab project, take this lesson seriously: do not trust your own assumptions about what’s reachable from the public internet. Test with curl from outside your network. Read the route handler with adversarial eyes. The bug you find in five minutes today is the incident you don’t have to write up next year.


$ journalctl -u n8n --since yesterday

The n8n Reality Check

Let me be honest about the part I’m not putting on a postcard.

n8n workflow design is not for the faint of heart. The visual editor is great until it isn’t. Credentials change quietly. Webhook IDs disappear. A Code node that worked yesterday throws on a malformed payload that never used to happen. AI tools help — I lean on Claude heavily — but they need your help. You have to give them exact inputs and exact expected outputs. They can’t debug blind any better than you can.

I have spent real hours chasing issues in circles inside this workflow. There’s no point pretending otherwise. The trick is to build small, test each node by itself, pin test data when you can, and read the actual output of the actual node instead of what you assumed it would produce. Discipline. Same as any other engineering work.

And then — and I cannot say this loudly enough — back the workflow up. I wrote a whole article about this. The TL;DR is: a backup workflow that backs up itself, runs at 2 AM, costs nothing, and has saved me more than once. Do not skip it. The day you delete a Code node by accident is the day you find out whether your future self is going to forgive your past self.

n8n debugging tips — what actually works

Six things that actually save time when something in n8n breaks. Earned the hard way.

1. Test one node at a time. The big green Execute Workflow button at the bottom is for production runs. While you’re debugging, click the play icon on the individual node that’s misbehaving. n8n will run just that node using whatever data is sitting in the previous node’s output. You don’t burn API calls, you don’t spam your database, you just see what this node does to that input.

2. Pin test data on the trigger node. When you’re iterating on a workflow that fires off an HTTP API call, every test run hits the API. Eventually you’ll get rate-limited, or the data will change, or the API will go down at the worst possible time. Right-click the trigger node, choose “Pin data,” and feed it a known-good payload from a previous successful run. Now your tests are deterministic and the API stops getting harassed.

3. Read the actual node output, not what you assumed it produced. This is the single most common mistake. You wired up an HTTP request, you “know” it returns a list of objects, you wired the next node to map over that list — but the API actually returned an object with a data property containing the list, and now your map gets one item with weird keys. Click the previous node, look at the Output tab, see what’s actually there. Believe your eyes, not your model of the API.
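The wrapped-list mistake is worth seeing concretely. A minimal Python illustration of the same failure mode (the payload shape here is a common API convention, not the actual Stern or IScored response):

```python
import json

# What you assumed the API returned:  [{"player": "tina"}, ...]
# What it actually returned:          an envelope with a "data" property
raw = json.loads('{"data": [{"player": "tina"}, {"player": "dennis"}], "count": 2}')

# Iterating the envelope walks its KEYS, not the scores:
wrong = [item for item in raw]                      # ['data', 'count']

# Unwrapping the envelope first gets you the list you wanted:
right = [item["player"] for item in raw["data"]]    # ['tina', 'dennis']
```

In n8n the equivalent move is clicking the previous node's Output tab before wiring the next node's expressions. The envelope is right there on screen; you just have to look at it.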

4. Use the execution log when things “work but don’t.” Sometimes a workflow runs to completion but produces no observable effect. Maybe the database insert silently no-op’d. Maybe the email node thinks it sent something but the SMTP server quietly dropped it. The Executions tab on the left shows every run with its full per-node input/output. You can replay history at the node level. Use it.

5. Watch out for the credential-change bug. Changing the credential on a node sometimes blanks the rest of the node’s parameters. (I’m not making this up; I’ve seen it on the Postgres node specifically.) Always save and reopen the node after a credential change to verify the SQL or HTTP body is still intact. If it’s blank, undo and retype.

6. Webhook 404s after creating via MCP. If you create a webhook node via the MCP API and the webhook returns 404 even though the workflow is active, it’s almost certainly missing the auto-generated webhookId field. Toggle the workflow off and back on to re-register it. That has fixed it for me every single time.

There’s one thing I would add that isn’t really a tip: Claude is good at this, but you have to feed it the right context. Show it the exact node configuration. Show it the actual error message. Show it the input the node received. Don’t ask “why is my workflow broken” — ask “given this input and this node config, why is this output not what I expected.” The more specific the question, the better the answer. AI is a force multiplier, not a force creator.


$ uptime && wc -l scores.log

The Payoff

Here is what 180 days of “set it and forget it” looks like, by the numbers:

Metric                           Value
First score ingested             October 26, 2025
Most recent score                This morning
Total scores                     6,673
Stern API scores                 6,137 (92%)
IScored scores                   536 (8%)
Unique players                   330
Active machines                  25
Monthly tournaments completed    7, zero gaps
Manual interventions required    A handful of admin-UI clicks

That’s it. That’s the whole thing. One pipeline, two ugly data sources, one well-behaved database, a Flask app I barely touch, and a kiosk that quietly serves the result back to the people who care about it. The pipeline didn’t pause for the holidays. It didn’t break when I rebooted the Docker host. It didn’t whine when I redeployed the Flask app in the middle of the day. It just kept going.

This is the same theme as the bulletproof bar kiosk: the best infrastructure is the kind you forget about. You build it once, you make it boring on purpose, and then you walk away. The day you stop thinking about it is the day it actually starts working.

Now go play some pinball. The leaderboard’s at pinball.thedelay.com.
