The Bulletproof Kiosk

Building a Kiosk That Never Calls You at 2 AM

My pinball leaderboard kiosk sits in a bar three hours away. I can’t drive over and reboot it. I can’t ask the bar owner to SSH in and restart a service. If that screen goes dark during league night, nobody’s fixing it until I make a six-hour round trip.

So I engineered it to never need me.

This is the story of how I turned an Intel NUC into a bulletproof display kiosk — starting with a purpose-built kiosk OS that turned out to be a trap, pivoting to Ubuntu, and then building six layers of automatic recovery so deep that the thing practically resurrects itself.

If you’re building any kind of unattended display — a dashboard for your office, a menu board for a restaurant, a leaderboard for your bar — this is the reliability playbook.

Just want the architecture? Skip to Defense-in-Depth: Six Layers of “Please Don’t Call Me”.

$ cat backstory.md

What I’m Working With

The display is for Level Up Pinball at Midwest Aleworks — a pinball league at a craft brewery. The leaderboard app is a custom-built Flask application that rotates through player rankings, machine stats, and league standings on a portrait touchscreen.

The Hardware

Component     Details
Device        Intel NUC8i7HVK
RAM           19 GB
Storage       117 GB SSD (12 GB used)
Display       WingCool touchscreen (portrait, 1080×1920)
Network       WiFi only (Intel Wireless 8265)
Total cost    $0 (repurposed hardware)

The NUC is massively overkill for displaying a webpage. That’s the point — headroom means fewer resource-related failures. And it was sitting in a drawer, so the price was right.

The Constraints

These aren’t hypothetical. Every design decision flows from these realities:

Design Constraints

  • WiFi only — No ethernet at the bar. The kiosk lives on the bar’s WiFi.
  • Non-technical owner — Dennis runs a pinball league, not a server room. “SSH in and check the logs” is not an acceptable support procedure.
  • 3 hours away — Physical intervention is a full-day commitment.
  • Bar environment — Patrons will touch the screen. Power might flicker. WiFi might drop when 50 people show up for trivia night.
  • Must survive reboots — Power outages happen. The kiosk comes back up clean with zero human intervention.

The design goal is simple: Dennis should never have to call me.


$ git log --oneline porteus/

The Kiosk OS That Wasn’t

My first instinct was to use a purpose-built kiosk operating system. Porteus Kiosk seemed perfect — it’s literally designed for this exact use case. Boot into a locked-down browser, display a URL, done.

And it worked. Really well, actually.

What Porteus Got Right
  • Boot straight to Chrome in kiosk mode — no desktop, no taskbar, no escape
  • Immutable filesystem — can’t be tampered with, survives dirty shutdowns
  • Hardware watchdog support (iTCO_wdt) for auto-reboot on kernel hangs
  • Tiny footprint — boots fast, runs lean

I had it fully operational: Chrome displaying the leaderboard, SSH reverse tunnel to my bastion host, browser refreshing daily at 5:30 PM, auto-detecting whether it was on my home network or deployed at the bar. The clever bit was a setup script hosted on the web server that ran at boot:

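# Auto-detect network and open the reverse tunnel to the bastion
# (-N: no remote command; -R: expose the kiosk's sshd as port 2222 on the bastion)
if ping -c 1 -W 2 192.168.86.110 >/dev/null 2>&1; then
    # Home network -- connect directly
    ssh -N -R 2222:localhost:22 -p 22 kiosk-pinball@Bastion-1
else
    # External network -- use NAT rule
    # BASTION_PUBLIC: the bastion's public user@host (not shown in this post)
    ssh -N -R 2222:localhost:22 -p 22999 "$BASTION_PUBLIC"
fi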

Smart. Portable. Worked from anywhere.

Where It Fell Apart

Then I read the fine print.

Problem 1: Licensing. Porteus Kiosk has a 30-day trial. After that, it’s €40/year per device, or the kiosk stops working. For a hobby project at a bar? Non-starter. There’s also a “Porteus Kiosk Server” option at €300/year, which somehow makes it worse.

Problem 2: The WiFi Bootstrap Trap. This was the real killer. Porteus bakes WiFi credentials into the ISO at build time. If the bar changes their WiFi password — or if I need to move the kiosk to a different location — the device becomes a brick. The only fix is to re-flash the USB drive with a new ISO containing the new credentials. From 3 hours away.

Why this matters: An immutable OS sounds great for security and reliability. But “immutable” means you can’t change anything — including the thing that connects you to the network. And if you can’t reach the network, you can’t reach the device. It’s a trap.

I spent about a week getting Porteus production-ready before discovering these issues. That week wasn’t wasted — I learned exactly what a kiosk needs to do. But the platform had to go.

USB Imaging Tools: What Works and What Doesn’t
Tool            Result                                   Verdict
dd              Creates non-bootable images              Avoid
Ventoy          Compatibility issues with Porteus Kiosk  Avoid
Balena Etcher   Works perfectly, first try               Use this

I tried dd first (because of course I did), then Ventoy, then finally Balena Etcher. Should’ve started there.


$ diff porteus/ ubuntu/

The Full-OS Pivot

I scrapped Porteus and installed Ubuntu 24.04 LTS with KDE Plasma. A full desktop operating system. For a kiosk.

Sounds wrong, right? A kiosk should be locked down, minimal, purpose-built. Why would you put a full desktop environment on a device that only needs to display a webpage?

Because everything that went wrong with Porteus was about inflexibility. I need to:

  • Change WiFi remotely (nmcli — one command, done)
  • Debug issues with real tools (systemd, journalctl, htop)
  • Install new software without re-imaging
  • Let Dennis access a normal desktop if he needs to (unlikely, but possible)
  • Apply security updates without rebuilding from scratch
Why KDE Plasma? With 19 GB of RAM and an i7 processor, resource usage is irrelevant. KDE gives Dennis a familiar, polished desktop if he ever needs to log in directly. It’s also highly configurable — which matters for disabling all the “helpful” power management features that would put the display to sleep.
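Here is what "change WiFi remotely" could look like in practice. This is a minimal sketch that assumes NetworkManager and made-up placeholder names. The design choice is to add the new network as a second profile rather than editing the live one in place. The SSH tunnel rides on that same WiFi, so an in-place edit with a typo would cut you off. A separate profile lets NetworkManager fall back to the old one. The NMCLI override is only there for dry runs.

```bash
# Hypothetical helper: add a new WiFi profile next to the existing one.
# Profile name, SSID and passphrase are placeholders supplied by the caller.
# Set NMCLI=echo to print the command instead of running it.
add_wifi_profile() {
    local name=$1 ssid=$2 psk=$3
    "${NMCLI:-nmcli}" connection add type wifi con-name "$name" ssid "$ssid" \
        wifi-sec.key-mgmt wpa-psk wifi-sec.psk "$psk" \
        connection.autoconnect yes connection.autoconnect-priority 10
}
```

Once the kiosk has reconnected over the new profile, the old profile can be removed with `nmcli connection delete <old-name>`.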

The Two-User Design

This is the architectural decision that makes everything work:

  • kiosk user — Auto-login at boot. Launches Chrome in fullscreen kiosk mode immediately. No desktop, no taskbar, no escape hatch. This is what the display shows 99.9% of the time.
  • admin user — Full KDE desktop for maintenance. Only accessible via SSH-triggered mode switch. Password-protected.

Bar patrons see a kiosk. I see a Linux box I can manage remotely.

Critical security decision: There is no physical keyboard shortcut to exit kiosk mode. No Alt+F4, no Ctrl+Alt+Del escape. The only way to switch modes is via SSH: ssh root@kiosk "kiosk-to-admin". Curious patrons can’t accidentally (or intentionally) break out of the leaderboard display.
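The post doesn't show what `kiosk-to-admin` actually does. Below is one plausible sketch built on SDDM's real `[Autologin]` section (`User`, `Session`, `Relogin` keys). The drop-in path, the `kiosk` session name and all function names are assumptions. It also assumes nothing in `/etc/sddm.conf` sets its own `[Autologin]` values, because that file is read after the `sddm.conf.d` drop-ins. In kiosk mode SDDM auto-logs in the kiosk user. In admin mode the drop-in is removed, so SDDM shows its greeter and asks for the admin password.

```bash
# Hypothetical mode-switch sketch; path and session name are assumptions.
AUTOLOGIN_CONF=${AUTOLOGIN_CONF:-/etc/sddm.conf.d/zz-kiosk-autologin.conf}

# Kiosk mode: auto-login 'kiosk' into the kiosk session, and log in again if it exits.
set_kiosk_mode() {
    mkdir -p "$(dirname "$AUTOLOGIN_CONF")"
    printf '[Autologin]\nUser=kiosk\nSession=kiosk\nRelogin=true\n' > "$AUTOLOGIN_CONF"
}

# Admin mode: no auto-login, so the SDDM greeter (password prompt) appears.
set_admin_mode() {
    rm -f "$AUTOLOGIN_CONF"
}

# What the SSH-triggered commands could run as root.
kiosk_to_admin() { set_admin_mode && systemctl restart sddm; }
admin_to_kiosk() { set_kiosk_mode && systemctl restart sddm; }
```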

The Boot Flow

Power On → BIOS/UEFI → GRUB → Ubuntu
  └── SDDM (display manager)
      ├── Runs Xsetup script (rotates display to portrait)
      └── Auto-login 'kiosk' user
          └── kiosk-session starts:
              ├── xrandr → portrait rotation
              ├── xinput → touchscreen calibration
              ├── xset → disable screen blanking
              ├── unclutter → hide cursor after 3 seconds
              └── Chrome --kiosk https://pinball.thedelay.com

Every step is automated. From a cold power-on to a fully displayed leaderboard: 90 seconds, zero human intervention.
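The kiosk-session step in that boot flow can be sketched as a single script. Several details are assumptions:

- The output name `HDMI-1` (check yours with `xrandr --query`).
- `--rotate right`, inferred from the touch matrix used later in this post.
- The Chrome flags beyond `--kiosk`. `--disk-cache-size` takes bytes, so 104857600 is 100 MB.

Chrome's flags live in their own function so they are easy to inspect.

```bash
#!/usr/bin/env bash
# Hypothetical kiosk-session sketch; output name and extra Chrome flags are assumptions.
KIOSK_URL=${KIOSK_URL:-https://pinball.thedelay.com}
KIOSK_OUTPUT=${KIOSK_OUTPUT:-HDMI-1}

# Chrome flags, one per line; the URL goes last.
chrome_flags() {
    printf '%s\n' --kiosk --noerrdialogs --no-first-run \
        --disk-cache-size=104857600 "$1"
}

kiosk_session() {
    local id
    local -a flags
    xrandr --output "$KIOSK_OUTPUT" --rotate right            # portrait
    for id in $(xinput list | grep 'WingCool Inc. TouchScreen' | grep -oP 'id=\K\d+'); do
        xinput set-prop "$id" 'Coordinate Transformation Matrix' 0 1 0 -1 0 1 0 0 1
    done
    xset s off; xset s noblank; xset -dpms                     # never blank or sleep
    unclutter -idle 3 &                                        # hide cursor after 3 s
    mapfile -t flags < <(chrome_flags "$KIOSK_URL")
    exec google-chrome "${flags[@]}"
}

# The session's Exec= line would call this script with "run".
if [[ ${1:-} == run ]]; then kiosk_session; fi
```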


$ cat resilience-stack.md

Defense-in-Depth: Six Layers of “Please Don’t Call Me”

This is the heart of the project. Each layer catches failures that the previous layer can’t handle. The philosophy: assume everything will break, and build automatic recovery at every level.

LAYER 01
Application-Level Auto-Reload
every ~30 minutes // self-healing

Catches: Browser memory leaks, stale WebSocket connections, accumulated DOM nodes from hours of scene rotation.

The leaderboard app counts complete scene rotation cycles. After 5 full cycles, it triggers a full page reload. The transition is invisible: the display goes from the last scene right back to the first.

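const AUTO_RELOAD_CYCLES = 5;   // full rotations between reloads
let completedCycles = 0;

// Call whenever the rotation advances; returns true once a reload is
// scheduled so the caller can stop rotating.
function maybeAutoReload(currentSceneIndex) {
    if (currentSceneIndex !== 0) return false;   // count only at the wrap-around
    completedCycles++;
    if (completedCycles < AUTO_RELOAD_CYCLES) return false;
    // location.reload() takes no standard argument; reload(true) was a
    // Firefox-only "bypass cache" flag that Chrome ignores.
    setTimeout(() => location.reload(), 1000);
    return true;
}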

// handles 80% of browser issues before they become visible

Why app-level and not a cron job? Because the app knows where it is in the rotation. A cron job would reload on a blind schedule, potentially interrupting mid-scene. The app reloads at the natural cycle boundary.
LAYER 02
Chrome Health Watchdog
every 5 minutes // detect + restart

Catches: Chrome process death, CPU pegging, memory runaway, HTTP unresponsiveness.

A systemd timer checks Chrome’s vitals every 5 minutes:

  • CPU usage averaged over 3 samples — threshold: >90%
  • Memory consumption — threshold: >4 GB
  • HTTP health check to the leaderboard URL
  • User activity check (won’t restart if someone’s using the touchscreen)

Recovery: Restarts SDDM, which kills the entire X session and auto-relaunches Chrome fresh via kiosk auto-login.

// the 10-minute idle threshold prevents restarting while Dennis is using the screen
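The watchdog script itself isn't shown, so here is a minimal sketch of just its decision step. All function names are hypothetical, and it assumes the metrics have already been collected. Its thresholds are the ones listed above.

```bash
# Sketch of the Layer 2 decision logic only; names are hypothetical.
CPU_LIMIT_PCT=90
MEM_LIMIT_KB=$((4 * 1024 * 1024))   # 4 GB in KiB, the unit ps reports RSS in
IDLE_MIN_MS=600000                  # 10 minutes in xprintidle's milliseconds

# Integer average of the CPU samples, e.g. avg_samples 95 92 97 -> 94
avg_samples() {
    local sum=0 n=0 s
    for s in "$@"; do
        sum=$((sum + s)); n=$((n + 1))
    done
    if (( n == 0 )); then echo 0; else echo $((sum / n)); fi
}

# Args: running(1/0) cpu_avg_pct mem_kb http_ok(1/0)
# Prints why Chrome looks unhealthy, or nothing if it looks fine.
unhealthy_reason() {
    if   (( $1 == 0 ));            then echo "not-running"
    elif (( $2 > CPU_LIMIT_PCT )); then echo "cpu"
    elif (( $3 > MEM_LIMIT_KB ));  then echo "memory"
    elif (( $4 == 0 ));            then echo "http"
    fi
}

# Args: the four above plus idle_ms. Prints ok, restart:<reason> or defer:<reason>.
decide() {
    local reason
    reason=$(unhealthy_reason "$1" "$2" "$3" "$4")
    if [[ -z $reason ]]; then
        echo ok
    elif (( $5 >= IDLE_MIN_MS )); then
        echo "restart:$reason"
    else
        echo "defer:$reason"
    fi
}
```

In a real timer-driven service, the inputs would come from the following sources:

- `pgrep -x chrome` for whether Chrome is running.
- Summed `ps -C chrome -o rss=` for memory.
- `curl -fsS --max-time 10` against the leaderboard URL for the HTTP check.
- `xprintidle` for idle time.

A `restart:*` result would map to `systemctl restart sddm`. CPU samples should be instantaneous readings. `ps`'s %CPU is a lifetime average per process, so it would hide a recent spike.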

LAYER 03
Periodic Session Refresh
every 6 hours // proactive reset

Catches: X11 session state accumulation, graphics driver leaks, SDDM display amnesia.

Restarts SDDM at 2 AM, 8 AM, 2 PM, and 8 PM. Respects user activity — if someone’s interacting with the display, it defers and retries in 10 minutes.

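# Respect user activity before refreshing.
# Run as root from a timer, xprintidle also needs the kiosk user's X cookie
# (the XAUTHORITY path below is an assumption based on SDDM's default).
IDLE=$(DISPLAY=:0 XAUTHORITY=/home/kiosk/.Xauthority xprintidle 2>/dev/null || echo 0)
if [ "$IDLE" -gt 600000 ]; then  # 10 min in ms
    systemctl restart sddm
else
    # User active (or idle time unreadable) -- defer and retry
    systemd-run --on-active=10min /usr/local/bin/kiosk-refresh
fi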

// SDDM restart: ~5 seconds downtime vs. 90 seconds for a full reboot
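The 6-hour schedule could be driven by a systemd timer. This is a sketch; the unit names are hypothetical, and the script path is the one used above. The function takes a target directory, so the units can be inspected before they are installed.

```bash
# Sketch: timer + oneshot service running kiosk-refresh at 02:00, 08:00, 14:00, 20:00.
write_refresh_units() {
    local dir=${1:-/etc/systemd/system}
    mkdir -p "$dir"
    cat > "$dir/kiosk-refresh.service" <<'EOF'
[Unit]
Description=Refresh the kiosk X session when idle

[Service]
Type=oneshot
ExecStart=/usr/local/bin/kiosk-refresh
EOF
    cat > "$dir/kiosk-refresh.timer" <<'EOF'
[Unit]
Description=Refresh the kiosk session every 6 hours

[Timer]
OnCalendar=*-*-* 02,08,14,20:00:00

[Install]
WantedBy=timers.target
EOF
}
```

On the real machine, activate it with `systemctl daemon-reload` and then `systemctl enable --now kiosk-refresh.timer`. `systemd-analyze calendar '*-*-* 02,08,14,20:00:00'` shows when it will next fire.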

LAYER 04
Hardware Watchdog (iTCO_wdt)
15-second timeout // kernel-level reboot

Catches: Kernel panic. Complete system hang. CPU deadlock. Anything that makes userspace completely unreachable.

The Intel NUC has a hardware watchdog timer in its chipset that runs independently of the operating system. The iTCO_wdt kernel driver exposes it as /dev/watchdog, and a userspace process (systemd, for example) pets it every few seconds. If the system freezes and the petting stops, the hardware reboots the machine.

This is a physical timer on the motherboard. It doesn’t care about your kernel panic — it just reboots.

// the “nuclear option” that handles scenarios no software solution can address
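One way to wire up the 15-second timeout lets systemd do the petting; this is a sketch under that assumption. `RuntimeWatchdogSec` is a real systemd-system.conf option, and systemd pings the hardware at half that interval. The drop-in file name is hypothetical.

```bash
# Sketch: have systemd arm /dev/watchdog (iTCO_wdt) with a 15-second timeout.
write_watchdog_dropin() {
    local root=${1:-}                       # empty = the live system
    local dir="$root/etc/systemd/system.conf.d"
    mkdir -p "$dir"
    printf '[Manager]\nRuntimeWatchdogSec=15\n' > "$dir/kiosk-watchdog.conf"
}
```

After writing the drop-in as root, `systemctl daemon-reexec` applies it. `wdctl` (util-linux) should then report the iTCO_wdt device with a 15-second timeout. Only one process can hold /dev/watchdog, so don't also run a separate watchdog daemon.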

LAYER 05
Smart Plug Remote Power Cycle
on-demand // remote power cycle

Catches: POST hang, BIOS lockup, hardware failure the watchdog can’t recover from, or a reboot loop.

The NUC is plugged into a Kasa smart plug (~$30). BIOS is set to “Power On After AC Loss.” Remote recovery procedure:

  1. Open the Kasa app on your phone
  2. Turn OFF the “Pinball Kiosk” plug
  3. Wait 10 seconds
  4. Turn ON the plug
  5. NUC auto-boots, kiosk is live in ~90 seconds

// the only layer that requires human intervention — but from my phone, not a 6-hour road trip

LAYER 06
Scheduled Weekly Reboot
Sunday 4 AM // clean slate

Catches: Kernel-level memory fragmentation, slab cache growth, file descriptor leaks that survive process restarts — the slow drift that causes mysterious failures after weeks of uptime.

Full system reboot when the bar is closed. Total recovery time: 90 seconds. Nobody’s watching at 4 AM.

// sometimes the best fix is turning it off and on again. on a schedule.
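The Sunday 4 AM reboot could be a single cron.d entry. Here is a sketch with a hypothetical file name. Note that /etc/cron.d lines carry a user field (root) that per-user crontabs don't have.

```bash
# Sketch: weekly reboot, Sunday (day-of-week 0) at 04:00.
write_weekly_reboot() {
    local root=${1:-}                       # empty = the live system
    mkdir -p "$root/etc/cron.d"
    # minute hour day-of-month month day-of-week user command
    printf '0 4 * * 0 root /usr/bin/systemctl reboot\n' > "$root/etc/cron.d/kiosk-weekly-reboot"
}
```

Cron uses the system time zone, so check with `timedatectl` that the NUC's clock matches the bar's local time. Otherwise "4 AM" could land during league night.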

The Full Stack

Layer 1: App Reload         every ~30 min    (self-healing)
Layer 2: Chrome Watchdog     every 5 min      (detect + restart)
Layer 3: Session Refresh     every 6 hours    (proactive reset)
Layer 4: Hardware Watchdog   15-sec timeout   (kernel-level reboot)
Layer 5: Smart Plug          on-demand        (remote power cycle)
Layer 6: Weekly Reboot       Sunday 4 AM      (clean slate)

Each layer is independent. If Layer 1 fails, Layer 2 catches it. If Layer 2 fails, Layer 3 catches it. If the entire OS freezes (Layers 1–3 all useless), Layer 4 kicks in. If even the hardware watchdog can’t recover, Layer 5 lets me power-cycle remotely. And Layer 6 prevents the slow drift that causes weird failures after weeks of uptime.


$ ssh -J bastion kiosk

Remote Access Architecture

The kiosk is WiFi-only, behind a NAT, at a location I don’t control. Getting reliable remote access required some creative networking.

Three Tiers of Access

Direct SSH (Home Network)

When the kiosk is on my home network during testing:

ssh admin@kiosk-ip

// simple, fast, no tricks needed

Reverse SSH Tunnel (Production)

At the bar, the kiosk sits behind a NAT I can’t control. Solution: the kiosk reaches out to my bastion host and creates a reverse tunnel.

# Kiosk initiates outbound SSH to bastion
# Creates reverse tunnel on port 2222
Kiosk (bar WiFi) --SSH--> Bastion-1 (port 22999) --reverse--> port 2222

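# To reach the kiosk (needs GatewayPorts enabled in the bastion's sshd_config):
ssh -p 2222 root@Bastion-1
# Without GatewayPorts the tunnel listens only on the bastion's loopback, so jump through it:
ssh -J Bastion-1 -p 2222 root@localhost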

The tunnel runs as a systemd service with a watchdog timer that checks health every 10 minutes and auto-recovers with up to 3 retry attempts.

// no port forwarding needed on the bar’s router
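Here is a sketch of that tunnel watchdog's recovery loop. The unit name and the 30-second pause between attempts are assumptions; the 3 attempts match the text. `systemctl is-active` only proves the ssh process is alive. Running the tunnel with `-o ExitOnForwardFailure=yes -o ServerAliveInterval=30` makes ssh exit when the forward or the connection breaks, which turns that check into a meaningful health signal.

```bash
# Sketch of the tunnel check + recovery; unit name is hypothetical.
TUNNEL_UNIT=${TUNNEL_UNIT:-kiosk-tunnel.service}

# retry ATTEMPTS DELAY_SECONDS COMMAND... : succeed as soon as COMMAND does.
retry() {
    local attempts=$1 delay=$2 i
    shift 2
    for (( i = 1; i <= attempts; i++ )); do
        "$@" && return 0
        if (( i < attempts )); then sleep "$delay"; fi
    done
    return 1
}

tunnel_up()      { systemctl is-active --quiet "$TUNNEL_UNIT"; }
restart_tunnel() { systemctl restart "$TUNNEL_UNIT" && sleep 10 && tunnel_up; }

# Run from the 10-minute timer.
check_tunnel() {
    tunnel_up || retry 3 30 restart_tunnel
}
```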

Splashtop Remote Desktop

For visual troubleshooting — when I need to see what’s actually on the screen, interact with the touchscreen, or switch modes with a full GUI.

// because sometimes you need to see it to fix it

The NAT Hairpin Problem

Here’s a fun networking gotcha: when the kiosk is on my home network, it can’t connect to the bastion via the external address because of NAT hairpin issues on the router.

The fix? Two versions of the tunnel service — one for home (direct LAN connection) and one for production (external NAT rule). The auto-detection script handles this transparently:

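# -N: no remote command; -R: expose the kiosk's sshd as port 2222 on the bastion
if ping -c 1 -W 2 192.168.86.110 >/dev/null 2>&1; then
    # Home network -- use direct LAN path
    ssh -N -R 2222:localhost:22 -p 22 kiosk-pinball@Bastion-1
else
    # External -- use NAT rule
    # BASTION_PUBLIC: the bastion's public user@host (not shown in this post)
    ssh -N -R 2222:localhost:22 -p 22999 "$BASTION_PUBLIC"
fi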

Same kiosk, works from anywhere, no manual configuration changes.


$ cat /var/log/quirks.log

The Devil’s in the Details

Portrait Mode Touchscreen Calibration

The WingCool touchscreen registers as two input devices with identical names. When you rotate the display to portrait mode with xrandr, the touchscreen input doesn’t automatically follow — you have to apply a coordinate transformation matrix to remap the touch input.

And you have to apply it to both devices, because you don’t know which one is active:

for id in $(xinput list | grep 'WingCool Inc. TouchScreen' \
  | grep -oP 'id=\K\d+'); do
    xinput set-prop "$id" 'Coordinate Transformation Matrix' \
      0 1 0 -1 0 1 0 0 1
done

That matrix (0 1 0 -1 0 1 0 0 1) rotates touch input 90 degrees counterclockwise, which is the standard pairing for a display rotated with xrandr --rotate right. I didn’t figure that out from documentation; I found it by trial and error after the touchscreen kept registering taps 90 degrees off from where I was actually touching.
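The trial and error can be skipped by keeping the four standard matrices next to the xrandr rotation names and rotating display and touch in one call. This is a sketch: the helper names are hypothetical, and the output name is whatever `xrandr --query` reports.

```bash
# Hypothetical helper: X "Coordinate Transformation Matrix" for each xrandr rotation.
touch_matrix_for() {
    case $1 in
        normal)   echo "1 0 0 0 1 0 0 0 1" ;;
        left)     echo "0 -1 1 1 0 0 0 0 1" ;;
        right)    echo "0 1 0 -1 0 1 0 0 1" ;;
        inverted) echo "-1 0 1 0 -1 1 0 0 1" ;;
        *)        return 1 ;;
    esac
}

# Usage: rotate_with_touch OUTPUT ROTATION
rotate_with_touch() {
    local matrix id
    matrix=$(touch_matrix_for "$2") || return 1
    xrandr --output "$1" --rotate "$2"
    for id in $(xinput list | grep 'WingCool Inc. TouchScreen' | grep -oP 'id=\K\d+'); do
        # $matrix is unquoted on purpose: xinput expects nine separate numbers
        xinput set-prop "$id" 'Coordinate Transformation Matrix' $matrix
    done
}
```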

Performance Tuning for “Set It and Forget It”

The NUC has 19 GB of RAM but only uses 1.1 GB in kiosk mode. Still, I stripped out everything that could cause problems over weeks of uptime:

Full tuning list
Optimization                          Why
Baloo (file indexer) disabled         Pointless CPU usage on a kiosk
Swappiness set to 10                  Prefer RAM over swap
Apport (crash reporter) disabled      Nobody reads crash reports here
Journal size limited to 100 MB        Prevent logs from eating the SSD
Firefox & Thunderbird snaps removed   Unnecessary attack surface
CUPS (printing) disabled              It’s a kiosk, not a print server
Bluetooth disabled                    WiFi-only device
Sleep/suspend/hibernate masked        The display never sleeps
Chrome cache limited to 100 MB        Prevent unbounded growth
Weekly cleanup cron                   Clears caches, temp files, vacuums journals

Why all this on a machine with 19 GB of RAM? Because this kiosk needs to run for weeks without intervention. It’s not about today’s resource usage — it’s about preventing the slow accumulation that causes mysterious failures at 3 AM on a Tuesday.
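Several rows of that table map to standard settings, and the sketch below applies them. The file names are hypothetical. `vm.swappiness` and journald's `SystemMaxUse` are the real knobs. Baloo is left out because its control command differs between Plasma versions (`balooctl` vs `balooctl6`).

```bash
# File-based tunings, written under a root prefix (empty = the live system).
write_tuning_files() {
    local root=${1:-}
    mkdir -p "$root/etc/sysctl.d" "$root/etc/systemd/journald.conf.d"
    printf 'vm.swappiness = 10\n' > "$root/etc/sysctl.d/90-kiosk.conf"
    printf '[Journal]\nSystemMaxUse=100M\n' > "$root/etc/systemd/journald.conf.d/90-kiosk.conf"
}

# Service-level changes; these act on the live system and need root.
apply_service_tunings() {
    systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target
    systemctl disable --now cups.service bluetooth.service apport.service
    snap remove firefox thunderbird
}
```

After writing the files on the live system, `sysctl --system` and `systemctl restart systemd-journald` pick up the new values without a reboot.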

Monitoring

The kiosk pushes health metrics to Uptime Kuma every 5 minutes:

disk:23%,mem:17.8GB,chrome:running,tunnel:up

If disk usage exceeds 85%, available memory drops below 1 GB, Chrome dies, or the tunnel goes down, Uptime Kuma flags the kiosk as DOWN and I get an alert via push notification and email.

There’s also a keyword monitor that fetches the leaderboard URL and checks for specific text in the response — proving the app is actually rendering, not just returning a network-level “OK.”
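The push side of this monitoring can be sketched under a few assumptions:

- `KUMA_PUSH_URL` is a placeholder for the monitor's push URL. Uptime Kuma push monitors take `status`, `msg` and `ping` query parameters.
- The tunnel unit name is hypothetical.
- Memory is reported in MB rather than the GB shown above.

The up/down decision is kept in a pure function that encodes the thresholds from the text.

```bash
# Args: disk_pct mem_avail_mb chrome_state tunnel_state -> prints up or down
push_status() {
    if (( $1 > 85 || $2 < 1024 )) || [[ $3 != running || $4 != up ]]; then
        echo down
    else
        echo up
    fi
}

# Collect metrics and push them; run from a 5-minute timer.
push_health() {
    local disk mem_mb chrome tunnel status
    disk=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
    mem_mb=$(free -m | awk '/^Mem:/ {print $7}')   # the "available" column
    if pgrep -x chrome >/dev/null; then chrome=running; else chrome=stopped; fi
    if systemctl is-active --quiet kiosk-tunnel.service; then tunnel=up; else tunnel=down; fi
    status=$(push_status "$disk" "$mem_mb" "$chrome" "$tunnel")
    curl -fsS -G "$KUMA_PUSH_URL" \
        --data-urlencode "status=$status" \
        --data-urlencode "msg=disk:${disk}%,mem:${mem_mb}MB,chrome:$chrome,tunnel:$tunnel" >/dev/null
}
```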


$ cat lessons-learned.md

What I’d Do Differently

  • Evaluate licensing before building. I spent a week perfecting a Porteus deployment before discovering the €40/year license. Five minutes of reading the pricing page would have saved five days of work. Always check the business model first.
  • “Kiosk OS” is a marketing term, not an architecture. What I actually needed was a locked-down Linux box with auto-login and Chrome in fullscreen. That’s about 20 lines of systemd configuration. You don’t need a special OS for that.
  • Assume the network will change. Baking the WiFi credentials into an immutable ISO was the critical failure. Any production kiosk design needs a way to change network settings remotely. Ubuntu + nmcli solved this trivially.
  • Idle detection before any restart. Early versions of my watchdog scripts would restart SDDM blindly. If someone was using the touchscreen, they’d get kicked back to the login screen. Adding a 10-minute idle threshold with xprintidle fixed this completely.
  • Test your USB imaging tool before you need it. dd, the “universal” tool, produced non-bootable images. Ventoy had compatibility issues. Balena Etcher worked first try. Don’t assume your go-to tool works for every use case.
  • Layer your resilience; don’t rely on any single mechanism. If I’d only built the Chrome watchdog, a kernel panic would still take the kiosk offline. If I’d only built the hardware watchdog, a Chrome memory leak would show a white screen until the reboot. Each layer covers a different failure mode.

$ cat invoice.txt

Total Cost

Item                            Cost
Intel NUC8i7HVK (repurposed)    $0
Ubuntu 24.04 LTS                $0
Kasa Smart Plug (KP115)         ~$30
Licensing fees                  $0
Monthly subscriptions           $0
Total                           ~$30

$ exit

The Result

The kiosk has been running for weeks. Dennis hasn’t called me. The monitoring dashboard shows steady green. When Chrome occasionally gets sluggish, the watchdog catches it. When the bar loses power during a storm, the NUC boots back up, auto-logs in, launches Chrome, and the leaderboard is back before anyone notices.

The best infrastructure is the kind you forget about. Build it right, layer the recovery, and go enjoy league night.
