The Bulletproof Kiosk
Building a Kiosk That Never Calls You at 2 AM
My pinball leaderboard kiosk sits in a bar three hours away. I can’t drive over and reboot it. I can’t ask the bar owner to SSH in and restart a service. If that screen goes dark during league night, nobody’s fixing it until I make a six-hour round trip.
So I engineered it to never need me.
This is the story of how I turned an Intel NUC into a bulletproof display kiosk — starting with a purpose-built kiosk OS that turned out to be a trap, pivoting to Ubuntu, and then building six layers of automatic recovery so deep that the thing practically resurrects itself.
If you’re building any kind of unattended display — a dashboard for your office, a menu board for a restaurant, a leaderboard for your bar — this is the reliability playbook.
Just want the architecture? Skip to Defense-in-Depth: Six Layers of “Please Don’t Call Me”.
$ cat backstory.md
What I’m Working With
The display is for Level Up Pinball at Midwest Aleworks — a pinball league at a craft brewery. The leaderboard app is a custom-built Flask application that rotates through player rankings, machine stats, and league standings on a portrait touchscreen.
The Hardware
| Component | Details |
|---|---|
| Device | Intel NUC8i7HVK |
| RAM | 19 GB |
| Storage | 117 GB SSD (12 GB used) |
| Display | WingCool touchscreen (portrait, 1080×1920) |
| Network | WiFi only (Intel Wireless 8265) |
| Total cost | $0 (repurposed hardware) |
The NUC is massively overkill for displaying a webpage. That’s the point — headroom means fewer resource-related failures. And it was sitting in a drawer, so the price was right.
The Constraints
These aren’t hypothetical. Every design decision flows from these realities:
Design Constraints
- WiFi only — No ethernet at the bar. The kiosk lives on the bar’s WiFi.
- Non-technical owner — Dennis runs a pinball league, not a server room. “SSH in and check the logs” is not an acceptable support procedure.
- 3 hours away — Physical intervention is a full-day commitment.
- Bar environment — Patrons will touch the screen. Power might flicker. WiFi might drop when 50 people show up for trivia night.
- Must survive reboots — Power outages happen. The kiosk comes back up clean with zero human intervention.
The design goal is simple: Dennis should never have to call me.
$ git log --oneline porteus/
The Kiosk OS That Wasn’t
My first instinct was to use a purpose-built kiosk operating system. Porteus Kiosk seemed perfect — it’s literally designed for this exact use case. Boot into a locked-down browser, display a URL, done.
And it worked. Really well, actually.
What Porteus Got Right
- Boot straight to Chrome in kiosk mode — no desktop, no taskbar, no escape
- Immutable filesystem — can’t be tampered with, survives dirty shutdowns
- Hardware watchdog support (iTCO_wdt) for auto-reboot on kernel hangs
- Tiny footprint — boots fast, runs lean
I had it fully operational: Chrome displaying the leaderboard, SSH reverse tunnel to my bastion host, browser refreshing daily at 5:30 PM, auto-detecting whether it was on my home network or deployed at the bar. The clever bit was a setup script hosted on the web server that ran at boot:
# Auto-detect network and connect to bastion
if ping -c 1 -W 2 192.168.86.110 >/dev/null 2>&1; then
    # Home network -- connect directly
    ssh -p 22 kiosk-pinball@Bastion-1
else
    # External network -- use NAT rule
    ssh -p 22999 [email protected]
fi
Smart. Portable. Worked from anywhere.
Where It Fell Apart
Then I read the fine print.
Problem 1: Licensing. Porteus Kiosk has a 30-day trial. After that, it’s €40/year per device, or the kiosk stops working. For a hobby project at a bar? Non-starter. There’s also a “Porteus Kiosk Server” option at €300/year, which somehow makes it worse.
Problem 2: The WiFi Bootstrap Trap. This was the real killer. Porteus bakes WiFi credentials into the ISO at build time. If the bar changes their WiFi password — or if I need to move the kiosk to a different location — the device becomes a brick. The only fix is to re-flash the USB drive with a new ISO containing the new credentials. From 3 hours away.
I spent about a week getting Porteus production-ready before discovering these issues. That week wasn’t wasted — I learned exactly what a kiosk needs to do. But the platform had to go.
USB Imaging Tools: What Works and What Doesn’t
| Tool | Result | Verdict |
|---|---|---|
| dd | Creates non-bootable images | Avoid |
| Ventoy | Compatibility issues with Porteus Kiosk | Avoid |
| Balena Etcher | Works perfectly, first try | Use this |
I tried dd first (because of course I did), then Ventoy, then finally Balena Etcher. Should’ve started there.
$ diff porteus/ ubuntu/
The Full-OS Pivot
I scrapped Porteus and installed Ubuntu 24.04 LTS with KDE Plasma. A full desktop operating system. For a kiosk.
Sounds wrong, right? A kiosk should be locked down, minimal, purpose-built. Why would you put a full desktop environment on a device that only needs to display a webpage?
Because everything that went wrong with Porteus was about inflexibility. I need to:
- Change WiFi remotely (nmcli: one command, done)
- Debug issues with real tools (systemd, journalctl, htop)
- Install new software without re-imaging
- Let Dennis access a normal desktop if he needs to (unlikely, but possible)
- Apply security updates without rebuilding from scratch
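Here is roughly what "change WiFi remotely" can look like. This is a minimal sketch, not the author's script. The profile name, SSID, passphrase, and priority are hypothetical. The nmcli properties (wifi-sec.key-mgmt, wifi-sec.psk, connection.autoconnect-priority) are standard NetworkManager settings. The idea is to add a new profile next to the working one rather than edit the live connection.

```shell
#!/usr/bin/env bash
# Sketch: stage a new WiFi profile over SSH.
# Profile name, SSID, passphrase and priority are hypothetical placeholders.
set -u

# Print nmcli arguments (one per line) for a WPA-PSK profile.
# Args: profile_name ssid passphrase [autoconnect_priority]
wifi_profile_args() {
  local name=$1 ssid=$2 psk=$3 priority=${4:-10}
  printf '%s\n' connection add type wifi con-name "$name" ssid "$ssid" \
    wifi-sec.key-mgmt wpa-psk wifi-sec.psk "$psk" \
    connection.autoconnect yes connection.autoconnect-priority "$priority"
}

# Add the profile alongside the existing one instead of editing the live connection
add_wifi_profile() {
  local -a args
  mapfile -t args < <(wifi_profile_args "$@")
  nmcli "${args[@]}"
}
```

Activating the new profile switches networks, so expect the SSH session to drop. The reverse tunnel then has to reconnect over the new network.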
The Two-User Design
This is the architectural decision that makes everything work:
- kiosk user: Auto-login at boot. Launches Chrome in fullscreen kiosk mode immediately. No desktop, no taskbar, no escape hatch. This is what the display shows 99.9% of the time.
- admin user: Full KDE desktop for maintenance. Only accessible via an SSH-triggered mode switch. Password-protected.
Bar patrons see a kiosk. I see a Linux box I can manage remotely.
Switching to the admin desktop takes an SSH command (ssh root@kiosk "kiosk-to-admin"), so curious patrons can’t accidentally (or intentionally) break out of the leaderboard display.
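The article does not show what kiosk-to-admin does internally. Below is one hypothetical way a switch like that could work with SDDM: turn autologin on or off, then restart the display manager. The drop-in path and the "kiosk-session" session name are assumptions. The sketch also assumes autologin is not configured anywhere else.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a kiosk-to-admin style mode switch using SDDM autologin.
# Drop-in path and session name are assumptions, not the author's configuration.
set -u

SDDM_DROPIN="${SDDM_DROPIN:-/etc/sddm.conf.d/kiosk-autologin.conf}"

# kiosk: auto-login the kiosk user; admin: remove autologin so SDDM shows its password prompt
write_autologin() {
  case "$1" in
    kiosk)
      mkdir -p "$(dirname "$SDDM_DROPIN")"
      printf '[Autologin]\nUser=kiosk\nSession=kiosk-session\n' > "$SDDM_DROPIN" ;;
    admin)
      rm -f "$SDDM_DROPIN" ;;
    *)
      echo "usage: write_autologin kiosk|admin" >&2
      return 2 ;;
  esac
}

# Apply the mode by restarting the display manager (this ends the current X session)
switch_mode() {
  write_autologin "$1" && systemctl restart sddm
}
```

Under this sketch, kiosk-to-admin would be `switch_mode admin`, and the way back would be `switch_mode kiosk`.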
The Boot Flow
Power On → BIOS/UEFI → GRUB → Ubuntu
  └── SDDM (display manager)
        ├── Runs Xsetup script (rotates display to portrait)
        └── Auto-login 'kiosk' user
              └── kiosk-session starts:
                    ├── xrandr → portrait rotation
                    ├── xinput → touchscreen calibration
                    ├── xset → disable screen blanking
                    ├── unclutter → hide cursor after 3 seconds
                    └── Chrome --kiosk https://pinball.thedelay.com
Every step is automated. From a cold power-on to a fully displayed leaderboard: 90 seconds, zero human intervention.
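The session steps in the tree can be sketched as a small script. This is a sketch under assumptions, not the author's kiosk-session. The output name (HDMI-1), the rotation direction (right), and the google-chrome binary name are guesses; check them with `xrandr --query` and your install. The touchscreen step is left out here because the xinput loop appears later in the article. A DRY_RUN switch prints the commands instead of running them, which helps when testing over SSH.

```shell
#!/usr/bin/env bash
# Hypothetical kiosk-session sketch. Output name, rotation direction and the
# Chrome binary name are assumptions; touch calibration is handled separately.
set -u

KIOSK_URL="${KIOSK_URL:-https://pinball.thedelay.com}"
KIOSK_OUTPUT="${KIOSK_OUTPUT:-HDMI-1}"   # assumed xrandr output name

# Run a command, or just print it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Same as run, but the real command is left running in the background
run_bg() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$* &"; else "$@" & fi
}

kiosk_session() {
  run xrandr --output "$KIOSK_OUTPUT" --rotate right   # portrait
  run xset s off                                        # no screensaver
  run xset s noblank
  run xset -dpms                                        # no display power-down
  run_bg unclutter -idle 3                              # hide cursor after 3 s
  # --disk-cache-size (bytes) matches the 100 MB cache cap in the tuning table
  run google-chrome --kiosk --noerrdialogs --disk-cache-size=104857600 "$KIOSK_URL"
}

if [ "${1:-}" = "--run" ]; then kiosk_session; fi
```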
$ cat resilience-stack.md
Defense-in-Depth: Six Layers of “Please Don’t Call Me”
This is the heart of the project. Each layer catches failures that the previous layer can’t handle. The philosophy: assume everything will break, and build automatic recovery at every level.
Layer 1: App Reload
Catches: Browser memory leaks, stale WebSocket connections, accumulated DOM nodes from hours of scene rotation.
The leaderboard app counts complete scene rotation cycles. After 5 full cycles, it triggers a hard page reload. The transition is invisible — the display goes from the last scene right back to the first.
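const AUTO_RELOAD_CYCLES = 5;
let completedCycles = 0;

// Hypothetical hook name: called on every scene transition (not on first render)
function onSceneAdvance(currentSceneIndex) {
  // Wrapping back to scene 0 means one full rotation cycle just finished
  if (currentSceneIndex === 0) {
    completedCycles++;
    if (completedCycles >= AUTO_RELOAD_CYCLES) {
      // Reload as the rotation wraps around. Chrome ignores the non-standard
      // reload(true) "forceGet" flag, so call reload() plainly.
      setTimeout(() => location.reload(), 1000);
      return;
    }
  }
  // ...otherwise render the scene as usual
}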
// handles 80% of browser issues before they become visible
Layer 2: Chrome Watchdog
Catches: Chrome process death, CPU pegging, memory runaway, HTTP unresponsiveness.
A systemd timer checks Chrome’s vitals every 5 minutes:
- CPU usage averaged over 3 samples — threshold: >90%
- Memory consumption — threshold: >4 GB
- HTTP health check to the leaderboard URL
- User activity check (won’t restart if someone’s using the touchscreen)
Recovery: Restarts SDDM, which kills the entire X session and auto-relaunches Chrome fresh via kiosk auto-login.
// the 10-minute idle threshold prevents restarting while Dennis is using the screen
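The checks listed above can be sketched as one script. This is a minimal sketch, not the author's watchdog. The process name "chrome", the 5-second sampling windows, and the `--run` entry point are assumptions. CPU is measured from /proc tick counters as a percentage of one core. The decision logic is kept separate from the measurements so it can be tested without a running desktop.

```shell
#!/usr/bin/env bash
# Sketch of a Layer 2 Chrome watchdog. Thresholds follow the article; process name,
# sampling windows and the --run entry point are assumptions.
set -u

KIOSK_URL="${KIOSK_URL:-https://pinball.thedelay.com}"
CPU_LIMIT=90                       # percent of one core, averaged over 3 samples
MEM_LIMIT_KB=$((4 * 1024 * 1024))  # 4 GB total resident memory
IDLE_LIMIT_MS=600000               # 10 minutes without touch input

# Decision only: prints a reason and returns 0 when a restart is warranted.
# Args: cpu_percent mem_kb http_ok(1|0) idle_ms chrome_running(1|0)
needs_restart() {
  local cpu=$1 mem_kb=$2 http_ok=$3 idle_ms=$4 running=$5
  # Never interrupt someone who is using the touchscreen
  [ "$idle_ms" -gt "$IDLE_LIMIT_MS" ] || return 1
  if [ "$running" -eq 0 ]; then echo "chrome not running"; return 0; fi
  if [ "$cpu" -gt "$CPU_LIMIT" ]; then echo "cpu ${cpu}%"; return 0; fi
  if [ "$mem_kb" -gt "$MEM_LIMIT_KB" ]; then echo "memory ${mem_kb} kB"; return 0; fi
  if [ "$http_ok" -eq 0 ]; then echo "http check failed"; return 0; fi
  return 1
}

# Total CPU ticks (utime + stime, fields 14 and 15 of /proc/PID/stat) across chrome processes
chrome_ticks() {
  local pid t total=0
  for pid in $(pgrep -x chrome); do
    t=$(awk '{print $14 + $15}' "/proc/$pid/stat" 2>/dev/null) || t=0
    total=$((total + ${t:-0}))
  done
  echo "$total"
}

# Average CPU over three 5-second windows, as a percentage of one core
chrome_cpu_avg() {
  local hz sum=0 i t0 t1
  hz=$(getconf CLK_TCK)
  for i in 1 2 3; do
    t0=$(chrome_ticks); sleep 5; t1=$(chrome_ticks)
    sum=$((sum + (t1 - t0) * 100 / (hz * 5)))
  done
  echo $((sum / 3))
}

# Resident memory (kB) summed over all chrome processes
chrome_mem_kb() {
  ps -C chrome -o rss= | awk '{s += $1} END {print s + 0}'
}

watchdog_main() {
  local running=0 http_ok=0 cpu=0 mem=0 idle reason
  pgrep -x chrome >/dev/null && running=1
  curl -fsS -o /dev/null --max-time 10 "$KIOSK_URL" && http_ok=1
  if [ "$running" -eq 1 ]; then cpu=$(chrome_cpu_avg); mem=$(chrome_mem_kb); fi
  # A failed idle query counts as "user active", the same fallback as the Layer 3 script
  idle=$(DISPLAY=:0 xprintidle 2>/dev/null || echo 0)
  if reason=$(needs_restart "$cpu" "$mem" "$http_ok" "$idle" "$running"); then
    logger -t chrome-watchdog "restarting sddm: $reason"
    systemctl restart sddm
  fi
}

# The 5-minute systemd timer would invoke: chrome-watchdog --run
if [ "${1:-}" = "--run" ]; then watchdog_main; fi
```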
Layer 3: Session Refresh
Catches: X11 session state accumulation, graphics driver leaks, SDDM display amnesia.
Restarts SDDM at 2 AM, 8 AM, 2 PM, and 8 PM. Respects user activity — if someone’s interacting with the display, it defers and retries in 10 minutes.
#!/bin/bash
# /usr/local/bin/kiosk-refresh: respect user activity before refreshing
IDLE=$(DISPLAY=:0 xprintidle 2>/dev/null || echo 0)
if [ "$IDLE" -gt 600000 ]; then    # 10 min in ms
    systemctl restart sddm
else
    # User active -- defer and retry in 10 minutes via a transient timer
    systemd-run --on-active=10min /usr/local/bin/kiosk-refresh
fi
// SDDM restart: ~5 seconds downtime vs. 90 seconds for a full reboot
Layer 4: Hardware Watchdog
Catches: Kernel panic. Complete system hang. CPU deadlock. Anything that makes userspace completely unreachable.
The Intel NUC has a hardware watchdog timer (the chipset’s TCO timer, driven by the iTCO_wdt kernel module) that operates independently of the operating system. Once armed, it has to be fed every few seconds through /dev/watchdog, usually by systemd itself. If the system freezes and the feeding stops, the hardware reboots the machine.
This is a physical timer on the motherboard. It doesn’t care about your kernel panic — it just reboots.
// the “nuclear option” that handles scenarios no software solution can address
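A common way to arm that timer on Ubuntu is to let systemd (PID 1) feed it. The article doesn't say which feeder it uses, so treat this as an assumed setup. The sketch below writes a systemd drop-in with the 15-second timeout from the stack summary. RuntimeWatchdogSec and RebootWatchdogSec are real systemd-system.conf options. The file names are assumptions. Pass a scratch directory as the first argument to try it without root.

```shell
#!/usr/bin/env bash
# Sketch: arm the Intel TCO watchdog through systemd. File names are assumptions;
# the 15 s timeout matches the stack summary.
set -u

write_watchdog_config() {
  local root=${1:-}   # filesystem prefix; empty means the real system
  mkdir -p "$root/etc/modules-load.d" "$root/etc/systemd/system.conf.d"
  # Make sure the Intel TCO driver loads at boot (it is often autoloaded already)
  echo iTCO_wdt > "$root/etc/modules-load.d/iTCO_wdt.conf"
  cat > "$root/etc/systemd/system.conf.d/watchdog.conf" <<'EOF'
[Manager]
# PID 1 pings /dev/watchdog; if 15 s pass without a ping, the hardware resets the box
RuntimeWatchdogSec=15s
# Also cover a shutdown or reboot that hangs partway through
RebootWatchdogSec=2min
EOF
}
```

After writing the real files as root, reboot or run `systemctl daemon-reexec` so PID 1 rereads its configuration. Then run `wdctl` to confirm the timeout the driver reports.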
Layer 5: Smart Plug
Catches: POST hang, BIOS lockup, hardware failure the watchdog can’t recover from, or a reboot loop.
The NUC is plugged into a Kasa smart plug (~$30). BIOS is set to “Power On After AC Loss.” Remote recovery procedure:
- Open the Kasa app on your phone
- Turn OFF the “Pinball Kiosk” plug
- Wait 10 seconds
- Turn ON the plug
- NUC auto-boots, kiosk is live in ~90 seconds
// the only layer that requires human intervention — but from my phone, not a 6-hour road trip
Layer 6: Weekly Reboot
Catches: Kernel-level memory fragmentation, slab cache growth, file descriptor leaks that survive process restarts — the slow drift that causes mysterious failures after weeks of uptime.
Full system reboot every Sunday morning while the bar is closed. Total recovery time: 90 seconds. Nobody’s watching at 4 AM.
// sometimes the best fix is turning it off and on again. on a schedule.
The Full Stack
Layer 1: App Reload          every ~30 min     (self-healing)
Layer 2: Chrome Watchdog     every 5 min       (detect + restart)
Layer 3: Session Refresh     every 6 hours     (proactive reset)
Layer 4: Hardware Watchdog   15-sec timeout    (kernel-level reboot)
Layer 5: Smart Plug          on-demand         (remote power cycle)
Layer 6: Weekly Reboot       Sunday 4 AM       (clean slate)
Each layer is independent. If Layer 1 fails, Layer 2 catches it. If Layer 2 fails, Layer 3 catches it. If the entire OS freezes (Layers 1–3 all useless), Layer 4 kicks in. If even the hardware watchdog can’t recover, Layer 5 lets me power-cycle remotely. And Layer 6 prevents the slow drift that causes weird failures after weeks of uptime.
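Two of these layers run purely on the clock, which makes them a natural fit for systemd timers. Here is a minimal sketch with hypothetical unit names. The schedules come from the stack above: 2 AM, 8 AM, 2 PM, and 8 PM for Layer 3, and Sunday 4 AM for Layer 6. The article doesn't say whether it uses timers or cron for these.

```shell
#!/usr/bin/env bash
# Sketch: Layers 3 and 6 as systemd timers. Unit names are hypothetical.
set -u

# Write NAME.service (a oneshot running CMD) and NAME.timer (firing on WHEN) into DIR
write_timer() {
  local dir=$1 name=$2 when=$3 cmd=$4
  cat > "$dir/$name.service" <<EOF
[Unit]
Description=Kiosk maintenance: $name

[Service]
Type=oneshot
ExecStart=$cmd
EOF
  cat > "$dir/$name.timer" <<EOF
[Unit]
Description=Schedule for $name

[Timer]
OnCalendar=$when

[Install]
WantedBy=timers.target
EOF
}

install_kiosk_timers() {
  local dir=${1:-/etc/systemd/system}
  mkdir -p "$dir"
  # Layer 3: idle-aware session refresh at 2 AM, 8 AM, 2 PM and 8 PM
  write_timer "$dir" kiosk-refresh '*-*-* 02,08,14,20:00:00' /usr/local/bin/kiosk-refresh
  # Layer 6: clean-slate reboot every Sunday at 4 AM
  write_timer "$dir" kiosk-weekly-reboot 'Sun *-*-* 04:00:00' '/usr/bin/systemctl reboot'
}
```

To activate them, run `systemctl daemon-reload` and then `systemctl enable --now kiosk-refresh.timer kiosk-weekly-reboot.timer`. `systemd-analyze calendar 'Sun *-*-* 04:00:00'` prints when the next run will happen.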
$ ssh -J bastion kiosk
Remote Access Architecture
The kiosk is WiFi-only, behind a NAT, at a location I don’t control. Getting reliable remote access required some creative networking.
Three Tiers of Access
Tier 1: Direct SSH
When the kiosk is on my home network during testing:
ssh admin@kiosk-ip
// simple, fast, no tricks needed
Tier 2: Reverse SSH Tunnel
At the bar, the kiosk sits behind a NAT I can’t control. Solution: the kiosk reaches out to my bastion host and creates a reverse tunnel.
# Kiosk initiates outbound SSH to bastion
# Creates reverse tunnel on port 2222
Kiosk (bar WiFi) --SSH--> Bastion-1 (port 22999) --reverse--> port 2222
# To reach the kiosk from anywhere:
ssh -p 2222 root@Bastion-1
The tunnel runs as a systemd service with a watchdog timer that checks health every 10 minutes and auto-recovers with up to 3 retry attempts.
// no port forwarding needed on the bar’s router
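The health check described above can be sketched like this. The 3-retry limit comes from the text. The unit name kiosk-tunnel.service, the BASTION ssh alias, and the `nc -z` probe run on the bastion are hypothetical. The retry helper accepts any check and recovery functions, so it can be tested on its own.

```shell
#!/usr/bin/env bash
# Sketch of the tunnel health check. Unit name, BASTION alias and the nc probe
# are hypothetical; the retry count comes from the article.
set -u

MAX_ATTEMPTS=3
TUNNEL_PORT=2222
BASTION="${BASTION:-Bastion-1}"   # assumed ssh alias for the bastion host

# Run CHECK; on failure run RECOVER and try again, up to MAX_ATTEMPTS recoveries.
# Both arguments are names of shell functions. Returns 0 as soon as CHECK passes.
check_with_retries() {
  local check=$1 recover=$2 attempt
  for ((attempt = 1; attempt <= MAX_ATTEMPTS; attempt++)); do
    "$check" && return 0
    echo "attempt $attempt: tunnel unhealthy, recovering" >&2
    "$recover"
    sleep "${RETRY_DELAY:-15}"
  done
  "$check"   # final verdict after the last recovery
}

# Healthy = the local service is active AND something answers on port 2222 at the bastion
tunnel_healthy() {
  systemctl is-active --quiet kiosk-tunnel.service &&
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$BASTION" "nc -z 127.0.0.1 $TUNNEL_PORT"
}

tunnel_recover() {
  systemctl restart kiosk-tunnel.service
}

# A 10-minute timer would invoke: tunnel-check --run
if [ "${1:-}" = "--run" ]; then
  check_with_retries tunnel_healthy tunnel_recover ||
    logger -t kiosk-tunnel "tunnel still down after $MAX_ATTEMPTS recoveries"
fi
```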
Tier 3: Remote Desktop
This tier is for visual troubleshooting: when I need to see what’s actually on the screen, interact with the touchscreen, or switch modes with a full GUI.
// because sometimes you need to see it to fix it
The NAT Hairpin Problem
Here’s a fun networking gotcha: when the kiosk is on my home network, it can’t connect to the bastion via the external address because of NAT hairpin issues on the router.
The fix? Two versions of the tunnel service — one for home (direct LAN connection) and one for production (external NAT rule). The auto-detection script handles this transparently:
if ping -c 1 -W 2 192.168.86.110 >/dev/null 2>&1; then
    # Home network -- use direct LAN path
    ssh -p 22 kiosk-pinball@Bastion-1
else
    # External -- use NAT rule
    ssh -p 22999 [email protected]
fi
Same kiosk, works from anywhere, no manual configuration changes.
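The detection snippet shows only the two ssh destinations. Holding open a reverse tunnel also takes a few standard ssh flags. The sketch below combines the two. The LAN IP, ports, and home target come from the article. The external user@host appears only in redacted form, so BASTION_EXTERNAL_TARGET is a placeholder. The keepalive values are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: network auto-detection plus the flags a long-lived reverse tunnel needs.
# BASTION_EXTERNAL_TARGET is a placeholder; keepalive values are assumptions.
set -u

HOME_BASTION_IP=192.168.86.110
BASTION_EXTERNAL_TARGET="${BASTION_EXTERNAL_TARGET:-kiosk@bastion.example.net}"

# Print "home" if the bastion's LAN address answers, otherwise "external"
detect_network() {
  if ping -c 1 -W 2 "$HOME_BASTION_IP" >/dev/null 2>&1; then echo home; else echo external; fi
}

# Print the ssh arguments, one per line, for "home" or "external"
tunnel_args() {
  local port target
  if [ "$1" = home ]; then
    port=22;    target=kiosk-pinball@Bastion-1    # direct LAN path, avoids the hairpin
  else
    port=22999; target=$BASTION_EXTERNAL_TARGET   # external NAT rule
  fi
  # -N: no remote command; -R: expose the kiosk's sshd as port 2222 on the bastion
  # (bound to the bastion's loopback unless its sshd enables GatewayPorts)
  printf '%s\n' -N -R 2222:localhost:22 \
    -o ServerAliveInterval=30 -o ServerAliveCountMax=3 -o ExitOnForwardFailure=yes \
    -p "$port" "$target"
}

if [ "${1:-}" = "--run" ]; then
  mapfile -t args < <(tunnel_args "$(detect_network)")
  exec ssh "${args[@]}"
fi
```

ExitOnForwardFailure makes ssh exit if the remote port can't be bound. A systemd service with Restart= can then retry, instead of leaving a connection up with no tunnel behind it.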
$ cat /var/log/quirks.log
The Devil’s in the Details
Portrait Mode Touchscreen Calibration
The WingCool touchscreen registers as two input devices with identical names. When you rotate the display to portrait mode with xrandr, the touchscreen input doesn’t automatically follow — you have to apply a coordinate transformation matrix to remap the touch input.
And you have to apply it to both devices, because you don’t know which one is active:
for id in $(xinput list | grep 'WingCool Inc. TouchScreen' \
        | grep -oP 'id=\K\d+'); do
    xinput set-prop "$id" 'Coordinate Transformation Matrix' \
        0 1 0 -1 0 1 0 0 1
done
That matrix (0 1 0 -1 0 1 0 0 1) rotates touch input 90 degrees counterclockwise. I didn’t figure that out from documentation — I found it by trial and error after the touchscreen was responding to taps 90 degrees off from where I was actually touching.
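The matrix can be checked by hand. xinput reads the nine values row by row and applies them to touch coordinates normalized to the 0 to 1 range on each axis, with y growing downward:

```latex
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
=
\begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
\quad\Longrightarrow\quad
x' = y, \qquad y' = 1 - x
```

A tap at the top-center of the raw panel, (0.5, 0), lands at (0, 0.5), the left-center of the screen. The top edge swings to the left edge, which is the quarter-turn counterclockwise described above.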
Performance Tuning for “Set It and Forget It”
The NUC has 19 GB of RAM but only uses 1.1 GB in kiosk mode. Still, I stripped out everything that could cause problems over weeks of uptime:
Full tuning list
| Optimization | Why |
|---|---|
| Baloo (file indexer) disabled | Pointless CPU usage on a kiosk |
| Swappiness set to 10 | Keep everything in RAM |
| Apport (crash reporter) disabled | Nobody reads crash reports here |
| Journal size limited to 100MB | Prevent logs from eating the SSD |
| Firefox & Thunderbird snaps removed | Unnecessary attack surface |
| CUPS (printing) disabled | It’s a kiosk, not a print server |
| Bluetooth disabled | WiFi-only device |
| Sleep/suspend/hibernate masked | The display never sleeps |
| Chrome cache limited to 100MB | Prevent unbounded growth |
| Weekly cleanup cron | Clears caches, temp files, vacuums journals |
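A few rows of that table can be expressed as drop-in files plus service commands. Below is a sketch that covers swappiness, the journal cap, sleep masking, and the disabled services. The file names are assumptions. The unit names are those found on a stock Ubuntu install, and systemctl reports an error for any unit that is missing. The Baloo, snap, Chrome-cache, and cleanup rows are left out.

```shell
#!/usr/bin/env bash
# Sketch of some tuning-table rows. File names are assumptions; pass a scratch
# directory as $1 to write_tuning_files to try it without root.
set -u

write_tuning_files() {
  local root=${1:-}
  mkdir -p "$root/etc/sysctl.d" "$root/etc/systemd/journald.conf.d"
  # Swappiness 10: strongly prefer keeping pages in RAM
  echo 'vm.swappiness = 10' > "$root/etc/sysctl.d/99-kiosk.conf"
  # Cap the persistent journal at 100 MB
  printf '[Journal]\nSystemMaxUse=100M\n' > "$root/etc/systemd/journald.conf.d/99-kiosk.conf"
}

apply_service_tuning() {
  # The display never sleeps
  systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target
  # No printing, no Bluetooth, no crash reporter
  systemctl disable --now cups.service bluetooth.service apport.service
  # Pick up the new sysctl and journald settings
  sysctl --system
  systemctl restart systemd-journald
}

if [ "${1:-}" = "--run" ]; then
  write_tuning_files && apply_service_tuning
fi
```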
Monitoring
The kiosk pushes health metrics to Uptime Kuma every 5 minutes:
disk:23%,mem:17.8GB,chrome:running,tunnel:up
If disk usage exceeds 85%, available memory drops below 1 GB, Chrome dies, or the tunnel goes down, Uptime Kuma flags the kiosk as DOWN and I get an alert via push notification and email.
There’s also a keyword monitor that fetches the leaderboard URL and checks for specific text in the response — proving the app is actually rendering, not just returning a network-level “OK.”
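The push side can be sketched in a few lines. Uptime Kuma push monitors accept `status` and `msg` as query parameters on the push URL. The thresholds and message format below come from the text. KUMA_PUSH_URL and kiosk-tunnel.service are placeholders. Memory here means the "available" column from `free`, which fits the 17.8GB reading on a 19 GB machine.

```shell
#!/usr/bin/env bash
# Sketch of the 5-minute health push. KUMA_PUSH_URL and the tunnel unit name
# are placeholders; thresholds and message format follow the article.
set -u

KUMA_PUSH_URL="${KUMA_PUSH_URL:-https://kuma.example.net/api/push/REPLACE_ME}"

# up/down verdict. Args: disk_pct avail_mb chrome(running|dead) tunnel(up|down)
health_status() {
  local disk=$1 avail_mb=$2 chrome=$3 tunnel=$4
  if [ "$disk" -gt 85 ] || [ "$avail_mb" -lt 1024 ] ||
     [ "$chrome" != running ] || [ "$tunnel" != up ]; then
    echo down
  else
    echo up
  fi
}

collect_and_push() {
  local disk avail_mb mem_gb chrome=dead tunnel=down status msg
  disk=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
  avail_mb=$(free -m | awk '/^Mem:/ {print $7}')    # "available" column
  mem_gb=$(awk -v m="$avail_mb" 'BEGIN {printf "%.1f", m / 1024}')
  pgrep -x chrome >/dev/null && chrome=running
  systemctl is-active --quiet kiosk-tunnel.service && tunnel=up
  status=$(health_status "$disk" "$avail_mb" "$chrome" "$tunnel")
  msg="disk:${disk}%,mem:${mem_gb}GB,chrome:${chrome},tunnel:${tunnel}"
  curl -fsS -G --max-time 10 --data-urlencode "status=$status" \
    --data-urlencode "msg=$msg" "$KUMA_PUSH_URL" >/dev/null
}

if [ "${1:-}" = "--run" ]; then collect_and_push; fi
```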
$ cat lessons-learned.md
What I’d Do Differently
- Evaluate licensing before building. I spent a week perfecting a Porteus deployment before discovering the €40/year license. Five minutes of reading the pricing page would have saved five days of work. Always check the business model first.
- “Kiosk OS” is a marketing term, not an architecture. What I actually needed was a locked-down Linux box with auto-login and Chrome in fullscreen. That’s about 20 lines of systemd configuration. You don’t need a special OS for that.
- Assume the network will change. Baking the WiFi credentials into an immutable ISO was the critical failure. Any production kiosk design needs a way to change network settings remotely. Ubuntu + nmcli solved this trivially.
- Idle detection before any restart. Early versions of my watchdog scripts would restart SDDM blindly. If someone was using the touchscreen, they’d get kicked back to the login screen. Adding a 10-minute idle threshold with xprintidle fixed this completely.
- Test your USB imaging tool before you need it. dd, the “universal” tool, produced non-bootable images. Ventoy had compatibility issues. Balena Etcher worked first try. Don’t assume your go-to tool works for every use case.
- Layer your resilience; don’t rely on any single mechanism. If I’d only built the Chrome watchdog, a kernel panic would still take the kiosk offline. If I’d only built the hardware watchdog, a Chrome memory leak would show a white screen until the reboot. Each layer covers a different failure mode.
$ cat invoice.txt
Total Cost
| Item | Cost |
|---|---|
| Intel NUC8i7HVK (repurposed) | $0 |
| Ubuntu 24.04 LTS | $0 |
| Kasa Smart Plug (KP115) | ~$30 |
| Licensing fees | $0 |
| Monthly subscriptions | $0 |
| Total | ~$30 |
$ exit
The Result
The kiosk has been running for weeks. Dennis hasn’t called me. The monitoring dashboard shows steady green. When Chrome occasionally gets sluggish, the watchdog catches it. When the bar loses power during a storm, the NUC boots back up, logs in automatically, launches Chrome, and the leaderboard is back before anyone notices.
The best infrastructure is the kind you forget about. Build it right, layer the recovery, and go enjoy league night.
