Kyle Wong · Product & UX Designer · Data Visualization

Design Lead · Surf Window · 2026

From Raw Forecast Data to Actionable Surf Decisions

Surf Window is a production surf alert PWA for Oʻahu: five forecast APIs, a bespoke design system, a multi-factor scoring engine, and a full alert pipeline. I led design across product, system architecture, and UX, working closely with engineering to translate inconsistent, multi-source data into clear, actionable decisions for surfers.

This case study is about the methodology behind that system and what tight design-engineering alignment made possible.

Role: Design lead (product, system, UX, research)
Collaborator: Senior engineer (infrastructure, backend)
Tools: Claude Code/Codex · GitHub Actions · Chrome DevTools MCP · Playwright
Platform: Next.js · TypeScript · Supabase · Vercel
Scope: Onboarding · surf report · alerts · email · design system · PWA

The surf report page: synthesizing swell, wind, tide, and wave height into a readable, decision-ready interface.

From fragmented data to reliable decisions

Most case studies focus on outputs: screens, flows, and components.

This project required something different: a system for turning inconsistent, multi-source forecast data into reliable, user-facing decisions. The work wasn’t just interface design. It was defining how the system behaves.

Research + Prototyping

AI compressed the timeline. Domain expertise shaped the output.

Early in the project, I needed to understand five different surf forecast APIs, their data models, their reliability characteristics, and how to combine them into a single reliable surf signal. I used Claude Code to synthesize documentation, generate initial data contracts, and scaffold UI components against specs I wrote.

But the AI couldn’t tell me that 2–4ft is beginner-friendly at Waikiki and overhead at Pipe. It couldn’t decide which of the five sources to trust when they disagreed. It couldn’t define what “a good surf alert” means to a surfer who wants to be in the water by 7am.

Domain expertise was the constraint that shaped everything the AI produced. The faster the AI-assisted workflow became, the more time I had to think.

To make scoring decisions measurable rather than guessed, I built a structured scoring system that evaluates wave height, swell period, swell direction, wind speed and direction, and tide. Each factor contributes to a composite score, calibrated per location. I then created a score-calibration workflow: a structured process that runs NOAA buoy backtesting, interprets the output, and updates spot quality profiles. With this workflow, every scoring decision is validated against actual buoy measurements.
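A minimal sketch of the idea in TypeScript (names like SpotProfile and scoreConditions are illustrative, and the weights are placeholders, not the shipped calibration):

// Illustrative per-spot composite scoring. Names, weights, and the factor
// set shown are assumptions; the production engine lives in lib/scoring
// and covers swell period and tide as well.
interface Conditions {
  waveHeightFt: number;
  swellDirectionDeg: number;
  windSpeedKts: number;
}

interface SpotProfile {
  idealWaveHeightFt: [number, number];   // e.g. [2, 4] at Waikiki
  idealSwellWindowDeg: [number, number]; // swell directions the spot accepts
  maxCleanWindKts: number;               // above this, conditions degrade
}

// Score one factor as 0..1 by distance from its ideal range.
function rangeScore(value: number, [lo, hi]: [number, number]): number {
  if (value >= lo && value <= hi) return 1;
  const dist = value < lo ? lo - value : value - hi;
  return Math.max(0, 1 - dist / (hi - lo));
}

// Composite score: a weighted blend of per-factor scores, calibrated per spot.
function scoreConditions(c: Conditions, p: SpotProfile): number {
  const height = rangeScore(c.waveHeightFt, p.idealWaveHeightFt);
  const direction = rangeScore(c.swellDirectionDeg, p.idealSwellWindowDeg);
  const wind = c.windSpeedKts <= p.maxCleanWindKts ? 1 : p.maxCleanWindKts / c.windSpeedKts;
  return 0.4 * height + 0.3 * direction + 0.3 * wind;
}

The per-spot profile is the point: the same swell scores differently at Waikiki and Pipeline because the profile changes, not the formula.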

Design methodology rigor

# Score Calibration

Run the backtesting pipeline, interpret the output, and decide
whether lib/scoring/spot-quality-profiles.ts needs adjustment.

## Two backtesting tools

| Script               | What it does                                          |
|----------------------|-------------------------------------------------------|
| run-buoy-diff.ts     | Diffs Stormglass + Open-Meteo against NDBC buoy       |
|                      | readings. Detects bias in swell height, period,       |
|                      | direction. Output: buoy-diff-report.json + .csv       |
| run-oahu-review.ts   | Cross-references DB forecasts against Surfline data.  |
|                      | Produces per-spot scoring rows with source calls.     |
|                      | Output: oahu-review-report.json                       |

Coverage: NDBC Station 51201 (Waimea Bay) covers North Shore
spots (001–008). South Shore (Diamond Head, Ala Moana Bowls:
009–010) has no nearby buoy; flag for manual review.

## Step 1: Confirm environment

node --version   # must support --experimental-strip-types
cat .env.local | grep SUPABASE  # SERVICE_ROLE_KEY required

## Step 2: Run the buoy diff

node --env-file=.env.local --experimental-strip-types \
  lib/scoring/backtesting/run-buoy-diff.ts --days 30

The score-calibration workflow: run NOAA buoy backtesting, interpret the output, update spot quality profiles. Scoring decisions weren’t judgment calls; they were measurements.

Rapid Iteration

Seven explorations in the time it normally takes to build one

The hardest design problem wasn’t the design system or the scoring engine. It was presenting six variables (wave height, swell period, direction, wind, tide, and a composite score) in a way that even a beginner surfer could understand quickly enough to make a decision.

Before I could design that interface, I needed to understand the raw data points: five forecast APIs with different data models, inconsistent units, and conflicting values. I used Claude to synthesize their documentation into a single typed data contract. That contract became the foundation everything else was built on.
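A trimmed sketch of what that contract can look like (snake_case field names follow the mapping table below; the identifiers and metadata here are assumptions):

// Illustrative shape of the normalized forecast row. Field names match
// the mapping table; spot_id, forecast_at, and source are assumed metadata.
interface ForecastUpsertRow {
  spot_id: string;
  forecast_at: string; // ISO 8601, UTC
  swell_height_ft: number;
  swell_period_s: number;
  swell_direction_deg: number;
  wind_speed_kts: number;
  wind_direction_deg: number;
  tide_height_ft: number;
  source: 'stormglass' | 'open-meteo' | 'om-marine' | 'pacioos-swan' | 'noaa-coops';
}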

With the data model settled, I moved into design exploration. I wrote a brief for each direction (information hierarchy, layout principle, primary interaction model) and used Claude to scaffold implementations quickly. The result: seven structurally distinct prototypes, each wired to real data.

Having seven high-fidelity prototypes instead of seven wireframes meant user testing could evaluate real information architectures. The decision cycle compressed from weeks of ambiguity to a structured test loop.

The constraint that kept this rigorous: Claude executed the briefs I wrote. The explorations were only as sharp as my thinking about what each direction was optimizing for.

Synthesizing five APIs into one contract

| Field           | Stormglass                    | Open-Meteo                 | OM Marine                | PacIOOS SWAN | NOAA CO-OPS            | → ForecastUpsertRow     |
|-----------------|-------------------------------|----------------------------|--------------------------|--------------|------------------------|-------------------------|
| Swell Height    | swellHeight · m · multi-model | swell_wave_height · m      | swell_wave_height · m    | shgt · m     |                        | swell_height_ft · ft    |
| Swell Period    | swellPeriod · dwd › noaa › sg | swell_wave_peak_period · s | swell_wave_period · s    | mper · s     |                        | swell_period_s · s      |
| Swell Direction | swellDirection · °            | swell_wave_direction · °   | swell_wave_direction · ° | mdir · °     |                        | swell_direction_deg · ° |
| Wind Speed      | windSpeed · m/s               | wind_speed_10m · m/s       |                          |              |                        | wind_speed_kts · kts    |
| Wind Direction  | windDirection · °             | wind_direction_10m · °     |                          |              |                        | wind_direction_deg · °  |
| Tide Height     | seaLevel · m                  |                            |                          |              | v · string · ft · MLLW | tide_height_ft · ft     |

Three naming conventions. Two unit systems. Stormglass returns per-field objects across multiple forecast models — the normalizer picks dwd › noaa › sg for period values, validated against Surfline backtesting. NOAA CO-OPS tide arrives as a formatted string at datum MLLW. Everything normalizes to one typed row.
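A minimal sketch of that normalization, assuming the row shape above (the model-preference helper encodes the dwd › noaa › sg rule; applying it to every field here is a simplification):

const M_TO_FT = 3.28084;
const MS_TO_KTS = 1.94384;

// Stormglass returns per-field objects keyed by forecast model.
type StormglassField = Partial<Record<'dwd' | 'noaa' | 'sg', number>>;

// Pick the first available model in preference order: dwd › noaa › sg.
function pickModel(field: StormglassField): number | undefined {
  return field.dwd ?? field.noaa ?? field.sg;
}

// Illustrative: normalize one Stormglass hour into the shared row fields.
function normalizeStormglass(hour: {
  swellHeight: StormglassField;    // meters
  swellPeriod: StormglassField;    // seconds
  swellDirection: StormglassField; // degrees
  windSpeed: StormglassField;      // m/s
  windDirection: StormglassField;  // degrees
}) {
  return {
    swell_height_ft: (pickModel(hour.swellHeight) ?? 0) * M_TO_FT,
    swell_period_s: pickModel(hour.swellPeriod) ?? 0,
    swell_direction_deg: pickModel(hour.swellDirection) ?? 0,
    wind_speed_kts: (pickModel(hour.windSpeed) ?? 0) * MS_TO_KTS,
    wind_direction_deg: pickModel(hour.windDirection) ?? 0,
  };
}

// NOAA CO-OPS tide arrives as a formatted string in feet at MLLW datum.
function normalizeTide(v: string): number {
  return Number.parseFloat(v);
}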

Seven explorations, one decision

Seven surf report design explorations: Quick Glance Dashboard, Sectioned Briefing, Single-Chart Outlook, Signal Stack, Briefing Board, Decision Rail, and Interactive Signal Charts

I wrote the brief for each direction. Claude scaffolded the implementation. Seven prototypes in the time I’d normally spend building one.

System Design

A design system with automated CI enforcement (a GitHub Actions workflow) is a design governance system.

A key early decision: every design choice would live in a shared token contract and not in static design files.

Single source of truth: tokens.v1.json. From this file, CSS custom properties are generated for the UI and TypeScript constants are generated for logic and tests. Nothing is manually edited downstream. If outputs drift, the build fails. This cut the time we spent on manual QA.
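A minimal sketch of the generation step, written here in TypeScript (export-tokens.mjs itself is JavaScript; the token file shape and naming scheme are assumptions):

import { readFileSync, writeFileSync } from 'node:fs';

// Read the single source of truth and emit both downstream artifacts,
// so UI (CSS) and logic/tests (TS) consume identical values.
const tokens: Record<string, Record<string, string>> =
  JSON.parse(readFileSync('tokens.v1.json', 'utf8'));

let css = ':root {\n';
let ts = '// GENERATED from tokens.v1.json. Do not edit by hand.\n';

for (const [group, values] of Object.entries(tokens)) {
  for (const [name, value] of Object.entries(values)) {
    css += `  --${group}-${name}: ${value};\n`;
    const constName = `${group}_${name}`.toUpperCase().replace(/-/g, '_');
    ts += `export const ${constName} = '${value}';\n`;
  }
}

writeFileSync('tokens.css', css + '}\n');
writeFileSync('tokens.ts', ts);

The drift check can then regenerate both files in CI and fail the build if the committed outputs differ.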

This wasn’t engineering for its own sake. It eliminated ambiguity:

  • Engineers didn’t need to ask for design clarification
  • Design updates propagated instantly across the system
  • No visual drift between design and implementation

Design decisions became enforceable.

Three CI gates run on every PR touching the design system: check-tokens.mjs (contract integrity), check-contrast.mjs (WCAG enforcement), check-parity-scoreboard.mjs (component-to-usage alignment). Design quality is enforced by the system, not dependent on manual QA review cycles.
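The contrast gate, for instance, can be little more than the WCAG relative-luminance formula run over declared token pairs; a sketch (the pair list and hex values are illustrative, not the real tokens):

// WCAG 2.x relative luminance and contrast ratio, per the spec formulas.
function luminance(hex: string): number {
  const [r, g, b] = [1, 3, 5].map((i) => {
    const c = parseInt(hex.slice(i, i + 2), 16) / 255;
    return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

function contrastRatio(fg: string, bg: string): number {
  const [light, dark] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (light + 0.05) / (dark + 0.05);
}

// Illustrative gate: every declared pair must clear WCAG AA for normal text.
const pairs: Array<[label: string, fg: string, bg: string]> = [
  ['Primary text on default surface', '#1a1a1a', '#ffffff'],
];

for (const [label, fg, bg] of pairs) {
  const ratio = contrastRatio(fg, bg);
  if (ratio < 4.5) throw new Error(`FAIL: ${label} (${ratio.toFixed(2)}:1)`);
  console.log(`PASS: ${label} (${ratio.toFixed(2)}:1)`);
}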

The pipeline

Token pipeline: tokens.v1.json → export-tokens.mjs → tokens.css and tokens.ts. Drift fails CI.


Validation

I designed the testing workflow. The tools executed it.

Testing this system required more than visual QA. Key flows (authenticated user states, alert generation, preference updates) depend on real data and real conditions.

I designed a repeatable testing workflow with two layers:

UI verification

Verifying authenticated UI normally requires a manual browser session. I built a structured workflow: spin up an authenticated session against the live Supabase backend, scroll to the exact component, capture before/after screenshots, and inspect network requests, all executed via Chrome DevTools MCP. I defined what “correct” looked like. The tooling executed the steps.

System-level validation

Playwright handled E2E regression safety on the full alert pipeline, including a production integration test harness that provisions a test environment, seeds real data, triggers the cron pipeline, and verifies that the right email reaches the right user.
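A sketch of what the final verification step can look like in Playwright (the route, test IDs, and seeded content are hypothetical, not the production suite):

import { test, expect } from '@playwright/test';

// Illustrative end-of-pipeline check. Assumes the harness has already
// provisioned a test user, seeded forecast data, and triggered the cron.
test('alert history shows the seeded alert after the cron run', async ({ page }) => {
  await page.goto('/alerts');

  const latest = page.getByTestId('alert-history-item').first();
  await expect(latest).toBeVisible();
  await expect(latest).toContainText('North Shore');
});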

The CI checks automated the rules. The structured workflows automated the execution. I owned the judgment.

Authenticated UI verification

Preference editor showing wave height, wind, and tide settings for authenticated user session

UI fix verification via Chrome DevTools: authenticated session, targeted component scroll, before/after screenshots, network request inspection.

Collaboration loop: Kyle (design direction) → Claude Code (execution agent) → Engineer (infrastructure), with feedback arrows between each

This ran at every scale: feature-level (one skill per Linear issue), system-level (design system CI), and product-level (SW-35 production harness).

AI fluency requires more design rigor, not less. The better I got at writing structured workflows, personas, and constraints, the better the output became. Vague prompts produced vague results. The workflows are the design work.

A design system built as a contract, not a library

Most design systems start with components. This one started with constraints.

The pipeline: tokens.v1.json → export scripts → tokens.css + tokens.ts. Tokens define all visual decisions. Outputs are generated automatically. Manual overrides are not allowed. Drift fails CI.

Three CI gates: Every PR touching the design system runs three automated checks: token contract integrity, WCAG contrast enforcement, and component-to-usage parity. Design decisions are enforced by the build, not by review.

The versioning arc: The design system shipped in three versions. v1.0 established the token source and initial component set. v1.1 integrated it into product routes and added CI. v1.2 added six core primitives (Button, Input, Select, Card, Stack, Section) and moved all control styling out of global CSS into component-owned modules. Each version is a design decision, not a code release.

Accessibility built in: 44px minimum touch targets across all interactive primitives. Non-color affordances for all state-bearing UI. Foreground/background token pairs on status chips enforce readable contrast rather than relying on per-component review. An accessibility matrix lives as a maintained document and not a one-time audit.

Component model: Each component is defined by a contract before implementation. Design and code evolve together as a single artifact.
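A sketch of what one contracts.ts entry can capture before any component code exists (field names and values are illustrative):

// Illustrative component contract: the agreed surface of Button, written
// before implementation. Token names must resolve in tokens.v1.json.
export const buttonContract = {
  variants: ['primary', 'secondary', 'ghost'],
  sizes: ['sm', 'md', 'lg'],
  minTouchTargetPx: 44, // accessibility floor from the matrix
  tokens: {
    background: 'interactive-primary',
    foreground: 'text-on-interactive',
  },
  states: ['default', 'hover', 'focus-visible', 'disabled'],
} as const;

export type ButtonVariant = (typeof buttonContract)['variants'][number];

Because the implementation imports its allowed variants and token names from the contract, design and code cannot quietly diverge.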

CI gate output for every PR

$ npm run ci:design-system

> tokens:check
Design system token checks passed.

> contrast:check
PASS: Primary text on default surface (11.73:1)
PASS: Secondary text on card surface (5.42:1)
PASS: Status sent text on sent background (5.89:1)
PASS: Status delivered text on delivered background (5.42:1)
PASS: Status failed text on failed background (7.02:1)
PASS: Status pending text on pending background (4.58:1)
PASS: White text on interactive primary (5.42:1)
All WCAG contrast checks passed.

> a11y:check
Accessibility contract checks passed.

> parity:check
Parity scoreboard is in sync.

10+ components, each defined by a contracts.ts before implementation. Design + code as one artifact.

Component gallery: all design system components including Spot Card, Alert Card, Button, Select, Toggle Row, History Item, Range Slider, and Status Chips

What shipped

  • Production alert pipeline (3-hour cron cycles across five forecast sources)
  • Multi-factor scoring engine calibrated against real conditions
  • Design system at v1.2: 10+ components, 3 CI gates, accessibility matrix
  • Full user flows: onboarding · spot catalog · surf report · preferences · alert history · unsubscribe
  • React Email template with its own TypeScript compilation pipeline
  • Production integration test harness: provision, seed, run, verify
  • PWA with installable manifest, OG/Twitter image generation, full responsive layout

The result wasn’t just a product. It was a system that could be tested, validated, and iterated with confidence.

This project changed how I define design work

Contracts-first thinking

Defining shared structures early felt like overhead. It eliminated nearly all friction later. Shared contracts created a shared language between design and engineering. The upfront investment paid off across every week that followed.

Speed increases the cost of unclear thinking

As execution became faster, clarity became more important. Tools can accelerate output, but they don’t define what “good” means. That still requires judgment, constraints, and iteration against real-world behavior.

Designing meaning is harder than designing UI

The hardest problems weren’t visual. They were behavioral: When should an alert fire? What’s “one alert per day” when a user is in Honolulu and the cron runs in UTC? What defines a “good window”? These decisions shaped the product more than any interface detail. Tools executed them once they were specified. Specifying them was the real design work.
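The timezone question alone shows the shape of these decisions. One way to anchor “one alert per day” to the surfer’s day rather than the cron’s, sketched here as an assumption rather than the shipped rule:

// "One alert per day" keyed to the user's local calendar day, even though
// the cron runs in UTC. Illustrative; the production logic may differ.
function localDayKey(at: Date, timeZone = 'Pacific/Honolulu'): string {
  return new Intl.DateTimeFormat('en-CA', { timeZone }).format(at); // YYYY-MM-DD
}

function shouldSendAlert(lastSentAt: Date | null, now = new Date()): boolean {
  if (!lastSentAt) return true;
  return localDayKey(lastSentAt) !== localDayKey(now);
}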

Surf Window taught me that AI raises the stakes for good design thinking. The clarity of your framework determines the quality of your output.