
B.Nye Real-Time Data Pipeline

Autonomous ETL + Fuzzy Entity Resolution

Python · Playwright · GitHub Actions · NCAA API · Real-Time Data + Functional Micro-Interactions

Real-time data from external APIs arrives with inconsistent entity names, unpredictable update frequencies, and no guaranteed schema. I built an autonomous pipeline that resolves entities via 3-pass fuzzy matching and syncs every 5 minutes via GitHub Actions. It ran for 6 weeks with zero manual intervention, producing 912 autonomous commits, and the codebase has 95% test coverage.

912 auto commits
5 min sync interval
80+ entity pool
95% test coverage

Problem

Real-time data from external APIs arrived with inconsistent entity names, unpredictable update frequencies, and no guaranteed schema. The system needed to resolve entities across sources, track state changes, and serve a live dashboard — autonomously, for 6 weeks, with zero manual intervention.

Solution

I built an automated sync pipeline that polls every 5 minutes via GitHub Actions, resolves entity names using 3-pass fuzzy matching (exact name → last name + team → normalized variants), tracks elimination state, and generates dashboard-ready JSON. 912 autonomous commits over 6 weeks. Zero manual fixes required.

Architecture

NCAA API (box scores)
    │
    ▼
┌──────────────┐    ┌──────────────┐
│ Fuzzy Entity │───▶│ Score Engine │
│   Resolver   │    │ (live/final) │
└──────────────┘    └──────┬───────┘
                           │
                    ┌──────▼───────┐
                    │ Feed Builder │
                    │  (JSON gen)  │
                    └──────┬───────┘
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
        leaderboard    games.json   meta.json
           .json           │
              │            │
              ▼            ▼
        ┌─────────────────────┐
        │   Live Dashboard    │
        │  (score animations) │
        └─────────────────────┘
              ▲
              │
        GitHub Actions (5-min cron)
        912 autonomous commits
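The diagram's flow can be illustrated with a minimal sketch of one scheduled run. It assumes each fetched game is a dict with a "status" ("live" or "final") and a "points" map keyed by already-resolved player names, and it collapses the resolver and score engine into a simple per-player sum. `fetch_games` stands in for the NCAA API client; `build_feeds`, `sync_once`, and the payload fields are hypothetical, not the project's code.

```python
from datetime import datetime, timezone
import json
from pathlib import Path
from typing import Callable

# Hypothetical game shape: {"id": ..., "status": "live" | "final",
# "points": {player_name: points}}. Names are assumed already resolved.
Game = dict


def build_feeds(games: list[Game]) -> dict[str, object]:
    """Feed Builder stage: turn resolved game rows into the three payloads."""
    totals: dict[str, int] = {}
    for game in games:
        for player, pts in game["points"].items():
            totals[player] = totals.get(player, 0) + pts
    # Highest total first; ties broken by name so output order is stable.
    leaderboard = [
        {"player": name, "points": pts}
        for name, pts in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    ]
    meta = {
        "updated_at": datetime.now(timezone.utc).isoformat(),
        "live_games": sum(1 for g in games if g["status"] == "live"),
    }
    return {"leaderboard.json": leaderboard, "games.json": games, "meta.json": meta}


def sync_once(fetch_games: Callable[[], list[Game]], out_dir: Path) -> list[Path]:
    """One scheduled run: fetch, build feeds, write one JSON file per feed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, payload in build_feeds(fetch_games()).items():
        path = out_dir / filename
        path.write_text(json.dumps(payload, indent=2, sort_keys=True))
        written.append(path)
    return written
```

Injecting `fetch_games` as a callable keeps the run testable with canned data, without network access.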

System Components

NCAA API Client (Integration): fetches game results and box scores per round.

Fuzzy Entity Resolver (Service): 3-pass name matching (exact → last name + team → normalized, stripping Jr/Sr/III).

Score Engine (Service): merges round scores, tracks live vs. final, and handles elimination detection.

Feed Builder (Service): generates leaderboard.json, games.json, and meta.json from raw scores.

GitHub Actions CI/CD (Integration): 5-minute cron sync; 912 autonomous commits over the tournament.

Live Dashboard (Service): vanilla JS SPA with score flash animations, expandable detail panels, and projected finish.
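The Score Engine's "merge round scores, track live vs. final" responsibility can be sketched as follows, assuming each box score arrives as a per-player points dict tagged with a round number and a final flag. The rule that a final score is never overwritten by a later live fetch is an assumption, not something the project describes; `RoundScore`, `PlayerLine`, and `merge_round` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class RoundScore:
    points: int
    final: bool  # False while the game is still in progress


@dataclass
class PlayerLine:
    rounds: dict[int, RoundScore] = field(default_factory=dict)

    @property
    def total(self) -> int:
        return sum(r.points for r in self.rounds.values())

    @property
    def live(self) -> bool:
        """True if any round for this player is not yet final."""
        return any(not r.final for r in self.rounds.values())


def merge_round(lines: dict[str, PlayerLine], round_no: int,
                box: dict[str, int], final: bool) -> None:
    """Fold one box score into the running per-player, per-round table.

    Assumed rule: a final score is never downgraded by a later live fetch.
    """
    for player, pts in box.items():
        line = lines.setdefault(player, PlayerLine())
        existing = line.rounds.get(round_no)
        if existing is not None and existing.final and not final:
            continue  # keep the settled number
        line.rounds[round_no] = RoundScore(pts, final)
```

Keying by round means a re-fetched live game simply replaces its own round entry instead of being double-counted.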

Key Engineering

Fuzzy Entity Resolution

NCAA box scores use inconsistent player name formats. A 3-pass matching algorithm resolves entities across sources without manual mapping.

def match_player(box_name: str, box_team: str, drafted: list[Player]) -> Player | None:
    # normalize, strip_suffix, teams_match and fuzzy_team_match are
    # helpers defined elsewhere in the project.

    # Pass 1: Exact full name
    for p in drafted:
        if normalize(p.name) == normalize(box_name):
            return p

    # Pass 2: Last name + team
    # Strip suffixes (Jr, Sr, II, III) from the full name before taking the
    # last token, so "Smith Jr." yields "Smith" rather than "Jr."
    box_last = strip_suffix(box_name).split()[-1]
    for p in drafted:
        p_last = strip_suffix(p.name).split()[-1]
        if box_last == p_last and teams_match(box_team, p.team):
            return p

    # Pass 3: Normalized team variants
    # "Miami (FL)" == "Miami FL" == "Miami"
    return fuzzy_team_match(box_name, box_team, drafted)
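The snippet above calls `normalize`, `strip_suffix`, `teams_match`, and `fuzzy_team_match` without showing them. Below is one self-contained way to fill them in. It assumes accent- and punctuation-insensitive comparison, a fixed suffix list, and a pass 3 that treats two team names as equal when they share a variant with the parenthetical or a trailing US state code removed. The helper bodies, `Player`, `US_STATES`, and `team_variants` are illustrative assumptions, not the project's code.

```python
from __future__ import annotations

import re
import unicodedata
from dataclasses import dataclass

# Assumed suffix list; extend as needed.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}
# Two-letter US state codes, used to drop qualifiers like "Miami FL".
US_STATES = set(
    "al ak az ar ca co ct de fl ga hi id il in ia ks ky la me md ma mi mn ms mo "
    "mt ne nv nh nj nm ny nc nd oh ok or pa ri sc sd tn tx ut vt va wa wv wi wy".split()
)


@dataclass
class Player:
    name: str
    team: str


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def strip_suffix(name: str) -> str:
    """Normalize a name and drop trailing suffixes: 'Smith Jr.' -> 'smith'."""
    tokens = normalize(name).split()
    while len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def last_name(name: str) -> str:
    return strip_suffix(name).split()[-1]


def teams_match(a: str, b: str) -> bool:
    """Pass 2 team check: equal after normalization only."""
    return normalize(a) == normalize(b)


def team_variants(team: str) -> set[str]:
    """Looser keys: 'Miami (FL)', 'Miami FL' and 'Miami' all yield 'miami'."""
    full = normalize(team)
    variants = {full, normalize(re.sub(r"\(.*?\)", "", team))}
    tokens = full.split()
    if len(tokens) > 1 and tokens[-1] in US_STATES:
        variants.add(" ".join(tokens[:-1]))
    return variants


def fuzzy_team_match(box_name: str, box_team: str,
                     drafted: list[Player]) -> Player | None:
    """Pass 3: same last name and at least one shared team variant."""
    box_last, box_keys = last_name(box_name), team_variants(box_team)
    for p in drafted:
        if last_name(p.name) == box_last and box_keys & team_variants(p.team):
            return p
    return None


def match_player(box_name: str, box_team: str,
                 drafted: list[Player]) -> Player | None:
    for p in drafted:  # Pass 1: exact full name
        if normalize(p.name) == normalize(box_name):
            return p
    box_last = last_name(box_name)
    for p in drafted:  # Pass 2: last name + team
        if last_name(p.name) == box_last and teams_match(box_team, p.team):
            return p
    return fuzzy_team_match(box_name, box_team, drafted)  # Pass 3
```

Because these variants map "Miami (FL)" and "Miami (OH)" to the same key "miami", pass 3 relies on the last-name check to avoid matching players across different schools. Restricting the state-code rule to a known list keeps names like "Kansas St" from collapsing into "Kansas".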

Autonomous CI/CD Pipeline

GitHub Actions runs the sync every 5 minutes during tournament windows. The workflow generated 912 commits autonomously, with zero manual intervention from first tip to championship.
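The project does not say how the workflow decides when to commit. One common pattern, assumed here, is to serialize each feed deterministically and write it only when the bytes differ from what is already on disk, so a run with no score changes leaves nothing to commit. `write_if_changed` is a hypothetical helper, not part of the project.

```python
import json
import tempfile
from pathlib import Path


def write_if_changed(path: Path, payload: object) -> bool:
    """Write payload as canonical JSON; return True only if the file changed.

    Hypothetical helper: a workflow step could run `git commit` only when
    this returns True for at least one feed file.
    """
    # sort_keys + fixed indent make equal payloads produce identical bytes.
    new = json.dumps(payload, indent=2, sort_keys=True).encode() + b"\n"
    old = path.read_bytes() if path.exists() else b""
    if new == old:
        return False
    path.write_bytes(new)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        feed = Path(tmp) / "leaderboard.json"
        print(write_if_changed(feed, [{"player": "A", "points": 10}]))  # True: new file
        print(write_if_changed(feed, [{"player": "A", "points": 10}]))  # False: same bytes
```

The 5-minute cadence itself belongs in the workflow's `on: schedule` cron entry (for example `*/5 * * * *`), which is the shortest interval GitHub Actions allows for scheduled workflows.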

Live Score Animations

The dashboard detects score changes between fetches and triggers CSS animations (a pink highlight flash on update and float-up delta badges showing +X points), then recalculates the projected finish.
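The dashboard itself is vanilla JavaScript; the Python sketch below only illustrates the change-detection step, assuming each fetch yields a name → points mapping. `score_deltas` and `badge` are hypothetical names, and entities missing from the previous fetch are treated as starting at 0.

```python
def score_deltas(prev: dict[str, int], curr: dict[str, int]) -> dict[str, int]:
    """Return the point change for every entity whose score moved.

    Entities absent from `prev` are treated as starting at 0 (assumption);
    entities absent from `curr` are ignored.
    """
    return {
        name: pts - prev.get(name, 0)
        for name, pts in curr.items()
        if pts != prev.get(name, 0)
    }


def badge(delta: int) -> str:
    """Label for a float-up badge, always signed, e.g. '+3' or '-2'."""
    return f"{delta:+d}"
```

Each key in the returned dict would be a candidate for a flash animation, and its value feeds the badge text.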

Elimination Tracking

When a team loses, all drafted players from that team are marked eliminated. The pipeline detects this from game results and updates the leaderboard in real time.
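A minimal sketch of the elimination rule, assuming results arrive as (winner, loser) pairs for completed games only and that team names have already been normalized (for example by the resolver). `Drafted` and `apply_results` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Drafted:
    name: str
    team: str
    eliminated: bool = False


def apply_results(players: list[Drafted], finals: list[tuple[str, str]]) -> set[str]:
    """Mark players eliminated for each final result given as (winner, loser).

    Returns only the names newly eliminated by this batch of results.
    """
    losers = {loser for _winner, loser in finals}
    newly: set[str] = set()
    for p in players:
        if p.team in losers and not p.eliminated:
            p.eliminated = True
            newly.add(p.name)
    return newly
```

Returning only newly eliminated names keeps repeated 5-minute runs idempotent: re-processing the same final result changes nothing and reports nothing.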