B.Nye Real-Time Data Pipeline
Autonomous ETL + Fuzzy Entity Resolution
Real-time data from external APIs arrives with inconsistent entity names, unpredictable update frequencies, and no guaranteed schema. I built an autonomous pipeline that resolves entities via 3-pass fuzzy matching and syncs every 5 minutes via GitHub Actions. It ran for 6 weeks with zero manual intervention, producing 912 autonomous commits, and has 95% test coverage.
912 Auto Commits
5 min Sync Interval
80+ Entity Pool
95% Test Coverage
Problem
Real-time data from external APIs arrived with inconsistent entity names, unpredictable update frequencies, and no guaranteed schema. The system needed to resolve entities across sources, track state changes, and serve a live dashboard — autonomously, for 6 weeks, with zero manual intervention.
Solution
I built an automated sync pipeline that polls every 5 minutes via GitHub Actions, resolves entity names using 3-pass fuzzy matching (exact name → last name + team → normalized variants), tracks elimination state, and generates dashboard-ready JSON. 912 autonomous commits over 6 weeks. Zero manual fixes required.
Architecture
NCAA API (box scores)
       │
       ▼
┌──────────────┐    ┌──────────────┐
│ Fuzzy Entity │───▶│ Score Engine │
│   Resolver   │    │ (live/final) │
└──────────────┘    └──────┬───────┘
                           │
                    ┌──────▼───────┐
                    │ Feed Builder │
                    │  (JSON gen)  │
                    └──────┬───────┘
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
         leaderboard  games.json    meta.json
            .json          │
              │            │
              ▼            ▼
          ┌─────────────────────┐
          │   Live Dashboard    │
          │ (score animations)  │
          └─────────────────────┘
                     ▲
                     │
        GitHub Actions (5-min cron)
          912 autonomous commits
System Components
NCAA API Client
Fetches game results and box scores per round
Fuzzy Entity Resolver
3-pass name matching: exact → last name + team (suffixes Jr/Sr/II/III stripped) → normalized team variants
Score Engine
Merges round scores, tracks live vs. final, handles elimination detection
Feed Builder
Generates leaderboard.json, games.json, meta.json from raw scores
GitHub Actions CI/CD
5-minute cron sync — 912 autonomous commits over tournament duration
Live Dashboard
Vanilla JS SPA with score flash animations, expandable detail panels, projected finish
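The Feed Builder step listed above can be sketched roughly as follows. The input shapes (`scores` as entry name to per-round points, `games` as a list of dicts) and the function names `build_payloads` and `write_feeds` are assumptions for illustration; the source only states that three JSON files are generated from raw scores.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def build_payloads(scores: dict[str, dict[int, int]], games: list[dict]) -> dict[str, object]:
    """Build the three feed payloads, keyed by output filename.

    `scores` maps an entry name to {round_number: points}. This shape is
    hypothetical, not the project's actual schema.
    """
    leaderboard = sorted(
        (
            {"name": name, "total": sum(rounds.values()), "rounds": rounds}
            for name, rounds in scores.items()
        ),
        # Highest total first; ties broken alphabetically for stable output.
        key=lambda row: (-row["total"], row["name"]),
    )
    meta = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "entries": len(leaderboard),
        "games": len(games),
    }
    return {"leaderboard.json": leaderboard, "games.json": games, "meta.json": meta}


def write_feeds(payloads: dict[str, object], out_dir: Path) -> None:
    """Serialize each payload to out_dir/<filename>."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for filename, payload in payloads.items():
        (out_dir / filename).write_text(json.dumps(payload, indent=2))
```

Keeping payload construction separate from file writing makes the ranking logic testable without touching the filesystem.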
Key Engineering
Fuzzy Entity Resolution
NCAA box scores use inconsistent player name formats. A 3-pass matching algorithm resolves entities across sources without manual mapping.
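The matcher in this section calls `normalize`, `strip_suffix`, and `teams_match`, whose bodies are not shown in the source. A minimal sketch of what such helpers might look like follows. The bodies, the suffix list, and the qualifier rule for teams are all assumptions, not the project's code.

```python
import re
import unicodedata

# Hypothetical helpers: the names match the calls in match_player, but the
# bodies are illustrative assumptions.
_SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}

# Splits "Miami (FL)" or "Miami FL" into a base name and a qualifier.
_QUALIFIER = re.compile(r"^(?P<base>.+?)\s*(?:\((?P<paren>[^)]+)\)|\s(?P<code>[A-Z]{2}))$")


def normalize(name: str) -> str:
    """Lowercase, drop accents and punctuation, collapse whitespace."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9 ]", " ", ascii_name.lower())
    return " ".join(cleaned.split())


def strip_suffix(name: str) -> str:
    """Remove one trailing generational suffix (Jr, Sr, II, III, IV)."""
    tokens = normalize(name).split()
    if len(tokens) > 1 and tokens[-1] in _SUFFIXES:
        tokens = tokens[:-1]
    return " ".join(tokens)


def _split_team(team: str) -> tuple[str, str]:
    """Return (normalized base name, normalized qualifier or "")."""
    m = _QUALIFIER.match(team.strip())
    if m:
        return normalize(m["base"]), normalize(m["paren"] or m["code"])
    return normalize(team), ""


def teams_match(a: str, b: str) -> bool:
    """Same base name, and qualifiers agree whenever both are present."""
    base_a, qual_a = _split_team(a)
    base_b, qual_b = _split_team(b)
    return base_a == base_b and (not qual_a or not qual_b or qual_a == qual_b)
```

Under these assumptions "Miami (FL)", "Miami FL", and "Miami" all match each other, while "Miami (FL)" and "Miami (OH)" do not. A bare "Miami" stays ambiguous between the two, so a real pipeline would likely need extra context to disambiguate.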
def match_player(box_name: str, box_team: str, drafted: list[Player]) -> Player | None:
    # Pass 1: exact full name (after normalization)
    for p in drafted:
        if normalize(p.name) == normalize(box_name):
            return p
    # Pass 2: last name + team. Strip the suffix (Jr, Sr, II, III) from the
    # whole name first, so "John Smith Jr." yields "Smith" rather than "Jr."
    box_last = strip_suffix(box_name).split()[-1]
    for p in drafted:
        p_last = strip_suffix(p.name).split()[-1]
        if box_last == p_last and teams_match(box_team, p.team):
            return p
    # Pass 3: normalized team variants
    # "Miami (FL)" == "Miami FL" == "Miami"
    return fuzzy_team_match(box_name, box_team, drafted)
Autonomous CI/CD Pipeline
GitHub Actions runs the sync every 5 minutes during tournament windows. The workflow generated 912 commits autonomously, with zero manual intervention from first tip to championship.
Live Score Animations
The dashboard detects score changes between fetches and triggers CSS animations: a pink highlight flash on update, float-up delta badges (+X points), and a recalculated projected finish.
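The change detection behind those animations amounts to diffing two score snapshots. The dashboard itself is vanilla JavaScript; the sketch below expresses the same idea in Python for consistency with the rest of this page. `score_deltas` and `badge_text` are hypothetical names, and the snapshot shape (entry name to total points) is an assumption.

```python
def score_deltas(previous: dict[str, int], current: dict[str, int]) -> dict[str, int]:
    """Return {entry: point change} for entries whose score differs.

    Entries missing from `previous` are treated as starting at 0.
    """
    return {
        name: score - previous.get(name, 0)
        for name, score in current.items()
        if score != previous.get(name, 0)
    }


def badge_text(delta: int) -> str:
    """Format a delta for a float-up badge, with an explicit sign."""
    return f"{delta:+d}"
```

Only entries present in the returned mapping would need a highlight flash or badge; unchanged rows are left alone.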
Elimination Tracking
When a team loses, all drafted players from that team are marked eliminated. The pipeline detects this from game results and updates the leaderboard in real time.