Project Proposal — May 2026

Driver Performance Platform Upgrade

Replacing the broken survey dashboard with a proper, purpose-built customer service improvement tool inside command.p2ops.com

Prepared by Jane, Chief of Staff · Input from Athena · Meg · Edna · Coach Beard · May 16, 2026

Why we're doing this now

The old survey dashboard hit a hard wall. This is the right moment to build it correctly rather than patch something that was already overdue for retirement.

  • survey_data.json current size: 32 MiB
  • Cloudflare Pages file size limit: 25 MiB
  • Total records (all carriers): 52K
⚠ Dashboard is currently dark
survey.p2ops.com cannot deploy. The leaderboard is frozen on data from May 14. Every day this goes unfixed is a day Kirk and managers are flying blind on driver performance. This is not a future problem — it's active right now.
Why not just patch it?

A pre-aggregation fix (Option 3 from the original analysis) would work technically, but it buys maybe 60 days before the problem returns, wastes Meg's build time on a system already marked for sunset, and leaves the dashboard serving a 216KB single-file page that Meg called "embarrassing for external partners." The smart move is to build it once and build it right.

The ops dashboard at command.p2ops.com is live, stable, and already where Kirk spends his time. Survey features belong there.

✓ The real data picture (from Athena's analysis)
The 52K-record file is 99.5% competitor data. P2's actual share is 244 survey records across 28 drivers — the four MDOs combined. At the actual P2 volume (~66 records/day for the whole network), D1's free tier handles this comfortably for the next 18+ months. This is a clean, right-sized migration.

The Driver Performance section

A new primary tab in command.p2ops.com. Not a replacement dashboard — an integrated feature that lives where operations actually happens. Kirk's framing: "a useful, motivating tool to help us improve customer service."

command.p2ops.com primary nav: Service Area Map · Ops Dashboard · Driver Performance
Sub-nav: Leaderboard · Categories · Markets · Trends · Comments
Market: All ▾ Date: Last 90 days ▾ 28 drivers · 244 surveys
01 · Marcus Williams · Hurricane · 9.4 · ↑ +0.3 · 47 surveys
02 · David Reyes · Gypsum · 9.2 · ↑ +0.5 · 31 surveys
03 · James Carter · Wenatchee · 9.1 · → 0.0 · 28 surveys
04 · Luis Ortega · Ukiah · 8.7 · ↑ +0.8 · 12 surveys
24 more drivers ↓

Wireframe — illustrative data only. Design: Edna Mode.

🏆 Driver Leaderboard (Phase 1)
Bayesian confidence-weighted composite scores. Trend arrow is the visual hero — rank is secondary. Market and date range filters. Top 3 get a distinguished visual treatment. No red for low scorers. Clicking a driver opens their detail page.
📊 Category Analysis (Phase 1)
Which delivery categories (item types) score lowest across all drivers. Surfaces training gaps. Bar chart in v1; heat map possible in v2. Answers: "Where do we keep losing points?"
🗺️ Market View (Phase 1)
Rankings and survey volume by MDO — Hurricane, Gypsum, Ukiah, Wenatchee side by side. Puts market-level differences in context. Useful for MDO-level coaching conversations.
💬 Comments Browser (Phase 1)
Browse customer verbatim feedback. Filter by driver, market, auto-tag, date. Negative reviews get a red left border — clearly visible, not alarming. Auto-classification at ingest (no manual tagging UI needed).
👤 Driver Detail Page (Phase 1)
Click any driver to see: composite score, 12-week trend, category breakdown, all their comments. URL-addressable (/performance/drivers/[id]) — linkable from a coaching conversation or text. Information hierarchy: score → trend → categories → comments.
📈 Trend Lines (Deferred — Phase 2)
Weekly score trends per driver and market. Deferred by recommendation: P2 data only goes back 6 weeks. Trend charts on thin data tell misleading stories. Build after 3 more months of accumulation — the feature is right, the timing is wrong.

How it's built

Clean migration from a broken flat-file system to a proper data store. Survey queries live in Pages Functions with a direct D1 binding — no Mac Mini dependency for reads.

📦 Data Store: Cloudflare D1

SQLite-based, serverless, no infrastructure to manage. Three-layer schema (an illustrative sketch follows the list):

  • Dimension tables: mdos, drivers
  • Fact table: surveys — write-once, UNIQUE on order_number, idempotent ingest
  • 4 aggregate tables: driver lifetime, driver weekly, category weekly, MDO weekly
  • Comments queried live (paginated) — everything else reads from aggregates
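A minimal sketch of what the Python prep script might emit for two of these tables. Table and column names here are illustrative, not the final schema; the only requirements stated in this proposal are the UNIQUE(order_number) dedup key and the pre-computed Bayesian score column.

```python
# migration_prep.py (sketch) - writes DDL to load via wrangler d1 execute.
# Illustrative schema only; the real design is Meg's call.
DDL = """
CREATE TABLE IF NOT EXISTS surveys (
    id            INTEGER PRIMARY KEY,
    order_number  TEXT NOT NULL UNIQUE,  -- dedup key: re-running ingest is safe
    driver_id     INTEGER NOT NULL REFERENCES drivers(id),
    mdo_id        INTEGER NOT NULL REFERENCES mdos(id),
    score         REAL NOT NULL,         -- raw survey score
    comment       TEXT,                  -- verbatim customer feedback (nullable)
    delivered_at  TEXT NOT NULL          -- ISO date; feeds the weekly aggregates
);

CREATE TABLE IF NOT EXISTS driver_lifetime_agg (
    driver_id       INTEGER PRIMARY KEY REFERENCES drivers(id),
    survey_count    INTEGER NOT NULL,
    raw_mean        REAL NOT NULL,
    bayesian_score  REAL NOT NULL        -- pre-computed at ingest; dashboard only reads
);
"""

with open("schema.sql", "w") as f:
    f.write(DDL)
```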
🔌 Query Path: Pages Functions → D1

Survey endpoints live in functions/survey/*.js — separate from the existing API proxy. D1 binding is native; no HTTP calls to FastAPI backend required.

  • Leaderboard load: 1 row/driver from aggregate — near-instant
  • Market view: handful of rows from MDO aggregate — near-instant
  • Comments: paginated live query with indexes — fine at this scale
  • CF Access coverage is automatic (covers whole domain)
🔢 Bayesian Scoring (pre-computed at ingest)

Formula: (C × global_mean + n × driver_mean) / (C + n) — where C=10, global mean=4.52 (from actual data). A driver with 3 surveys barely moves the needle; 50 surveys converges to their raw mean. This prevents a driver with 2 five-star reviews from appearing above a driver with 50 consistent reviews. Recalculate global mean quarterly.

Scores are computed in Python at ingest time and stored. Dashboard reads a number, never recomputes.
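For concreteness, a sketch of that computation as it might appear in the ingest path (the function name is illustrative; the constants come from Athena's analysis):

```python
C = 10              # prior weight: roughly ten surveys' worth of the global mean
GLOBAL_MEAN = 4.52  # from the full P2 dataset; recalculated quarterly

def bayesian_score(driver_scores: list[float]) -> float:
    """Confidence-weighted composite: low-volume drivers shrink toward the global mean."""
    n = len(driver_scores)
    if n == 0:
        return GLOBAL_MEAN
    driver_mean = sum(driver_scores) / n
    return (C * GLOBAL_MEAN + n * driver_mean) / (C + n)

# Two five-star reviews barely move off the prior...
print(round(bayesian_score([5.0, 5.0]), 2))   # 4.60
# ...while 50 reviews averaging 4.8 converge near the raw mean.
print(round(bayesian_score([4.8] * 50), 2))   # 4.75
```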

📤 Migration: 244 P2 records → D1 in <5 minutes

Python prep script generates batch SQL files from the existing JSON, loaded via wrangler d1 execute. UNIQUE(order_number) constraint means re-running the migration is safe. Nightly processor (process_surveys.py) updated to write new records to D1 with the same dedup logic. JSON kept as parallel write for 30 days as safety net, then sunset.
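The idempotency is just SQLite's UNIQUE constraint doing the work. A sketch of the batch generation, assuming hypothetical record field names (a real script would also escape quotes in comments):

```python
import json

# Sketch: turn the existing JSON into batch SQL files for wrangler d1 execute.
# INSERT OR IGNORE no-ops on a duplicate order_number, so re-runs are safe.
with open("survey_data.json") as f:
    records = [r for r in json.load(f) if r["carrier"] == "P2"]  # post-normalization

BATCH = 100
for i in range(0, len(records), BATCH):
    with open(f"batch_{i // BATCH:02d}.sql", "w") as out:
        for r in records[i:i + BATCH]:
            out.write(
                "INSERT OR IGNORE INTO surveys "
                "(order_number, driver_id, mdo_id, score, comment, delivered_at) "
                f"VALUES ('{r['order_number']}', {r['driver_id']}, {r['mdo_id']}, "
                f"{r['score']}, '{r.get('comment', '')}', '{r['date']}');\n"
            )
# Load each batch with: wrangler d1 execute <DB_NAME> --file=batch_00.sql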

⚠ Data quality issues to fix at migration
Athena found two structural issues that must be resolved before D1 goes live: (1) Carrier name normalization — the main JSON has carrier_normalized = "TopHat" for P2 records while the monthly _ours files use "P2 Last Mile - Ukiah", "Two Phillips Enterprises (Hurricane)", etc. These must be reconciled or the leaderboard splits P2 records across phantom carriers. (2) The 52K-record master file covers only 43 days (March 31–May 2026). P2's full history lives in the monthly _ours files going back to May 2025 — migration must include both, or the leaderboard starts from scratch in late March.
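The normalization pass can be a single canonical-name map applied before any record reaches D1. The variant spellings below are the ones Athena found; the canonical value and the fail-loudly behavior are illustrative choices:

```python
# Every observed carrier spelling maps to one canonical label before ingest.
CANONICAL_CARRIER = {
    "TopHat": "P2",                                # master JSON's label for P2 records
    "P2 Last Mile - Ukiah": "P2",                  # monthly _ours file variants
    "Two Phillips Enterprises (Hurricane)": "P2",
    # remaining Gypsum and Wenatchee variants added here during migration
}

def normalize_carrier(raw: str) -> str:
    try:
        return CANONICAL_CARRIER[raw]
    except KeyError:
        # Fail loudly: an unmapped spelling would silently split P2 records
        # across phantom carriers on the leaderboard.
        raise ValueError(f"Unrecognized carrier name: {raw!r}")
```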

Built to motivate, not surveil

The design problem is real: a ranking tool inherently creates winners and losers. The goal is to make the winners feel recognized and the improvers feel seen — not to create a scoreboard that demoralizes.

⚠ Design decision required — Kirk's call
Edna recommends a ranked list (01, 02, 03…) with trend arrows as the visual hero — rank is secondary, movement is primary. Coach Beard recommends replacing numeric ranks entirely with tiers (High Performer / Solid / Developing) so drivers never see a position number. These are compatible on the management view but diverge on the driver-facing view. Kirk needs to decide: does the leaderboard show a ranked list or a tiered view? Both approaches are buildable. This decision affects the Phase 2 leaderboard spec before Meg touches it.
✓ What we're doing
  • Trend arrow is the visual hero — rank number is secondary
  • "Most Improved" sort as a secondary option alongside Score
  • Top 3: rank number in gold/silver/steel, 1px left border — understated recognition, no glow effects
  • Low scorers: score in secondary text color — they recede, not highlighted in red
  • Red left border stripe on negative comments only — visible but not alarming
  • Driver detail: trend first, then category breakdown, then comments
  • Sub-nav: pill-fill style, distinct from primary gold-underline nav
✗ What we're NOT doing
  • No gradient glow medals — cheap gamification, not recognition
  • No red rows for low-scoring drivers
  • No "wall of shame" framing — ever
  • No manual tag system (write endpoint + moderation UI = security surface + dev days)
  • No icon-only sub-nav on mobile (icons are ambiguous for this content)
  • No heatmap in v1 (deferrable — bar chart covers it adequately)
  • No keyword search in Comments v1 (D1 SQLite doesn't expose FTS5 in Workers runtime)
📱 Mobile considerations

Kirk checks this from his phone. Mobile-first decisions: sub-nav stays as text labels with horizontal scroll (no icons — they're ambiguous on a performance tool). Leaderboard collapses to 4 columns on mobile: RANK · DRIVER · SCORE · TREND. Category heatmap (if built in v2) hidden on mobile with a tooltip directing to desktop. Comments browser works well on mobile — card format is naturally responsive.

🎨 Fits the existing platform

Same dark UI, P2 blue (#3b8def), same token system as the rest of command.p2ops.com. The Survey Performance tab slots into the existing nav as the third primary item, filling the "+" placeholder. No new design language — no visual discontinuity for the user.

Phased delivery — ~16 dev days over 4–6 weeks

Leaderboard is the first deliverable. Comments and Driver Detail follow. Trend lines wait for data accumulation. survey.p2ops.com is redirected only after the new section is verified.

Phase 1: Foundation · ~3 dev days · Week 1
  • D1 database creation, schema, indexes
  • Historical migration: P2 records from _ours monthly files + master JSON
  • Carrier name normalization pass before D1 ingest
  • process_surveys.py updated to write to D1 nightly
  • wrangler.toml updated with D1 binding (carefully — not rushed)
  • Confirm CF Access active on command.p2ops.com before any feature work
Phase 2: Leaderboard, Market View, Category Analysis · ~4 dev days · Weeks 2–3
  • Pages Functions survey endpoints (leaderboard, market, category)
  • Leaderboard UI with trend arrow hero, filters, top-3 treatment
  • Market View — per-MDO rankings and volumes
  • Category Analysis — bar chart by item type
  • Validate Bayesian scoring against existing survey.p2ops.com output before deploy
  • Coach Beard sign-off required before this ships
Phase 3: Driver Detail + Comments Browser + Nav Integration · ~4 dev days · Weeks 3–4
  • Driver Detail page (URL-addressable: /performance/drivers/[id])
  • Comments Browser with auto-classification tags and sentiment filters
  • Auto-classifier in process_surveys.py: low_score, damage, timeliness_concern, positive
  • Full nav integration into command.p2ops.com
  • Persistent filter state across sub-nav tabs
Phase 4: QA + Sunset Old Dashboard · ~2.5 dev days · Week 5
  • End-to-end testing across all four markets
  • Mobile QA — leaderboard, comments, filters
  • survey.p2ops.com → redirect to command.p2ops.com/performance
  • JSON parallel write stopped, D1 is sole source of truth
  • Deploy notes and runbook updated
Phase 5: Trend Lines (Deferred) · ~2 dev days · August+ 2026
  • Build after 3 additional months of P2 survey data accumulation
  • Weekly trend charts per driver and per market
  • The feature is right — the data isn't ready yet
Phase                       Deliverable                             Estimate   Gate
Phase 1 · Foundation        D1 schema, migration, nightly writer    3 days     Gandalf security review
Phase 2 · Leaderboard       Leaderboard, Markets, Categories        4 days     Coach Beard leaderboard sign-off
Phase 3 · Detail/Comments   Driver Detail, Comments, Nav            4 days     None blocking (builds on Phase 2)
Phase 4 · QA + Sunset       Testing, redirect, cleanup              2.5 days   Kirk confirmation before redirect
Phase 5 · Trend Lines       Weekly trend views                      2 days     Deferred to August 2026+

What the specialists said

Each agent went deep on their domain. These are their actual findings, not secondhand summaries.

🔬 Athena · Business Intelligence & Data · Complete

The 99.5% revelation: The 52K-record file is almost entirely competitor data. P2's share is 244 records across 28 drivers — across all four MDOs combined. This changes the D1 scope entirely: the database stays small, query performance is trivial, and PII exposure is minimal.

On Bayesian scoring: Formula confirmed — C=10, global mean=4.52 from the full P2 dataset. A driver needs ~50 surveys to converge to their raw mean. This is the right formula; the implementation must be validated against the existing leaderboard before cutover.

Critical migration finding: The main survey_data.json only covers March 31–May 2026 (43 days). P2's full history going back to May 2025 lives in the monthly _ours files. Migration must include both or the new leaderboard starts with 6 weeks of history instead of 12 months.

Most valuable analysis we're not doing: Score-to-pullback correlation. Join survey scores to invoice pullbacks by delivery date + MDO. If lower-scoring deliveries generate more pullbacks, that's a direct dollar value on improving customer service. The data to do this is already in-house — no new collection needed.

D1 storage outlook: At current growth, the free tier (500MB) handles ~18 months. Budget conversation for the paid tier is a 2027 problem, not today's.

⚙️ Meg · Creative Technology & Engineering · Complete

Architecture call: Pages Functions → D1 direct binding. Not through FastAPI backend. Doing reads through the backend adds 150–500ms latency, makes the Mac Mini/cloudflared a dependency for leaderboard loads, and hits rate limits not designed for interactive traffic. Survey endpoints live in functions/survey/*.js, separate from the existing catch-all proxy — no conflict with current architecture.

Migration risk is low: 244 records, clean order_number dedup key, existing processor logic unchanged. Python one-time migration script, 3 batches of 100 records. Under 5 minutes.

Risk #1: CF Access status on command.p2ops.com needs confirmation before a single line of feature code is written. Driver names, scores, and customer comments are behind this endpoint. If CF Access isn't live and validated, that's task one — everything else is blocked.

Risk #2: wrangler.toml doesn't exist in command-pwa yet. Adding the D1 binding requires one. If misconfigured, it breaks the existing Pages deploy. This must be handled carefully, not rushed.

What to cut: Manual tag system → auto-classify at ingest instead (covers 80% of the use case, no write endpoint, no extra security surface). Keyword search in Comments → defer (SQLite FTS5 not available in Workers runtime). Trend Lines → defer until data is deeper.

Gates flagged by Meg: Gandalf must see the architecture before implementation. Coach Beard must sign off if leaderboard ordering differs from the current display. Belle should see the D1 resource addition (low stakes, appropriate visibility).

🎨 Edna Mode · Design Lead — UI/UX & Aesthetics · Complete

Navigation: Add "SURVEY PERFORMANCE" as the third primary tab — not "Rankings." That name front-loads the wrong framing. Five sub-nav tabs inside: LEADERBOARD · CATEGORIES · MARKETS · TRENDS · COMMENTS. Driver Detail is a drill-down destination from the leaderboard, not a peer navigation item. URL-addressable, full page.

Leaderboard framing call: The core design problem is that ranking inherently creates losers. Three decisions hold the "motivating" goal: (1) trend arrow is the visual anchor, rank number is present but not dominant; (2) no red for low scorers — score in secondary text color only; (3) top 3 get a 1px gold left border and rank in gold text — the same treatment as the active nav tab. Clean. Consistent. Not cartoonish.

On gradient glow medals: Rejected. Cheap gamification. Not what this platform's design language is.

Comments browser: 3px danger-red left border on negative reviews — nothing else changes. The customer's words are in standard text. The stripe says "needs attention" without being alarming. Asymmetry is intentional — negative cards need to be locatable in a scroll.

Edna's flags for routing: Coach Beard required before Meg touches the leaderboard (four specific policy questions: rank visibility for drivers 4+, whether below-rank-10 divider is appropriate, whether low-performer scores should appear at all, and whether drivers know they're being rated). Hermione for HR/privacy on individual performance data. Matilda for comms strategy — if drivers become aware this tool exists (they will), that changes behavior. Answer the comms question before launch, not after.

🧠 Coach Beard · Team Culture & People · Complete

Don't display a numeric rank. Display a tier. Three tiers: High Performer → Solid → Developing. Drivers see their tier, their score, and their trend arrow. They don't see "you are #17 of 28" — that number doesn't tell them what to do differently, it just tells them where they stand in a hierarchy. The trend arrow is what actually changes behavior: "I was in the middle and now I'm moving up" beats "I am ranked 12th" every time.

Minimum survey threshold before showing a tier: 10 confirmed surveys. Before that: "Building your baseline — not enough surveys yet for a reliable picture." Protects newer drivers and low-volume routes from unfair snapshots.

Absolute thresholds, not percentile rank. "Gold" should mean "your score is above 4.5 with at least 20 surveys" — achievable by everyone. If it means "you're in the top 20%", most of the team is structurally excluded no matter what they do. That's not motivating, that's a slow morale drain.
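What those rules reduce to in code is small. In the sketch below, only the above-4.5-with-20-surveys criterion and the 10-survey minimum come from Coach Beard's notes; the "Solid" cutoff is a placeholder for him to set:

```python
MIN_SURVEYS = 10  # below this, no tier is shown at all

def driver_tier(score: float, n_surveys: int) -> str:
    """Absolute thresholds, not percentiles: every tier is reachable by every driver."""
    if n_surveys < MIN_SURVEYS:
        return "Building your baseline — not enough surveys yet for a reliable picture"
    if score > 4.5 and n_surveys >= 20:  # Coach Beard's example criterion for the top tier
        return "High Performer"
    if score >= 4.0:                     # placeholder cutoff, pending Coach Beard
        return "Solid"
    return "Developing"
```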

Two separate use cases that need separate designs: (1) Supervisor coaching aid — full data, individual scores, trends, specific comments, flag for 1:1. (2) Driver self-service — their own score, tier, trend, and anonymized positive feedback only. Drivers almost never hear directly from people they served. When they do, it lands. Don't show drivers their negative comments — that's for supervisors to deliver in context, not for a dashboard to surface without framing.

No cross-market driver ranking visible to drivers. Hurricane and Wenatchee are different routes, different customers, different volumes. Cross-market comparison belongs in management reporting only, with a disclaimer. Drivers seeing each other's market scores will generate grievances, not motivation.

Cadence: Monthly for formal review. Weekly is too reactive — one bad week creates anxiety, not insight. Supervisors get real-time access to catch a sharp decline early.

Hermione flag — Coach Beard's priority: "The Hermione flag is the one I'd move on before this tool goes live — if scores are touching compensation or scheduling in any way, she needs to see the design. Everything else can be iterative." This is his clearest directive. If there's any possibility survey scores influence pay, scheduling priority, or disciplinary action downstream, Hermione reviews the design before it ships. Non-negotiable.

What tools like this usually get wrong: Optimizing for reporting, not behavior change. Hiding the criteria (drivers don't know what it takes to move up). Deploying without explanation — Kirk's rollout message matters as much as the design. Treating recognition as a one-time leaderboard post rather than an ongoing conversation.

Customer service improvement — the real goal

Kirk's framing wasn't "build a nicer leaderboard." It was build a tool that improves customer service. That's a different goal and it shapes what we track.

💡 Score-to-pullback correlation
Athena's highest-impact opportunity: join survey scores to invoice pullbacks by delivery date + MDO. If lower-scoring deliveries generate more pullbacks, we have a direct dollar figure on what improving customer service is worth. This data is already in-house — no new collection, just a join. This is Athena's next task if Kirk greenlights the build.
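A sketch of the join, assuming a pullback extract keyed by delivery date and MDO. The file names and column names are assumptions — the pullback data lives outside the survey store today:

```python
import pandas as pd

surveys = pd.read_json("p2_surveys.json")         # assumed columns: delivered_at, mdo, score
pullbacks = pd.read_csv("invoice_pullbacks.csv")  # assumed columns: delivery_date, mdo, amount

daily_scores = (surveys.groupby(["delivered_at", "mdo"])["score"]
                .mean().reset_index(name="mean_score"))
daily_pullbacks = (pullbacks.groupby(["delivery_date", "mdo"])["amount"]
                   .sum().reset_index())

joined = daily_scores.merge(
    daily_pullbacks,
    left_on=["delivered_at", "mdo"],
    right_on=["delivery_date", "mdo"],
)
# A meaningful negative correlation puts a dollar figure on customer service.
print(joined["mean_score"].corr(joined["amount"]))
```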
💡 Comment auto-classification
56% of P2 survey records include customer comments. Right now, a 5/5 score with a comment about a damaged floor is invisible as a risk. Auto-classification at ingest (damage, communication, timeliness, positive) surfaces these patterns. A driver with 9.2 composite but 40% of comments tagged "damage" is a different conversation than their score suggests.
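A sketch of that ingest-time classifier. The four tag names come from the Phase 3 spec; the keyword lists and score thresholds are illustrative starting points:

```python
# Runs once per record in the nightly processor at ingest; no manual tagging UI.
DAMAGE_WORDS = ("damage", "damaged", "scratch", "dent", "broke")      # illustrative
TIMELINESS_WORDS = ("late", "delay", "waited", "no-show", "window")   # illustrative

def classify_comment(score: float, comment: str | None) -> list[str]:
    tags = []
    text = (comment or "").lower()
    if score <= 3.0:                                  # assumed low-score cutoff
        tags.append("low_score")
    if any(w in text for w in DAMAGE_WORDS):
        tags.append("damage")                         # catches the 5/5-with-damaged-floor case
    if any(w in text for w in TIMELINESS_WORDS):
        tags.append("timeliness_concern")
    if score >= 4.5 and not tags:                     # assumed positive cutoff
        tags.append("positive")
    return tags
```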
💡 Coaching loop, not surveillance
The most valuable use of this tool is a supervisor opening the Driver Detail page before a 1:1. Score trend + category breakdown + customer comments in one place = a specific, evidence-based conversation. "Your timeliness score dropped three weeks in a row" is more useful than "your overall score is 8.1." The design is built for this.
💡 Market-level pattern recognition
Markets differ. Category Analysis by MDO will surface whether Ukiah's lower scores in "communication" are a training issue vs. a routing issue vs. a driver issue. This is the kind of insight that currently takes Kirk asking Matilda to pull data manually. The dashboard makes it self-serve.

Gates before and during the build

These are not optional reviews. Each one is a hard gate for the relevant phase.

🔒 Gandalf — Security Review · Required before Phase 1 starts
New public endpoint serving driver names, survey scores, and customer comments. CF Access status on command.p2ops.com must be confirmed. Architecture doc from Meg goes to Gandalf before any code is written. No exceptions — this is the lesson from the command-api fight last week.
🧠 Coach Beard — Leaderboard Display · Required before Phase 2 ships
Standing sign-off authority on ranking display. Needs to see: rank visibility approach for drivers 4+, low-performer visual treatment, whether all drivers are visible or just top N, and whether drivers know this tool exists. If the ordering differs from the current leaderboard at all, he reviews it before it goes live.
⚖️ Hermione — HR/Privacy Review · Required before Phase 2 ships
Individual performance data on a shared dashboard. Hermione assesses whether Cloudflare Access role restrictions are appropriate (ops managers only vs. general staff) and whether there are any state-level requirements around employee performance tracking in CA (Ukiah market) or UT/CO/WA.
📣 Matilda — Driver Awareness Comms · Required before public launch
If drivers become aware this tool exists — and they will — that changes behavior. Kirk needs a deliberate answer to the question before launch, not after. Matilda advises on the operational angle. Coach Beard advises on the people angle. Kirk decides the policy.
💰 Belle — D1 Resource Addition · Low stakes, appropriate visibility
New Cloudflare D1 resource added to the account. Free tier, no immediate cost. Belle should see it for completeness — if the project grows toward the paid tier, she should have tracked it from the start.
Kirk — Sunset Confirmation · Before survey.p2ops.com redirect
Kirk confirms the new section is working as expected before survey.p2ops.com is redirected and the old file-based system is shut down. One-time verification, ~15 minutes.

The decision

This plan is ready to execute the moment you say go. The team is briefed. The architecture is designed. The gates are defined. Kirk decides.

                             Full Upgrade Now (recommended)               Patch + Defer
Dashboard downtime           2–3 weeks until Leaderboard is live          48–72 hours for patch
Build investment             ~16 dev days over 4–6 weeks                  ~3 dev days (patch), then ~16 days later
Total dev cost               ~16 dev days                                 ~19 dev days (patch + full build)
survey.p2ops.com             Redirected and sunset                        Lives on for months
Data architecture            D1 — scalable, queryable, no size ceiling    Pre-aggregated JSON — same fragility
Score-to-pullback analysis   Enabled (Athena can run it)                  Not enabled
Platform direction           One dashboard for all of P2 ops              Two dashboards in parallel

Ready to proceed?

Jane's recommendation: greenlight the full upgrade. The patch wastes build time and leaves the architecture broken. This is the right moment — the tool is broken, the team is available, and the build is well-scoped.

First action after greenlight: Jane briefs Gandalf with Meg's architecture doc. Coach Beard and Hermione are briefed simultaneously. Phase 1 starts when Gandalf gives the all-clear.
