From a broken survey dashboard to a unified driver intelligence system — survey scores, route performance, attendance, report cards, and coaching infrastructure in one platform.
Kirk's framing: "A useful, motivating tool to help our team thrive with data-driven decisions." That's not a leaderboard refresh. It's an operational intelligence platform.
Open a driver's profile before a 1:1 and see their survey score trend, stops-per-hour vs. market baseline, pullback count, attendance record, and every coaching note — all for a selected period. The conversation has specifics instead of impressions. "Your timeliness score dropped three weeks in a row and you had two missed stops in April" is a real coaching conversation.
The driver's own view shows their score, their trend, their tier, and positive customer comments from real people they served. Drivers almost never hear directly from customers. When they do, it lands. This is the motivating layer Coach Beard flagged as the most underused tool in physical labor environments.
Athena queried every file in the workspace before writing this. No assumptions — this is what's actually there.
This is the only risk in this plan where getting it wrong doesn't produce an error — it produces a confidently wrong report card. A coaching note on the wrong driver. A pullback attributed to someone who didn't cause it. Kirk acts on it. That's a people management error.
| Source | Driver Identifier | Format | Risk |
|---|---|---|---|
| Survey responses | Full name string | "Nicholas Kennedy" | Three-part names common in fleet |
| invoices.db route_stops | Full name string | "Jesse Garcia" | Consistent within source, not cross-source |
| ops_manifest routes | Full name string | "Jesse Garcia" (frequently NULL) | PWA form often left blank |
| Descartes routes CSV | Split first + last name | DriverFirstName + DriverLastName | Must concatenate before normalization |
| Descartes stops CSV | Numeric Driver Key | "173628" | No name field. Cannot join to routes CSV. Requires Descartes roster export. |
A drivers table is the linchpin of the entire platform. Every other table joins through driver_id. Raw driver names from any system are normalized through an alias resolver at ingest time — Levenshtein distance matching, with low-confidence matches flagged for human review before they're used in joins.
The one-time setup: populate the 28 P2 drivers as canonical records. Kirk (or a manager) maps each to their Descartes Driver Key — requires exporting a driver roster from Descartes once. Total time: 0.5 dev days to build + 1 hour of Kirk's time to verify the mapping. This is done before any cross-system join is trusted.
Specific danger from Athena's analysis: "Jesus Reyes-Ortega" vs. "Jesus Reyes Ortega" both appear in invoices.db. Without the resolver, those are two different drivers with split histories. The fleet is small enough to fix this once and maintain it — it gets harder the longer it waits.
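A minimal sketch of the resolver core, assuming plain Levenshtein distance over punctuation-collapsed names; the function names, the 0.85 threshold, and the canonical dict shape are illustrative, not decided:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def normalize(name: str) -> str:
    """Collapse case and punctuation so 'Jesus Reyes-Ortega' matches 'Jesus Reyes Ortega'."""
    return " ".join(name.lower().replace("-", " ").split())

def resolve_driver(raw_name: str, canonical: dict[str, int], threshold: float = 0.85):
    """Return (driver_id, confidence); below-threshold matches are flagged, never joined."""
    target = normalize(raw_name)
    best_id, best_score = None, 0.0
    for name, driver_id in canonical.items():
        dist = levenshtein(target, normalize(name))
        score = 1.0 - dist / max(len(target), len(normalize(name)), 1)
        if score > best_score:
            best_id, best_score = driver_id, score
    return (best_id, best_score) if best_score >= threshold else (None, best_score)
```

At 28 drivers a full scan per raw name is trivially fast; the part that matters is that sub-threshold matches return None and land in a review queue instead of a join.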
When survey scores, route performance, attendance, and coaching notes are joined through a single driver ID, the questions that were previously unanswerable become routine.
URL-addressable, printable, browser-exportable to PDF. Pulls from pre-aggregated D1 tables — three queries, sub-100ms load, no Mac Mini dependency.
/performance/drivers/nicholas-kennedy/report?from=2026-04-01&to=2026-04-30
/performance/drivers/nicholas-kennedy/report?period=last-30
/performance/drivers/nicholas-kennedy/report?period=ytd
The URL is shareable as-is. CF Access gates it — authenticated users only. Print-to-PDF via browser (Ctrl+P) handles PDF export in v1; server-side PDF generation deferred to v2.
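A sketch of how the period shorthands above might resolve to a date range; parse_period and the exact shorthand set are assumptions, not the shipped parser:

```python
from datetime import date, timedelta

def parse_period(params: dict[str, str], today: date | None = None) -> tuple[date, date]:
    """Resolve ?from/?to or ?period=... into an inclusive (start, end) range."""
    today = today or date.today()
    if "from" in params and "to" in params:
        return date.fromisoformat(params["from"]), date.fromisoformat(params["to"])
    period = params.get("period", "last-30")
    if period == "ytd":
        return date(today.year, 1, 1), today
    if period.startswith("last-"):
        days = int(period.removeprefix("last-"))
        return today - timedelta(days=days), today
    raise ValueError(f"unrecognized period: {period}")
```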
| Section | Metrics | Available from |
|---|---|---|
| Customer Satisfaction | Bayesian composite, category scores, 5-star %, comments, worst category | Phase 1 |
| Route Performance | Stops/hr (vs. market baseline), completion %, missed stops, pullback rate | Phase 2 |
| Economics | Revenue per stop, profit per route, profit per hour | Phase 2 |
| Attendance & Clock | Reliability %, tardiness count, no-shows, avg minutes late | Phase 3 (blocked on source definition) |
| Coaching & Conduct | Commendations, coaching sessions, warnings, open PIP status | Phase 4 |
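The "Bayesian composite" in the table is not defined in this plan. A common construction, assumed here, shrinks each driver's raw mean toward the fleet mean in proportion to survey volume, so low-volume drivers aren't ranked on two lucky surveys:

```python
def bayesian_composite(scores: list[float], fleet_mean: float, prior_weight: float = 10.0) -> float:
    """Shrink a driver's raw mean toward the fleet mean; fewer surveys, more shrinkage.

    prior_weight acts like that many phantom surveys at the fleet mean:
    a driver with 3 real surveys is pulled strongly toward it, one with
    50 is barely moved. The 10.0 default is an assumed tuning value.
    """
    n = len(scores)
    return (prior_weight * fleet_mean + sum(scores)) / (prior_weight + n)
```

Two perfect 5.0 surveys against a 4.2 fleet mean score (10 × 4.2 + 10.0) / 12 ≈ 4.33, not 5.0.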
Revised sequence from Meg. The canonical driver registry is the dependency blocker — it must exist before any cross-system join is trusted. Report card becomes useful from Phase 2 onward and improves with each phase.
Phase 0 exit criterion: drivers table populated with 28 P2 drivers. Report card route: /performance/drivers/[slug].

| Phase | Deliverable | Dev Days | Report Card at End of Phase |
|---|---|---|---|
| 0 — Foundation | D1 schema, driver registry, survey migration | 3.5 | Not yet available |
| 1 — Survey | Leaderboard, comments, driver detail, report card v1 | 4 | Survey scores + comments (shareable URL) |
| 2 — Route + Damage | Route performance, pullbacks, market baselines | 5 | + Stops/hr, completion %, pullback rate |
| 3 — Attendance | Clock/attendance data (pending source) | 5 | + Reliability %, hours worked |
| 4 — Coaching | Coaching notes, complete report card | 4 | Full report card — all 5 data sources |
| 5 — Trends | Trend charts (deferred) | 2 | + Weekly trend lines added |
| Total | Phases 0–4 | ~21.5 (~23.5 incl. Phase 5) | |
Both Athena and Meg flagged these independently. Meg's #1 risk and Athena's #1 risk are the same thing — that's confirmation it's real.
These are hard gates, not courtesy reviews. The relevant phase doesn't move until the gate is cleared.
Phases 0–2 can start immediately; they don't depend on any of these answers. The survey migration fixes the current broken dashboard, and route performance adds the operational layer. Phases 3–4 are blocked until the first two questions are answered (Phase 3 specifically waits on the clock/attendance source decision), and the leaderboard design question must be settled before Phase 1 ships.
First action after greenlight: Gandalf gets the architecture doc. Coach Beard and Hermione are briefed simultaneously. Kirk exports the Descartes driver roster. Phase 0 starts when Gandalf clears.
64,123 route stops in invoices.db. All four markets, 73 drivers, Oct 2024–May 2026. Completion rate, on-time %, and miss rate are all computable today from existing fields. No new data collection needed for route performance analytics.
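A sketch of the completion-rate query; route_stops is named in the identifier table above, but the driver_name and status columns are assumptions about the schema and should be confirmed before the output is trusted:

```python
import sqlite3

# Column names below are assumptions about invoices.db's route_stops table.
COMPLETION_BY_DRIVER = """
SELECT driver_name,
       COUNT(*) AS total_stops,
       SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) * 1.0
           / COUNT(*) AS completion_rate
FROM route_stops
GROUP BY driver_name
ORDER BY completion_rate ASC
"""

with sqlite3.connect("invoices.db") as conn:
    for driver, total, rate in conn.execute(COMPLETION_BY_DRIVER):
        print(f"{driver}: {rate:.1%} completion over {total} stops")
```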
Driver identity is the foundational risk. Five sources, five ID schemes. Route ID stored as float text ("4282486.0") in invoices.db vs. integer in ops_manifest vs. integer in Descartes CSV — same route, three formats. All join via CAST() but any ingest code that string-compares will fail silently. "Jesus Reyes-Ortega" vs. "Jesus Reyes Ortega" appear as different drivers in current data. The alias resolver must be built and reviewed before any cross-system join is trusted.
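A sketch of the ingest-time normalization that makes the three route ID formats compare as one key; normalize_route_id is a hypothetical helper:

```python
def normalize_route_id(raw) -> int:
    """Coerce '4282486.0', '4282486', or 4282486 to the same integer key.

    The float round-trip absorbs the '.0' suffix from invoices.db; a
    non-integral value means the source is damaged, not just formatted
    differently, so it fails loudly instead of joining silently wrong.
    """
    value = float(raw)
    if not value.is_integer():
        raise ValueError(f"route id is not integral: {raw!r}")
    return int(value)

assert normalize_route_id("4282486.0") == normalize_route_id(4282486) == 4282486
```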
Pullback attribution is currently meaningless for driver accountability. All 248 records carry one reason code. Pullback count per driver is real; the implication that it reflects driver fault is not.
Highest-impact opportunity not currently being captured: Join survey scores to pullback events — if lower-scoring drivers generate more pullbacks, there's a dollar figure on improving scores. Data is already in-house.
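A sketch of the join, assuming pandas frames keyed by the canonical driver_id; the column names and toy values are illustrative:

```python
import pandas as pd

# Illustrative frames; real data would come from D1, keyed by canonical driver_id.
surveys = pd.DataFrame({"driver_id": [1, 2, 3], "composite": [4.6, 4.1, 3.8]})
pullbacks = pd.DataFrame({"driver_id": [2, 3, 3]})  # one row per pullback event

counts = pullbacks.value_counts("driver_id").reindex(surveys["driver_id"], fill_value=0)
joined = surveys.set_index("driver_id").assign(pullbacks=counts)

# A clearly negative correlation means survey scores predict pullback volume,
# and average pullback cost turns that slope into a dollar figure.
print(joined["composite"].corr(joined["pullbacks"]))
```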
Helper contribution analysis is available now. ops_manifest has helper_name on every recent route. Never been analyzed. A driver who scores 4.3 with Helper A and 3.9 with Helper B has a different problem than their composite score suggests.
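A sketch of the helper split, assuming per-route survey scores joined to ops_manifest's helper_name; the frame shape and values are illustrative:

```python
import pandas as pd

# Illustrative: per-route scores joined to ops_manifest's helper_name field.
routes = pd.DataFrame({
    "driver": ["J. Garcia"] * 4,
    "helper_name": ["Helper A", "Helper A", "Helper B", "Helper B"],
    "score": [4.4, 4.2, 3.9, 3.9],
})

by_pair = routes.groupby(["driver", "helper_name"])["score"].agg(["mean", "count"])
print(by_pair)  # a large mean gap between helpers reframes the coaching conversation
```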
Descartes CSV format is confirmed stable and ingestion is buildable now. Route summary CSV headers are known and haven't changed across sample files. The pipeline is: Kirk drops CSV → Python validates headers → filters P2 routes → resolves driver names through canonical registry → batch insert to D1. Kirk's CSV generation step is irreducible without API access — the processing after drop is automatable.
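A sketch of the post-drop pipeline, leaning on the resolver and route-ID helpers sketched earlier; the header names and the Market/P2 filter field are assumptions about the Descartes export:

```python
import csv
from pathlib import Path

# Assumed header set -- pin this to the real Descartes route-summary columns.
EXPECTED_HEADERS = {"RouteID", "Market", "DriverFirstName", "DriverLastName"}

def ingest_routes_csv(path: Path, canonical: dict[str, int]) -> list[tuple]:
    rows = []
    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        missing = EXPECTED_HEADERS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"unexpected CSV shape, missing: {missing}")
        for rec in reader:
            if rec["Market"] != "P2":            # keep P2 routes only
                continue
            full_name = f'{rec["DriverFirstName"]} {rec["DriverLastName"]}'
            driver_id, conf = resolve_driver(full_name, canonical)
            if driver_id is None:                # low confidence: park for review
                continue
            rows.append((normalize_route_id(rec["RouteID"]), driver_id))
    return rows  # batch-insert into D1 via the Workers API
```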
Stop-level on-time % per driver is blocked. The stops execution CSV uses numeric Driver Keys. The routes CSV uses first/last name. Their ID systems don't overlap — confirmed by direct file inspection. Resolving this requires a Descartes driver roster export that may not be available at the current tier.
Coaching notes: Discord slash command recommended for v1. Less build time, simpler authorization (Discord roles), managers are already there. Web form is better UX but adds a write endpoint and authorization layer requiring a separate Gandalf review. Build the Discord path first; migrate to web form when there's a concrete reason.
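A sketch of the v1 command using discord.py 2.x app commands; the /coach name, the "Manager" role gate, and the store_note placeholder are assumptions:

```python
import discord
from discord import app_commands

intents = discord.Intents.default()
client = discord.Client(intents=intents)
tree = app_commands.CommandTree(client)

@tree.command(name="coach", description="Log a coaching note for a driver")
@app_commands.checks.has_role("Manager")  # Discord roles carry the authorization
async def coach(interaction: discord.Interaction, driver: str, note: str):
    # store_note() is a placeholder for the D1 write behind the bot:
    # store_note(driver, note, author=str(interaction.user))
    await interaction.response.send_message(
        f"Coaching note logged for {driver}.", ephemeral=True
    )

@client.event
async def on_ready():
    await tree.sync()  # register the slash command with Discord

# client.run(BOT_TOKEN)  # token supplied via environment in practice
```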
Revised estimate: ~21.5 dev days across Phases 0–4 (~23.5 including the deferred trends phase), 3–4 calendar months to full platform. Leaderboard and shareable report card are live by end of Phase 1 (~3 weeks). Route performance adds to the report card in Phase 2. Phases 3–4 wait on source definition and gate clearances.
Gandalf reviews for write endpoints are two separate events. Clock/attendance entry and coaching notes are distinct security surfaces. Don't plan them as one review.
Navigation: "SURVEY PERFORMANCE" as the third primary tab (not "Rankings"). Five sub-nav pills inside: LEADERBOARD · CATEGORIES · MARKETS · TRENDS · COMMENTS. Driver Detail is a drill-down from Leaderboard, URL-addressable, not a peer tab.
Leaderboard framing: Trend arrow is the visual hero, rank number is secondary. No gradient glow medals — replace with a 1px gold left border on rank 01, consistent with the active nav tab treatment. No red for low scorers — score in secondary text color only. "Sort by Most Improved" as a secondary sort option surfaces a different, equally valid story.
Comments browser: 3px danger-red left border on negative review cards — nothing else red. Customer words are in standard text. The stripe is locatable in a scroll without being alarming.
Mobile: Sub-nav stays as text labels with horizontal scroll, not icons. Leaderboard collapses to 4 columns on mobile: RANK · DRIVER · SCORE · TREND.
Flags for routing: Hermione on whether CF Access role restrictions are appropriate for the individual performance data. Matilda on driver awareness comms — if drivers learn this tool exists, what's the plan? That's a management decision, and it should be made before launch.
Don't show numeric rank — show a tier. Three tiers: High Performer / Solid / Developing. Drivers see their tier and trend, not "you are #17 of 28." That number doesn't tell them what to do differently; it just places them in a hierarchy. The trend arrow is what changes behavior — "I'm moving up" beats "I am ranked 12th" every time.
Minimum 10-survey threshold before showing a tier. Before threshold: "Building your baseline — not enough surveys yet for a reliable picture." Protects new drivers and low-volume routes from unfair snapshots.
Absolute thresholds for tier assignment, not percentile rank. "High Performer" = score above 4.5 with 20+ surveys — achievable by everyone. If it means "top 20%," most of the team is structurally excluded regardless of how well they perform. That's a slow morale drain, not motivation.
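The tier rules as stated, in one function; the 4.0 cut between Solid and Developing is an assumed value, since the plan only fixes the High Performer bar:

```python
def assign_tier(score: float, n_surveys: int) -> str:
    """Absolute thresholds, not percentiles: every driver can reach the top tier."""
    if n_surveys < 10:
        return "Building your baseline"  # not enough surveys for a reliable picture
    if score > 4.5 and n_surveys >= 20:
        return "High Performer"
    if score >= 4.0:                     # Solid/Developing boundary is an assumed value
        return "Solid"
    return "Developing"
```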
Two separate use cases need separate designs. Supervisor view: full data, trends, comments, coaching log. Driver self-service view: their own score, tier, trend — and anonymized positive comments only. Drivers almost never hear directly from customers they served. When they do, it lands. Negative comments are delivered by supervisors in context, not surfaced raw in a dashboard.
His clearest directive: "The Hermione flag is the one I'd move on before this tool goes live — if scores are touching compensation or scheduling in any way, she needs to see the design. Everything else can be iterative."