Full Platform Plan — Expanded Scope · May 2026

P2 Driver Performance Platform

From a broken survey dashboard to a unified driver intelligence system — survey scores, route performance, attendance, report cards, and coaching infrastructure in one platform.

Jane, Chief of Staff · Athena · Meg · Edna · Coach Beard
~30 dev days · 3–4 months to full platform · May 16, 2026

One platform. Every dimension of driver performance.

Kirk's framing: "A useful, motivating tool to help our team thrive with data-driven decisions." That's not a leaderboard refresh. It's an operational intelligence platform.

64K · Route stops in invoices.db (Oct 2024–May 2026)
73 · Unique drivers tracked across all 4 markets
52K · Survey records (244 P2-specific)
248 · Pullback events in invoices.db
✓ Most of this data already exists
Athena queried every data source in the workspace directly. Three of the five new data sources exist in some usable form right now — route performance (invoices.db + Descartes CSVs), pullback/damage events (invoices.db), and of course survey data. Clock-in/out, attendance, and coaching notes require new collection infrastructure. The platform is more ready than it looks.
⚠ Three questions only Kirk can answer — Phase 3 is blocked without them
Clock-in/out, attendance, and coaching notes have no existing data source anywhere in the workspace. Before those phases can be designed or built, Kirk needs to answer three specific questions. See the Decisions section →
📋 What the platform does for managers

Open a driver's profile before a 1:1 and see their survey score trend, stops-per-hour vs. market baseline, pullback count, attendance record, and every coaching note — all for a selected period. The conversation has specifics instead of impressions. "Your timeliness score dropped three weeks in a row and you had two missed stops in April" is a real coaching conversation.

🏆 What the platform does for drivers

Their own score, their trend, their tier — and positive customer comments from real people they served. Drivers almost never hear directly from customers. When they do, it lands. This is the motivating layer Coach Beard flagged as the most underused tool in physical labor environments.

What exists, what needs to be built

Athena queried every file in the workspace before writing this. No assumptions — this is what's actually there.

📊
Survey Data · EXISTS
52,102 records (all carriers), 244 P2-specific records across 28 drivers. Monthly _ours files back to May 2025. Grows ~66 P2 records/day. Nightly processor handles ingestion. D1 migration is the Phase 0 blocker to resolve now.
Location: survey_data.json + monthly _ours files · 32 MiB (blocking deploy)
🛣️
Route Performance — invoices.db · EXISTS
64,123 stop records across all 4 markets, Oct 2024–May 2026, 73 drivers. Completion rate, on-time %, and miss rate are all computable directly from the stop_status and window_start/actual_arrive fields. Stops-per-hour is derivable once planned route time is factored in. This is the richest existing source after survey data.
Location: /workspace/data/invoices.db · route_stops table
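A minimal sketch of those KPIs, runnable against the existing file with Python's sqlite3. The route_stops table and the stop_status / window_start / actual_arrive fields are confirmed above; the driver_name column, the status literals, and the exact on-time definition (arrival vs. window start) are assumptions to verify against the schema first.

```python
import sqlite3

conn = sqlite3.connect("/workspace/data/invoices.db")
rows = conn.execute("""
    SELECT driver_name,                                  -- assumed column name
           COUNT(*)                            AS stops,
           AVG(stop_status = 'COMPLETED')      AS completion_rate,  -- assumed literal
           AVG(stop_status = 'MISSED')         AS miss_rate,        -- assumed literal
           AVG(actual_arrive <= window_start)  AS on_time_rate      -- definition TBC
    FROM route_stops
    GROUP BY driver_name
    ORDER BY completion_rate DESC
""").fetchall()
for r in rows:
    print(r)
```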
📁
Route Summary — Descartes CSVs · EXISTS
Known stable format: Actual Total Time, % Total Performance, NumberOfStops, Actual Distance, DriverFirstName + DriverLastName. Kirk generates and drops these manually — no Descartes API access at current tier. Format is confirmed; pipeline is buildable now. Important caveat: stop-level on-time % per driver requires a separate Descartes driver roster export (Driver Key → Name mapping) that may not be available at current access tier.
Location: /workspace/ops-data/ · routes CSVs + stops execution CSVs
⚠️
Damage / Pullback Events · EXISTS (attribution gap)
248 pullback records in invoices.db across all 4 markets. Pullback rate per driver is computable. Critical limitation: all 248 records carry the same reason code — "PULLBACK AT MDO - NOT PAID." There is no differentiation between driver-caused, item-condition, and customer-caused pullbacks. Attributing all equally to drivers would produce unfair report cards. Attribution needs to be added going forward.
Location: /workspace/data/invoices.db · pullbacks table
🗺️
ops_manifest — Route + Helper Data · EXISTS (30 days)
232 routes from Apr 17–May 16, 2026. Has time_leaving_mdo, total_elapsed_time, stops, distance, driver_name — and crucially: helper_name. Helper tracking exists nowhere else in any system. Limited coverage (30 days) but joins to route_stops via route_id confirmed working. Stops-per-hour computable from total_elapsed_time ÷ total_stops.
Location: /workspace/p2-texting/db/ops_manifest.db · routes table
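The stops-per-hour math and the confirmed CAST() join, sketched with the fields listed above. The only assumption beyond column names is that total_elapsed_time is stored in minutes.

```python
import sqlite3

conn = sqlite3.connect("/workspace/p2-texting/db/ops_manifest.db")
conn.execute("ATTACH DATABASE '/workspace/data/invoices.db' AS inv")

# Stops-per-hour per route (total_elapsed_time assumed to be minutes).
sph = conn.execute("""
    SELECT route_id, driver_name, helper_name,
           stops * 60.0 / total_elapsed_time AS stops_per_hour
    FROM routes
    WHERE total_elapsed_time > 0
""").fetchall()

# invoices.db stores route_id as float text ('4282486.0'), ops_manifest as
# integer, so the confirmed join goes through CAST().
joined = conn.execute("""
    SELECT r.route_id, COUNT(s.rowid) AS stop_rows
    FROM routes r
    JOIN inv.route_stops s ON CAST(s.route_id AS INTEGER) = r.route_id
    GROUP BY r.route_id
""").fetchall()
```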
🕐
Clock-In / Clock-Out · NO EXISTING SOURCE
Athena searched the entire workspace — zero data. No app, no spreadsheet, no CSV. Before a pipeline can be designed, Kirk must define what currently records this (if anything). Options for collection: a clock-in/out screen added to the P2 PWA, a manager-entry spreadsheet synced nightly, or integration with Gusto/payroll if clock data lives there. Phase 3 cannot start until this is answered.
Status: undefined. See Decisions → Question 1.
📅
Manager Attendance Tracking · NO EXISTING SOURCE
No structured data anywhere. Needs new collection — options are a daily attendance form for managers (Google Form → sheet → D1 sync, or a web form in command.p2ops.com). Start simple: get data flowing first, build the proper UI later. Phase 3 blocked.
Status: undefined. See Decisions → Question 1.
📝
Manager Coaching Notes · NO EXISTING SOURCE
No data source exists. This is entirely new infrastructure. Two implementation paths: (A) Discord slash command — low friction, managers are already there, authorization via Discord roles; (B) Web form in command.p2ops.com behind CF Access — better UX, requires write endpoint + manager authorization layer + Gandalf review. Meg recommends Option A for v1. Phase 4 blocked until Phase 3 foundation is built.
Status: undefined. See Decisions → Question 2.

The driver identity problem

This is the only risk in this plan where getting it wrong doesn't produce an error — it produces a confidently wrong report card. A coaching note on the wrong driver. A pullback attributed to someone who didn't cause it. Kirk acts on it. That's a people management error.

⚠ Five data sources. Five different driver identifiers. Zero overlap.
Every system stores driver identity differently. String-based joins across systems will silently split driver history.
| Source | Driver Identifier | Format | Risk |
|---|---|---|---|
| Survey responses | Full name string | "Nicholas Kennedy" | Three-part names common in fleet |
| invoices.db route_stops | Full name string | "Jesse Garcia" | Consistent within source, not cross-source |
| ops_manifest routes | Full name string | "Jesse Garcia" (frequently NULL) | PWA form often left blank |
| Descartes routes CSV | Split first + last name | DriverFirstName + DriverLastName | Must concatenate before normalization |
| Descartes stops CSV | Numeric Driver Key | "173628" | No name field. Cannot join to routes CSV. Requires Descartes roster export. |
🔑 The solution: canonical driver registry (non-optional)

A drivers table is the linchpin of the entire platform. Every other table joins through driver_id. Raw driver names from any system are normalized through an alias resolver at ingest time — Levenshtein distance matching, with low-confidence matches flagged for human review before they're used in joins.

The one-time setup: populate the 28 P2 drivers as canonical records. Kirk (or a manager) maps each to their Descartes Driver Key — requires exporting a driver roster from Descartes once. Total time: 0.5 dev days to build + 1 hour of Kirk's time to verify the mapping. This is done before any cross-system join is trusted.

Specific danger from Athena's analysis: "Jesus Reyes-Ortega" vs. "Jesus Reyes Ortega" both appear in invoices.db. Without the resolver, those are two different drivers with split histories. The fleet is small enough to fix this once and maintain it — it gets harder the longer it waits.
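A sketch of the resolver core, assuming the registry is a normalized-name to driver_id map and that an edit distance of 2 or less means "close enough to flag, never to auto-trust." Both thresholds are tuning decisions, not facts from the data.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance, cheap at 28-driver registry scale."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def normalize(name: str) -> str:
    # Hyphens and spacing are the known failure mode:
    # "Jesus Reyes-Ortega" vs. "Jesus Reyes Ortega".
    return " ".join(name.lower().replace("-", " ").split())

def resolve(raw: str, registry: dict[str, int], max_dist: int = 2):
    """registry maps normalized canonical name -> driver_id.
    Returns (driver_id or None, needs_review)."""
    norm = normalize(raw)
    if norm in registry:
        return registry[norm], False
    best = min(registry, key=lambda name: levenshtein(norm, name))
    if levenshtein(norm, best) <= max_dist:
        return registry[best], True    # close match: flag for human review
    return None, True                  # no safe match: keep out of joins
```

With this in place, resolve("Jesus Reyes-Ortega", registry) and resolve("Jesus Reyes Ortega", registry) land on the same driver_id, because normalization strips the hyphen before any distance math runs.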

8 analytical opportunities unlocked

When survey scores, route performance, attendance, and coaching notes are joined through a single driver ID, the questions that were previously unanswerable become routine.

1
Revenue-per-hour efficiency ranking — available now, no new data needed
ops_data JSONs already compute hours_worked, revenue, and profit per route. Joining to driver_id gives revenue-per-hour and profit-per-hour per driver. This is more valuable than stops/hour alone — two drivers can have identical stop counts but different revenue mix (complex vs. standard). Buildable before any new infrastructure is in place.
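A sketch under assumptions: the JSON path pattern, the per-route record shape, and the driver_name key are guesses to confirm against the actual files. Only the hours_worked / revenue / profit fields are confirmed above.

```python
import glob
import json
from collections import defaultdict

totals = defaultdict(lambda: {"hours": 0.0, "revenue": 0.0, "profit": 0.0})
for path in glob.glob("/workspace/ops-data/ops_data_*.json"):  # assumed pattern
    with open(path) as f:
        for route in json.load(f):                             # assumed: list of routes
            t = totals[route["driver_name"]]                   # swap for driver_id later
            t["hours"] += route["hours_worked"]
            t["revenue"] += route["revenue"]
            t["profit"] += route["profit"]

for driver, t in sorted(totals.items(), key=lambda kv: -kv[1]["revenue"]):
    if t["hours"]:
        print(f'{driver}: ${t["revenue"]/t["hours"]:.2f}/hr revenue, '
              f'${t["profit"]/t["hours"]:.2f}/hr profit')
```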
2
Speed vs. quality tradeoff curve — do faster drivers score lower on surveys?
When route_performance joins surveys on driver_id + delivery week: do efficiency and quality correlate, or trade off? If the best drivers are both fast AND high-scoring, that's a hiring and training argument. If they trade off above a threshold, that's a scheduling argument — complex routes get quality drivers, standard routes get efficiency drivers. This shapes fleet strategy in a way that's currently invisible.
3
Score-to-pullback financial loop — dollar value of a score improvement
Join survey scores to pullback_events by driver_id + delivery date window. A pullback costs $200–400 in lost revenue + handling. If lower-scoring drivers generate measurably more pullbacks, there's a calculated dollar figure on a one-point score improvement. That number — "improving Driver X from 3.8 to 4.3 composite is worth $N/month in pullback reduction" — is the training investment argument.
4
Coaching intervention effectiveness — does coaching actually move scores?
coaching_notes with note_date enables before/after comparison: driver's mean composite score in the 6 weeks before a coaching session vs. 6 weeks after. If coaching consistently fails to move scores, the method is wrong. If it works, there's a repeatable playbook. No other tool Kirk currently uses can measure this. This turns coaching from a gut-feel activity into a measurable intervention.
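The before/after comparison fits in one query. coaching_notes and note_date come from the planned schema above; the surveys table name and its survey_date / composite columns are assumptions, and the 42-day windows match the 6 weeks described.

```python
BEFORE_AFTER_SQL = """
SELECT n.driver_id,
       n.note_date,
       AVG(CASE WHEN s.survey_date >= date(n.note_date, '-42 days')
                 AND s.survey_date <  n.note_date
                THEN s.composite END) AS mean_before,
       AVG(CASE WHEN s.survey_date >= n.note_date
                 AND s.survey_date <  date(n.note_date, '+42 days')
                THEN s.composite END) AS mean_after
FROM coaching_notes n
JOIN surveys s ON s.driver_id = n.driver_id
GROUP BY n.driver_id, n.note_date;
"""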
5
Helper contribution isolation — helper A vs. helper B for the same driver
ops_manifest has helper_name on every route. This data exists now — it just hasn't been analyzed. A driver who averages 4.3 survey score with Helper A and 3.9 with Helper B has a different problem than one who scores consistently regardless of helper. Survey scores go to the primary driver, but the customer experienced both. Identifying helper contribution requires only routes with known helper assignments — already tracked in ops_manifest.
6
Attendance → performance decay chain — does being late produce worse outcomes?
If clock data shows late starts → rushed routes → missed stops or lower survey scores for end-of-route stops, there's a causal chain. A driver 20 minutes late may be rushing the last 3 stops. The check: join clock_events to route_stops by route date and driver, then look at survey scores for stops in the final quartile of routes where the driver started late (sketched below). Currently unobservable without both data streams. Once both exist, the analysis is a single query.
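Hypothetical until Phase 3 exists: every clock_events column here is invented to show the shape of the query, and the survey linkage assumes surveys carry route_id and stop_number.

```python
LATE_TAIL_SQL = """
SELECT s.driver_id,
       AVG(s.composite) AS tail_score_when_late
FROM clock_events c                          -- table does not exist yet (Phase 3)
JOIN route_stops r ON r.route_date = c.route_date   -- route_date column assumed
                  AND r.driver_id  = c.driver_id
JOIN surveys s     ON s.route_id    = r.route_id    -- linkage assumed
                  AND s.stop_number = r.stop_number
WHERE c.minutes_late >= 20                   -- 'late start' threshold, assumed
  AND r.stop_number > r.total_stops * 0.75   -- final quartile of the route
GROUP BY s.driver_id;
"""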
7
Market-adjusted performance normalization — fair cross-market comparison
Hurricane routes average 13.8 stops. Gypsum averages 8.6. Comparing raw stops/hour across markets is misleading. Build a market baseline for each KPI: stops/hour, completion %, pullback rate. Express every driver's metrics as a delta from their market baseline. A Gypsum driver at 9 stops/route is +5% above baseline; a Hurricane driver at 14 is +1%. Without this, the cross-market leaderboard is unfair and Kirk will notice — because it's wrong.
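The normalization as a single query against the planned route_performance table, with column names assumed. The math matches the examples above: 9 vs. 8.6 is +4.7%, 14 vs. 13.8 is +1.4%.

```python
MARKET_DELTA_SQL = """
WITH baseline AS (
    SELECT market, AVG(stops_per_hour) AS base_sph
    FROM route_performance
    GROUP BY market
)
SELECT p.driver_id,
       p.market,
       AVG(p.stops_per_hour)                                      AS sph,
       100.0 * (AVG(p.stops_per_hour) - b.base_sph) / b.base_sph  AS pct_vs_market
FROM route_performance p
JOIN baseline b ON b.market = p.market
GROUP BY p.driver_id, p.market;
"""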
8
Comment auto-classification — surface hidden risk in high scores
56% of P2 survey records include customer comments. A driver with 9.2 composite but 40% of comments tagged "damage" is a different conversation than their score suggests. Auto-classification at ingest — score threshold for "low_score," keyword matching for "damage," "timeliness_concern," etc. — surfaces these patterns without requiring manual tagging. The signal is already in the data; it just needs a classifier at ingest.
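A minimal ingest-time classifier sketch. The tag names follow the examples above; the keyword lists and the low-score cutoff are placeholder tuning values, and the composite scale needs confirming before the cutoff means anything.

```python
KEYWORDS = {  # illustrative keyword lists, to be tuned against real comments
    "damage": ("damage", "damaged", "scratch", "dent", "broke"),
    "timeliness_concern": ("late", "waiting", "no call", "window"),
    "praise": ("great", "friendly", "professional", "careful"),
}

def classify(comment: str, composite: float, low_cutoff: float = 3.0) -> list[str]:
    """Tag a survey comment at ingest; low_cutoff depends on the composite scale."""
    text = comment.lower()
    tags = [tag for tag, words in KEYWORDS.items() if any(w in text for w in words)]
    if composite <= low_cutoff:
        tags.append("low_score")
    return tags

print(classify("Crew was friendly but the dresser arrived damaged", 4.1))
# -> ['damage', 'praise']
```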

A driver profile for any date range, shareable anywhere

URL-addressable, printable, browser-exportable to PDF. Pulls from pre-aggregated D1 tables — three queries, sub-100ms load, no Mac Mini dependency.

🔗 URL structure
/performance/drivers/nicholas-kennedy/report?from=2026-04-01&to=2026-04-30

/performance/drivers/nicholas-kennedy/report?period=last-30

/performance/drivers/nicholas-kennedy/report?period=ytd

The URL is shareable as-is. CF Access gates it — authenticated users only. Print-to-PDF via browser (Ctrl+P) handles PDF export in v1; server-side PDF generation deferred to v2.
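How the period parameters might resolve server-side. A sketch: the parameter semantics are read off the example URLs above, not from a spec.

```python
from datetime import date, timedelta

def resolve_period(period: str | None, frm: str | None, to: str | None) -> tuple[date, date]:
    """Map report-card query params to a concrete (start, end) date range."""
    today = date.today()
    if period == "last-30":
        return today - timedelta(days=30), today
    if period == "ytd":
        return date(today.year, 1, 1), today
    return date.fromisoformat(frm), date.fromisoformat(to)  # explicit ?from=&to=

print(resolve_period("ytd", None, None))
```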

| Section | Metrics | Available from |
|---|---|---|
| Customer Satisfaction | Bayesian composite, category scores, 5-star %, comments, worst category | Phase 1 |
| Route Performance | Stops/hr (vs. market baseline), completion %, missed stops, pullback rate | Phase 2 |
| Economics | Revenue per stop, profit per route, profit per hour | Phase 2 |
| Attendance & Clock | Reliability %, tardiness count, no-shows, avg minutes late | Phase 3 (blocked on source definition) |
| Coaching & Conduct | Commendations, coaching sessions, warnings, open PIP status | Phase 4 |
Design note from Athena's analysis
Three queries total for the full report card: (1) weekly aggregate rollup for the period, (2) coaching notes for the period, (3) market baseline for delta calculation. All hit indexed tables. Sub-100ms on D1 at any scale the fleet will reach in the next 2 years. The report card renders server-side HTML — no JS framework needed, it's a document not an app.
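The three queries, sketched against the planned pre-aggregated tables. Every table and column name here is the assumed Phase 0 schema, not something that exists yet, and the named-parameter binding style is illustrative.

```python
REPORT_CARD_QUERIES = {
    # (1) weekly aggregate rollup for the period
    "rollup": """
        SELECT week_start, composite, stops_per_hour, completion_pct, pullback_rate
        FROM driver_weekly_rollup
        WHERE driver_id = :driver_id AND week_start BETWEEN :from AND :to
        ORDER BY week_start""",
    # (2) coaching notes for the period
    "notes": """
        SELECT note_date, note_type, note_text
        FROM coaching_notes
        WHERE driver_id = :driver_id AND note_date BETWEEN :from AND :to
        ORDER BY note_date DESC""",
    # (3) market baseline for delta calculation
    "baseline": """
        SELECT metric, baseline_value
        FROM market_baselines
        WHERE market = (SELECT market FROM drivers WHERE driver_id = :driver_id)""",
}
```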

5 phases · ~30 dev days · 3–4 calendar months

Revised sequence from Meg. The canonical driver registry is the dependency blocker — it must exist before any cross-system join is trusted. Report card becomes useful from Phase 2 onward and improves with each phase.

0
Foundation — Survey Migration + Driver Registry
~3.5 dev days · Week 1
  • D1 database creation: all tables from full schema (not just survey — build the schema once, correctly)
  • Canonical drivers table populated with 28 P2 drivers
  • Kirk exports Descartes driver roster (1 hour) → Meg maps Driver Key to canonical names
  • Historical migration: P2 records from _ours monthly files + master JSON → D1
  • Carrier name normalization pass (TopHat vs. "P2 Last Mile - Ukiah" inconsistency fixed)
  • process_surveys.py updated to write to D1 nightly
  • CF Access confirmed active on command.p2ops.com before any code starts
  • Gandalf security review of architecture
1
Survey Features — Leaderboard, Markets, Categories, Comments, Driver Detail
~4 dev days · Weeks 2–3
  • Pages Functions survey endpoints (leaderboard, market, category, comments)
  • Leaderboard UI — tier vs. rank decision baked in (Coach Beard's input)
  • Market View, Category Analysis, Comments Browser with auto-classification
  • Driver Detail page — URL-addressable (/performance/drivers/[slug])
  • command.p2ops.com nav integration ("SURVEY PERFORMANCE" tab)
  • Report card v1: survey scores + comments, shareable URL, printable
  • Coach Beard sign-off before leaderboard ships
  • Hermione HR/privacy review before Phase 1 goes live
  • survey.p2ops.com redirect after Kirk confirms new section is working
2
Route Performance + Damage/Pullback Integration
~5 dev days · Weeks 4–6
  • Establish stable Descartes CSV drop location (Google Drive folder or local path)
  • Descartes routes CSV ingestion pipeline → D1 route_performance
  • Migrate pullbacks from invoices.db → D1 damage_records (nightly sync going forward)
  • invoices.db route_stops historical import → route_performance (Oct 2024–present)
  • ops_manifest routes import → enriches recent records with helper tracking
  • Enrich Driver Detail with route performance: stops/hr vs. market baseline, completion %, pullback rate
  • Report card v2: adds route performance + pullback data
  • Market-adjusted baseline calculations for cross-market fairness
  • Helper contribution flagging on report card ("X of Y routes worked with a helper")
  • Confirm Descartes Driver Key roster export availability before committing to stop-level on-time %
3
Attendance + Clock Data · Blocked — source undefined
~5 dev days · Timeline TBD
  • Kirk defines clock-in/out source format and attendance tracking method first
  • Build ingestion pipeline based on defined format
  • Attendance records visible on Driver Detail and report card
  • Hours-per-route data enables true labor efficiency metrics
  • Attendance → performance analysis becomes possible (Opportunity #6)
  • Matilda reviews clock/attendance collection workflow before deploy — changes how drivers report time
  • Gandalf reviews any write endpoint added for clock/attendance entry
4
Coaching Notes + Complete Report Card
~4 dev days · Timeline TBD
  • Discord slash command for coaching note entry (recommended v1 path)
  • Coaching note visible on Driver Detail and report card
  • Before/after analysis: coaching sessions vs. score movement
  • Complete report card: all five data sources in one printable view
  • Gandalf reviews write endpoint (separate review from Phase 3)
  • Coach Beard reviews any changes to what drivers can see vs. what managers see
5
Trend Lines (Deferred — data not deep enough yet)
~2 dev days · August 2026+
  • Weekly trend charts per driver: survey score, stops/hr, attendance reliability
  • P2 survey data only goes back 6 weeks — trend charts on thin data tell misleading stories
  • Build after 3 additional months of data accumulation across all sources
| Phase | Deliverable | Dev Days | Report Card at End of Phase |
|---|---|---|---|
| 0 — Foundation | D1 schema, driver registry, survey migration | 3.5 | Not yet available |
| 1 — Survey | Leaderboard, comments, driver detail, report card v1 | 4 | Survey scores + comments (shareable URL) |
| 2 — Route + Damage | Route performance, pullbacks, market baselines | 5 | + Stops/hr, completion %, pullback rate |
| 3 — Attendance | Clock/attendance data (pending source) | 5 | + Reliability %, hours worked |
| 4 — Coaching | Coaching notes, complete report card | 4 | Full report card — all 5 data sources |
| 5 — Trends | Trend charts (deferred) | 2 | + Weekly trend lines added |
| Total | | ~23.5 (Phases 0–4) | |

What could go wrong

Both Athena and Meg flagged these independently. Meg's #1 risk and Athena's #1 risk are the same thing — that's confirmation it's real.

#1
Driver identity fragmentation — silently wrong data
Five data sources, five ID schemes. A string-based join that mismatches "Isaias Martinez" in survey data with "Isaias J. Martinez" in another system produces a wrong report card without an error. Kirk acts on it. That's a people management error. Mitigation: canonical driver registry with human validation before any cross-system join is trusted. Non-optional.
#2
Clock/attendance source is undefined — Phase 3 is unknowable until answered
If Kirk's answer to "what records clock-in/out today" is "nothing structured," then Phase 3 starts with building a time-tracking mechanism for a 35-driver operation, not just a data pipeline. That's a different project requiring Edna (UI design), Matilda (workflow design), Coach Beard review (changes how drivers report to work). Could add 4–6 weeks before the first record is ingested. Define the source format first.
#3
Descartes CSV generation remains a manual bottleneck
No Descartes API access at current tier. Kirk must generate and drop CSVs — the route performance section of the report card is always as stale as the last CSV drop. This is a human process dependency that can't be automated without Descartes access improvement. Worth asking what partner-level API access costs — if the number is reasonable, the manual step disappears entirely.
#4
Stop-level on-time % per driver requires a Descartes roster export
The stops execution CSV uses numeric Driver Keys. The routes summary CSV uses first/last name. Their ID systems don't overlap — confirmed by direct file inspection. Computing on-time % per driver from stop-level data requires a Descartes driver roster export (Driver Key → Name). If that export isn't available at current access tier, on-time % stays at route aggregate level, not per-driver. Confirm before committing to this as a report card metric.
#5
Pullback attribution is currently meaningless for individual accountability
All 248 pullbacks carry one reason code. The pullback count per driver is real; the implication that it reflects driver performance is not necessarily true — some pullbacks are item-condition or customer-caused. Report cards should display pullback count and rate without implying causation. Going forward, add a manager attribution step at pullback logging.
#6
Write endpoints in Phases 3–4 are two separate Gandalf reviews, not one
Clock/attendance entry (if web-based) and coaching note input are distinct write endpoints with different authorization patterns. Each requires its own security review. Don't plan these as a single review — they're two gate events on the calendar, separated by the time it takes to implement and test each one.

Every gate, every phase

These are hard gates, not courtesy reviews. The relevant phase doesn't move until the gate is cleared.

🔒
Gandalf — Architecture Review · Before Phase 0 starts
CF Access status on command.p2ops.com must be confirmed active before a line of code is written. PII (driver names, scores, customer comments) behind runtime query endpoints. Meg's architecture doc goes to Gandalf first.
🧠
Coach Beard — Leaderboard Display · Before Phase 1 ships
Standing sign-off on ranking display. One unresolved design tension: rank-based list (Edna) vs. tier-based display (Coach Beard's recommendation). Both are buildable — Kirk decides, Coach Beard approves what ships. If leaderboard ordering differs from current survey.p2ops.com in any way, he reviews it first.
⚖️
Hermione — HR/Privacy Review · Before Phase 1 ships
Coach Beard's clearest directive: "If scores are touching compensation or scheduling in any way, she needs to see the design before this goes live." Individual performance data on a shared platform. Cloudflare Access role restrictions (ops managers only vs. general staff). State-level requirements in CA (Ukiah) and UT/CO/WA for employee performance tracking.
📣
Matilda — Driver Awareness Comms · Before public launch
If drivers become aware this tool exists — and they will — that changes behavior. That may be intentional. Matilda advises on operational implications; Coach Beard advises on the people angle. Kirk decides the policy. Answer must be decided before launch, not after drivers ask.
🔒
Gandalf — Write Endpoint #1 (Clock/Attendance) · Before Phase 3 ships
Separate review from the Phase 0 architecture review. A write endpoint for clock/attendance data is a different attack surface from read-only survey endpoints. Requires input validation, authorization (not every CF Access user should write), and sanitization review.
🔒
Gandalf — Write Endpoint #2 (Coaching Notes) · Before Phase 4 ships
Third separate security review. Free-text coaching notes introduce prompt injection risk if the text is ever fed to an LLM. Manager-role authorization must be enforced at the write endpoint, not just assumed from CF Access identity.
💰
Belle — Resource Additions · Low stakes, appropriate visibility
D1 database, potential Descartes access upgrade, any new Cloudflare resources. Free tier for now; Belle tracks from the start so cost surprises don't happen later.
Kirk — Phase confirmations and source definitions · At each phase boundary
Confirm the new section is working as expected before survey.p2ops.com is redirected. And critically: Phase 3 literally cannot start until Kirk answers the clock/attendance source question. This is a planning gate, not just a QA gate.

What only you can answer

Phases 0–2 can proceed without any of these. Phases 3–4 are blocked until the first two are answered. The leaderboard design question needs to be answered before Phase 1 ships.

1
What records clock-in/out and attendance today?
This is the Phase 3 gate. Options: (A) nothing structured exists — we build a simple clock-in/out screen in the P2 PWA and a daily attendance form for managers; (B) a spreadsheet or app already tracks this — share the format and the pipeline is designable immediately; (C) payroll/Gusto tracks clock time — pull from there. The answer determines whether Phase 3 is a 5-day data pipeline or a 10-day product build first.
2
Coaching notes: Discord slash command or web form in command.p2ops.com?
Meg recommends Option A (Discord slash command) for v1 — managers are already in Discord, Discord role-based authorization is simpler, no new web UI needed. Option B (web form) is better UX but adds a write endpoint + manager authorization layer + Gandalf review that adds calendar time. Both are buildable — what matters is where managers actually spend their day.
3
Leaderboard: ranked list or tier-based display?
Edna designed a ranked list (01, 02, 03…) with trend arrows as the visual hero — rank is secondary but visible. Coach Beard recommends replacing numeric ranks entirely with tiers (High Performer / Solid / Developing) — drivers see their tier and trend, not their position number. Both motivate differently. Both are buildable. Coach Beard signs off on whatever ships, so aligning with him here saves a round trip. This must be decided before Phase 1 leaderboard build starts.
4
Pullback attribution: retroactive or forward-only?
248 existing pullback records have no causal attribution. Athena recommends adding a manager attribution step going forward (driver-caused / item-condition / customer-caused). The retroactive question: does Kirk want managers to classify the 248 existing records (a few hours of work but adds historical fairness), or start clean from go-live? This affects what's shown on historical report cards.
5
Can you export a driver roster from Descartes?
Meg needs a Driver Key → driver name mapping to enable stop-level analytics (individual stop on-time %, service time variance). This requires one export from the Descartes admin panel — a roster of all drivers with their numeric keys. If this export is available, stop-level per-driver analytics unlock. If not, on-time % stays at route aggregate. Takes about 5 minutes if the export exists.

Ready to proceed?

Phases 0–2 can start immediately — they don't depend on the answers above. The survey migration fixes the current broken dashboard. Route performance adds the operational layer. Phase 3 waits for your answers on clock/attendance.

First action after greenlight: Gandalf gets the architecture doc. Coach Beard and Hermione are briefed simultaneously. Kirk exports the Descartes driver roster. Phase 0 starts when Gandalf clears.

What the specialists found

🔬
Athena · Business Intelligence & Data — queried actual files · Complete

64,123 route stops in invoices.db. All four markets, 73 drivers, Oct 2024–May 2026. Completion rate, on-time %, and miss rate are all computable today from existing fields. No new data collection needed for route performance analytics.

Driver identity is the foundational risk. Five sources, five ID schemes. Route ID stored as float text ("4282486.0") in invoices.db vs. integer in ops_manifest vs. integer in Descartes CSV — same route, three formats. All join via CAST() but any ingest code that string-compares will fail silently. "Jesus Reyes-Ortega" vs. "Jesus Reyes Ortega" appear as different drivers in current data. The alias resolver must be built and reviewed before any cross-system join is trusted.

Pullback attribution is currently meaningless for driver accountability. All 248 records carry one reason code. Pullback count per driver is real; implication it reflects driver fault is not.

Highest-impact opportunity not currently being captured: Join survey scores to pullback events — if lower-scoring drivers generate more pullbacks, there's a dollar figure on improving scores. Data is already in-house.

Helper contribution analysis is available now. ops_manifest has helper_name on every recent route. Never been analyzed. A driver who scores 4.3 with Helper A and 3.9 with Helper B has a different problem than their composite score suggests.

⚙️
Meg · Engineering — inspected actual codebase and CSV files · Complete

Descartes CSV format is confirmed stable and ingestion is buildable now. Route summary CSV headers are known and haven't changed across sample files. The pipeline is: Kirk drops CSV → Python validates headers → filters P2 routes → resolves driver names through canonical registry → batch insert to D1. Kirk's CSV generation step is irreducible without API access — the processing after drop is automatable.
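The pipeline Meg describes, sketched end to end. The confirmed headers are validated literally; the carrier column name, the P2 filter value, the target table, and the alias_resolver module (the Phase 0 resolver sketched earlier) are assumptions.

```python
import csv
import sqlite3

from alias_resolver import resolve, REGISTRY  # hypothetical module: Phase 0 resolver

REQUIRED_HEADERS = ["DriverFirstName", "DriverLastName", "NumberOfStops",
                    "Actual Total Time", "% Total Performance", "Actual Distance"]

def ingest_routes_csv(csv_path: str, db_path: str) -> int:
    """Kirk drops CSV -> validate headers -> filter P2 -> resolve names -> batch insert."""
    conn = sqlite3.connect(db_path)
    rows = []
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        missing = [h for h in REQUIRED_HEADERS if h not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"Descartes format changed; missing headers: {missing}")
        for rec in reader:
            if not rec.get("Carrier", "").startswith("P2"):   # carrier column assumed
                continue
            raw_name = f'{rec["DriverFirstName"]} {rec["DriverLastName"]}'
            driver_id, needs_review = resolve(raw_name, REGISTRY)
            if driver_id is None or needs_review:
                continue  # park in a review queue rather than guess the join
            rows.append((driver_id, rec["NumberOfStops"], rec["Actual Total Time"]))
    conn.executemany(
        "INSERT INTO route_performance (driver_id, stops, actual_total_time) "
        "VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```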

Stop-level on-time % per driver is blocked. The stops execution CSV uses numeric Driver Keys. The routes CSV uses first/last name. Their ID systems don't overlap — confirmed by direct file inspection. Resolving this requires a Descartes driver roster export that may not be available at the current tier.

Coaching notes: Discord slash command recommended for v1. Less build time, simpler authorization (Discord roles), managers are already there. Web form is better UX but adds a write endpoint and authorization layer requiring a separate Gandalf review. Build the Discord path first; migrate to web form when there's a concrete reason.

Revised estimate: ~23.5 dev days across Phases 0–4, 3–4 calendar months to full platform. Leaderboard and shareable report card are live by end of Phase 1 (~3 weeks). Route performance adds to the report card in Phase 2. Phases 3–4 wait on source definition and gate clearances.

Gandalf reviews for write endpoints are two separate events. Clock/attendance entry and coaching notes are distinct security surfaces. Don't plan them as one review.

🎨
Edna Mode · Design Lead · Complete

Navigation: "SURVEY PERFORMANCE" as the third primary tab (not "Rankings"). Five sub-nav pills inside: LEADERBOARD · CATEGORIES · MARKETS · TRENDS · COMMENTS. Driver Detail is a drill-down from Leaderboard, URL-addressable, not a peer tab.

Leaderboard framing: Trend arrow is the visual hero, rank number is secondary. No gradient glow medals — replace with a 1px gold left border on rank 01, consistent with the active nav tab treatment. No red for low scorers — score in secondary text color only. "Sort by Most Improved" as a secondary sort option surfaces a different, equally valid story.

Comments browser: 3px danger-red left border on negative review cards — nothing else red. Customer words are in standard text. The stripe is locatable in a scroll without being alarming.

Mobile: Sub-nav stays as text labels with horizontal scroll, not icons. Leaderboard collapses to 4 columns on mobile: RANK · DRIVER · SCORE · TREND.

Flags for routing: Hermione on whether CF Access role restrictions are appropriate for the individual performance data. Matilda on driver awareness comms — if drivers learn this tool exists, what's the plan? That's a management decision, and it should be made before launch.

🧠
Coach Beard · Team Culture & People · Complete

Don't show numeric rank — show a tier. Three tiers: High Performer / Solid / Developing. Drivers see their tier and trend, not "you are #17 of 28." That number doesn't tell them what to do differently; it just places them in a hierarchy. The trend arrow is what changes behavior — "I'm moving up" beats "I am ranked 12th" every time.

Minimum 10-survey threshold before showing a tier. Before threshold: "Building your baseline — not enough surveys yet for a reliable picture." Protects new drivers and low-volume routes from unfair snapshots.

Absolute thresholds for tier assignment, not percentile rank. "High Performer" = score above 4.5 with 20+ surveys — achievable by everyone. If it means "top 20%," most of the team is structurally excluded regardless of how well they perform. That's a slow morale drain, not motivation.
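His rules reduce to a few lines. Only the 4.5-score/20-survey High Performer line and the 10-survey floor are his stated numbers; the Solid/Developing cutoff below is a placeholder to be decided.

```python
def assign_tier(composite: float, survey_count: int) -> str:
    """Absolute thresholds, not percentiles: every driver can reach every tier."""
    if survey_count < 10:
        return "Building your baseline"   # pre-threshold message, per Coach Beard
    if composite > 4.5 and survey_count >= 20:
        return "High Performer"
    if composite >= 3.5:                  # placeholder cutoff, not his number
        return "Solid"
    return "Developing"
```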

Two separate use cases need separate designs. Supervisor view: full data, trends, comments, coaching log. Driver self-service view: their own score, tier, trend — and anonymized positive comments only. Drivers almost never hear directly from customers they served. When they do, it lands. Negative comments are delivered by supervisors in context, not surfaced raw in a dashboard.

His clearest directive: "The Hermione flag is the one I'd move on before this tool goes live — if scores are touching compensation or scheduling in any way, she needs to see the design. Everything else can be iterative."