MIS-Contro-Tower/fix4.md at 864be8d93283048f9ab0d8d18331eee4c1f0f93a

mdares/MIS-Contro-Tower


Marcelo 5e7ddaa0db changes

2026-04-29 07:13:42 +00:00

8.6 KiB


Task: Implement Control Tower changes only (no Node-RED edits), then run full verification with SQL + backfill script.

Repository context:

- Workspace root: Plastic-Dashboard
- Target branch assumption: sandbox-main
- Database: PostgreSQL via Prisma
- Scope strictly limited to Control Tower code and scripts in this repo

Hard constraints:

- Do NOT edit any Node-RED flow files or Node-RED runtime code.
- Do NOT change behavior outside the requested areas unless required for correctness.
- Preserve the existing guard that keeps non-authoritative downtime reasons (PENDIENTE / UNCLASSIFIED) from overwriting authoritative ones.
- Run verification before and after the backfill, and report the results clearly.
- If lint or tests show unrelated pre-existing failures, do not refactor unrelated modules to fix them.

Implementation requirements:

A) Downtime continuity fallback key fix

File: app/api/ingest/event/route.ts

Goal:

- Ensure that fallback downtime reason identity and continuity use the episode continuity key (incidentKey) whenever it is present.
- Use row.id only when incidentKey is truly absent.
- Preserve the guard that prevents non-authoritative values from overwriting authoritative manual reasons.

Details:

In the event-ingestion logic that builds the ReasonEntry payload for downtime-like events (including the UNCLASSIFIED and mold-change fallbacks):

Derive a fallbackIncidentKey from available payload fields in this preference order:
- evData.incidentKey
- dataObj.incidentKey
- evDowntime?.incidentKey
- evReason?.incidentKey (if available)
Only if all of them are missing, fall back to row.id.
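Here is a minimal TypeScript sketch of this preference order. It assumes each source is a plain object (or null/undefined). It treats a missing, non-string, or blank incidentKey as absent. The helper names readIncidentKey and deriveFallbackIncidentKey are hypothetical and do not exist in route.ts. The row.id fallback is left to the caller, so the result stays undefined when no key is found.

```typescript
// Hypothetical helper types/names; the real route.ts variables may be typed differently.
type MaybeRecord = Record<string, unknown> | null | undefined;

// Assumption: only non-blank strings count as a usable incidentKey.
function readIncidentKey(source: MaybeRecord): string | undefined {
  const value = source?.incidentKey;
  return typeof value === "string" && value.trim() !== "" ? value : undefined;
}

// Checks sources in the required order: evData, dataObj, evDowntime, evReason.
// Returns undefined when all are missing; the caller applies `?? row.id`.
function deriveFallbackIncidentKey(
  evData: MaybeRecord,
  dataObj: MaybeRecord,
  evDowntime: MaybeRecord,
  evReason: MaybeRecord,
): string | undefined {
  for (const source of [evData, dataObj, evDowntime, evReason]) {
    const key = readIncidentKey(source);
    if (key !== undefined) return key;
  }
  return undefined;
}
```

The sketch returns undefined rather than an empty string for a reason. The later `fallbackIncidentKey ?? row.id` expressions only fall through on null/undefined, so an empty string would otherwise block the row.id fallback.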

For fallback reasonRaw objects:

- For the mold-change fallback, set incidentKey to moldIncidentKey ?? fallbackIncidentKey ?? row.id.
- For the unclassified fallback, set incidentKey to fallbackIncidentKey ?? row.id.
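The two fallback rules fit in one small resolver. This is an illustration under assumptions. fallbackReasonIncidentKey and the FallbackKind union are hypothetical names, and row.id is assumed to be a string.

```typescript
// Hypothetical discriminator for the two fallback reasonRaw shapes.
type FallbackKind = "mold-change" | "unclassified";

// Resolves reasonRaw.incidentKey for a fallback reason.
// `??` only skips null/undefined, so missing keys must be passed as undefined, not "".
function fallbackReasonIncidentKey(
  kind: FallbackKind,
  rowId: string,
  fallbackIncidentKey?: string,
  moldIncidentKey?: string,
): string {
  if (kind === "mold-change") {
    return moldIncidentKey ?? fallbackIncidentKey ?? rowId;
  }
  // Unclassified fallback ignores any mold key.
  return fallbackIncidentKey ?? rowId;
}
```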

Create a single continuityIncidentKey (the single source of truth) and use it consistently for:

- downtime reasonId construction (evt::downtime:)
- the ReasonEntry episodeId for downtime
- meta.incidentKey in reason entry writes
- manual-preservation guard queries by episodeId
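One way to keep continuityIncidentKey as the single source of truth is to compute it once and derive every downstream key from it. In this sketch, buildDowntimeWriteKeys and DowntimeWriteKeys are hypothetical. makeReasonId stands in for the existing reasonId builder in route.ts, whose exact format is not reproduced here. episodeId is assumed to equal the continuity key itself.

```typescript
// Every key a downtime reason write needs, all derived from one value.
interface DowntimeWriteKeys {
  reasonId: string;               // downtime reasonId (format owned by the existing builder)
  episodeId: string;              // ReasonEntry.episodeId (assumed to be the key itself)
  meta: { incidentKey: string };  // meta.incidentKey on the reason entry
  guardEpisodeId: string;         // episodeId used by the manual-preservation guard query
}

// makeReasonId is a placeholder for the existing route.ts reasonId construction.
function buildDowntimeWriteKeys(
  continuityIncidentKey: string,
  makeReasonId: (incidentKey: string) => string,
): DowntimeWriteKeys {
  return {
    reasonId: makeReasonId(continuityIncidentKey),
    episodeId: continuityIncidentKey,
    meta: { incidentKey: continuityIncidentKey },
    guardEpisodeId: continuityIncidentKey,
  };
}
```

Because nothing downstream re-derives the key, a later change to the fallback order cannot split the reasonId, the episodeId, and the guard query across different keys.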

Keep non-authoritative guard semantics unchanged:

- An incoming non-authoritative reason must not overwrite an existing authoritative reason for the same episode.
- The downtime-acknowledged / manual authoritative path remains preserved.
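Below is a sketch of the guard decision. It assumes authority is inferred from the reason code alone: PENDIENTE and UNCLASSIFIED are non-authoritative, and everything else is authoritative. The real guard may also consult meta flags or the downtime-acknowledged path. shouldWriteReason and isNonAuthoritative are hypothetical names.

```typescript
// Placeholder codes from the hard constraints; all other codes are assumed authoritative.
const NON_AUTHORITATIVE_CODES: ReadonlySet<string> = new Set(["PENDIENTE", "UNCLASSIFIED"]);

function isNonAuthoritative(reasonCode: string): boolean {
  return NON_AUTHORITATIVE_CODES.has(reasonCode);
}

/**
 * Decide whether an incoming reason may be written for an episode.
 * existingCode is the reason currently stored for the same episodeId, or null if none.
 */
function shouldWriteReason(existingCode: string | null, incomingCode: string): boolean {
  if (existingCode === null) return true; // first reason for this episode
  if (!isNonAuthoritative(existingCode) && isNonAuthoritative(incomingCode)) {
    return false; // never downgrade an authoritative (manual/acknowledged) reason
  }
  return true; // placeholders can be upgraded; authoritative can replace authoritative
}
```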

B) OEE trend from production-only snapshots

File: app/api/reports/route.ts

Goal:

Build the OEE trend from production-only snapshots, i.e. rows where:
- trackingEnabled = true
- productionStarted = true

Keep summary-metric behavior explicit and consistent with this filtering decision.

Details:

- Include trackingEnabled and productionStarted in the KPI snapshot select.
- Add a helper such as isProductionSnapshot(trackingEnabled, productionStarted).
- Compute OEE/Availability/Performance/Quality averages from production-only rows.
- For trend generation:
  - Iterate the timeline in ts order.
  - For non-production snapshots, emit null points (for OEE and the related KPI lines) so the chart can render true gaps.
  - For production snapshots, emit the actual numeric values (or null if a value is missing).
- Keep downtime/event aggregates and cycle-based totals unchanged unless they are explicitly tied to the production-only OEE requirement.
- Keep the logic explicit with code comments (short, concrete, and only where needed).
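Here is a minimal sketch of the production-only filter, the averages, and the gap-aware trend. It assumes the snapshot rows expose ts as a Date and have KPI columns named oee, availability, performance, and quality; only oee is confirmed by the SQL pack. KpiSnapshotRow, productionAverage, and buildProductionTrend are hypothetical names. A null flag counts as non-production, matching the COALESCE(..., false) in query E.

```typescript
interface KpiSnapshotRow {
  ts: Date;
  trackingEnabled: boolean | null;
  productionStarted: boolean | null;
  // Assumed column names; only oee is confirmed by the SQL pack.
  oee: number | null;
  availability: number | null;
  performance: number | null;
  quality: number | null;
}

type KpiKey = "oee" | "availability" | "performance" | "quality";
const KPI_KEYS: KpiKey[] = ["oee", "availability", "performance", "quality"];

type TrendPoint = { ts: string } & Record<KpiKey, number | null>;

// Production = both flags explicitly true; null counts as false (like COALESCE in query E).
function isProductionSnapshot(
  trackingEnabled: boolean | null | undefined,
  productionStarted: boolean | null | undefined,
): boolean {
  return trackingEnabled === true && productionStarted === true;
}

function finiteOrNull(v: number | null | undefined): number | null {
  return typeof v === "number" && Number.isFinite(v) ? v : null;
}

// Average one KPI over production rows only, skipping missing values.
function productionAverage(rows: KpiSnapshotRow[], key: KpiKey): number | null {
  let sum = 0;
  let count = 0;
  for (const row of rows) {
    if (!isProductionSnapshot(row.trackingEnabled, row.productionStarted)) continue;
    const v = finiteOrNull(row[key]);
    if (v !== null) {
      sum += v;
      count += 1;
    }
  }
  return count > 0 ? sum / count : null;
}

// One point per snapshot in ts order; non-production snapshots become all-null gap points.
function buildProductionTrend(rows: KpiSnapshotRow[]): TrendPoint[] {
  const sorted = [...rows].sort((a, b) => a.ts.getTime() - b.ts.getTime());
  return sorted.map((row) => {
    const production = isProductionSnapshot(row.trackingEnabled, row.productionStarted);
    const point = { ts: row.ts.toISOString() } as TrendPoint;
    for (const key of KPI_KEYS) {
      point[key] = production ? finiteOrNull(row[key]) : null;
    }
    return point;
  });
}
```

The trend emits a null point for each non-production snapshot instead of dropping the row. This keeps the timeline intact, so the chart can draw a visible break rather than a line joining two distant production values.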

C) Chart rendering behavior: no smoothing across gaps

Files:

- app/(app)/reports/ReportsCharts.tsx
- app/(app)/reports/ReportsPageClient.tsx (if types/downsampling need updates)

Goal:

- OEE line interpolation must be linear.
- Gaps must be rendered as gaps (no fake continuity through filtered/non-production windows).

Details:

In the OEE line chart:

- change the Line type from monotone to linear
- set connectNulls={false}

Ensure the frontend types allow nullable trend values for OEE points.
If downsampling exists, preserve gap markers so that null separators are not removed:

- keep null transition points when reducing the point count.

Ensure tooltip/value formatting handles null values gracefully.
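This sketch covers the nullable point type, gap-preserving downsampling, and null-safe formatting. On the chart side, only the real Recharts Line props type="linear" and connectNulls={false} are needed. Everything else here (OeeTrendPoint, downsamplePreservingGaps, formatOeeValue, and the stride sampling) is a hypothetical illustration, not the existing ReportsPageClient.tsx logic.

```typescript
// Nullable OEE point: null marks a non-production gap.
interface OeeTrendPoint {
  ts: string;
  oee: number | null;
}

// True when point i sits on a gap boundary (null next to non-null, or vice versa).
function isGapTransition(points: OeeTrendPoint[], i: number): boolean {
  const isNull = (j: number): boolean => points[j].oee === null;
  const prevDiffers = i > 0 && isNull(i - 1) !== isNull(i);
  const nextDiffers = i < points.length - 1 && isNull(i + 1) !== isNull(i);
  return prevDiffers || nextDiffers;
}

// Stride-based downsampling that always keeps the endpoints and every gap transition,
// so at least one null separator survives between production segments.
// Note: the result can exceed maxPoints when the series has many gaps.
function downsamplePreservingGaps(points: OeeTrendPoint[], maxPoints: number): OeeTrendPoint[] {
  if (maxPoints < 2 || points.length <= maxPoints) return points.slice();
  const stride = Math.ceil(points.length / maxPoints);
  const out: OeeTrendPoint[] = [];
  for (let i = 0; i < points.length; i++) {
    const isEdge = i === 0 || i === points.length - 1;
    if (isEdge || i % stride === 0 || isGapTransition(points, i)) {
      out.push(points[i]);
    }
  }
  return out;
}

// Null-safe tooltip/axis formatter.
function formatOeeValue(value: number | null | undefined): string {
  return typeof value === "number" && Number.isFinite(value) ? `${value.toFixed(1)}%` : "n/a";
}
```

Keeping both the last value before a gap and the first null of the gap is what lets connectNulls={false} break the line at the right place after downsampling.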

Verification and execution steps:

Run targeted checks first:

- run tests related to the downtime guard, if available:
  - npm run test:downtime-reason-guard
- run lint at least on the changed files (or full lint if practical). Quote the app/(app) paths, because unquoted parentheses are a syntax error in bash and similar shells:
  - npx eslint app/api/ingest/event/route.ts app/api/reports/route.ts "app/(app)/reports/ReportsCharts.tsx" "app/(app)/reports/ReportsPageClient.tsx"

SQL Verification Pack (PRE-BACKFILL)

Execute these exactly and capture output snapshots:

A. Recent downtime reason quality mix (the Prisma column must be quoted as "reasonCode"; unquoted, Postgres folds it to reasoncode):
SELECT "reasonCode", COUNT(*) AS rows
FROM "ReasonEntry"
WHERE kind = 'downtime' AND "capturedAt" >= NOW() - INTERVAL '7 days'
GROUP BY "reasonCode"
ORDER BY rows DESC;

B. Episodes with conflicting reason codes:
SELECT "orgId", "machineId", "episodeId",
  COUNT(DISTINCT "reasonCode") AS distinct_codes,
  MIN("capturedAt") AS first_seen,
  MAX("capturedAt") AS last_seen
FROM "ReasonEntry"
WHERE kind = 'downtime'
  AND "episodeId" IS NOT NULL
  AND "capturedAt" >= NOW() - INTERVAL '14 days'
GROUP BY "orgId", "machineId", "episodeId"
HAVING COUNT(DISTINCT "reasonCode") > 1
ORDER BY last_seen DESC
LIMIT 200;

C. Potential manual reasons overwritten by non-authoritative values:
SELECT re."orgId", re."machineId", re."episodeId", re."reasonCode", re."capturedAt", re.meta
FROM "ReasonEntry" re
WHERE re.kind = 'downtime'
  AND re."capturedAt" >= NOW() - INTERVAL '14 days'
  AND re."reasonCode" IN ('PENDIENTE', 'UNCLASSIFIED')
ORDER BY re."capturedAt" DESC
LIMIT 200;

D. Event continuity around downtime + ack:
SELECT "machineId", "eventType", ts,
  data->>'incidentKey' AS incident_key,
  data->>'status' AS status,
  data->>'is_update' AS is_update,
  data->>'is_auto_ack' AS is_auto_ack
FROM "MachineEvent"
WHERE ts >= NOW() - INTERVAL '3 days'
  AND "eventType" IN ('microstop', 'macrostop', 'downtime-acknowledged')
ORDER BY ts DESC
LIMIT 500;

E. KPI production vs non-production counts:
SELECT COALESCE("trackingEnabled", false) AS tracking_enabled,
  COALESCE("productionStarted", false) AS production_started,
  COUNT(*) AS rows
FROM "MachineKpiSnapshot"
WHERE ts >= NOW() - INTERVAL '7 days'
GROUP BY 1, 2
ORDER BY rows DESC;

F. Sharp OEE jumps in production snapshots:
WITH k AS (
  SELECT "machineId", ts, oee,
    LAG(oee) OVER (PARTITION BY "machineId" ORDER BY ts) AS prev_oee
  FROM "MachineKpiSnapshot"
  WHERE ts >= NOW() - INTERVAL '7 days'
    AND "trackingEnabled" = true
    AND "productionStarted" = true
    AND oee IS NOT NULL
)
SELECT "machineId", ts, prev_oee, oee, ABS(oee - prev_oee) AS delta
FROM k
WHERE prev_oee IS NOT NULL AND ABS(oee - prev_oee) >= 25
ORDER BY delta DESC, ts DESC
LIMIT 200;

G. Trend point count comparison:
SELECT 'all' AS series, COUNT(*) AS points
FROM "MachineKpiSnapshot"
WHERE ts >= NOW() - INTERVAL '24 hours' AND oee IS NOT NULL
UNION ALL
SELECT 'production_only' AS series, COUNT(*) AS points
FROM "MachineKpiSnapshot"
WHERE ts >= NOW() - INTERVAL '24 hours' AND oee IS NOT NULL
  AND "trackingEnabled" = true AND "productionStarted" = true;

Backfill run plan (must follow this order):

A. Dry-run first:
node scripts/backfill-downtime-reasons.mjs --dry-run --since 30d

B. Review dry-run output:

- candidates
- sampleUpdates
- incident distribution by machine
- any suspicious replacements

C. Apply to a single machine first (pick one from the dry-run sample):
node scripts/backfill-downtime-reasons.mjs --since 30d --machine-id <machine_uuid>

SQL Verification Pack (POST-BACKFILL)

- Re-run queries A, B, and C at minimum.
- Optionally re-run D, F, and G for additional confidence.
- Confirm a reduction in stale PENDIENTE/UNCLASSIFIED rows where an authoritative reason exists.
- Confirm that episodes with conflicting reason codes are reduced or shifted as expected.
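A small comparison helper is enough to produce the before/after delta for query A. This sketch assumes query A's rows are available as objects with reasonCode and rows fields. The rows count may arrive as a number, string, or bigint depending on the Postgres client, so it is normalized with Number(). ReasonMixRow and reasonMixDelta are hypothetical names.

```typescript
// One row of query A's output. COUNT(*) type depends on the client (assumption).
interface ReasonMixRow {
  reasonCode: string;
  rows: number | string | bigint;
}

interface ReasonMixDelta {
  reasonCode: string;
  before: number;
  after: number;
  delta: number; // after - before; negative means fewer rows after the backfill
}

// Sum counts per reasonCode (duplicate codes are merged defensively).
function toCountMap(rows: ReasonMixRow[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of rows) {
    counts.set(r.reasonCode, (counts.get(r.reasonCode) ?? 0) + Number(r.rows));
  }
  return counts;
}

function reasonMixDelta(before: ReasonMixRow[], after: ReasonMixRow[]): ReasonMixDelta[] {
  const b = toCountMap(before);
  const a = toCountMap(after);
  // Union of codes, so codes that appear or disappear are reported with a 0 count.
  const codes = Array.from(new Set([...Array.from(b.keys()), ...Array.from(a.keys())]));
  return codes
    .map((reasonCode) => {
      const beforeCount = b.get(reasonCode) ?? 0;
      const afterCount = a.get(reasonCode) ?? 0;
      return { reasonCode, before: beforeCount, after: afterCount, delta: afterCount - beforeCount };
    })
    // Largest reductions first, ties broken alphabetically.
    .sort((x, y) => x.delta - y.delta || x.reasonCode.localeCompare(y.reasonCode));
}
```

Sorting by delta puts the largest PENDIENTE/UNCLASSIFIED reductions at the top, which maps directly onto the "before vs after" summary requested in the output format.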

Acceptance criteria checklist:

- New downtime episodes retain the authoritative manual reason and do not regress to PENDIENTE/UNCLASSIFIED.
- Fallback downtime continuity now keys by incidentKey whenever it is available; row.id is used only when it is absent.
- The OEE trend no longer shows implausible 0/100 jumps caused by non-production snapshots.
- The OEE chart is linear and visibly shows true gaps (no smoothed continuity across filtered windows).
- Backfill dry-run and scoped-apply outputs are captured and look reasonable.
- Post-run SQL confirms the expected improvements without obvious regressions.

Output format required from you:

- Files changed, with a concise reason per file.
- Exact diff summary for each modified file.
- Test/lint commands run and their results.
- Pre-backfill SQL results (compact tables or summarized counts).
- Dry-run output summary (key fields + sample updates).
- Scoped apply command used and its output summary.
- Post-backfill SQL delta summary (before vs. after).
- Any blockers (env vars, DB auth, migration state, etc.) and exactly what is needed to unblock them.
