Marcelo b66cb97f16 MVP
2025-11-28 09:11:59 -06:00


OEE Dashboard Fix Plan

Comprehensive Strategy for Resolving All Issues


Executive Summary

We have identified 5 distinct issues affecting your OEE dashboard. This plan addresses each systematically, ordered by priority based on impact, risk, and dependencies.

Estimated Total Implementation Time: 2-3 hours
Recommended Approach: Sequential implementation with testing between each phase

Key Improvements in This Updated Plan

This plan has been enhanced based on critical friction point analysis for Node-RED environments:

  1. Global Context Persistence - Added robust initialization logic for all global variables to handle Node-RED restarts and deploys without data loss or spikes

  2. State Synchronization (Push + Pull Model) - Enhanced START/STOP button state tracking with both push notifications AND pull requests to handle mid-production dashboard loads

  3. Angular Timing Issues - Replaced brittle fixed timeouts with data-driven initialization and polling fallback for reliable chart loading across all system speeds

  4. Dual-Path KPI Architecture - Implemented separate paths for live display (real-time, unthrottled) and historical graphs (averaged, smooth) to eliminate the stale-data vs jerky-graphs trade-off

  5. Time-Based Availability Logic - Enhanced availability calculation with configurable time thresholds to distinguish brief pauses from legitimate shutdowns

  6. LLM Implementation Guide - Added comprehensive best practices section for working with LLMs to implement this plan with precise, defensive code

Critical Refinements (Final Review)

Based on final review, these critical refinements have been integrated:

  1. Clear Buffer on Production START - Prevents stale data from skewing averages if Node-RED restarts mid-production and context is restored from disk

  2. Consolidated lastMachineCycleTime Updates - Now updated ONLY in Machine Cycles function (not Calculate KPIs) to maintain clean "machine pulse" signal, initialized to Date.now() on startup to prevent immediate 0% availability

  3. Combined Initialization Strategy - Graphs now use BOTH data-driven initialization (fast when production is running) AND 5-second safety timeout (for idle machine scenarios)

  4. Multi-Source KPI Calculation - Calculate KPIs now explicitly handles triggers from both Machine Cycles (continuous) and Scrap Submission (event-based) with proper guards

  5. Complete Init Node - Added production-ready initialization function with all global variables (kpiBuffer, lastKPIRecordTime, lastMachineCycleTime, lastKPIValues) properly initialized with correct default values and logging


Issue Breakdown & Root Causes

Issue 1: KPI Updates Only on Scrap Submission

Symptom: KPIs stay static during production and only update when scrap is submitted or START/STOP is clicked.

Root Cause:

  • Machine Cycles function has multiple return paths with [null, ...] outputs
  • Output to Calculate KPIs (output port 2) only happens in specific conditions
  • When trackingEnabled is false or no active order, KPI calculation is skipped
  • Critical line: if (!trackingEnabled) return [null, stateMsg]; prevents KPI updates

Sub-issue 1b: START/STOP Button State

  • Button state doesn't persist because the UI doesn't track the trackingEnabled global variable
  • Home template needs to watch for tracking state changes

Issue 2: Graphs Empty on First Load, Sidebar Broken

Symptom: The Graphs tab is blank and sidebar navigation doesn't work until the page is refreshed.

Root Causes:

  1. Timing Issue: Charts created before Angular/scope is fully ready
  2. Scope Isolation: scope.gotoTab might not be accessible immediately
  3. Data Race: Charts created before first KPI data arrives

Why refresh works: Second load benefits from cached scope and existing data


Issue 3: Availability & OEE Drop to 0%

Symptom: Metrics incorrectly show 0% during active production.

Root Cause:

  • Calculate KPIs function has logic that sets availability to 0 when certain conditions aren't met
  • Need to verify: When does trackingEnabled check fail?
  • Hypothesis: When production is running but tracking flag isn't properly set, availability defaults to 0

Issue 4: Graph Updates Too Frequent/Jerky

Symptom: Data points are recorded too often, causing a choppy visualization.

Root Cause:

  • Record KPI History is called on EVERY Calculate KPIs output
  • With machine cycles happening every ~1 second, KPIs are recorded every second
  • Need time-based throttling (1-minute intervals) instead of event-based recording

Issue 5: Time Range Filters Not Working

Symptom: The Shift/Day/Week/Month/Year buttons don't change the graph display.

Root Cause:

  • build(metric, range) function receives range parameter but ignores it
  • Function always returns ALL data from realtimeData[metric]
  • Need to filter data based on selected time range

Fix Plan - Phased Approach

PHASE 1: Low-Risk Quick Wins

Estimated Time: 30 minutes
Risk Level: LOW

1.1 Fix Graph Filters (Issue 5)

Files: projects/Plastico/flows.json → Graphs Template

Changes:

// BEFORE
function build(metric, range){
  const arr = realtimeData[metric];
  if (!arr || arr.length === 0) return [];
  return arr.map(d=>({x:d.timestamp, y:d.value}));
}

// AFTER
function build(metric, range){
  const arr = realtimeData[metric];
  if (!arr || arr.length === 0) return [];

  // Calculate time cutoff based on range
  const now = Date.now();
  const cutoffs = {
    shift: 8 * 60 * 60 * 1000,      // 8 hours
    day: 24 * 60 * 60 * 1000,       // 24 hours
    week: 7 * 24 * 60 * 60 * 1000,  // 7 days
    month: 30 * 24 * 60 * 60 * 1000, // 30 days
    year: 365 * 24 * 60 * 60 * 1000  // 365 days
  };

  const cutoffTime = now - (cutoffs[range] || cutoffs.shift);

  // Filter data to selected time range
  return arr
    .filter(d => d.timestamp >= cutoffTime)
    .map(d => ({x: d.timestamp, y: d.value}));
}
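The filtered build() can be checked in plain Node.js. In the self-contained version below, a made-up realtimeData object (hypothetical timestamps and values) stands in for the template's data. The nowMs parameter is an addition so the cutoff can be tested deterministically:

```javascript
// Range lengths in milliseconds, mirroring the cutoffs in build() above.
const RANGE_MS = {
  shift: 8 * 60 * 60 * 1000,
  day: 24 * 60 * 60 * 1000,
  week: 7 * 24 * 60 * 60 * 1000,
  month: 30 * 24 * 60 * 60 * 1000,
  year: 365 * 24 * 60 * 60 * 1000
};

// Hypothetical sample data in the { metric: [{ timestamp, value }] } shape build() reads.
const HOUR = 60 * 60 * 1000;
const now = Date.now();
const realtimeData = {
  oee: [
    { timestamp: now - 2 * HOUR, value: 81.5 },      // inside every range
    { timestamp: now - 30 * HOUR, value: 77.0 },     // outside shift/day, inside week
    { timestamp: now - 40 * 24 * HOUR, value: 70.2 } // outside month, inside year
  ]
};

function build(metric, range, nowMs = Date.now()) {
  const arr = realtimeData[metric];
  if (!arr || arr.length === 0) return [];
  // Unknown range values fall back to the shift window
  const cutoffTime = nowMs - (RANGE_MS[range] || RANGE_MS.shift);
  return arr
    .filter(d => d.timestamp >= cutoffTime)
    .map(d => ({ x: d.timestamp, y: d.value }));
}

console.log(build("oee", "shift", now).length); // 1
console.log(build("oee", "week", now).length);  // 2
console.log(build("oee", "year", now).length);  // 3
```

Because an unrecognized range falls back to the shift window, a typo in a button's range value would silently show the shift view instead of raising an error.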

Testing:

  • Click each filter button
  • Verify data range changes in charts
  • Check that no errors occur

Potential Issues:

  • If no data exists in selected range, chart might be empty (expected behavior)

Rollback: Easy - revert to original build() function


1.2 Fix Empty Graphs on First Load (Issue 2)

Files: projects/Plastico/flows.json → Graphs Template

Strategy: Use data-driven initialization, backed by a safety timeout, instead of a single fixed delay

Changes:

A) Combined Data-Driven + Safety Timeout (RECOMMENDED)

// BEFORE
setTimeout(()=>{
  initFilters();
  createCharts(currentRange);
},300);

// AFTER - Wait for first data message OR timeout
let chartsInitialized = false;

scope.$watch('msg', function(msg) {
  // Initialize on first KPI data arrival
  if (msg && msg.payload && msg.payload.kpis && !chartsInitialized) {
    // Scope and data are both ready
    initFilters();
    createCharts(currentRange);
    chartsInitialized = true;
    console.log('[Graphs] Charts initialized via data-driven approach');
  }

  // Update charts if already initialized
  if (chartsInitialized && msg && msg.payload && msg.payload.kpis) {
    updateCharts(msg);
  }
});

// ADDED: Safety timer for when machine is idle (no KPI messages flowing)
setTimeout(() => {
  if (!chartsInitialized) {
    console.warn('[Graphs] Charts initialized via safety timer (machine idle)');
    initFilters();
    createCharts(currentRange);
    chartsInitialized = true;
  }
}, 5000); // 5 seconds grace period for KPI messages

Why Both?

  • Data-driven: Ensures charts initialize as soon as data is available (fast, reliable)
  • Safety timeout: Handles "dashboard loaded but machine is idle" scenario (no KPI messages)
  • Together they cover both active production and idle machine scenarios
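The guard that lets the data-driven path and the safety timer race without initializing twice can be isolated and tested outside the dashboard. This is a minimal sketch. makeChartInitializer, its options, and the injectable timer functions are hypothetical scaffolding, added only so the logic runs in plain Node.js:

```javascript
// Runs `init` exactly once: on the first KPI message, or when the safety timer fires.
function makeChartInitializer(init, { timeoutMs = 5000, setTimer = setTimeout, clearTimer = clearTimeout } = {}) {
  let initialized = false;
  let timer = null;

  function initialize(source) {
    if (initialized) return false;          // whichever path comes second does nothing
    initialized = true;
    if (timer !== null) clearTimer(timer);  // data arrived first: timer no longer needed
    init(source);
    return true;
  }

  // Safety timer for the idle-machine case (no KPI messages flowing)
  timer = setTimer(() => initialize("timer"), timeoutMs);

  return {
    // In the template this would be called from scope.$watch('msg', ...)
    onMessage(msg) {
      if (msg && msg.payload && msg.payload.kpis) initialize("data");
    },
    isInitialized: () => initialized
  };
}
```

In the Graphs template, init would call initFilters() and createCharts(currentRange). updateCharts(msg) would still run on every KPI message once isInitialized() returns true.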

B) Fallback: Polling with timeout (if data-driven doesn't work)

function initWhenReady(attempts = 0) {
  const oeeEl = document.getElementById("chart-oee");
  const availEl = document.getElementById("chart-availability");

  if (oeeEl && availEl && scope.gotoTab) {
    // Both DOM and scope ready
    initFilters();
    createCharts(currentRange);
  } else if (attempts < 20) {
    // Retry every 100ms, max 2 seconds
    setTimeout(() => initWhenReady(attempts + 1), 100);
  } else {
    console.error("[Graphs] Failed to initialize charts after 2 seconds");
  }
}

// Start polling on load
initWhenReady();

C) Ensure scope.gotoTab is properly bound

// BEFORE
(function(scope){
  scope.gotoTab = t => scope.send({ui_control:{tab:t}});
})(scope);

// AFTER
(function(s){
  if (!s.gotoTab) {
    s.gotoTab = function(t) {
      s.send({ui_control: {tab: t}});
    };
  }
})(scope);

D) Add defensive chart creation with retry

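function createCharts(range, attempts = 0){
  // Ensure DOM elements exist before drawing
  const oeeEl = document.getElementById("chart-oee");
  const availEl = document.getElementById("chart-availability");

  if (!oeeEl || !availEl) {
    if (attempts >= 10) {
      // Stop after ~2 seconds (10 x 200ms) instead of retrying forever
      console.error("[Graphs] Chart elements not found after 10 retries");
      return;
    }
    console.warn("[Graphs] Chart elements not ready, retrying...");
    setTimeout(() => createCharts(range, attempts + 1), 200);
    return;
  }

  // ... rest of existing chart creation logic
}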

Testing:

  • Clear browser cache
  • Navigate to Graphs tab from fresh load
  • Test sidebar navigation
  • Verify charts appear without refresh
  • Test on slow network/system

Potential Issues:

  • Data-driven initialization on its own requires KPI messages to be flowing
  • If no production is running, only the 5-second safety timeout in Option A will initialize the charts, so keep it in place

Recommended Implementation:

  1. Start with data-driven approach (Option A)
  2. Add polling fallback (Option B) as safety net
  3. Implement defensive checks (Options C & D)

Rollback: Easy - revert to original setTimeout logic


PHASE 2: Medium-Risk Data Flow Improvements 🔧

Estimated Time: 45 minutes
Risk Level: MEDIUM

2.1 Implement KPI Update Throttling with Dual-Path Architecture (Issue 4)

Files:

  • projects/Plastico/flows.json → Calculate KPIs function (add second output)
  • projects/Plastico/flows.json → Record KPI History function (add averaging)

Strategy: Dual-path updates solve the stale display vs jerky graphs trade-off

  • Path 1: Unthrottled live KPIs to Home Template for real-time display
  • Path 2: Throttled/averaged KPIs to Record History for smooth graphs

Part A: Modify Calculate KPIs to Output on Two Paths

// At the end of Calculate KPIs function

// Prepare the KPI message
const kpiMsg = {
  topic: "kpis",
  payload: {
    timestamp: Date.now(),
    kpis: {
      oee: msg.kpis.oee,
      availability: msg.kpis.availability,
      performance: msg.kpis.performance,
      quality: msg.kpis.quality
    }
  }
};

// Return to TWO outputs:
// Output 1: Live KPI to Home Template (real-time, unthrottled)
// Output 2: KPI to Record History (will be averaged/throttled)
return [
  kpiMsg,           // Path 1: Live display
  RED.util.cloneMessage(kpiMsg)  // Path 2: History recording (deep copy; spread would share payload)
];

Wiring Changes:

  • Calculate KPIs node needs 2 outputs (add one more)
  • Output 1 → Home Template (existing connection)
  • Output 2 → Record KPI History (new connection)

Part B: Add Averaging Logic to Record KPI History

// Complete Record KPI History function with robust initialization

// ========== INITIALIZATION ==========
// Initialize buffer
let buffer = global.get("kpiBuffer");
if (!buffer || !Array.isArray(buffer)) {
  buffer = [];
  global.set("kpiBuffer", buffer);
  node.warn('[KPI History] Initialized kpiBuffer');
}

// Initialize last record time
let lastRecordTime = global.get("lastKPIRecordTime");
if (!lastRecordTime || typeof lastRecordTime !== 'number') {
  // Set to 1 minute ago to ensure immediate recording on startup
  lastRecordTime = Date.now() - 60000;
  global.set("lastKPIRecordTime", lastRecordTime);
  node.warn('[KPI History] Initialized lastKPIRecordTime');
}

// ========== ACCUMULATE ==========
const kpis = msg.payload && msg.payload.kpis; // guard against messages without a payload
if (!kpis) {
  node.warn('[KPI History] No KPIs in message, skipping');
  return null;
}

buffer.push({
  timestamp: Date.now(),
  oee: kpis.oee || 0,
  availability: kpis.availability || 0,
  performance: kpis.performance || 0,
  quality: kpis.quality || 0
});

// Prevent buffer from growing too large (safety limit)
if (buffer.length > 100) {
  buffer = buffer.slice(-60); // Keep last 60 entries
  node.warn('[KPI History] Buffer exceeded 100 entries, trimmed to 60');
}

global.set("kpiBuffer", buffer);

// ========== CHECK IF TIME TO RECORD ==========
const now = Date.now();
const timeSinceLastRecord = now - lastRecordTime;
const ONE_MINUTE = 60 * 1000;

if (timeSinceLastRecord < ONE_MINUTE) {
  // Not time to record yet
  const secondsRemaining = Math.ceil((ONE_MINUTE - timeSinceLastRecord) / 1000);
  // Debug log (can remove in production)
  // node.warn(`[KPI History] Buffer: ${buffer.length} entries, recording in ${secondsRemaining}s`);
  return null; // Don't send to charts yet
}

// ========== CALCULATE AVERAGES ==========
if (buffer.length === 0) {
  node.warn('[KPI History] Buffer empty at recording time, skipping');
  return null;
}

const avg = {
  oee: buffer.reduce((sum, d) => sum + d.oee, 0) / buffer.length,
  availability: buffer.reduce((sum, d) => sum + d.availability, 0) / buffer.length,
  performance: buffer.reduce((sum, d) => sum + d.performance, 0) / buffer.length,
  quality: buffer.reduce((sum, d) => sum + d.quality, 0) / buffer.length
};

node.warn(`[KPI History] Recording averaged KPIs from ${buffer.length} samples: OEE=${avg.oee.toFixed(1)}%`);

// ========== RECORD TO HISTORY ==========
// Update global state
global.set("lastKPIRecordTime", now);
global.set("kpiBuffer", []); // Clear buffer

// Send averaged values to graphs and database
return {
  topic: "kpi-history",
  payload: {
    timestamp: now,
    kpis: {
      oee: Math.round(avg.oee * 10) / 10,           // Round to 1 decimal
      availability: Math.round(avg.availability * 10) / 10,
      performance: Math.round(avg.performance * 10) / 10,
      quality: Math.round(avg.quality * 10) / 10
    },
    sampleCount: buffer.length  // Metadata for debugging
  }
};
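To exercise the accumulate-then-average logic outside Node-RED, swap the global context for a small in-memory stand-in and pass the clock in explicitly. This minimal sketch works under those assumptions. makeContext, recordKpi, and the nowMs parameter are hypothetical test scaffolding, not Node-RED APIs. The sketch mirrors the 100-entry limit, the one-minute window, and the one-decimal rounding above:

```javascript
// In-memory stand-in for Node-RED's global context (get/set only)
function makeContext() {
  const store = new Map();
  return { get: k => store.get(k), set: (k, v) => store.set(k, v) };
}

const ONE_MINUTE = 60 * 1000;
const METRICS = ["oee", "availability", "performance", "quality"];
const round1 = v => Math.round(v * 10) / 10;

// Returns an averaged "kpi-history" message at most once per minute, otherwise null
function recordKpi(global, kpis, nowMs) {
  let buffer = global.get("kpiBuffer");
  if (!Array.isArray(buffer)) buffer = [];
  let lastRecordTime = global.get("lastKPIRecordTime");
  if (typeof lastRecordTime !== "number") lastRecordTime = nowMs - ONE_MINUTE; // record on first call

  const sample = { timestamp: nowMs };
  for (const m of METRICS) sample[m] = kpis[m] || 0;
  buffer.push(sample);
  if (buffer.length > 100) buffer = buffer.slice(-60); // same safety limit as above
  global.set("kpiBuffer", buffer);
  global.set("lastKPIRecordTime", lastRecordTime);

  if (nowMs - lastRecordTime < ONE_MINUTE) return null; // keep accumulating

  const avg = {};
  for (const m of METRICS) avg[m] = round1(buffer.reduce((s, d) => s + d[m], 0) / buffer.length);
  global.set("lastKPIRecordTime", nowMs);
  global.set("kpiBuffer", []);
  return { topic: "kpi-history", payload: { timestamp: nowMs, kpis: avg, sampleCount: buffer.length } };
}
```

Feeding it samples at t, t+1s, t+30s, and t+60s produces one record on the first call (because of the "one minute ago" initialization) and one averaged record of three samples at t+60s.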

Recommendation: This dual-path approach avoids the trade-off. The Home display stays real-time, while the graphs receive one averaged point per minute.

Testing:

  • Start production
  • Observe KPI update frequency in graphs
  • Verify updates occur approximately every 60 seconds
  • Check that no spikes/gaps appear in data

Potential Issues:

  • First data point might take up to 1 minute to appear
  • Rapid production changes might not be immediately visible
  • Buffer could grow large if production runs without recording

Mitigation:

  • Set buffer max size (e.g., 100 entries)
  • Force record on production stop/start

Rollback: Medium difficulty - remove throttling logic, clear global variables


PHASE 3: High-Risk Core Logic Fixes ⚠️

Estimated Time: 60 minutes
Risk Level: HIGH

⚠️ CRITICAL: Backup flows.json before proceeding

3.1 Fix KPI Continuous Updates (Issue 1)

Files: projects/Plastico/flows.json → Machine Cycles function

Problem: Machine Cycles has multiple early returns that skip KPI calculation

Current Logic:

// Line ~36: No active order
if (!activeOrder || !activeOrder.id || cavities <= 0) {
    return [null, stateMsg];  // ❌ Skips KPI calculation
}

// Line ~43: Tracking not enabled
if (!trackingEnabled) {
    return [null, stateMsg];  // ❌ Skips KPI calculation
}

Solution Options:

Option A: Always Calculate KPIs (Recommended)

// Always prepare a message for Calculate KPIs on output 3
const kpiTrigger = { _triggerKPI: true };

// Change all returns to include kpiTrigger
if (!activeOrder || !activeOrder.id || cavities <= 0) {
    return [null, stateMsg, kpiTrigger];  // ✓ Triggers KPI calculation
}

if (!trackingEnabled) {
    return [null, stateMsg, kpiTrigger];  // ✓ Triggers KPI calculation
}

// Update last machine cycle time when a successful cycle occurs
// This is used for time-based availability logic
if (trackingEnabled && dbMsg) {
    // dbMsg being non-null implies a cycle was recorded
    global.set("lastMachineCycleTime", Date.now());
}

// ... final return
return [dbMsg, stateMsg, kpiTrigger];

Critical: The lastMachineCycleTime update must happen ONLY in Machine Cycles function to maintain a clean "machine pulse" signal separate from KPI calculation triggers.

Wire Configuration Change:

  • Add third output wire to Machine Cycles node
  • Connect output 3 → Calculate KPIs
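Option A's routing can be written as a pure function. This makes it easy to check two things: every path now emits the KPI trigger on output 3, and lastMachineCycleTime only moves when a cycle is actually recorded. routeMachineCycle and its state input are hypothetical. In the real node these values come from globals and from the dbMsg the function builds:

```javascript
// Returns [dbMsg, stateMsg, kpiTrigger] for a three-output Machine Cycles node
function routeMachineCycle(state, global, nowMs) {
  const { activeOrder, cavities, trackingEnabled, stateMsg, dbMsg } = state;
  const kpiTrigger = { _triggerKPI: true };

  if (!activeOrder || !activeOrder.id || cavities <= 0) {
    return [null, stateMsg, kpiTrigger]; // no active order: still refresh KPIs
  }
  if (!trackingEnabled) {
    return [null, stateMsg, kpiTrigger]; // tracking off: still refresh KPIs
  }
  if (dbMsg) {
    // Only a recorded cycle counts as a machine "pulse"
    global.set("lastMachineCycleTime", nowMs);
  }
  return [dbMsg, stateMsg, kpiTrigger];
}
```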

Option B: Calculate KPIs in Parallel (Alternative)

  • Add an inject node that triggers Calculate KPIs every 5 seconds
  • Less coupled, but might calculate with stale data

Recommendation: Option A - ensures KPIs calculated with real-time data

Testing:

  1. Start production with START button
  2. Observe KPI values on Home page
  3. Verify continuous updates (every ~1 second before throttling)
  4. Check that scrap submission still works
  5. Test production stop/start

Potential Issues:

  • Calculate KPIs might need to handle cases with no active order
  • Could calculate KPIs unnecessarily when machine is idle
  • Performance impact if calculating too frequently

Mitigation:

  • Add guards in Calculate KPIs to handle null/undefined inputs
  • Implement Phase 2 throttling first so more frequent calculations don't turn into more frequent history writes and chart updates
  • Monitor system performance

CRITICAL: Calculate KPIs Multi-Source Handling

The Calculate KPIs function will now receive triggers from TWO sources:

  1. Machine Cycles (continuous, real-time) - via new output 3
  2. Scrap Submission (event-based) - existing connection

Required Change in Calculate KPIs:

// At the start of Calculate KPIs function
// Must handle both trigger types

// The function should execute regardless of message content
// as long as it receives ANY trigger

const trackingEnabled = global.get("trackingEnabled");
const activeOrder = global.get("activeOrder") || {};
const productionStartTime = global.get("productionStartTime");

// Guard against missing critical data
if (!trackingEnabled || !activeOrder.id) {
  // Can't calculate meaningful KPIs without tracking or active order
  // But don't error - just skip calculation
  return null;
}

// ... rest of existing KPI calculation logic
// This logic will now run for BOTH continuous and event-based triggers

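This ensures availability and OEE calculations work correctly whether triggered by machine cycles or scrap submission. Because this guard returns null when tracking is off, no message reaches the Home display after STOP, so it keeps showing the last live values. If the display should drop to 0% after STOP, send a zeroed KPI message from this branch instead of returning null.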

Side Effects:

  • Will trigger Issue 4 more severely → MUST implement Phase 2 throttling first
  • Database might receive more frequent updates
  • Global variables will change more often

Rollback: Medium difficulty - requires restoring original return statements and wire configuration


3.2 Fix Availability/OEE Drops to 0 (Issue 3)

Files: projects/Plastico/flows.json → Calculate KPIs function

Investigation Steps:

  1. Read full Calculate KPIs function
  2. Identify all paths that set msg.kpis.availability = 0
  3. Add logging to track when this occurs
  4. Understand state flow: trackingEnabled, productionStartTime, operatingTime

Hypothesis Testing:

// Add debug logging at the start
node.warn(`[KPI] trackingEnabled=${trackingEnabled}, startTime=${productionStartTime}, opTime=${operatingTime}`);

// Before setting availability to 0
if (/* condition that causes 0 */) {
    node.warn(`[KPI] Setting availability to 0 because: [reason]`);
    msg.kpis.availability = 0;
}

Likely Fix:

// BEFORE
} else {
    msg.kpis.availability = 0; // Not running
}

// AFTER
} else {
    // Check if production was recently active
    const prev = global.get("lastKPIValues") || {};
    if (prev.availability > 0 && operatingTime > 0) {
        // Maintain last availability if we have operating time
        msg.kpis.availability = prev.availability;
    } else {
        msg.kpis.availability = 0;
    }
}

// Store KPIs for next iteration
global.set("lastKPIValues", msg.kpis);

Testing:

  1. Start production
  2. Monitor availability values
  3. Trigger scrap prompt
  4. Verify availability doesn't drop to 0
  5. Check OEE calculation

Potential Issues:

  • Might mask legitimate 0% availability (machine actually stopped)
  • Could create artificially high availability readings
  • State persistence might cause issues after restart

Mitigation:

  • Add clear conditions for when availability should legitimately be 0
  • Reset lastKPIValues on work order completion
  • Add production state tracking

Rollback: Easy if logging added first - can revert based on log analysis


3.3 Fix START/STOP Button State (Issue 1b)

Files: projects/Plastico/flows.json → Home Template

Problem: Button doesn't show correct state (STOP when production running)

Investigation:

  • Find button rendering logic in Home template
  • Check how trackingEnabled or productionStarted is tracked
  • Verify message handler receives state updates

Changes:

// In Home Template scope.$watch
if (msg.topic === 'machineStatus') {
  window.machineOnline = msg.payload.machineOnline;
  window.productionStarted = msg.payload.productionStarted;

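  // NEW: Track tracking state for button display.
  // ng-show evaluates against scope, not window, so mirror the flag onto scope.
  window.trackingEnabled = msg.payload.trackingEnabled || window.productionStarted;
  scope.trackingEnabled = window.trackingEnabled;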

  scope.renderDashboard();
  return;
}

Button HTML Update:

<!-- BEFORE -->
<button ng-click="handleStart()">START</button>

<!-- AFTER -->
<button ng-click="handleStart()" ng-show="!trackingEnabled">START</button>
<button ng-click="handleStop()" ng-show="trackingEnabled" class="stop-btn">STOP</button>

Backend Update (Work Order buttons):

// When START clicked, also set trackingEnabled flag
if (action === "start-tracking") {
    global.set("trackingEnabled", true);

    // CRITICAL: Clear KPI buffer on production start
    // Prevents stale data from skewing averages if Node-RED was restarted mid-production
    global.set("kpiBuffer", []);
    node.warn('[START] Cleared kpiBuffer for fresh production run');

    // Optional: Reset last record time to ensure immediate data point
    global.set("lastKPIRecordTime", Date.now() - 60000);

    // Send state update to UI
    const stateMsg = {
        topic: "machineStatus",
        payload: {
            machineOnline: true,
            productionStarted: true,
            trackingEnabled: true
        }
    };
    // ... send stateMsg to Home template
}

Why Clear Buffer on START: If Node-RED restarts during a production run and context is restored from disk, the kpiBuffer might contain stale data from before the restart. When production resumes, new data would be mixed with old data, skewing the averages. Clearing on START ensures a clean slate for each production session.
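The START branch can be expressed as a small function over the global context and a clock. That makes the reset easy to check against a stale buffer. handleStartTracking and its parameters are hypothetical scaffolding. In the function node, the body would use global and Date.now() directly, and the returned message would go to the output wired to the Home template:

```javascript
// Hypothetical wrapper around the "start-tracking" branch so it can run outside Node-RED
function handleStartTracking(global, nowMs) {
  global.set("trackingEnabled", true);
  global.set("kpiBuffer", []);                         // drop samples restored from before a restart
  global.set("lastKPIRecordTime", nowMs - 60 * 1000);  // next history call records immediately
  return {
    topic: "machineStatus",
    payload: { machineOnline: true, productionStarted: true, trackingEnabled: true }
  };
}
```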

Testing:

  1. Load dashboard
  2. Start work order
  3. Verify START button changes to STOP
  4. Click STOP (if implemented)
  5. Verify button changes back to START

Potential Issues:

  • Need to implement STOP button handler if it doesn't exist
  • State sync between backend and frontend
  • Button might flicker during state transitions

Rollback: Easy - remove button visibility conditions


Implementation Order & Dependencies

  1. Phase 1.1 - Fix Filters (Independent, low risk)
  2. Phase 1.2 - Fix Empty Graphs (Independent, low risk)
  3. Phase 2.1 - Add Throttling (Required before Phase 3.1)
  4. Phase 3.2 - Fix Availability Calculation (Add logging first)
  5. Phase 3.1 - Fix Continuous KPI Updates (Depends on throttling)
  6. Phase 3.3 - Fix Button State (Can be done anytime)

Why This Order?

  1. Quick wins first - Build confidence, improve UX immediately
  2. Throttling before continuous updates - Prevent performance issues
  3. Logging before logic changes - Understand problem before fixing
  4. Independent fixes can be done in parallel to save time

Testing Strategy

Per-Phase Testing:

  • Test each phase independently
  • Don't proceed to next phase if current fails
  • Keep backup of working state

Integration Testing (After All Phases):

  1. Fresh Start Test

    • Clear browser cache
    • Restart Node-RED
    • Load dashboard
    • Navigate all tabs
  2. Production Cycle Test

    • Start new work order
    • Click START
    • Let run for 2-3 minutes
    • Submit scrap
    • Verify KPIs update
    • Check graphs show data
    • Test time filters
  3. State Persistence Test

    • Refresh page during production
    • Verify state restores correctly
    • Check button shows STOP if running
  4. Edge Cases

    • No active work order
    • Machine offline
    • Zero production time
    • Rapid start/stop

Rollback Plan

Per-Phase Rollback:

Each phase documents its rollback procedure. In general:

  1. Stop Node-RED
  2. Restore flows.json from backup
    cp projects/Plastico/flows.json.backup projects/Plastico/flows.json
    
  3. Clear global context (if needed)
    // In a temporary function node (triggered by an inject node)
    global.set("lastKPIRecordTime", null);
    global.set("kpiBuffer", null);
    global.set("lastKPIValues", null);
    
  4. Restart Node-RED
  5. Clear browser cache

Emergency Full Rollback:

# Restore from most recent backup
cp projects/Plastico/Respaldo_MVP_Complete_11_23_25.json projects/Plastico/flows.json
# Restart Node-RED
node-red-restart

Potential Roadblocks & Mitigations

Roadblock 1: Global Context Persistence on Deploy/Restart ⚠️ CRITICAL

Symptom: After a Node-RED restart or deploy, the throttling, averaging, or availability logic breaks or shows incorrect data.

Root Cause: Global variables (lastKPIRecordTime, kpiBuffer, lastKPIValues, trackingEnabled) live in whatever context store settings.js configures. With the default in-memory store they are lost on restart. With a file-based store they are restored from disk, possibly with stale values.

Mitigation:

  1. Add Robust Initialization Logic:
// In Record KPI History function - ALWAYS check and initialize
let buffer = global.get("kpiBuffer");
if (!buffer || !Array.isArray(buffer)) {
  buffer = [];
  global.set("kpiBuffer", buffer);
}

let lastRecordTime = global.get("lastKPIRecordTime");
if (!lastRecordTime || typeof lastRecordTime !== 'number') {
  // Set to 1 minute ago to ensure immediate recording on startup
  lastRecordTime = Date.now() - 60000;
  global.set("lastKPIRecordTime", lastRecordTime);
}
  2. Create an Init Node:
    • Add a dedicated "Initialize Global Variables" function node
    • Trigger on deploy using an inject node (inject once, delay 0)
    • Wire to all critical nodes to ensure state is set before first execution

Complete Init Node Code:

// Initialize Global Variables - Run on Deploy
node.warn('[INIT] Initializing global variables');

// KPI Buffer for averaging
if (!global.get("kpiBuffer")) {
  global.set("kpiBuffer", []);
  node.warn('[INIT] Set kpiBuffer to []');
}

// Last KPI record time - set to 1 min ago for immediate first record
if (!global.get("lastKPIRecordTime")) {
  global.set("lastKPIRecordTime", Date.now() - 60000);
  node.warn('[INIT] Set lastKPIRecordTime');
}

// Last machine cycle time - always reset to now on deploy/restart, so a stale
// value restored from persistent context can't force 0% availability on startup
global.set("lastMachineCycleTime", Date.now());
node.warn('[INIT] Set lastMachineCycleTime to prevent 0% availability on startup');

// Last KPI values
if (!global.get("lastKPIValues")) {
  global.set("lastKPIValues", {});
  node.warn('[INIT] Set lastKPIValues to {}');
}

node.warn('[INIT] Global variable initialization complete');
return msg;
  3. Check settings.js:
    • Verify contextStorage configuration
    • Consider using file storage for persistence if using memory (default)

Testing:

  • Deploy changes multiple times
  • Restart Node-RED
  • Verify variables persist/initialize correctly
  • Check debug logs for initialization messages

Roadblock 2: State Sync Between Flow and Dashboard (Push vs Pull Model)

Symptom: The START/STOP button shows the wrong state when a user loads the dashboard mid-production.

Root Cause: The UI relies on a push model (messages sent only when state changes), so a page loaded after tracking started misses the initial message.

Mitigation:

  1. Add Pull Mechanism in Home Template:
// In Home Template initialization
(function(scope) {
  // Request current state on load
  scope.send({
    topic: "requestState",
    payload: {}
  });

  // Handle state response
  scope.$watch('msg', function(msg) {
    if (msg && msg.topic === 'currentState') {
      window.trackingEnabled = msg.payload.trackingEnabled;
      scope.trackingEnabled = window.trackingEnabled; // ng-show reads scope, not window
      window.productionStarted = msg.payload.productionStarted;
      window.machineOnline = msg.payload.machineOnline;
      scope.renderDashboard();
    }
    // ... rest of watch logic
  });
})(scope);
  2. Add State Response Handler:
    • Create function node that listens for requestState topic
    • Responds with current global state values
    • Wire to Home template
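The handler's logic can be sketched as a plain function so it can be tested outside Node-RED. The sketch assumes, without confirmation from the current flow, that productionStarted and machineOnline are also kept as global variables. If they live elsewhere, read them from there instead. handleStateRequest is a hypothetical name; in the function node itself, the body would use the node's own msg and global:

```javascript
// Answers a "requestState" message with the current tracking state
function handleStateRequest(msg, global) {
  if (!msg || msg.topic !== "requestState") return null; // ignore other topics

  return {
    ...msg, // keep any fields the dashboard attached to the request
    topic: "currentState",
    payload: {
      trackingEnabled: global.get("trackingEnabled") === true,
      productionStarted: global.get("productionStarted") === true, // assumed global
      machineOnline: global.get("machineOnline") === true          // assumed global
    }
  };
}
```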

Testing:

  • Start production
  • Open dashboard in new browser tab
  • Verify button shows STOP immediately
  • Test with multiple browser sessions

Roadblock 3: UI/Angular Timing Races in ui-template ⚠️ HIGH IMPACT

Symptom: Charts sometimes load and sometimes don't. The fixed timeout (300ms in the current Graphs template) is unreliable on slow systems or complex templates.

Root Cause: Node-RED Dashboard uses AngularJS, so digest-cycle and DOM-rendering timing is unpredictable.

Mitigation Option A - Data-Driven Initialization (RECOMMENDED):

// Instead of fixed timeout, wait for first data
let chartsInitialized = false;

scope.$watch('msg', function(msg) {
  if (msg && msg.payload && msg.payload.kpis && !chartsInitialized) {
    // First data arrived, scope is ready
    initFilters();
    createCharts(currentRange);
    chartsInitialized = true;
  }

  if (chartsInitialized && msg && msg.payload && msg.payload.kpis) {
    updateCharts(msg);
  }
});

Mitigation Option B - Angular Lifecycle Hook:

// Hook into Angular's ready state
scope.$applyAsync(function() {
  // Runs after the current digest; the template's DOM is normally rendered by then (not guaranteed for content added later)
  initFilters();
  createCharts(currentRange);
});

Mitigation Option C - Polling with Timeout:

function initWhenReady(attempts = 0) {
  const oeeEl = document.getElementById("chart-oee");

  if (oeeEl && scope.gotoTab) {
    // Both DOM and scope ready
    initFilters();
    createCharts(currentRange);
  } else if (attempts < 20) {
    // Retry every 100ms, max 2 seconds
    setTimeout(() => initWhenReady(attempts + 1), 100);
  } else {
    console.error("Failed to initialize charts after 2 seconds");
  }
}

// Start polling
initWhenReady();

Recommendation: Use Option A, paired with the 5-second safety timeout described in Phase 1.2 so charts still initialize when the machine is idle


Roadblock 4: Throttling vs Live Display Trade-off

Symptom: With averaging, displayed KPIs are stale (up to 59 seconds old); without averaging, graphs are jerky.

Root Cause: OEE is a real-time snapshot, so averaging smooths the graphs but delays live feedback.

Solution: Dual-Path KPI Updates

Architecture:

  • Path 1 (Live): Machine Cycles → Calculate KPIs → Home Template (no throttling)
  • Path 2 (History): Machine Cycles → Calculate KPIs → Averaging Buffer → Record History (throttled to 1 min)

Implementation:

// In Calculate KPIs function - send to TWO outputs
return [
  msg,              // Output 1: Live KPI to Home Template (unthrottled)
  RED.util.cloneMessage(msg)  // Output 2: KPI to History (will be throttled; deep copy)
];

In Record KPI History - add averaging logic:

// Only this node has averaging/throttling
let buffer = global.get("kpiBuffer") || [];
buffer.push({
  timestamp: Date.now(),
  oee: msg.kpis.oee,
  availability: msg.kpis.availability,
  performance: msg.kpis.performance,
  quality: msg.kpis.quality
});

const lastRecord = global.get("lastKPIRecordTime") || 0;
const now = Date.now();

if (now - lastRecord >= 60000) {
  // Average the buffer
  const avg = {
    oee: buffer.reduce((sum, d) => sum + d.oee, 0) / buffer.length,
    // ... other metrics
  };

  // Record averaged values to history
  // Send to Graphs template
  global.set("lastKPIRecordTime", now);
  global.set("kpiBuffer", []);
  return { kpis: avg };
} else {
  global.set("kpiBuffer", buffer);
  return null; // Don't record yet
}

Benefits:

  • Live display always shows current OEE
  • Graphs are smooth with averaged data
  • No UX compromise

Roadblock 5: Availability 0% Logic Too Simplistic

Symptom: Availability drops to 0% during brief pauses (scrap submission), yet might NOT drop to 0% during legitimate stops (breaks, maintenance).

Root Cause: Reusing the previous value without a time-based threshold can't distinguish a brief interruption from an actual shutdown.

Improved Logic:

// In Calculate KPIs function
// Assumes trackingEnabled, operatingTime, plannedTime and calculateAvailability()
// are defined earlier in this function
msg.kpis = msg.kpis || {};
const now = Date.now();
const lastCycleTime = global.get("lastMachineCycleTime") || now;
const timeSinceLastCycle = now - lastCycleTime;

const BRIEF_PAUSE_THRESHOLD = 5 * 60 * 1000; // 5 minutes

if (!trackingEnabled || timeSinceLastCycle > BRIEF_PAUSE_THRESHOLD) {
  // Legitimately stopped or long pause
  msg.kpis.availability = 0;
  global.set("lastKPIValues", null); // Clear history
} else if (operatingTime > 0 && plannedTime > 0) {
  // Calculate normally (skipped while plannedTime is still 0)
  msg.kpis.availability = calculateAvailability(operatingTime, plannedTime);
  global.set("lastKPIValues", { ...msg.kpis }); // Store a copy, not a reference
} else {
  // Brief pause - maintain last known value
  const prev = global.get("lastKPIValues") || {};
  msg.kpis.availability = prev.availability || 0;
}

// NOTE: lastMachineCycleTime is updated in Machine Cycles function ONLY
// This keeps the "machine pulse" signal clean and separate from KPI calculation

Configuration:

  • Adjust BRIEF_PAUSE_THRESHOLD based on your production environment
  • Consider making it configurable via dashboard setting
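The Roadblock 5 branches can be pulled into a small function that takes the threshold as a parameter. That makes the threshold easy to feed from a dashboard setting and easy to test.

This is a minimal sketch under stated assumptions:
- `decideAvailability` is a hypothetical name.
- Availability is assumed to be a percentage computed as `operatingTime / plannedTime * 100` and clamped to 0 to 100. Substitute your own `calculateAvailability()`.
- It only calculates when `plannedTime > 0`, falling back to the last known value otherwise.

```javascript
// Hypothetical helper mirroring the three Roadblock 5 branches.
// ASSUMPTION: availability is a percentage, operatingTime / plannedTime * 100,
// clamped to 0..100. Replace with your own calculateAvailability().
function decideAvailability({
  trackingEnabled,
  now,
  lastCycleTime,
  operatingTime,
  plannedTime,
  previousKpis,
  pauseThresholdMs = 5 * 60 * 1000, // same default as BRIEF_PAUSE_THRESHOLD
}) {
  // A missing timestamp counts as "just cycled", like `|| now` in the plan
  const last = typeof lastCycleTime === "number" ? lastCycleTime : now;
  const sinceLastCycle = now - last;

  if (!trackingEnabled || sinceLastCycle > pauseThresholdMs) {
    // Stopped or long pause: report 0 and tell the caller to clear lastKPIValues
    return { availability: 0, reason: "stopped", clearHistory: true };
  }
  if (operatingTime > 0 && plannedTime > 0) {
    const pct = (operatingTime / plannedTime) * 100;
    return { availability: Math.min(100, Math.max(0, pct)), reason: "calculated", clearHistory: false };
  }
  // Brief pause: hold the last known value
  const prev = previousKpis && typeof previousKpis.availability === "number"
    ? previousKpis.availability
    : 0;
  return { availability: prev, reason: "brief-pause", clearHistory: false };
}

// Example: 30 s since the last cycle, no new operating time yet
const t = 10 * 60 * 1000;
const result = decideAvailability({
  trackingEnabled: true,
  now: t,
  lastCycleTime: t - 30 * 1000,
  operatingTime: 0,
  plannedTime: 600,
  previousKpis: { availability: 87 },
});
console.log(result.reason, result.availability); // → brief-pause 87
```

In the function node, the caller would still write `lastKPIValues` according to `clearHistory`. It could also read `pauseThresholdMs` from a global set by a dashboard control (a hypothetical `global.get("pauseThresholdMs")`).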

Roadblock 6: KPI Calculation Performance

Symptom: System slow after implementing continuous KPI updates

Mitigation:

  • Implement Phase 2 throttling FIRST (now with dual-path approach)
  • Ensure Calculate KPIs has guards for null/undefined inputs
  • Profile Calculate KPIs function for optimization
  • Monitor Node-RED CPU usage during production
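One way to add those guards is a small reader that returns a fallback instead of `undefined` or `NaN`. This is a minimal sketch. `readNumber`, `safeQuality`, and the `goodParts`/`totalParts` field names are hypothetical placeholders for whatever your Calculate KPIs node actually reads.

```javascript
// Hypothetical guard: return source[key] only if it is a finite number,
// otherwise the fallback. Protects against null/undefined inputs and NaN.
function readNumber(source, key, fallback = 0) {
  if (source === null || typeof source !== "object") return fallback;
  const value = source[key];
  return typeof value === "number" && Number.isFinite(value) ? value : fallback;
}

// Example use before a division (quality = good / total, as a percentage)
function safeQuality(order) {
  const good = readNumber(order, "goodParts");
  const total = readNumber(order, "totalParts");
  // null means "not enough data yet" instead of NaN or Infinity
  return total > 0 ? (good * 100) / total : null;
}

console.log(safeQuality({ goodParts: 95, totalParts: 100 })); // → 95
console.log(safeQuality(undefined)); // → null
```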

Roadblock 7: Browser Cache Issues

Symptom: Changes don't appear after deployment

Mitigation:

  • Clear browser cache during testing (Ctrl+Shift+R / Cmd+Shift+R)
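  • Add a version marker to the template (optional) and bump it on every change, so you can confirm in the browser's developer tools which version is loaded. Keep it static: Angular does not evaluate {{ }} expressions inside HTML comments.
<!-- Template version: 1.1 -->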
  • Use incognito/private browsing for testing
  • Test on different browsers/devices

Success Criteria

Phase 1:

  • Time filters change graph display correctly
  • Graphs load on first visit without refresh
  • Sidebar navigation works immediately

Phase 2:

  • Graph updates occur at ~1 minute intervals
  • Graphs are smooth, not jerky
  • No performance degradation
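To check the interval criterion, collect the timestamps of recorded history points (for example, from `node.warn()` output). Then compare consecutive gaps against the throttle period.

This is a minimal sketch. `checkRecordIntervals` and the 5-second default tolerance are assumptions. History is only recorded when a KPI message arrives, so gaps can exceed 60 s by up to one machine cycle. Pick a tolerance at least as large as your cycle time.

```javascript
// Hypothetical test helper: verify that consecutive history timestamps (ms)
// are roughly one throttle period apart.
function checkRecordIntervals(timestamps, expectedMs = 60000, toleranceMs = 5000) {
  const problems = [];
  for (let i = 1; i < timestamps.length; i++) {
    const gap = timestamps[i] - timestamps[i - 1];
    if (Math.abs(gap - expectedMs) > toleranceMs) {
      problems.push({ index: i, gapMs: gap });
    }
  }
  return problems; // empty array = all gaps within tolerance
}

console.log(checkRecordIntervals([0, 60500, 121000, 150000]));
// → [ { index: 3, gapMs: 29000 } ]
```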

Phase 3:

  • KPIs update continuously during production
  • Availability never incorrectly shows 0%
  • START button shows STOP when production running
  • OEE calculation is accurate

Integration:

  • All features work together without conflicts
  • No console errors
  • Production tracking works end-to-end
  • Data persists correctly

Estimated Timeline

| Phase | Task | Time | Cumulative |
|-------|------|------|------------|
| 1.1 | Fix Filters | 15 min | 15 min |
| 1.2 | Fix Empty Graphs | 15 min | 30 min |
| 2.1 | Add Throttling | 45 min | 1h 15m |
| 3.2 | Fix Availability (with logging) | 30 min | 1h 45m |
| 3.1 | Fix Continuous Updates | 30 min | 2h 15m |
| 3.3 | Fix Button State | 20 min | 2h 35m |
| Testing | Integration Testing | 30 min | 3h 5m |

Total: ~3 hours (assuming no major roadblocks)


Best Practices for LLM-Assisted Implementation

When working with an LLM to implement this plan, use these strategies for best results:

1. Isolate Logic Focus (Function Node Precision)

DO:

  • Ask for specific function node code: "Write the Record KPI History function with averaging logic including global.get initialization"
  • Provide exact input/output requirements: "This function receives msg.kpis object and must return msg or null"
  • Request one change at a time

DON'T:

  • Ask vague questions like "fix my dashboard"
  • Request multiple phase changes in one prompt
  • Assume LLM knows your flow structure

2. Explicitly Define Global Variables

Template for LLM prompts:

Global variable: kpiBuffer
Type: Array of objects
Structure: [{timestamp: number, oee: number, availability: number, performance: number, quality: number}]
Lifecycle: Initialized to [] if null, cleared after recording to history
Purpose: Accumulates KPI values for 1-minute averaging

Always specify:

  • Variable name
  • Data type
  • Default/initial value
  • When it's read/written
  • When it should be cleared
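Writing the definition down as code gives you (and the LLM) something concrete to check incoming data against. This is a sketch of the `kpiBuffer` entry as a JSDoc typedef plus validators. `isKpiSample` and `normalizeKpiBuffer` are hypothetical names, not part of the existing flow.

```javascript
/**
 * Shape of one kpiBuffer entry, as described in the prompt template above.
 * @typedef {Object} KpiSample
 * @property {number} timestamp  ms since epoch (Date.now())
 * @property {number} oee
 * @property {number} availability
 * @property {number} performance
 * @property {number} quality
 */

const KPI_SAMPLE_FIELDS = ["timestamp", "oee", "availability", "performance", "quality"];

/** Returns true if value has every KpiSample field as a finite number. */
function isKpiSample(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    KPI_SAMPLE_FIELDS.every(
      (f) => typeof value[f] === "number" && Number.isFinite(value[f])
    )
  );
}

/** Keeps only well-formed entries; anything that is not an array becomes []. */
function normalizeKpiBuffer(raw) {
  return Array.isArray(raw) ? raw.filter(isKpiSample) : [];
}
```

In a function node, this would typically be used as `let buffer = normalizeKpiBuffer(global.get("kpiBuffer"));` at the top of Record KPI History.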

3. Specify Node-RED Input/Output Requirements

Example prompt:

The Machine Cycles function node must have 3 outputs:
- Output 1: DB write message (only when tracking enabled)
- Output 2: State update message (always sent)
- Output 3: KPI trigger message (always sent for continuous updates)

The return statement should be:
return [dbMsg, stateMsg, kpiTrigger];
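In Node-RED, each element of the returned array goes to the output port with the same index, and a `null` element sends nothing on that port. Below is a minimal sketch of the three-output shape under that rule. `buildCycleOutputs` and the payload fields are hypothetical, and in the real node the body would end with `return [dbMsg, stateMsg, kpiTrigger];`.

```javascript
// Hypothetical sketch of the Machine Cycles output logic, kept as a pure
// function so it can be tested outside Node-RED.
function buildCycleOutputs(cycle, trackingEnabled) {
  // Output 1: only write to the DB while production is being tracked
  const dbMsg = trackingEnabled
    ? { topic: "cycle", payload: { count: cycle.count, ts: cycle.ts } }
    : null; // null = nothing is sent on output 1
  // Output 2: state update, always sent
  const stateMsg = { topic: "machineState", payload: { trackingEnabled, lastCycleTs: cycle.ts } };
  // Output 3: trigger Calculate KPIs on every cycle
  const kpiTrigger = { topic: "kpiTrigger", payload: { ts: cycle.ts } };
  return [dbMsg, stateMsg, kpiTrigger];
}

const [db, state, trigger] = buildCycleOutputs({ count: 42, ts: 1000 }, false);
console.log(db === null, state.payload.trackingEnabled, trigger.topic);
// → true false kpiTrigger
```

The function node's Outputs count must also be set to 3 so the third port exists to wire.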

4. Request Defensive Code

Always ask for:

  • Null/undefined checks before accessing properties
  • Type validation for global variables
  • Initialization logic at the start of functions
  • Error handling for edge cases

Example:

// BAD (LLM might generate)
const buffer = global.get("kpiBuffer");
buffer.push(newValue);

// GOOD (what you should request)
let buffer = global.get("kpiBuffer");
if (!buffer || !Array.isArray(buffer)) {
  buffer = [];
}
buffer.push(newValue);
global.set("kpiBuffer", buffer);
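The same pattern generalizes to any global. This is a sketch of a hypothetical `getGlobalOr` helper. It validates the stored value and writes a fresh default back when the value is missing or has the wrong type.

How the sketch maps to Node-RED:
- `ctx` stands in for Node-RED's `global` object, and only its `get(key)` and `set(key, value)` calls are used.
- `global.get` returns `undefined` for a missing key.
- The Map-based `makeMemoryContext` stand-in exists only so the sketch runs in plain Node.js.

```javascript
// Hypothetical helper: read a global, falling back to (and storing) a default
// when the stored value fails validation. In a function node pass `global`.
function getGlobalOr(ctx, key, isValid, makeDefault) {
  const value = ctx.get(key);
  if (isValid(value)) return value;
  const fresh = makeDefault(); // factory, so each call gets a new object
  ctx.set(key, fresh);
  return fresh;
}

// Stand-in for Node-RED's global context so the example runs outside Node-RED
function makeMemoryContext() {
  const store = new Map();
  return { get: (k) => store.get(k), set: (k, v) => { store.set(k, v); } };
}

const ctx = makeMemoryContext();
ctx.set("kpiBuffer", "corrupted"); // wrong type, e.g. from a bad edit
const buffer = getGlobalOr(ctx, "kpiBuffer", Array.isArray, () => []);
buffer.push({ oee: 70 });
ctx.set("kpiBuffer", buffer); // write back explicitly, as in the GOOD example
console.log(ctx.get("kpiBuffer").length); // → 1
```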

5. Break Down Complex Changes

For Phase 3.1 (Continuous KPI Updates), ask in sequence:

  1. "Show me the current return statements in Machine Cycles function"
  2. "Modify the function to add a third output for KPI trigger"
  3. "Update all return statements to include kpiTrigger message"
  4. "Show me how to wire the third output to Calculate KPIs node"

6. Request Testing/Debugging Code

Ask LLM to include:

  • Debug logging: node.warn('[KPI] Buffer size: ' + buffer.length);
  • State validation: Check that variables have expected values
  • Error messages: Descriptive messages for troubleshooting
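Here is a sketch of the state-validation idea: a hypothetical `findStateProblems` helper that checks the globals this plan relies on. It returns readable problem strings, which can then be logged with `node.warn()`.

The expected types follow the Init node in the Quick Reference. Treating `trackingEnabled` as a boolean is an assumption.

```javascript
// Hypothetical debug helper: list globals that are missing or have an
// unexpected type. `ctx` is Node-RED's `global` (only ctx.get is used).
const EXPECTED_GLOBALS = {
  kpiBuffer: (v) => Array.isArray(v),
  lastKPIRecordTime: (v) => typeof v === "number" && Number.isFinite(v),
  lastMachineCycleTime: (v) => typeof v === "number" && Number.isFinite(v),
  trackingEnabled: (v) => typeof v === "boolean",
};

function findStateProblems(ctx) {
  const problems = [];
  for (const [key, isValid] of Object.entries(EXPECTED_GLOBALS)) {
    const value = ctx.get(key);
    if (!isValid(value)) {
      problems.push(`[STATE] ${key} has unexpected value: ${JSON.stringify(value)}`);
    }
  }
  return problems;
}

// In a function node: findStateProblems(global).forEach((p) => node.warn(p));
const demo = new Map([["kpiBuffer", []], ["lastKPIRecordTime", 0]]);
const demoCtx = { get: (k) => demo.get(k) };
console.log(findStateProblems(demoCtx));
```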

7. Validate Against Node-RED Constraints

Remind LLM of Node-RED specifics:

  • "This code runs inside a Node-RED function node, not as a standalone script"
  • "Global context uses global.get/set, not regular variables"
  • "Return the msg object (or call node.send()) to pass it to the next node"
  • "Use node.warn() for logging, not console.log()"

8. Phase-by-Phase Verification

After each LLM response:

  1. Verify the code matches the plan
  2. Check for initialization logic
  3. Confirm output structure matches wiring
  4. Ask: "What edge cases does this handle?"

9. Example: Perfect LLM Prompt for Phase 2.1

I need to implement KPI throttling with averaging in Node-RED.

Context:
- Function node: "Record KPI History"
- Input: msg.kpis object with {oee, availability, performance, quality}
- Output: Averaged KPI values sent to Graphs template (or null if not ready to record)

Global variables needed:
1. kpiBuffer (Array): Accumulates KPI snapshots. Initialize to [] if null.
2. lastKPIRecordTime (Number): Last timestamp when history was recorded. Initialize to (Date.now() - 60000) if null for immediate first recording.

Requirements:
- Accumulate incoming KPIs in kpiBuffer
- Every 60 seconds (60000ms), calculate average of all buffer values
- Send averaged KPIs to output
- Clear buffer after sending
- If fewer than 60 seconds have passed since the last record, return null (don't send)

Please write the complete function with:
- Robust initialization (check and set defaults)
- Debug logging (buffer size, time until next record)
- Comments explaining each section
- Edge case handling (empty buffer, first run)
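For reference, here is one possible shape of the function that prompt asks for, as a sketch rather than the definitive implementation. The 60-second period and the two global names come from the plan; everything else is an assumption.

- It is written against stand-ins for Node-RED's `global` and `node` objects so it runs in plain Node.js. The `makeMemoryContext` helper and the logger are hypothetical.
- Inside a function node you would drop the stand-ins and call it with `global`, `Date.now()`, and `node.warn`.
- Missing metric values are skipped when averaging rather than counted as zero.

```javascript
// Sketch of "Record KPI History" (hypothetical; see assumptions above).
// In a function node the body would end with:
//   return recordKpiHistory(msg, global, Date.now(), (s) => node.warn(s));
const RECORD_INTERVAL_MS = 60000;
const KPI_KEYS = ["oee", "availability", "performance", "quality"];

function recordKpiHistory(msg, ctx, now, log) {
  // Edge case: nothing to record
  if (!msg || !msg.kpis || typeof msg.kpis !== "object") return null;

  // Robust initialization of both globals
  let buffer = ctx.get("kpiBuffer");
  if (!Array.isArray(buffer)) buffer = [];
  let lastRecord = ctx.get("lastKPIRecordTime");
  if (typeof lastRecord !== "number") {
    lastRecord = now - RECORD_INTERVAL_MS; // first run records immediately
  }

  // Accumulate this snapshot
  const sample = { timestamp: now };
  for (const key of KPI_KEYS) sample[key] = msg.kpis[key];
  buffer.push(sample);

  const elapsed = now - lastRecord;
  if (elapsed < RECORD_INTERVAL_MS) {
    ctx.set("kpiBuffer", buffer);
    ctx.set("lastKPIRecordTime", lastRecord);
    log(`[KPI] Buffer size: ${buffer.length}, next record in ${RECORD_INTERVAL_MS - elapsed} ms`);
    return null; // not time to record yet
  }

  // Average each metric, ignoring missing or non-numeric values
  const avg = {};
  for (const key of KPI_KEYS) {
    const values = buffer
      .map((s) => s[key])
      .filter((v) => typeof v === "number" && Number.isFinite(v));
    avg[key] = values.length > 0 ? values.reduce((a, b) => a + b, 0) / values.length : null;
  }

  log(`[KPI] Recording average of ${buffer.length} samples`);
  ctx.set("kpiBuffer", []); // clear after sending
  ctx.set("lastKPIRecordTime", now);
  return { kpis: avg };
}

// Stand-ins so the sketch runs in plain Node.js (not Node-RED APIs)
function makeMemoryContext() {
  const store = new Map();
  return { get: (k) => store.get(k), set: (k, v) => { store.set(k, v); } };
}

const ctx = makeMemoryContext();
const log = (line) => console.log(line);
const first = recordKpiHistory({ kpis: { oee: 60, availability: 90, performance: 80, quality: 100 } }, ctx, 0, log);
const second = recordKpiHistory({ kpis: { oee: 70, availability: 100, performance: 70, quality: 100 } }, ctx, 30000, log);
const third = recordKpiHistory({ kpis: { oee: 80, availability: 100, performance: 80, quality: 100 } }, ctx, 60000, log);
console.log(first.kpis.oee, second, third.kpis.oee); // → 60 null 75
```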

10. Common Pitfalls to Avoid

  1. Assuming LLM knows your flow structure - Always describe node connections
  2. Not specifying Node-RED context - LLM might give generic JavaScript instead
  3. Requesting too many changes at once - Break into single-phase requests
  4. Forgetting to mention global variable persistence - Specify initialization needs
  5. Not asking for defensive code - Request null checks and type validation
  6. Vague success criteria - Define exactly what "working" means


Quick Reference: Key Code Snippets

1. Init Node (Run on Deploy)

// Initialize Global Variables - Inject Once on Deploy
// (trigger from an Inject node set to "Inject once after 0.1 seconds")
node.warn('[INIT] Initializing global variables');

// Type checks, not just truthiness, so corrupted values are also repaired
if (!Array.isArray(global.get("kpiBuffer"))) global.set("kpiBuffer", []);
if (typeof global.get("lastKPIRecordTime") !== "number") global.set("lastKPIRecordTime", Date.now() - 60000);
if (typeof global.get("lastMachineCycleTime") !== "number") global.set("lastMachineCycleTime", Date.now());
if (!global.get("lastKPIValues")) global.set("lastKPIValues", {});

node.warn('[INIT] Complete');
return msg;

2. Machine Cycles - Add to Final Return

// Update last machine cycle time when a successful cycle occurs
if (trackingEnabled && dbMsg) {
    global.set("lastMachineCycleTime", Date.now());
}
return [dbMsg, stateMsg, kpiTrigger];

3. Calculate KPIs - Multi-Source Guard

const trackingEnabled = global.get("trackingEnabled");
const activeOrder = global.get("activeOrder") || {};
if (!trackingEnabled || !activeOrder.id) return null;
// ... rest of calculation

4. Work Order START Button - Clear Buffer

if (action === "start-tracking") {
    global.set("trackingEnabled", true);
    global.set("kpiBuffer", []); // Clear stale data
    global.set("lastKPIRecordTime", Date.now() - 60000);
    // ... send state update
}

5. Graphs Template - Combined Init

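// initFilters(), createCharts(), updateCharts() and currentRange are defined
// elsewhere in this template
let chartsInitialized = false;

function initChartsOnce() {
  if (chartsInitialized) return;
  initFilters();
  createCharts(currentRange);
  chartsInitialized = true;
}

// Data-driven init: build charts when the first KPI message arrives, then update
scope.$watch('msg', function(msg) {
  if (msg && msg.payload && msg.payload.kpis) {
    initChartsOnce();
    updateCharts(msg);
  }
});

// Fallback: build the charts anyway if no data arrives within 5 seconds
const fallbackTimer = setTimeout(initChartsOnce, 5000);
scope.$on('$destroy', function() { clearTimeout(fallbackTimer); });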

Final Notes

  1. Back Up First: Always back up flows.json before starting each phase
  2. Test Incrementally: Don't skip testing between phases
  3. Document Changes: Note any deviations from plan
  4. Monitor Logs: Watch Node-RED debug output during testing
  5. Clear Cache: Browser cache can mask issues
  6. Use LLM Strategically: Follow the best practices above for precise, working code

If you encounter issues not covered in this plan, STOP and ask for help before proceeding.