mdares/Projects-plastic

Marcelo b66cb97f16 MVP

2025-11-28 09:11:59 -06:00

OEE Dashboard Fix Plan

Comprehensive Strategy for Resolving All Issues

Executive Summary

We have identified 5 distinct issues affecting your OEE dashboard. This plan addresses each systematically, ordered by priority based on impact, risk, and dependencies.

Estimated Total Implementation Time: 2-3 hours
Recommended Approach: Sequential implementation with testing between each phase

Key Improvements in This Updated Plan

This plan has been enhanced based on critical friction point analysis for Node-RED environments:

Global Context Persistence - Added robust initialization logic for all global variables to handle Node-RED restarts and deploys without data loss or spikes
State Synchronization (Push + Pull Model) - Enhanced START/STOP button state tracking with both push notifications AND pull requests to handle mid-production dashboard loads
Angular Timing Issues - Replaced brittle fixed timeouts with data-driven initialization and polling fallback for reliable chart loading across all system speeds
Dual-Path KPI Architecture - Implemented separate paths for live display (real-time, unthrottled) and historical graphs (averaged, smooth) to eliminate the stale-data vs jerky-graphs trade-off
Time-Based Availability Logic - Enhanced availability calculation with configurable time thresholds to distinguish brief pauses from legitimate shutdowns
LLM Implementation Guide - Added comprehensive best practices section for working with LLMs to implement this plan with precise, defensive code

Based on final review, these critical refinements have been integrated:

Clear Buffer on Production START - Prevents stale data from skewing averages if Node-RED restarts mid-production and context is restored from disk
Consolidated lastMachineCycleTime Updates - Now updated ONLY in Machine Cycles function (not Calculate KPIs) to maintain clean "machine pulse" signal, initialized to Date.now() on startup to prevent immediate 0% availability
Combined Initialization Strategy - Graphs now use BOTH data-driven initialization (fast when production is running) AND 5-second safety timeout (for idle machine scenarios)
Multi-Source KPI Calculation - Calculate KPIs now explicitly handles triggers from both Machine Cycles (continuous) and Scrap Submission (event-based) with proper guards
Complete Init Node - Added production-ready initialization function with all global variables (kpiBuffer, lastKPIRecordTime, lastMachineCycleTime, lastKPIValues) properly initialized with correct default values and logging

Issue Breakdown & Root Causes

Issue 1: KPI Updates Only on Scrap Submission

Symptom: KPIs stay static during production and only update when scrap is submitted or START/STOP is clicked
Root Cause:

Machine Cycles function has multiple return paths with [null, ...] outputs
Output to Calculate KPIs (output port 2) only happens in specific conditions
When trackingEnabled is false or no active order, KPI calculation is skipped
Critical line: if (!trackingEnabled) return [null, stateMsg]; prevents KPI updates

Sub-issue 1b: START/STOP Button State

Button state not persisting because UI doesn't track trackingEnabled global variable
Home template needs to watch for tracking state changes

Issue 2: Graphs Empty on First Load, Sidebar Broken

Symptom: Graphs tab shows blank, navigation doesn't work until refresh
Root Causes:

Timing Issue: Charts created before Angular/scope is fully ready
Scope Isolation: scope.gotoTab might not be accessible immediately
Data Race: Charts created before first KPI data arrives

Why refresh works: Second load benefits from cached scope and existing data

Issue 3: Availability & OEE Drop to 0%

Symptom: Metrics incorrectly show 0% during active production
Root Cause:

Calculate KPIs function has logic that sets availability to 0 when certain conditions aren't met
Need to verify: When does trackingEnabled check fail?
Hypothesis: When production is running but tracking flag isn't properly set, availability defaults to 0

Issue 4: Graph Updates Too Frequent/Jerky

Symptom: Data points recorded too often, causing choppy visualization
Root Cause:

Record KPI History is called on EVERY Calculate KPIs output
With machine cycles happening every ~1 second, KPIs recorded every second
Need time-based throttling (1-minute intervals) instead of event-based recording

Issue 5: Time Range Filters Not Working

Symptom: Shift/Day/Week/Month/Year buttons don't change graph display
Root Cause:

build(metric, range) function receives range parameter but ignores it
Function always returns ALL data from realtimeData[metric]
Need to filter data based on selected time range

Fix Plan - Phased Approach

PHASE 1: Low-Risk Quick Wins ⚡

Estimated Time: 30 minutes
Risk Level: LOW

1.1 Fix Graph Filters (Issue 5)

Files: projects/Plastico/flows.json → Graphs Template

Changes:

// BEFORE
function build(metric, range){
  const arr = realtimeData[metric];
  if (!arr || arr.length === 0) return [];
  return arr.map(d=>({x:d.timestamp, y:d.value}));
}

// AFTER
function build(metric, range){
  const arr = realtimeData[metric];
  if (!arr || arr.length === 0) return [];

  // Calculate time cutoff based on range
  const now = Date.now();
  const cutoffs = {
    shift: 8 * 60 * 60 * 1000,      // 8 hours
    day: 24 * 60 * 60 * 1000,       // 24 hours
    week: 7 * 24 * 60 * 60 * 1000,  // 7 days
    month: 30 * 24 * 60 * 60 * 1000, // 30 days
    year: 365 * 24 * 60 * 60 * 1000  // 365 days
  };

  const cutoffTime = now - (cutoffs[range] || cutoffs.shift);

  // Filter data to selected time range
  return arr
    .filter(d => d.timestamp >= cutoffTime)
    .map(d => ({x: d.timestamp, y: d.value}));
}
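
To check the filter outside the dashboard, the same cutoff logic can be pulled into a pure function that takes the current time as a parameter. This is only a sketch: `filterByRange` and `RANGE_MS` are illustrative names that do not exist in the Graphs template, and the sample data is made up.

```javascript
// Hypothetical standalone version of the range filter used by build().
const HOUR_MS = 60 * 60 * 1000;
const RANGE_MS = {
  shift: 8 * HOUR_MS,
  day: 24 * HOUR_MS,
  week: 7 * 24 * HOUR_MS,
  month: 30 * 24 * HOUR_MS,
  year: 365 * 24 * HOUR_MS
};

// Returns chart points ({x, y}) no older than the selected range.
// Unknown ranges fall back to "shift", matching build().
function filterByRange(points, range, now = Date.now()) {
  if (!Array.isArray(points) || points.length === 0) return [];
  const cutoff = now - (RANGE_MS[range] || RANGE_MS.shift);
  return points
    .filter(p => p.timestamp >= cutoff)
    .map(p => ({ x: p.timestamp, y: p.value }));
}

// Made-up samples: one is 3 days old, one is 2 hours old.
const now = Date.parse("2025-11-28T12:00:00Z");
const samples = [
  { timestamp: now - 3 * 24 * HOUR_MS, value: 71.2 },
  { timestamp: now - 2 * HOUR_MS, value: 83.5 }
];
const shiftPoints = filterByRange(samples, "shift", now);
const weekPoints = filterByRange(samples, "week", now);
console.log(shiftPoints.length, weekPoints.length); // prints: 1 2
```

Passing `now` explicitly keeps the function deterministic, so the same check can be repeated for each filter button.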

Testing:

Click each filter button
Verify data range changes in charts
Check that no errors occur

Potential Issues:

If no data exists in selected range, chart might be empty (expected behavior)

Rollback: Easy - revert to original build() function

1.2 Fix Empty Graphs on First Load (Issue 2)

Files: projects/Plastico/flows.json → Graphs Template

Strategy: Use data-driven initialization instead of fixed timeout for reliability

Changes:

A) Combined Data-Driven + Safety Timeout (RECOMMENDED)

// BEFORE
setTimeout(()=>{
  initFilters();
  createCharts(currentRange);
},300);

// AFTER - Wait for first data message OR timeout
let chartsInitialized = false;

scope.$watch('msg', function(msg) {
  // Initialize on first KPI data arrival
  if (msg && msg.payload && msg.payload.kpis && !chartsInitialized) {
    // Scope and data are both ready
    initFilters();
    createCharts(currentRange);
    chartsInitialized = true;
    console.log('[Graphs] Charts initialized via data-driven approach');
  }

  // Update charts if already initialized
  if (chartsInitialized && msg && msg.payload && msg.payload.kpis) {
    updateCharts(msg);
  }
});

// ADDED: Safety timer for when machine is idle (no KPI messages flowing)
setTimeout(() => {
  if (!chartsInitialized) {
    console.warn('[Graphs] Charts initialized via safety timer (machine idle)');
    initFilters();
    createCharts(currentRange);
    chartsInitialized = true;
  }
}, 5000); // 5 seconds grace period for KPI messages

Why Both?

Data-driven: Ensures charts initialize as soon as data is available (fast, reliable)
Safety timeout: Handles "dashboard loaded but machine is idle" scenario (no KPI messages)
Together they cover both active production and idle machine scenarios
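
The two triggers share one rule: whichever fires first initializes the charts, and the other becomes a no-op. A minimal sketch of that guard, separated from Angular so it can be run on its own, might look like this. `createOnceInitializer` is a hypothetical helper, not something in the existing template.

```javascript
// Hypothetical helper: wraps an init function so it runs at most once,
// no matter which trigger (first data message or safety timer) fires first.
function createOnceInitializer(initFn) {
  let initialized = false;
  return function initOnce(source) {
    if (initialized) return false; // already done, ignore later triggers
    initialized = true;
    initFn(source);
    return true;
  };
}

// Usage sketch: both triggers call the same guarded function.
const calls = [];
const initCharts = createOnceInitializer(source => calls.push(source));

const ranOnData = initCharts("data");   // e.g. called from scope.$watch
const ranOnTimer = initCharts("timer"); // e.g. called from the setTimeout
console.log(calls.length, ranOnData, ranOnTimer);
```

In the template, `initFn` would be the function that calls `initFilters()` and `createCharts(currentRange)`.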

B) Fallback: Polling with timeout (if data-driven doesn't work)

function initWhenReady(attempts = 0) {
  const oeeEl = document.getElementById("chart-oee");
  const availEl = document.getElementById("chart-availability");

  if (oeeEl && availEl && scope.gotoTab) {
    // Both DOM and scope ready
    initFilters();
    createCharts(currentRange);
  } else if (attempts < 20) {
    // Retry every 100ms, max 2 seconds
    setTimeout(() => initWhenReady(attempts + 1), 100);
  } else {
    console.error("[Graphs] Failed to initialize charts after 2 seconds");
  }
}

// Start polling on load
initWhenReady();

C) Ensure scope.gotoTab is properly bound

// BEFORE
(function(scope){
  scope.gotoTab = t => scope.send({ui_control:{tab:t}});
})(scope);

// AFTER
(function(s){
  if (!s.gotoTab) {
    s.gotoTab = function(t) {
      s.send({ui_control: {tab: t}});
    };
  }
})(scope);

D) Add defensive chart creation with retry

function createCharts(range, attempt = 0){
  // Ensure DOM elements exist
  const oeeEl = document.getElementById("chart-oee");
  const availEl = document.getElementById("chart-availability");

  if (!oeeEl || !availEl) {
    if (attempt >= 25) {
      // Give up after ~5 seconds instead of retrying forever
      console.error("[Graphs] Chart elements never appeared, giving up");
      return;
    }
    console.warn("[Graphs] Chart elements not ready, retrying...");
    setTimeout(() => createCharts(range, attempt + 1), 200);
    return;
  }

  // ... rest of existing chart creation logic
}

Testing:

Clear browser cache
Navigate to Graphs tab from fresh load
Test sidebar navigation
Verify charts appear without refresh
Test on slow network/system

Potential Issues:

Data-driven initialization on its own requires KPI messages to be flowing
If no production is running, charts initialize only through the 5-second safety timer in Option A

Recommended Implementation:

Start with data-driven approach (Option A)
Add polling fallback (Option B) as safety net
Implement defensive checks (Options C & D)

Rollback: Easy - revert to original setTimeout logic

PHASE 2: Medium-Risk Data Flow Improvements 🔧

Estimated Time: 45 minutes
Risk Level: MEDIUM

2.1 Implement KPI Update Throttling with Dual-Path Architecture (Issue 4)

Files:

projects/Plastico/flows.json → Calculate KPIs function (add second output)
projects/Plastico/flows.json → Record KPI History function (add averaging)

Strategy: Dual-path updates solve the stale display vs jerky graphs trade-off

Path 1: Unthrottled live KPIs to Home Template for real-time display
Path 2: Throttled/averaged KPIs to Record History for smooth graphs

Part A: Modify Calculate KPIs to Output on Two Paths

// At the end of Calculate KPIs function

// Prepare the KPI message
const kpiMsg = {
  topic: "kpis",
  payload: {
    timestamp: Date.now(),
    kpis: {
      oee: msg.kpis.oee,
      availability: msg.kpis.availability,
      performance: msg.kpis.performance,
      quality: msg.kpis.quality
    }
  }
};

// Return to TWO outputs:
// Output 1: Live KPI to Home Template (real-time, unthrottled)
// Output 2: KPI to Record History (will be averaged/throttled)
return [
  kpiMsg,           // Path 1: Live display
  RED.util.cloneMessage(kpiMsg)  // Path 2: History recording (deep clone; a spread copy would still share payload)
];

Wiring Changes:

Calculate KPIs node needs 2 outputs (add one more)
Output 1 → Home Template (existing connection)
Output 2 → Record KPI History (new connection)

Part B: Add Averaging Logic to Record KPI History

// Complete Record KPI History function with robust initialization

// ========== INITIALIZATION ==========
// Initialize buffer
let buffer = global.get("kpiBuffer");
if (!buffer || !Array.isArray(buffer)) {
  buffer = [];
  global.set("kpiBuffer", buffer);
  node.warn('[KPI History] Initialized kpiBuffer');
}

// Initialize last record time
let lastRecordTime = global.get("lastKPIRecordTime");
if (!lastRecordTime || typeof lastRecordTime !== 'number') {
  // Set to 1 minute ago to ensure immediate recording on startup
  lastRecordTime = Date.now() - 60000;
  global.set("lastKPIRecordTime", lastRecordTime);
  node.warn('[KPI History] Initialized lastKPIRecordTime');
}

// ========== ACCUMULATE ==========
const kpis = msg.payload && msg.payload.kpis; // Guard: payload may be missing
if (!kpis) {
  node.warn('[KPI History] No KPIs in message, skipping');
  return null;
}

buffer.push({
  timestamp: Date.now(),
  oee: kpis.oee || 0,
  availability: kpis.availability || 0,
  performance: kpis.performance || 0,
  quality: kpis.quality || 0
});

// Prevent buffer from growing too large (safety limit)
if (buffer.length > 100) {
  buffer = buffer.slice(-60); // Keep last 60 entries
  node.warn('[KPI History] Buffer exceeded 100 entries, trimmed to 60');
}

global.set("kpiBuffer", buffer);

// ========== CHECK IF TIME TO RECORD ==========
const now = Date.now();
const timeSinceLastRecord = now - lastRecordTime;
const ONE_MINUTE = 60 * 1000;

if (timeSinceLastRecord < ONE_MINUTE) {
  // Not time to record yet
  const secondsRemaining = Math.ceil((ONE_MINUTE - timeSinceLastRecord) / 1000);
  // Debug log (can remove in production)
  // node.warn(`[KPI History] Buffer: ${buffer.length} entries, recording in ${secondsRemaining}s`);
  return null; // Don't send to charts yet
}

// ========== CALCULATE AVERAGES ==========
if (buffer.length === 0) {
  node.warn('[KPI History] Buffer empty at recording time, skipping');
  return null;
}

const avg = {
  oee: buffer.reduce((sum, d) => sum + d.oee, 0) / buffer.length,
  availability: buffer.reduce((sum, d) => sum + d.availability, 0) / buffer.length,
  performance: buffer.reduce((sum, d) => sum + d.performance, 0) / buffer.length,
  quality: buffer.reduce((sum, d) => sum + d.quality, 0) / buffer.length
};

node.warn(`[KPI History] Recording averaged KPIs from ${buffer.length} samples: OEE=${avg.oee.toFixed(1)}%`);

// ========== RECORD TO HISTORY ==========
// Update global state
global.set("lastKPIRecordTime", now);
global.set("kpiBuffer", []); // Clear buffer

// Send averaged values to graphs and database
return {
  topic: "kpi-history",
  payload: {
    timestamp: now,
    kpis: {
      oee: Math.round(avg.oee * 10) / 10,           // Round to 1 decimal
      availability: Math.round(avg.availability * 10) / 10,
      performance: Math.round(avg.performance * 10) / 10,
      quality: Math.round(avg.quality * 10) / 10
    },
    sampleCount: buffer.length  // Metadata for debugging
  }
};

Recommendation: This dual-path approach provides the best of both worlds
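
The accumulate-then-average step can be tried without Node-RED by keeping the buffer in a closure and passing the clock in as an argument. The sketch below is an in-memory approximation of the Record KPI History logic, not a replacement for it. `createKpiAverager` and its parameters are hypothetical names, and the sample values are invented.

```javascript
// Hypothetical in-memory version of the averaging step, with the clock
// passed in so the behavior can be checked without waiting a minute.
function createKpiAverager(intervalMs, startTime) {
  let buffer = [];
  let lastRecordTime = startTime;
  const round1 = v => Math.round(v * 10) / 10;

  // Adds one sample; returns an averaged record once intervalMs has
  // elapsed since the last record, otherwise null.
  return function add(sample, now) {
    buffer.push(sample);
    if (now - lastRecordTime < intervalMs) return null;

    const n = buffer.length;
    const mean = key =>
      round1(buffer.reduce((sum, s) => sum + (s[key] || 0), 0) / n);
    const record = {
      timestamp: now,
      kpis: {
        oee: mean("oee"),
        availability: mean("availability"),
        performance: mean("performance"),
        quality: mean("quality")
      },
      sampleCount: n
    };
    buffer = [];          // start a fresh window
    lastRecordTime = now;
    return record;
  };
}

// Three samples 20 s apart; only the third crosses the 60 s interval.
const add = createKpiAverager(60000, 0);
const first = add({ oee: 80, availability: 90, performance: 90, quality: 99 }, 20000);
const second = add({ oee: 70, availability: 85, performance: 85, quality: 97 }, 40000);
const third = add({ oee: 75, availability: 80, performance: 95, quality: 98 }, 60000);
console.log(first, second, third.kpis.oee, third.sampleCount); // prints: null null 75 3
```

The first two calls return `null` (nothing is sent to the graphs), and the third returns one averaged point covering all three samples.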

Testing:

Start production
Observe KPI update frequency in graphs
Verify updates occur approximately every 60 seconds
Check that no spikes/gaps appear in data

Potential Issues:

First data point might take up to 1 minute to appear
Rapid production changes might not be immediately visible
Buffer could grow large if production runs without recording

Mitigation:

Set buffer max size (e.g., 100 entries)
Force record on production stop/start

Rollback: Medium difficulty - remove throttling logic, clear global variables

PHASE 3: High-Risk Core Logic Fixes ⚠️

Estimated Time: 60 minutes
Risk Level: HIGH

⚠️ CRITICAL: Backup flows.json before proceeding

3.1 Fix KPI Continuous Updates (Issue 1)

Files: projects/Plastico/flows.json → Machine Cycles function

Problem: Machine Cycles has multiple early returns that skip KPI calculation

Current Logic:

// Line ~36: No active order
if (!activeOrder || !activeOrder.id || cavities <= 0) {
    return [null, stateMsg];  // ❌ Skips KPI calculation
}

// Line ~43: Tracking not enabled
if (!trackingEnabled) {
    return [null, stateMsg];  // ❌ Skips KPI calculation
}

Solution Options:

Option A: Always Calculate KPIs (Recommended)

// Always prepare a message for Calculate KPIs on output 3 (third array element)
const kpiTrigger = { _triggerKPI: true };

// Change all returns to include kpiTrigger
if (!activeOrder || !activeOrder.id || cavities <= 0) {
    return [null, stateMsg, kpiTrigger];  // ✓ Triggers KPI calculation
}

if (!trackingEnabled) {
    return [null, stateMsg, kpiTrigger];  // ✓ Triggers KPI calculation
}

// Update last machine cycle time when a successful cycle occurs
// This is used for time-based availability logic
if (trackingEnabled && dbMsg) {
    // dbMsg being non-null implies a cycle was recorded
    global.set("lastMachineCycleTime", Date.now());
}

// ... final return
return [dbMsg, stateMsg, kpiTrigger];

Critical: The lastMachineCycleTime update must happen ONLY in Machine Cycles function to maintain a clean "machine pulse" signal separate from KPI calculation triggers.

Wire Configuration Change:

Add third output wire to Machine Cycles node
Connect output 3 → Calculate KPIs

Option B: Calculate KPIs in Parallel (Alternative)

Add an inject node that triggers Calculate KPIs every 5 seconds
Less coupled, but might calculate with stale data

Recommendation: Option A - ensures KPIs calculated with real-time data
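
One way to make sure no return path forgets the trigger is to route every return through a small helper. This is a sketch under assumptions: `buildOutputs` is a hypothetical name, and the `"cycle"` topic is a placeholder, since the plan does not show what the database message looks like.

```javascript
// Hypothetical helper for the Machine Cycles function: every return path
// goes through it, so output 3 always carries a KPI trigger.
function buildOutputs(dbMsg, stateMsg) {
  const kpiTrigger = { _triggerKPI: true };
  return [dbMsg || null, stateMsg, kpiTrigger];
}

// Early-return path (e.g. no active order): nothing for the database.
const idle = buildOutputs(null, { topic: "machineStatus" });
// Normal path: a cycle record still goes to output 1 as before.
const running = buildOutputs({ topic: "cycle" }, { topic: "machineStatus" });
console.log(idle.length, idle[0], running[2]._triggerKPI); // prints: 3 null true
```

In the function node, each `return [null, stateMsg];` would then become `return buildOutputs(null, stateMsg);`.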

Testing:

Start production with START button
Observe KPI values on Home page
Verify continuous updates (every ~1 second before throttling)
Check that scrap submission still works
Test production stop/start

Potential Issues:

Calculate KPIs might need to handle cases with no active order
Could calculate KPIs unnecessarily when machine is idle
Performance impact if calculating too frequently

Mitigation:

Add guards in Calculate KPIs to handle null/undefined inputs
Implement Phase 2 throttling first to reduce calculation frequency
Monitor system performance

CRITICAL: Calculate KPIs Multi-Source Handling

The Calculate KPIs function will now receive triggers from TWO sources:

Machine Cycles (continuous, real-time) - via new output 3
Scrap Submission (event-based) - existing connection

Required Change in Calculate KPIs:

// At the start of Calculate KPIs function
// Must handle both trigger types

// The function should execute regardless of message content
// as long as it receives ANY trigger

const trackingEnabled = global.get("trackingEnabled");
const activeOrder = global.get("activeOrder") || {};
const productionStartTime = global.get("productionStartTime");

// Guard against missing critical data
if (!trackingEnabled || !activeOrder.id) {
  // Can't calculate meaningful KPIs without tracking or active order
  // But don't error - just skip calculation
  return null;
}

// ... rest of existing KPI calculation logic
// This logic will now run for BOTH continuous and event-based triggers

This ensures availability and OEE calculations work correctly whether triggered by machine cycles or scrap submission.
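
The plan does not reproduce the existing KPI formula. For reference, OEE is conventionally the product of availability, performance and quality. The sketch below assumes all three are percentages (0-100), which is what the one-decimal rounding in Record KPI History suggests. `combineOee` is a hypothetical name, and the actual Calculate KPIs function may differ.

```javascript
// Hypothetical helper: combines three percentage KPIs into an OEE percentage.
// Missing or non-numeric inputs are treated as 0 rather than throwing.
function combineOee(availability, performance, quality) {
  const pct = v => Number(v) || 0;
  const oee =
    (pct(availability) / 100) * (pct(performance) / 100) * (pct(quality) / 100) * 100;
  return Math.round(oee * 10) / 10; // one decimal, as in the history records
}

console.log(combineOee(90, 95, 99));        // prints: 84.6
console.log(combineOee(0, 95, 99));         // prints: 0
console.log(combineOee(undefined, 95, 99)); // prints: 0
```

Because the three factors are multiplied, a 0% availability always forces OEE to 0%, which is why a wrongly zeroed availability (Issue 3) shows up in OEE as well.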

Side Effects:

Will trigger Issue 4 more severely → MUST implement Phase 2 throttling first
Database might receive more frequent updates
Global variables will change more often

Rollback: Medium difficulty - requires restoring original return statements and wire configuration

3.2 Fix Availability/OEE Drops to 0 (Issue 3)

Files: projects/Plastico/flows.json → Calculate KPIs function

Investigation Steps:

Read full Calculate KPIs function
Identify all paths that set msg.kpis.availability = 0
Add logging to track when this occurs
Understand state flow: trackingEnabled, productionStartTime, operatingTime

Hypothesis Testing:

// Add debug logging at the start
node.warn(`[KPI] trackingEnabled=${trackingEnabled}, startTime=${productionStartTime}, opTime=${operatingTime}`);

// Before setting availability to 0
if (/* condition that causes 0 */) {
    node.warn(`[KPI] Setting availability to 0 because: [reason]`);
    msg.kpis.availability = 0;
}

Likely Fix:

// BEFORE
} else {
    msg.kpis.availability = 0; // Not running
}

// AFTER
} else {
    // Check if production was recently active
    const prev = global.get("lastKPIValues") || {};
    if (prev.availability > 0 && operatingTime > 0) {
        // Maintain last availability if we have operating time
        msg.kpis.availability = prev.availability;
    } else {
        msg.kpis.availability = 0;
    }
}

// Store KPIs for next iteration
global.set("lastKPIValues", msg.kpis);

Testing:

Start production
Monitor availability values
Trigger scrap prompt
Verify availability doesn't drop to 0
Check OEE calculation

Potential Issues:

Might mask legitimate 0% availability (machine actually stopped)
Could create artificially high availability readings
State persistence might cause issues after restart

Mitigation:

Add clear conditions for when availability should legitimately be 0
Reset lastKPIValues on work order completion
Add production state tracking

Rollback: Easy if logging added first - can revert based on log analysis

3.3 Fix START/STOP Button State (Issue 1b)

Files: projects/Plastico/flows.json → Home Template

Problem: Button doesn't show correct state (STOP when production running)

Investigation:

Find button rendering logic in Home template
Check how trackingEnabled or productionStarted is tracked
Verify message handler receives state updates

Changes:

// In Home Template scope.$watch
if (msg.topic === 'machineStatus') {
  window.machineOnline = msg.payload.machineOnline;
  window.productionStarted = msg.payload.productionStarted;

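  // NEW: Track tracking state for button display
  window.trackingEnabled = msg.payload.trackingEnabled || window.productionStarted;
  // ng-show evaluates against the template scope, so mirror the flag there
  scope.trackingEnabled = window.trackingEnabled;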

  scope.renderDashboard();
  return;
}

Button HTML Update:

<!-- BEFORE -->
<button ng-click="handleStart()">START</button>

<!-- AFTER -->
<button ng-click="handleStart()" ng-show="!trackingEnabled">START</button>
<button ng-click="handleStop()" ng-show="trackingEnabled" class="stop-btn">STOP</button>

Backend Update (Work Order buttons):

// When START clicked, also set trackingEnabled flag
if (action === "start-tracking") {
    global.set("trackingEnabled", true);

    // CRITICAL: Clear KPI buffer on production start
    // Prevents stale data from skewing averages if Node-RED was restarted mid-production
    global.set("kpiBuffer", []);
    node.warn('[START] Cleared kpiBuffer for fresh production run');

    // Optional: Reset last record time to ensure immediate data point
    global.set("lastKPIRecordTime", Date.now() - 60000);

    // Send state update to UI
    const stateMsg = {
        topic: "machineStatus",
        payload: {
            machineOnline: true,
            productionStarted: true,
            trackingEnabled: true
        }
    };
    // ... send stateMsg to Home template
}

Why Clear Buffer on START: If Node-RED restarts during a production run and context is restored from disk, the kpiBuffer might contain stale data from before the restart. When production resumes, new data would be mixed with old data, skewing the averages. Clearing on START ensures a clean slate for each production session.

Testing:

Load dashboard
Start work order
Verify START button changes to STOP
Click STOP (if implemented)
Verify button changes back to START

Potential Issues:

Need to implement STOP button handler if it doesn't exist
State sync between backend and frontend
Button might flicker during state transitions
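
If no STOP handler exists yet, one possible shape is a single function that handles both actions and always answers with the resulting state. This is a sketch under assumptions: `handleTrackingAction` is a hypothetical name, `"stop-tracking"` is an assumed action name mirroring `"start-tracking"`, and `ctx` stands in for Node-RED's global context (anything with `get`/`set`; a `Map` is used here). Treating `productionStarted` as equal to the tracking flag is also an assumption.

```javascript
// Hypothetical tracking handler covering START and STOP.
function handleTrackingAction(action, ctx, now = Date.now()) {
  if (action === "start-tracking") {
    ctx.set("trackingEnabled", true);
    ctx.set("kpiBuffer", []);                  // fresh averages per run
    ctx.set("lastKPIRecordTime", now - 60000); // allow an immediate data point
  } else if (action === "stop-tracking") {
    ctx.set("trackingEnabled", false);
  } else {
    return null; // unrelated action: send nothing
  }

  // Always report the resulting state so the UI can switch buttons.
  const tracking = ctx.get("trackingEnabled");
  return {
    topic: "machineStatus",
    payload: { productionStarted: tracking, trackingEnabled: tracking }
  };
}

// A Map has get/set, so it can stand in for the global context here.
const ctx = new Map();
const started = handleTrackingAction("start-tracking", ctx, 120000);
const stopped = handleTrackingAction("stop-tracking", ctx, 180000);
console.log(started.payload.trackingEnabled, stopped.payload.trackingEnabled); // prints: true false
```

Sending the state message from the same place that changes the flag keeps backend and frontend in step and narrows the window in which the button could flicker.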

Rollback: Easy - remove button visibility conditions

Implementation Order & Dependencies

Recommended Sequence:

Phase 1.1 - Fix Filters (Independent, low risk)
Phase 1.2 - Fix Empty Graphs (Independent, low risk)
Phase 2.1 - Add Throttling (Required before Phase 3.1)
Phase 3.2 - Fix Availability Calculation (Add logging first)
Phase 3.1 - Fix Continuous KPI Updates (Depends on throttling)
Phase 3.3 - Fix Button State (Can be done anytime)

Why This Order?

Quick wins first - Build confidence, improve UX immediately
Throttling before continuous updates - Prevent performance issues
Logging before logic changes - Understand problem before fixing
Independent fixes can run parallel - Save time

Testing Strategy

Per-Phase Testing:

Test each phase independently
Don't proceed to next phase if current fails
Keep backup of working state

Integration Testing (After All Phases):

Fresh Start Test
- Clear browser cache
- Restart Node-RED
- Load dashboard
- Navigate all tabs
Production Cycle Test
- Start new work order
- Click START
- Let run for 2-3 minutes
- Submit scrap
- Verify KPIs update
- Check graphs show data
- Test time filters
State Persistence Test
- Refresh page during production
- Verify state restores correctly
- Check button shows STOP if running
Edge Cases
- No active work order
- Machine offline
- Zero production time
- Rapid start/stop

Rollback Plan

Per-Phase Rollback:

Each phase documents its rollback procedure. In general:

Stop Node-RED

Restore flows.json from backup

cp projects/Plastico/flows.json.backup projects/Plastico/flows.json

Clear global context (if needed)

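// In a temporary function node triggered by an inject node
// (a debug node only displays messages and cannot run code)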
global.set("lastKPIRecordTime", null);
global.set("kpiBuffer", null);
global.set("lastKPIValues", null);

Restart Node-RED
Clear browser cache

Emergency Full Rollback:

# Restore from most recent backup
cp projects/Plastico/Respaldo_MVP_Complete_11_23_25.json projects/Plastico/flows.json
# Restart Node-RED
node-red-restart

Potential Roadblocks & Mitigations

Roadblock 1: Global Context Persistence on Deploy/Restart ⚠️ CRITICAL

Symptom: After Node-RED restart or deploy, throttling/averaging/availability logic breaks or shows incorrect data
Root Cause: Global variables (lastKPIRecordTime, kpiBuffer, lastKPIValues, trackingEnabled) may be reset or restored from file/memory store depending on settings.js configuration

Mitigation:

Add Robust Initialization Logic:

// In Record KPI History function - ALWAYS check and initialize
let buffer = global.get("kpiBuffer");
if (!buffer || !Array.isArray(buffer)) {
  buffer = [];
  global.set("kpiBuffer", buffer);
}

let lastRecordTime = global.get("lastKPIRecordTime");
if (!lastRecordTime || typeof lastRecordTime !== 'number') {
  // Set to 1 minute ago to ensure immediate recording on startup
  lastRecordTime = Date.now() - 60000;
  global.set("lastKPIRecordTime", lastRecordTime);
}

Create an Init Node:
- Add a dedicated "Initialize Global Variables" function node
- Trigger on deploy using an inject node (inject once, delay 0)
- Wire to all critical nodes to ensure state is set before first execution

Complete Init Node Code:

// Initialize Global Variables - Run on Deploy
node.warn('[INIT] Initializing global variables');

// KPI Buffer for averaging
if (!global.get("kpiBuffer")) {
  global.set("kpiBuffer", []);
  node.warn('[INIT] Set kpiBuffer to []');
}

// Last KPI record time - set to 1 min ago for immediate first record
if (!global.get("lastKPIRecordTime")) {
  global.set("lastKPIRecordTime", Date.now() - 60000);
  node.warn('[INIT] Set lastKPIRecordTime');
}

// Last machine cycle time - set to now to prevent immediate 0% availability
if (!global.get("lastMachineCycleTime")) {
  global.set("lastMachineCycleTime", Date.now());
  node.warn('[INIT] Set lastMachineCycleTime to prevent 0% availability on startup');
}

// Last KPI values
if (!global.get("lastKPIValues")) {
  global.set("lastKPIValues", {});
  node.warn('[INIT] Set lastKPIValues to {}');
}

node.warn('[INIT] Global variable initialization complete');
return msg;

Check settings.js:
- Verify contextStorage configuration
- Consider using file storage for persistence if using memory (default)

Testing:

Deploy changes multiple times
Restart Node-RED
Verify variables persist/initialize correctly
Check debug logs for initialization messages

Roadblock 2: State Sync Between Flow and Dashboard (Push vs Pull Model)

Symptom: START/STOP button shows wrong state when user loads dashboard mid-production
Root Cause: Relying on push model (messages sent during state changes) - if user loads page after tracking started, initial message is missed

Mitigation:

Add Pull Mechanism in Home Template:

// In Home Template initialization
(function(scope) {
  // Request current state on load
  scope.send({
    topic: "requestState",
    payload: {}
  });

  // Handle state response
  scope.$watch('msg', function(msg) {
    if (msg && msg.topic === 'currentState') {
      window.trackingEnabled = msg.payload.trackingEnabled;
      scope.trackingEnabled = msg.payload.trackingEnabled; // ng-show reads from scope
      window.productionStarted = msg.payload.productionStarted;
      window.machineOnline = msg.payload.machineOnline;
      scope.renderDashboard();
    }
    // ... rest of watch logic
  });
})(scope);

Add State Response Handler:
- Create function node that listens for requestState topic
- Responds with current global state values
- Wire to Home template
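
The response handler described above could look roughly like the following, written as a plain function so it can be run outside Node-RED. `buildStateResponse` is a hypothetical name, `ctx` stands in for the global context, and the `productionStarted` and `machineOnline` context keys are assumptions. The plan only confirms `trackingEnabled` as a global variable.

```javascript
// Hypothetical body for the "state response" function node.
function buildStateResponse(msg, ctx) {
  if (!msg || msg.topic !== "requestState") return null; // ignore other traffic

  const trackingEnabled = ctx.get("trackingEnabled") === true;
  return {
    topic: "currentState",
    payload: {
      trackingEnabled,
      // Assumed context keys; unset values fall back to safe defaults.
      productionStarted: ctx.get("productionStarted") === true || trackingEnabled,
      machineOnline: ctx.get("machineOnline") === true
    }
  };
}

// A Map has get/set, so it can stand in for the global context here.
const ctx = new Map([["trackingEnabled", true]]);
const reply = buildStateResponse({ topic: "requestState", payload: {} }, ctx);
const ignored = buildStateResponse({ topic: "kpis" }, ctx);
console.log(reply.topic, reply.payload.trackingEnabled, ignored); // prints: currentState true null
```

Inside the function node, the same logic would end with `return buildStateResponse(msg, global);`, since the global context object exposes `get` and `set`.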

Testing:

Start production
Open dashboard in new browser tab
Verify button shows STOP immediately
Test with multiple browser sessions

Roadblock 3: UI/Angular Timing Races in ui-template ⚠️ HIGH IMPACT

Symptom: Charts sometimes load, sometimes don't - the fixed timeout (300ms in the original template) is unreliable on slow systems or complex templates
Root Cause: Node-RED Dashboard uses AngularJS - digest cycle and DOM rendering timing are unpredictable

Mitigation Option A - Data-Driven Initialization (RECOMMENDED):

// Instead of fixed timeout, wait for first data
let chartsInitialized = false;

scope.$watch('msg', function(msg) {
  // KPIs arrive under msg.payload.kpis (same shape as in Phase 1.2)
  const hasKpis = msg && msg.payload && msg.payload.kpis;

  if (hasKpis && !chartsInitialized) {
    // First data arrived, scope is ready
    initFilters();
    createCharts(currentRange);
    chartsInitialized = true;
  }

  if (chartsInitialized && hasKpis) {
    updateCharts(msg);
  }
});

Mitigation Option B - Angular Lifecycle Hook:

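// Defer initialization to Angular's next digest cycle
scope.$applyAsync(function() {
  // Scope is ready here; DOM elements are usually rendered by now, but this
  // is not guaranteed, so createCharts should still check that they exist
  initFilters();
  createCharts(currentRange);
});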

Mitigation Option C - Polling with Timeout:

function initWhenReady(attempts = 0) {
  const oeeEl = document.getElementById("chart-oee");

  if (oeeEl && scope.gotoTab) {
    // Both DOM and scope ready
    initFilters();
    createCharts(currentRange);
  } else if (attempts < 20) {
    // Retry every 100ms, max 2 seconds
    setTimeout(() => initWhenReady(attempts + 1), 100);
  } else {
    console.error("Failed to initialize charts after 2 seconds");
  }
}

// Start polling
initWhenReady();

Recommendation: Use Option A for most reliable results

Roadblock 4: Throttling vs Live Display Trade-off

Symptom: With averaging, displayed KPIs are stale (up to 59 seconds old), but without averaging, graphs are jerky
Root Cause: OEE is a real-time snapshot - averaging smooths graphs but delays live feedback

Solution: Dual-Path KPI Updates

Architecture:

Path 1 (Live): Machine Cycles → Calculate KPIs → Home Template (no throttling)
Path 2 (History): Machine Cycles → Calculate KPIs → Averaging Buffer → Record History (throttled to 1 min)

Implementation:

// In Calculate KPIs function - send to TWO outputs
return [
  msg,              // Output 1: Live KPI to Home Template (unthrottled)
  { ...msg }        // Output 2: KPI to History (will be throttled)
];

In Record KPI History - add averaging logic:

// Only this node has averaging/throttling
// KPIs arrive under msg.payload.kpis (same shape as in Phase 2.1)
const kpis = msg.payload && msg.payload.kpis;
if (!kpis) return null; // Nothing to record

let buffer = global.get("kpiBuffer") || [];
buffer.push({
  timestamp: Date.now(),
  oee: kpis.oee || 0,
  availability: kpis.availability || 0,
  performance: kpis.performance || 0,
  quality: kpis.quality || 0
});

const lastRecord = global.get("lastKPIRecordTime") || 0;
const now = Date.now();

if (now - lastRecord >= 60000) {
  // Average the buffer
  const avg = {
    oee: buffer.reduce((sum, d) => sum + d.oee, 0) / buffer.length,
    // ... other metrics
  };

  // Record averaged values to history
  // Send to Graphs template
  global.set("lastKPIRecordTime", now);
  global.set("kpiBuffer", []);
  return { topic: "kpi-history", payload: { timestamp: now, kpis: avg } };
} else {
  global.set("kpiBuffer", buffer);
  return null; // Don't record yet
}

Benefits:

Live display always shows current OEE
Graphs are smooth with averaged data
No UX compromise

Roadblock 5: Availability 0% Logic Too Simplistic

Symptom: Availability drops to 0% during brief pauses (scrap submission) but also might NOT drop to 0% during legitimate stops (breaks, maintenance)
Root Cause: Using previous value without time-based threshold can't distinguish brief interruption from actual shutdown

Improved Logic:

// In Calculate KPIs function
const now = Date.now();
const lastCycleTime = global.get("lastMachineCycleTime") || now;
const timeSinceLastCycle = now - lastCycleTime;

const BRIEF_PAUSE_THRESHOLD = 5 * 60 * 1000; // 5 minutes

if (!trackingEnabled || timeSinceLastCycle > BRIEF_PAUSE_THRESHOLD) {
  // Legitimately stopped or long pause
  msg.kpis.availability = 0;
  global.set("lastKPIValues", null); // Clear history
} else if (operatingTime > 0) {
  // Calculate normally
  msg.kpis.availability = calculateAvailability(operatingTime, plannedTime);
  global.set("lastKPIValues", msg.kpis);
} else {
  // Brief pause - maintain last known value
  const prev = global.get("lastKPIValues") || {};
  msg.kpis.availability = prev.availability || 0;
}

// NOTE: lastMachineCycleTime is updated in Machine Cycles function ONLY
// This keeps the "machine pulse" signal clean and separate from KPI calculation

Configuration:

Adjust BRIEF_PAUSE_THRESHOLD based on your production environment
Consider making it configurable via dashboard setting
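
To make the threshold easier to tune and test, the branching above can be pulled into a pure function that takes the threshold as a parameter. This is a sketch under assumptions: `resolveAvailability` is a hypothetical name, and the availability formula (operating time divided by planned time, as a percentage capped at 100) stands in for whatever `calculateAvailability` does in your flow.

```javascript
const DEFAULT_PAUSE_THRESHOLD_MS = 5 * 60 * 1000; // 5 minutes, as in the plan

// Hypothetical pure version of the availability decision.
function resolveAvailability({
  trackingEnabled,
  msSinceLastCycle,
  operatingTime,
  plannedTime,
  lastAvailability,
  thresholdMs = DEFAULT_PAUSE_THRESHOLD_MS
}) {
  // Legitimately stopped, or paused longer than the threshold
  if (!trackingEnabled || msSinceLastCycle > thresholdMs) {
    return 0;
  }
  // Normal case. Assumed formula: operating / planned, as a percentage.
  if (operatingTime > 0 && plannedTime > 0) {
    return Math.min(100, (operatingTime / plannedTime) * 100);
  }
  // Brief pause: keep the last known value if there is one
  return Number.isFinite(lastAvailability) ? lastAvailability : 0;
}
```

The threshold could then come from a dashboard setting stored in global context (for example under a key of your choosing) and be passed in as `thresholdMs`, leaving the decision logic untouched.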

Roadblock 6: KPI Calculation Performance

Symptom: System slows down after implementing continuous KPI updates.

Mitigation:

Implement Phase 2 throttling FIRST (now with dual-path approach)
Ensure Calculate KPIs has guards for null/undefined inputs
Profile Calculate KPIs function for optimization
Monitor Node-RED CPU usage during production
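
One way to add the null/undefined guards is to validate all inputs once, before any arithmetic, and skip the calculation when anything is missing. The sketch below assumes hypothetical input names (`operatingTime`, `plannedTime`, `goodParts`, `totalParts`); substitute whatever the Calculate KPIs node actually reads.

```javascript
// Hypothetical input names; replace with the fields Calculate KPIs really uses.
const REQUIRED_INPUTS = ["operatingTime", "plannedTime", "goodParts", "totalParts"];

function toFiniteNumber(value) {
  // Number(null) and Number("") are both 0, so reject them explicitly
  if (value === null || value === undefined || value === "") return null;
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

// Returns a clean object of numbers, or null if any input is unusable.
function readKpiInputs(source) {
  if (!source || typeof source !== "object") return null;
  const inputs = {};
  for (const key of REQUIRED_INPUTS) {
    const n = toFiniteNumber(source[key]);
    if (n === null || n < 0) return null;
    inputs[key] = n;
  }
  return inputs;
}
```

In the function node, a `null` result would translate to `return null;`, which also avoids running the rest of the calculation on every cycle when the data is incomplete.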

Roadblock 7: Browser Cache Issues

Symptom: Changes don't appear after deployment.

Mitigation:

Clear browser cache during testing (Ctrl+Shift+R / Cmd+Shift+R)
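Add a version marker to the template so you can confirm in the page source that the latest deploy is what the browser loaded (optional). Update the string by hand on each change; an expression such as {{Date.now()}} is not evaluated inside an HTML comment:

<!-- In template header -->
<!-- Version: 1.1 -->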

Use incognito/private browsing for testing
Test on different browsers/devices

Success Criteria

Phase 1:

✅ Time filters change graph display correctly
✅ Graphs load on first visit without refresh
✅ Sidebar navigation works immediately

Phase 2:

✅ Graph updates occur at ~1 minute intervals
✅ Graphs are smooth, not jerky
✅ No performance degradation

Phase 3:

✅ KPIs update continuously during production
✅ Availability never incorrectly shows 0%
✅ START button shows STOP when production running
✅ OEE calculation is accurate

Integration:

✅ All features work together without conflicts
✅ No console errors
✅ Production tracking works end-to-end
✅ Data persists correctly

Estimated Timeline

Phase	Task	Time	Cumulative
1.1	Fix Filters	15 min	15 min
1.2	Fix Empty Graphs	15 min	30 min
2.1	Add Throttling	45 min	1h 15m
3.2	Fix Availability (with logging)	30 min	1h 45m
3.1	Fix Continuous Updates	30 min	2h 15m
3.3	Fix Button State	20 min	2h 35m
Testing	Integration Testing	30 min	3h 5m

Total: ~3 hours (assuming no major roadblocks)

Best Practices for LLM-Assisted Implementation

When working with an LLM to implement this plan, use these strategies for best results:

1. Isolate Logic Focus (Function Node Precision)

DO:

Ask for specific function node code: "Write the Record KPI History function with averaging logic including global.get initialization"
Provide exact input/output requirements: "This function receives msg.kpis object and must return msg or null"
Request one change at a time

DON'T:

Ask vague questions like "fix my dashboard"
Request multiple phase changes in one prompt
Assume LLM knows your flow structure

2. Explicitly Define Global Variables

Template for LLM prompts:

Global variable: kpiBuffer
Type: Array of objects
Structure: [{timestamp: number, oee: number, availability: number, performance: number, quality: number}]
Lifecycle: Initialized to [] if null, cleared after recording to history
Purpose: Accumulates KPI values for 1-minute averaging

Always specify:

Variable name
Data type
Default/initial value
When it's read/written
When it should be cleared

3. Specify Node-RED Input/Output Requirements

Example prompt:

The Machine Cycles function node must have 3 outputs:
- Output 1: DB write message (only when tracking enabled)
- Output 2: State update message (always sent)
- Output 3: KPI trigger message (always sent for continuous updates)

The return statement should be:
return [dbMsg, stateMsg, kpiTrigger];
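
A prompt like this should yield something along the following lines. This is a sketch, not the flow's actual Machine Cycles code: the function name, topics, and payload shapes are assumptions. It relies on one Node-RED behavior, namely that a `null` entry in the returned array sends nothing on that output.

```javascript
// Hypothetical builder for the three Machine Cycles outputs.
function buildOutputs({ trackingEnabled, cycleCount, now }) {
  // Output 1: DB write, only while tracking is enabled (null = send nothing)
  const dbMsg = trackingEnabled
    ? { topic: "db-write", payload: { cycleCount, timestamp: now } }
    : null;

  // Output 2: state update, always sent
  const stateMsg = { topic: "state", payload: { trackingEnabled, cycleCount } };

  // Output 3: KPI trigger, always sent so KPIs update continuously
  const kpiTrigger = { topic: "kpi-trigger", payload: { timestamp: now } };

  return [dbMsg, stateMsg, kpiTrigger];
}
```

Keeping the array order fixed in one place makes it harder for a later edit to swap two outputs and silently miswire the flow.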

4. Request Defensive Code

Always ask for:

Null/undefined checks before accessing properties
Type validation for global variables
Initialization logic at the start of functions
Error handling for edge cases

Example:

// BAD (LLM might generate)
const buffer = global.get("kpiBuffer");
buffer.push(newValue);

// GOOD (what you should request)
let buffer = global.get("kpiBuffer");
if (!buffer || !Array.isArray(buffer)) {
  buffer = [];
}
buffer.push(newValue);
global.set("kpiBuffer", buffer);

5. Break Down Complex Changes

For Phase 3.1 (Continuous KPI Updates), ask in sequence:

"Show me the current return statements in Machine Cycles function"
"Modify the function to add a third output for KPI trigger"
"Update all return statements to include kpiTrigger message"
"Show me how to wire the third output to Calculate KPIs node"

6. Request Testing/Debugging Code

Ask LLM to include:

Debug logging: node.warn('[KPI] Buffer size: ' + buffer.length);
State validation: Check that variables have expected values
Error messages: Descriptive messages for troubleshooting
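
As an illustration of state validation with descriptive messages, a small checker can return a list of problems instead of logging directly, which keeps it testable outside Node-RED. The function name `validateKpiState` and the message wording are assumptions; the variable names are the globals used in this plan.

```javascript
// Hypothetical validator: returns human-readable problems, empty if all is well.
function validateKpiState(state, now) {
  const problems = [];
  if (!Array.isArray(state.kpiBuffer)) {
    problems.push("[KPI] kpiBuffer should be an array, got " + typeof state.kpiBuffer);
  }
  for (const key of ["lastKPIRecordTime", "lastMachineCycleTime"]) {
    const value = state[key];
    if (!Number.isFinite(value)) {
      problems.push("[KPI] " + key + " should be a timestamp, got " + String(value));
    } else if (value > now) {
      problems.push("[KPI] " + key + " is in the future by " + (value - now) + " ms");
    }
  }
  return problems;
}

// Inside a function node, each problem could be passed to node.warn():
// validateKpiState({
//   kpiBuffer: global.get("kpiBuffer"),
//   lastKPIRecordTime: global.get("lastKPIRecordTime"),
//   lastMachineCycleTime: global.get("lastMachineCycleTime")
// }, Date.now()).forEach((problem) => node.warn(problem));
```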

7. Validate Against Node-RED Constraints

Remind LLM of Node-RED specifics:

"This is a Node-RED function node, not a standalone JavaScript file"
"Global context uses global.get/set, not regular variables"
"The msg object must be returned to send to next node"
"Use node.warn() for logging, not console.log()"

8. Phase-by-Phase Verification

After each LLM response:

Verify the code matches the plan
Check for initialization logic
Confirm output structure matches wiring
Ask: "What edge cases does this handle?"

9. Example: Perfect LLM Prompt for Phase 2.1

I need to implement KPI throttling with averaging in Node-RED.

Context:
- Function node: "Record KPI History"
- Input: msg.kpis object with {oee, availability, performance, quality}
- Output: Averaged KPI values sent to Graphs template (or null if not ready to record)

Global variables needed:
1. kpiBuffer (Array): Accumulates KPI snapshots. Initialize to [] if null.
2. lastKPIRecordTime (Number): Last timestamp when history was recorded. Initialize to (Date.now() - 60000) if null for immediate first recording.

Requirements:
- Accumulate incoming KPIs in kpiBuffer
- Every 60 seconds (60000ms), calculate average of all buffer values
- Send averaged KPIs to output
- Clear buffer after sending
- If less than 60 seconds since last record, return null (don't send)

Please write the complete function with:
- Robust initialization (check and set defaults)
- Debug logging (buffer size, time until next record)
- Comments explaining each section
- Edge case handling (empty buffer, first run)

10. Common Pitfalls to Avoid

Assuming LLM knows your flow structure - Always describe node connections
Not specifying Node-RED context - LLM might give generic JavaScript instead
Requesting too many changes at once - Break into single-phase requests
Forgetting to mention global variable persistence - Specify initialization needs
Not asking for defensive code - Request null checks and type validation
Vague success criteria - Define exactly what "working" means

Quick Reference: Key Code Snippets

1. Init Node (Run on Deploy)

// Initialize Global Variables - Inject Once on Deploy
node.warn('[INIT] Initializing global variables');

if (!global.get("kpiBuffer")) global.set("kpiBuffer", []);
if (!global.get("lastKPIRecordTime")) global.set("lastKPIRecordTime", Date.now() - 60000);
if (!global.get("lastMachineCycleTime")) global.set("lastMachineCycleTime", Date.now());
if (!global.get("lastKPIValues")) global.set("lastKPIValues", {});

node.warn('[INIT] Complete');
return msg;

2. Machine Cycles - Add to Final Return

// Update last machine cycle time when a successful cycle occurs
if (trackingEnabled && dbMsg) {
    global.set("lastMachineCycleTime", Date.now());
}
return [dbMsg, stateMsg, kpiTrigger];

3. Calculate KPIs - Multi-Source Guard

const trackingEnabled = global.get("trackingEnabled");
const activeOrder = global.get("activeOrder") || {};
if (!trackingEnabled || !activeOrder.id) return null;
// ... rest of calculation

4. Work Order START Button - Clear Buffer

if (action === "start-tracking") {
    global.set("trackingEnabled", true);
    global.set("kpiBuffer", []); // Clear stale data
    global.set("lastKPIRecordTime", Date.now() - 60000);
    // ... send state update
}

5. Graphs Template - Combined Init

let chartsInitialized = false;

scope.$watch('msg', function(msg) {
  if (msg && msg.payload && msg.payload.kpis && !chartsInitialized) {
    initFilters();
    createCharts(currentRange);
    chartsInitialized = true;
  }
  if (chartsInitialized && msg && msg.payload && msg.payload.kpis) {
    updateCharts(msg);
  }
});

setTimeout(() => {
  if (!chartsInitialized) {
    initFilters();
    createCharts(currentRange);
    chartsInitialized = true;
  }
}, 5000);
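
The single 5-second timeout above can be swapped for a short polling loop that initializes as soon as the page is ready and gives up after a bounded number of attempts. This is a sketch: `pollUntil` is a hypothetical helper, and the readiness check (for example, that the chart canvas exists) is left to the caller. The `schedule` option exists only so the loop can be tested without real timers.

```javascript
// Hypothetical polling helper: calls onReady(true) once isReady() passes,
// or onReady(false) after maxAttempts unsuccessful checks.
function pollUntil(isReady, onReady, options = {}) {
  const { intervalMs = 250, maxAttempts = 20, schedule = setTimeout } = options;
  let attempts = 0;

  function tick() {
    attempts += 1;
    if (isReady()) {
      onReady(true);
      return;
    }
    if (attempts >= maxAttempts) {
      onReady(false); // gave up; caller decides whether to init anyway
      return;
    }
    schedule(tick, intervalMs);
  }

  tick();
}
```

In the template, `onReady` would run the same `initFilters()` / `createCharts(currentRange)` pair guarded by `chartsInitialized`, so the data-driven path and the fallback cannot both initialize the charts.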

Final Notes

Backup First: Always backup flows.json before starting each phase
Test Incrementally: Don't skip testing between phases
Document Changes: Note any deviations from plan
Monitor Logs: Watch Node-RED debug output during testing
Clear Cache: Browser cache can mask issues
Use LLM Strategically: Follow the best practices above for precise, working code

If you encounter issues not covered in this plan, STOP and ask for help before proceeding.
