OEE Dashboard Fix Plan
Comprehensive Strategy for Resolving All Issues
Executive Summary
We have identified 5 distinct issues affecting your OEE dashboard. This plan addresses each systematically, ordered by priority based on impact, risk, and dependencies.
Estimated Total Implementation Time: 2-3 hours
Recommended Approach: Sequential implementation with testing between each phase
Key Improvements in This Updated Plan
This plan has been enhanced based on critical friction point analysis for Node-RED environments:
- Global Context Persistence: Added robust initialization logic for all global variables to handle Node-RED restarts and deploys without data loss or spikes
- State Synchronization (Push + Pull Model): Enhanced START/STOP button state tracking with both push notifications AND pull requests to handle mid-production dashboard loads
- Angular Timing Issues: Replaced brittle fixed timeouts with data-driven initialization and polling fallback for reliable chart loading across all system speeds
- Dual-Path KPI Architecture: Implemented separate paths for live display (real-time, unthrottled) and historical graphs (averaged, smooth) to eliminate the stale-data vs jerky-graphs trade-off
- Time-Based Availability Logic: Enhanced availability calculation with configurable time thresholds to distinguish brief pauses from legitimate shutdowns
- LLM Implementation Guide: Added comprehensive best practices section for working with LLMs to implement this plan with precise, defensive code
Critical Refinements (Final Review)
Based on final review, these critical refinements have been integrated:
- Clear Buffer on Production START: Prevents stale data from skewing averages if Node-RED restarts mid-production and context is restored from disk
- Consolidated lastMachineCycleTime Updates: Now updated ONLY in Machine Cycles function (not Calculate KPIs) to maintain clean "machine pulse" signal, initialized to Date.now() on startup to prevent immediate 0% availability
- Combined Initialization Strategy: Graphs now use BOTH data-driven initialization (fast when production is running) AND 5-second safety timeout (for idle machine scenarios)
- Multi-Source KPI Calculation: Calculate KPIs now explicitly handles triggers from both Machine Cycles (continuous) and Scrap Submission (event-based) with proper guards
- Complete Init Node: Added production-ready initialization function with all global variables (kpiBuffer, lastKPIRecordTime, lastMachineCycleTime, lastKPIValues) properly initialized with correct default values and logging
Issue Breakdown & Root Causes
Issue 1: KPI Updates Only on Scrap Submission
Symptom: KPIs stay static during production, only update when scrap is submitted or START/STOP clicked
Root Cause:
- Machine Cycles function has multiple return paths with [null, ...] outputs
- Output to Calculate KPIs (output port 2) only happens in specific conditions
- When trackingEnabled is false or no active order, KPI calculation is skipped
- Critical line: if (!trackingEnabled) return [null, stateMsg]; prevents KPI updates
Sub-issue 1b: START/STOP Button State
- Button state not persisting because UI doesn't track the trackingEnabled global variable
- Home template needs to watch for tracking state changes
Issue 2: Graphs Empty on First Load, Sidebar Broken
Symptom: Graphs tab shows blank, navigation doesn't work until refresh
Root Causes:
- Timing Issue: Charts created before Angular/scope is fully ready
- Scope Isolation: scope.gotoTab might not be accessible immediately
- Data Race: Charts created before first KPI data arrives
Why refresh works: Second load benefits from cached scope and existing data
Issue 3: Availability & OEE Drop to 0%
Symptom: Metrics incorrectly show 0% during active production
Root Cause:
- Calculate KPIs function has logic that sets availability to 0 when certain conditions aren't met
- Need to verify: When does the trackingEnabled check fail?
- Hypothesis: When production is running but tracking flag isn't properly set, availability defaults to 0
Issue 4: Graph Updates Too Frequent/Jerky
Symptom: Data points recorded too often, causing choppy visualization
Root Cause:
- Record KPI History is called on EVERY Calculate KPIs output
- With machine cycles happening every ~1 second, KPIs recorded every second
- Need time-based throttling (1-minute intervals) instead of event-based recording
Issue 5: Time Range Filters Not Working
Symptom: Shift/Day/Week/Month/Year buttons don't change graph display
Root Cause:
- build(metric, range) function receives range parameter but ignores it
- Function always returns ALL data from realtimeData[metric]
- Need to filter data based on selected time range
Fix Plan - Phased Approach
PHASE 1: Low-Risk Quick Wins ⚡
Estimated Time: 30 minutes
Risk Level: LOW
1.1 Fix Graph Filters (Issue 5)
Files: projects/Plastico/flows.json → Graphs Template
Changes:
// BEFORE
function build(metric, range){
const arr = realtimeData[metric];
if (!arr || arr.length === 0) return [];
return arr.map(d=>({x:d.timestamp, y:d.value}));
}
// AFTER
function build(metric, range){
const arr = realtimeData[metric];
if (!arr || arr.length === 0) return [];
// Calculate time cutoff based on range
const now = Date.now();
const cutoffs = {
shift: 8 * 60 * 60 * 1000, // 8 hours
day: 24 * 60 * 60 * 1000, // 24 hours
week: 7 * 24 * 60 * 60 * 1000, // 7 days
month: 30 * 24 * 60 * 60 * 1000, // 30 days
year: 365 * 24 * 60 * 60 * 1000 // 365 days
};
const cutoffTime = now - (cutoffs[range] || cutoffs.shift);
// Filter data to selected time range
return arr
.filter(d => d.timestamp >= cutoffTime)
.map(d => ({x: d.timestamp, y: d.value}));
}
Testing:
- Click each filter button
- Verify data range changes in charts
- Check that no errors occur
Potential Issues:
- If no data exists in selected range, chart might be empty (expected behavior)
Rollback: Easy - revert to original build() function
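The cutoff logic can be checked without a live chart by passing the current time in as a parameter. The following is a minimal sketch under that assumption; `filterByRange`, `RANGE_MS`, and the sample data are hypothetical names for illustration, not part of the existing template.

```javascript
// Hypothetical helper: same cutoff logic as build(), but `now` is a parameter
// so the behaviour can be verified with fixed timestamps.
const RANGE_MS = {
  shift: 8 * 60 * 60 * 1000,
  day: 24 * 60 * 60 * 1000,
  week: 7 * 24 * 60 * 60 * 1000,
  month: 30 * 24 * 60 * 60 * 1000,
  year: 365 * 24 * 60 * 60 * 1000
};

function filterByRange(points, range, now) {
  if (!Array.isArray(points) || points.length === 0) return [];
  // Unknown range names fall back to "shift", mirroring build()
  const windowMs = RANGE_MS[range] || RANGE_MS.shift;
  const cutoff = now - windowMs;
  return points
    .filter(p => p.timestamp >= cutoff)
    .map(p => ({ x: p.timestamp, y: p.value }));
}

// Illustrative data: points that are 3 days, 10 hours, and 1 hour old
const HOUR = 60 * 60 * 1000;
const fixedNow = 1700000000000;
const sample = [
  { timestamp: fixedNow - 3 * 24 * HOUR, value: 70 },
  { timestamp: fixedNow - 10 * HOUR, value: 75 },
  { timestamp: fixedNow - 1 * HOUR, value: 80 }
];

const shiftPoints = filterByRange(sample, "shift", fixedNow); // only the 1-hour-old point
const dayPoints = filterByRange(sample, "day", fixedNow);     // the 10-hour and 1-hour points
const weekPoints = filterByRange(sample, "week", fixedNow);   // all three points
```

With this data the shift, day, and week ranges keep one, two, and three points respectively, which is the visible change to look for when clicking the filter buttons.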
1.2 Fix Empty Graphs on First Load (Issue 2)
Files: projects/Plastico/flows.json → Graphs Template
Strategy: Use data-driven initialization instead of fixed timeout for reliability
Changes:
A) Combined Data-Driven + Safety Timeout (RECOMMENDED)
// BEFORE
setTimeout(()=>{
initFilters();
createCharts(currentRange);
},300);
// AFTER - Wait for first data message OR timeout
let chartsInitialized = false;
scope.$watch('msg', function(msg) {
// Initialize on first KPI data arrival
if (msg && msg.payload && msg.payload.kpis && !chartsInitialized) {
// Scope and data are both ready
initFilters();
createCharts(currentRange);
chartsInitialized = true;
console.log('[Graphs] Charts initialized via data-driven approach');
}
// Update charts if already initialized
if (chartsInitialized && msg && msg.payload && msg.payload.kpis) {
updateCharts(msg);
}
});
// ADDED: Safety timer for when machine is idle (no KPI messages flowing)
setTimeout(() => {
if (!chartsInitialized) {
console.warn('[Graphs] Charts initialized via safety timer (machine idle)');
initFilters();
createCharts(currentRange);
chartsInitialized = true;
}
}, 5000); // 5 seconds grace period for KPI messages
Why Both?
- Data-driven: Ensures charts initialize as soon as data is available (fast, reliable)
- Safety timeout: Handles "dashboard loaded but machine is idle" scenario (no KPI messages)
- Together they cover both active production and idle machine scenarios
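The "whichever trigger fires first wins" idea can also be isolated in a small wrapper, which avoids repeating the flag check in both the watcher and the timer. This is a sketch only; `createOnceInitializer` is a hypothetical helper name, and the array used below stands in for the real `initFilters()` / `createCharts()` calls.

```javascript
// Hypothetical helper: wraps an init function so the first caller runs it
// and every later caller is ignored.
function createOnceInitializer(initFn) {
  let initialized = false;
  return function runOnce(source) {
    if (initialized) return false;
    initFn(source);
    // Set the flag only after initFn returns, so a thrown error
    // still allows a later trigger to retry.
    initialized = true;
    return true;
  };
}

// Demonstration with a stand-in init function that records who triggered it
const initLog = [];
const initCharts = createOnceInitializer(source => initLog.push(source));

const firstCall = initCharts("data");   // runs the init function
const secondCall = initCharts("timer"); // ignored, charts already initialized
```

In the template, the `scope.$watch` callback and the `setTimeout` callback would both call the same wrapped function, passing a label that can be logged for debugging.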
B) Fallback: Polling with timeout (if data-driven doesn't work)
function initWhenReady(attempts = 0) {
const oeeEl = document.getElementById("chart-oee");
const availEl = document.getElementById("chart-availability");
if (oeeEl && availEl && scope.gotoTab) {
// Both DOM and scope ready
initFilters();
createCharts(currentRange);
} else if (attempts < 20) {
// Retry every 100ms, max 2 seconds
setTimeout(() => initWhenReady(attempts + 1), 100);
} else {
console.error("[Graphs] Failed to initialize charts after 2 seconds");
}
}
// Start polling on load
initWhenReady();
C) Ensure scope.gotoTab is properly bound
// BEFORE
(function(scope){
scope.gotoTab = t => scope.send({ui_control:{tab:t}});
})(scope);
// AFTER
(function(s){
if (!s.gotoTab) {
s.gotoTab = function(t) {
s.send({ui_control: {tab: t}});
};
}
})(scope);
D) Add defensive chart creation with retry
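function createCharts(range, attempts = 0){
// Ensure DOM elements exist
const oeeEl = document.getElementById("chart-oee");
const availEl = document.getElementById("chart-availability");
if (!oeeEl || !availEl) {
// Cap the retries (25 x 200ms = 5 seconds) so a missing element cannot loop forever
if (attempts >= 25) {
console.error("[Graphs] Chart elements still missing after 5 seconds, giving up");
return;
}
console.warn("[Graphs] Chart elements not ready, retrying...");
setTimeout(() => createCharts(range, attempts + 1), 200);
return;
}
// ... rest of existing chart creation logic
}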
Testing:
- Clear browser cache
- Navigate to Graphs tab from fresh load
- Test sidebar navigation
- Verify charts appear without refresh
- Test on slow network/system
Potential Issues:
- Data-driven approach requires KPI messages flowing
- If no production is running, no KPI messages arrive, so charts initialize only when the 5-second safety timer in Option A fires
Recommended Implementation:
- Start with data-driven approach (Option A)
- Add polling fallback (Option B) as safety net
- Implement defensive checks (Options C & D)
Rollback: Easy - revert to original setTimeout logic
PHASE 2: Medium-Risk Data Flow Improvements 🔧
Estimated Time: 45 minutes
Risk Level: MEDIUM
2.1 Implement KPI Update Throttling with Dual-Path Architecture (Issue 4)
Files:
- projects/Plastico/flows.json → Calculate KPIs function (add second output)
- projects/Plastico/flows.json → Record KPI History function (add averaging)
Strategy: Dual-path updates solve the stale display vs jerky graphs trade-off
- Path 1: Unthrottled live KPIs to Home Template for real-time display
- Path 2: Throttled/averaged KPIs to Record History for smooth graphs
Part A: Modify Calculate KPIs to Output on Two Paths
// At the end of Calculate KPIs function
// Prepare the KPI message
const kpiMsg = {
topic: "kpis",
payload: {
timestamp: Date.now(),
kpis: {
oee: msg.kpis.oee,
availability: msg.kpis.availability,
performance: msg.kpis.performance,
quality: msg.kpis.quality
}
}
};
// Return to TWO outputs:
// Output 1: Live KPI to Home Template (real-time, unthrottled)
// Output 2: KPI to Record History (will be averaged/throttled)
return [
kpiMsg, // Path 1: Live display
RED.util.cloneMessage(kpiMsg) // Path 2: History recording (deep clone; a spread copy would still share the nested payload object)
];
Wiring Changes:
- Calculate KPIs node needs 2 outputs (add one more)
- Output 1 → Home Template (existing connection)
- Output 2 → Record KPI History (new connection)
Part B: Add Averaging Logic to Record KPI History
// Complete Record KPI History function with robust initialization
// ========== INITIALIZATION ==========
// Initialize buffer
let buffer = global.get("kpiBuffer");
if (!buffer || !Array.isArray(buffer)) {
buffer = [];
global.set("kpiBuffer", buffer);
node.warn('[KPI History] Initialized kpiBuffer');
}
// Initialize last record time
let lastRecordTime = global.get("lastKPIRecordTime");
if (!lastRecordTime || typeof lastRecordTime !== 'number') {
// Set to 1 minute ago to ensure immediate recording on startup
lastRecordTime = Date.now() - 60000;
global.set("lastKPIRecordTime", lastRecordTime);
node.warn('[KPI History] Initialized lastKPIRecordTime');
}
// ========== ACCUMULATE ==========
const kpis = msg.payload.kpis;
if (!kpis) {
node.warn('[KPI History] No KPIs in message, skipping');
return null;
}
buffer.push({
timestamp: Date.now(),
oee: kpis.oee || 0,
availability: kpis.availability || 0,
performance: kpis.performance || 0,
quality: kpis.quality || 0
});
// Prevent buffer from growing too large (safety limit)
if (buffer.length > 100) {
buffer = buffer.slice(-60); // Keep last 60 entries
node.warn('[KPI History] Buffer exceeded 100 entries, trimmed to 60');
}
global.set("kpiBuffer", buffer);
// ========== CHECK IF TIME TO RECORD ==========
const now = Date.now();
const timeSinceLastRecord = now - lastRecordTime;
const ONE_MINUTE = 60 * 1000;
if (timeSinceLastRecord < ONE_MINUTE) {
// Not time to record yet
const secondsRemaining = Math.ceil((ONE_MINUTE - timeSinceLastRecord) / 1000);
// Debug log (can remove in production)
// node.warn(`[KPI History] Buffer: ${buffer.length} entries, recording in ${secondsRemaining}s`);
return null; // Don't send to charts yet
}
// ========== CALCULATE AVERAGES ==========
if (buffer.length === 0) {
node.warn('[KPI History] Buffer empty at recording time, skipping');
return null;
}
const avg = {
oee: buffer.reduce((sum, d) => sum + d.oee, 0) / buffer.length,
availability: buffer.reduce((sum, d) => sum + d.availability, 0) / buffer.length,
performance: buffer.reduce((sum, d) => sum + d.performance, 0) / buffer.length,
quality: buffer.reduce((sum, d) => sum + d.quality, 0) / buffer.length
};
node.warn(`[KPI History] Recording averaged KPIs from ${buffer.length} samples: OEE=${avg.oee.toFixed(1)}%`);
// ========== RECORD TO HISTORY ==========
// Update global state
global.set("lastKPIRecordTime", now);
global.set("kpiBuffer", []); // Clear buffer
// Send averaged values to graphs and database
return {
topic: "kpi-history",
payload: {
timestamp: now,
kpis: {
oee: Math.round(avg.oee * 10) / 10, // Round to 1 decimal
availability: Math.round(avg.availability * 10) / 10,
performance: Math.round(avg.performance * 10) / 10,
quality: Math.round(avg.quality * 10) / 10
},
sampleCount: buffer.length // Metadata for debugging
}
};
Recommendation: This dual-path approach provides the best of both worlds
Testing:
- Start production
- Observe KPI update frequency in graphs
- Verify updates occur approximately every 60 seconds
- Check that no spikes/gaps appear in data
Potential Issues:
- First data point might take up to 1 minute to appear
- Rapid production changes might not be immediately visible
- Buffer could grow large if production runs without recording
Mitigation:
- Set buffer max size (e.g., 100 entries)
- Force record on production stop/start
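The "force record on production stop/start" mitigation is not spelled out in the Record KPI History code. One way to sketch it is a helper that averages whatever is in the buffer regardless of elapsed time. `flushKpiBuffer` is a hypothetical name, and the output shape is assumed to match the `kpi-history` message returned by Record KPI History.

```javascript
// Hypothetical helper: averages the buffer immediately, ignoring the
// one-minute interval, so a STOP/START handler can emit a final data point.
function flushKpiBuffer(buffer, now) {
  if (!Array.isArray(buffer) || buffer.length === 0) return null;
  const metrics = ["oee", "availability", "performance", "quality"];
  const kpis = {};
  for (const metric of metrics) {
    // Non-numeric entries count as 0, matching the `|| 0` guards used when buffering
    const sum = buffer.reduce((acc, d) => acc + (Number(d[metric]) || 0), 0);
    kpis[metric] = Math.round((sum / buffer.length) * 10) / 10; // round to 1 decimal
  }
  return {
    topic: "kpi-history",
    payload: { timestamp: now, kpis, sampleCount: buffer.length }
  };
}

// Illustrative values only
const flushed = flushKpiBuffer(
  [
    { oee: 80, availability: 90, performance: 95, quality: 100 },
    { oee: 70, availability: 80, performance: 85, quality: 90 }
  ],
  1700000000000
);
```

In a function node, the STOP handler would pass `global.get("kpiBuffer")` and `Date.now()` to this helper, send the result if it is not null, and then reset `kpiBuffer` and `lastKPIRecordTime`.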
Rollback: Medium difficulty - remove throttling logic, clear global variables
PHASE 3: High-Risk Core Logic Fixes ⚠️
Estimated Time: 60 minutes
Risk Level: HIGH
⚠️ CRITICAL: Backup flows.json before proceeding
3.1 Fix KPI Continuous Updates (Issue 1)
Files: projects/Plastico/flows.json → Machine Cycles function
Problem: Machine Cycles has multiple early returns that skip KPI calculation
Current Logic:
// Line ~36: No active order
if (!activeOrder || !activeOrder.id || cavities <= 0) {
return [null, stateMsg]; // ❌ Skips KPI calculation
}
// Line ~43: Tracking not enabled
if (!trackingEnabled) {
return [null, stateMsg]; // ❌ Skips KPI calculation
}
Solution Options:
Option A: Always Calculate KPIs (Recommended)
// Always prepare a message for Calculate KPIs on output 3 (the new third output)
const kpiTrigger = { _triggerKPI: true };
// Change all returns to include kpiTrigger
if (!activeOrder || !activeOrder.id || cavities <= 0) {
return [null, stateMsg, kpiTrigger]; // ✓ Triggers KPI calculation
}
if (!trackingEnabled) {
return [null, stateMsg, kpiTrigger]; // ✓ Triggers KPI calculation
}
// Update last machine cycle time when a successful cycle occurs
// This is used for time-based availability logic
if (trackingEnabled && dbMsg) {
// dbMsg being non-null implies a cycle was recorded
global.set("lastMachineCycleTime", Date.now());
}
// ... final return
return [dbMsg, stateMsg, kpiTrigger];
Critical: The lastMachineCycleTime update must happen ONLY in Machine Cycles function to maintain a clean "machine pulse" signal separate from KPI calculation triggers.
Wire Configuration Change:
- Add third output wire to Machine Cycles node
- Connect output 3 → Calculate KPIs
Option B: Calculate KPIs in Parallel (Alternative)
- Add an inject node that triggers Calculate KPIs every 5 seconds
- Less coupled, but might calculate with stale data
Recommendation: Option A - ensures KPIs calculated with real-time data
Testing:
- Start production with START button
- Observe KPI values on Home page
- Verify continuous updates (every ~1 second before throttling)
- Check that scrap submission still works
- Test production stop/start
Potential Issues:
- Calculate KPIs might need to handle cases with no active order
- Could calculate KPIs unnecessarily when machine is idle
- Performance impact if calculating too frequently
Mitigation:
- Add guards in Calculate KPIs to handle null/undefined inputs
- Implement Phase 2 throttling first to reduce calculation frequency
- Monitor system performance
CRITICAL: Calculate KPIs Multi-Source Handling
The Calculate KPIs function will now receive triggers from TWO sources:
- Machine Cycles (continuous, real-time) - via new output 3
- Scrap Submission (event-based) - existing connection
Required Change in Calculate KPIs:
// At the start of Calculate KPIs function
// Must handle both trigger types
// The function should execute regardless of message content
// as long as it receives ANY trigger
const trackingEnabled = global.get("trackingEnabled");
const activeOrder = global.get("activeOrder") || {};
const productionStartTime = global.get("productionStartTime");
// Guard against missing critical data
if (!trackingEnabled || !activeOrder.id) {
// Can't calculate meaningful KPIs without tracking or active order
// But don't error - just skip calculation
return null;
}
// ... rest of existing KPI calculation logic
// This logic will now run for BOTH continuous and event-based triggers
This ensures availability and OEE calculations work correctly whether triggered by machine cycles or scrap submission.
Side Effects:
- Will trigger Issue 4 more severely → MUST implement Phase 2 throttling first
- Database might receive more frequent updates
- Global variables will change more often
Rollback: Medium difficulty - requires restoring original return statements and wire configuration
3.2 Fix Availability/OEE Drops to 0 (Issue 3)
Files: projects/Plastico/flows.json → Calculate KPIs function
Investigation Steps:
- Read full Calculate KPIs function
- Identify all paths that set msg.kpis.availability = 0
- Add logging to track when this occurs
- Understand state flow: trackingEnabled, productionStartTime, operatingTime
Hypothesis Testing:
// Add debug logging at the start
node.warn(`[KPI] trackingEnabled=${trackingEnabled}, startTime=${productionStartTime}, opTime=${operatingTime}`);
// Before setting availability to 0
if (/* condition that causes 0 */) {
node.warn(`[KPI] Setting availability to 0 because: [reason]`);
msg.kpis.availability = 0;
}
Likely Fix:
// BEFORE
} else {
msg.kpis.availability = 0; // Not running
}
// AFTER
} else {
// Check if production was recently active
const prev = global.get("lastKPIValues") || {};
if (prev.availability > 0 && operatingTime > 0) {
// Maintain last availability if we have operating time
msg.kpis.availability = prev.availability;
} else {
msg.kpis.availability = 0;
}
}
// Store KPIs for next iteration
global.set("lastKPIValues", msg.kpis);
Testing:
- Start production
- Monitor availability values
- Trigger scrap prompt
- Verify availability doesn't drop to 0
- Check OEE calculation
Potential Issues:
- Might mask legitimate 0% availability (machine actually stopped)
- Could create artificially high availability readings
- State persistence might cause issues after restart
Mitigation:
- Add clear conditions for when availability should legitimately be 0
- Reset lastKPIValues on work order completion
- Add production state tracking
Rollback: Easy if logging added first - can revert based on log analysis
3.3 Fix START/STOP Button State (Issue 1b)
Files: projects/Plastico/flows.json → Home Template
Problem: Button doesn't show correct state (STOP when production running)
Investigation:
- Find button rendering logic in Home template
- Check how trackingEnabled or productionStarted is tracked
- Verify message handler receives state updates
Changes:
// In Home Template scope.$watch
if (msg.topic === 'machineStatus') {
window.machineOnline = msg.payload.machineOnline;
window.productionStarted = msg.payload.productionStarted;
// NEW: Track tracking state for button display
window.trackingEnabled = msg.payload.trackingEnabled || window.productionStarted;
// ng-show expressions are evaluated against scope, not window, so mirror the flag there
scope.trackingEnabled = window.trackingEnabled;
scope.renderDashboard();
return;
}
Button HTML Update:
<!-- BEFORE -->
<button ng-click="handleStart()">START</button>
<!-- AFTER -->
<button ng-click="handleStart()" ng-show="!trackingEnabled">START</button>
<button ng-click="handleStop()" ng-show="trackingEnabled" class="stop-btn">STOP</button>
Backend Update (Work Order buttons):
// When START clicked, also set trackingEnabled flag
if (action === "start-tracking") {
global.set("trackingEnabled", true);
// CRITICAL: Clear KPI buffer on production start
// Prevents stale data from skewing averages if Node-RED was restarted mid-production
global.set("kpiBuffer", []);
node.warn('[START] Cleared kpiBuffer for fresh production run');
// Optional: Reset last record time to ensure immediate data point
global.set("lastKPIRecordTime", Date.now() - 60000);
// Send state update to UI
const stateMsg = {
topic: "machineStatus",
payload: {
machineOnline: true,
productionStarted: true,
trackingEnabled: true
}
};
// ... send stateMsg to Home template
}
Why Clear Buffer on START:
If Node-RED restarts during a production run and context is restored from disk, the kpiBuffer might contain stale data from before the restart. When production resumes, new data would be mixed with old data, skewing the averages. Clearing on START ensures a clean slate for each production session.
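A small worked example makes the skew concrete. The numbers below are invented for illustration: three stale OEE samples of 0 restored from disk, followed by two fresh samples after production resumes.

```javascript
// Plain arithmetic mean, as used by the averaging step in Record KPI History
function average(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Illustrative numbers only
const staleOee = [0, 0, 0]; // restored from disk after a restart
const freshOee = [80, 82];  // recorded after production resumes

// Without clearing: (0 + 0 + 0 + 80 + 82) / 5
const mixedAverage = average(staleOee.concat(freshOee));

// With the buffer cleared on START: (80 + 82) / 2
const cleanAverage = average(freshOee);
```

Here the mixed buffer averages 32.4 while the cleared buffer averages 81, so the first recorded data point after a restart would understate OEE by more than half.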
Testing:
- Load dashboard
- Start work order
- Verify START button changes to STOP
- Click STOP (if implemented)
- Verify button changes back to START
Potential Issues:
- Need to implement STOP button handler if it doesn't exist
- State sync between backend and frontend
- Button might flicker during state transitions
Rollback: Easy - remove button visibility conditions
Implementation Order & Dependencies
Recommended Sequence:
- Phase 1.1 - Fix Filters (Independent, low risk)
- Phase 1.2 - Fix Empty Graphs (Independent, low risk)
- Phase 2.1 - Add Throttling (Required before Phase 3.1)
- Phase 3.2 - Fix Availability Calculation (Add logging first)
- Phase 3.1 - Fix Continuous KPI Updates (Depends on throttling)
- Phase 3.3 - Fix Button State (Can be done anytime)
Why This Order?
- Quick wins first - Build confidence, improve UX immediately
- Throttling before continuous updates - Prevent performance issues
- Logging before logic changes - Understand problem before fixing
- Independent fixes can run parallel - Save time
Testing Strategy
Per-Phase Testing:
- Test each phase independently
- Don't proceed to next phase if current fails
- Keep backup of working state
Integration Testing (After All Phases):
-
Fresh Start Test
- Clear browser cache
- Restart Node-RED
- Load dashboard
- Navigate all tabs
-
Production Cycle Test
- Start new work order
- Click START
- Let run for 2-3 minutes
- Submit scrap
- Verify KPIs update
- Check graphs show data
- Test time filters
-
State Persistence Test
- Refresh page during production
- Verify state restores correctly
- Check button shows STOP if running
-
Edge Cases
- No active work order
- Machine offline
- Zero production time
- Rapid start/stop
Rollback Plan
Per-Phase Rollback:
Each phase documents its rollback procedure. In general:
- Stop Node-RED
- Restore flows.json from backup:
cp projects/Plastico/flows.json.backup projects/Plastico/flows.json
- Clear global context (if needed) by running this once in a function node, since a debug node cannot execute code:
global.set("lastKPIRecordTime", null);
global.set("kpiBuffer", null);
global.set("lastKPIValues", null);
- Clear browser cache
Emergency Full Rollback:
# Restore from most recent backup
cp projects/Plastico/Respaldo_MVP_Complete_11_23_25.json projects/Plastico/flows.json
# Restart Node-RED
node-red-restart
Potential Roadblocks & Mitigations
Roadblock 1: Global Context Persistence on Deploy/Restart ⚠️ CRITICAL
Symptom: After Node-RED restart or deploy, throttling/averaging/availability logic breaks or shows incorrect data
Root Cause: Global variables (lastKPIRecordTime, kpiBuffer, lastKPIValues, trackingEnabled) may be reset or restored from file/memory store depending on settings.js configuration
Mitigation:
- Add Robust Initialization Logic:
// In Record KPI History function - ALWAYS check and initialize
let buffer = global.get("kpiBuffer");
if (!buffer || !Array.isArray(buffer)) {
buffer = [];
global.set("kpiBuffer", buffer);
}
let lastRecordTime = global.get("lastKPIRecordTime");
if (!lastRecordTime || typeof lastRecordTime !== 'number') {
// Set to 1 minute ago to ensure immediate recording on startup
lastRecordTime = Date.now() - 60000;
global.set("lastKPIRecordTime", lastRecordTime);
}
- Create an Init Node:
- Add a dedicated "Initialize Global Variables" function node
- Trigger on deploy using an inject node (inject once, delay 0)
- Wire to all critical nodes to ensure state is set before first execution
Complete Init Node Code:
// Initialize Global Variables - Run on Deploy
node.warn('[INIT] Initializing global variables');
// KPI Buffer for averaging
if (!global.get("kpiBuffer")) {
global.set("kpiBuffer", []);
node.warn('[INIT] Set kpiBuffer to []');
}
// Last KPI record time - set to 1 min ago for immediate first record
if (!global.get("lastKPIRecordTime")) {
global.set("lastKPIRecordTime", Date.now() - 60000);
node.warn('[INIT] Set lastKPIRecordTime');
}
// Last machine cycle time - set to now to prevent immediate 0% availability
if (!global.get("lastMachineCycleTime")) {
global.set("lastMachineCycleTime", Date.now());
node.warn('[INIT] Set lastMachineCycleTime to prevent 0% availability on startup');
}
// Last KPI values
if (!global.get("lastKPIValues")) {
global.set("lastKPIValues", {});
node.warn('[INIT] Set lastKPIValues to {}');
}
node.warn('[INIT] Global variable initialization complete');
return msg;
- Check settings.js:
- Verify contextStorage configuration
- Consider using file storage for persistence if using memory (default)
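For reference, a contextStorage block in settings.js that keeps the in-memory store as the default and adds a disk-backed store might look like the fragment below. The store names `default` and `file` are illustrative choices; `memory` and `localfilesystem` are the module names Node-RED ships with. Treat this as a sketch to adapt, not as the configuration this project currently uses.

```javascript
// settings.js (fragment). Store names "default" and "file" are illustrative.
module.exports = {
    // ...existing settings...
    contextStorage: {
        default: { module: "memory" },       // fast, lost when Node-RED restarts
        file: { module: "localfilesystem" }  // written to disk, survives restarts
    }
};

// In a function node, the store name is passed as an extra argument:
//   global.set("kpiBuffer", buffer, "file");
//   const buffer = global.get("kpiBuffer", "file");
```

Whichever store is chosen, the initialization guards above still matter, because a restored value can be stale or missing after a restart.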
Testing:
- Deploy changes multiple times
- Restart Node-RED
- Verify variables persist/initialize correctly
- Check debug logs for initialization messages
Roadblock 2: State Sync Between Flow and Dashboard (Push vs Pull Model)
Symptom: START/STOP button shows wrong state when user loads dashboard mid-production
Root Cause: Relying on push model (messages sent during state changes) - if user loads page after tracking started, initial message is missed
Mitigation:
- Add Pull Mechanism in Home Template:
// In Home Template initialization
(function(scope) {
// Request current state on load
scope.send({
topic: "requestState",
payload: {}
});
// Handle state response
scope.$watch('msg', function(msg) {
if (msg && msg.topic === 'currentState') {
window.trackingEnabled = msg.payload.trackingEnabled;
window.productionStarted = msg.payload.productionStarted;
window.machineOnline = msg.payload.machineOnline;
scope.renderDashboard();
}
// ... rest of watch logic
});
})(scope);
- Add State Response Handler:
- Create function node that listens for requestState topic
- Responds with current global state values
- Wire to Home template
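The handler described in this list could be sketched as follows. It is written as a plain function so it can be exercised outside Node-RED; `buildStateResponse` and `makeFakeGlobal` are hypothetical names, and the global keys `productionStarted` and `machineOnline` are assumptions about how the flow stores that state.

```javascript
// Sketch of the function-node logic. `globalCtx` stands in for the
// function node's `global` object (anything with a get(key) method).
function buildStateResponse(msg, globalCtx) {
  // Ignore everything except explicit state requests
  if (!msg || msg.topic !== "requestState") return null;
  return {
    topic: "currentState",
    payload: {
      trackingEnabled: globalCtx.get("trackingEnabled") === true,
      // Assumption: these two values are stored under these global keys
      productionStarted: globalCtx.get("productionStarted") === true,
      machineOnline: globalCtx.get("machineOnline") === true
    }
  };
}

// Minimal stand-in for Node-RED's global context, for demonstration only
function makeFakeGlobal(values) {
  return { get: key => values[key] };
}

const stateReply = buildStateResponse(
  { topic: "requestState", payload: {} },
  makeFakeGlobal({ trackingEnabled: true, machineOnline: true })
);
const ignoredReply = buildStateResponse(
  { topic: "kpis", payload: {} },
  makeFakeGlobal({ trackingEnabled: true })
);
```

Inside the actual function node the body would reduce to `return buildStateResponse(msg, global);`, with the output wired back to the Home template. Unset globals are reported as false, so a fresh deploy shows START rather than an undefined state.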
Testing:
- Start production
- Open dashboard in new browser tab
- Verify button shows STOP immediately
- Test with multiple browser sessions
Roadblock 3: UI/Angular Timing Races in ui-template ⚠️ HIGH IMPACT
Symptom: Charts sometimes load, sometimes don't; a fixed timeout (300 ms in the current template) is unreliable on slow systems or complex templates
Root Cause: Node-RED Dashboard uses AngularJS, so digest cycle and DOM rendering timing is unpredictable
Mitigation Option A - Data-Driven Initialization (RECOMMENDED):
// Instead of fixed timeout, wait for first data
let chartsInitialized = false;
scope.$watch('msg', function(msg) {
// Same message shape as Phase 1.2: KPIs arrive under msg.payload.kpis
const hasKpis = msg && msg.payload && msg.payload.kpis;
if (hasKpis && !chartsInitialized) {
// First data arrived, scope is ready
initFilters();
createCharts(currentRange);
chartsInitialized = true;
}
if (chartsInitialized && hasKpis) {
updateCharts(msg);
}
});
Mitigation Option B - Angular Lifecycle Hook:
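// Defer initialization to Angular's next digest
scope.$applyAsync(function() {
// The scope is ready at this point, but the chart DOM elements may not exist yet,
// so keep the element checks inside createCharts()
initFilters();
createCharts(currentRange);
});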
Mitigation Option C - Polling with Timeout:
function initWhenReady(attempts = 0) {
const oeeEl = document.getElementById("chart-oee");
if (oeeEl && scope.gotoTab) {
// Both DOM and scope ready
initFilters();
createCharts(currentRange);
} else if (attempts < 20) {
// Retry every 100ms, max 2 seconds
setTimeout(() => initWhenReady(attempts + 1), 100);
} else {
console.error("Failed to initialize charts after 2 seconds");
}
}
// Start polling
initWhenReady();
Recommendation: Use Option A for most reliable results
Roadblock 4: Throttling vs Live Display Trade-off
Symptom: With averaging, displayed KPIs are stale (up to 59 seconds old), but without averaging, graphs are jerky
Root Cause: OEE is a real-time snapshot - averaging smooths graphs but delays live feedback
Solution: Dual-Path KPI Updates
Architecture:
- Path 1 (Live): Machine Cycles → Calculate KPIs → Home Template (no throttling)
- Path 2 (History): Machine Cycles → Calculate KPIs → Averaging Buffer → Record History (throttled to 1 min)
Implementation:
// In Calculate KPIs function - send to TWO outputs
return [
msg, // Output 1: Live KPI to Home Template (unthrottled)
RED.util.cloneMessage(msg) // Output 2: KPI to History (will be throttled); deep clone so the two paths share no nested objects
];
In Record KPI History - add averaging logic:
// Only this node has averaging/throttling
let buffer = global.get("kpiBuffer") || [];
buffer.push({
timestamp: Date.now(),
oee: msg.kpis.oee,
availability: msg.kpis.availability,
performance: msg.kpis.performance,
quality: msg.kpis.quality
});
const lastRecord = global.get("lastKPIRecordTime") || 0;
const now = Date.now();
if (now - lastRecord >= 60000) {
// Average the buffer
const avg = {
oee: buffer.reduce((sum, d) => sum + d.oee, 0) / buffer.length,
// ... other metrics
};
// Record averaged values to history
// Send to Graphs template
global.set("lastKPIRecordTime", now);
global.set("kpiBuffer", []);
return { kpis: avg };
} else {
global.set("kpiBuffer", buffer);
return null; // Don't record yet
}
Benefits:
- Live display always shows current OEE
- Graphs are smooth with averaged data
- No UX compromise
Roadblock 5: Availability 0% Logic Too Simplistic
Symptom: Availability drops to 0% during brief pauses (scrap submission) but also might NOT drop to 0% during legitimate stops (breaks, maintenance)
Root Cause: Using previous value without time-based threshold can't distinguish brief interruption from actual shutdown
Improved Logic:
// In Calculate KPIs function
const now = Date.now();
const lastCycleTime = global.get("lastMachineCycleTime") || now;
const timeSinceLastCycle = now - lastCycleTime;
const BRIEF_PAUSE_THRESHOLD = 5 * 60 * 1000; // 5 minutes
if (!trackingEnabled || timeSinceLastCycle > BRIEF_PAUSE_THRESHOLD) {
// Legitimately stopped or long pause
msg.kpis.availability = 0;
global.set("lastKPIValues", null); // Clear history
} else if (operatingTime > 0) {
// Calculate normally
msg.kpis.availability = calculateAvailability(operatingTime, plannedTime);
global.set("lastKPIValues", msg.kpis);
} else {
// Brief pause - maintain last known value
const prev = global.get("lastKPIValues") || {};
msg.kpis.availability = prev.availability || 0;
}
// NOTE: lastMachineCycleTime is updated in Machine Cycles function ONLY
// This keeps the "machine pulse" signal clean and separate from KPI calculation
Configuration:
- Adjust BRIEF_PAUSE_THRESHOLD based on your production environment
- Consider making it configurable via dashboard setting
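If the threshold is made configurable, the value coming from the dashboard needs validation before it replaces the 5-minute default. The sketch below assumes the setting is entered in minutes; `resolvePauseThreshold` and `isLongStop` are hypothetical helper names, and where the setting is stored (for example a global variable) is left open.

```javascript
const DEFAULT_BRIEF_PAUSE_MS = 5 * 60 * 1000; // same 5-minute default as above

// Hypothetical helper: turns a user-entered number of minutes into
// milliseconds, falling back to the default for missing or invalid input.
function resolvePauseThreshold(configuredMinutes) {
  const minutes = Number(configuredMinutes);
  if (!Number.isFinite(minutes) || minutes <= 0) return DEFAULT_BRIEF_PAUSE_MS;
  return minutes * 60 * 1000;
}

// Mirrors the first branch of the improved logic: tracking off, or no
// machine cycle for longer than the threshold, counts as a real stop.
function isLongStop(trackingEnabled, timeSinceLastCycleMs, thresholdMs) {
  return !trackingEnabled || timeSinceLastCycleMs > thresholdMs;
}

const defaultThreshold = resolvePauseThreshold(undefined); // falls back to 5 minutes
const customThreshold = resolvePauseThreshold(10);         // 10 minutes in ms
```

In Calculate KPIs, the constant `BRIEF_PAUSE_THRESHOLD` would then be replaced by the result of `resolvePauseThreshold(...)` applied to the stored setting.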
Roadblock 6: KPI Calculation Performance
Symptom: System slow after implementing continuous KPI updates
Mitigation:
- Implement Phase 2 throttling FIRST (now with dual-path approach)
- Ensure Calculate KPIs has guards for null/undefined inputs
- Profile Calculate KPIs function for optimization
- Monitor Node-RED CPU usage during production
Roadblock 7: Browser Cache Issues
Symptom: Changes don't appear after deployment
Mitigation:
- Clear browser cache during testing (Ctrl+Shift+R / Cmd+Shift+R)
- Add cache-busting version to template (optional):
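<!-- In template header: bump the version string by hand on each change. AngularJS does not evaluate {{ }} inside HTML comments, and Date is not reachable from template expressions. -->
<!-- Version: 1.1 -->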
- Use incognito/private browsing for testing
- Test on different browsers/devices
Success Criteria
Phase 1:
- ✅ Time filters change graph display correctly
- ✅ Graphs load on first visit without refresh
- ✅ Sidebar navigation works immediately
Phase 2:
- ✅ Graph updates occur at ~1 minute intervals
- ✅ Graphs are smooth, not jerky
- ✅ No performance degradation
Phase 3:
- ✅ KPIs update continuously during production
- ✅ Availability never incorrectly shows 0%
- ✅ START button shows STOP when production running
- ✅ OEE calculation is accurate
Integration:
- ✅ All features work together without conflicts
- ✅ No console errors
- ✅ Production tracking works end-to-end
- ✅ Data persists correctly
Estimated Timeline
| Phase | Task | Time | Cumulative |
|---|---|---|---|
| 1.1 | Fix Filters | 15 min | 15 min |
| 1.2 | Fix Empty Graphs | 15 min | 30 min |
| 2.1 | Add Throttling | 45 min | 1h 15m |
| 3.2 | Fix Availability (with logging) | 30 min | 1h 45m |
| 3.1 | Fix Continuous Updates | 30 min | 2h 15m |
| 3.3 | Fix Button State | 20 min | 2h 35m |
| Testing | Integration Testing | 30 min | 3h 5m |
Total: ~3 hours (assuming no major roadblocks)
Best Practices for LLM-Assisted Implementation
When working with an LLM to implement this plan, use these strategies for best results:
1. Isolate Logic Focus (Function Node Precision)
DO:
- Ask for specific function node code: "Write the Record KPI History function with averaging logic including global.get initialization"
- Provide exact input/output requirements: "This function receives msg.kpis object and must return msg or null"
- Request one change at a time
DON'T:
- Ask vague questions like "fix my dashboard"
- Request multiple phase changes in one prompt
- Assume LLM knows your flow structure
2. Explicitly Define Global Variables
Template for LLM prompts:
Global variable: kpiBuffer
Type: Array of objects
Structure: [{timestamp: number, oee: number, availability: number, performance: number, quality: number}]
Lifecycle: Initialized to [] if null, cleared after recording to history
Purpose: Accumulates KPI values for 1-minute averaging
Always specify:
- Variable name
- Data type
- Default/initial value
- When it's read/written
- When it should be cleared
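A specification like the one above translates directly into a validation step. The following is a sketch under that specification only: the helper names are hypothetical, and it assumes every field in the kpiBuffer structure must be a finite number.

```javascript
// Field names come from the kpiBuffer structure described above.
const KPI_FIELDS = ["oee", "availability", "performance", "quality"];

// Hypothetical check: does one buffer entry match the documented shape?
function isKpiSnapshot(entry) {
    return entry !== null &&
        typeof entry === "object" &&
        Number.isFinite(entry.timestamp) &&
        KPI_FIELDS.every((name) => Number.isFinite(entry[name]));
}

// Hypothetical normalizer: returns [] for anything that is not an array
// and drops malformed entries so later averaging never sees NaN.
function normalizeKpiBuffer(value) {
    if (!Array.isArray(value)) return [];
    return value.filter(isKpiSnapshot);
}
```

Giving the LLM the structure and asking for a normalizer of this kind makes the "Initialized to [] if null" rule explicit in code instead of leaving it implied.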
3. Specify Node-RED Input/Output Requirements
Example prompt:
The Machine Cycles function node must have 3 outputs:
- Output 1: DB write message (only when tracking enabled)
- Output 2: State update message (always sent)
- Output 3: KPI trigger message (always sent for continuous updates)
The return statement should be:
return [dbMsg, stateMsg, kpiTrigger];
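As a rough illustration of what such a prompt should produce, the sketch below builds the three-element array. The function name, payload shapes, and topic string are hypothetical; only the output order comes from the requirement above. It relies on the Node-RED behavior that a null in an array position sends nothing on that output.

```javascript
// Hypothetical builder for the three outputs of Machine Cycles.
// Position in the returned array determines which output the message leaves on.
function buildCycleOutputs(trackingEnabled, cycleCount, now) {
    // Output 1: DB write, only while tracking is enabled (null = send nothing)
    const dbMsg = trackingEnabled
        ? { payload: { cycleCount: cycleCount, timestamp: now } }
        : null;
    // Output 2: state update, always sent
    const stateMsg = {
        payload: { trackingEnabled: Boolean(trackingEnabled), cycleCount: cycleCount }
    };
    // Output 3: KPI trigger, always sent for continuous updates
    const kpiTrigger = { topic: "kpi-trigger", payload: { timestamp: now } };
    return [dbMsg, stateMsg, kpiTrigger];
}
```

The function node's Outputs setting also needs to be 3 for the second and third elements to be delivered.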
4. Request Defensive Code
Always ask for:
- Null/undefined checks before accessing properties
- Type validation for global variables
- Initialization logic at the start of functions
- Error handling for edge cases
Example:
// BAD (LLM might generate)
const buffer = global.get("kpiBuffer");
buffer.push(newValue);

// GOOD (what you should request)
let buffer = global.get("kpiBuffer");
if (!Array.isArray(buffer)) { // also covers null and undefined
    buffer = [];
}
buffer.push(newValue);
global.set("kpiBuffer", buffer);
5. Break Down Complex Changes
For Phase 3.1 (Continuous KPI Updates), ask in sequence:
- "Show me the current return statements in Machine Cycles function"
- "Modify the function to add a third output for KPI trigger"
- "Update all return statements to include kpiTrigger message"
- "Show me how to wire the third output to Calculate KPIs node"
6. Request Testing/Debugging Code
Ask LLM to include:
- Debug logging: node.warn('[KPI] Buffer size: ' + buffer.length);
- State validation: Check that variables have expected values
- Error messages: Descriptive messages for troubleshooting
7. Validate Against Node-RED Constraints
Remind LLM of Node-RED specifics:
- "This is a Node-RED function node, not regular JavaScript"
- "Global context uses global.get/set, not regular variables"
- "The msg object must be returned to send to next node"
- "Use node.warn() for logging, not console.log()"
8. Phase-by-Phase Verification
After each LLM response:
- Verify the code matches the plan
- Check for initialization logic
- Confirm output structure matches wiring
- Ask: "What edge cases does this handle?"
9. Example: Perfect LLM Prompt for Phase 2.1
I need to implement KPI throttling with averaging in Node-RED.
Context:
- Function node: "Record KPI History"
- Input: msg.kpis object with {oee, availability, performance, quality}
- Output: Averaged KPI values sent to Graphs template (or null if not ready to record)
Global variables needed:
1. kpiBuffer (Array): Accumulates KPI snapshots. Initialize to [] if null.
2. lastKPIRecordTime (Number): Last timestamp when history was recorded. Initialize to (Date.now() - 60000) if null for immediate first recording.
Requirements:
- Accumulate incoming KPIs in kpiBuffer
- Every 60 seconds (60000ms), calculate average of all buffer values
- Send averaged KPIs to output
- Clear buffer after sending
- If less than 60 seconds since last record, return null (don't send)
Please write the complete function with:
- Robust initialization (check and set defaults)
- Debug logging (buffer size, time until next record)
- Comments explaining each section
- Edge case handling (empty buffer, first run)
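For orientation, a response to this prompt might look roughly like the sketch below. It is one possible shape, not the plan's definitive implementation. The logic is written as a pure function so it can be checked outside Node-RED; the names recordKpi and averageSnapshots are hypothetical, and the output shape (payload.kpis) is assumed from what the Graphs template watches.

```javascript
const RECORD_INTERVAL_MS = 60000;
const AVERAGED_FIELDS = ["oee", "availability", "performance", "quality"];

// Mean of each KPI field across all buffered snapshots (buffer must be non-empty).
function averageSnapshots(buffer) {
    const avg = {};
    for (const name of AVERAGED_FIELDS) {
        const sum = buffer.reduce((total, snap) => total + snap[name], 0);
        avg[name] = sum / buffer.length;
    }
    return avg;
}

// Returns the next state plus the message to emit (null when not yet due).
function recordKpi(state, kpis, now) {
    // Robust initialization: tolerate missing or corrupt stored values.
    const buffer = Array.isArray(state.buffer) ? state.buffer.slice() : [];
    const lastRecordTime = Number.isFinite(state.lastRecordTime)
        ? state.lastRecordTime
        : now - RECORD_INTERVAL_MS; // first run: record immediately

    // Only buffer snapshots whose fields are all finite numbers.
    const valid = kpis !== null && typeof kpis === "object" &&
        AVERAGED_FIELDS.every((name) => Number.isFinite(kpis[name]));
    if (valid) buffer.push(Object.assign({ timestamp: now }, kpis));

    // Not due yet, or nothing to average: keep accumulating.
    if (buffer.length === 0 || now - lastRecordTime < RECORD_INTERVAL_MS) {
        return { state: { buffer: buffer, lastRecordTime: lastRecordTime }, output: null };
    }
    const output = { payload: { kpis: averageSnapshots(buffer), timestamp: now } };
    return { state: { buffer: [], lastRecordTime: now }, output: output };
}

// Inside the function node (global and msg are provided by Node-RED):
// const result = recordKpi({
//     buffer: global.get("kpiBuffer"),
//     lastRecordTime: global.get("lastKPIRecordTime")
// }, msg.kpis, Date.now());
// global.set("kpiBuffer", result.state.buffer);
// global.set("lastKPIRecordTime", result.state.lastRecordTime);
// return result.output;
```

Keeping the averaging separate from global.get/global.set makes it easier to ask the LLM for edge-case handling and to verify the answer before wiring it into the flow.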
10. Common Pitfalls to Avoid
- Assuming LLM knows your flow structure - Always describe node connections
- Not specifying Node-RED context - LLM might give generic JavaScript instead
- Requesting too many changes at once - Break into single-phase requests
- Forgetting to mention global variable persistence - Specify initialization needs
- Not asking for defensive code - Request null checks and type validation
- Vague success criteria - Define exactly what "working" means
Quick Reference: Key Code Snippets
1. Init Node (Run on Deploy)
// Initialize Global Variables - Inject Once on Deploy
node.warn('[INIT] Initializing global variables');
if (!global.get("kpiBuffer")) global.set("kpiBuffer", []);
if (!global.get("lastKPIRecordTime")) global.set("lastKPIRecordTime", Date.now() - 60000);
if (!global.get("lastMachineCycleTime")) global.set("lastMachineCycleTime", Date.now());
if (!global.get("lastKPIValues")) global.set("lastKPIValues", {});
node.warn('[INIT] Complete');
return msg;
2. Machine Cycles - Add to Final Return
// Update last machine cycle time when a successful cycle occurs
if (trackingEnabled && dbMsg) {
global.set("lastMachineCycleTime", Date.now());
}
return [dbMsg, stateMsg, kpiTrigger];
3. Calculate KPIs - Multi-Source Guard
const trackingEnabled = global.get("trackingEnabled");
const activeOrder = global.get("activeOrder") || {};
if (!trackingEnabled || !activeOrder.id) return null;
// ... rest of calculation
4. Work Order START Button - Clear Buffer
if (action === "start-tracking") {
global.set("trackingEnabled", true);
global.set("kpiBuffer", []); // Clear stale data
global.set("lastKPIRecordTime", Date.now() - 60000);
// ... send state update
}
5. Graphs Template - Combined Init
let chartsInitialized = false;

scope.$watch('msg', function(msg) {
    if (msg && msg.payload && msg.payload.kpis && !chartsInitialized) {
        initFilters();
        createCharts(currentRange);
        chartsInitialized = true;
    }
    if (chartsInitialized && msg && msg.payload && msg.payload.kpis) {
        updateCharts(msg);
    }
});

// Fallback: initialize anyway if no data has arrived after 5 seconds
setTimeout(() => {
    if (!chartsInitialized) {
        initFilters();
        createCharts(currentRange);
        chartsInitialized = true;
    }
}, 5000);
Final Notes
- Backup First: Always back up flows.json before starting each phase
- Test Incrementally: Don't skip testing between phases
- Document Changes: Note any deviations from plan
- Monitor Logs: Watch Node-RED debug output during testing
- Clear Cache: Browser cache can mask issues
- Use LLM Strategically: Follow the best practices above for precise, working code
If you encounter issues not covered in this plan, STOP and ask for help before proceeding.