# OEE Dashboard Fix Plan
## Comprehensive Strategy for Resolving All Issues
---
## Executive Summary
We have identified 5 distinct issues affecting your OEE dashboard. This plan addresses each systematically, ordered by priority based on impact, risk, and dependencies.
**Estimated Total Implementation Time:** 2-3 hours
**Recommended Approach:** Sequential implementation with testing between each phase
### Key Improvements in This Updated Plan
This plan has been enhanced based on critical friction point analysis for Node-RED environments:
1. **Global Context Persistence** - Added robust initialization logic for all global variables to handle Node-RED restarts and deploys without data loss or spikes
2. **State Synchronization (Push + Pull Model)** - Enhanced START/STOP button state tracking with both push notifications AND pull requests to handle mid-production dashboard loads
3. **Angular Timing Issues** - Replaced brittle fixed timeouts with data-driven initialization and polling fallback for reliable chart loading across all system speeds
4. **Dual-Path KPI Architecture** - Implemented separate paths for live display (real-time, unthrottled) and historical graphs (averaged, smooth) to eliminate the stale-data vs jerky-graphs trade-off
5. **Time-Based Availability Logic** - Enhanced availability calculation with configurable time thresholds to distinguish brief pauses from legitimate shutdowns
6. **LLM Implementation Guide** - Added comprehensive best practices section for working with LLMs to implement this plan with precise, defensive code
### Critical Refinements (Final Review)
Based on final review, these critical refinements have been integrated:
1. **Clear Buffer on Production START** - Prevents stale data from skewing averages if Node-RED restarts mid-production and context is restored from disk
2. **Consolidated lastMachineCycleTime Updates** - Now updated ONLY in Machine Cycles function (not Calculate KPIs) to maintain clean "machine pulse" signal, initialized to `Date.now()` on startup to prevent immediate 0% availability
3. **Combined Initialization Strategy** - Graphs now use BOTH data-driven initialization (fast when production is running) AND 5-second safety timeout (for idle machine scenarios)
4. **Multi-Source KPI Calculation** - Calculate KPIs now explicitly handles triggers from both Machine Cycles (continuous) and Scrap Submission (event-based) with proper guards
5. **Complete Init Node** - Added production-ready initialization function with all global variables (`kpiBuffer`, `lastKPIRecordTime`, `lastMachineCycleTime`, `lastKPIValues`) properly initialized with correct default values and logging
---
## Issue Breakdown & Root Causes
### **Issue 1: KPI Updates Only on Scrap Submission**
**Symptom:** KPIs stay static during production, only update when scrap is submitted or START/STOP clicked
**Root Cause:**
- Machine Cycles function has multiple return paths with `[null, ...]` outputs
- Output to Calculate KPIs (output port 2) only happens in specific conditions
- When `trackingEnabled` is false or no active order, KPI calculation is skipped
- **Critical line:** `if (!trackingEnabled) return [null, stateMsg];` prevents KPI updates
**Sub-issue 1b: START/STOP Button State**
- Button state not persisting because UI doesn't track `trackingEnabled` global variable
- Home template needs to watch for tracking state changes
---
### **Issue 2: Graphs Empty on First Load, Sidebar Broken**
**Symptom:** Graphs tab shows blank, navigation doesn't work until refresh
**Root Causes:**
1. **Timing Issue:** Charts created before Angular/scope is fully ready
2. **Scope Isolation:** `scope.gotoTab` might not be accessible immediately
3. **Data Race:** Charts created before first KPI data arrives
**Why refresh works:** Second load benefits from cached scope and existing data
---
### **Issue 3: Availability & OEE Drop to 0%**
**Symptom:** Metrics incorrectly show 0% during active production
**Root Cause:**
- Calculate KPIs function has logic that sets availability to 0 when certain conditions aren't met
- **Need to verify:** When does `trackingEnabled` check fail?
- **Hypothesis:** When production is running but tracking flag isn't properly set, availability defaults to 0
---
### **Issue 4: Graph Updates Too Frequent/Jerky**
**Symptom:** Data points recorded too often, causing choppy visualization
**Root Cause:**
- Record KPI History is called on EVERY Calculate KPIs output
- With machine cycles happening every ~1 second, KPIs recorded every second
- Need time-based throttling (1-minute intervals) instead of event-based recording
---
### **Issue 5: Time Range Filters Not Working**
**Symptom:** Shift/Day/Week/Month/Year buttons don't change graph display
**Root Cause:**
- `build(metric, range)` function receives range parameter but **ignores it**
- Function always returns ALL data from `realtimeData[metric]`
- Need to filter data based on selected time range
---
## Fix Plan - Phased Approach
### **PHASE 1: Low-Risk Quick Wins** ⚡
*Estimated Time: 30 minutes*
*Risk Level: LOW*
#### 1.1 Fix Graph Filters (Issue 5)
**Files:** `projects/Plastico/flows.json` → Graphs Template
**Changes:**
```javascript
// BEFORE
function build(metric, range) {
  const arr = realtimeData[metric];
  if (!arr || arr.length === 0) return [];
  return arr.map(d => ({x: d.timestamp, y: d.value}));
}

// AFTER
function build(metric, range) {
  const arr = realtimeData[metric];
  if (!arr || arr.length === 0) return [];

  // Calculate time cutoff based on range
  const now = Date.now();
  const cutoffs = {
    shift: 8 * 60 * 60 * 1000,        // 8 hours
    day: 24 * 60 * 60 * 1000,         // 24 hours
    week: 7 * 24 * 60 * 60 * 1000,    // 7 days
    month: 30 * 24 * 60 * 60 * 1000,  // 30 days
    year: 365 * 24 * 60 * 60 * 1000   // 365 days
  };
  // Unknown ranges fall back to the shift window
  const cutoffTime = now - (cutoffs[range] || cutoffs.shift);

  // Filter data to selected time range
  return arr
    .filter(d => d.timestamp >= cutoffTime)
    .map(d => ({x: d.timestamp, y: d.value}));
}
```
**Testing:**
- Click each filter button
- Verify data range changes in charts
- Check that no errors occur
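The filter logic can also be checked outside the dashboard by passing the current time in as a parameter. The sketch below is a standalone variant of `build()`; the name `filterByRange`, the explicit `now` argument, and the sample data are illustrative assumptions, not part of the existing template.

```javascript
// Hypothetical standalone variant of build() for quick checks in Node.js or a browser console.
const RANGE_MS = {
  shift: 8 * 60 * 60 * 1000,
  day: 24 * 60 * 60 * 1000,
  week: 7 * 24 * 60 * 60 * 1000,
  month: 30 * 24 * 60 * 60 * 1000,
  year: 365 * 24 * 60 * 60 * 1000
};

function filterByRange(points, range, now) {
  if (!Array.isArray(points)) return [];
  // Unknown ranges fall back to the shift window, as in build()
  const cutoff = now - (RANGE_MS[range] || RANGE_MS.shift);
  return points
    .filter(p => p.timestamp >= cutoff)
    .map(p => ({ x: p.timestamp, y: p.value }));
}

// Sample data: one point 60 days old, one 3 days old, one 1 hour old
const HOUR = 60 * 60 * 1000;
const sampleNow = 1700000000000;
const samplePoints = [
  { timestamp: sampleNow - 60 * 24 * HOUR, value: 70 },
  { timestamp: sampleNow - 3 * 24 * HOUR, value: 80 },
  { timestamp: sampleNow - 1 * HOUR, value: 90 }
];

console.log(filterByRange(samplePoints, "shift", sampleNow).length); // 1
console.log(filterByRange(samplePoints, "week", sampleNow).length);  // 2
console.log(filterByRange(samplePoints, "year", sampleNow).length);  // 3
```

Taking `now` as an argument keeps the function deterministic, which makes it easier to confirm each button's window before testing in the live dashboard.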
**Potential Issues:**
- If no data exists in selected range, chart might be empty (expected behavior)
**Rollback:** Easy - revert to original build() function
---
#### 1.2 Fix Empty Graphs on First Load (Issue 2)
**Files:** `projects/Plastico/flows.json` → Graphs Template
**Strategy:** Use data-driven initialization instead of fixed timeout for reliability
**Changes:**
**A) Combined Data-Driven + Safety Timeout (RECOMMENDED)**
```javascript
// BEFORE
setTimeout(()=>{
initFilters();
createCharts(currentRange);
},300);
// AFTER - Wait for first data message OR timeout
let chartsInitialized = false;
scope.$watch('msg', function(msg) {
// Initialize on first KPI data arrival
if (msg && msg.payload && msg.payload.kpis && !chartsInitialized) {
// Scope and data are both ready
initFilters();
createCharts(currentRange);
chartsInitialized = true;
console.log('[Graphs] Charts initialized via data-driven approach');
}
// Update charts if already initialized
if (chartsInitialized && msg && msg.payload && msg.payload.kpis) {
updateCharts(msg);
}
});
// ADDED: Safety timer for when machine is idle (no KPI messages flowing)
setTimeout(() => {
if (!chartsInitialized) {
console.warn('[Graphs] Charts initialized via safety timer (machine idle)');
initFilters();
createCharts(currentRange);
chartsInitialized = true;
}
}, 5000); // 5 seconds grace period for KPI messages
```
**Why Both?**
- **Data-driven**: Ensures charts initialize as soon as data is available (fast, reliable)
- **Safety timeout**: Handles "dashboard loaded but machine is idle" scenario (no KPI messages)
- Together they cover both active production and idle machine scenarios
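The "initialize exactly once, from whichever trigger fires first" pattern can be isolated into a small helper. This is a minimal sketch: `createOnceGuard` is a hypothetical name, and the two triggers are simulated here rather than wired to the real `scope.$watch` callback and chart functions.

```javascript
// Hypothetical helper: wraps an init function so it runs at most once,
// no matter how many triggers (first KPI message, safety timer) call it.
function createOnceGuard(initFn) {
  let done = false;
  return function runOnce(source) {
    if (done) return false; // Already initialized by an earlier trigger
    done = true;
    initFn(source);
    return true;
  };
}

// Simulated usage: count how often "chart creation" actually runs
let initCount = 0;
let initSource = null;
const initCharts = createOnceGuard(source => {
  initCount += 1;
  initSource = source;
});

initCharts("first KPI message"); // data-driven path wins
initCharts("safety timer");      // ignored, charts already exist

console.log(initCount, initSource);
```

In the template, the guarded function would wrap `initFilters()` and `createCharts(currentRange)`, and both the watch callback and the 5-second timer would call it.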
**B) Fallback: Polling with timeout (if data-driven doesn't work)**
```javascript
function initWhenReady(attempts = 0) {
const oeeEl = document.getElementById("chart-oee");
const availEl = document.getElementById("chart-availability");
if (oeeEl && availEl && scope.gotoTab) {
// Both DOM and scope ready
initFilters();
createCharts(currentRange);
} else if (attempts < 20) {
// Retry every 100ms, max 2 seconds
setTimeout(() => initWhenReady(attempts + 1), 100);
} else {
console.error("[Graphs] Failed to initialize charts after 2 seconds");
}
}
// Start polling on load
initWhenReady();
```
**C) Ensure scope.gotoTab is properly bound**
```javascript
// BEFORE
(function(scope){
scope.gotoTab = t => scope.send({ui_control:{tab:t}});
})(scope);
// AFTER
(function(s){
if (!s.gotoTab) {
s.gotoTab = function(t) {
s.send({ui_control: {tab: t}});
};
}
})(scope);
```
**D) Add defensive chart creation with retry**
```javascript
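const MAX_CHART_RETRIES = 10; // 10 x 200ms = 2 seconds

function createCharts(range, attempt = 0) {
  // Ensure DOM elements exist
  const oeeEl = document.getElementById("chart-oee");
  const availEl = document.getElementById("chart-availability");
  if (!oeeEl || !availEl) {
    if (attempt >= MAX_CHART_RETRIES) {
      // Stop retrying so a missing element cannot loop forever
      console.error("[Graphs] Chart elements never appeared, giving up");
      return;
    }
    console.warn("[Graphs] Chart elements not ready, retrying...");
    setTimeout(() => createCharts(range, attempt + 1), 200);
    return;
  }
  // ... rest of existing chart creation logic
}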
```
**Testing:**
- Clear browser cache
- Navigate to Graphs tab from fresh load
- Test sidebar navigation
- Verify charts appear without refresh
- Test on slow network/system
**Potential Issues:**
- Data-driven initialization only fires once KPI messages are flowing
- If no production is running, no KPI message arrives, so charts initialize only when the 5-second safety timer in Option A fires
**Recommended Implementation:**
1. Start with data-driven approach (Option A)
2. Add polling fallback (Option B) as safety net
3. Implement defensive checks (Options C & D)
**Rollback:** Easy - revert to original setTimeout logic
---
### **PHASE 2: Medium-Risk Data Flow Improvements** 🔧
*Estimated Time: 45 minutes*
*Risk Level: MEDIUM*
#### 2.1 Implement KPI Update Throttling with Dual-Path Architecture (Issue 4)
**Files:**
- `projects/Plastico/flows.json` → Calculate KPIs function (add second output)
- `projects/Plastico/flows.json` → Record KPI History function (add averaging)
**Strategy:** Dual-path updates solve the stale display vs jerky graphs trade-off
- **Path 1:** Unthrottled live KPIs to Home Template for real-time display
- **Path 2:** Throttled/averaged KPIs to Record History for smooth graphs
**Part A: Modify Calculate KPIs to Output on Two Paths**
```javascript
// At the end of Calculate KPIs function
// Prepare the KPI message
const kpiMsg = {
topic: "kpis",
payload: {
timestamp: Date.now(),
kpis: {
oee: msg.kpis.oee,
availability: msg.kpis.availability,
performance: msg.kpis.performance,
quality: msg.kpis.quality
}
}
};
// Return to TWO outputs:
// Output 1: Live KPI to Home Template (real-time, unthrottled)
// Output 2: KPI to Record History (will be averaged/throttled)
return [
  kpiMsg,                        // Path 1: Live display
  RED.util.cloneMessage(kpiMsg)  // Path 2: History recording (deep clone; a spread copy would still share the nested payload)
];
```
**Wiring Changes:**
- Calculate KPIs node needs **2 outputs** (add one more)
- Output 1 → Home Template (existing connection)
- Output 2 → Record KPI History (new connection)
**Part B: Add Averaging Logic to Record KPI History**
```javascript
// Complete Record KPI History function with robust initialization
// ========== INITIALIZATION ==========
// Initialize buffer
let buffer = global.get("kpiBuffer");
if (!buffer || !Array.isArray(buffer)) {
buffer = [];
global.set("kpiBuffer", buffer);
node.warn('[KPI History] Initialized kpiBuffer');
}
// Initialize last record time
let lastRecordTime = global.get("lastKPIRecordTime");
if (!lastRecordTime || typeof lastRecordTime !== 'number') {
// Set to 1 minute ago to ensure immediate recording on startup
lastRecordTime = Date.now() - 60000;
global.set("lastKPIRecordTime", lastRecordTime);
node.warn('[KPI History] Initialized lastKPIRecordTime');
}
// ========== ACCUMULATE ==========
const kpis = msg.payload.kpis;
if (!kpis) {
node.warn('[KPI History] No KPIs in message, skipping');
return null;
}
buffer.push({
timestamp: Date.now(),
oee: kpis.oee || 0,
availability: kpis.availability || 0,
performance: kpis.performance || 0,
quality: kpis.quality || 0
});
// Prevent buffer from growing too large (safety limit)
if (buffer.length > 100) {
buffer = buffer.slice(-60); // Keep last 60 entries
node.warn('[KPI History] Buffer exceeded 100 entries, trimmed to 60');
}
global.set("kpiBuffer", buffer);
// ========== CHECK IF TIME TO RECORD ==========
const now = Date.now();
const timeSinceLastRecord = now - lastRecordTime;
const ONE_MINUTE = 60 * 1000;
if (timeSinceLastRecord < ONE_MINUTE) {
// Not time to record yet
const secondsRemaining = Math.ceil((ONE_MINUTE - timeSinceLastRecord) / 1000);
// Debug log (can remove in production)
// node.warn(`[KPI History] Buffer: ${buffer.length} entries, recording in ${secondsRemaining}s`);
return null; // Don't send to charts yet
}
// ========== CALCULATE AVERAGES ==========
if (buffer.length === 0) {
node.warn('[KPI History] Buffer empty at recording time, skipping');
return null;
}
const avg = {
oee: buffer.reduce((sum, d) => sum + d.oee, 0) / buffer.length,
availability: buffer.reduce((sum, d) => sum + d.availability, 0) / buffer.length,
performance: buffer.reduce((sum, d) => sum + d.performance, 0) / buffer.length,
quality: buffer.reduce((sum, d) => sum + d.quality, 0) / buffer.length
};
node.warn(`[KPI History] Recording averaged KPIs from ${buffer.length} samples: OEE=${avg.oee.toFixed(1)}%`);
// ========== RECORD TO HISTORY ==========
// Update global state
global.set("lastKPIRecordTime", now);
global.set("kpiBuffer", []); // Clear buffer
// Send averaged values to graphs and database
return {
topic: "kpi-history",
payload: {
timestamp: now,
kpis: {
oee: Math.round(avg.oee * 10) / 10, // Round to 1 decimal
availability: Math.round(avg.availability * 10) / 10,
performance: Math.round(avg.performance * 10) / 10,
quality: Math.round(avg.quality * 10) / 10
},
sampleCount: buffer.length // Metadata for debugging
}
};
```
**Recommendation:** This dual-path approach provides the best of both worlds
**Testing:**
- Start production
- Observe KPI update frequency in graphs
- Verify updates occur approximately every 60 seconds
- Check that no spikes/gaps appear in data
**Potential Issues:**
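- The first data point is recorded immediately from a single sample (because `lastKPIRecordTime` is initialized to one minute in the past); later points arrive about once per minute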
- Rapid production changes might not be immediately visible
- Buffer could grow large if production runs without recording
**Mitigation:**
- Set buffer max size (e.g., 100 entries)
- Force record on production stop/start
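Forcing a record on stop or start amounts to averaging whatever is in the buffer at that moment. A minimal sketch of that averaging step as a pure function follows; `averageBuffer` is a hypothetical name, and the entry shape is assumed to match the objects pushed into `kpiBuffer` in Part B.

```javascript
// Hypothetical helper: averages buffered KPI samples, rounded to 1 decimal.
// Returns null for an empty or missing buffer so the caller can skip recording.
function averageBuffer(buffer) {
  if (!Array.isArray(buffer) || buffer.length === 0) return null;
  const keys = ["oee", "availability", "performance", "quality"];
  const result = { sampleCount: buffer.length };
  for (const key of keys) {
    // Missing or non-numeric values count as 0, as in Part B
    const sum = buffer.reduce((acc, d) => acc + (Number(d[key]) || 0), 0);
    result[key] = Math.round((sum / buffer.length) * 10) / 10;
  }
  return result;
}

const flushed = averageBuffer([
  { oee: 60, availability: 90, performance: 80, quality: 100 },
  { oee: 70, availability: 100, performance: 70, quality: 100 }
]);
console.log(flushed);
```

In a STOP handler, a non-null result could be sent on as a `kpi-history` message before `kpiBuffer` is cleared, so the last partial minute is not lost.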
**Rollback:** Medium difficulty - remove throttling logic, clear global variables
---
### **PHASE 3: High-Risk Core Logic Fixes** ⚠️
*Estimated Time: 60 minutes*
*Risk Level: HIGH*
**⚠️ CRITICAL: Backup flows.json before proceeding**
#### 3.1 Fix KPI Continuous Updates (Issue 1)
**Files:** `projects/Plastico/flows.json` → Machine Cycles function
**Problem:** Machine Cycles has multiple early returns that skip KPI calculation
**Current Logic:**
```javascript
// Line ~36: No active order
if (!activeOrder || !activeOrder.id || cavities <= 0) {
return [null, stateMsg]; // ❌ Skips KPI calculation
}
// Line ~43: Tracking not enabled
if (!trackingEnabled) {
return [null, stateMsg]; // ❌ Skips KPI calculation
}
```
**Solution Options:**
**Option A: Always Calculate KPIs (Recommended)**
```javascript
// Always prepare a message for Calculate KPIs on output 3
const kpiTrigger = { _triggerKPI: true };
// Change all returns to include kpiTrigger
if (!activeOrder || !activeOrder.id || cavities <= 0) {
return [null, stateMsg, kpiTrigger]; // ✓ Triggers KPI calculation
}
if (!trackingEnabled) {
return [null, stateMsg, kpiTrigger]; // ✓ Triggers KPI calculation
}
// Update last machine cycle time when a successful cycle occurs
// This is used for time-based availability logic
if (trackingEnabled && dbMsg) {
// dbMsg being non-null implies a cycle was recorded
global.set("lastMachineCycleTime", Date.now());
}
// ... final return
return [dbMsg, stateMsg, kpiTrigger];
```
**Critical:** The `lastMachineCycleTime` update must happen ONLY in Machine Cycles function to maintain a clean "machine pulse" signal separate from KPI calculation triggers.
**Wire Configuration Change:**
- Add third output wire to Machine Cycles node
- Connect output 3 → Calculate KPIs
**Option B: Calculate KPIs in Parallel (Alternative)**
- Add an inject node that triggers Calculate KPIs every 5 seconds
- Less coupled, but might calculate with stale data
**Recommendation:** Option A - ensures KPIs calculated with real-time data
**Testing:**
1. Start production with START button
2. Observe KPI values on Home page
3. Verify continuous updates on the Home page (every ~1 second; the live path is not throttled)
4. Check that scrap submission still works
5. Test production stop/start
**Potential Issues:**
- Calculate KPIs might need to handle cases with no active order
- Could calculate KPIs unnecessarily when machine is idle
- Performance impact if calculating too frequently
**Mitigation:**
- Add guards in Calculate KPIs to handle null/undefined inputs
- Implement Phase 2 throttling first to reduce calculation frequency
- Monitor system performance
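One way to guard against null or undefined inputs is to normalize KPI values before they are used or displayed. The sketch below assumes KPIs are percentages in the 0 to 100 range, as the log output elsewhere in this plan suggests; `sanitizeKpis` is a hypothetical helper, not an existing function in the flow.

```javascript
// Hypothetical guard: coerces each KPI to a finite number clamped to 0..100.
// Anything missing or non-numeric becomes 0 instead of propagating NaN.
function sanitizeKpis(kpis) {
  const source = kpis && typeof kpis === "object" ? kpis : {};
  const clean = {};
  for (const key of ["oee", "availability", "performance", "quality"]) {
    const n = Number(source[key]);
    clean[key] = Number.isFinite(n) ? Math.min(100, Math.max(0, n)) : 0;
  }
  return clean;
}

console.log(sanitizeKpis({ oee: 120, availability: "87.5", quality: NaN }));
console.log(sanitizeKpis(undefined));
```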
**CRITICAL: Calculate KPIs Multi-Source Handling**
The Calculate KPIs function will now receive triggers from TWO sources:
1. **Machine Cycles** (continuous, real-time) - via new output 3
2. **Scrap Submission** (event-based) - existing connection
**Required Change in Calculate KPIs:**
```javascript
// At the start of Calculate KPIs function
// Must handle both trigger types
// The function should execute regardless of message content
// as long as it receives ANY trigger
const trackingEnabled = global.get("trackingEnabled");
const activeOrder = global.get("activeOrder") || {};
const productionStartTime = global.get("productionStartTime");
// Guard against missing critical data
if (!trackingEnabled || !activeOrder.id) {
// Can't calculate meaningful KPIs without tracking or active order
// But don't error - just skip calculation
return null;
}
// ... rest of existing KPI calculation logic
// This logic will now run for BOTH continuous and event-based triggers
```
This ensures availability and OEE calculations work correctly whether triggered by machine cycles or scrap submission.
**Side Effects:**
- Will trigger Issue 4 more severely → MUST implement Phase 2 throttling first
- Database might receive more frequent updates
- Global variables will change more often
**Rollback:** Medium difficulty - requires restoring original return statements and wire configuration
---
#### 3.2 Fix Availability/OEE Drops to 0 (Issue 3)
**Files:** `projects/Plastico/flows.json` → Calculate KPIs function
**Investigation Steps:**
1. Read full Calculate KPIs function
2. Identify all paths that set `msg.kpis.availability = 0`
3. Add logging to track when this occurs
4. Understand state flow: trackingEnabled, productionStartTime, operatingTime
**Hypothesis Testing:**
```javascript
// Add debug logging at the start
node.warn(`[KPI] trackingEnabled=${trackingEnabled}, startTime=${productionStartTime}, opTime=${operatingTime}`);

// Before setting availability to 0
// Placeholder: substitute the real condition identified in the investigation steps
const zeroCondition = false;
if (zeroCondition) {
  node.warn(`[KPI] Setting availability to 0 because: [reason]`);
  msg.kpis.availability = 0;
}
```
**Likely Fix:**
```javascript
// BEFORE
} else {
msg.kpis.availability = 0; // Not running
}
// AFTER
} else {
// Check if production was recently active
const prev = global.get("lastKPIValues") || {};
if (prev.availability > 0 && operatingTime > 0) {
// Maintain last availability if we have operating time
msg.kpis.availability = prev.availability;
} else {
msg.kpis.availability = 0;
}
}
// Store KPIs for next iteration
global.set("lastKPIValues", msg.kpis);
```
**Testing:**
1. Start production
2. Monitor availability values
3. Trigger scrap prompt
4. Verify availability doesn't drop to 0
5. Check OEE calculation
**Potential Issues:**
- Might mask legitimate 0% availability (machine actually stopped)
- Could create artificially high availability readings
- State persistence might cause issues after restart
**Mitigation:**
- Add clear conditions for when availability should legitimately be 0
- Reset lastKPIValues on work order completion
- Add production state tracking
**Rollback:** Easy if logging added first - can revert based on log analysis
---
#### 3.3 Fix START/STOP Button State (Issue 1b)
**Files:** `projects/Plastico/flows.json` → Home Template
**Problem:** Button doesn't show correct state (STOP when production running)
**Investigation:**
- Find button rendering logic in Home template
- Check how `trackingEnabled` or `productionStarted` is tracked
- Verify message handler receives state updates
**Changes:**
```javascript
// In Home Template scope.$watch
if (msg.topic === 'machineStatus') {
window.machineOnline = msg.payload.machineOnline;
window.productionStarted = msg.payload.productionStarted;
// NEW: Track tracking state for button display
window.trackingEnabled = msg.payload.trackingEnabled || window.productionStarted;
scope.renderDashboard();
return;
}
```
**Button HTML Update:**
```html
```
**Backend Update (Work Order buttons):**
```javascript
// When START clicked, also set trackingEnabled flag
if (action === "start-tracking") {
global.set("trackingEnabled", true);
// CRITICAL: Clear KPI buffer on production start
// Prevents stale data from skewing averages if Node-RED was restarted mid-production
global.set("kpiBuffer", []);
node.warn('[START] Cleared kpiBuffer for fresh production run');
// Optional: Reset last record time to ensure immediate data point
global.set("lastKPIRecordTime", Date.now() - 60000);
// Send state update to UI
const stateMsg = {
topic: "machineStatus",
payload: {
machineOnline: true,
productionStarted: true,
trackingEnabled: true
}
};
// ... send stateMsg to Home template
}
```
**Why Clear Buffer on START:**
If Node-RED restarts during a production run and context is restored from disk, the `kpiBuffer` might contain stale data from before the restart. When production resumes, new data would be mixed with old data, skewing the averages. Clearing on START ensures a clean slate for each production session.
**Testing:**
1. Load dashboard
2. Start work order
3. Verify START button changes to STOP
4. Click STOP (if implemented)
5. Verify button changes back to START
**Potential Issues:**
- Need to implement STOP button handler if it doesn't exist
- State sync between backend and frontend
- Button might flicker during state transitions
**Rollback:** Easy - remove button visibility conditions
---
## Implementation Order & Dependencies
### Recommended Sequence:
1. **Phase 1.1** - Fix Filters (Independent, low risk)
2. **Phase 1.2** - Fix Empty Graphs (Independent, low risk)
3. **Phase 2.1** - Add Throttling (Required before Phase 3.1)
4. **Phase 3.2** - Fix Availability Calculation (Add logging first)
5. **Phase 3.1** - Fix Continuous KPI Updates (Depends on throttling)
6. **Phase 3.3** - Fix Button State (Can be done anytime)
### Why This Order?
1. **Quick wins first** - Build confidence, improve UX immediately
2. **Throttling before continuous updates** - Prevent performance issues
3. **Logging before logic changes** - Understand problem before fixing
4. **Independent fixes can run parallel** - Save time
---
## Testing Strategy
### Per-Phase Testing:
- Test each phase independently
- Don't proceed to next phase if current fails
- Keep backup of working state
### Integration Testing (After All Phases):
1. **Fresh Start Test**
- Clear browser cache
- Restart Node-RED
- Load dashboard
- Navigate all tabs
2. **Production Cycle Test**
- Start new work order
- Click START
- Let run for 2-3 minutes
- Submit scrap
- Verify KPIs update
- Check graphs show data
- Test time filters
3. **State Persistence Test**
- Refresh page during production
- Verify state restores correctly
- Check button shows STOP if running
4. **Edge Cases**
- No active work order
- Machine offline
- Zero production time
- Rapid start/stop
---
## Rollback Plan
### Per-Phase Rollback:
Each phase documents its rollback procedure. In general:
1. **Stop Node-RED**
2. **Restore flows.json from backup**
```bash
cp projects/Plastico/flows.json.backup projects/Plastico/flows.json
```
3. **Clear global context** (if needed)
```javascript
// In a temporary function node triggered by an inject node (debug nodes cannot run code)
global.set("lastKPIRecordTime", null);
global.set("kpiBuffer", null);
global.set("lastKPIValues", null);
```
4. **Restart Node-RED**
5. **Clear browser cache**
### Emergency Full Rollback:
```bash
# Restore from most recent backup
cp projects/Plastico/Respaldo_MVP_Complete_11_23_25.json projects/Plastico/flows.json
# Restart Node-RED
node-red-restart
```
---
## Potential Roadblocks & Mitigations
### Roadblock 1: Global Context Persistence on Deploy/Restart ⚠️ CRITICAL
**Symptom:** After Node-RED restart or deploy, throttling/averaging/availability logic breaks or shows incorrect data
**Root Cause:** Global variables (`lastKPIRecordTime`, `kpiBuffer`, `lastKPIValues`, `trackingEnabled`) may be reset or restored from file/memory store depending on settings.js configuration
**Mitigation:**
1. **Add Robust Initialization Logic:**
```javascript
// In Record KPI History function - ALWAYS check and initialize
let buffer = global.get("kpiBuffer");
if (!buffer || !Array.isArray(buffer)) {
buffer = [];
global.set("kpiBuffer", buffer);
}
let lastRecordTime = global.get("lastKPIRecordTime");
if (!lastRecordTime || typeof lastRecordTime !== 'number') {
// Set to 1 minute ago to ensure immediate recording on startup
lastRecordTime = Date.now() - 60000;
global.set("lastKPIRecordTime", lastRecordTime);
}
```
2. **Create an Init Node:**
- Add a dedicated "Initialize Global Variables" function node
- Trigger on deploy using an inject node (inject once, delay 0)
- Wire to all critical nodes to ensure state is set before first execution
**Complete Init Node Code:**
```javascript
// Initialize Global Variables - Run on Deploy
node.warn('[INIT] Initializing global variables');
// KPI Buffer for averaging
if (!global.get("kpiBuffer")) {
global.set("kpiBuffer", []);
node.warn('[INIT] Set kpiBuffer to []');
}
// Last KPI record time - set to 1 min ago for immediate first record
if (!global.get("lastKPIRecordTime")) {
global.set("lastKPIRecordTime", Date.now() - 60000);
node.warn('[INIT] Set lastKPIRecordTime');
}
// Last machine cycle time - set to now to prevent immediate 0% availability
if (!global.get("lastMachineCycleTime")) {
global.set("lastMachineCycleTime", Date.now());
node.warn('[INIT] Set lastMachineCycleTime to prevent 0% availability on startup');
}
// Last KPI values
if (!global.get("lastKPIValues")) {
global.set("lastKPIValues", {});
node.warn('[INIT] Set lastKPIValues to {}');
}
node.warn('[INIT] Global variable initialization complete');
return msg;
```
3. **Check settings.js:**
- Verify contextStorage configuration
- Node-RED's default store is `memory`, which is lost on restart; consider the `localfilesystem` module if values must survive restarts
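For reference, Node-RED's file-backed context store is the `localfilesystem` module. A minimal sketch of the relevant fragment is shown below; in the real `settings.js` the `contextStorage` key sits inside the existing `module.exports` object, and the constant name used here exists only so the fragment can stand alone.

```javascript
// Sketch of the contextStorage fragment for settings.js.
// In the actual file, place the contextStorage key inside module.exports.
const contextStorageSettings = {
  contextStorage: {
    default: {
      module: "localfilesystem" // persists context to disk across restarts
    }
  }
};

console.log(JSON.stringify(contextStorageSettings, null, 2));
```

Making the file-backed store the default means existing `global.get` and `global.set` calls need no changes. Node-RED must be restarted for a `settings.js` change to take effect.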
**Testing:**
- Deploy changes multiple times
- Restart Node-RED
- Verify variables persist/initialize correctly
- Check debug logs for initialization messages
---
### Roadblock 2: State Sync Between Flow and Dashboard (Push vs Pull Model)
**Symptom:** START/STOP button shows wrong state when user loads dashboard mid-production
**Root Cause:** Relying on push model (messages sent during state changes) - if user loads page after tracking started, initial message is missed
**Mitigation:**
1. **Add Pull Mechanism in Home Template:**
```javascript
// In Home Template initialization
(function(scope) {
// Request current state on load
scope.send({
topic: "requestState",
payload: {}
});
// Handle state response
scope.$watch('msg', function(msg) {
if (msg && msg.topic === 'currentState') {
window.trackingEnabled = msg.payload.trackingEnabled;
window.productionStarted = msg.payload.productionStarted;
window.machineOnline = msg.payload.machineOnline;
scope.renderDashboard();
}
// ... rest of watch logic
});
})(scope);
```
2. **Add State Response Handler:**
- Create function node that listens for `requestState` topic
- Responds with current global state values
- Wire to Home template
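A minimal sketch of the state response handler follows. The reply topic `currentState` matches what the Home template watches for; the global names `productionStarted` and `machineOnline` are assumptions (only `trackingEnabled` is confirmed as a global in this plan), and `buildCurrentStateMsg` is a hypothetical helper written so the logic can run outside Node-RED.

```javascript
// Hypothetical helper for the state-response function node.
// `readGlobal` abstracts global.get so the logic can be checked standalone.
function buildCurrentStateMsg(readGlobal) {
  return {
    topic: "currentState",
    payload: {
      trackingEnabled: Boolean(readGlobal("trackingEnabled")),
      // Assumed global names; adjust to the flow's actual variables
      productionStarted: Boolean(readGlobal("productionStarted")),
      machineOnline: Boolean(readGlobal("machineOnline"))
    }
  };
}

// Inside the function node this would be roughly:
//   if (msg.topic !== "requestState") return null;
//   return buildCurrentStateMsg(key => global.get(key));

// Standalone check with a fake global store
const fakeGlobals = { trackingEnabled: true, productionStarted: true };
const stateReply = buildCurrentStateMsg(key => fakeGlobals[key]);
console.log(stateReply.payload);
```

Coercing each value with `Boolean` means an uninitialized global reads as `false` rather than `undefined`, so the button falls back to START on a fresh deploy.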
**Testing:**
- Start production
- Open dashboard in new browser tab
- Verify button shows STOP immediately
- Test with multiple browser sessions
---
### Roadblock 3: UI/Angular Timing Races in ui-template ⚠️ HIGH IMPACT
**Symptom:** Charts sometimes load, sometimes don't. A fixed timeout (300ms in the original template) is unreliable on slow systems or complex templates
**Root Cause:** Node-RED Dashboard uses AngularJS - digest cycle and DOM rendering timing is unpredictable
**Mitigation Option A - Data-Driven Initialization (RECOMMENDED):**
```javascript
// Instead of fixed timeout, wait for first data
let chartsInitialized = false;
scope.$watch('msg', function(msg) {
  // KPI messages carry their values under msg.payload.kpis (see Phase 2.1)
  const hasKpis = msg && msg.payload && msg.payload.kpis;
  if (hasKpis && !chartsInitialized) {
    // First data arrived, scope is ready
    initFilters();
    createCharts(currentRange);
    chartsInitialized = true;
  }
  if (chartsInitialized && hasKpis) {
    updateCharts(msg);
  }
});
```
**Mitigation Option B - Angular Lifecycle Hook:**
```javascript
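// Defer initialization to the next digest cycle
scope.$applyAsync(function() {
  // The scope is ready here, but the template DOM may not be rendered yet,
  // so createCharts should still verify that its elements exist
  initFilters();
  createCharts(currentRange);
});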
```
**Mitigation Option C - Polling with Timeout:**
```javascript
function initWhenReady(attempts = 0) {
const oeeEl = document.getElementById("chart-oee");
if (oeeEl && scope.gotoTab) {
// Both DOM and scope ready
initFilters();
createCharts(currentRange);
} else if (attempts < 20) {
// Retry every 100ms, max 2 seconds
setTimeout(() => initWhenReady(attempts + 1), 100);
} else {
console.error("Failed to initialize charts after 2 seconds");
}
}
// Start polling
initWhenReady();
```
**Recommendation:** Use Option A for most reliable results
---
### Roadblock 4: Throttling vs Live Display Trade-off
**Symptom:** With averaging, displayed KPIs are stale (up to 59 seconds old), but without averaging, graphs are jerky
**Root Cause:** OEE is a real-time snapshot - averaging smooths graphs but delays live feedback
**Solution: Dual-Path KPI Updates**
**Architecture:**
- **Path 1 (Live):** Machine Cycles → Calculate KPIs → Home Template (no throttling)
- **Path 2 (History):** Machine Cycles → Calculate KPIs → Averaging Buffer → Record History (throttled to 1 min)
**Implementation:**
```javascript
// In Calculate KPIs function - send to TWO outputs
return [
  msg,                        // Output 1: Live KPI to Home Template (unthrottled)
  RED.util.cloneMessage(msg)  // Output 2: KPI to History (deep clone, will be throttled)
];
```
**In Record KPI History - add averaging logic:**
```javascript
// Only this node has averaging/throttling
const kpis = msg.payload && msg.payload.kpis;
if (!kpis) return null; // Ignore messages without KPI data

const buffer = global.get("kpiBuffer") || [];
buffer.push({
  timestamp: Date.now(),
  oee: kpis.oee || 0,
  availability: kpis.availability || 0,
  performance: kpis.performance || 0,
  quality: kpis.quality || 0
});

const lastRecord = global.get("lastKPIRecordTime") || 0;
const now = Date.now();
if (now - lastRecord >= 60000) {
  // Average the buffer (never empty here: a sample was just pushed)
  const mean = key => buffer.reduce((sum, d) => sum + d[key], 0) / buffer.length;
  const avg = {
    oee: mean("oee"),
    availability: mean("availability"),
    performance: mean("performance"),
    quality: mean("quality")
  };
  // Record averaged values to history and send to Graphs template
  global.set("lastKPIRecordTime", now);
  global.set("kpiBuffer", []);
  return { topic: "kpi-history", payload: { timestamp: now, kpis: avg } };
} else {
  global.set("kpiBuffer", buffer);
  return null; // Don't record yet
}
```
**Benefits:**
- Live display always shows current OEE
- Graphs are smooth with averaged data
- No UX compromise
---
### Roadblock 5: Availability 0% Logic Too Simplistic
**Symptom:** Availability drops to 0% during brief pauses (scrap submission) but also might NOT drop to 0% during legitimate stops (breaks, maintenance)
**Root Cause:** Using previous value without time-based threshold can't distinguish brief interruption from actual shutdown
**Improved Logic:**
```javascript
// In Calculate KPIs function
const now = Date.now();
const lastCycleTime = global.get("lastMachineCycleTime") || now;
const timeSinceLastCycle = now - lastCycleTime;
const BRIEF_PAUSE_THRESHOLD = 5 * 60 * 1000; // 5 minutes

if (!trackingEnabled || timeSinceLastCycle > BRIEF_PAUSE_THRESHOLD) {
  // Legitimately stopped or long pause
  msg.kpis.availability = 0;
  global.set("lastKPIValues", null); // Clear history
} else if (operatingTime > 0) {
  // Calculate normally; calculateAvailability stands in for the existing availability formula
  msg.kpis.availability = calculateAvailability(operatingTime, plannedTime);
  global.set("lastKPIValues", msg.kpis);
} else {
  // Brief pause - maintain last known value
  const prev = global.get("lastKPIValues") || {};
  msg.kpis.availability = prev.availability || 0;
}
// NOTE: lastMachineCycleTime is updated in Machine Cycles function ONLY
// This keeps the "machine pulse" signal clean and separate from KPI calculation
```
**Configuration:**
- Adjust `BRIEF_PAUSE_THRESHOLD` based on your production environment
- Consider making it configurable via dashboard setting
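To make the threshold adjustable, the decision can be pulled into a pure function that takes the threshold as a parameter. This is only a sketch: `resolveAvailability`, its option names, the extra `plannedTime > 0` guard, and the `operatingTime / plannedTime` percentage standing in for `calculateAvailability` are all assumptions, not this plan's actual implementation.

```javascript
const DEFAULT_PAUSE_THRESHOLD_MS = 5 * 60 * 1000; // 5 minutes, matching the plan

// Hypothetical helper: decide the availability value and report why.
function resolveAvailability(opts) {
    const {
        trackingEnabled,
        now,
        lastCycleTime,
        operatingTime,
        plannedTime,
        previousAvailability = 0,
        thresholdMs = DEFAULT_PAUSE_THRESHOLD_MS
    } = opts;

    // Stopped, or idle longer than the threshold: treat as a real shutdown
    if (!trackingEnabled || now - lastCycleTime > thresholdMs) {
        return { availability: 0, state: "stopped" };
    }
    // Running with usable timing data: calculate normally (assumed formula)
    if (operatingTime > 0 && plannedTime > 0) {
        const pct = Math.min(100, (operatingTime / plannedTime) * 100);
        return { availability: pct, state: "running" };
    }
    // Brief pause: hold the last known value
    return { availability: previousAvailability, state: "paused" };
}
```

In the Calculate KPIs node, `thresholdMs` could then be read from a global that a dashboard control writes (for instance a hypothetical `briefPauseThresholdMs` key), falling back to the 5-minute default when it is unset.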
---
### Roadblock 6: KPI Calculation Performance
**Symptom:** System slow after implementing continuous KPI updates
**Mitigation:**
- Implement Phase 2 throttling FIRST (now with dual-path approach)
- Ensure Calculate KPIs has guards for null/undefined inputs
- Profile Calculate KPIs function for optimization
- Monitor Node-RED CPU usage during production
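For the profiling step, a lightweight timing wrapper is often enough before reaching for heavier tools. The sketch below assumes a hypothetical `timed` helper and a 50 ms warning budget chosen purely for illustration; inside a function node the `log` argument would typically be `node.warn`.

```javascript
// Hypothetical helper: run fn, measure wall-clock time, warn when over budget.
function timed(label, fn, log, budgetMs = 50) {
    const start = Date.now();
    const result = fn();
    const elapsed = Date.now() - start;
    if (elapsed > budgetMs) {
        log("[PERF] " + label + " took " + elapsed + " ms (budget " + budgetMs + " ms)");
    }
    return { result, elapsed };
}

// Possible usage in Calculate KPIs (computeKpis is a placeholder name):
// const { result } = timed("Calculate KPIs", () => computeKpis(msg), (m) => node.warn(m));
```

Logging only when the budget is exceeded keeps the debug sidebar quiet during normal production.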
---
### Roadblock 7: Browser Cache Issues
**Symptom:** Changes don't appear after deployment
**Mitigation:**
- Clear browser cache during testing (Ctrl+Shift+R / Cmd+Shift+R)
- Add cache-busting version to template (optional):
```javascript
// In template header
```
- Use incognito/private browsing for testing
- Test on different browsers/devices
---
## Success Criteria
### Phase 1:
- ✅ Time filters change graph display correctly
- ✅ Graphs load on first visit without refresh
- ✅ Sidebar navigation works immediately
### Phase 2:
- ✅ Graph updates occur at ~1 minute intervals
- ✅ Graphs are smooth, not jerky
- ✅ No performance degradation
### Phase 3:
- ✅ KPIs update continuously during production
- ✅ Availability never incorrectly shows 0%
- ✅ START button shows STOP when production running
- ✅ OEE calculation is accurate
### Integration:
- ✅ All features work together without conflicts
- ✅ No console errors
- ✅ Production tracking works end-to-end
- ✅ Data persists correctly
---
## Estimated Timeline
| Phase | Task | Time | Cumulative |
|-------|------|------|------------|
| 1.1 | Fix Filters | 15 min | 15 min |
| 1.2 | Fix Empty Graphs | 15 min | 30 min |
| 2.1 | Add Throttling | 45 min | 1h 15m |
| 3.2 | Fix Availability (with logging) | 30 min | 1h 45m |
| 3.1 | Fix Continuous Updates | 30 min | 2h 15m |
| 3.3 | Fix Button State | 20 min | 2h 35m |
| Testing | Integration Testing | 30 min | 3h 5m |
**Total: ~3 hours** (assuming no major roadblocks)
---
## Best Practices for LLM-Assisted Implementation
When working with an LLM to implement this plan, use these strategies for best results:
### 1. Isolate Logic Focus (Function Node Precision)
**DO:**
- Ask for specific function node code: "Write the Record KPI History function with averaging logic including global.get initialization"
- Provide exact input/output requirements: "This function receives msg.kpis object and must return msg or null"
- Request one change at a time
**DON'T:**
- Ask vague questions like "fix my dashboard"
- Request multiple phase changes in one prompt
- Assume LLM knows your flow structure
### 2. Explicitly Define Global Variables
**Template for LLM prompts:**
```
Global variable: kpiBuffer
Type: Array of objects
Structure: [{timestamp: number, oee: number, availability: number, performance: number, quality: number}]
Lifecycle: Initialized to [] if null, cleared after recording to history
Purpose: Accumulates KPI values for 1-minute averaging
```
**Always specify:**
- Variable name
- Data type
- Default/initial value
- When it's read/written
- When it should be cleared
### 3. Specify Node-RED Input/Output Requirements
**Example prompt:**
```
The Machine Cycles function node must have 3 outputs:
- Output 1: DB write message (only when tracking enabled)
- Output 2: State update message (always sent)
- Output 3: KPI trigger message (always sent for continuous updates)
The return statement should be:
return [dbMsg, stateMsg, kpiTrigger];
```
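A minimal sketch of how that return value might be assembled is shown here. In a function node, a `null` entry in the returned array sends nothing on that output, which is how Output 1 can be suppressed while tracking is disabled. The function name, topics, and payload shapes are illustrative assumptions.

```javascript
// Hypothetical helper: build the three-output array for Machine Cycles.
function buildCycleOutputs(msg, trackingEnabled) {
    // Output 1: DB write, only when tracking is enabled (null = send nothing)
    const dbMsg = trackingEnabled
        ? { topic: "db-write", payload: msg.payload }
        : null;
    // Output 2: state update, always sent
    const stateMsg = {
        topic: "state",
        payload: { trackingEnabled: Boolean(trackingEnabled) }
    };
    // Output 3: KPI trigger, always sent for continuous updates
    const kpiTrigger = {
        topic: "kpi-trigger",
        payload: { timestamp: Date.now() }
    };
    return [dbMsg, stateMsg, kpiTrigger];
}

// In the Machine Cycles function node:
// return buildCycleOutputs(msg, global.get("trackingEnabled"));
```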
### 4. Request Defensive Code
**Always ask for:**
- Null/undefined checks before accessing properties
- Type validation for global variables
- Initialization logic at the start of functions
- Error handling for edge cases
**Example:**
```javascript
// BAD (LLM might generate)
const buffer = global.get("kpiBuffer");
buffer.push(newValue);

// GOOD (what you should request)
let buffer = global.get("kpiBuffer");
if (!buffer || !Array.isArray(buffer)) {
    buffer = [];
}
buffer.push(newValue);
global.set("kpiBuffer", buffer);
```
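The same pattern can be wrapped in a small helper so that every global gets identical treatment. This is a sketch under assumptions: `getOrInit` is a hypothetical name, and `context` is anything exposing `get`/`set` methods, such as the function node's `global` object.

```javascript
// Hypothetical helper: read a context value, replacing it with a default if invalid.
function getOrInit(context, key, isValid, makeDefault) {
    let value = context.get(key);
    if (!isValid(value)) {
        value = makeDefault();
        context.set(key, value); // persist the repaired value
    }
    return value;
}

// Possible usage in a function node:
// const buffer = getOrInit(global, "kpiBuffer", Array.isArray, () => []);
// const lastRecord = getOrInit(global, "lastKPIRecordTime", Number.isFinite,
//                              () => Date.now() - 60000);
```

Passing the validator explicitly means a value of the wrong type (for example a string restored from a corrupted context file) is replaced rather than trusted.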
### 5. Break Down Complex Changes
**For Phase 3.1 (Continuous KPI Updates), ask in sequence:**
1. "Show me the current return statements in Machine Cycles function"
2. "Modify the function to add a third output for KPI trigger"
3. "Update all return statements to include kpiTrigger message"
4. "Show me how to wire the third output to Calculate KPIs node"
### 6. Request Testing/Debugging Code
**Ask LLM to include:**
- Debug logging: `node.warn('[KPI] Buffer size: ' + buffer.length);`
- State validation: Check that variables have expected values
- Error messages: Descriptive messages for troubleshooting
### 7. Validate Against Node-RED Constraints
**Remind LLM of Node-RED specifics:**
- "This is a Node-RED function node, not a standalone JavaScript script"
- "Global context uses global.get/set, not regular variables"
- "The msg object must be returned to send it to the next node"
- "Use node.warn() for logging, not console.log()"
### 8. Phase-by-Phase Verification
**After each LLM response:**
1. Verify the code matches the plan
2. Check for initialization logic
3. Confirm output structure matches wiring
4. Ask: "What edge cases does this handle?"
### 9. Example: Perfect LLM Prompt for Phase 2.1
```
I need to implement KPI throttling with averaging in Node-RED.
Context:
- Function node: "Record KPI History"
- Input: msg.kpis object with {oee, availability, performance, quality}
- Output: Averaged KPI values sent to Graphs template (or null if not ready to record)
Global variables needed:
1. kpiBuffer (Array): Accumulates KPI snapshots. Initialize to [] if null.
2. lastKPIRecordTime (Number): Last timestamp when history was recorded. Initialize to (Date.now() - 60000) if null for immediate first recording.
Requirements:
- Accumulate incoming KPIs in kpiBuffer
- Every 60 seconds (60000ms), calculate average of all buffer values
- Send averaged KPIs to output
- Clear buffer after sending
- If less than 60 seconds since last record, return null (don't send)
Please write the complete function with:
- Robust initialization (check and set defaults)
- Debug logging (buffer size, time until next record)
- Comments explaining each section
- Edge case handling (empty buffer, first run)
```
### 10. Common Pitfalls to Avoid
1. **Assuming LLM knows your flow structure** - Always describe node connections
2. **Not specifying Node-RED context** - LLM might give generic JavaScript instead
3. **Requesting too many changes at once** - Break into single-phase requests
4. **Forgetting to mention global variable persistence** - Specify initialization needs
5. **Not asking for defensive code** - Request null checks and type validation
6. **Vague success criteria** - Define exactly what "working" means
---
## Quick Reference: Key Code Snippets
### 1. Init Node (Run on Deploy)
```javascript
// Initialize Global Variables - Inject Once on Deploy
node.warn('[INIT] Initializing global variables');
if (!Array.isArray(global.get("kpiBuffer"))) global.set("kpiBuffer", []); // also repairs a non-array value
if (!global.get("lastKPIRecordTime")) global.set("lastKPIRecordTime", Date.now() - 60000);
if (!global.get("lastMachineCycleTime")) global.set("lastMachineCycleTime", Date.now());
if (!global.get("lastKPIValues")) global.set("lastKPIValues", {});
node.warn('[INIT] Complete');
return msg;
```
### 2. Machine Cycles - Add to Final Return
```javascript
// Update last machine cycle time when a successful cycle occurs
if (trackingEnabled && dbMsg) {
    global.set("lastMachineCycleTime", Date.now());
}
return [dbMsg, stateMsg, kpiTrigger];
```
### 3. Calculate KPIs - Multi-Source Guard
```javascript
const trackingEnabled = global.get("trackingEnabled");
const activeOrder = global.get("activeOrder") || {};
if (!trackingEnabled || !activeOrder.id) return null;
// ... rest of calculation
```
### 4. Work Order START Button - Clear Buffer
```javascript
if (action === "start-tracking") {
    global.set("trackingEnabled", true);
    global.set("kpiBuffer", []); // Clear stale data
    global.set("lastKPIRecordTime", Date.now() - 60000);
    // ... send state update
}
```
### 5. Graphs Template - Combined Init
```javascript
let chartsInitialized = false;

// Shared one-time setup used by both the data-driven path and the fallback
function initChartsOnce() {
    if (chartsInitialized) return;
    initFilters();
    createCharts(currentRange);
    chartsInitialized = true;
}

// Data-driven path: initialize on the first KPI message, then update on every one
scope.$watch('msg', function(msg) {
    if (msg && msg.payload && msg.payload.kpis) {
        initChartsOnce();
        updateCharts(msg);
    }
});

// Fallback: if no KPI message arrives within 5 seconds, draw the charts anyway
setTimeout(initChartsOnce, 5000);
---
## Final Notes
1. **Backup First:** Always backup `flows.json` before starting each phase
2. **Test Incrementally:** Don't skip testing between phases
3. **Document Changes:** Note any deviations from plan
4. **Monitor Logs:** Watch Node-RED debug output during testing
5. **Clear Cache:** Browser cache can mask issues
6. **Use LLM Strategically:** Follow the best practices above for precise, working code
**If you encounter issues not covered in this plan, STOP and ask for help before proceeding.**