"You cannot manage what you do not measure." In the industrial world, Key Performance Indicators (KPIs) act as the cockpit instrumentation for plant management. Without them, decisions are made on gut feeling, which often leads to avoidable failures or excessive cost.

1. The KPI Hierarchy

Effective performance measurement is structured as a pyramid. If you measure everything, you measure nothing.

Strategic (Level 1)

**OEE, Maintenance Cost/RAV, Safety Record.** The boardroom metrics.

Tactical (Level 2)

**MTBF, Backlog (Wks), Schedule Compliance.** The department manager metrics.

Operational (Level 3)

**MTTR, PM Completion Rate, Rework %.** The technician/team lead metrics.

2. Modern Maintenance Dashboard

Live Plant Health Dashboard (reference: ISO 14224)

- MTBF: 842 hrs (+12.4%)
- Planned %: 78.2% (+5.1%)
- MTTR: 2.4 hrs (-8.4%)
- Backlog: 4.2 wks (+0.8%)
- Cost/RAV: 2.1% (-0.2%)

[Charts: Schedule Compliance (Leading Indicator); OEE Trend (Lagging Indicator)]
The Gold Standard

3. OEE: The Ultimate Efficiency Metric

Overall Equipment Effectiveness (OEE) is a widely used metric for measuring the percentage of planned production time that is truly productive. It is calculated by multiplying three factors:

$$\text{OEE} = \text{Availability} \times \text{Performance} \times \text{Quality}$$

Availability

Accounts for **Downtime Losses** (Breakdowns, Changeovers, Setup time).

Performance

Accounts for **Speed Losses** (Idling, Minor Stoppages, Reduced Speed operation).

Quality

Accounts for **Quality Losses** (Scrap, Rework, Yield loss during startup).
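As a sketch of the arithmetic, the function below computes the three factors from one shift's data using the common textbook definitions: Availability = run time / planned production time, Performance = (ideal cycle time × total count) / run time, and Quality = good count / total count. The `ShiftData` fields and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShiftData:
    """Hypothetical per-shift inputs; all times in minutes."""
    planned_time: float      # planned production time
    run_time: float          # planned time minus downtime losses
    ideal_cycle_time: float  # minutes per unit at nameplate speed
    total_count: int         # all units produced (good + defective)
    good_count: int          # units that passed quality first time


def oee(shift: ShiftData) -> tuple[float, float, float, float]:
    """Return (availability, performance, quality, oee) as fractions."""
    # Downtime losses: share of planned time the machine actually ran
    availability = shift.run_time / shift.planned_time
    # Speed losses: actual output vs. what nameplate speed allows in run_time
    performance = (shift.ideal_cycle_time * shift.total_count) / shift.run_time
    # Quality losses: only good units count
    quality = shift.good_count / shift.total_count
    return availability, performance, quality, availability * performance * quality


if __name__ == "__main__":
    # Hypothetical shift: 480 min planned, 60 min downtime, 1.0 min/unit ideal,
    # 380 units made, 361 good
    a, p, q, total = oee(ShiftData(480, 420, 1.0, 380, 361))
    print(f"A={a:.1%} P={p:.1%} Q={q:.1%} OEE={total:.1%}")
```

Multiplying the three factors cancels run time and total count, so OEE reduces to good count × ideal cycle time / planned time (here 361 × 1.0 / 480 ≈ 75.2%), which is a useful cross-check on the individual factors.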

SMRP Best Practices

4. Financial KPIs: Maintenance as % of RAV

One of the most powerful "Top-Level" metrics is **Maintenance Cost as a Percentage of Replacement Asset Value (RAV)**.

The Benchmark

World-class facilities typically operate between **2.0% and 3.0%**. A ratio above 5% usually signals a "firefighting" loop with high emergency spending. A ratio below 1% suggests under-maintenance, which builds a "Reliability Debt" that can eventually surface as catastrophic failure.
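A minimal sketch of the ratio and of the bands quoted above, assuming annual maintenance spend and RAV are expressed in the same currency. The thresholds come from this section; the label for the unlabeled gaps (1-2% and 3-5%) is a placeholder, and the function names and plant figures are hypothetical.

```python
def maintenance_cost_pct_rav(annual_maintenance_cost: float, rav: float) -> float:
    """Maintenance cost as a percentage of Replacement Asset Value."""
    if rav <= 0:
        raise ValueError("RAV must be positive")
    return 100.0 * annual_maintenance_cost / rav


def classify(pct: float) -> str:
    """Map a %RAV figure onto the benchmark bands described in the text."""
    if pct > 5.0:
        return "likely firefighting (high emergency spend)"
    if pct < 1.0:
        return "likely under-maintaining (reliability debt)"
    if 2.0 <= pct <= 3.0:
        return "world-class range"
    # Placeholder for the 1-2% and 3-5% bands, which the text does not label
    return "outside the world-class range"


if __name__ == "__main__":
    # Hypothetical plant: 4.2M annual maintenance spend against a 200M RAV
    pct = maintenance_cost_pct_rav(4_200_000, 200_000_000)
    print(f"{pct:.1f}% of RAV: {classify(pct)}")
```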

Workforce Engineering

5. Wrench Time: The Productivity Leak

A common misconception is that a technician working 8 hours a day is 100% productive. In reality, **Wrench Time** (the actual time spent performing hands-on maintenance) is often as low as 25-35% of the shift.

Where does the time go?

At 35% wrench time, the "Non-Productive" 65% is typically consumed by: **Searching for parts (20%)**, **Travel time (15%)**, **Waiting for instructions/permits (15%)**, and **Administrative paperwork (15%)**.

By measuring Wrench Time through "Day-in-the-Life" (DILO) studies or CMMS data analysis, organizations can identify systemic bottlenecks. Increasing wrench time from 30% to 45% raises hands-on capacity by 50% (45 / 30 = 1.5), effectively enlarging the maintenance workforce by half without hiring a single new person.
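The wrench-time arithmetic can be sketched as follows, assuming DILO observations are logged as hours per activity over a paid 8-hour shift. The activity keys and sample hours are hypothetical, chosen to reproduce the 35/20/15/15/15 split described above.

```python
def wrench_time_pct(wrench_hours: float, paid_hours: float) -> float:
    """Share of paid hours spent on hands-on maintenance work, in percent."""
    return 100.0 * wrench_hours / paid_hours


def effective_capacity_gain(old_pct: float, new_pct: float) -> float:
    """Percent increase in hands-on hours for the same headcount."""
    return 100.0 * (new_pct / old_pct - 1.0)


if __name__ == "__main__":
    # Hypothetical DILO sample for one 8-hour shift, hours per activity
    shift = {"wrench": 2.8, "parts": 1.6, "travel": 1.2, "waiting": 1.2, "admin": 1.2}
    paid = sum(shift.values())
    print(f"Wrench time: {wrench_time_pct(shift['wrench'], paid):.0f}%")
    print(f"30% -> 45%: +{effective_capacity_gain(30, 45):.0f}% effective workforce")
```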

Loss Categorization

6. The 6 Big Losses of OEE

To improve OEE, you must understand where the losses occur. Total Productive Maintenance (TPM) categorizes these into six specific buckets:

1. Equipment Failure

Large-scale downtime events (Breakdowns).

2. Setup & Adjustments

Time lost during changeovers or machine tuning.

3. Idling & Minor Stoppages

The "micro-stops" that aren't recorded as breakdowns but kill performance.

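4. Reduced Speed

Running the machine slower than its nameplate capacity.

5. Process Defects

Scrap and rework from products that fail quality standards during steady-state production.

6. Reduced Yield

Scrap and defects produced during startup, before the process reaches stable operation.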
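Because each loss belongs to exactly one OEE factor, loss logs can be rolled up into Availability, Performance, and Quality buckets. This sketch uses the usual TPM grouping (failures and setups reduce Availability, minor stops and reduced speed reduce Performance, process defects and startup yield reduce Quality). The loss codes and minute values are hypothetical, and it assumes speed and quality losses have already been converted to equivalent minutes at ideal cycle time.

```python
from collections import defaultdict

# Usual TPM grouping of the six big losses onto OEE factors (hypothetical codes)
LOSS_TO_FACTOR = {
    "equipment_failure": "availability",
    "setup_adjustments": "availability",
    "idling_minor_stops": "performance",
    "reduced_speed": "performance",
    "process_defects": "quality",
    "reduced_yield": "quality",
}


def losses_by_factor(events: list[tuple[str, float]]) -> dict[str, float]:
    """Sum lost minutes per OEE factor from (loss_code, minutes) events."""
    totals: dict[str, float] = defaultdict(float)
    for loss_code, minutes in events:
        # An unknown code raises KeyError, surfacing mis-coded log entries
        totals[LOSS_TO_FACTOR[loss_code]] += minutes
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical shift log, minutes of equivalent lost time
    log = [("equipment_failure", 45), ("setup_adjustments", 15),
           ("idling_minor_stops", 22), ("reduced_speed", 18),
           ("process_defects", 12), ("reduced_yield", 7)]
    # Largest bucket first, to show where improvement effort pays off most
    for factor, minutes in sorted(losses_by_factor(log).items(), key=lambda kv: -kv[1]):
        print(f"{factor:<12} {minutes:>5.0f} min")
```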

Predictive Performance

7. Leading Indicators: The PM-to-CM Ratio

A critical leading indicator is the **PM-to-CM Ratio** (Preventive Maintenance to Corrective Maintenance). It measures the health of your maintenance strategy.

The 80/20 Rule

World-class maintenance organizations aim for an **80/20** ratio: 80% proactive work (PM, PdM) and only 20% reactive work (CM). If your ratio is 50/50, your technicians are constantly "fighting fires," so they lack the time to perform high-quality PMs, which leads to even more failures. Breaking this death spiral requires rigorous schedule compliance.
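A sketch of the ratio, assuming it is measured in labor hours from closed work orders (some sites count work orders instead). The function name and the monthly figures are hypothetical.

```python
def proactive_share(pm_hours: float, pdm_hours: float, cm_hours: float) -> float:
    """Proactive (PM + PdM) labor as a percentage of all PM/PdM/CM labor."""
    total = pm_hours + pdm_hours + cm_hours
    if total == 0:
        raise ValueError("no work recorded")
    return 100.0 * (pm_hours + pdm_hours) / total


if __name__ == "__main__":
    # Hypothetical month of closed work orders, in labor hours
    share = proactive_share(pm_hours=1_100, pdm_hours=300, cm_hours=600)
    print(f"Proactive/reactive split: {share:.0f}/{100 - share:.0f}")
    print("meets 80/20 target" if share >= 80 else "below 80/20 target")
```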

Maintainability Analysis

8. MTTR: Breaking Down the Repair Clock

Mean Time To Repair (MTTR) is often misunderstood as just "the time it takes to fix it." To actually improve MTTR, you must break it down into its constituent parts:

1. Detection & Notification

How long does it take for someone to notice and report the failure?

2. Response & Diagnosis

Travel time to the asset and the time required to find the root cause.

3. Parts & Tools Logistics

The biggest killer of MTTR: waiting for the storeroom to find the spare parts.

4. Active Repair & Testing

The actual "wrench time" and the time to verify the asset is safe to run.
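Splitting MTTR into the four phases is straightforward once each repair record captures the phase boundaries. The sketch below assumes hypothetical records holding hours per phase; because the mean of a sum equals the sum of the means, the phase averages add up to MTTR and show which phase dominates.

```python
from statistics import mean

PHASES = ("detection", "diagnosis", "logistics", "repair")


def mttr_breakdown(repairs: list[dict[str, float]]) -> dict[str, float]:
    """Mean hours per phase across repairs; the phase means sum to MTTR."""
    return {phase: mean(r[phase] for r in repairs) for phase in PHASES}


if __name__ == "__main__":
    # Hypothetical repair records, hours spent in each phase
    repairs = [
        {"detection": 0.3, "diagnosis": 0.5, "logistics": 1.2, "repair": 0.8},
        {"detection": 0.1, "diagnosis": 0.4, "logistics": 0.6, "repair": 0.9},
        {"detection": 0.2, "diagnosis": 0.3, "logistics": 0.9, "repair": 0.4},
    ]
    phases = mttr_breakdown(repairs)
    total = sum(phases.values())
    for phase, hours in phases.items():
        print(f"{phase:<10} {hours:4.2f} h ({hours / total:.0%})")
    print(f"MTTR       {total:4.2f} h; biggest lever: {max(phases, key=phases.get)}")
```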

Case Study: Steel Mill Turnaround

From Chaos to Control: The Steel Mill

A high-output rolling mill suffered from 15% unplanned downtime. The management team was focused on "Tons Produced" (a lagging metric) and ignored the maintenance backlog.

The Intervention

The facility implemented a "Leading Metric" dashboard focusing on **Schedule Compliance** and **Backlog Health**. They discovered their backlog was at 12 weeks, a state of total reactive chaos. By freezing non-critical work and focusing strictly on **Preventive Maintenance (PM) Compliance**, they reduced the backlog to 4 weeks over six months.

Result: Unplanned downtime dropped from 15% to 4.5%, and tons produced increased by 20% without adding a single machine.

Technical Encyclopedia
MTBF

Mean Time Between Failures. A measure of an asset's reliability (Total uptime / number of failures).

MTTR

Mean Time To Repair. A measure of an asset's maintainability (Total downtime / number of repairs).

Backlog

The amount of approved work not yet completed, usually expressed in crew-weeks (backlog labor hours divided by the crew's available labor hours per week).

RAV

Replacement Asset Value. The current cost to replace an asset with a new one of similar capacity.

Yield

The ratio of good units produced to the total units started in a process.

Uptime

The total time an asset is operational and capable of performing its intended function.
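The glossary formulas can be sketched directly. The backlog conversion assumes the common convention of dividing backlog labor hours by the crew's available hours per week; crew size, weekly hours, and the example figures are hypothetical, chosen to echo the dashboard values above.

```python
def mtbf(total_uptime_h: float, failures: int) -> float:
    """Mean Time Between Failures = total uptime / number of failures."""
    return total_uptime_h / failures


def mttr(total_downtime_h: float, repairs: int) -> float:
    """Mean Time To Repair = total downtime / number of repairs."""
    return total_downtime_h / repairs


def backlog_weeks(backlog_labor_h: float, crew_size: int, hours_per_week: float) -> float:
    """Approved, unfinished work in weeks of available crew capacity (assumed convention)."""
    return backlog_labor_h / (crew_size * hours_per_week)


if __name__ == "__main__":
    # Hypothetical quarter for one asset and one 10-person crew on 40-hour weeks
    print(f"MTBF: {mtbf(2_100, 5):.0f} h")                       # 420 h
    print(f"MTTR: {mttr(12, 5):.1f} h")                          # 2.4 h
    print(f"Backlog: {backlog_weeks(1_680, 10, 40):.1f} weeks")  # 4.2 weeks
```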

Technical Standards & References

REF [SMRP-5]
Society for Maintenance & Reliability Professionals (2018)
SMRP Best Practices: Metrics and Definitions, 5th Edition
Published: SMRP Publications
REF [ISO-14224]
ISO/TC 67 (2016)
ISO 14224:2016 - Collection and exchange of reliability and maintenance data
Published: International Organization for Standardization
REF [TPM-NAKAJIMA]
Seiichi Nakajima (1988)
Introduction to TPM: Total Productive Maintenance
Published: Productivity Press