When you see a problem occurring in your business, the most natural and important response is to ask “why is this happening?”
This is typically when maintenance organizations begin to think about the root cause.
A root cause is the most fundamental issue in a chain of causes and effects that ends in a visible problem. Often you see the problem without knowing the underlying issue.
In the maintenance world, root causes can ultimately lead to problems like reduced wrench time, more unplanned downtime, etc.
To determine the underlying cause of a problematic business process or asset, you perform a Root Cause Analysis, often referred to as RCA.
It may be unrealistic to run a full root cause analysis for every small problem. A rule of thumb is to start with the…
- Most critical assets
- Assets that fail the most
- Assets that are used across the board (aka repeatable solutions)
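As a rough illustration of that rule of thumb, the sketch below ranks assets for RCA by combining the three factors in the list above. The `Asset` fields, the 1 to 5 criticality scale, the multiplication-based score, and the sample assets are all assumptions made for this example, not an established scoring method.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    criticality: int         # hypothetical rating: 1 (low) to 5 (high)
    failures_last_year: int  # how often the asset failed
    sites_using: int         # how widely the asset is used


def rca_priority(asset: Asset) -> int:
    # Illustrative score only: multiplying the factors pushes assets that are
    # critical, failure-prone, and widely used toward the top of the list.
    return asset.criticality * asset.failures_last_year * asset.sites_using


def rank_for_rca(assets):
    # Highest score first.
    return sorted(assets, key=rca_priority, reverse=True)


# Made-up sample data.
assets = [
    Asset("office HVAC", criticality=2, failures_last_year=6, sites_using=1),
    Asset("conveyor motor", criticality=5, failures_last_year=4, sites_using=3),
    Asset("backup generator", criticality=5, failures_last_year=1, sites_using=1),
]

for asset in rank_for_rca(assets):
    print(asset.name, rca_priority(asset))
```

In practice the weighting would come from your own criticality ranking and failure history; the point is only that the starting list can be derived from data rather than from memory.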
Take these 7 Steps to Approach a Root Cause Analysis (RCA)
- Define the problem in its entirety. Example: the tire pressure indicator comes on.
- Collect data: dig deeper and find the data you need about the problem, its potential causes, and its effects. Ask what will help you understand what caused the problem. Example: What is the temperature outside? (Tire pressure drops as it gets colder.) Has the car been driven a long distance? Is the pressure loss continuous? When was the last time the tires were filled?
- Ask the 5 whys: ask “Why did this happen?” Based on your answer, ask why again, up to five times, until you reach the underlying reason.
- Determine the symptoms and factors of the problem.
- Symptom: car didn’t stop
- Factor: brake pads are worn
- Identify corrective actions based upon the root cause finding.
- Identify solutions to help prevent the problem from recurring. This may include creating an accurate and detailed job plan so the fix is easy if it happens again.
- Implement solutions that address the root cause, not just the problem. Here is an example of fixing the problem and not the cause: in the tire pressure example, if you stop at a gas station, fill the tire, get back in the car, and keep driving, have you fixed the problem? It may happen again. Think about the overall impact on your trip if you keep stopping to put air in your tires instead of finding the root cause and addressing it.
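The 5 whys step can be sketched as a simple chain of answers. The code below is a minimal illustration, not a formal RCA tool: the `five_whys` helper is hypothetical, and the answers for the tire pressure example are invented to show the shape of the exercise.

```python
def five_whys(problem, answers, max_depth=5):
    """Walk a chain of 'why?' answers and return (chain, root_cause_candidate)."""
    chain = [problem]
    for answer in answers[:max_depth]:
        chain.append(answer)
    # The last answer reached is treated as the candidate root cause.
    return chain, chain[-1]


problem = "Tire pressure indicator comes on"

# Hypothetical answers, each one responding to "why?" for the line before it.
answers = [
    "The front-left tire is below the recommended pressure",
    "The tire loses air continuously, not only in cold weather",
    "There is a slow leak in the tread",
    "A nail is embedded in the tire",
    "The car is regularly parked next to a construction area",
]

chain, root_cause = five_whys(problem, answers)
for depth, step in enumerate(chain):
    label = "Problem" if depth == 0 else f"Why #{depth}"
    print(f"{label}: {step}")
print("Root cause candidate:", root_cause)
```

Refilling the tire only addresses the first answer in this chain. A corrective action aimed at the last answers (repair the puncture, park elsewhere) is what keeps the indicator from coming back on.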
There are several things that will need to be taken into consideration when you are conducting an RCA:
- The importance of good data
- Having good data is always important! It gives you an accurate picture of what is happening now, what happened in the past, and how to remedy problems in the future. When your data is inaccurate, incomplete, or inconsistent, you lack a full and accurate picture of what is happening with the assets. It then takes longer to identify the solution, which increases downtime and repair costs.
- Make sure you have standards for data in all of your assets and processes. Data standards improve the consistency of your data, and value lists help ensure the right data is entered in a way that free text does not.
- Consider the following examples as a starting point to show information that is actionable:
- Asset-specific data
- Asset attributes and work order history: how often has it failed in the past?
- How many hours of mechanics' time are associated with this asset?
- How many spare parts have we been using?
- Know when to Repair v. Replace
- It's always important to understand the cost difference between repairing and replacing. For some assets, it will be more cost effective to repair them when they aren’t operating properly. For others, it may be more efficient to replace the asset. It's important to know the difference.
- KPIs and metrics help define parameters around your assets and business processes. Knowing where something is now, and where it should be as a baseline when operating at peak efficiency, is critical to understanding what the next best decision is.
- It is also important to ensure your KPIs and metrics are aligned with your objectives and goals.
- Utilize Automation and Monitoring
- Automation and monitoring can save maintenance and resource hours and help you remedy problems faster. A solution that takes good data, compares it against metrics and KPIs, and uses monitoring and automation will alert you sooner that a problem is about to occur and tell you what needs to be done to get back to optimal functionality.
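To make these considerations concrete, here is a minimal sketch that checks work orders against a value list, summarizes failures, labor hours, and parts cost per asset, and flags a repair-versus-replace recommendation. Every name and number in it (the failure codes, the labor rate, the replacement costs, and the 50% threshold) is an assumption made for illustration, not a recommended standard.

```python
from collections import defaultdict

# Hypothetical value list: only these failure codes are accepted.
ALLOWED_FAILURE_CODES = {"LEAK", "WEAR", "ELECTRICAL"}
LABOR_RATE = 50.0        # assumed cost per mechanic hour
REPLACE_THRESHOLD = 0.5  # assumed: flag when repairs reach 50% of replacement cost

# Made-up work order history.
work_orders = [
    {"asset": "pump-1", "failure_code": "LEAK", "labor_hours": 3.0, "parts_cost": 120.0},
    {"asset": "pump-1", "failure_code": "WEAR", "labor_hours": 5.0, "parts_cost": 400.0},
    {"asset": "fan-2", "failure_code": "ELECTRICAL", "labor_hours": 1.5, "parts_cost": 40.0},
    {"asset": "fan-2", "failure_code": "broke??", "labor_hours": 2.0, "parts_cost": 10.0},
]
replacement_cost = {"pump-1": 1500.0, "fan-2": 900.0}


def summarize(orders):
    """Total failures, labor hours, and parts cost per asset; reject bad codes."""
    summary = defaultdict(lambda: {"failures": 0, "labor_hours": 0.0, "parts_cost": 0.0})
    rejected = []
    for order in orders:
        if order["failure_code"] not in ALLOWED_FAILURE_CODES:
            rejected.append(order)  # free-text entry that breaks the data standard
            continue
        stats = summary[order["asset"]]
        stats["failures"] += 1
        stats["labor_hours"] += order["labor_hours"]
        stats["parts_cost"] += order["parts_cost"]
    return dict(summary), rejected


def recommend(asset, stats):
    """Compare cumulative repair cost with the cost of replacing the asset."""
    repair_cost = stats["labor_hours"] * LABOR_RATE + stats["parts_cost"]
    ratio = repair_cost / replacement_cost[asset]
    return "replace" if ratio >= REPLACE_THRESHOLD else "repair"


summary, rejected = summarize(work_orders)
for asset, stats in summary.items():
    print(asset, stats, recommend(asset, stats))
print("Rejected records:", len(rejected))
```

The rejected record shows why value lists matter: a free-text failure code cannot be counted reliably, so it drops out of the picture until someone cleans it up.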
Root Cause Analysis can be as hard as we make it, and typically we cannot afford the resources to conduct one across the board. So, identify the most critical assets and bad actors and start with those. Remember, this is part of implementing a culture of continuous improvement, and the journey is not one and done. If you are reactive by nature with your maintenance and repair activities, you are more than likely missing opportunities to improve. Make the time to start conducting an RCA on failures and you will start to see improvements. Reward folks for taking this approach (preventing fires) as well as for putting out fires. Preventing fires is a much better approach!