The value of any performance measure depends on how well it is defined, understood, and applied to real situations. This is equally true for measures of “reliability,” where there is enormous potential, whether intentional or accidental, to present data that are unrepresentative or easily misunderstood.

It is for good reason that the truth of the saying — “There are three kinds of lies: lies, damned lies, and statistics” — is well established and often quoted. Its implications in the exploration and production (E&P) business, however, are not fully appreciated. It will suffice to reflect on just a few sources of potential confusion in the interpretation of familiar reliability measures. Although the examples given are simple, they all come from real (often repeated) cases in my E&P experience at Shell.

Mean time between failures
The mean time between failures (MTBF) of various items is quoted widely in reliability databases. It is, as the name suggests, the arithmetic average of the times between failures; if there are n failures, there are n-1 times between failures. From a practical perspective, it is convenient to treat the beginning and end points of the observation interval as if they were failures, but the result is not strictly MTBF.
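The distinction between the strict definition and the practical shortcut can be sketched as follows, using a hypothetical failure log (all numbers here are illustrative, not from any database):

```python
# Hypothetical failure log: hours since the start of observation.
failure_times = [400, 1100, 1900, 3100, 4000]

# Strict MTBF: n failures give n - 1 between-failure intervals.
intervals = [b - a for a, b in zip(failure_times, failure_times[1:])]
mtbf_strict = sum(intervals) / len(intervals)  # 3600 / 4 = 900 hours

# Practical shortcut: treat the start and end of the observation window
# as if they were failures -- convenient, but not strictly MTBF.
window_start, window_end = 0, 4400
padded = [window_start] + failure_times + [window_end]
padded_intervals = [b - a for a, b in zip(padded, padded[1:])]
mtbf_padded = sum(padded_intervals) / len(padded_intervals)  # 4400 / 6 ~= 733 hours
```

The two conventions give noticeably different numbers for the same log, which is one reason quoted MTBF figures need their basis stated.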

MTBF is also commonly thought of as a measure of when an item will fail, e.g., an MTBF of 1 year is taken to mean the item fails every year. This is not true. MTBF usefully describes fewer than half of the failure characteristics met in practice, and even where it does apply, it can be shown that there is approximately a 2 in 3 chance that an item fails before reaching its MTBF.
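The "2 in 3" figure follows directly if one assumes a constant failure rate (exponential lifetimes), which is the usual setting where MTBF applies; a minimal check under that assumption:

```python
import math

# Exponential lifetimes: P(fail before t) = 1 - exp(-t / MTBF).
# Evaluate at t = MTBF; the actual MTBF value cancels out.
p_before_mtbf = 1 - math.exp(-1)  # ~= 0.632, roughly a 2 in 3 chance
```

So even for an item that perfectly obeys its quoted MTBF, failure before the MTBF is the more likely outcome.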

Use of MTBF also rests on several underlying assumptions. Among the more important is that reliability is characterized by time. In practice, the “age” of equipment may be better characterized by running hours or the number of operating cycles.

MTBF and large-scale behavior
Often it is the performance of large-scale integrated systems that directly affects the business bottom line. Of interest here is the relationship between system and component MTBF.
Consider two pumps in series, and take three examples to illustrate that the MTBF of the system could easily be more than, less than, or the same as that of either pump.

The MTBF of a subsystem and the behavior of the larger system are not simply related, and because real-life data are rarely as transparent as these MTBF examples (even those from databases), they often need scrutiny before being translated into useful information.
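Two of the cases can be sketched with hypothetical failure logs for two pumps in series, where the system is down whenever either pump fails (the timings below are illustrative assumptions, not real data):

```python
def mtbf(times):
    """Strict MTBF: mean of the n - 1 intervals between n failure times."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)

pump_a = [1000, 2000, 3000]  # hours; pump MTBF = 1000

# Case 1: the second pump fails at the same instants (e.g., a common cause).
pump_b_sync = [1000, 2000, 3000]
system_sync = sorted(set(pump_a) | set(pump_b_sync))
# mtbf(system_sync) is 1000 -- the same as either pump.

# Case 2: the same per-pump MTBF, but the failures interleave.
pump_b_offset = [1500, 2500, 3500]
system_offset = sorted(set(pump_a) | set(pump_b_offset))
# mtbf(system_offset) is 500 -- half that of either pump.
```

Which case applies depends entirely on how the failures fall relative to each other, and further variations (different window conventions, different downtime definitions) can push the comparison in other directions again. That is information a bare MTBF number does not carry.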

What is availability?
Often confused with reliability, and sometimes used almost interchangeably, availability is a key measure of “ability to produce,” and is the fraction of a specified time interval that the system is able to function. This can be thought of in its simplest form as

A = Uptime / (Uptime + Downtime).
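As a one-line sketch of the definition, with illustrative numbers for a year of operation:

```python
def availability(uptime, downtime):
    # A = Uptime / (Uptime + Downtime), as a fraction of the interval.
    return uptime / (uptime + downtime)

# Example: 8,000 hours up and 760 hours down in a year of 8,760 hours.
a = availability(8000, 760)  # ~= 0.913, i.e., about 91.3%
```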

Even this simple definition, however, leads to different measures of availability. Two of the more important issues, common in continuous production processes, are the interval chosen and how downtime is defined.

Availability is always measured over a time interval, e.g., daily, annually, or between shutdowns. The longer the interval, the less the variability. Sometimes, however, availability is reported over intervals that obscure real performance. I have seen the availability of a continuous-operation plant reported as close to 100%. While accurate for the preceding three months, it was not representative: the plant had recently been returned to service after almost 14 months of downtime! It is good practice to verify that the time period chosen does not distort the statistic, and to compare any availability measured over a short period (say, less than 1 year) with availability over a longer one.
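The plant example illustrates how strongly the interval matters; in round, illustrative numbers:

```python
# Hypothetical history: ~14 months down, then 3 months back in service.
uptime_months, downtime_months = 3, 14

# Availability over the last 3 months only: 100% -- accurate but misleading.
a_recent = uptime_months / uptime_months

# Availability over the full 17 months tells a very different story.
a_full = uptime_months / (uptime_months + downtime_months)  # 3/17 ~= 0.18
```

The same plant, the same data: close to 100% or under 20%, depending solely on the window.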

Downtime should include all sources of downtime, including planned events. The difference between including and excluding planned outages can be very significant, although it is not always obvious which convention applies when “the availability” is quoted in a report.
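A quick sketch of the gap between the two conventions, with hypothetical outage figures:

```python
hours_in_year = 8760
planned_outage = 500    # e.g., a scheduled turnaround (hypothetical hours)
unplanned_outage = 200  # breakdowns (hypothetical hours)
uptime = hours_in_year - planned_outage - unplanned_outage

# Planned outages excluded from the interval altogether:
a_excluding_planned = uptime / (hours_in_year - planned_outage)  # ~= 0.976

# All downtime counted against the full year:
a_all_downtime = uptime / hours_in_year                          # ~= 0.920
```

A report quoting "97.6% availability" and one quoting "92.0%" could be describing the same year of the same plant.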

Equipment and large-scale behavior

It is common to see “average of an average” calculations used to consolidate data. Here, it is important to understand what the average represents.

Consider three areas, each of which monitors its pump availability monthly. For clarity, the example uses many pumps, several of which have the same availability. The availabilities are round numbers; in real data, these are all warning signs, particularly any percentage that is a multiple of 5, 10, 12.5, or 33.3!

So what is the average availability over the month? Is it 70.6%, being the average of the three area averages, or 56.2%, the average of the pump availabilities?
As with all statistical data, it is always worth asking, “Average of what?”
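The mechanism behind the discrepancy can be sketched with hypothetical data (the pump counts and percentages below are illustrative, not the figures behind the 70.6% and 56.2% above): when areas have different numbers of pumps, the average of the area averages weights each area equally, while the pooled average weights each pump equally.

```python
# Hypothetical monthly pump availabilities (%) for three areas of unequal size.
areas = {
    "Area 1": [90, 90],       # 2 pumps at 90%
    "Area 2": [80, 80, 80],   # 3 pumps at 80%
    "Area 3": [50] * 10,      # 10 pumps at 50%
}

# Average of the three area averages: each AREA counts equally.
area_averages = [sum(v) / len(v) for v in areas.values()]
average_of_averages = sum(area_averages) / len(area_averages)  # (90+80+50)/3 ~= 73.3

# Pooled average over all pumps: each PUMP counts equally.
all_pumps = [a for pumps in areas.values() for a in pumps]
pooled_average = sum(all_pumps) / len(all_pumps)  # 920/15 ~= 61.3
```

Neither number is wrong; they simply answer different questions, and a report should say which one it is quoting.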

Conclusion
The potential for accidental or intentional misrepresentation of even simple data is high. Charts and statistics give an impression of accuracy and, although analysts have a responsibility to explain the basis for their results, those using them also need to verify that they provide the real information needed for decision making.