The industrial revolution was about speed, scale, and specialization. The information revolution is about value, innovation, and collaboration. It all starts with numbers and the art of “making them speak.”

Applying analytics

Businessdictionary.com defines analytics as “the field of data analysis.” According to the website, “Analytics often involve studying past historical data to research potential trends, to analyze the effects of certain decisions or events, or to evaluate the performance of a given tool or scenario. The goal of analytics is to improve the business by gaining knowledge which can be used to make improvements or changes.” Leading companies understand that mining data faster and better than their competitors yields a competitive advantage. This understanding has made data analytics mainstream.

The scan finds obvious and subtle anomalies using complexity changes. A non-obvious anomaly is detected (red circle) near the end of the data stream. (Images courtesy of IO-hub)

The zoomed view shows details of the non-obvious anomaly detected near the end of the data stream.

Many industries contend with the tsunami of data generated by networks, sensors, and computer processors, but few face the additional challenge of data quality – what to do when the accuracy of the data feed is questionable or the readings are considered unreliable. In the oil and gas industry, this occurs regularly due to harsh environments and difficult transmission conditions. Engineers have been using their knowledge, common sense, and experience to differentiate “good” from “bad” data, for instance distinguishing meaningful spikes from insignificant blips. This was marginally tenable when few data feeds were installed, but the increased data volume and the decreased number of petroleum engineers now render this exercise nearly impossible. Today, heaps of data are gathered but cannot be mined in real time. Instead, they are held in reserve and used for after-the-fact forensic analysis when an incident happens. Real-time data capture, installed to enable preventive analysis and action, more often than not delivers a much less attractive return on investment – serving as a mere forensic tool to understand what went wrong.

Many attempts have been made to overcome this challenge. The two most common approaches revolve around rule-based and model-based technologies. Unfortunately, while these methodologies work well for other problems, the variability of data exceptions in upstream oil and gas sensor readings renders these techniques ineffective.

A different approach

Claude Shannon’s information theory handles these challenges and provides a novel solution to several oil and gas data analytic problems. Classical information theory can determine the bandwidth requirements for transmitting messages of varying types. Information streams containing rapid variations and complicated patterns (e.g., oil and gas data streams) require greater bandwidth than streams consisting of simple patterns with few variations. Shannon used the term “entropy” to describe his measure of information content. The higher the entropy of an information stream, the more bandwidth it requires to transmit, or when compressed, the more disk space it requires to store. Perhaps surprisingly, Shannon’s entropy relates directly to the entropy of classical thermodynamics.
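To make the entropy idea concrete, the following minimal Python sketch (illustrative only, not taken from any vendor's software) estimates the Shannon entropy of a numeric data stream by binning its values; the bin count and the two example streams are assumptions chosen for clarity.

```python
import math
import random
from collections import Counter

def shannon_entropy(values, bins=16):
    """Estimate Shannon entropy (in bits) of a numeric stream by binning
    its values and computing -sum(p * log2(p)) over the bin frequencies."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0          # flat streams collapse to one bin
    counts = Counter(int((v - lo) / width) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A steady, repetitive stream carries little information per sample;
# an erratic stream carries much more and needs more bandwidth (or disk) to encode.
steady = [5.0 + 0.01 * (i % 2) for i in range(1000)]    # nearly constant reading
noisy = [random.gauss(5.0, 1.0) for _ in range(1000)]   # erratic reading
print(shannon_entropy(steady))   # low entropy: few bits needed per sample
print(shannon_entropy(noisy))    # high entropy: many more bits needed
```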

To help conceptualize this connection qualitatively (and without arguing their truth or falsity), consider these approximately equivalent statements: “The universe’s entropy is increasing,” “The universe is approaching perfect randomness,” and “The universe is approaching a more complex ordering.” The last, more nuanced statement suggests entropy may be considered a direct measure of complexity. Data streams can be compared by their inherent complexity and their complexity relative to one another. Some shifts in a data stream’s complexity signal changes in the underlying well physics. Other shifts signal normal responses to control actions. Still others reflect erroneous sensor spikes and drift.
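One simple way to watch for such shifts, sketched below under assumed parameters rather than as a definitive implementation, is to compute entropy over a sliding window and flag windows whose complexity jumps relative to the previous one; the window size, step, bin count, and jump threshold are all illustrative.

```python
import numpy as np

def rolling_entropy(values, window=200, step=50, bins=16):
    """Shannon entropy (bits) of each sliding window of a numeric stream."""
    values = np.asarray(values, dtype=float)
    entropies = []
    for start in range(0, len(values) - window + 1, step):
        chunk = values[start:start + window]
        counts, _ = np.histogram(chunk, bins=bins)
        p = counts[counts > 0] / counts.sum()
        entropies.append(float(-(p * np.log2(p)).sum()))
    return entropies

def complexity_shifts(values, jump=0.5, **kwargs):
    """Indices of windows whose entropy changed sharply versus the previous
    window: candidate anomalies, control responses, or sensor faults."""
    ent = rolling_entropy(values, **kwargs)
    return [i for i in range(1, len(ent)) if abs(ent[i] - ent[i - 1]) > jump]
```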

An analogy to some of Shannon’s less well-known work may help. He used simple statistical data analytics for identifying the language used in a transmitted message. Simple algorithms automatically built 2-D histograms of the occurrence frequency of letter pairs for typical messages sent in several languages. For example, English language messages very frequently contain “th” but not often “cz” pairings. Every language analyzed has its own letter-pair statistics. Information theory can characterize the complexity of each language’s letter-pair probability distribution and distinguish these distributions from one another. New messages can be classified among the known languages. A message in an unknown language is not classified but is identified as unknown or “anomalous.”
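The letter-pair idea translates almost directly into code. The sketch below is an illustration rather than Shannon's exact procedure: it builds normalized bigram histograms, compares them with a simple total-variation distance, and labels a message “anomalous” when it sits too far from every known language; the distance measure and rejection threshold are assumptions.

```python
from collections import Counter

def bigram_distribution(text):
    """Normalized letter-pair (bigram) frequencies of a message."""
    letters = [c for c in text.lower() if c.isalpha()]
    pairs = Counter(zip(letters, letters[1:]))
    total = sum(pairs.values()) or 1
    return {pair: n / total for pair, n in pairs.items()}

def distance(p, q):
    """Total-variation distance between two bigram distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def classify(message, references, reject_above=0.8):
    """Return the closest known language, or 'anomalous' if nothing is close."""
    dist = bigram_distribution(message)
    language, score = min(((lang, distance(dist, ref))
                           for lang, ref in references.items()),
                          key=lambda item: item[1])
    return language if score <= reject_above else "anomalous"

# references would map language names to distributions built from known texts,
# e.g. {"english": bigram_distribution(open("english_sample.txt").read()), ...}
```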

Likewise, data streams from an oil well (e.g., pressure, temperature, flow, and choke valve travel) contain their own typical complexity. The various data streams might be scanned for complexity (information entropy) singly or in combination, as sketched below. Analyzed during normal well operations, a data stream metaphorically may be “speaking French.” During methanol injection, the well might start “speaking Russian.” Or a new, previously unseen condition may arise that can be flagged as anomalous, perhaps requiring attention.
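Scanning streams “in combination” can be approximated with a joint entropy estimate, as in the sketch below; this is a conceptual illustration, with the bin count assumed and the streams taken to be equal-length and time-aligned.

```python
import numpy as np

def joint_entropy(streams, bins=8):
    """Shannon entropy (bits) of several data streams taken in combination:
    each time step becomes one symbol built from the per-stream bin indices.
    Streams are assumed to be equal-length and time-aligned."""
    digitized = []
    for s in streams:
        s = np.asarray(s, dtype=float)
        edges = np.histogram_bin_edges(s, bins=bins)
        digitized.append(np.digitize(s, edges[1:-1]))
    symbols = np.stack(digitized, axis=1)            # one row per time step
    _, counts = np.unique(symbols, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Pressure, temperature, and choke position scanned together can separate
# operating regimes that look unremarkable in any single stream.
```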

Using such methods, it is possible to detect precursor events (predictive or not), major events, and genuine data trends as distinguished from erroneous sensor drift. The next step is to accumulate data for massive real-time parallel correlation of multiple events, offering and then exploiting a holistic view of the data to better identify normal versus abnormal operation. This enables automated correlation of data stream combinations that is neither done today nor considered feasible with manual methods.

Applying algorithms

During the last year, IO-hub took a step in this direction, receiving multiple high-frequency data streams from several wells in the Gulf of Mexico. A typical well generated 15 to 20 data streams with new readings every few seconds, covering months of operation. Half of the streams monitored well conditions (e.g., various downhole, annulus, and wellhead pressures and temperatures); the remaining streams recorded actions taken (e.g., methanol injection, choke valve travel, and master and crossover valve settings). Most of the time, the pressure and temperature streams showed steady, stable behavior. All streams showed a fair number of irregular and very noticeable short-duration fluctuations. Faced with reams of data from multiple wells, all clearly exhibiting many intermittent “data events,” the engineer wanted to know whether each well was in good health. Being fire-hosed with data, where should he spend his time?

IO-hub software scanned all data streams and identified all anomalies (the intermittent data events of unknown importance) in all data streams. When the data were assembled into a time-based “event table,” it quickly became clear that all anomalous events in the pressure and temperature data were associated with at least one “causing action” (e.g., methanol or chemical injection, change in choke valve position, change in crossover valve state, etc.). Learning this, the engineer concluded the wells were behaving normally.
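A simplified version of that event-table step might look like the following sketch. The timestamps, stream names, and the ten-minute attribution window are hypothetical, invented purely for illustration; only the matching logic reflects the workflow described above.

```python
from datetime import datetime, timedelta

# Hypothetical example records: (timestamp, stream, event description)
anomalies = [
    (datetime(2014, 3, 2, 14, 5), "downhole_pressure", "spike"),
    (datetime(2014, 3, 7, 9, 30), "wellhead_temperature", "step_change"),
]
actions = [
    (datetime(2014, 3, 2, 14, 2), "methanol_injection", "start"),
    (datetime(2014, 3, 7, 9, 28), "choke_valve", "position_change"),
]

def build_event_table(anomalies, actions, tolerance=timedelta(minutes=10)):
    """Pair each anomalous data event with any action taken shortly before it.
    Anomalies with no nearby causing action are the ones worth an engineer's time."""
    table = []
    for t, stream, event in sorted(anomalies):
        causes = [(ta, name, act) for ta, name, act in actions
                  if timedelta(0) <= t - ta <= tolerance]
        table.append({"time": t, "stream": stream, "event": event,
                      "causing_actions": causes, "explained": bool(causes)})
    return table

for row in build_event_table(anomalies, actions):
    print(row["time"], row["stream"],
          "explained" if row["explained"] else "UNEXPLAINED")
```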

This solution – while done quickly, with substantial time savings over typical, largely manual methods – was not fully automatic. Work continues toward automating the process. Once complete, the software will detect and classify data events, alerting the petroleum engineers only to 1) as-yet-unclassified anomalies or 2) event classes the engineers have indicated they want flagged.

Rather than mining numbers after the fact to understand what went wrong, this approach offers the ability to anticipate and avoid potentially catastrophic and costly events. Furthermore, the technology lends itself to smart recording and transmission by prioritizing essential data over the mundane.

The information revolution is about a new approach to data and determining its usefulness. This novel approach can automatically clean and analyze data to identify precursor events in each data stream, enhancing the data’s value by correlating this analysis across multiple concurrent real-time data streams.