The oil and gas industry has had Big Data capabilities for decades. Since about 2014, however, much has changed across several areas at once. The industry has gone through one of its most severe downturns, creating the need for increased productivity. At the same time, data volume and variety have continued to expand, coinciding with the takeoff of technologies from outside oil and gas, including the cloud, Big Data management and a new generation of advanced machine learning (ML).

This rapid change has pushed the industry to adopt new technologies and business models at a breathtaking pace. These next-generation technologies are beginning to move from concept and prototype into real commercial solutions. Along the way, operators are discovering what works and what the common pitfalls are.

Emerging data lakes

Among the new wave of technologies, the most fundamental is perhaps the least glamorous—data management, a challenge the industry has worked to resolve for decades. The recent introduction of data lakes, a new approach to better manage disparate data sources and volumes, might finally move the industry ahead of the problem.

James Dixon, CTO at Pentaho, a Hitachi Group company, coined the term “data lake.” He contrasts it with a data warehouse, which he likens to a packaged bottle of water, “cleansed, packaged and structured for easy consumption.”

A data lake, on the other hand, is water in its natural state, from which users can sample just what they need when they need it. A traditional data warehouse approach calls for laboriously scrubbing, filtering and transforming all the data as they arrive. It requires knowing the business processes involved up front and results in a rigid, limiting structure. A data lake keeps all the data and transforms them only upon request. This flexibility makes it well suited to data scientists looking for new insights, which is why many major operators are building out their own data lakes.
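The contrast between the two approaches is often described as "schema-on-write" (warehouse) versus "schema-on-read" (lake). The minimal Python sketch below illustrates that idea under stated assumptions: the records, field names and the `DataLake`, `warehouse_load` and `pressure_view` helpers are all hypothetical, and a production lake would sit on distributed storage rather than an in-memory list.

```python
import json

# Hypothetical raw records as they might arrive from different field sources.
RAW_RECORDS = [
    '{"well": "A-1", "source": "scada", "pressure_psi": "1520", "ts": "2018-03-01T00:00"}',
    '{"well": "A-1", "source": "lab", "api_gravity": 38.2}',
    '{"well": "B-7", "source": "scada", "pressure_psi": "1180", "ts": "2018-03-01T00:00"}',
]


def warehouse_load(records):
    """Schema-on-write: every row must fit a fixed schema at load time.
    Records that do not fit (here, the lab sample) are rejected."""
    required = ("well", "pressure_psi", "ts")
    table = []
    for record in records:
        rec = json.loads(record)
        if all(key in rec for key in required):
            table.append((rec["well"], float(rec["pressure_psi"]), rec["ts"]))
    return table


class DataLake:
    """Schema-on-read: keeps every record exactly as received (hypothetical)."""

    def __init__(self):
        self._raw = []

    def ingest(self, record: str) -> None:
        # No cleansing or validation at write time: store the raw text.
        self._raw.append(record)

    def query(self, transform):
        # Parse and transform only when a user asks; rows for which the
        # transform returns None are skipped.
        results = []
        for record in self._raw:
            row = transform(json.loads(record))
            if row is not None:
                results.append(row)
        return results


def pressure_view(rec):
    # One consumer's view of the lake: SCADA pressure readings as floats.
    if rec.get("source") != "scada":
        return None
    return (rec["well"], float(rec["pressure_psi"]))
```

Loading `RAW_RECORDS` through `warehouse_load` keeps only the two pressure rows, while the lake retains all three, so a later query such as `lake.query(lambda r: r.get("api_gravity"))` can still reach the lab measurement that the fixed schema discarded.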

There are key differences between the data warehouse and data lake. (Source: Supply Chain Institute)


Enabling AI

The data lake is also a key enabling technology for unlocking the power of modern artificial intelligence (AI). The success of the new generation of AI capabilities rests on access to massive volumes of training data. For the most part, the algorithms the industry uses in ML today existed decades ago. However, the newer algorithms grouped under the rubric of deep learning can tune themselves by learning from their errors, and the more examples they see, the better they tune.

For example, a convolutional neural network can recognize patterns with near-human or better accuracy, but such a model may need 100,000 or more samples to learn from for each narrowly defined use case. That appetite for data demonstrates the value of a data lake as the source from which AI can learn, because all the data and all data types remain available for inspection.

The industry is still in the early days of applying ML in oil and gas. That said, there are already some emerging classes of applications that lend themselves to early success. Organizations would like to apply ML to automate many routine human tasks, such as better understanding the reservoir, analyzing the performance of their equipment, locating all their data and providing virtual assistance using tools like Amazon’s Alexa.

Failure prediction

Among these use cases, perhaps the most success has come in predicting equipment failure. Many vendors and operators are demonstrating early detection of failure signatures for pumps, motors and artificial lift systems. One reason for success in these areas may be that there is a relatively constrained set of characteristics to monitor and plenty of historical data to train on. In many cases, variations in vibration, temperature and power consumption, trended over time, are enabling the detection of failure conditions before they occur.
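A simple way to picture "trending over time" is a drift check against a healthy baseline. The sketch below is a generic statistical stand-in, not any vendor's method: the channel names, the baseline length, the window size and the `k = 3` threshold are all hypothetical choices, and real systems typically learn failure signatures from labeled history rather than using a fixed rule.

```python
from statistics import mean, stdev


def drift_alarm(readings, baseline_len=20, window=5, k=3.0):
    """Return the index at which the rolling mean of `readings` first drifts
    more than k baseline standard deviations from the baseline mean, or None.
    The first `baseline_len` readings are assumed to reflect healthy operation."""
    if len(readings) < baseline_len + window:
        return None
    base = readings[:baseline_len]
    mu, sigma = mean(base), stdev(base)
    # Slide a window over the post-baseline readings only.
    for end in range(baseline_len + window, len(readings) + 1):
        recent = mean(readings[end - window:end])
        if abs(recent - mu) > k * sigma:
            return end - 1  # index of the newest reading in the window
    return None


def failure_warning(channels, **kwargs):
    """Check each monitored channel (e.g. vibration, temperature, power)
    and report which ones show drift and where it was first detected."""
    alarms = {}
    for name, series in channels.items():
        idx = drift_alarm(series, **kwargs)
        if idx is not None:
            alarms[name] = idx
    return alarms
```

With made-up data in which a pump's vibration ramps upward after 20 healthy readings while its temperature stays flat, `failure_warning` flags only the vibration channel, which is the kind of early warning that lets crews prepare before the failure itself.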

Though equipment failure prediction lends itself well to AI, operators generally are not going to replace equipment before it fails, which limits the value to helping companies prepare in advance and reduce downtime.

Reservoir characterization

A larger value proposition for AI is reservoir characterization. Finding more oil more rapidly has perhaps the highest return on investment in the industry. Here, seismic data, well log records, core data and other sources are all being combined to unlock new insights.

For example, Emerson’s Democratic Neural Network Association (DNNA) ML methodology identifies hydrocarbon-bearing facies from seismic and well log inputs with up to 90% accuracy on training data. Rather than requiring a geologist, geophysicist and petrophysicist to work together to make sense of huge amounts of reservoir data, the DNNA ML, once trained, can be dispatched to detect these deposits. To be clear, the need for well-qualified personnel does not disappear. The ML is good at identifying possible target-rich zones, but it still requires knowledgeable users to root out false positives and select the best drilling target. Additionally, the AI has to be trained separately for each new reservoir.
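The train-then-flag-then-review workflow can be sketched with a deliberately simple classifier. The code below is not DNNA, whose internals the article does not describe; it swaps in a nearest-centroid classifier over two hypothetical well-log features (gamma ray and resistivity), and the facies labels and helper names are illustrative assumptions. Its output is a list of candidate depths for expert review, not drilling picks.

```python
from math import dist
from statistics import mean


def train_centroids(samples):
    """samples: list of ((gamma_ray_api, resistivity_ohmm), facies_label)
    from one reservoir. Returns the mean feature vector for each facies.
    Note: features are unscaled here; a real workflow would normalize them."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(col) for col in zip(*rows))
        for label, rows in by_label.items()
    }


def classify(centroids, features):
    # Assign the facies whose centroid is nearest in feature space.
    return min(centroids, key=lambda label: dist(centroids[label], features))


def candidate_zones(centroids, log, target="hydrocarbon"):
    """log: list of (depth_ft, features). Returns depths classified as the
    target facies; these still need review to root out false positives."""
    return [depth for depth, feats in log if classify(centroids, feats) == target]
```

Because the centroids come only from one reservoir's labeled samples, applying them to another field would mean retraining, which mirrors the per-reservoir constraint described above.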

Despite those constraints, applying ML to reservoir prediction is proving to be a powerful tool. Where data management is good, training the AI for a new area is not difficult, and there is significant value in freeing users to evaluate the AI predictions rather than start from scratch.


ML inputs for probabilistic lithofacies modeling demonstrate the effectiveness of AI in reservoir characterization. (Source: Emerson)


End-user assistance

Perhaps less successful so far has been the use of AI for end-user assistance. It is one thing for a virtual assistant to turn off a light, a very binary decision, but quite another for it to understand the operational context, navigate complex workflow steps, stay within appropriate safeguards and take action on a simple user request. The current generation of narrow AI remains better suited to precise tasks than to all-purpose assistance.

There is a key exception. A junior operator, perhaps wearing an augmented reality headset, could conceivably be given simple AI guidance (e.g., which meter reading to inspect), enabling lower-cost field workers to perform more complicated operations. This is a new and promising application under industry evaluation, but it is at an earlier stage of deployment than the other ML applications covered above.

Migrating to the cloud

The cloud is an enabling technology that is advancing the adoption of better data management and ML. With the rapid migration to public cloud providers such as Amazon Web Services and Microsoft Azure, organizations can tap into prebuilt systems optimized for both data lakes and AI, including direct access to assistants such as Alexa or Cortana. As data come to reside in a single, cross-connected cloud repository, the applications built on it can reach all of those data easily.

There is, indeed, considerable technological change happening all at once, but oil and gas professionals, perhaps more than anyone else, know the importance of change. The Cambrian explosion ushered in a new era of animal life, but that change took some 25 million years. Get ready. This time the industry is going to have to evolve a whole lot faster or face extinction.