Tessella – Putting the ‘data’ in Data Analytics

    Finding Petroleum


    Data Science

    Oil and gas companies are excited about the potential of data analytics. However, they struggle to move from a promising idea to something useable in everyday operations. The problem, says Dr Warrick Cooke, consultant with data science company Tessella, is the data being fed into their models.

    There is little doubt in oil and gas that data offers huge potential to improve efficiency and safety and save money. There is also little doubt that it is mostly failing to do so.

    A major reason for failure, says Dr Warrick Cooke, consultant with data science company Tessella, is a focus on data tools and models at the expense of the data itself. Garbage in, garbage out, as the old computing adage goes.

    There has recently been an explosion in easy-to-use data tools, such as Microsoft Azure, says Dr Cooke. These are extremely user-friendly and push users to be hands-on and try things out, allowing quite powerful data and machine learning models to be built with relatively little experience.

    Users can quickly come a long way with these tools. There are lots of simple tasks that they do well, and they are great for proof of concept models built on well understood test data.

    “But they are quite formulaic, and they don’t encourage good practice in ensuring results are repeatable when models are applied to messy real-life production data,” Dr Cooke says. The result is models that work on test data but are not fit to be released into the wild.

    Dr Cooke makes an analogy to the early days of Visual Basic. “It opened up application development to a much broader audience, but many of these would then break once deployed. Eventually companies learned that making these applications a long-term success needed qualified software engineers.”

    Get the data right first

    Tessella has a history of working with the oil and gas industry to develop models and curate data. “We often find data is the biggest sticking point,” says Dr Cooke. “It can be fragmented, incorrectly labelled, missing information such as time or location, or not properly indexed.”

    He gives an example of an oil company looking at drill readings to analyse drill team performance.

    “Often data will not have consistent naming conventions, so two comparable pieces of data are recorded differently,” he says. “Equally, different data can be named identically. One company comparing asset performance was using the same Well ID across different regions. Our team needed to update the data before the model would work.”
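
    The Well ID problem described above can be checked for mechanically before any modelling starts. The following is a minimal sketch, not Tessella's method: the record layout and the field names `well_id` and `region` are assumptions made for illustration, as is the choice to disambiguate by prefixing the region.

```python
from collections import defaultdict


def find_colliding_well_ids(records):
    """Return the well IDs that appear in more than one region."""
    regions_by_id = defaultdict(set)
    for rec in records:
        regions_by_id[rec["well_id"]].add(rec["region"])
    return {
        well_id: sorted(regions)
        for well_id, regions in regions_by_id.items()
        if len(regions) > 1
    }


def qualify_well_id(rec):
    """Build a key that stays unique across regions (hypothetical scheme)."""
    return f'{rec["region"]}:{rec["well_id"]}'


# Illustrative records only; the values are made up.
records = [
    {"well_id": "W-001", "region": "north_sea", "rate": 120.0},
    {"well_id": "W-001", "region": "gulf", "rate": 95.0},
    {"well_id": "W-002", "region": "gulf", "rate": 101.5},
]

collisions = find_colliding_well_ids(records)
keys = [qualify_well_id(rec) for rec in records]
```

    Reporting the collisions first, rather than silently rewriting IDs, leaves the decision about the correct naming scheme with the people who own the data.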

    The list goes on. Data is captured in different formats, sometimes even as scanned pieces of paper. Metric and imperial units are mixed. Data is missing; Dr Cooke tells of a project using sensor data, where one sensor was down for half an hour, creating a gap in the time series. Models built using ideal datasets can’t deal with these inconsistencies.
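
    A gap like the half-hour sensor outage can at least be detected and reported before the data reaches a model. The sketch below assumes readings arrive at a known, regular interval; the one-minute interval, the dates and the function name `find_gaps` are illustrative assumptions, not details from the project Dr Cooke describes.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval, tolerance=timedelta(0)):
    """Return (last_seen, next_seen) pairs where readings are further apart
    than the expected sampling interval plus a tolerance."""
    ordered = sorted(timestamps)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > expected_interval + tolerance:
            gaps.append((previous, current))
    return gaps


# Hypothetical series: one reading per minute for an hour,
# with the sensor silent between 00:10 and 00:40.
start = datetime(2020, 1, 1, 0, 0)
readings = [start + timedelta(minutes=m) for m in range(60) if not 10 < m < 40]

gaps = find_gaps(readings, expected_interval=timedelta(minutes=1))
```

    The function only locates the gap. What the missing values should be, or whether the outage itself is the signal, is a separate question.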

    If something’s worth doing

    Models need good data to get good results. This means developing a system for naming things, and agreeing consistent data formats for wells, sensors and equipment. In most cases, it means considerable changes to existing data and data collection methods. This takes time and effort.

    Where data is missing, domain experts should assess what it should look like. A data analyst may be able to tell you what they expect it to look like, based on past patterns, but this is risky. What if the absence of data was caused by an unexpected event? It is often the gaps that represent the most important information for training models to recognise warning signs. Domain experts have the contextual knowledge to fill these in.

    Even with good practice, real-life data is rarely perfect. Good models should be designed to cope with the unexpected. Problems such as inconsistent units or missing data can be overcome, but only if the problem has first been identified and the model trained to deal with it.
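
    For mixed metric and imperial units, one way to make sure the problem is identified is to refuse any value that does not carry an explicit, recognised unit. This is a sketch under assumptions: the function name, the accepted unit labels and the choice of depth as the quantity are all illustrative. The conversion factor itself (1 ft = 0.3048 m) is exact.

```python
FEET_TO_METRES = 0.3048  # exact by definition of the international foot

METRE_LABELS = {"m", "metre", "metres", "meter", "meters"}
FOOT_LABELS = {"ft", "foot", "feet"}


def depth_in_metres(value, unit):
    """Convert a depth to metres, failing loudly on a missing or unknown unit."""
    label = unit.strip().lower() if unit else ""
    if label in METRE_LABELS:
        return float(value)
    if label in FOOT_LABELS:
        return float(value) * FEET_TO_METRES
    # Guessing the unit would hide exactly the problem we want surfaced.
    raise ValueError(f"unrecognised or missing unit: {unit!r}")


metric_depth = depth_in_metres(250, "m")
imperial_depth = depth_in_metres(1000, "ft")
```

    Raising an error instead of assuming a default unit means the inconsistency is found during data preparation, not after the model has produced a misleading result.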

    Just as critical is testing the model on less-than-perfect data – of the sort it will encounter in the real world – to see how it performs. This allows problems to be identified and modifications made, either to the model or the data, to ensure it delivers meaningful insights.
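
    One simple way to test against imperfect data is to degrade a clean dataset deliberately and compare the model's output with its output on the clean version. In the sketch below, `mean_pressure` is a trivial stand-in for a real model, and the readings and the gap position are invented for illustration.

```python
def inject_gap(values, start, length):
    """Return a copy of values with a contiguous run replaced by None,
    simulating a sensor outage. The input list is left unchanged."""
    degraded = list(values)
    for i in range(start, min(start + length, len(degraded))):
        degraded[i] = None
    return degraded


def mean_pressure(values):
    """Stand-in 'model': the mean of the readings that are present."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no readings available")
    return sum(present) / len(present)


# Made-up clean readings and a simulated three-sample outage.
clean = [100.0, 101.0, 99.0, 100.0, 102.0, 98.0]
degraded = inject_gap(clean, start=2, length=3)

baseline = mean_pressure(clean)
drift = abs(mean_pressure(degraded) - baseline)
```

    How much drift is acceptable depends on the application. The point of the exercise is that the figure is measured before deployment instead of being discovered in production.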

    Testing should be ongoing. Expanding the model to new assets will bring new data problems which need to be factored in. This is true of any change, including when data sources such as new sensors are added to existing systems, or when updates are made to the model.

    “Building a good model is important, and modelling tools can be a good starting point to test ideas,” concludes Dr Cooke. “But if you want a model that works on real-world data and scales across diverse assets, you need to ensure data is properly curated, and models are rigorously designed and tested.”

    Original source: Finding Petroleum