Making Science and Engineering Data FAIR

    Jeroen de Jong

    FAIR data (Findable, Accessible, Interoperable and Reusable) makes data-driven decisions easy and accurate. Making data FAIR means getting scientists and engineers excited about sharing their data.


    Imagine you work in a science or engineering business (perhaps you do), and are responsible for strategic or operational decisions (perhaps you are).

    You need to take one of those decisions, and you want data to confirm you are doing the right thing. Perhaps you want to know whether to deploy a new substation in an area with growing EV ownership. Perhaps you want to tweak a new drug formulation to get optimal yield as it moves from the lab to production.

    No problem. You call up one of your data scientists and ask her to find the answer. She logs on to your IT system, pulls up the different data sets she needs, calls up similar predictive models from past challenges, and with a few adjustments she has modelled your new challenge. Within hours you have your answer.

    This is an ideal many aspire to, but few are close. Such a system of well-curated data would make modelling quick and decisions more accurate. The problem is not the modelling capability, but access to reliable data.

    To get towards this ideal, we talk of creating FAIR data – data that is Findable, Accessible, Interoperable, and Reusable. A good data strategy should deliver FAIR data into the organisation, which should in turn create an organisation that can model complex problems quickly and accurately.

    FAIR data relies on people and culture

    The technical part of FAIR data is, if not easy, then at least straightforward for those with expertise.

    We do not plan to get technical in this article, but for more information, my colleague Nick Cook provides a summary of the four elements here, and the excellent but exhaustive Data Management Body of Knowledge covers the detail. Plenty of experts, including Tessella, can take the technical side off your hands.

    The hard bit is getting the people creating the data to capture, label, and share it. The challenge, then, is creating organisational structures and cultures that progress you towards FAIR data.

    Digital giants, like Amazon and Google, do this easily because they were designed to capture data into central systems where it can be used for business decisions.

    On the other hand, scientists and engineers in non-digital firms tend to create data to serve a specific real-world purpose (eg studying molecular interactions). Having to spend time collecting more than they need, or turning it into formats others can easily use, can understandably feel like a distraction from their job.

    Digital companies also have the advantage that much of their data is measured in ‘clicks’, making it easy to compare. R&D and Engineering data encompasses diverse and complex data sets such as ‘recipes’ for drug production, or physical attributes of turbines in a changing real-world environment.

    But it is in these industries – where mistakes are costly – that making good decisions based on good data matters most.


    The journey to FAIR data for science and engineering

    Having delivered many of these projects, we have found a number of approaches to be effective at unblocking this challenge:

    1. Talk the language

    Avoid sending in sharply dressed management consultants, or recent data science graduates, to tell engineers and scientists to hand over their data. Instead, have people who look and sound like the data holders, and who can speak their language, explain why they should do it. Focus on what’s in it for them (reusing models, combining data with their own, oversight of their colleagues’ work) rather than why your organisation wants them to do it.

    2. When it comes to tech, appropriateness trumps powerfulness

    Don’t assume tech is the whole solution, or that poor data quality can be solved by a black box. Although AI can help clean data and fill gaps, the most important step is ensuring the data is entered correctly in the first place.

    Ensure the tech and data teams engage closely with the data holders, and understand which of the many systems and technologies are right for their environment. Set tech up so that it is easy to use; don’t just go for the option your IT team are familiar with, because they are not the users.

    3. Get senior management excited about it

    If the C-suite don’t see the challenge and opportunity, it will be hard to deliver a successful programme. If they are running a company built in the pre-digital age, chances are they are not data experts. And data management can be a hard sell, since benefits are indirect.

    Highlighting what can go wrong can be an effective message, as can drawing their attention to easy-to-understand problems, such as different definitions of the same data across the organisation that make comparisons impossible.

    We hear plenty of stories of people getting it wrong. A water utility with bad data management accidentally turned off a pipe providing essential water to a major customer, resulting in big fines. Some GPs administering drugs record this immediately, others do so weeks later, or not at all, so a pharma company studying this data and assuming all records are ‘date administered’ would reach incorrect conclusions about their drug’s efficacy.

    Another effective tool for buy-in is to visualise data maturity simply. We sometimes use a ‘Trivial Pursuit’ approach (see image) where each ‘wedge’ represents an aspect of data maturity (quality, privacy, etc) and each is awarded green/amber/red. This is an effective way to communicate progress, to start useful discussions, and to create healthy competition between departments.
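
    For teams that want to keep such a scorecard alongside their data, the wedge idea can be captured in a few lines of Python. This is only a rough sketch under stated assumptions: the article names quality and privacy as wedges, while the other wedge names, the Rating class, and the ranking rule (most greens first, then fewest reds) are invented here for illustration and are not a description of any particular tool.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Traffic-light score for one wedge of data maturity."""
    RED = 0
    AMBER = 1
    GREEN = 2


# "quality" and "privacy" come from the article; the rest are hypothetical.
WEDGES = ("quality", "privacy", "findability", "ownership")


def weakest_wedges(scores):
    """Return the wedges holding the lowest rating, in wedge order."""
    worst = min(scores[w] for w in WEDGES)
    return [w for w in WEDGES if scores[w] == worst]


def rank_departments(all_scores):
    """Order departments by most greens, then fewest reds, then name.

    This ranking rule is an assumption made for the sketch.
    """
    def sort_key(dept):
        scores = all_scores[dept]
        greens = sum(1 for w in WEDGES if scores[w] == Rating.GREEN)
        reds = sum(1 for w in WEDGES if scores[w] == Rating.RED)
        return (-greens, reds, dept)

    return sorted(all_scores, key=sort_key)


def summarise(scores):
    """One-line text version of a department's wheel."""
    return " | ".join(f"{w}: {scores[w].name.lower()}" for w in WEDGES)


if __name__ == "__main__":
    # Made-up example scores, not real assessments.
    example = {w: Rating.AMBER for w in WEDGES}
    example["quality"] = Rating.GREEN
    print(summarise(example))
    print("Discuss first:", weakest_wedges(example))
```

    Keeping the scoring this simple is deliberate: the value lies in the conversation each red or amber wedge starts, not in the precision of the score.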

     

     

    4. Get started

    Break down an overwhelming challenge into manageable steps forward. Don’t expect to do everything at once.

    We worked with one company that engaged a management consultancy to produce a 120-page guide to data management. It was wonderful. But they had no idea what to do with it. Many projects stall because they feel overwhelming, but once you find a place to start, things snowball.

    For example, we worked with an organisation with a dislike of top-down policies. They chose to focus on improving metadata quality, emphasising that this would help everyone understand what their colleagues around the world were doing. Setting up a federated metadata system was a simple first step. As researchers got the hang of searching colleagues’ data, they soon started asking each other to add or improve their metadata, which organically led to the development of data ownership, stewardship, and data cleansing.
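
    To make the idea of a federated metadata search concrete, here is a minimal Python sketch. It assumes each site keeps its own list of metadata records and that a search simply visits every site in turn; the DatasetRecord fields, the site names, and the federated_search function are hypothetical and do not describe the system that organisation actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata describing one data set; the data itself stays with its owner."""
    identifier: str
    title: str
    owner: str = ""
    keywords: tuple = ()
    units: str = ""

    def missing_fields(self):
        """Names of optional metadata fields that are still empty."""
        return [name for name in ("owner", "keywords", "units")
                if not getattr(self, name)]


def federated_search(catalogues, term):
    """Search every site's catalogue and return (site, record) matches.

    `catalogues` maps a site name to that site's list of records, so each
    site keeps control of its own metadata.
    """
    term = term.lower()
    hits = []
    for site, records in catalogues.items():
        for record in records:
            searchable = " ".join((record.title, *record.keywords)).lower()
            if term in searchable:
                hits.append((site, record))
    return hits


if __name__ == "__main__":
    # Made-up catalogues for illustration only.
    catalogues = {
        "Site A": [DatasetRecord("a-1", "Turbine vibration logs",
                                 owner="J. Smith",
                                 keywords=("turbine", "vibration"),
                                 units="mm/s")],
        "Site B": [DatasetRecord("b-7", "Blade fatigue tests",
                                 keywords=("turbine",))],
    }
    for site, record in federated_search(catalogues, "turbine"):
        gaps = record.missing_fields()
        note = f" (missing: {', '.join(gaps)})" if gaps else ""
        print(f"{site}: {record.title}{note}")
```

    Showing the gaps next to each search hit mirrors what happened in practice: once colleagues can see what is missing, they start asking each other to fill it in.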

    FAIR data – boring but worthwhile

    Data management is like cleaning a house that’s been neglected for years. It’s not much fun, but once it’s done, it’s easy to find everything you need.

    The tech is the easy bit. Most problems are human. Getting people to share their data needs an understanding of the challenges humans face in these environments, and the benefits they will get if they spend some time getting it right.

    Tessella can help. We are scientists and engineers, as well as data management and data science experts, and we know how to break down barriers between the two groups, and build a case for senior management. We have decades of experience working on the specific data challenges of engineering and R&D environments. Contact us to discuss how we can help.
