How to Make the Biggest Big Data Decision

    James Hinchliffe


    Data Science

    Big data has become big business. IDC has forecast that revenue from the sales of big data and business analytics applications, tools, and services will reach $187 billion (£132 billion) in 2019. The analyst house also recently predicted that the amount of digital data created in 2025 will be 10 times greater than the amount created in 2016, with enterprises expected to create 60% of data by that year.

    With this in mind, most enterprises today are either looking to benefit from their big data or are already doing so. Forward-looking organisations in many industries are using big data across their value chains in a staggering range of applications, from developing more effective medicines more rapidly, to learning about customer preferences, to improving operational performance in oil extraction.

    However, side by side with the successes that companies are keen to talk about are many failures that are, for obvious reasons, discussed much less openly.

    Why do big data projects fail?

    There is a lot of analysis and advice about the reasons why big data projects fail. Much of this rightly identifies a lack of clarity in the project’s aims as being the cause, but big data projects also fail because of unsuitable technology choices. It is easy to understand why this might happen.

    First mover advantage is very significant in the digital world and there are many different data analytics technology options available. Consequently, technology leaders are under pressure to make complex technology selection decisions quickly. The risk is that the wrong choice is made.

    Getting an existing live project back on the right technology path is at best incredibly costly and inconvenient, and at worst impossible. In any case, it will involve uncomfortable discussions with business stakeholders. Therefore, getting it right the first time is of utmost importance.

    How to choose the right big data technology

    Deciding on the right big data technologies for a project involves having a clear understanding of a few key internal factors, such as:

    • What the business is aiming to achieve
    • What the current technology setup within the organisation is
    • What talent already exists within the organisation

    Taking the time to assess these elements carefully before committing to a beguiling new technology is an invaluable step towards avoiding costly and potentially irreversible mistakes.

    It is also important to realise that, sometimes, establishing clear project objectives and exploring alternative approaches can reduce the problem to one that can be solved with mature mainstream technologies such as relational databases.

    In his book “Big Data Analytics: Turning Big Data into Big Money”, Frank Ohlhorst captures a very important point: “Big Data defines a situation in which data sets have grown to such enormous sizes that conventional information technologies can no longer effectively handle either the size of the data set or the scale and growth of the data set.”

    In other words, we only have a big data project when we have a problem that can’t be solved with conventional approaches. Attempting to use big data technologies to address something that is truly a small-scale problem is at least as dangerous, in terms of lost time and money, as the reverse.

    The five deciding factors

    There are several aspects of a solution design that we recommend considering explicitly when an organisation is deciding which big data technology to employ, and we group these into five deciding factors.

    Data ingestion

    How will data get into the system? How will data sources be connected to the system and how will the data travel between any two systems? What data formats will be used?


    Stream processing

    Does anything need to be done to the data immediately as it flows between systems? If so, how will the data be processed? How quickly are the results really needed?

    Batch processing

    Can all the processing be done in real time, or will some data need to be processed offline? If so, how? How will stream and batch-processed data be recombined at query time?
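
    To make the recombination question concrete, here is a minimal Python sketch of one common pattern: a query adds a precomputed batch view to a small real-time view covering events that have arrived since the last batch run. The function names (build_batch_view, update_realtime_view, query_total) and the sample figures are hypothetical, chosen purely for illustration; a real system would hold each view in a dedicated store rather than in memory.

```python
from collections import Counter

def build_batch_view(events):
    """Aggregate historical (key, amount) events offline into per-key totals."""
    view = Counter()
    for key, amount in events:
        view[key] += amount
    return view

def update_realtime_view(view, event):
    """Fold a single newly arrived event into the real-time view."""
    key, amount = event
    view[key] += amount

def query_total(key, batch_view, realtime_view):
    """Answer a query by combining both views; unknown keys count as zero."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)

# Hypothetical sample data: events already processed by the batch job...
historical = [("sensor-a", 10), ("sensor-b", 4), ("sensor-a", 6)]
batch_view = build_batch_view(historical)

# ...and events that arrived after the batch job last ran.
realtime_view = Counter()
for event in [("sensor-a", 1), ("sensor-c", 7)]:
    update_realtime_view(realtime_view, event)

print(query_total("sensor-a", batch_view, realtime_view))
```

    The sketch assumes that the two views never cover the same event. In practice, that means the real-time view has to be reset or trimmed each time a batch run completes, otherwise events are counted twice.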


    Data storage

    Where, how and for how long should the data be stored? Do we need to distinguish between medium-term and long-term storage or archiving?
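
    One way to think about the "how long" question is as an explicit retention policy that assigns each record to a storage tier according to its age. The Python sketch below is illustrative only: the tier names and the 30-day and 365-day thresholds are assumptions, and real values would depend on business, cost and regulatory requirements.

```python
from datetime import date, timedelta

# Hypothetical thresholds, in days; not recommendations.
WORKING_STORE_DAYS = 30
MEDIUM_TERM_DAYS = 365

def storage_tier(created: date, today: date) -> str:
    """Return the storage tier for a record based on its age in days."""
    age_days = (today - created).days
    if age_days <= WORKING_STORE_DAYS:
        return "working"
    if age_days <= MEDIUM_TERM_DAYS:
        return "medium-term"
    return "archive"

today = date(2024, 1, 1)
for age in (10, 200, 1000):
    created = today - timedelta(days=age)
    print(age, storage_tier(created, today))
```

    Writing the policy down this explicitly, even as a sketch, tends to expose whether the organisation really needs separate medium-term and archive tiers or whether a single store would do.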


    Data use

    Once the data is captured and processed, how will it be used, and by whom? Will it be used by business users, data scientists, automated processes or machines? How will users search for data? How will users subsequently work with data?

    In a world where many big data technologies began life in companies such as Google or Facebook, companies that don’t have the same challenges as many pre-digital organisations, it’s never been more important to ensure that big data technology choices at the beginning of a project are well thought through.

    Getting it wrong can mean a blunted competitive edge, disappointment and even embarrassment, but the outcomes of doing this correctly – turbo-charging R&D, gaining new insight into customers’ behaviour, being first to market with a new service or beating the competition to introduce a first-in-class life-saving drug – can be immeasurably great. If you're looking for expert guidance in your big data decisions, contact us.
