New Bioinformatics Platform Makes Large Cancer Data Sets Available to All Scientists at H3


    Life Sciences R&D Digitalization

    Leading oncology discovery company turns to Tessella’s analytics and data science skills to deliver a new data-driven platform for oncology research and decision-making.

    Located in spacious laboratory headquarters in Cambridge, Massachusetts, H3 Biomedicine is a unique, privately held oncology discovery company that applies the deep expertise of leading scientists to integrating insights from cancer genomics. H3 uses its innovative capabilities in synthetic organic chemistry and tumor biology to develop new, highly targeted small-molecule drugs designed to advance treatment options for some of the most difficult-to-treat cancer types. H3 Biomedicine regards its approach as the most promising current opportunity in cancer therapeutics.

    The company’s mission is to bring better and more efficient drugs to patients in need, and its name, H3, symbolizes that goal: Human, Health and Hope.

    Business Situation

    At the core of H3 Biomedicine’s mission and culture are a biology-driven approach and a dedication to data-centric oncology research and decision-making. H3 leverages the world’s cumulative investment in cancer research, including the vast cache of publicly available cancer genomics and pharmacogenomics data, to analyze the drivers of cancer.

    The data-centric culture at H3 depends on empowering all researchers with easy access to the information and data they need to make informed decisions. As at many biopharmaceutical organizations, H3’s bioinformatics group acts as custodian of the wealth of data available from internal and external sources. The group is responsible for curating, integrating and analyzing the data, and plays an important role in making it available to the rest of the company.

    Lihua Yu is Vice President of Bioinformatics at H3. She has taken a leading role in establishing the company’s data-centric culture in a way that enables colleagues across the organization to explore their ideas and hypotheses through the data available to them. “My group not only provides expert bioinformatics analysis input to project teams,” Lihua explains, “but it also seeks to build tools and resources for everyone in the company. One of our missions is to enable data-driven hypothesis generation by all scientists.”

    Business Challenge

    Over the past few years, large amounts of pharmacogenomics data have been amassed in the public and private sectors. Large data sets have been generated that characterize cell-line, patient and in vivo models across a variety of molecular features and at varying depths of characterization.

    Large-scale screening experiments have generated arrays of pharmacology data that can be correlated with molecular features to better understand the links between those features, the causes of disease and potential treatment options. These data sets can be used to explore new hypotheses, identify potential biomarkers and validate internally generated results.

    However, these data sets are often broad, covering multiple diseases and a variety of treatment types. Many of the same models have been characterized in multiple data sets, but on different experimental platforms. Nomenclature varies from organization to organization and from laboratory to laboratory, and the data is distributed in many different formats. As a result, it is difficult to integrate the data sets into a single resource, and difficult for non-technical scientists to handle, understand and query the data.

    Given their technical skills, it fell to the H3 bioinformatics team to integrate the data sets the company used and to support scientists in exploring them. This was demanding and time-consuming work: the team spent a high proportion of its time writing queries and scripts to extract subsets of data that answered specific scientific questions. The work was necessary, but it was not the best use of the team’s capabilities and, most importantly, it created a bottleneck that prevented scientists from testing emerging ideas in a flexible and timely manner.

    To maximize the value her team could provide to the company, and to better align with H3’s culture of data-driven decision-making, Lihua reached out to Tessella. It was the only company she considered. “Having worked with Tessella in the past,” Lihua stated, “I had first-hand knowledge of the high capabilities of their consultants. Their experience and focus on scientific software gave me confidence that they would understand the challenge I was facing.” Together, H3 and Tessella developed a tool that would put the power of an integrated pharmacogenomics database in the hands of every employee at H3.


    The result of this combined effort was the Translational Informatics Platform (TIP) software. TIP integrates multiple pharmacogenomics data sources and provides a framework for querying and exploring that data. Its interface is designed to be powerful, intuitive and easy to use, supporting data querying and interactive data visualization that provides feedback to the user. The software removed the need for researchers to write their own time-consuming queries or scripts to pull the data together.

    At the core of TIP is a powerful and flexible data model that supports loading and integrating data from a variety of internal and external sources, so new entity and data types can be incorporated quickly and easily. Performance was a development focus from the data layer up: efficient data structures and indices allow the system to handle large data volumes.
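    The case study does not describe TIP’s internal design. As an illustration only, the sketch below assumes records are flat attribute dictionaries and shows one common way to pair a schema-flexible model with an index: an inverted index from (attribute, value) pairs to record ids. The class `EntityStore` and its methods are hypothetical names, not part of TIP.

```python
from collections import defaultdict


class EntityStore:
    """Hypothetical in-memory store: records are flat attribute dictionaries,
    and an inverted index maps each (attribute, value) pair to record ids."""

    def __init__(self):
        self.records = {}              # record id -> {attribute: value}
        self.index = defaultdict(set)  # (attribute, value) -> set of record ids

    def _unindex(self, record_id):
        # Drop stale index entries if a record is reloaded from a newer source.
        old = self.records.get(record_id)
        if old is not None:
            for key, value in old.items():
                self.index[(key, value)].discard(record_id)

    def load(self, record_id, entity_type, attributes):
        # A new entity type needs no schema change: it is stored as one more attribute.
        self._unindex(record_id)
        record = {"entity_type": entity_type, **attributes}
        self.records[record_id] = record
        for key, value in record.items():
            self.index[(key, value)].add(record_id)

    def find(self, **criteria):
        # Intersect the posting sets for each criterion instead of scanning all records.
        postings = [self.index.get((key, value), set()) for key, value in criteria.items()]
        if not postings:
            return set(self.records)
        return set.intersection(*postings)


store = EntityStore()
store.load("cl-1", "cell_line", {"tissue": "lung", "source": "public"})
store.load("cl-2", "cell_line", {"tissue": "breast", "source": "internal"})
store.load("pt-1", "patient", {"tissue": "lung", "source": "public"})
print(store.find(entity_type="cell_line", tissue="lung"))  # {'cl-1'}
```

    Because lookups intersect precomputed sets, query cost depends on the size of the matching sets rather than on the full data volume. A production system would also need persistence, typed values and range queries, which this sketch omits.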

    User-centered design and an emphasis on usability have produced an interface that applies familiar concepts, such as simple search and Amazon-like filters, to a complex data challenge. This approach was chosen to bring the data to novice users: those who may be less familiar with the data sources than an expert bioinformatician. This matters because the data in TIP is complex, dense in some areas and sparse in others. An important aspect of the TIP interface is its ability to guide users through the data. As the user interacts with the system, TIP dynamically offers additional filters and options based on what the user has selected and what data is available.
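    The dynamic filtering described above resembles faceted search. A minimal sketch, assuming flat attribute records (the functions `apply_filters` and `available_facets` are hypothetical, not TIP’s code): after each selection, the remaining records are scanned and the values of every other attribute are counted, so the user is only offered filters that still return data.

```python
from collections import Counter


def apply_filters(records, selected):
    """Keep records whose attributes match every selected (attribute, value) pair."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


def available_facets(records, selected, exclude=("id",)):
    """Count the values of each not-yet-selected attribute among matching records.

    Attributes missing from a record (sparse data) simply contribute nothing,
    so only options that still lead to data are offered.
    """
    facets = {}
    for record in apply_filters(records, selected):
        for key, value in record.items():
            if key in selected or key in exclude:
                continue
            facets.setdefault(key, Counter())[value] += 1
    return facets


records = [
    {"id": "m1", "model": "cell line", "tissue": "lung", "drug_screen": "yes"},
    {"id": "m2", "model": "cell line", "tissue": "breast"},
    {"id": "m3", "model": "xenograft", "tissue": "lung"},
]
facets = available_facets(records, {"tissue": "lung"})
print(facets["model"])  # Counter({'cell line': 1, 'xenograft': 1})
```

    Selecting "lung" hides the tissue facet and, because only one lung model carries screening data, offers the "drug_screen" filter with a count of one; this is how sparse areas of the data stay visible without leading users to empty results.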

    Benefits Going Forward

    TIP was developed under the principle of “anytime, anywhere, any device” access. “As a result,” Lihua noted, “it has put hundreds of millions of data points in the hands of every scientist at H3, accessible from phone, tablet or laptop. TIP has made it possible for scientists at H3 to explore their ideas quickly and easily, whether they are together in a meeting or in an audience in a conference.” Looking ahead, H3 and Tessella are continuing to develop TIP to extend the data and data types available through the system.