How to Build Trustworthy AI Solutions for Biological Imaging

    Dr Sam Genway



    Trustworthy AI is essential for biological imaging

    AI can learn complex relationships between features in images to discover which combinations of features indicate disease. This offers huge potential for accelerating the development and application of therapeutics, from drug discovery and clinical development to helping physicians diagnose and treat disease.

    Diagnostics is an area where trust matters hugely. AI recommendations inform high-stakes decisions, such as which drugs to take into human trials or which course of treatment is appropriate for a patient. The performance of the system is of course important, but to trust the results, you need to trust the AI that produced them.

    Using Tessella’s Five Principles of Trusted AI framework, we explore how to build trust into a hypothetical AI system for biological imaging.


    1. Assured AI

    An assured AI system for imaging leverages the right training data and the right deep learning tools for the job.

    Data must be representative of the problem being tackled. Images of tissue samples must represent the full range of how healthy and diseased states appear.

    This data must be of high quality, carefully curated and annotated by experts. Confounding information, such as visual aids or annotations added by subject matter experts, must be removed. Bias must be assessed and corrected. In medical diagnostics, for example, this could mean checking whether the training data contains images from patients in all relevant age groups.
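    A first pass at the age-group question can be as simple as counting how the training images are distributed across age bands. The Python sketch below assumes each image carries the patient's age in its metadata. The band boundaries in AGE_BANDS and the 10% threshold (min_share) are illustrative assumptions, not clinical recommendations.

```python
from collections import Counter

# Hypothetical age bands (inclusive bounds); a real study would define these with clinical input.
AGE_BANDS = [(0, 17, "0-17"), (18, 39, "18-39"), (40, 64, "40-64"), (65, 200, "65+")]


def age_band(age: int) -> str:
    """Map an age in years to one of the hypothetical AGE_BANDS labels."""
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


def representation_report(ages, min_share=0.10):
    """Return each band's share of the dataset and the bands below min_share."""
    counts = Counter(age_band(a) for a in ages)
    total = len(ages)
    shares = {label: counts.get(label, 0) / total for _, _, label in AGE_BANDS}
    under = [label for label, share in shares.items() if share < min_share]
    return shares, under


# Patient ages attached to a made-up set of ten training images.
ages = [25, 31, 45, 52, 58, 61, 63, 70, 34, 48]
shares, under = representation_report(ages)
print(under)  # ['0-17']
```

    Counting only detects under-representation. Experts still need to decide what counts as adequate coverage and how to correct any gap, for example by collecting more data or reweighting.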

    AI development must be guided by experts who know what they're looking for. The project should implement good governance (for example, by following our professional governance framework) so that complexity is added to the models only where it improves performance, and their behaviour is characterised and understood at each step. A stage-gated approach allows rapid experimentation that narrows many ideas down to the most viable ones and spots dead ends before costs spiral.

    The models must undergo extensive verification and validation using independent imaging data. This data should typically come from a range of sources, for example different people and different manufacturers’ equipment, and be captured by expert scientists and pathologists. The aim is to confirm that the AI performs as intended. This process will necessarily involve collaboration between AI specialists and subject matter experts in disease pathology.

    2. Explainability

    Diagnostics are never 100% accurate. With the stakes so high, it’s important to understand the rationale for any recommendation.

    Techniques in explainable AI give insight into what features in an AI model’s input are driving the output. For an image, useful explainability tools can show how each pixel contributed to the AI system’s classification. AI engineers can then check that diagnoses are being made for the right reasons.

    AI models can be right for the wrong reasons, and explainability allows AI engineers to spot when the AI is learning from information other than the disease indicator. For example, suppose a pathologist had drawn a dot in the corner of every unhealthy image. The AI would then learn the dot as the clearest indicator of disease. Mistakes this simple have happened, and they lead to an AI reaching the right answer with the wrong reasoning.
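    Occlusion sensitivity is one simple explainability technique of the kind described above. It works by hiding each part of the image in turn and measuring how much the model's output changes. The sketch below uses a hypothetical toy_model that has learned exactly the shortcut in the dot example: it scores disease mostly from the top-left corner pixel. The model, image and weights are invented for illustration.

```python
def toy_model(image):
    """Hypothetical classifier that has learned a shortcut: it relies mostly on
    the annotation dot at the top-left corner and only weakly on the central
    'lesion' pixel. Returns a disease score in [0, 1]."""
    h, w = len(image), len(image[0])
    corner = image[0][0]
    lesion = image[h // 2][w // 2]
    return min(1.0, 0.9 * corner + 0.1 * lesion)


def occlusion_map(model, image, fill=0.0):
    """Occlusion sensitivity: replace one pixel at a time with `fill` and record
    how much the model's score drops. Large drops mark the pixels the
    prediction depends on."""
    base = model(image)
    heat = []
    for i, row in enumerate(image):
        heat_row = []
        for j in range(len(row)):
            occluded = [r[:] for r in image]  # copy so the original image is untouched
            occluded[i][j] = fill
            heat_row.append(base - model(occluded))
        heat.append(heat_row)
    return heat


# 5x5 'unhealthy' image: a pathologist's dot at (0, 0) and a genuine lesion at the centre.
image = [[0.0] * 5 for _ in range(5)]
image[0][0] = 1.0  # annotation dot (the confounder)
image[2][2] = 1.0  # genuine lesion
heat = occlusion_map(toy_model, image)
top = max(((i, j) for i in range(5) for j in range(5)), key=lambda p: heat[p[0]][p[1]])
print(top)  # (0, 0): the corner dot, not the lesion, drives the score
```

    Practical tools usually occlude patches rather than single pixels. Gradient-based methods such as saliency maps or integrated gradients are common alternatives. Whichever method is used, the resulting map still needs an expert to judge whether the highlighted region is biologically meaningful.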

    Explainable AI can tell you what is driving the diagnosis, but you still need expertise in the underlying biology to know what that means. Explainability requires a human to understand how an AI reached its conclusion and to judge whether that reasoning stands up to scientific scrutiny.

    Explainability for non-expert users may be some way off in complex diagnostics such as medical imaging. It is conceivable, though, that descriptive explanations will one day be commonplace and help in creating more trustworthy AI solutions. One example would be an AI-generated text passage describing why the size, shape and colour of a lesion indicate a particular pathology.

    3. Human

    The AI needs to be easy to use, or it won’t be used correctly.

    Its design should be informed by an understanding of how people interact with technology. The limits of users’ knowledge must be considered, as well as how they will use the system’s output. In some cases the output will be a simple classification, positive or negative. In others, a breakdown of probabilities might be more appropriate.

    It will often be important to capture uncertainty: perhaps the image doesn’t look like anything the AI has been presented with during training and is consistent with a different disease or condition not captured in the training data.
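    One simple way to surface uncertainty is to convert the model's raw scores into a probability breakdown with a softmax and route low-confidence cases to a human. The sketch below assumes a classifier that outputs one score (logit) per class. The class labels and the 0.8 min_confidence threshold are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw model scores into a probability breakdown that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(logits, labels, min_confidence=0.8):
    """Return (top label, per-class probabilities, needs_review).
    Flags the case for human review when the top class probability
    falls below min_confidence."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    needs_review = probs[best] < min_confidence
    return labels[best], dict(zip(labels, probs)), needs_review


labels = ["healthy", "disease A"]  # hypothetical classes
print(triage([4.0, 0.5], labels))  # confident case: needs_review is False
print(triage([1.0, 0.8], labels))  # ambiguous case: needs_review is True
```

    A confidence threshold like this catches cases that are ambiguous between known classes. It is not a reliable detector of images unlike anything seen in training, because neural networks can be confidently wrong on such inputs. Dedicated out-of-distribution detection methods are needed for that.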

    Either way, the output must be presented in a simple, easy-to-use interface that integrates with the technology already in use. This might be an app or online service, or it may be integrated into scanners or other medical equipment. Adequate user training must also be offered to build confidence.

    4. Any AI must be allowed to be used for its intended purpose

    If AI is used in diagnosing an individual’s condition, it will require regulatory approval from a body such as the US Food and Drug Administration (FDA). Approval involves assessing the risks associated with deploying the system and mitigating them throughout development.

    The risks depend on two factors:

    • The level of autonomy of the system
    • The consequences of the decisions made

    Will the AI system be used to diagnose or treat a patient, or only to inform clinical management? Decisions about a critical condition carry much higher risk than those about a non-serious one.

    While regulatory frameworks are most developed in the healthcare and automotive sectors, regulation in other domains is developing and guidance is emerging at national and international levels.

    The wider data infrastructure around an AI system also requires clear data governance. In this respect AI is no different from any other system that uses sensitive personal data: complying with local regulations may mean separate deployments in multiple regions around the globe.

    Even in domains where AI regulation is less mature, ethical considerations are paramount. Consider the performance of an AI system on different demographic groups. Suppose a model is trained on data from several groups of people and shown to be correct 80% of the time. Does that mean 80% accuracy for everyone, or 100% accuracy for the majority group and 0% for the others? An AI system needs to be assessed in a stratified way to understand where, and why, it works.
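    The arithmetic behind the 80% example is easy to reproduce by computing accuracy within each group as well as overall. In the sketch below, the group labels and predictions are invented to match the scenario in the text.

```python
from collections import defaultdict


def stratified_accuracy(groups, y_true, y_pred):
    """Return overall accuracy and accuracy within each demographic group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        total[g] += 1
        correct[g] += int(t == p)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group


# Made-up data: 8 images from the majority group (all classified correctly)
# and 2 from a minority group (both classified wrongly).
groups = ["majority"] * 8 + ["minority"] * 2
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 0, 1]
overall, per_group = stratified_accuracy(groups, y_true, y_pred)
print(overall)    # 0.8
print(per_group)  # {'majority': 1.0, 'minority': 0.0}
```

    In practice, the same breakdown would be applied to other metrics and other attributes, such as age, sex or scanner manufacturer. Small groups would also need confidence intervals, since a handful of images says little about performance.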

    5. The model needs to perform in the real world, not just the lab

    Good performance starts with setting the right objective. This may not be a simple metric such as accuracy. It should reflect what you actually hope to achieve.
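    The reason accuracy alone can mislead is easiest to see with a rare condition. The made-up example below assumes 5% disease prevalence and a model that never flags disease. This model scores 95% accuracy while missing every diseased case, a failure that sensitivity and specificity expose immediately.

```python
def metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on diseased cases) and specificity
    (recall on healthy cases) for binary labels where 1 = disease."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# 100 made-up images at 5% disease prevalence.
y_true = [1] * 5 + [0] * 95
always_healthy = [0] * 100  # a useless model that never flags disease
print(metrics(y_true, always_healthy))
# {'accuracy': 0.95, 'sensitivity': 0.0, 'specificity': 1.0}
```

    Which metric to prioritise is a clinical decision. A screening tool, for instance, may favour sensitivity, because a missed case is costlier than a false alarm that a physician then rules out.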

    If the AI’s aim is to assist physicians, it should be assessed on whether it helps them make better decisions, not necessarily on whether it outperforms them. A real-world assessment should consider how it improves on the current approach. Does it reduce errors, save time or increase throughput?

    Finally, any AI development means planning for the whole lifecycle. Performance must be monitored after deployment, and as new data is generated there will be opportunities to improve the system through retraining. Subsequent releases must be routinely, but carefully, validated in accordance with regulatory requirements.
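    One common way to monitor a deployed model uses the population stability index (PSI). It compares how a quantity, such as the model's output scores, is distributed in production against a reference period. PSI is the sum over bins of (current share - reference share) × ln(current share / reference share). The sketch below assumes scores in [0, 1]. The ten equal-width bins, eps and the 0.25 alert level are all assumptions. In particular, 0.25 is a rule of thumb borrowed from credit-risk modelling, not a regulatory threshold.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population stability index between two samples of a score in [0, 1].
    Higher values mean the current distribution has drifted further from
    the reference (e.g. validation-time) distribution."""
    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = min(int(v * bins), bins - 1)  # put v == 1.0 in the last bin
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]  # eps avoids log(0)

    ref, cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


reference = [i / 100 for i in range(100)]  # scores seen during validation
drifted = [0.95] * 100                     # production scores piling up at the top
print(psi(reference, reference))         # 0.0
print(psi(reference, drifted) > 0.25)    # True
```

    A high PSI is a prompt to investigate, not proof that performance has dropped. Any model retrained as a result would still need validating before release.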
