What Makes People Trust AI? Good and Bad Examples of AI

    Dr Matt Jones



    The mainstream media haven't been kind to AI. They've highlighted the many mishaps of badly planned AI, from sexist HR tools to unpredictable autonomous vehicles.

    AI professionals may roll their eyes at media hyperbole, but these stories matter for our industry’s development. Firstly, whilst selective, they highlight very real problems with how businesses – including world leaders in data science – approach AI. Secondly, they create suspicion and undermine trust in AI across the board.

    If AI is to fulfil its potential, we need to address the concerns that undermine trust, both in AI design and in user experience. This matters both for the AI in question and for stopping more such stories hitting the headlines.

    Let’s look at some examples of positive and negative AI use and how they can affect trust in the technology.


    When AI goes bad

    One of the most famous ‘AI failures’ was Amazon’s recruitment tool. The AI looked at applicant CVs and predicted the best candidates based on similarities with previous successful hires.

    But the models were trained on patterns in past CVs, most of which came from men, reflecting the tech industry's gender imbalance. Not recognising this skew, the system taught itself that male candidates were preferable, and started rejecting applicants for being female. As a result, it couldn't be trusted to make unbiased decisions.
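    The mechanism is easy to reproduce on toy data. The sketch below (entirely simulated and illustrative – it has nothing to do with Amazon's actual system or features) builds a hiring history in which women were penalised regardless of skill, then fits a simple logistic regression to it. The model dutifully learns a large negative weight on the gender flag: it reproduces the historical bias rather than correcting it.

    ```python
    import numpy as np

    rng = np.random.default_rng(42)
    n = 2000

    # Simulated historical CVs: a genuine skill score plus a gender flag
    # (1 = woman). These features are illustrative, not from any real system.
    skill = rng.normal(size=n)
    woman = rng.integers(0, 2, size=n)

    # Biased history: past hiring rewarded skill but ALSO penalised women.
    logit = 2.0 * skill - 2.0 * woman
    hired = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

    # Fit logistic regression by gradient descent on [skill, woman, intercept].
    X = np.column_stack([skill, woman, np.ones(n)])
    w = np.zeros(3)
    for _ in range(3000):
        p = 1 / (1 + np.exp(-X @ w))
        w -= 0.1 * X.T @ (p - hired) / n

    # The learned weight on 'woman' is strongly negative: two otherwise
    # identical CVs now get different scores based on gender alone.
    print(f"learned weights: skill={w[0]:.2f}, woman={w[1]:.2f}")
    ```

    Nothing in the fitting procedure is "broken" – the model is an accurate mirror of biased training data, which is exactly why curating that data matters.
    
    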

    In another case, an AI was designed to predict premature births from non-invasive electro-hysterography readings. Initial results suggested up to 99% accuracy.

    But there was a problem. The researchers used oversampling to compensate for the small data set, and the synthetic records it generated ended up in both training and validation. So the model was trained to learn a correlation, then tested on some of the same data to see if that correlation existed. Given the initial data set was small anyway, this leakage produced a misleadingly high accuracy.

    When researchers reproduced the models, accuracy dropped to 50%. Trust in the original accuracy claims was destroyed at a stroke. A model that looked like it should go into clinical practice was shown to be one that definitely should not.
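    This kind of leakage is simple to demonstrate. The sketch below (simulated data, standing in for the real study) oversamples a tiny data set by duplicating records before splitting it, so copies of the same record land on both sides of the train/test split. A 1-nearest-neighbour classifier then scores very highly on labels that are pure noise, while an honest split on the original records scores near chance – mirroring the drop from 99% to 50%.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Tiny data set with NO real signal: the labels are random noise.
    X = rng.normal(size=(40, 5))
    y = rng.integers(0, 2, size=40)

    # Oversample by duplicating records (a simple stand-in for synthetic
    # oversampling) BEFORE splitting -- the mistake described above.
    X_over = np.vstack([X, X, X])
    y_over = np.concatenate([y, y, y])

    def nn_accuracy(X_tr, y_tr, X_te, y_te):
        """1-nearest-neighbour accuracy: predict the label of the closest training row."""
        preds = [y_tr[np.argmin(np.linalg.norm(X_tr - x, axis=1))] for x in X_te]
        return float(np.mean(np.array(preds) == y_te))

    def split_and_score(X, y):
        """Random 70/30 split, then score 1-NN on the held-out 30%."""
        idx = rng.permutation(len(X))
        cut = int(0.7 * len(X))
        tr, te = idx[:cut], idx[cut:]
        return nn_accuracy(X[tr], y[tr], X[te], y[te])

    leaky = split_and_score(X_over, y_over)   # duplicates leak into the test set
    honest = split_and_score(X, y)            # split the original records only

    print(f"leaky accuracy:  {leaky:.0%}")    # far above chance, despite random labels
    print(f"honest accuracy: {honest:.0%}")   # roughly a coin flip
    ```

    The fix is to split first and oversample only the training portion, so the test set never contains copies of training records.
    
    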

    In both cases, the designers didn't take a rigorous approach to selecting training and testing data. They didn't think through the nuances of real-world data, and they over-relied on the data to give all the answers rather than engaging with how the system would actually work and be used in the real world.

    Good AI has good processes

    Now let’s look at a couple of examples of getting it right.

    Earlier this year, a pharma company announced it had successfully used AI to identify drug molecules for treating OCD.

    They used algorithms to sift through potential compounds, checking them against a huge database of parameters. As a result, a candidate drug molecule was ready for clinical trials within 12 months, against an industry average of 4.5 years.

    Researchers need to be confident there is a reasonable chance of success before they embark upon expensive clinical trials. In this high-stakes world, how could they trust the results?

    Trust was built through a dedicated focus on acquiring and checking high-quality data, and on tailoring algorithms to the specific task at hand. Through close collaboration between AI and drug chemistry experts, inputs and outputs could be continuously assessed for bias and accuracy and verified against real-world understanding – not just trusted as correlations in the data.

    Another lovely example is Google’s Bolo reading-tutor, designed to help children in rural India with reading skills. The ‘tutor’ app encourages, explains, and corrects the child as they read aloud.

    The tech itself was nothing new – it applied Google's existing speech recognition and text-to-speech technology. What made the project successful was developing it for a specific application, with a clear purpose in mind. The app was carefully piloted in 200 Indian villages, and ongoing field research verified the results: 64% of children showed significant improvements in reading proficiency.

    The rigorous process of development, real-world testing, and gathering user feedback allowed them to build an app that could be trusted. The app has since been rolled out widely.

    Trust in AI

    What differentiated the trusted AIs from the AI failures was a rigorous focus on processes to ensure the right data was selected, the right models developed, and the output was designed with the user in mind. With this in mind, can you be confident your users will trust your AI?
