
UI representing the conversation flow while utilising explainability features for a chatbot to be used in clinical settings
AI Interpretability in Chat and Data workflows
Enabling AI interpretability for chat-based technologies in clinical and data science pipelines. (2021)
STATUS
The explainability modules developed during this project are being used to provide added explanations in an app called DocTalk.
DURATION
- 3 months
- May, 2021 - July, 2021
TEAM (x1)
- Sumedh Supe, UX and ML Research Engineer
CONTRIBUTIONS: UX and ML Engineer
Enabled critical technical practice in both clinical settings and ML pipelines by creating explainability tools.
From a human-centered standpoint, I had to identify where explanations would have the most impact, and what sort of explanations would benefit the users.
As the ML Engineer, I worked with the data science pipelines and built the functional software that generates the explanations, powered by the LIME algorithm.
FINAL OUTCOME

Explainability modules created for clinical chatbots (left) and for data science pipelines (right), utilising LLMs
The explanations provide a quick look into how the black-box models perform, leading to a 40% increase in asking repeated questions, and enabling critical technical practice both in clinical practice and in the ML pipeline.
1. Explainability Module for DocTalk (RASA): An explainability module for DocTalk that helps people analyse the text and explains the output. It was built as a Python library for RASA.
2. Explainability Module for Sentiment Analysis Pipeline (Python): A generalized LIME explainability module created to explain data science pipelines. This one was tested on explaining how sentiment analysis was used to create clusters.
PROBLEM

Most AI models are black boxes and provide no clear way to understand how they work
Explainability methods like LIME can help explain certain instances of the model and provide more transparency
Most AI models today generate results in a way that cannot be explained. These models are black boxes, and it is very difficult to know how they operate.
To enable transparency, and with it critical technical practice, understanding how an AI model produced a result becomes critically important.
By understanding which specific parts of a particular instance contributed to the output, we can provide more transparency about how the result was produced without disclosing the entire inner workings of the AI model.
THE USERS

The users: residents undergoing training (the first user, left) and data scientists (the second user, right)
AI and ML models are making their way into many applications. Creating transparency to boost confidence in these black boxes is key to having them adopted at critical junctures like healthcare and data science.
PROCESS AND OUTCOMES
After going through the practitioners' requirements and the RASA chatbot they were using, via interviews and collected data, I decided to use an explanation algorithm, LIME, to help analyse the results of the pipeline and understand the decisions made by the ML model.
LIME was chosen for how simple it makes explanations: it approximates any response with a local linear model and explains it using weights. We did not need to explain the entire model; explaining individual responses was sufficient.
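The core idea of LIME can be sketched in plain Python: sample random perturbations of an input (words kept or dropped), query the black-box model on each perturbation, and fit a linear surrogate whose per-word weights serve as the explanation. This is an illustrative toy, not the actual DocTalk module; the `toy_model`, its word scores, and the sample count are hypothetical stand-ins, and the random seed is fixed for reproducibility:

```python
import random

def lime_like_weights(words, predict, n_samples=200, seed=0):
    """Fit a linear surrogate over random word-presence masks (LIME-style)."""
    rng = random.Random(seed)
    k = len(words)
    Z, y = [], []  # design matrix rows [z_1..z_k, 1] and model outputs
    for _ in range(n_samples):
        mask = [rng.randint(0, 1) for _ in range(k)]
        kept = [w for w, m in zip(words, mask) if m]
        Z.append([float(m) for m in mask] + [1.0])  # bias column appended
        y.append(predict(" ".join(kept)))
    # ordinary least squares via the normal equations (Z^T Z) beta = Z^T y
    d = k + 1
    A = [[sum(Z[n][i] * Z[n][j] for n in range(n_samples)) for j in range(d)]
         for i in range(d)]
    b = [sum(Z[n][i] * y[n] for n in range(n_samples)) for i in range(d)]
    # Gaussian elimination with partial pivoting
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * d
    for r in range(d - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, d))) / A[r][r]
    return dict(zip(words, beta[:k]))  # drop the bias term

# toy black-box "sentiment model": additive word scores, so the surrogate is exact
SCORES = {"good": 1.0, "great": 2.0, "bad": -1.5}

def toy_model(text):
    return sum(SCORES.get(w, 0.0) for w in text.split())

weights = lime_like_weights(["the", "movie", "was", "really", "good"], toy_model)
```

Because the toy model is exactly additive in word presence, the surrogate recovers a weight of 1.0 for "good" and roughly zero for the neutral words; on a real classifier, the weights are only a local approximation around the given input.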
Instead of designing use cases for every scenario, a probabilistic design approach was taken and the abilities of the explainability module were studied. Knowing that the weights would be the only output, an API endpoint was created in RASA to return them whenever the user requested.
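An endpoint of this kind, one that returns the explanation weights on request, can be sketched in a framework-agnostic way with Python's standard library. This is not the actual RASA integration; the `/explain` route, the `explain()` stub, and its placeholder weights are assumptions for illustration only:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen

def explain(text):
    """Hypothetical stand-in for the LIME explainer: returns per-word weights."""
    return {w: round(len(w) * 0.1, 2) for w in text.split()}  # placeholder values

class ExplainHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/explain":
            self.send_error(404)
            return
        text = parse_qs(url.query).get("text", [""])[0]
        body = json.dumps({"text": text, "weights": explain(text)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

# bind an ephemeral port and serve in the background
server = HTTPServer(("127.0.0.1", 0), ExplainHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# demo request against the local endpoint
reply = json.loads(urlopen(f"http://127.0.0.1:{port}/explain?text=chest+pain").read())
```

In the real module, `explain()` would run LIME against the deployed model; serving only the weights keeps the endpoint lightweight and the model itself undisclosed.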
Unnecessary human interactions as a result of confusion were reduced by 80%.
In the case of the Reflexive ML Pipeline, I analyzed how explainable AI can help in understanding the black-box models used for processing textual data, and how explanations can help the user inculcate reflexive practices. I created a LIME explanations module for the pipeline that extracts the words with the highest impact on the classification output, enabling reflection on and understanding of these black-box models.
The Reflexive ML pipeline enhanced the data scientists' ability to engage in critical technical practice with the data by 40%.
ADDITIONAL INFORMATION
DocTalk
DocTalk ("DocTalk: Dialog meets Chatbot") is a joint research project of Charité: Universitätsmedizin Berlin, FernUniversität in Hagen and Freie Universität Berlin, funded by the Federal Ministry of Education and Research (BMBF). The overarching goal of the research project is to analyze and improve digital communication and learning paths in clinical environments in order to meet the increased requirements due to interdisciplinary collaboration and intertwining professional process flows, both technically and didactically. To achieve this goal, a proactive communication platform is being implemented at Charité, including a conversational agent (chatbot) that is supposed to support reflective learning processes of residents in the clinical environment. The research group Human-Centered Computing designs and implements the DocTalk chatbot in collaboration with the project partners and the residents as end users of the proactive communication platform.
Reflexive ML
The project "Reflexive ML" is an interdisciplinary research collaboration in the context of the Cluster of Excellence "Matters of Activity" (ExC:MoA, funded by DFG) between Michael Tebbe (Computer Scientist at the HCC Research Group and PhD candidate at ExC:MoA), Dr. Simon David Hirsbrunner (Postdoc in Science and Technology Studies at the HCC Research Group) and Prof. Dr. Claudia Müller-Birn (Head of the HCC Research Group and Principal Investigator at ExC:MoA). The goal of the project is to study how methods from Natural Language Processing (NLP) can be applied in such a way that they support the hermeneutic practices of interpretivist scholars by increasing their accessibility through reconceptualization. To this end, an ML pipeline has been implemented that can be applied to large written natural-language datasets (e.g. YouTube comments). The NLP pipeline recontextualizes the data by semantically associating it with large datasets in a pre-trained model, thus making visible similarities that have not yet been considered in the analysis (e.g. by grouping together references to conspiracy narratives). Additional features in the Reflexive ML pipeline provide opportunities to inspect and interpret the data in its new context, allowing a different view by providing material for reflection on the phenomena the data represents.
Human-Centered Computing, Freie Universität Berlin
The project was a part of the Human-Centered Computing Group at Freie Universität Berlin. The group's mission is to embrace a critical practice in the design of socially responsible technologies. The group's current focus is on machine learning technologies related to privacy, reflection and interpretability, with a focus on interactive and conversational user interfaces. The group is headed by Prof. Claudia Müller-Birn.
REFLECTIONS
The utilisation of open-source technologies like RASA enabled me to build an entire explainability pipeline on their framework. This now enables any model's outputs to be understood using LIME.
I realized that even though we had set out to generate explanations instantly, the lack of compute made outputs take even longer to produce; with better hardware and algorithms, I think this would soon be a non-issue.
This was my first step into HCI research, and I learnt a lot about having useful discussions to get to the desired result. Moreover, this was all before ChatGPT, so I feel ChatGPT-based explanations could make it even better.
This project would not have been possible without the opportunity from Prof. Claudia Müller-Birn and my mentors, Michael Tebbe and Diane Linke. Thanks to the other members of the HCC lab too for being accommodating throughout my stay.