Variants of knowledge-based chatbots in family caregiving – human evaluation and comparison

Isa Nowak, Dietlind Zühlke, Narjes Nikzad-Khasmakhi, David Alexander Kring, Tanja Grüb-Okkan, Juliane Steinborn, Isabel Zorn

Abstract


Purpose Knowledge in the family caregiving domain is highly complex and involves many conditions and special cases. People needing care and their informal caregivers find it difficult to assess the reliability, credibility, and seriousness of the vast array of available information (Hoffmann et al., 2021), yet this knowledge is essential for effective home care. Although professional care advice exists, it is sometimes difficult to reach and cannot be consulted immediately. We are therefore developing a Large Language Model (LLM) based, knowledge-integrated chatbot tailored to support this target group with reliable information about rights and entitlements to benefits and how to apply for them. One promising conceptual solution is Retrieval-Augmented Generation (RAG) (Lewis et al., 2020). RAG connects LLMs to external sources to provide more specific domain knowledge for answering questions (Pan et al., 2024). In light of the increasing dissemination of LLMs and their use in sensitive areas such as family caregiving, we provide a rigorous human evaluation of how effectively different variants of knowledge integration into chatbots reduce factual inaccuracies, the so-called hallucinations.

Method In this study, we compare five variants of RAG systems built on knowledge graphs (KG) and vector databases (VDB), combined with LLMs such as Llama3 and GPT-4. In addition to a quantitative evaluation based on RAGAS metrics, we performed a comprehensive human evaluation by experts in family caregiving. The experts assessed the generated answers against the following criteria: faithfulness of the content, correct language, factual accuracy, and completeness. Free-text comments allowed more detailed remarks, e.g. whether the domain terms used were outdated or what content was missing. The study focused on specific questions concerning care benefits in kind, which are especially relevant for informal caregivers. This allowed us to identify the potential and challenges of knowledge representation in such a sensitive area.

Results and Discussion The preliminary results of our study show that integration with a knowledge graph is especially advantageous for generating highly accurate answers to qualified questions. In contrast, vector databases perform very well in delivering broader context. All of the knowledge-integrated chatbots, however, showed a clear increase in performance in the answer-generation process. These observations point to the potential of combining both techniques. Furthermore, the results suggest the need to tailor the knowledge representation to the specific use case, as home care involves a huge variety of information needs. In light of the ever-developing and increasing requirements placed on the healthcare system, our findings can contribute to the development of chatbots and AI-based information systems that better suit diverse users' needs and competencies. Future research should strive to combine KG and VDB approaches optimally to enhance the reliability and accuracy of LLM-generated answers. Such a combination should also allow dynamic adaptation to new information.

 

References

Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.-t.; Rocktäschel, T.; et al. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33:9459–9474.

Hoffmann, M.; Schwarz, C.; and Sendlhofer, G. 2021. Patienten und Angehörige richtig informieren. Wiesbaden: Springer Fachmedien. https://doi.org/10.1007/978-3-658-35274-5

Pan, S.; Luo, L.; Wang, Y.; Chen, C.; Wang, J.; and Wu, X. 2024. Unifying large language models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering.

 

Keywords: Large Language Models, Retrieval-Augmented Generation, family caregiving, Chatbot, Response Accuracy

Affiliation: Institute for Media Research and Media Pedagogy; TH Cologne; Germany 

Corresponding Author Email: isa.nowak@th-koeln.de

Author’s ORCID iD: https://orcid.org/0009-0000-4415-734X

Acknowledgement The authors would like to thank the German Federal Ministry of Labour and Social Affairs for supporting the research in the project “BOTschafft Inklusion: Der inklusive Chatbot zur Eingliederung und Teilhabe von pflegenden Angehörigen in den Arbeitsmarkt” (#CIP.00.00012.23). This research makes an essential contribution to the understanding of external knowledge sources for LLMs and their influence on answer accuracy.

 

