Abstract:
Mental health disorders, particularly depression and suicide risk, represent critical public health challenges that often go underdiagnosed due to social stigma, limited access to professionals, and the shortcomings of traditional diagnostic tools such as questionnaires. This thesis consolidates a series of research contributions exploring the use of Large Language Models (LLMs), both unimodal and multimodal, for more scalable, accurate, and context-aware mental health assessment. The analysis encompasses three key approaches: (1) establishing baseline performance with classical ML/DL models on the E-DAIC and Reddit datasets using traditional feature extraction and fusion techniques; (2) comprehensively benchmarking state-of-the-art LLMs in zero-shot and few-shot settings for depression and suicide risk prediction; and (3) evaluating the E-DAIC test set on the novel AM-LLM framework, a model-agnostic, multilingual architecture that combines audio and text for enhanced mental health assessment, which improves depression detection over text-only analysis and performs comparably in English and Hindi. The thesis also offers a critical perspective on the scalability, bias, and ethical implications of deploying LLMs in sensitive health contexts, and explores the design of explainable mental health support systems. Collectively, this work demonstrates that multimodal LLMs, when properly adapted and evaluated, hold immense promise for augmenting mental health diagnosis.