Abstract:
This study compares biases in human cognition with those exhibited by large language models (LLMs), assessed using the same instrument. The research evaluates biases across eight key parameters—gender, religion, socio-economic status, sexual orientation, caste, linguistic background, political views, and disability—through a survey conducted among IIITD students and responses from multiple LLMs (Llama 3.1, Llama 3.2, Llama 2, Mistral, and Gemma 2). We found that Llama 3.2, Llama 3.1, Mistral, and Gemma 2 are less effective than humans at identifying bias and tend toward more polarised judgments in decision-making. Additionally, Llama 2 provided inconclusive answers, preventing us from assessing its bias levels. LLM biases mirror patterns in their training data, highlighting the need for fine-tuning to reduce bias and enable ethical decision-making in AI systems.