Abstract:
This project introduces a document retrieval system built on the LangChain library and the LLaMA3 language model, served locally through Ollama. The system is designed to analyse and answer queries about drug-related content in a curated dataset of PDF documents. Using current natural language processing techniques, it extracts text from the PDFs, processes the text into structured data, and utilises FAISS for indexing to support efficient information retrieval. The backend, developed with Django, handles user interactions, document management, and query processing in an efficient and scalable manner. Our dataset comprises two to three PDF documents containing detailed articles on various drug-related topics. Because the system supports precise queries about drug interactions, efficacy, and regulations, it is most useful in pharmaceutical research or as an information dissemination tool. By integrating LLaMA3 through Ollama, the system not only makes information search more effective but also answers users' questions more relevantly, which supports better-informed decisions in the drug domain.
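
The retrieval flow summarised above (split text into chunks, index the chunks, fetch the closest ones for a query, and pass them to the language model as context) can be illustrated with a minimal sketch. It is not the project's implementation: the bag-of-words `embed` function stands in for a real embedding model, the brute-force `ToyIndex` stands in for a FAISS index, PDF extraction and the LLaMA3 call are omitted, and all names and sample sentences are hypothetical placeholders.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: word counts (assumption: a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """Brute-force similarity index (assumption: a stand-in for a FAISS index)."""

    def __init__(self):
        self.chunks = []
        self.vectors = []

    def add(self, chunks):
        # Store each chunk alongside its vector.
        for chunk in chunks:
            self.chunks.append(chunk)
            self.vectors.append(embed(chunk))

    def search(self, query, k=2):
        # Score every stored chunk against the query and keep the k best.
        q = embed(query)
        scores = [(cosine(q, v), i) for i, v in enumerate(self.vectors)]
        scores.sort(key=lambda s: s[0], reverse=True)
        return [(self.chunks[i], score) for score, i in scores[:k]]


def build_prompt(question, hits):
    """Assemble the retrieved chunks and the question into a prompt for the language model."""
    context = "\n".join(f"- {chunk}" for chunk, _ in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Placeholder sentences, not taken from the project's dataset.
    index = ToyIndex()
    index.add([
        "Drug A interacts with Drug B and raises bleeding risk.",
        "Regulation of Drug C requires a prescription.",
        "Efficacy trials of Drug D lasted twelve weeks.",
    ])
    question = "Which drug interacts with Drug B?"
    print(build_prompt(question, index.search(question, k=1)))
```

In a full system the prompt returned by `build_prompt` would be sent to the language model, and the exhaustive scan in `search` would be replaced by an approximate or optimised nearest-neighbour index, which is the role FAISS plays as the document collection grows.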