<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
<title>Year-2024</title>
<link>http://repository.iiitd.edu.in/xmlui/handle/123456789/1654</link>
<description>Year-2024</description>
<pubDate>Sat, 11 Apr 2026 21:31:02 GMT</pubDate>
<dc:date>2026-04-11T21:31:02Z</dc:date>
<item>
<title>Detection to interpretation: advancing tabular data processing with multimodal AI</title>
<link>http://repository.iiitd.edu.in/xmlui/handle/123456789/1788</link>
<description>Detection to interpretation: advancing tabular data processing with multimodal AI
Bhuyan, Pijush; Shah, Rajiv Ratn (Advisor)
Tables are the most common form of structured data found in documents. Proper interpretation of such raw tabular data by computer systems remains an open challenge. We take a deep dive into document intelligence, which includes table detection, table reconstruction and table structure interpretation by AI models. Firstly, we handle domain adaptation in table detection. Pre-trained table detection models have displayed poor results when the target domain varied from the source. We resolve this by building a domain-invariant table detection dataset where we inject additional noisy synthetic detection data. Empirical tests show that a detection model trained on synthetic data displays a significantly lower drop in performance when tested on out-of-distribution datasets. Following this, we build a fast, yet efficient, end-to-end pipeline for Table-OCR. It reconstructs the table structure and content from raw detection crops and converts them into a computer-storable text format. Finally, we design a comprehensive benchmark suite to test the table structure understanding capabilities and limitations of existing Large Language Models (LLMs) and Vision Language Models (VLMs) using both text and image modalities. The vision component of VLMs is found to be a bottleneck in multimodal table interpretability. We work with a lightweight, yet efficient, model-agnostic adapter module which injects positional information into the image modality through positional embeddings during model training. We also design a novel pre-training task for image-text alignment for open-source VLMs and study the change in model performance while interpreting visual tabular data. We also study the feasibility and future scope for true multimodal table understanding: interpreting tabular data from both image and text modalities for reasoning.
</description>
<pubDate>Sat, 21 Dec 2024 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://repository.iiitd.edu.in/xmlui/handle/123456789/1788</guid>
<dc:date>2024-12-21T00:00:00Z</dc:date>
</item>
<item>
<title>Optimising serialisation for cloud applications</title>
<link>http://repository.iiitd.edu.in/xmlui/handle/123456789/1695</link>
<description>Optimising serialisation for cloud applications
Nayak, Siddharth; Shah, Rinku (Advisor)
Serialisation latency is a significant concern in modern cloud applications that leverage the microservice paradigm. A cloud service request typically traverses a sequence of microservices across nodes, increasing latency due to (de)serialisation overhead at every hop. The serialisation process comprises memory allocation, data encoding, and data copy. Observations from existing benchmarking results show that the data copy operation dominates the overall (de)serialisation cost. Existing serialisation libraries follow a two-copy technique: (1) the application copies the encoded data into a serialised buffer, and (2) the serialised data is copied to the NIC’s device memory. To reduce serialisation latency, researchers have proposed (1) kernel bypass techniques that eliminate data copy, (2) hardware acceleration solutions, and (3) wire format optimisations. However, kernel bypass solutions have security concerns and cannot be deployed in public cloud networks, and hardware acceleration solutions depend on specialised hardware. We propose designing and implementing a one-copy serialisation library, which leverages the scatter-gather I/O technique provided by the standard POSIX library for data movement. Our solution does not require special hardware support or any specialised network stack. Our design relies on the Linux network stack and so avoids the security concerns of kernel bypass, making it usable in public and private clouds.
</description>
<pubDate>Tue, 21 May 2024 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://repository.iiitd.edu.in/xmlui/handle/123456789/1695</guid>
<dc:date>2024-05-21T00:00:00Z</dc:date>
</item>
<item>
<title>Programmable proxy for microservice communication in multi-cloud environments</title>
<link>http://repository.iiitd.edu.in/xmlui/handle/123456789/1693</link>
<description>Programmable proxy for microservice communication in multi-cloud environments
Pathak, Shambhavi; Shah, Rinku (Advisor)
As enterprises migrate from single-cloud to multi-cloud architecture, they encounter challenges due to geographical dispersion, varying WAN characteristics, and diverse cloud policies. These challenges demand that developers re-evaluate their communication strategies to ensure reliability, security, and compliance. In response, our work presents a solution, Programmable Proxy, designed to address the dynamic nature of multi-cloud environments. This thesis focuses on switching between the HTTP and MQTT communication protocols; the proxy facilitates real-time adaptation based on service location, packet characteristics, and request types. Through this approach, the Programmable Proxy optimizes communication mechanisms in alignment with evolving deployment requirements, enabling enhanced performance across diverse cloud infrastructures. This study contributes to the advancement of flexible, adaptive microservices architectures in the context of multi-cloud environments.
</description>
<pubDate>Tue, 21 May 2024 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://repository.iiitd.edu.in/xmlui/handle/123456789/1693</guid>
<dc:date>2024-05-21T00:00:00Z</dc:date>
</item>
<item>
<title>Incongruence identification in eyewitness narratives</title>
<link>http://repository.iiitd.edu.in/xmlui/handle/123456789/1691</link>
<description>Incongruence identification in eyewitness narratives
Nair, Akshara; Akhtar, Md. Shad (Advisor)
Eyewitness accounts are crucial to legal and investigative proceedings, serving as foundational sources for reconstructing events. Agencies typically collect multiple statements to achieve a thorough understanding of the situation. However, during investigations, it is essential to detect any incongruences in these eyewitness accounts, such as conflicting details about timelines, actions, or identities. Comparing and identifying these inconsistencies within testimonies is critical because they often signal potential deception or manipulation of facts. Traditional methods for identifying inconsistencies often prove inadequate, as they lack access to detailed, event-specific datasets. Moreover, these conventional approaches do not pinpoint the exact incongruence between two statements, which is necessary to provide direct evidence supporting the detected answer. This thesis proposes a novel framework for identifying incongruences between two testimonies by comparing their responses within the context of shared questions. We created a Multimodal Eyewitness Deception Detection Dataset (ED3), which contains the testimonies of eyewitnesses collected through an interview process after witnessing scenarios involving different stimuli (e.g., a simulated crime). Our research is centered on two tasks. The first is identifying the presence of incongruence within the statements of witnesses. We demonstrate the superior efficacy of prompt tuning techniques over traditional fine-tuning methods in this identification process. The second is detecting the exact contradictory statements within the testimonies. For this, we utilize the reasoning ability of LLMs and propose a three-step reasoning framework inspired by the Chain-of-Thought (CoT) methodology. This method breaks down the statements into a step-by-step reasoning process, mirroring human problem-solving behaviors, which helps in systematically identifying and explaining the discrepancies found and facilitates the extraction of incongruent text spans.
The outcomes of this thesis contribute to more accurate and dependable analysis of testimony evidence, enhancing the reliability of legal and investigative practices through the use of generative AI methods.
</description>
<pubDate>Mon, 01 Jul 2024 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://repository.iiitd.edu.in/xmlui/handle/123456789/1691</guid>
<dc:date>2024-07-01T00:00:00Z</dc:date>
</item>
</channel>
</rss>
