Abstract:
Privacy policies are often lengthy and complex, hindering individuals’ ability to make informed decisions about their data privacy. Abstractive summarization techniques can make these policies more accessible and transparent, but they have received little research attention in this domain. Further development of these techniques can enhance comprehension of privacy policies and promote trust between individuals and organizations. In this work, we propose a controlled abstractive text summarization approach based on the Bidirectional and Auto-Regressive Transformers (BART) model, which achieves state-of-the-art performance on our custom dataset. Our method integrates a reinforcement learning framework with a tailored loss function to optimize both the relevance and the length of the generated summaries, enabling controlled summary generation. We also introduce a new dataset of privacy policy documents paired with their summaries and establish performance benchmarks for future research. Experimental results on this dataset demonstrate significant improvement in summarization quality over several baseline methods, as measured by ROUGE and BLEU scores. The proposed approach has the potential to facilitate comprehension of privacy policies and improve users’ privacy awareness.