Abstract:
Financial risk prediction is an essential task in today’s financial markets, particularly now that important
information is disclosed through events such as Merger and Acquisition (M&A) calls. M&A calls often provide
key insights into the confidence levels of, and the agreement among, the high-ranking members of
an organization concerning a merger or acquisition, conveyed through subtle vocal cues and verbal messages.
Variations in vocal features such as pitch can suggest doubt and a lack of confidence that may discredit
the message spoken. Additionally, information about the speaker of the message can assist the system
in making better predictions. To aid the analysis of M&A calls, we curate a dataset of conference call
transcripts and audio recordings from the past five years. We propose strong baseline architectures to accompany
a model that takes advantage of the multimodal, multi-speaker input to perform financial forecasting.
Empirical results show that our architecture improves over existing models.