Abstract:
Predicting the estimated time of arrival (ETA) of public transit plays an essential role in improving the rider’s experience. However, it is challenging to keep buses on schedule, especially during rush hours. This thesis provides riders with advance arrival-time estimates for better trip planning, using open transit data.
The first step toward a real-time, scalable ETA is to design an algorithm that preprocesses one day of raw GTFS data into a tensor. The representation aims to decouple the information about individual buses, thereby enabling scalability across routes and reducing variance.
The second step is to design a spatio-temporal model (SSTG) for scalable and robust ETA prediction. The proposed SSTG framework addresses the following open problems. First, how can we exploit the spatio-temporal correlation in the ETA data? Second, how can we scale the spatio-temporal ETA prediction framework effectively to a large network? Third, how can we handle sparsity in the data? Fourth, how can we predict the ETA for cold-start stops, i.e., stops that are absent from the training dataset? This last problem is unexplored. Moreover, a user would rather wait a bit longer than miss the bus because of an underestimated ETA. Therefore, for better customer satisfaction, we need to reduce underestimation.
The proposed framework captures the spatio-temporal structure in the ETA data using recurrent neural networks modified with graph convolutions. The input to the network can be sub-sampled, thereby ensuring scalable learning and also providing a solution to ETA prediction for cold-start stops. The first layer of the encoder integrates GRU-D for missing-data imputation. Moreover, we use a weighted MSLE loss function to favor overestimating the ETA and to fine-tune the effect of this penalty on overall performance, compared with the standard regression loss (MSE). We conclude that the SSTG model is computationally efficient and outperforms state-of-the-art methods on ETA and traffic datasets.
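As a rough illustration of the loss idea, the following minimal sketch shows one way a weighted mean squared logarithmic error could penalize underestimates more heavily than overestimates. The exact formulation used in the thesis is not stated in this abstract, so the function name `weighted_msle` and the parameters `under_weight` and `over_weight` are assumptions made purely for illustration.

```python
import numpy as np


def weighted_msle(y_true, y_pred, under_weight=2.0, over_weight=1.0):
    """Mean squared log error with a heavier penalty on underestimates.

    under_weight and over_weight are hypothetical parameters chosen for
    illustration; they are not taken from the thesis.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log1p keeps the error defined when an ETA is zero.
    log_err = np.log1p(y_pred) - np.log1p(y_true)
    # A negative log error means the predicted ETA is earlier than the
    # true arrival, i.e. the rider could miss the bus.
    weights = np.where(log_err < 0, under_weight, over_weight)
    return float(np.mean(weights * log_err ** 2))


# Same 60-second absolute error, in opposite directions.
loss_under = weighted_msle([300.0], [240.0])
loss_over = weighted_msle([300.0], [360.0])
```

Even with equal weights, the logarithm makes an underestimate cost more than an overestimate of the same absolute size; the extra weight in this sketch only widens that gap, whereas MSE would treat both errors identically.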