Abstract:
Scene Text Recognition (STR) is the task of recognizing text in natural scenes. OCR models
struggle on natural scene images because of several challenges, including variation in
orientation and pixel intensity, low resolution, errors in bounding box detection, and
variation in the fonts and shapes of printed characters. Our
main objective is to obtain a model that achieves near state-of-the-art performance on our custom
MAVI dataset, so that it can be used in the real-world application of helping a visually
impaired person read signboards to obtain directions. We provide an end-to-end
detection and recognition system for this purpose. Problems arise when the distribution of data
seen at test time differs from that of the training data; in such a scenario, the model cannot
make reliable predictions. We perform experiments to demonstrate how model performance drops
under such a domain shift.