INTRODUCTION: The quantity of audio and visual data is growing exponentially with the rapid expansion of the internet. The digital information in images and videos can be used for fully automated captioning, indexing, and image structuring. Online image and video collections have grown significantly, and the images and videos in such datasets must be retrieved, explored, and inspected.
OBJECTIVES: Text extraction is crucial for locating important data. Noise is a critical factor that affects image quality, and it is introduced primarily during image acquisition and transmission. An image can be contaminated by a variety of noise types. Text in a complex image carries a variety of information that is used to recognise textual as well as non-textual details. These details in complex corrupted images are considered important for viewers trying to understand the scene as a whole. However, text in complex degraded images changes rapidly in form under unconstrained conditions, which makes recognising textual data difficult.
METHODS: A weighted naïve Bayes technique is used to generate the correct text data from complex image regions. Images usually contain some noise, so filtering is proposed in the early pre-processing step. To restore image quality, the input image is processed using gradient and contrast image methods. The contrast of the source image is then enhanced using an adaptive image map. Stroke width transform, Gabor transform, and a weighted naïve Bayes classifier are applied to complex degraded images for segmentation, feature extraction, and detection of textual and non-textual elements.
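As an illustration of the classification step, the following is a minimal sketch of a feature-weighted Gaussian naïve Bayes classifier. It is not the authors' implementation: the abstract does not specify the likelihood model or how the weights are chosen, so the Gaussian likelihoods, the two toy features (a stroke-width variance and a Gabor energy value), the weight vector, and all function names are assumptions made for this example only.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Estimate a log prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    n_total = len(samples)
    for y, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6
            for col, m in zip(columns, means)
        ]
        model[y] = (math.log(n / n_total), means, variances)
    return model


def predict_weighted_nb(model, x, weights):
    """Pick the class maximising log prior + sum_i w_i * log p(x_i | class)."""
    best_label, best_score = None, -math.inf
    for y, (log_prior, means, variances) in model.items():
        score = log_prior
        for xi, m, var, w in zip(x, means, variances, weights):
            log_lik = -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            score += w * log_lik  # the weight scales each feature's influence
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Hypothetical toy features: [stroke_width_variance, gabor_energy].
samples = [[0.20, 0.90], [0.30, 0.80], [0.25, 0.85],
           [0.90, 0.20], [0.80, 0.30], [0.85, 0.25]]
labels = ["text", "text", "text", "non-text", "non-text", "non-text"]
weights = [1.0, 0.5]  # assumed weights, not taken from the paper

model = fit_gaussian_nb(samples, labels)
pred_text = predict_weighted_nb(model, [0.22, 0.88], weights)
pred_non_text = predict_weighted_nb(model, [0.88, 0.20], weights)
```

With all weights set to 1.0 this reduces to an ordinary Gaussian naïve Bayes classifier; lowering a weight reduces the influence of the corresponding feature on the decision.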
RESULTS: Finally, a combination of deep neural networks and particle swarm optimization is used to identify the categorised textual data. The IIIT5K dataset is used for development, and the performance of the proposed methodology is assessed using accuracy, recall, precision, and F1 score. It performs well on document collections such as articles, even when they are significantly distorted, and is thus suitable for creating library information system databases.
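For reference, the four evaluation measures named above can be computed from a confusion matrix as in the sketch below. The abstract does not say whether evaluation is done per character, per word, or per region, so the binary text/non-text labels and the function name `binary_metrics` are assumptions for illustration; the example labels are made up and are not results from the paper.

```python
def binary_metrics(y_true, y_pred, positive="text"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only.
y_true = ["text", "text", "text", "non-text", "non-text", "non-text", "text", "non-text"]
y_pred = ["text", "text", "non-text", "non-text", "text", "non-text", "text", "non-text"]
metrics = binary_metrics(y_true, y_pred)
```

In this toy case there are three true positives, one false positive, one false negative, and three true negatives, so all four measures equal 0.75.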
CONCLUSION: A combination of a deep neural network and particle swarm optimization is used to recognise classified text. The IIIT5K dataset is used for development, and although high performance is achieved in terms of accuracy, recall, precision, and F1 score, characters may occasionally deviate. In other cases, the same character is extracted multiple times, which may result in incorrect textual data being extracted from natural images. An efficient technique for avoiding such flaws in the text retrieval process therefore needs to be developed in future work.