Document Image Analysis for Deep Learning-Based Text Recognition
DOI:
https://doi.org/10.30743/yb06bm29
Keywords:
Deep Learning; CNN; CNN-RNN; OCR; Text Recognition
Abstract
This study evaluates two models, OCR + CNN and Hybrid CNN-RNN, on the MNIST dataset, which consists of 60,000 training samples and 10,000 test samples, each a 28 x 28 pixel image. The Hybrid CNN-RNN model outperforms the OCR + CNN model on overall accuracy, F1-score, Character Error Rate (CER), and Word Error Rate (WER). It achieved an overall accuracy of 99.18%, compared with 93.14% for OCR + CNN, and much lower error rates: CER and WER of 0.82%, compared with 6.86% for OCR + CNN. The Hybrid CNN-RNN also performed better during training, reaching 99.53% training accuracy and 99.18% validation accuracy at epoch 5, while OCR + CNN reached 95.89% and 95.97%, respectively. This accuracy comes at the cost of inference time: the Hybrid CNN-RNN took 7.46 seconds to process 10,000 samples, against 5.22 seconds for OCR + CNN. In conclusion, the Hybrid CNN-RNN model offers better accuracy and stability, while the OCR + CNN model is faster at inference; the choice between them depends on whether higher accuracy or faster inference is the priority.
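The reported error rates equal 100% minus the overall accuracy for both models (100 - 99.18 = 0.82 and 100 - 93.14 = 6.86). That is what one would expect if each MNIST label is treated as a single character and a single word, so that every misclassification counts as exactly one character error and one word error. The sketch below illustrates this relationship under that assumption, and also converts the reported batch inference times into per-sample latency. The helper names (`edit_distance`, `error_rate`, `cer`, `wer`, `per_sample_ms`) and the toy labels are hypothetical; this is not the authors' evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Total edit distance divided by total reference length."""
    errors = sum(edit_distance(tokenize(r), tokenize(h))
                 for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return errors / total


def cer(refs, hyps):
    """Character Error Rate: tokens are individual characters."""
    return error_rate(refs, hyps, list)


def wer(refs, hyps):
    """Word Error Rate: tokens are whitespace-separated words."""
    return error_rate(refs, hyps, str.split)


def per_sample_ms(total_seconds, n_samples):
    """Average inference time per sample, in milliseconds."""
    return total_seconds / n_samples * 1000.0


# Toy single-digit labels (assumed data, not taken from the study).
refs = ["7", "2", "1", "0", "4"]
hyps = ["7", "2", "7", "0", "4"]  # one misclassification
accuracy = sum(r == h for r, h in zip(refs, hyps)) / len(refs)

print(f"accuracy={accuracy:.2f} CER={cer(refs, hyps):.2f} WER={wer(refs, hyps):.2f}")

# Per-sample latency from the totals reported in the abstract.
print(f"Hybrid CNN-RNN: {per_sample_ms(7.46, 10_000):.3f} ms/sample")
print(f"OCR + CNN:      {per_sample_ms(5.22, 10_000):.3f} ms/sample")
```

On single-token labels CER, WER, and 1 - accuracy coincide, so the three figures carry the same information here. They would diverge only on multi-character or multi-word targets.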
License
Copyright (c) 2025 Firman Styono, Bob Subhan Riza, Mhd Furqan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.