Madhan Kumar Selvaraj's blog

Posts

Showing posts from January, 2020

Extracting text from the image and translation using Tesseract and Yandex API

- January 01, 2020

Presented by Madhan Kumar Selvaraj

Text extraction from the image and translation

In this blog, we will learn how to extract text from an image using Tesseract, the open-source Optical Character Recognition (OCR) engine long sponsored by Google. Then we will use the Yandex API, a multi-language detector and translator, to detect the language of the extracted text and translate it.

What is OCR?

OCR stands for Optical Character Recognition. As humans, we can easily recognize text in an image, but how can a computer read or parse that data? This is where OCR comes in. Inside an OCR engine, computer vision handles the image processing, and machine learning techniques train the model on a training data set to improve its accuracy.

What are the benefits of learning and using OCR?

The ultimate benefit of OCR is converting hard copies into digital format. Still ...
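The workflow described in this excerpt (OCR with Tesseract, then translation through the Yandex API) can be sketched roughly as below. This is only an illustrative outline, not the code from the post: it assumes the `pytesseract` and `Pillow` packages for the OCR step, and it assumes the Yandex Translate v1.5 JSON endpoint, which has since been deprecated. The names `ocr_image`, `build_translate_url`, and `extract_and_translate` are hypothetical helpers introduced here for illustration.

```python
from urllib.parse import urlencode

# Assumption: base URL of the Yandex Translate v1.5 JSON API (now deprecated).
YANDEX_BASE = "https://translate.yandex.net/api/v1.5/tr.json"


def ocr_image(image_path, lang="eng"):
    """Extract text from an image with Tesseract via pytesseract."""
    # Imported lazily so the rest of the sketch works without these packages.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path), lang=lang)


def build_translate_url(api_key, text, target_lang):
    """Build the GET URL for a translate request (no network call is made)."""
    query = urlencode({"key": api_key, "text": text, "lang": target_lang})
    return f"{YANDEX_BASE}/translate?{query}"


def extract_and_translate(image_path, target_lang, ocr, translate):
    """Run OCR, then pass the cleaned text to a translate callable."""
    text = ocr(image_path).strip()
    if not text:
        # Nothing was recognized, so there is nothing to translate.
        return ""
    return translate(text, target_lang)
```

Passing the OCR and translation steps in as callables keeps the pipeline easy to try out with stand-ins before wiring in `ocr_image` and a real HTTP request built from `build_translate_url`.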