Converting an image to text using Tesseract



In this article we will see how to convert an image to text using the Tesseract OCR library.

Tesseract is an open source OCR library for reading text from images. A .NET wrapper that can be used from C# is available on GitHub.

Here is the code to convert an image to text by passing in the image as a byte array. The datapath parameter is the path to the tessdata folder, which is downloaded along with the Tesseract library. The code loads the bytes with Pix.LoadTiffFromMemory, so the image is expected to be in TIFF format.

Save that folder on your disk or add it to your project, and pass its path to the function. It contains the language data and other metadata that Tesseract needs to extract text from an image.

For other languages, use the respective language files and configure the same in code (i.e., replace "eng" with the applicable language code).
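Before creating the engine, it can help to confirm that the language files you ask for are actually present in the tessdata folder. The following is only a minimal sketch: the class and method names (TessdataCheck, MissingLanguages) are hypothetical and not part of the Tesseract wrapper. It assumes the usual Tesseract convention that each language lives in a file named "<code>.traineddata" and that several languages can be requested together by joining their codes with "+" (for example "eng+fra").

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the Tesseract wrapper.
public static class TessdataCheck
{
    // Returns the language codes from "languages" (e.g. "eng" or "eng+fra")
    // that have no "<code>.traineddata" file directly inside "datapath".
    public static string[] MissingLanguages(string datapath, string languages)
    {
        return languages
            .Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(code => !File.Exists(Path.Combine(datapath, code + ".traineddata")))
            .ToArray();
    }
}
```

If the returned array is empty, all requested language files were found; otherwise it lists the codes whose files still need to be downloaded into the tessdata folder.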


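using System;
using Tesseract;

// Runs OCR on a TIFF image supplied as a byte array and returns the recognized text.
// datapath is the tessdata folder. An empty string is returned if OCR fails.
public string TesseractImageStreamToText(byte[] byteArray, string datapath)
{
	string image_content = string.Empty;

	try
	{
		// "eng" selects the English language data; replace it for other languages.
		using (var tesseractEngine = new TesseractEngine(datapath, "eng", EngineMode.Default))
		// LoadTiffFromMemory expects TIFF-encoded bytes.
		using (var srcImg = Pix.LoadTiffFromMemory(byteArray))
		using (var page = tesseractEngine.Process(srcImg))
		{
			image_content = page.GetText();
		}
	}
	catch (Exception e)
	{
		// Log the failure and fall through to return the empty string.
		Console.WriteLine(e.Message);
	}

	return image_content;
}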
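Since the function above loads the bytes with Pix.LoadTiffFromMemory, passing it a PNG or JPEG will end up in the catch block and return an empty string. One way to notice this earlier is to look at the first bytes of the array. The following is only a sketch: ImageSniffer and LooksLikeTiff are hypothetical names, and the check covers just the classic TIFF header (byte-order mark "II" or "MM" followed by the number 42), not BigTIFF or any validation of the rest of the file.

```csharp
using System;

// Hypothetical helper, not part of the Tesseract wrapper.
public static class ImageSniffer
{
    // Returns true when the buffer starts with a classic TIFF header:
    // "II" + 42 (little-endian) or "MM" + 42 (big-endian).
    public static bool LooksLikeTiff(byte[] data)
    {
        if (data == null || data.Length < 4)
        {
            return false;
        }

        bool littleEndian = data[0] == 0x49 && data[1] == 0x49
                         && data[2] == 0x2A && data[3] == 0x00;
        bool bigEndian = data[0] == 0x4D && data[1] == 0x4D
                      && data[2] == 0x00 && data[3] == 0x2A;

        return littleEndian || bigEndian;
    }
}
```

A caller could read the file with File.ReadAllBytes, and only pass the array to TesseractImageStreamToText when LooksLikeTiff returns true, reporting an unsupported format otherwise.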

 
