HenoohOCR

Description

HenoohOCR is Optical Character Recognition software that I wrote in C++ and later ported to C#. My main goal was to create a standalone library that other programs can call, giving a computer the ability to read text intended for human users and make intelligent decisions based on it. Most OCR engines are designed for larger fonts and high-quality input, usually relying on template-matching techniques, and I needed an OCR engine that could mimic human recognition of much smaller fonts. This has been a humbling experience because of how much understanding of mathematics and the arts it required. Compared with other OCR engines, which represent years of effort by many developers with funding from HP and Google, HenoohOCR is impressively accurate.

Examples

I wrote a WPF GUI where you can drag and drop images and get text output.

Quick Brown Fox Example

Tested with a couple of sets of images in different fonts: not perfect, but with great accuracy.

Google News Example

Small Text Processing

This feature of HenoohOCR allows small fonts to be processed accurately. Because of the way pixels are arranged, zooming in on small text produces an optical illusion that distorts the fine details of the characters. I wrote a class that solves this problem.
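The class described above is not reproduced here. As a rough illustration only, one common preprocessing step for small text is to enlarge the image with bilinear interpolation before recognition, so each glyph spans more pixels. This is an assumed, swapped-in technique, not necessarily what HenoohOCR does; `GrayImage` and `upscaleBilinear` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Minimal 8-bit grayscale image (hypothetical type for this sketch).
struct GrayImage {
    int width, height;
    std::vector<std::uint8_t> pixels;  // row-major, width * height values
    double at(int x, int y) const { return pixels[static_cast<size_t>(y) * width + x]; }
};

// Enlarges an image by an integer factor using bilinear interpolation.
// Source corners map exactly onto output corners ("align corners").
GrayImage upscaleBilinear(const GrayImage& src, int factor) {
    assert(factor >= 1 && src.width > 0 && src.height > 0);
    GrayImage out{src.width * factor, src.height * factor, {}};
    out.pixels.resize(static_cast<size_t>(out.width) * out.height);
    for (int y = 0; y < out.height; ++y) {
        // Position of this output row in source coordinates.
        double sy = out.height > 1 ? y * (src.height - 1) / double(out.height - 1) : 0.0;
        int y0 = static_cast<int>(sy);
        int y1 = std::min(y0 + 1, src.height - 1);
        double fy = sy - y0;
        for (int x = 0; x < out.width; ++x) {
            double sx = out.width > 1 ? x * (src.width - 1) / double(out.width - 1) : 0.0;
            int x0 = static_cast<int>(sx);
            int x1 = std::min(x0 + 1, src.width - 1);
            double fx = sx - x0;
            // Blend horizontally on the two neighbouring rows, then vertically.
            double top = src.at(x0, y0) * (1 - fx) + src.at(x1, y0) * fx;
            double bottom = src.at(x0, y1) * (1 - fx) + src.at(x1, y1) * fx;
            double v = top * (1 - fy) + bottom * fy;
            out.pixels[static_cast<size_t>(y) * out.width + x] =
                static_cast<std::uint8_t>(std::lround(v));
        }
    }
    return out;
}
```

For example, a 2x2 image whose left column is 0 and right column is 255, upscaled by 2, yields rows of 0, 85, 170, 255. Bilinear interpolation is chosen here because it avoids the blocky staircase edges of nearest-neighbour scaling; whether it helps a given recognizer is something to measure.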

Example Coding Style

I have included sample code; the full code is available upon request. The code below determines the number of text lines using a K-algorithm and sorts the characters into those lines before the actual character recognition is performed.
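As a minimal sketch, not the author's code, and assuming "K-algorithm" refers to k-means clustering: the vertical centers of character bounding boxes can be clustered in one dimension, increasing k until every character lies within a tolerance of its line's center, and the characters then sorted top-to-bottom and left-to-right. `CharBox`, `kMeans1D`, `groupIntoLines`, and the tolerance of half the median glyph height are all hypothetical choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Bounding box of one segmented character (hypothetical type for this sketch).
struct CharBox {
    int x, y, width, height;
    double centerY() const { return y + height / 2.0; }
};

// 1D k-means on vertical centers; fills `centroids`, returns each value's cluster.
static std::vector<int> kMeans1D(const std::vector<double>& v, int k,
                                 std::vector<double>& centroids) {
    assert(k >= 1 && !v.empty());
    auto mm = std::minmax_element(v.begin(), v.end());
    double lo = *mm.first, hi = *mm.second;
    centroids.assign(k, 0.0);
    for (int i = 0; i < k; ++i)  // spread initial centroids evenly over the range
        centroids[i] = (k == 1) ? (lo + hi) / 2.0 : lo + (hi - lo) * i / (k - 1);
    std::vector<int> label(v.size(), 0);
    for (int iter = 0; iter < 100; ++iter) {
        bool changed = false;
        for (size_t j = 0; j < v.size(); ++j) {  // assignment step: nearest centroid
            int best = 0;
            for (int i = 1; i < k; ++i)
                if (std::abs(v[j] - centroids[i]) < std::abs(v[j] - centroids[best])) best = i;
            if (best != label[j]) { label[j] = best; changed = true; }
        }
        std::vector<double> sum(k, 0.0);
        std::vector<int> count(k, 0);
        for (size_t j = 0; j < v.size(); ++j) { sum[label[j]] += v[j]; ++count[label[j]]; }
        for (int i = 0; i < k; ++i)  // update step; empty clusters keep their centroid
            if (count[i] > 0) centroids[i] = sum[i] / count[i];
        if (!changed && iter > 0) break;
    }
    return label;
}

// Groups characters into text lines using the smallest k whose clusters are tight.
std::vector<std::vector<CharBox>> groupIntoLines(const std::vector<CharBox>& chars) {
    std::vector<std::vector<CharBox>> lines;
    if (chars.empty()) return lines;
    std::vector<double> centers;
    std::vector<int> heights;
    for (const auto& c : chars) { centers.push_back(c.centerY()); heights.push_back(c.height); }
    std::nth_element(heights.begin(), heights.begin() + heights.size() / 2, heights.end());
    double tolerance = heights[heights.size() / 2] / 2.0;  // assumed: half the median height

    std::vector<double> centroids;
    std::vector<int> label;
    for (int k = 1; k <= static_cast<int>(chars.size()); ++k) {
        label = kMeans1D(centers, k, centroids);
        double worst = 0.0;
        for (size_t j = 0; j < centers.size(); ++j)
            worst = std::max(worst, std::abs(centers[j] - centroids[label[j]]));
        if (worst <= tolerance) break;  // every character is close to its line center
    }
    // Order lines top-to-bottom, then characters left-to-right within each line.
    std::vector<int> order(centroids.size());
    for (size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
    std::sort(order.begin(), order.end(), [&](int a, int b) { return centroids[a] < centroids[b]; });
    for (int i : order) {
        std::vector<CharBox> line;
        for (size_t j = 0; j < chars.size(); ++j)
            if (label[j] == i) line.push_back(chars[j]);
        if (line.empty()) continue;
        std::sort(line.begin(), line.end(), [](const CharBox& a, const CharBox& b) { return a.x < b.x; });
        lines.push_back(line);
    }
    return lines;
}
```

The number of lines is then simply the size of the returned vector. Picking the smallest k that satisfies the tolerance avoids having to know the line count in advance, at the cost of rerunning k-means for each candidate k.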

Code Example

The code below sharpens small text.
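The following is an illustrative sketch, not the author's sharpening code: it applies the standard 3x3 sharpening kernel (the pixel plus its 4-neighbour Laplacian) to an 8-bit grayscale image, clamping results to 0..255 and repeating edge pixels at the borders. `GrayImage` and `sharpen` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal 8-bit grayscale image (hypothetical type for this sketch).
struct GrayImage {
    int width, height;
    std::vector<std::uint8_t> pixels;  // row-major, width * height values
    // Reads a pixel, clamping coordinates so borders repeat the edge pixel.
    int at(int x, int y) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return pixels[static_cast<size_t>(y) * width + x];
    }
};

// Sharpens with the kernel [0 -1 0; -1 5 -1; 0 -1 0], which boosts
// contrast across glyph edges while leaving flat regions unchanged.
GrayImage sharpen(const GrayImage& src) {
    assert(src.width > 0 && src.height > 0 &&
           src.pixels.size() == static_cast<size_t>(src.width) * src.height);
    GrayImage out = src;
    for (int y = 0; y < src.height; ++y)
        for (int x = 0; x < src.width; ++x) {
            int v = 5 * src.at(x, y) - src.at(x - 1, y) - src.at(x + 1, y)
                    - src.at(x, y - 1) - src.at(x, y + 1);
            out.pixels[static_cast<size_t>(y) * src.width + x] =
                static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
        }
    return out;
}
```

For a one-row image `{50, 50, 200, 200}`, this produces `{50, 0, 255, 200}`: the dark side of the edge gets darker and the bright side brighter, which makes thin strokes in small fonts easier to separate from the background.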