APPS & SMARTPHONES

Work with your favorite apps and tools or create your own custom integrations using the AEYE API.

Title

A Novel Method of Feature Extraction for Optical Character Recognition

Abstract

In this brief paper, we propose a method of feature extraction for character recognition inspired by the digit recognition research of Labusch et al.: a sparse-coding strategy and a local maximum operation. We first employ a binarisation algorithm to represent the outlines of character images. In a second step, we apply a local maximum operation to determine the number of local minima in an array. Finally, we match each glyph against a singular value decomposition (SVD) labeled set and obtain state-of-the-art classification performance on the character recognition task defined by the MNIST benchmark.

Introduction

Machine vision has become an integral part of robotics and intelligent systems such as autonomous vehicles. Optical Character Recognition (OCR) is a field of research in pattern recognition, artificial intelligence and computer vision. It is the electronic conversion of images of typed, handwritten or printed text into machine-encoded text. OCR depends on feature extraction. Many engines have derived from pretext, which decomposes glyphs into features. Features can be global properties of a digit image that are extracted using pattern recognition. The extraction reduces the dimensionality of the representation and makes the recognition process computationally efficient. These features are then compared with an abstract vector-like representation of the character, such as the globally accepted one provided in the MNIST database. Dimensionality reduction is one of the most important research topics in extracting features from glyph patterns. Researchers in machine vision, artificial intelligence and pattern recognition have pursued many different approaches. For example, Labusch et al. [6] described a sparse-coding-based feature extraction method with an SVM as the classifier. The work described in [8] combined three recognizers by majority vote; one of them is based on the Kirsch gradient (four orientations), dimensionality reduction by PCA, and classification by SVM. An accuracy of 95.05% with 0.93% error on 10,000 test samples of the MNIST database was achieved.

Description

In this brief paper, we propose a novel method for character recognition inspired by the digit recognition research of Labusch et al.: a sparse-coding strategy and a local maximum operation. We show that our method, despite its simplicity, yields state-of-the-art classification. We first employ a binarisation algorithm to represent the outlines of character images. We then use this basis to extract local coefficients. In a second step, we apply a local maximum operation to determine a number of euclidean lemma in an array. Finally, we compare each pattern glyph against a singular value decomposition (SVD) labeled set and obtain state-of-the-art classification performance in the character recognition method. We compare the classification performance obtained with binarisation, with euclidean lemma, and with the linear span (linear hull) pattern. We conclude that our method, a binary representation of the character image outline combined with the euclidean lemma operation for feature extraction, can significantly improve recognition performance.
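The first two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the text does not name the binarisation algorithm, so a fixed global threshold is assumed here, and the "local maximum operation" is read as a maximum over non-overlapping windows, which is one common interpretation. The function names `binarise` and `local_max`, the threshold of 128, and the window size are all hypothetical choices made for illustration.

```python
def binarise(image, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0/1.

    Assumption: dark ink on a light background, so pixels darker
    than `threshold` become 1 and everything else becomes 0.
    """
    return [[1 if px < threshold else 0 for px in row] for row in image]


def local_max(grid, size=2):
    """Take the maximum over non-overlapping size x size windows.

    Windows at the right and bottom borders are clipped to the grid.
    """
    rows, cols = len(grid), len(grid[0])
    return [
        [
            max(
                grid[r][c]
                for r in range(r0, min(r0 + size, rows))
                for c in range(c0, min(c0 + size, cols))
            )
            for c0 in range(0, cols, size)
        ]
        for r0 in range(0, rows, size)
    ]


# A tiny 4 x 4 "glyph": three dark pixels on a white background.
image = [
    [255, 255, 255, 255],
    [255,   0,   0, 255],
    [255,   0, 255, 255],
    [255, 255, 255, 255],
]
binary = binarise(image)
pooled = local_max(binary, size=2)
```

The pooled grid is a quarter of the size of the binary image, which shows in miniature how such an operation reduces the dimensionality of the representation before classification.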

The key module of the proposed methodology is shown in Figure 1.

Diagrams

Background of the Invention

  1. Brænne I., Labusch K., Madany Mamlouk A. (2010) Sparse Coding for Feature Selection on Genome-Wide Association Data. In: Diamantaras K., Duch W., Iliadis L.S. (eds) Artificial Neural Networks – ICANN 2010. ICANN 2010. Lecture Notes in Computer Science, vol 6352. Springer, Berlin, Heidelberg
  2. Kirsch, R. A., Cahn, L., Ray, C., & Urban, G. H. (1957). Experiments in processing pictorial information with a digital computer. In Proceedings of the Eastern Joint Computer Conference: Computers with Deadlines to Meet, IRE-AIEE-ACM 1958 (pp. 221–229). Association for Computing Machinery, Inc. doi.org/10.1145/1457720.1457763

Based on the digital data processing machine applications of Kirsch et al. and Labusch et al. for devising automatic character sensing equipment, we propose our novel method of feature extraction for Optical Character Recognition.

Summary of the Disclosure

Optical Character Recognition is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text. OCR depends on feature extraction and pattern recognition. Labusch et al. used a sparse-coding strategy and a local maximum operation in their seminal research to extract features, with an SVM as the classifier. Inspired by the same approach, we propose a novel method of character recognition. In our research, we printed and handwrote characters from the Greek alphabet on white paper as the background. We captured an image of each character, or glyph, in a frame using a mobile camera. The optical character recognition module consists of the following steps, described in Figure 1: character image capture, de-skew, binarisation, extraction of local coefficients, extraction of a number of euclidean lemma in an array, statistical significance tests for comparison and, last, selection of an appropriate classification. The module is represented by the following algorithm:

// Choose any offset value from 1 to 5
// Inset the bounding box by that offset on every side
// (minX, minY) --> (maxX, minY) top
// (maxX, minY) --> (maxX, maxY) right
// (minX, maxY) --> (maxX, maxY) bottom
// (minX, minY) --> (minX, maxY) left

// CAVEAT: we can simply iterate over the four line segments and test each against the Canny edge set

// For the left segment, filter the Canny edge points to those whose x coordinate equals minX + offset (constant for the left segment)

// Count how many of them match the y coordinate, which is incremented in a loop until it reaches minY - offset (not constant)
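The comments above can be read as counting how many edge pixels fall on each side of the bounding box after it has been moved inward by the offset. The following is a hedged sketch of that reading, not the authors' code. It assumes the Canny output is already available as a collection of `(x, y)` edge coordinates, that the bounding box is taken from those coordinates, and that each inset segment runs between the inset corners. The function name `inset_segment_hits` and the returned dictionary layout are hypothetical.

```python
def inset_segment_hits(edge_points, offset=1):
    """Count edge pixels lying on each side of the inset bounding box.

    edge_points: iterable of (x, y) integer coordinates, standing in
    for the pixels marked by an edge detector such as Canny.
    offset: how far each side of the bounding box is moved inward.
    """
    pts = set(edge_points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)

    # Inset box: every side moves toward the centre by `offset`.
    left, right = min_x + offset, max_x - offset
    top, bottom = min_y + offset, max_y - offset
    if left > right or top > bottom:
        raise ValueError("offset too large for this bounding box")

    # Walk each segment and count the positions that are edge pixels.
    return {
        "top": sum((x, top) in pts for x in range(left, right + 1)),
        "bottom": sum((x, bottom) in pts for x in range(left, right + 1)),
        "left": sum((left, y) in pts for y in range(top, bottom + 1)),
        "right": sum((right, y) in pts for y in range(top, bottom + 1)),
    }


# Synthetic edge set: a 7 x 7 square outline with a vertical bar at x = 3.
square = {(x, y) for x in range(7) for y in range(7)
          if x in (0, 6) or y in (0, 6)}
bar = {(3, y) for y in range(7)}
hits = inset_segment_hits(square | bar, offset=1)
```

In this synthetic example the vertical bar crosses the inset top and bottom segments once each and never touches the inset left or right segments, so the four counts act as a small feature vector describing where strokes cross the inset box.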

Brief Description of the Drawings

Figure 1 shows the complete algorithm for the novel feature extraction method for character recognition, based on Labusch et al.