Khmer Optical Character Recognition Initiative

The Khmer OCR project started in July 2008. After full training, two engines of PLC’s OCR system were created. They can recognize good-quality scanned documents in a plain layout, with no tables, pictures, columns, or bold, italic, or underlined text, printed in either Limon S1 or Limon R1 at size 22. The final output is ASCII text, which can easily be converted into Khmer Unicode using the existing Conversion library, one of the libraries developed during Phase I of the PAN Localization project. The output can also be saved in two document formats: Microsoft Word 2003 (*.doc) and plain text (*.txt).

http://www.pancambodia.info/index.php
