|Title: ||Sinhala Intelligent Word Recognition with Content based Search Suggest|
|Authors: ||Kahandagamage, K. S.|
|Issue Date: ||2017|
|Abstract: ||Optical Character Recognition is a computer science approach to the offline character
recognition problem. More advanced approaches such as Intelligent Character Recognition and
Intelligent Word Recognition are suited to unconstrained and cursive handwriting.
The Intelligent Word Recognition approach tries to recognize entire words rather than individual letters
and is a good approach for processing real-world documents with unconstrained (free-form), cursive and
This research mainly focuses on identifying multiline, unconstrained, cursive and incomplete
Sinhala words in offline mode with higher accuracy. Identifying word lines in a scanned image
and segmenting them into primitive components (characters or their parts) are treated as preprocessing.
Image processing methods are used to remove noise, remove frames and underlines,
and correct skew and slant, which increases recognition accuracy.
A context-free, analytical approach is used to yield an optimum letter string in recognition.
The optimum letter string is obtained by classifying the gradient features of each character; eight directions
are considered for feature extraction.
A search suggest algorithm with an Ayurveda content-based corpus is used in post-processing to
identify words. Natural language processing methods are used to match words by correcting
misspelled and incomplete words. The scope of the research is limited to the Ayurveda domain but can
be extended to any other domain by simply plugging in a domain-specific corpus. Prescriptions written
by Sinhala Paaramparika Weda-Mahathwaru are used to validate system accuracy and achieved
|Appears in Collections:||Master of Computer Science - 2017|
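The abstract states only that gradient features over 8 directions are classified; it does not name the gradient operator, the binning scheme, or the classifier. The following is a minimal sketch of one common way to build such a feature, assuming central-difference gradients and a magnitude-weighted histogram with 8 direction bins. The function name `gradient_direction_histogram` and the list-of-rows image format are illustrative assumptions, not taken from the thesis.

```python
import math


def gradient_direction_histogram(image, bins=8):
    """Return a normalized histogram of gradient directions.

    `image` is a 2D grayscale image given as a list of equal-length rows.
    Assumption: central differences and magnitude weighting; the thesis
    may use a different operator or binning.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    bin_width = 2 * math.pi / bins
    # Skip the border so every pixel has both neighbours on each axis.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            # Centre bin 0 on angle 0 so axis-aligned gradients
            # do not fall on a boundary between two bins.
            index = int((angle + bin_width / 2) // bin_width) % bins
            hist[index] += magnitude
    total = sum(hist)
    return [value / total for value in hist] if total else hist


# A vertical edge (dark left, bright right) puts all weight in bin 0.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
features = gradient_direction_histogram(vertical_edge)
```

In a pipeline like the one described, a vector of this kind would be computed per segmented component and passed to a classifier; that classifier is not specified in the abstract.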
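The post-processing step described in the abstract matches a recognized letter string against a domain corpus, correcting misspelled and incomplete words. The abstract does not say which matching method is used, so the sketch below assumes Levenshtein edit distance for misspellings and a same-length prefix comparison for incomplete words. The names `edit_distance`, `suggest`, and `TOY_CORPUS` are hypothetical, and the romanized words are placeholders rather than entries from the thesis corpus.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(recognized, corpus, max_distance=2, limit=3):
    """Rank corpus words for a possibly misspelled or incomplete string."""
    scored = []
    for word in corpus:
        # Whole-word distance handles misspellings; distance to the
        # same-length prefix handles words that were cut short.
        distance = min(edit_distance(recognized, word),
                       edit_distance(recognized, word[:len(recognized)]))
        if distance <= max_distance:
            scored.append((distance, abs(len(word) - len(recognized)), word))
    scored.sort()
    return [word for _, _, word in scored[:limit]]


# Placeholder romanized words (assumption, not the thesis corpus).
TOY_CORPUS = ["koththamalli", "inguru", "venivel", "katuwelbatu"]

misspelled = suggest("ingru", TOY_CORPUS)   # one letter missing
incomplete = suggest("koth", TOY_CORPUS)    # word cut short
```

Swapping `TOY_CORPUS` for another word list corresponds to the corpus plug-in idea mentioned in the abstract. For Sinhala text, a comparison over grapheme clusters rather than raw code points would probably be needed, since one written character can span several code points.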