Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/2469

Title: Named Entity Recognition For Sinhala Language
Authors: Dahanayaka, J.K.
Issue Date: 20-May-2014
Abstract: Today, with the vast growth of technology and information content, there is a need to retrieve required information more efficiently from large volumes of unstructured text in users' native languages. To meet that need, Natural Language Processing research areas such as Information Extraction, Machine Translation, Information Retrieval and Automatic Summarization are essential. In all of these areas, Named Entity Recognition (NER) is one of the preliminary tasks that has to be performed. However, building a proper NER system is challenging, especially for Indic languages, because of their inherent features. Sinhala, a native language of Sri Lanka that belongs to the Indo-Aryan branch of the Indo-European language family, still has no proper NER system for use in its Machine Translation and Information Extraction tasks. Although languages written in the Latin script, such as English, have far better NER solutions, these cannot be applied directly to Sinhala because they rely on capitalization as a major feature, which Indic scripts lack. Since there has not been much previous work on NER for Sinhala, the approach and the required resources have to be built from scratch. Algorithms used for Indian languages are believed to have a high probability of being applicable to Sinhala as well. This dissertation therefore investigates the effectiveness of data-driven techniques for detecting named entities (NEs) in Sinhala text. Two data-driven techniques were tried: Conditional Random Fields and a Maximum Entropy model. To improve performance, both language-dependent and language-independent features of Sinhala text were added. The Conditional Random Fields model performed better, with high precision, reasonable recall and an F-measure of 91.64%, 69.34% and 78.95% respectively, while the Maximum Entropy model achieved 81.71%, 51.34% and 63.06%.
URI: http://hdl.handle.net/123456789/2469
Appears in Collections: Jinadi
SCS Individual Project - Final Thesis (2013)
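
The F-measure figures quoted in the abstract can be checked against the reported precision and recall. The sketch below assumes the thesis uses the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not state the formula, so that is an assumption, and the function and variable names are illustrative only.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall percentages as reported in the abstract.
reported = {
    "CRF": (91.64, 69.34),
    "MaxEnt": (81.71, 51.34),
}

# Assumption: the reported F-measure is the balanced F1, rounded to two decimals.
scores = {name: round(f_measure(p, r), 2) for name, (p, r) in reported.items()}

for name, score in scores.items():
    print(f"{name}: F-measure = {score:.2f}%")
```

Under that assumption, the computed values agree with the 78.95% and 63.06% given in the abstract.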

Files in This Item:

File: 9000151.pdf
Size: 366.77 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.