92% accuracy. In addition, the related work [3] also used a statistical approach for tagging the Indonesian language and achieved an average accuracy of 80%. Since Brill's Tagger was introduced, it has become an approach that gives results similar to, or even better than, the two approaches mentioned above. Brill's approach is also known as Transformation-Based Error-Driven Learning (TEL), because it transforms tags based on the rules applied. In this approach, a tag is assigned to each word and then transformed using a set of rules. These rules are applied repeatedly to transform incorrect tags into correct ones until no more rules can be applied. The approach is also described as self-learning, because it relies on TEL and rule templates rather than on a purely statistical model. Two related works that used Brill's approach are [1] and [6]. In [1], Brill's approach was used to tag the Polish language and achieved 89.2% accuracy. In [6], it was used to tag the Greek language and achieved 95% accuracy. Brill's approach was chosen to develop the Kadazan POS tagger because it has shown good results not only for English but also for other languages such as Polish and Greek.

IV. BRILL'S TAGGER FOR KADAZAN

The Kadazan POS tagger is divided into four phases. In the first phase, the text or corpus is input into the system so that all the words in it can be tagged. In the second phase, the corpus passes through the initial state annotator, which assigns each word its most likely tag based on the lexicon. The output of this process is the temporary corpus. In the third phase, the temporary corpus is compared with the goal corpus (a manually tagged corpus) to detect any errors. Finally, in the fourth phase, the lexical and contextual rules are applied to correct those errors.
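The four phases can be illustrated with a minimal Python sketch. It is not the implementation used for the Kadazan tagger: the lexicon, the tag names, the English example words and the single rule template ("change tag X to Y when the previous tag is Z") are all assumptions made for illustration. Only contextual rules are shown, and lexical rules are omitted.

```python
from typing import Dict, List, NamedTuple


class ContextualRule(NamedTuple):
    """Hypothetical rule template: change from_tag to to_tag after prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


# Hypothetical toy lexicon mapping each word to its most likely tag.
# English words are used only for illustration; this is not Kadazan data.
LEXICON: Dict[str, str] = {
    "the": "DT", "horse": "NN", "wants": "VBZ", "to": "TO", "race": "NN",
}
DEFAULT_TAG = "NN"  # assumed fallback for words missing from the lexicon


def initial_state_annotator(words: List[str]) -> List[str]:
    """Phase two: assign each word its most likely tag from the lexicon."""
    return [LEXICON.get(word.lower(), DEFAULT_TAG) for word in words]


def find_errors(temp_tags: List[str], goal_tags: List[str]) -> List[int]:
    """Phase three: positions where the temporary corpus differs from the goal."""
    return [i for i, (t, g) in enumerate(zip(temp_tags, goal_tags)) if t != g]


def apply_rules(tags: List[str], rules: List[ContextualRule],
                max_passes: int = 10) -> List[str]:
    """Phase four: apply the rules repeatedly until no tag changes.

    max_passes guards against rule sets that would otherwise cycle forever.
    """
    current = list(tags)
    for _ in range(max_passes):
        changed = False
        for rule in rules:
            for i in range(1, len(current)):
                if current[i] == rule.from_tag and current[i - 1] == rule.prev_tag:
                    current[i] = rule.to_tag
                    changed = True
        if not changed:
            break
    return current


# Phase one: the unannotated input text.
words = "the horse wants to race".split()
goal = ["DT", "NN", "VBZ", "TO", "VB"]          # manually tagged goal corpus

temp = initial_state_annotator(words)            # temporary corpus
errors = find_errors(temp, goal)                 # [4]: "race" tagged NN, not VB
rules = [ContextualRule("NN", "VB", "TO")]       # hand-written example rule
corrected = apply_rules(temp, rules)
print(errors, corrected)
```

In this toy run the only error is the last word, which the lexicon tags as a noun; the single contextual rule then rewrites it to a verb because the preceding tag is `TO`.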
Figure 1 shows the overall model for the Kadazan POS tagger based on Brill's approach.

[Figure 1 diagram: the unannotated corpus/text is input to the initial state annotator, which assigns the most likely tag (initial tag) to each word based on the lexicon and outputs the temporary corpus. The temporary corpus is compared with the goal corpus to find errors, and the lexical/contextual rules are applied to produce the correctly tagged corpus.]

Figure 1. Kadazan POS tagger model based on Brill's Approach
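The error-driven part of the model in Figure 1, where mismatches against the goal corpus determine which rules are kept, can be sketched as a greedy search. The sketch below is an assumption-laden illustration rather than the procedure used in this work: it supports one hypothetical rule template (previous tag only), scores each candidate rule by how many errors remain after applying it, and stops when no candidate reduces the error count. The tag names and example data are invented.

```python
from typing import List, Set, Tuple

# Hypothetical rule format: (from_tag, to_tag, prev_tag)
Rule = Tuple[str, str, str]


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    """Apply one rule; conditions are checked against the tags before the pass."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def count_errors(tags: List[str], goal: List[str]) -> int:
    """Number of positions where the current tags differ from the goal corpus."""
    return sum(t != g for t, g in zip(tags, goal))


def candidate_rules(tags: List[str], goal: List[str]) -> Set[Rule]:
    """Instantiate the template at every position that is currently wrong."""
    return {
        (tags[i], goal[i], tags[i - 1])
        for i in range(1, len(tags))
        if tags[i] != goal[i]
    }


def learn_rules(temp: List[str], goal: List[str],
                max_rules: int = 10) -> Tuple[List[Rule], List[str]]:
    """Greedily pick the rule that leaves the fewest errors, then repeat."""
    tags = list(temp)
    learned: List[Rule] = []
    for _ in range(max_rules):
        best_rule, best_errors = None, count_errors(tags, goal)
        # sorted() makes tie-breaking deterministic
        for rule in sorted(candidate_rules(tags, goal)):
            remaining = count_errors(apply_rule(tags, rule), goal)
            if remaining < best_errors:
                best_rule, best_errors = rule, remaining
        if best_rule is None:      # no rule improves the corpus: stop
            break
        tags = apply_rule(tags, best_rule)
        learned.append(best_rule)
    return learned, tags


# Invented temporary and goal tag sequences for illustration only.
temp_tags = ["TO", "NN", "DT", "NN", "MD", "NN"]
goal_tags = ["TO", "VB", "DT", "NN", "MD", "VB"]
learned_rules, final_tags = learn_rules(temp_tags, goal_tags)
print(learned_rules)
print(final_tags == goal_tags)
```

Scoring a rule by the errors that remain after it is applied means a rule that fixes some tags but breaks others is penalised for the damage it causes.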

A. The First Phase

The first phase of tagging begins by inputting an unannotated text into the system. Figure 2 shows the diagram of the first phase.

Figure 2. First phase
