Detecting Paraphrases in Marathi Language


ISSN (online) 2583-2026 BOHR International Journal of Smart Computing and Information Technology 2020, Vol. 1, No. 1, pp. 7–17 https://doi.org/10.54646/bijscit.003 www.bohrpub.com

Shruti Srivastava and Sharvari Govilkar
Department of Computer Engineering, PCE, University of Mumbai, India
E-mail: shrutics2015@gmail.com; sgovilkar@mes.ac.in

Abstract. Paraphrases are sentences that differ in their textual content or in the arrangement of their words but convey the same meaning. Identifying paraphrases is important in real-life applications such as Information Retrieval, Plagiarism Detection, Text Summarization and Question Answering. A large amount of work on Paraphrase Detection has been done for English and many Indian languages. However, there is no existing system to identify paraphrases in Marathi; this is the first such endeavor for the Marathi language. Because paraphrases are differently structured sentences and Marathi is a semantically strong language, this system is designed to check both the statistical and the semantic similarity of Marathi sentences. The statistical similarity measure does not need any prior knowledge, as it is based only on the factual data of the sentences. The factual data is calculated from the degree of closeness between the word-set, word-order, word-vector and word-distance. Universal Networking Language (UNL) represents the semantic content of a sentence without regard to its syntactic details. Hence, the semantic similarity calculated from the UNL graphs generated for two Marathi sentences indicates whether the two sentences are semantically equivalent. The total paraphrase score is calculated by combining the statistical and semantic similarity scores, and it gives the judgement of whether the Marathi sentences are paraphrases or non-paraphrases.

Keywords: Paraphrase, Marathi Language, Statistical, Semantic, Sumo metric, Universal Networking Language (UNL).

1 Introduction

A paraphrase is the translation of a sentence or a paragraph into the same language. Paraphrasing occurs when texts are lexically or syntactically modified so that they appear to be different texts while retaining the same intention. Paraphrases can be generated, extracted and identified. Paraphrase extraction involves collecting different words or phrases that express the same or almost the same meaning. Vocabulary plays an important role in paraphrase extraction. Paraphrase extraction supports paraphrase generation, in which paraphrases are created. Paraphrase generation involves not only dictionary lookup but also changing the information sequence and grammatical structure. Paraphrase identification is the method of detecting the variety of expressions that convey the same meaning. It presents a major challenge for numerous NLP applications. In automatic summarization, identifying paraphrases is necessary to find repetitive information in the document. In information extraction, paraphrase identification provides the most significant information, whereas in information retrieval, query paraphrases are generated to retrieve relevant data of better quality. In a question answering system, when a question is absent from the database, the answers returned for a paraphrase of the question are helpful. The basis of paraphrasing is semantic equivalence, which gives an alternative translation in the same language. For paraphrase detection it is necessary to study the possibilities of paraphrasing at each level. There are mainly 3 types of surface paraphrases.

Lexical level: Lexical paraphrases occur when synonyms appear in almost identical sentences. At the lexical level, in addition to synonymy, lexical paraphrasing is described by hyperonymy. In hyperonymy, a word has several alternatives, and one of the words is more general or more specific than the other.

Example:
Synonyms: (solve, resolve), (पु – सुता)
Hyperonymy: (reply, say), (landlady, hostess); {घर} has the hypernyms {सदन, शाला, आलय, धाम}

S1: मोदी सरकारने दोन पर्याय समोर ठेवले आहेत.
S1: The Modi government has put forward two options.
S2: मोदी सरकारने दोन े मांडले आहेत.
S2: The Modi government has given two options.
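The lexical-level case in this introduction (synonyms appearing in almost identical sentences) can be illustrated with a minimal Python sketch. It is not the system proposed in this paper: the synonym table `SYNONYM_GROUPS`, the whitespace tokenization and the position-by-position alignment are all simplifying assumptions made only for illustration, and a real system would draw synonyms from a lexical resource.

```python
# Hypothetical synonym groups, used only for illustration.
# The pair (solve, resolve) comes from the example in the text.
SYNONYM_GROUPS = [
    {"solve", "resolve"},
]


def canonical(word, groups=SYNONYM_GROUPS):
    """Map a word to a fixed representative of its synonym group, or to itself."""
    w = word.lower()
    for group in groups:
        if w in group:
            return min(group)  # deterministic representative of the group
    return w


def is_lexical_paraphrase(s1, s2, groups=SYNONYM_GROUPS):
    """Return True if the two sentences have the same length, differ in at
    least one position, and every differing pair of aligned words is a
    synonym pair according to the (assumed) synonym table."""
    t1, t2 = s1.split(), s2.split()
    if len(t1) != len(t2):
        return False
    differs = False
    for a, b in zip(t1, t2):
        if a.lower() == b.lower():
            continue
        if canonical(a, groups) != canonical(b, groups):
            return False
        differs = True
    return differs


if __name__ == "__main__":
    print(is_lexical_paraphrase("They solve the problem", "They resolve the problem"))
    print(is_lexical_paraphrase("They solve the problem", "They ignore the problem"))
```

Positional alignment keeps the sketch short, but it cannot handle paraphrases in which one word is replaced by several words (for example "put forward" and "given"), which is one reason a fuller approach is needed.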

