International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 08 Issue: 09 | Sep 2021
p-ISSN: 2395-0072
www.irjet.net
Automated Text Summarization of News Articles
Varun Deokar1, Kanishk Shah2
1Student, Dept. of Information Technology, Vidyalankar Institute of Technology, Maharashtra, India
2Student, Dept. of Computer Science, D.J. Sanghvi College of Engineering, Maharashtra, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - This paper focuses on a technique to automatically summarize news articles from the web, starting from keywords related to the article that the user types in. It uses transformer models and compares the BART and T5 models to see which one generates the better summaries on average. After evaluating on around 1000 articles, we found that the BART model outperformed the T5 model in every aspect, although the difference was not very large. This suggests that, for mid-sized news articles, the BART model is better than the T5 model at text summarization.

Key Words: text-summarization, bart, web-scraping, t5, automation, news-summarization

1. INTRODUCTION

With the advent of the internet, large quantities of news articles have flooded the web. However, articles are often long and roundabout, or simply carry catchy titles to increase viewership. A summary of a news article can deliver the same message in a straightforward manner, saving time and energy. Further, news summaries give users a quick preview of the main points of the content, so that they can understand the gist of an article and read the whole article only if it is what they want. Previously, text summarization has also found applications in product reviews and in fiction and nonfiction books[1],[2],[3].

Because only the important information should be summarized, and the informative points must first be identified, news summarization has become a challenging task. That being said, with the advancements taking place in Machine Learning, Artificial Intelligence, and Natural Language Processing, research on text summarization has advanced rapidly. A text summarization model that can instantly give a summary of any important topic entered as a query by the user, such as ‘Coronavirus’ or ‘Latest News about Afghanistan’, or a summary of a specific online article given its link, can be invaluable in today’s world, as you can gain a general idea about any topic within seconds. An automated text summarizer has many benefits: it removes multiple tedious and time-consuming steps, such as finding an article from a trusted source.

A complete integrated system that automates the news fetching and summarization process ensures the quick delivery of summarized articles. Furthermore, comparing different text summarization models will tell us which model to use to get the most relevant summary. You could also use the same article to generate summaries with different models and pick your favorite one. By automating the entire process, time and energy can be saved.

2. Existing Methods and Drawbacks

2.1 Manual Selection of Important points

Many websites do the whole process manually. However, this process is tedious and time-consuming. It involves not only going through multiple articles from different sources and comparing them, but also manually selecting the important points from the articles. Due to the large number of manual steps, this method of news summarization takes a comparatively large amount of time and energy.

2.2 Presence of irrelevant content and ads

Scraping articles online comes with the drawback of picking up advertisements and links to other, unrelated articles. This useless information can end up in the summary, leading to poor summaries. We use the newspaper3k package in Python to scrape only the relevant content of the web page while ignoring ads and other unimportant content.

2.3 Text summarization by humans

Conventionally, humans produce summaries of large articles, but this takes up a lot of time and money. While a model might not produce a summary as coherent and concise as a human would, it can produce tens of thousands of summaries in the time a human would produce one, which makes it more advantageous in the long run.

3. Our Approach

3.1 Dataset

We have made use of the BBC News Summary[4] dataset, also available on Kaggle. This dataset has news articles grouped into five classes: ‘business’, ‘entertainment’, ‘politics’, ‘sport’, and ‘tech’. A summary is also provided for each article in each class. In total, the dataset consists of more than 2200 articles with summaries.
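The dataset organization described in Section 3.1 (articles grouped by class, each with a reference summary) can be loaded with a short script. The sketch below is a minimal illustration and not the authors' code. It assumes a hypothetical on-disk layout in which articles and summaries sit in two parallel folders, here called "News Articles" and "Summaries", each with one subfolder per class and matching file names. The folder names and the helper names `Example`, `load_bbc_pairs`, and `count_by_category` are assumptions made for this example and may need adjusting to the copy of the dataset in use.

```python
from pathlib import Path
from typing import NamedTuple


class Example(NamedTuple):
    """One article paired with its reference summary."""
    category: str
    article: str
    summary: str


def load_bbc_pairs(root, articles_dir="News Articles", summaries_dir="Summaries"):
    """Pair each article file with the summary file of the same name.

    Assumed (hypothetical) layout:
        <root>/<articles_dir>/<category>/<name>.txt
        <root>/<summaries_dir>/<category>/<name>.txt
    """
    root = Path(root)
    pairs = []
    for article_path in sorted((root / articles_dir).glob("*/*.txt")):
        category = article_path.parent.name
        summary_path = root / summaries_dir / category / article_path.name
        if not summary_path.exists():
            continue  # skip articles that have no reference summary
        pairs.append(Example(
            category,
            # errors="replace" tolerates stray non-UTF-8 bytes in the text files
            article_path.read_text(encoding="utf-8", errors="replace").strip(),
            summary_path.read_text(encoding="utf-8", errors="replace").strip(),
        ))
    return pairs


def count_by_category(pairs):
    """Return a dict mapping each class name to its number of examples."""
    counts = {}
    for example in pairs:
        counts[example.category] = counts.get(example.category, 0) + 1
    return counts
```

Calling `count_by_category(load_bbc_pairs("path/to/dataset"))` gives a quick check that all five classes are present and that every article has found its summary before any model is run.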
© 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1908