Web Scraping using Python

Web scraping is a technique for extracting large amounts of data from websites with programs or applications and saving it to your computer or to a database for later use. It automates data collection that would otherwise be done by hand. When a website does not offer an API for pulling data, web scraping can play an important role. The beauty of web scraping is that you can scrape almost any content that is viewable on a web page. Today's web scraping solutions range from manual effort through semi-automated to fully automated scraping. Automated web scraping is usually done with custom scripts or automation tools.

Python is a powerful scripting language for web scraping: code written in Python can connect to the website from which we want to pull data. Some big websites, such as Google, Twitter and Amazon, offer APIs that allow third-party tools to pull data from them under certain terms and conditions. Mining these websites is therefore not difficult within the limited volume of data their APIs provide, provided you have expert support; beyond that limit, they charge for extra data. Scraping these websites with hard-coded scripts instead of their APIs is not a wise decision: it can lead to legal issues or even get your IP address blocked.

In this article we mainly focus on the second type of website, which has no API for pulling data. To pull data from these websites we use hard-coded scripts or web scraping software. Here we look at the hard-coded approach and at why Python is powerful for this purpose. Python is a general-purpose scripting language that is used very frequently in big data because it is user-friendly, and it is one of the most widely used languages for writing web scraping scripts.
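
The hard-coded approach described above can be sketched with Python's standard library alone. The example below is a minimal sketch, not a production scraper: before requesting a page it checks the site's robots.txt with urllib.robotparser, sends an identifying User-Agent header and waits between requests, which reduces the risk of the IP blocking mentioned above. The user-agent string and the function names (build_robot_parser, load_robot_parser, polite_fetch) are hypothetical, and honouring robots.txt does not replace reading a site's terms and conditions.

```python
import time
import urllib.request
import urllib.robotparser
from urllib.parse import urljoin

# Hypothetical identifier for this scraper; replace it with your own name/contact.
USER_AGENT = "ExampleScraper/0.1 (+contact@example.com)"


def build_robot_parser(robots_lines):
    """Return a RobotFileParser loaded from robots.txt lines you already have (offline)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser


def load_robot_parser(base_url):
    """Download and parse robots.txt for a site (requires network access)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(urljoin(base_url, "/robots.txt"))
    parser.read()
    return parser


def polite_fetch(url, parser, user_agent=USER_AGENT, default_delay=1.0):
    """Fetch a page as text, but only if robots.txt allows it for this user agent."""
    if not parser.can_fetch(user_agent, url):
        raise PermissionError(f"robots.txt disallows {url}")
    # Honour the site's Crawl-delay if it sets one; otherwise wait a default amount.
    delay = parser.crawl_delay(user_agent) or default_delay
    time.sleep(delay)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        # Decode using the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Against a live site you would call something like polite_fetch(page_url, load_robot_parser(site_url)); both of those functions need network access, while build_robot_parser lets you try a set of robots.txt rules offline.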
Many Python packages support web scraping. Some of them are:

Amazon API Wrapper: This module offers lightweight access to the latest version of the Amazon Product Advertising API without getting in your way. It provides an object-oriented interface to Amazon products that supports both item search and item lookup. With this package you can pull product data from the Amazon website.

GoogleScraper: A module to scrape and extract links, titles and descriptions from Google search results.

Flipkart: This module helps you search for books on Flipkart.

Scrapy: An open-source framework for crawling websites and extracting structured data from their pages.
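
The kind of extraction GoogleScraper performs (links, titles and descriptions) can be illustrated on any HTML page with the standard library's html.parser module. This is a minimal sketch, not the API of GoogleScraper or Scrapy: the class PageSummaryParser and the function summarize_page are hypothetical names, and the code assumes the HTML has already been downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageSummaryParser(HTMLParser):
    """Collect the <title>, the meta description and all links from an HTML page."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.description = ""
        self.links = []  # list of (absolute_url, link_text) pairs
        self._in_title = False
        self._current_href = None
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links such as "/books" against the page URL.
            self._current_href = urljoin(self.base_url, attrs["href"])
            self._current_text = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._current_href is not None:
            # Collapse runs of whitespace inside the link text.
            text = " ".join("".join(self._current_text).split())
            self.links.append((self._current_href, text))
            self._current_href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._current_href is not None:
            self._current_text.append(data)


def summarize_page(html, base_url=""):
    """Return a dict with the page title, meta description and links."""
    parser = PageSummaryParser(base_url)
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "links": parser.links,
    }
```

Calling summarize_page(html, base_url="https://example.com/") returns a dict with "title", "description" and "links" keys. A framework such as Scrapy layers request scheduling, retries and CSS/XPath selectors on top of this basic parsing step.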

Web Scraping using Python by AIMLEAP - Outsource Bigdata