
Thelan

Web-Scraping Macros using Excel VBA

Introduction:

The coding language used to write macros in Excel, Visual Basic for Applications (VBA), is also used to control other Microsoft Office applications. For example, you could open another Office program and transfer its data into Excel for more thorough analysis. Internet Explorer falls into this category as well: a relatively basic program can open the browser, navigate to any specified URL and manipulate the data contained in the webpage. This means that tables, lists or just one specific piece of data can be transferred from a busy webpage to your worksheet, in the format you desire, just by calling your function. The rest of the article goes into more detail on how this is done.

About Thelan

Founded in 2011, Thelan exists to support its clients across multiple industries worldwide with tested project management expertise. We are a professional services firm whose project management consultancy practice – with core competencies in Strategy, Planning, and Implementation – is at the center of our service offerings. At Thelan, high-performing, well-trained practitioners help clients achieve their objectives by providing personalized service defined by consistent quality. The mission of our company is honored every time our people exceed the expectations of our clients.

THELAN | Washington, DC | www.thelanconsulting.com


Before you can start writing your function:

To write a function in Excel, begin by navigating to the Developer tab (if it is hidden, enable it in the Excel options under Customize Ribbon). Click the Macros icon, type a name for your new program and click Create. This opens the Visual Basic editor, where the code is written.

You will also need to add the Internet Explorer related references in the Visual Basic editor. This allows Excel to recognize the Internet Explorer and HTML commands in the code. Open the ‘References’ window from the ‘Tools’ menu, check the boxes for ‘Microsoft HTML Object Library’ and ‘Microsoft Internet Controls’, and click ‘OK’.

The first step is to open the Internet Explorer application. We will call the object referring to the application “ieApp” and set it to visible so we can see what is happening.

    Sub webScraping()
        Dim ieApp As Object
        Set ieApp = CreateObject("InternetExplorer.Application")
        ieApp.Visible = True
    End Sub

This should simply open an empty Internet Explorer window. We now need to navigate to the webpage of our choice, using the command:

    ieApp.Navigate "copyYourAddressHere"

Internet Explorer will start loading the page. It is important to wait until loading is done before issuing any other commands, because reading the page too early can cause errors or crash the program.

    Do While ieApp.Busy: DoEvents: Loop
    Do Until ieApp.readyState = READYSTATE_COMPLETE: DoEvents: Loop
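
The two wait loops above never give up if a page fails to load. As a minimal sketch of one way around this, the hypothetical helper below (the name waitForPage and the timeout parameter are our own assumptions, not part of the original macro) waits in the same way but stops after a set number of seconds.

```vba
' Hypothetical helper, not from the original macro.
' Returns True once the page has loaded, or False if timeoutSeconds pass first.
Function waitForPage(ByVal ieApp As Object, ByVal timeoutSeconds As Long) As Boolean
    Dim deadline As Date
    deadline = DateAdd("s", timeoutSeconds, Now)

    ' 4 is the numeric value of READYSTATE_COMPLETE
    Do While ieApp.Busy Or ieApp.readyState <> 4
        DoEvents
        If Now > deadline Then
            waitForPage = False
            Exit Function
        End If
    Loop

    waitForPage = True
End Function
```

Inside the macro, a call such as waitForPage(ieApp, 30) could replace the two loops, with the macro exiting early when the result is False. The 30-second limit is only an example value.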

We will be using a marketwatch.com webpage (Dow Jones live index) for this example. Your full code so far should look like this:

    Sub webScraping()
        Dim ieApp As Object
        Set ieApp = CreateObject("InternetExplorer.Application")
        ieApp.Visible = True
        ieApp.navigate "http://www.marketwatch.com/investing/index/DJIA"
        Do While ieApp.Busy: DoEvents: Loop
        Do Until ieApp.readyState = READYSTATE_COMPLETE: DoEvents: Loop
    End Sub

Now that the page is fully loaded, the data we want is ready to be transferred into Excel. We tell the program which pieces of information we are interested in by referring to their HTML/CSS tags, classes, IDs and so on. General knowledge of how this type of naming works will make it much easier to quickly pinpoint the desired data, especially when the HTML is long and complicated.

It is often easier to use Google Chrome than Internet Explorer when trying to find how elements are named within the HTML. Just right-click the number or text and choose the ‘Inspect’ option. For example, when looking for the index quote on our marketwatch.com page, we see that the number is contained inside a div with class name “lastprice”.

The expression ieApp.Document.getElementsByClassName("lastprice") returns the list of all elements within the HTML document with the class name “lastprice”. Adding an index, ieApp.Document.getElementsByClassName("lastprice")(0) returns the first element of that list. Finally, ieApp.Document.getElementsByClassName("lastprice")(0).innerText returns the plain text within that element, which is the index quote in our case.

To put this data in our Excel worksheet, simply set it as the value of any cell, for example:

    Range("A1").Value = ieApp.Document.getElementsByClassName("lastprice")(0).innerText
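
If the page layout changes and no element has the expected class name, asking for element (0) will raise an error. The sketch below shows one possible guard: it checks the length of the returned list before reading from it. The procedure name writeQuoteIfFound and the "not found" placeholder text are assumptions made for illustration.

```vba
' Hypothetical procedure, not from the original macro.
' Writes the quote to A1 only if an element with class "lastprice" exists.
Sub writeQuoteIfFound(ByVal ieApp As Object)
    Dim matches As Object
    Set matches = ieApp.Document.getElementsByClassName("lastprice")

    If matches.Length > 0 Then
        Range("A1").Value = matches(0).innerText
    Else
        ' Placeholder text is an assumption; use whatever suits your worksheet
        Range("A1").Value = "not found"
    End If
End Sub
```

It can be called from the main macro once the page has finished loading, as writeQuoteIfFound ieApp.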

It is also possible to fill several cells with a list of data points. Using all of the elements within a class instead of just one, we can populate a column or row in our Excel worksheet. Suppose we want to import the symbols and last prices of the Dow Jones index components into two separate columns of our worksheet. The symbols have class name “quotelist-symb”, and the prices have class name “quotelist-last bgLast”. It is important to follow the same capitalization as the HTML source. Here is some code to do this:

    Dim i As Integer
    Dim elem As Object
    Dim symbolList As Object
    Dim priceList As Object
    Set symbolList = ieApp.document.getElementsByClassName("quotelist-symb")
    Set priceList = ieApp.document.getElementsByClassName("quotelist-last bgLast")

    ' Write the symbols down column C, starting at row 1
    i = 1
    For Each elem In symbolList
        Range("C" & i).Value = elem.FirstChild.innerText
        i = i + 1
    Next elem

    ' Write the prices down column D, starting at row 2
    i = 2
    For Each elem In priceList
        Range("D" & i).Value = elem.innerText
        i = i + 1
    Next elem

We store the symbol elements and price elements in two lists that we call symbolList and priceList. We then go through each element of each list and store its text in the cells of columns C and D. We use the ‘FirstChild’ property to avoid some excess text within the symbolList elements.
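
On other pages, ‘FirstChild’ may not be enough to isolate the text, and innerText can come back with line breaks or extra spaces. As a rough sketch of an alternative, the hypothetical function below (cleanText is our own name, not part of the original macro) tidies a string before it is written to a cell.

```vba
' Hypothetical helper, not from the original macro.
' Replaces line breaks, tabs and non-breaking spaces with ordinary spaces,
' collapses repeated spaces and trims both ends.
Function cleanText(ByVal rawText As String) As String
    Dim result As String
    result = Replace(rawText, vbCr, " ")
    result = Replace(result, vbLf, " ")
    result = Replace(result, vbTab, " ")
    result = Replace(result, ChrW(160), " ")

    ' Collapse runs of spaces into a single space
    Do While InStr(result, "  ") > 0
        result = Replace(result, "  ", " ")
    Loop

    cleanText = Trim$(result)
End Function
```

In the loops above it could be used as Range("C" & i).Value = cleanText(elem.innerText).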

Conclusion:

This article presented a brief explanation of how to start using web scraping with Microsoft Excel. The first part of the example code, which navigates to the webpage, can be used for most scenarios. The tricky part is then identifying what you need to take and how to communicate that to the program. No two webpages are the same, which is why some trial and error will be necessary. If no solution presents itself, it is often possible to find specific answers online, since you are unlikely to be the first to ask a given question. Once all the desired data is in the worksheet in the intended layout, the powerful analytical tools of Microsoft Excel can be used to answer questions or to present the data in a visualization. You can even program your macro to run automatically every hour, five minutes or thirty seconds to keep your results up to date. Another thing to keep in mind is that this language can be used in most Microsoft Office applications, which means web scraping can be done with similar code to import data into Microsoft Word, Microsoft PowerPoint, Microsoft Project and so on.
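
One way to run the macro at a regular interval is Excel's Application.OnTime method, which schedules a procedure for a given time. The sketch below assumes the webScraping macro from this article is in the same workbook. The names nextRun, runAndReschedule and stopSchedule are hypothetical, and the five-minute interval is only an example.

```vba
' Time of the next scheduled run; declare this at the top of the module
Public nextRun As Date

' Hypothetical wrapper: runs the scraper, then schedules itself again
Sub runAndReschedule()
    webScraping
    nextRun = Now + TimeValue("00:05:00")
    Application.OnTime nextRun, "runAndReschedule"
End Sub

' Cancels the pending run, for example before closing the workbook
Sub stopSchedule()
    On Error Resume Next   ' ignore the error raised if nothing is scheduled
    Application.OnTime nextRun, "runAndReschedule", , False
End Sub
```

Because webScraping as written opens a new Internet Explorer window on every run, it is worth closing the browser at the end of the macro (for example with ieApp.Quit) before scheduling it this way.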
