HitBox Web Analytics Technology: The Future of Internet Intelligence

In the world of e-business, premium Web analytics is critical. Today, limited Web intelligence solutions based on the analysis of Web server log files are giving way to the sophisticated HitBox client-based analysis technique pioneered by WebSideStory. HitBox collects traffic and visitor data directly from the browsers of individual users, analyzes this data in real time, and delivers valuable audience information to site owners over the Web, all as a convenient outsourced service. In this way, HitBox provides more detailed, more accurate information than log file analysis, gives site owners faster, more reliable access to that information, and does it all more cost-effectively.

10182 Telesis Court San Diego, California 92121 (858) 546-0040 www.WebSideStory.com


Introduction

Early attempts to measure site traffic were based on the analysis of “log files”: files created by a Web server that contain every request the server receives from every user. Because the data in these files is virtually unintelligible to human readers, software tools were developed to analyze them and generate reports containing useful information. Unfortunately, this approach has many shortcomings, which are growing more severe as Internet technology advances. Despite these shortcomings, log file analysis is still widely used.

HitBox, an innovative technology pioneered by WebSideStory, provides far better Web analytics by collecting and analyzing extensive, detailed traffic and visitor information directly from the browsers, or “clients,” of individual users, and delivering this information to site owners over virtually any Web connection, all in real time. Through this “client-based” approach, HitBox offers a broader range of information, greater accuracy and timeliness, and easier information access than log file analysis, as a convenient, cost-effective outsourced service. This paper compares the log file and HitBox Web analytics techniques and identifies many ways in which HitBox can give businesses an edge in measuring and enhancing the effectiveness of their sites.

Two contrasting techniques

Log file analysis

Log file analysis is a simplistic Web analytics technique, developed early in the evolution of Internet technology. In this technique, the Web server for a site generates a file containing every request received from every visitor to the site. Periodically, the site owner processes these files to extract meaningful traffic information. Figure 1 illustrates this process.

Copyright © 2000 WebSideStory, Inc. All rights reserved.

[Diagram: Web site users, Web server, Log file, Analysis, Traffic reports; arrows numbered 1 to 4 correspond to the steps below.]

Figure 1. Log file Web analytics

The log file analysis process shown in Figure 1 comprises the following steps:

1. Web users make requests to view specific pages on a site. These requests are received by the site’s Web server, which serves the requested pages back to the users.
2. The Web server also records each user request in a chronological log file.
3. Periodically, the Web site owner uses proprietary log file analysis software to extract useful information from the latest log files.
4. The analysis software generates traffic reports, which the site owner distributes to the appropriate people in the organization.
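Steps 3 and 4 above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor’s product: it assumes the server writes the NCSA Common Log Format, counts only GET requests answered with status 200, and prints a plain-text report. The function names are hypothetical.

```python
import re
from collections import Counter

# One NCSA Common Log Format line:
# host ident authuser [date] "method path protocol" status bytes
CLF_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_page_views(log_lines):
    """Step 3: scan a batch of log lines and tally page requests per path."""
    views = Counter()
    for line in log_lines:
        match = CLF_LINE.match(line)
        if match is None:
            continue  # skip malformed or truncated lines
        # This sketch counts only successful (200) GET requests.
        if match["method"] == "GET" and match["status"] == "200":
            views[match["path"]] += 1
    return views


def traffic_report(views, top=10):
    """Step 4: render a static text report of the most-requested paths."""
    return "\n".join(f"{count:8d}  {path}" for path, count in views.most_common(top))


if __name__ == "__main__":
    sample_log = [
        '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
        '192.0.2.2 - - [10/Oct/2000:13:56:01 -0700] "GET /products.html HTTP/1.0" 200 5120',
        '192.0.2.1 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 304 0',
        '192.0.2.3 - - [10/Oct/2000:13:58:40 -0700] "GET /index.html HTTP/1.0" 200 2326',
    ]
    print(traffic_report(count_page_views(sample_log)))
```

Note that the 304 (Not Modified) line is not counted by this rule, even though the browser went on to display its cached copy of the page.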

HitBox client-based analysis

HitBox client-based audience analysis is an advanced technique based on data collected directly from users. Whenever a visitor views a page on the site, special HitBox code sends information about the page and user to WebSideStory. When the site owner requests HitBox information for the site, HitBox quickly analyzes the site’s data to obtain the requested information. Figure 2 illustrates this process.

[Diagram: Web site users, Web server, WebSideStory database, Site owner’s personnel; arrows numbered 1 to 3 correspond to the steps below.]

Figure 2. HitBox client-based Web analytics

The HitBox analysis process shown in Figure 2 comprises the following steps:

1. Web users make requests to view specific pages on a site. These requests are received by the site’s Web server, which serves the requested pages back to the users.
2. When a page is displayed in a browser, special HitBox code that the site owner has embedded in the page sends a variety of anonymous information about the page and user to WebSideStory, where it is immediately integrated into the HitBox database for the site.
3. Users authorized by the site owner log on to the WebSideStory site and request HitBox reports. HitBox extracts the requested information from the database and serves it back to the requesting users.
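Steps 2 and 3 can be pictured from the collector’s side. The sketch below is an assumption about how a page tag might report a view, not WebSideStory’s actual protocol: it treats each view as an HTTP request whose query string carries hypothetical parameters (`site`, `page`, `res`), files the decoded hit under its site in an in-memory dictionary standing in for the HitBox database, and computes a report only when one is requested.

```python
from collections import Counter, defaultdict
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory stand-in for the per-site database.
site_db = defaultdict(list)


def record_hit(tag_request_url):
    """Step 2: decode one tag request and store it under its site ID."""
    query = parse_qs(urlsplit(tag_request_url).query)
    # Keep the first value of each parameter.
    hit = {name: values[0] for name, values in query.items()}
    site_db[hit.get("site", "unknown")].append(hit)
    return hit


def page_view_report(site_id):
    """Step 3: compute page-view counts for one site at request time."""
    return Counter(hit.get("page", "(unnamed)") for hit in site_db[site_id])


if __name__ == "__main__":
    # example.com is a reserved domain; the collector URL is illustrative.
    record_hit("http://collector.example.com/hit?site=demo&page=home&res=1024x768")
    record_hit("http://collector.example.com/hit?site=demo&page=products&res=800x600")
    record_hit("http://collector.example.com/hit?site=demo&page=home&res=800x600")
    print(page_view_report("demo"))
```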

Log file analysis vs. HitBox

Here are some of the significant differences between the log file and HitBox analysis techniques.

§ Log file analysis: Examines site visitors’ activities from the distant perspective of the Web server.
  HitBox client-based analysis: Examines site visitors’ activities from the close perspective of users’ browsers.

§ Log file analysis: Limited to information contained in users’ Web server requests.
  HitBox client-based analysis: Provides additional information available from users’ browsers.

§ Log file analysis: Requires time-consuming batch analysis to extract useful information.
  HitBox client-based analysis: Provides information on demand in real time.

§ Log file analysis: Typically performed on the site owner’s system using proprietary software tools.
  HitBox client-based analysis: An all-in-one service that requires no IT resources.

§ Log file analysis: Information delivered through static reports distributed by the site owner.
  HitBox client-based analysis: Information available anytime, through any Internet connection.

These differences give HitBox many advantages over log file analysis in the areas of information quality, information access, and efficient use of IT resources. These advantages are discussed below.

Information quality

More statistics

HitBox provides many statistics that cannot be obtained from log file analysis. Here are some examples.

§ Paths: Due to the large quantity of data in typical log files, it is often difficult to identify individual visitors’ paths through a site. HitBox accurately identifies the path of every visitor.

§ Display characteristics: Understanding users’ system characteristics is important in site design. HitBox and log file analysis can both identify visitors’ operating systems and browser versions, but only HitBox can identify display characteristics such as resolution and color palettes.

§ Plug-ins: Understanding which browser plug-ins are popular among Netscape users is also important in site design. HitBox identifies the plug-ins in every Netscape browser.

§ Java status: HitBox and log file analysis can both identify the user’s level of Java support based on browser version, but only HitBox can determine whether Java support is enabled or disabled.

§ Cookie status: Many e-commerce sites rely on cookies, which can be a problem with browsers that are configured to reject cookies. HitBox can determine whether a browser is accepting or rejecting cookies.
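The path statistic depends on grouping page views by visitor rather than by request. As a rough sketch, assuming each hit record carries a visitor identifier, a timestamp in seconds and a page name (all hypothetical field names), a visitor’s path is simply that visitor’s pages in time order:

```python
from collections import defaultdict


def visitor_paths(hits):
    """Group hits by visitor ID and return each visitor's pages in time order."""
    by_visitor = defaultdict(list)
    for hit in hits:
        by_visitor[hit["visitor"]].append((hit["time"], hit["page"]))
    return {
        visitor: [page for _, page in sorted(views)]
        for visitor, views in by_visitor.items()
    }


if __name__ == "__main__":
    hits = [
        {"visitor": "v1", "time": 10, "page": "home"},
        {"visitor": "v2", "time": 12, "page": "home"},
        {"visitor": "v1", "time": 30, "page": "products"},
        {"visitor": "v1", "time": 55, "page": "checkout"},
        {"visitor": "v2", "time": 40, "page": "support"},
    ]
    for visitor, path in visitor_paths(hits).items():
        print(visitor, " > ".join(path))
```

A production system would also split each visitor’s activity into separate visits; this sketch omits that step.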

Web crawlers

Web crawlers (also called “spiders” or “robots”) are programs that surf the Web automatically, following hypertext links and scanning site content. Since crawlers are not actual users, their activities need to be excluded from Web analytics data. This is difficult with log file analysis: to identify the activity of a crawler, the log file analysis tool needs to know about that crawler, much as anti-virus software needs to know about a virus in order to detect it. Since there are thousands of crawlers, and new ones appear every day, log file analysis cannot identify every crawler. (In fact, recent studies by WebSideStory have identified hundreds of crawlers not detected by popular log file analysis tools.) In contrast, since crawlers do not actually cause pages to be displayed in browsers, HitBox automatically excludes their activities.
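The signature-matching limitation described above is easy to reproduce. In this minimal sketch, a log analysis filter drops requests whose User-Agent string contains a known crawler name; the three signatures are illustrative only, and real tools ship far longer lists. Any crawler missing from the list is still counted as a visitor.

```python
# Illustrative crawler signatures; real log analysis tools use much longer lists.
KNOWN_CRAWLER_SIGNATURES = ("googlebot", "slurp", "scooter")


def is_known_crawler(user_agent):
    """Return True if the User-Agent matches a crawler signature we know about."""
    agent = user_agent.lower()
    return any(signature in agent for signature in KNOWN_CRAWLER_SIGNATURES)


def human_requests(requests):
    """Keep only requests whose User-Agent is not a known crawler."""
    return [r for r in requests if not is_known_crawler(r["agent"])]


if __name__ == "__main__":
    requests = [
        {"path": "/index.html", "agent": "Mozilla/4.7 [en] (WinNT; I)"},
        {"path": "/index.html", "agent": "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"},
        {"path": "/index.html", "agent": "NewSpider/0.1"},  # crawler not in the list
    ]
    print(len(human_requests(requests)))  # prints 2: the unlisted crawler is still counted
```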

Cached pages

Many ISPs maintain proxy servers that store millions of pages copied from the Web. When a user requests a page stored on a proxy, the ISP delivers the page quickly from the proxy instead of retrieving it from the site’s Web server, which can take much longer. Surveys by WebSideStory and others indicate that 14 to 20 percent of the page views for a typical site are served by proxies. Log file analysis cannot detect pages served by proxies, because those requests never reach the site’s Web server; the result is significant undercounting of page views. HitBox detects all displayed pages regardless of their source, giving accurate page-view counts.
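The size of the undercount follows directly from the survey range. If a share p of a site’s page views is served by proxies, the log records only the fraction 1 - p, so the true count is roughly the logged count divided by 1 - p. A small sketch of that arithmetic; the 100,000 figure is a made-up example:

```python
def estimated_true_views(logged_views, proxy_share):
    """Scale a log-based count up when a known share of views bypasses the server."""
    if not 0 <= proxy_share < 1:
        raise ValueError("proxy_share must be in [0, 1)")
    return logged_views / (1 - proxy_share)


if __name__ == "__main__":
    logged = 100_000  # hypothetical page views found in a log file
    for share in (0.14, 0.20):
        true_views = estimated_true_views(logged, share)
        missed = true_views - logged
        print(f"proxy share {share:.0%}: about {true_views:,.0f} views, {missed:,.0f} missed")
```

At the two ends of the 14 to 20 percent range, a log showing 100,000 page views corresponds to roughly 116,000 to 125,000 actual views.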

Address pools

Many ISPs have a pool of IP addresses that are dynamically assigned to individual users. In this situation, a single user may use multiple IP addresses over time, even during a single visit to a site. Since log file analysis identifies individual users by their IP addresses, it cannot track a user whose IP address changes. As a result, counts of unique users and measurements of how long users spend on a site and on individual pages may be grossly inaccurate. In contrast, HitBox does not depend on the IP address to identify individual users, so it provides correct values for these statistics.
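A toy example shows how address pools inflate unique-user counts. The sketch assumes, hypothetically, that each hit records both the IP address and a persistent visitor identifier (for instance, one stored in a cookie); the paper does not say how HitBox identifies users, so the identifier here is only a stand-in.

```python
def unique_count(hits, field):
    """Count distinct values of `field` across a list of hit records."""
    return len({hit[field] for hit in hits})


if __name__ == "__main__":
    # Visitor v1's ISP reassigns the IP address partway through the visit.
    hits = [
        {"ip": "198.51.100.7", "visitor": "v1", "page": "home"},
        {"ip": "198.51.100.23", "visitor": "v1", "page": "products"},
        {"ip": "198.51.100.41", "visitor": "v2", "page": "home"},
    ]
    print("unique users by IP address:", unique_count(hits, "ip"))       # 3
    print("unique users by visitor ID:", unique_count(hits, "visitor"))  # 2
```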

Dynamic content

Web page content is often generated dynamically. A common example is a search-results page, whose content is generated on the fly in response to the user’s search. Log file analysis, which relies on cryptic URLs, has difficulty identifying the type of page, let alone its dynamically generated content. With HitBox, site owners can add code to their CGI scripts or dynamic-content generator to give HitBox a unique identifier for the page (such as a product-description page) as well as its dynamically generated content (such as a specific product ID). In this way, HitBox can provide accurate statistics for dynamic content.
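A dynamic-content generator could emit the page identifier and the content identifier as part of the page it builds. The sketch below is hypothetical throughout: the collector URL, the parameter names `page` and `content`, and the use of a 1x1 image request are assumptions for illustration, not the real HitBox code.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical collection endpoint; not a real HitBox address.
COLLECTOR_URL = "http://collector.example.com/hit"


def tag_for_dynamic_page(page_type, content_id):
    """Build a tracking-tag snippet naming the page type and its generated content."""
    query = urlencode({"page": page_type, "content": content_id})
    src = escape(f"{COLLECTOR_URL}?{query}")  # escape '&' for use inside HTML
    return f'<img src="{src}" width="1" height="1" alt="">'


if __name__ == "__main__":
    # e.g. a CGI script rendering product SKU-1042 with the product-description template
    print(tag_for_dynamic_page("product-description", "SKU-1042"))
```

Because `urlencode` percent-encodes each value, identifiers containing characters such as `&` or spaces stay intact in the request.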

Web server down

Generally, when a Web server goes down, browsers will load requested pages from their cache if possible. Log files do not record these page views, both because the server cannot detect pages served from a browser’s cache and because the server may stop building the log file. Because HitBox data is gathered directly from browsers and analyzed by WebSideStory, HitBox records these page views accurately.

False page views

Web surfers following familiar paths often jump between pages very quickly without viewing their content. Simple request counting cannot tell these jumps apart from real views, so log file analysis counts each jump as a page view even though the user does not actually view the page, potentially resulting in significant overcounting of page views. Both log file analysis and HitBox can identify this situation by measuring time spent on a page. However, HitBox provides a unique alternative solution to this problem: the site owner inserts the HitBox code at the end of the HTML for a page, or following key content. If a user leaves the page too quickly, the HitBox code will not be loaded, in which case HitBox will not record a page view. This technique makes it possible to obtain more realistic page-view counts in this problematic situation without analyzing time spent on pages.
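The time-on-page approach that both techniques can use is simple to state: a view counts only if the visitor stayed at least some minimum time before the next view. A minimal sketch, assuming one visitor’s views are available as (seconds, page) pairs in time order and using an arbitrary 2-second threshold chosen only for illustration:

```python
MIN_SECONDS_ON_PAGE = 2  # illustrative threshold, not a published HitBox setting


def realistic_page_views(views, min_seconds=MIN_SECONDS_ON_PAGE):
    """Drop views followed by another view sooner than min_seconds.

    `views` is one visitor's list of (timestamp_seconds, page) in time order.
    The last view has no successor, so its dwell time is unknown; it is kept.
    """
    kept = []
    for current, following in zip(views, views[1:] + [None]):
        if following is None or following[0] - current[0] >= min_seconds:
            kept.append(current[1])
    return kept


if __name__ == "__main__":
    clicks = [(0, "home"), (1, "news"), (1.5, "sports"), (40, "scores")]
    print(realistic_page_views(clicks))
```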

Information access

Real time

In the fast-paced world of e-business, timely Web intelligence is critical. HitBox collects, analyzes and delivers information in real time, enabling a site owner to observe traffic and visitor activity on the site as it happens. In contrast, although small log files can be analyzed fairly quickly, typical log files are so large that they must be analyzed in batch runs during off-hours, resulting in delays of a day or more.
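Real-time reporting implies updating statistics as each hit arrives rather than after a nightly batch run. One common way to do that, shown here as an illustrative sketch rather than HitBox’s actual design, is a sliding-window counter that can answer “page views in the last N seconds” at any moment:

```python
from collections import deque


class SlidingWindowCounter:
    """Count events whose timestamp is within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._times = deque()  # event timestamps, oldest first

    def record(self, timestamp):
        """Add one page view; timestamps must arrive in non-decreasing order."""
        self._times.append(timestamp)

    def count(self, now):
        """Discard events at or before now - window_seconds, then count the rest.

        Calls must use non-decreasing values of `now`, because discarded
        events are gone for good.
        """
        while self._times and self._times[0] <= now - self.window_seconds:
            self._times.popleft()
        return len(self._times)


if __name__ == "__main__":
    live = SlidingWindowCounter(window_seconds=60)
    for t in (0, 10, 50, 65):
        live.record(t)
    print(live.count(now=70))  # views in the 60 seconds up to t=70
```

Each hit costs constant time to record, and old hits are discarded as the window moves, so the figure is always current without reprocessing the full history.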

Ready access

In a typical e-business, many people in many functional areas need ready access to the company’s Web intelligence. Although log file analysis reports can be made available throughout an organization on a network, intranet or Web page, this information is static and typically at least one day old. With HitBox, authorized users throughout the company can access up-to-the-minute information anytime, over virtually any Internet connection.

Security and reliability

Although most Web site owners attempt to maintain high levels of security and availability for their information systems, the resources required to ensure near-perfect security and availability are beyond the means of many site owners, which adds significant risk to an in-house log file analysis system. Moreover, when the owner experiences a problem with system availability, other IS functions, such as sales, accounting and MIS systems, may compete with a log file analysis system for limited resources. With HitBox, all information is gathered, processed, delivered and warehoused through WebSideStory’s state-of-the-art infrastructure, giving the site owner extremely high levels of security and availability and reducing contention when resources are limited.

IT resources

Hardware, software and staff

Log file analysis is typically performed using proprietary software running on the site owner’s system, and additional software may be required to view the resulting reports. In addition, log files for a large site can be very large: gigabytes for a single day. (For this reason, many Web hosting providers do not normally provide log files.) As a result, a company’s IT group can face significant cost and operational burden to:

§ Pay the Web hosting provider to generate log files.

§ Acquire the CPU and storage capacity to run the software and manage log files.

§ License, install and upgrade the analysis and viewing software.

§ Provide and train staff to run the analysis software and manage log files.

§ Train employees to use the viewing software.
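The “gigabytes for a single day” figure follows from simple multiplication: one log line per request (every page, image and other file requested), times the length of a line. The inputs below are hypothetical, chosen only to show the arithmetic:

```python
def daily_log_size_gb(requests_per_day, bytes_per_line):
    """Approximate size of one day's log file in gigabytes (10**9 bytes)."""
    return requests_per_day * bytes_per_line / 10**9


if __name__ == "__main__":
    # Hypothetical inputs: 10 million requests per day, about 200 bytes per log line.
    print(f"{daily_log_size_gb(10_000_000, 200):.1f} GB per day")
```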

In contrast, HitBox requires no additional hardware, software or staffing whatsoever. WebSideStory handles all data processing and storage, and the only tools the company’s users require are the familiar Web browsers already installed on their workstations. This enables the site owner to devote all its resources to its real business.

Implementation

For log file analysis, the analysis software (and possibly viewing software as well) needs to be installed and configured on the site owner’s system. In addition, IT staff and users need to be trained to operate and use the system. This can be a time-consuming and disruptive process. In contrast, to implement HitBox, the site owner simply inserts a small section of HitBox code into each page on the site. This process can be automated for fast implementation on large sites. Once this is done, people throughout the company can start using HitBox information almost immediately, typically within a day or two and with minimal impact on normal operations.
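For a site made of static HTML files, the automated insertion could look like the following sketch. It rests on stated assumptions: the tag snippet is a hypothetical placeholder (the real code would come from the analytics provider), pages are UTF-8 files, and the snippet goes just before the closing `</body>` tag, or at the end of the file if there is none. Pages that already contain the snippet are left alone, so the script can be rerun safely.

```python
from pathlib import Path

# Hypothetical placeholder; the real snippet would come from the analytics provider.
TAG_SNIPPET = "<!-- analytics tag -->\n"


def tag_page(html):
    """Insert the snippet before the last </body>; leave already-tagged pages alone."""
    if TAG_SNIPPET in html:
        return html
    index = html.lower().rfind("</body>")
    if index == -1:
        return html + TAG_SNIPPET  # no </body>: append at the end
    return html[:index] + TAG_SNIPPET + html[index:]


def tag_site(root):
    """Tag every .html file under `root`; return how many files were changed."""
    changed = 0
    for path in Path(root).rglob("*.html"):
        original = path.read_text(encoding="utf-8")
        updated = tag_page(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

A call such as `tag_site("/var/www/htdocs")` (a hypothetical document root) would process the whole tree in one pass.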

Expertise

Vendors of log file analysis software generally provide some level of support for their customers, but to use a log file analysis system effectively, the site owner must acquire and maintain the expertise to operate, maintain and troubleshoot the entire system. HitBox requires little expertise of this type: aside from the simple task of adding HitBox code to the site’s pages, virtually all HitBox tasks can be performed easily by any user familiar with a standard Web browser. This simplicity enables the company to focus on using its Web intelligence instead of obtaining it.

Site performance

Building log files requires a significant amount of CPU time and disk I/O on the Web server. In addition, analyzing log files, if done on the Web server, can consume a great deal of the server’s CPU time. This consumption of system resources can degrade site performance on a busy Web server. HitBox consumes none of these resources: since all data processing is performed by WebSideStory, the Web server’s entire capacity can be used to operate the site at peak performance.

System capacity

Log file analysis systems are typically offered at a few preset levels of capacity, constraining the site owner in terms of the traffic volume and number of users that the system handles. Upgrading the system to increase its capacity can be costly, and the site owner may end up paying for too much capacity rather than settling for too little. With the HitBox service, the service level can be easily scaled to the site owner’s needs, freeing the owner from the extra cost of upgrades or unused capacity and enabling the service to grow smoothly with the owner’s business.

Controlling cost

Log file analysis typically requires the site owner to acquire significant hardware and software at the outset, and to upgrade periodically going forward. This can impose significant IT costs, often occurring unpredictably. In contrast, HitBox is provided for a simple monthly fee, with no up-front investment. This enables the site owner to control and predict costs effectively.

Conclusion

It’s a familiar phenomenon in the Information Age: the appearance of new technology drives the development of new tools for using it. And as these tools evolve, early, limited techniques gradually give way to more sophisticated techniques with greater capabilities. This is just what’s happening in the field of Web analytics: log file analysis, a valuable but simplistic early technique for visitor traffic analysis, is being replaced by the HitBox client-based analysis technique pioneered by WebSideStory, a more advanced, more powerful solution. The HitBox technique is based on three key innovations: obtaining data from users’ browsers instead of Web server log files, processing and delivering traffic information in real time over any Internet connection, and providing these capabilities as a convenient outsourced service. By virtue of these innovations, HitBox provides better information than log file analysis, gives site owners faster, more reliable access to that information, and does it all more cost-effectively. Thanks to these advantages, HitBox’s audience-analysis technique promises to be the future of Internet intelligence.

WebSideStory is the world’s leading source of real-time Web analytics services for e-business. We offer a unique combination of technical and business capabilities: the innovative HitBox technology for real-time analysis of Internet traffic and e-commerce activity; a vast compilation of Internet usage data; and an expert service-provider business approach. We combine these capabilities in HitBox Enterprise, the industry’s most powerful high-volume Web analytics service, and a variety of related services, products and sites. Together, these offerings give e-businesses the Internet intelligence they need to enhance the effectiveness of their Web sites and maximize the return on their marketing investment.

