Data mining outside the firewall

Mine the Web

How do your company’s pricing models compare with those of competitors? Are customers making it to your site’s deep links or leaving shortly after visiting the home page?

In-depth answers to these questions can be found through mining the Web — that is, discovering and analyzing Web page content, descriptions found in Web documents, overall Web structure, and Web site usage and access patterns.

Web mining is an externally focused relative of business intelligence. Retrieving data outside the firewall can be done via agent technology, by tapping into Web site logs or by adding data retrieval methods into Web site applications. IT managers can turn to their existing data-mining tools to examine structured Web data and also use text-mining tools to examine unstructured Web data.

Eyeballs count. In setting up the Web-mining process, first define the business problem and the types of information desired. For example, with competition fierce for site visitors’ time and attention, link counts and page rankings affect the number of page views and, ultimately, revenue, so comparing your company’s Web site with others on those measures is a sensible place to start. This data can be uncovered by mining search engine data, either with text-mining tools or through a wrapper-based data-mining strategy.

Analyze page weighting within your company’s sector to see which companies are most effectively drawing visitors and achieving high search-engine ranking. Then examine the content, site structure, and page layout of high- and low-ranking companies. Finally, consider taking a broader view, analyzing the Web as a whole and examining those sites that are the most effective in terms of traffic and page rankings.

Likewise, analyzing the structure of your Web pages can yield useful insights. Using available tools, you can analyze the number of links into and out of various content. And usually, the more links, the more useful the content.
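
As a rough illustration of the link analysis described above, the following Python sketch counts links into and out of a small set of pages. It assumes the pages have already been fetched into a dictionary of URL to HTML text; the names LinkCollector and link_counts are hypothetical, and commercial tools will do considerably more.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in one HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_counts(pages):
    """pages maps URL -> HTML text (assumed already fetched).

    Returns two Counters: outbound links per page and inbound links per target.
    """
    outbound, inbound = Counter(), Counter()
    for url, html in pages.items():
        parser = LinkCollector()
        parser.feed(html)
        # Resolve relative links, drop #fragments, count each target once per page.
        targets = {urldefrag(urljoin(url, href)).url for href in parser.hrefs}
        targets.discard(url)  # ignore links from a page to itself
        outbound[url] = len(targets)
        for target in targets:
            inbound[target] += 1
    return outbound, inbound
```

Sorting the inbound counter from highest to lowest gives a first, crude ranking of which content the rest of the site points to most often.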

Looking inside. Do visitors to your site hit the main page, but seldom go any deeper? Access trends can pinpoint a site structure that may need to be redesigned to increase traffic. The same tools and techniques used to mine outside the firewall can reveal how customers interact with your site. Analysis of this information might lead you to provide precise content dynamically, choose tight or loose site structure, or opt for customized services, such as online customer representatives.

Web server logs can yield some of the information needed to perform usage and access analysis of your site. But additional data gathering with third-party tools or in-house scripting programs may be needed to capture enough elements to make the analysis useful.
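
To make the log analysis concrete, here is a minimal Python sketch that estimates what share of visitors never get past the home page. It assumes the server writes Common Log Format and treats each client address as one visitor, which is only an approximation; the function name home_only_share and the home_path parameter are hypothetical.

```python
import re
from collections import defaultdict

# Assumes Common Log Format:
#   host ident user [time] "METHOD path PROTOCOL" status size
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+) [^"]*" (\d{3}) \S+'
)


def home_only_share(log_lines, home_path="/"):
    """Return the fraction of visitors whose successful requests never left home_path."""
    paths_by_visitor = defaultdict(set)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        host, path, status = match.groups()
        if status.startswith("2"):  # count only successful responses
            paths_by_visitor[host].add(path.split("?", 1)[0])
    if not paths_by_visitor:
        return 0.0
    home_only = sum(1 for paths in paths_by_visitor.values() if paths == {home_path})
    return home_only / len(paths_by_visitor)
```

A raw log also records requests for images and style sheets, so in practice those entries would need to be filtered out first. That gap is one reason the extra data gathering mentioned above is often necessary.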

Inside or out? Data gathering for Web-content mining can be handled in-house, but a fair number of service providers can also tackle the task and may offer the capability of notifying you when content changes. You might consider using a service provider when large data sets are involved to reduce the overhead on your network when gathering data.
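
If you handle change notification in-house, one simple approach is to store a fingerprint of each page and compare it with a fresh copy on every pass. The Python sketch below assumes the page text has already been retrieved; fingerprint and changed_pages are hypothetical names, and a service provider may use quite different methods.

```python
import hashlib


def fingerprint(content):
    """Hash page text after collapsing whitespace, so reflowed markup is not flagged."""
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_pages(previous, current):
    """Return the URLs whose content is new or differs from the stored fingerprint.

    previous maps URL -> fingerprint saved on the last pass.
    current maps URL -> freshly retrieved page text.
    """
    return sorted(
        url for url, content in current.items()
        if previous.get(url) != fingerprint(content)
    )
```

Hashing keeps storage small, but it reports only that a page changed, not what changed. Tracking a competitor's price movements, for example, would still require extracting the specific fields.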

Quite a few commercial and open source tools exist to assist with Web-mining efforts. For example, NetGenesis from SPSS collects and analyzes Web data and transforms it into useful metrics, and QL2 Software’s WebQL includes a development interface, querying capabilities, and a deployment engine to extract the data needed.

Web mining extends data mining beyond the corporate walls. And including the Web in your mining strategy can improve your Web presence and increase your competitive intelligence.