How do your company’s pricing models compare with those of competitors? Are customers making it to your site’s deep links, or are
they leaving shortly after visiting the home page?
In-depth answers to these questions can be found through mining the Web — that is, discovering and analyzing Web page content,
descriptions found in Web documents, overall Web structure, and Web site usage and access patterns.
Web mining is an externally focused relative of business intelligence. Retrieving data from outside the firewall can be done with
agent technology, by tapping into Web site logs, or by building data retrieval methods into Web site applications. IT managers
can use their existing data-mining tools to examine structured Web data and text-mining tools to examine unstructured
Web data.
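The article does not name a particular text-mining tool. As a minimal sketch of the first step most of them perform, the code below turns an unstructured HTML page into a bag of term counts that can then be compared across sites. The function name `term_frequencies` and the tiny `STOP_WORDS` list are hypothetical illustrations; only Python's standard library is used.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


# Hypothetical, deliberately tiny stop-word list; real tools ship far larger ones.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "for", "in", "on", "is", "our"}


def term_frequencies(html: str) -> Counter:
    """Reduce an HTML page to lowercase term counts, minus stop words."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    words = re.findall(r"[a-z0-9]+", text)
    return Counter(w for w in words if w not in STOP_WORDS)
```

Running `term_frequencies` over your own pricing page and a competitor's, then comparing the most common terms, gives a crude first view of how each company frames its offer.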
Eyeballs count. In setting up the Web-mining process, first define the business problem and the types of information desired. For example,
competition for site visitors’ time and attention is fierce, and your Web site’s link counts and page rankings affect its number of page
views and, ultimately, revenue, so it pays to compare them with those of other companies’ sites. This data can be uncovered by mining
search engine data, either with text-mining tools or through a data-mining wrapper strategy.
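A wrapper is an extractor written for one specific page layout that pulls structured records out of semi-structured HTML. The sketch below assumes a hypothetical results page in which every hit is an `<a class="result" href="...">` element listed in rank order; a real search engine's markup differs and changes over time, so the selector logic would have to be adapted. `ResultWrapper` and `domain_ranks` are illustrative names only.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ResultWrapper(HTMLParser):
    """Wrapper for a HYPOTHETICAL results layout: each hit is <a class="result" href="...">."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "result" in classes and attrs.get("href"):
            self.urls.append(attrs["href"])  # document order = rank order (assumed)


def domain_ranks(html: str) -> dict:
    """Map each domain to its best (lowest) 1-based rank among the result links."""
    wrapper = ResultWrapper()
    wrapper.feed(html)
    wrapper.close()
    ranks = {}
    for rank, url in enumerate(wrapper.urls, start=1):
        domain = urlparse(url).netloc.lower()
        ranks.setdefault(domain, rank)  # keep only the first (best) position
    return ranks
```

Collecting `domain_ranks` for the same set of queries over time shows whether your site is gaining or losing position against the competitors in your sector.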
Analyze page weighting within your company’s sector to see which companies are most effectively drawing visitors and achieving
high search-engine ranking. Then examine the content, site structure, and page layout of high- and low-ranking companies.
Finally, consider taking a broader view: analyze the Web as a whole and examine the sites that are most effective at drawing
traffic and earning high page rankings.
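The article does not say how search engines weight pages. One well-known, published scheme is PageRank, in which a page's weight depends on the weights of the pages linking to it. The sketch below is a simplified power-iteration version of that idea, not any search engine's current ranking method; the link graph is a plain dictionary you would build from your own crawl, and the damping factor of 0.85 is the commonly cited default, assumed here.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Simplified PageRank by power iteration over {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page with no out-links: spread its weight evenly.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank
```

The weights always sum to 1, so pages (or whole sites, if you collapse URLs to domains first) can be compared directly.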
Likewise, analyzing the structure of your Web pages can yield useful insights. Using available tools, you can count the
links into and out of each page. As a rule, the more links a page has, the more useful its content is likely to be.
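Counting links is simple once a crawl has produced a link graph. This minimal sketch assumes the graph is a dictionary mapping each page to the pages it links to; `link_counts` is an illustrative name, and the choices to ignore self-links and count duplicate links only once are assumptions you may want to change.

```python
from collections import Counter


def link_counts(links: dict) -> dict:
    """Return {page: (inbound, outbound)} for a link graph {page: [targets]}.

    Duplicate links from the same page count once; self-links are ignored.
    """
    inbound = Counter()
    outbound = {}
    for page, targets in links.items():
        unique = set(targets) - {page}
        outbound[page] = len(unique)
        for t in unique:
            inbound[t] += 1
    pages = set(links) | set(inbound)
    return {p: (inbound[p], outbound.get(p, 0)) for p in pages}
```

Sorting the result by inbound count highlights the content the rest of the site points to most; pages with many out-links but few in-links may be hard for visitors to find.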
Looking inside. Do visitors to your site hit the main page but seldom go any deeper? Access trends can pinpoint a site structure that may
need to be redesigned to increase traffic. The same tools and techniques used to mine outside the firewall can reveal how
customers interact with your site. Analysis of this information might lead you to deliver targeted content dynamically, choose
a tight or loose site structure, or offer customized services, such as online customer representatives.
Web server logs can yield some of the information needed to perform usage and access analysis of your site. But additional
data gathering with third-party tools or in-house scripts may be needed to capture enough data elements to make the
analysis useful.
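As a minimal sketch of the "main page but no deeper" question, the code below reads lines in the NCSA/Apache Common Log Format and reports the share of client hosts whose successful page requests never went past the home page. The `HOME_PAGES` set and the `ASSET_SUFFIXES` filter are assumptions to adjust for your site, and treating each client host as one visitor is a rough approximation, which is exactly why extra data gathering may be needed.

```python
import re
from collections import defaultdict

# NCSA/Apache Common Log Format. The Combined Log Format appends referer and
# user-agent fields; this pattern matches the shared prefix and ignores them.
CLF = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Assumptions: adjust both to match your own site.
HOME_PAGES = {"/", "/index.html"}
ASSET_SUFFIXES = (".css", ".js", ".png", ".gif", ".jpg", ".ico")


def home_page_only_share(log_lines) -> float:
    """Fraction of client hosts whose successful page requests stayed on the home page.

    A host (usually an IP address) is only a rough stand-in for a visitor:
    proxies and shared addresses can merge several people into one host.
    """
    paths_by_host = defaultdict(set)
    for line in log_lines:
        m = CLF.match(line)
        if not m or not m.group("status").startswith("2"):
            continue  # skip malformed lines and non-2xx responses
        path = m.group("path").split("?", 1)[0]  # drop the query string
        if path.lower().endswith(ASSET_SUFFIXES):
            continue  # images, stylesheets and scripts are not page views
        paths_by_host[m.group("host")].add(path)
    if not paths_by_host:
        return 0.0
    shallow = sum(1 for paths in paths_by_host.values() if paths <= HOME_PAGES)
    return shallow / len(paths_by_host)
```

A high value suggests that the home page is not leading visitors into the rest of the site.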
Inside or out? Data gathering for Web-content mining can be handled in-house, but a fair number of service providers can also tackle the
task, and some can notify you when content changes. When large data sets are involved, consider using a service provider to
reduce the load that data gathering places on your network.
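The article does not say how providers detect content changes. One simple approach, assumed here, is to store a hash of each page and compare it on the next crawl. In this sketch `fingerprint` and `changed_pages` are hypothetical names; whitespace is collapsed before hashing so that reindentation alone does not count as a change, although pages carrying timestamps or rotating ads would still be reported every time.

```python
import hashlib
import re


def fingerprint(html: str) -> str:
    """SHA-256 of the page text with runs of whitespace collapsed to one space."""
    normalized = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_pages(previous: dict, current_pages: dict) -> list:
    """Compare a {url: html} snapshot with stored {url: fingerprint} values.

    Returns the URLs that are new or changed, and updates `previous` in place.
    """
    changed = []
    for url, html in current_pages.items():
        fp = fingerprint(html)
        if previous.get(url) != fp:
            changed.append(url)
            previous[url] = fp
    return changed
```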
Quite a few commercial and open-source tools exist to assist with Web-mining efforts. For example, NetGenesis from SPSS collects
and analyzes Web data and transforms it into useful metrics, and QL2 Software’s WebQL includes a development interface, querying
capabilities, and a deployment engine for extracting the data you need.
Web mining extends data mining beyond the corporate walls, and including the Web in your mining strategy can improve your
Web presence and sharpen your competitive intelligence.