April 14, 2006

Google Sitemaps provides better, fresher crawls of Web pages

Free beta service helps you find and fix problems that hurt your search rankings

Web analytics products such as ClickTracks and HitsLink indicate how sites fare in search rankings, but they do little to help fix poor-performing Web properties. Google Sitemaps fills that void.

Currently in beta, Google Sitemaps allows you to submit URLs to the Google index and to inform Google when those pages change. Having Google index deep pages certainly benefits many organizations; e-tailers, for example, could increase revenue because potential buyers would find items that might otherwise fall into an indexing crevasse.

Google Sitemaps is complemented by Sitemap Generator, a Python script for creating sitemap files, which contain the actual list of URLs in XML. Google also accepts sitemaps in OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) format or as plain-text lists of URLs.
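To show what such a file contains, here is a minimal Python sketch (not Sitemap Generator itself) that builds an XML sitemap with a `<loc>` and `<lastmod>` entry per page, and also produces the plain-text variant. It assumes the later sitemaps.org 0.9 namespace rather than the schema the beta used; the function names and example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Assumption: the sitemaps.org 0.9 namespace, which postdates the 2006 beta schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a <urlset> element from (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        # Absolute page URL; ElementTree escapes characters such as & for us.
        ET.SubElement(entry, "loc").text = url
        # <lastmod> is how a sitemap signals that a page has changed (YYYY-MM-DD).
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return urlset


def sitemap_bytes(pages):
    """Serialize the sitemap as UTF-8 XML with a declaration (Python 3.8+)."""
    return ET.tostring(build_sitemap(pages), encoding="utf-8", xml_declaration=True)


def text_sitemap(urls):
    """The plain-text alternative: one URL per line."""
    return "\n".join(urls) + "\n"


if __name__ == "__main__":
    pages = [
        ("http://www.example.com/", date(2006, 4, 10)),
        ("http://www.example.com/products/widget.html", date(2006, 4, 12)),
    ]
    print(sitemap_bytes(pages).decode("utf-8"))
```

Updating the `lastmod` date whenever a page changes, then resubmitting the file, is the hook a crawler can use to decide what to fetch again.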

Sitemap Generator performed well in my tests. It works by scanning URL lists, access logs, or directory paths. Moreover, the pages you submit can belong to traditional Web sites as well as to sites that serve content to mobile devices. Flexibility aside, Sitemap Generator runs on the Web server and must be controlled through a command-line interface, a scenario some IT departments will not allow. However, Google lists various utilities and code snippets (many free) that can be used to build sitemaps remotely.
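To illustrate the access-log approach, the sketch below pulls successfully served pages out of Common Log Format lines and turns them into absolute URLs. It is a hypothetical helper, not Sitemap Generator's actual code; the regular expression, the skipped static-file suffixes, and the base URL are all assumptions.

```python
import re
from urllib.parse import urljoin, urlsplit

# Matches the request and status code of a Common Log Format line, e.g.
# 127.0.0.1 - - [10/Apr/2006:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

# Assumption: images, stylesheets, and scripts do not belong in the sitemap.
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def urls_from_access_log(lines, base_url):
    """Collect unique, successfully served page URLs from access-log lines."""
    seen = set()
    urls = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # malformed line, or a method such as POST
        path, status = match.groups()
        if status != "200":
            continue  # skip redirects, errors, and not-modified responses
        if urlsplit(path).path.lower().endswith(STATIC_SUFFIXES):
            continue  # skip static assets
        url = urljoin(base_url, path)
        if url not in seen:  # keep first-seen order, drop duplicates
            seen.add(url)
            urls.append(url)
    return urls
```

The resulting list could then be fed to any sitemap writer; scanning a directory tree instead would only change where the paths come from.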

Sitemaps provides very helpful reports. I displayed the top queries on Google that returned pages from my site. Additionally, I saw how many of those results directed traffic to my site. Another report lists common words in your site content.

Equally helpful are troubleshooting reports, including the percentage of pages that were crawled successfully and a list of pages that Googlebot could not access. Turning to the robots.txt analysis, I quickly discovered that a section of my site was inadvertently blocked. Even better, you can simulate changes to the robots.txt file, see how Google's crawlers will react, and make sure there are no errors before changing the robots.txt file on your site.
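A similar what-if test can be approximated offline with Python's standard `urllib.robotparser` module. In this sketch the two robots.txt files, the URLs, and the `simulate()` helper are hypothetical. One caveat: Python's parser applies rules in file order, whereas Google documents a most-specific-match rule, so results can differ for files that mix Allow and Disallow lines.

```python
from urllib.robotparser import RobotFileParser


def simulate(robots_txt, urls, user_agent="Googlebot"):
    """Report, for each URL, whether robots_txt lets user_agent fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly; no network fetch
    return {url: parser.can_fetch(user_agent, url) for url in urls}


# Hypothetical live file that accidentally blocks the /products/ section.
LIVE = """User-agent: *
Disallow: /cgi-bin/
Disallow: /products/
"""

# Hypothetical draft fix: unblock /products/ but keep /cgi-bin/ private.
DRAFT = """User-agent: *
Disallow: /cgi-bin/
"""

if __name__ == "__main__":
    urls = [
        "http://www.example.com/products/widget.html",
        "http://www.example.com/cgi-bin/search",
    ]
    for name, text in (("live", LIVE), ("draft", DRAFT)):
        print(name, simulate(text, urls))
```

Comparing the two dictionaries shows which URLs a proposed edit would open up or close off before the file goes live.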

Similarly, Sitemaps’ error report lists each page along with the specific problem Googlebot encountered. Fixing these issues -- such as broken redirects -- can quickly increase your crawl coverage.

Google Sitemaps Beta is available for free.

Mike Heck is a contributing editor of the InfoWorld Test Center.
©1994-2010 Infoworld, Inc.