InfoWorld
Google Sitemaps provides better, fresher crawls of Web pages

Free beta service helps you find and fix problems that hurt your search rankings

By Mike Heck
April 14, 2006
 

Web analytics products such as ClickTracks and HitsLink indicate how sites fare in search rankings, but they do little to help fix poor-performing Web properties. Google Sitemaps fills that void.

Currently in beta, Google Sitemaps allows you to submit URLs to the Google index and inform Google when these pages change. Certainly, having Google index deep pages benefits many organizations; e-tailers, for example, could increase revenue because potential buyers find items that might ordinarily fall into an indexing crevasse.

Google Sitemaps is complemented by Sitemap Generator, a Python script for creating sitemap files. These contain the actual list of URLs in XML. Google also accepts sitemap files in OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) or text.
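To make the XML format concrete, here is a minimal sketch of building such a file by hand in Python. It is not Sitemap Generator's code. The namespace shown is the one from the later sitemaps.org 0.9 protocol, which is an assumption here; the beta reviewed may have used a different schema URL. The helper name and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the sitemaps.org 0.9 protocol. The beta service
# reviewed here may have expected a different schema URL.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a string for (url, last_modified) pairs.

    last_modified is a date string such as "2006-04-14", or None to omit it.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs, for illustration only.
sitemap_xml = build_sitemap([
    ("http://www.example.com/", "2006-04-14"),
    ("http://www.example.com/catalog/item?id=42&view=full", None),
])
print(sitemap_xml)
```

The optional lastmod value is one way a sitemap can signal that a page has changed since the last crawl.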

Sitemap Generator performed well in my tests. It works by scanning URL lists, access logs, or directory paths. Moreover, you can submit pages from traditional Web sites as well as from sites that serve content to mobile devices. Flexibility aside, Generator runs on the Web server and requires that you control it through a CLI -- a scenario some IT departments will not allow. However, Google lists various utilities and code snippets (many free) that could be used for building sitemaps remotely.
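The directory-path approach can be illustrated with a short Python sketch. This is not how Sitemap Generator itself is implemented; it only shows the general idea of mapping files under a document root to URLs. The function name, parameters, and default file extensions are assumptions made for illustration.

```python
import os
from urllib.parse import quote


def urls_from_directory(root_dir, base_url, extensions=(".html", ".htm")):
    """Walk root_dir and map each matching file to a URL under base_url."""
    urls = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            # Skip files that are not pages (images, scripts, and so on).
            if not name.lower().endswith(extensions):
                continue
            rel_path = os.path.relpath(os.path.join(dirpath, name), root_dir)
            # Use forward slashes and percent-encode unsafe characters.
            url_path = quote(rel_path.replace(os.sep, "/"))
            urls.append(base_url.rstrip("/") + "/" + url_path)
    return sorted(urls)
```

Calling `urls_from_directory` with a document root and a site address (both of which you would supply) returns a sorted list of page URLs, including pages in deep subdirectories, ready to be written into a sitemap file.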

Sitemaps provides very helpful reports. I displayed the top queries on Google that returned pages from my site. Additionally, I saw how many of those results directed traffic to my site. Another report lists common words in your site content.

Equally helpful are troubleshooting reports, including the percentage of pages that were crawled successfully and pages that the Googlebot could not access. Turning to the robots.txt analysis, I quickly discovered that a certain section of my site was inadvertently blocked. Even better: You can simulate changes to the robots.txt file, see how Google crawlers will react, and make sure there are no errors before changing the robots.txt file on your site.
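A similar check can be roughed out offline with Python's standard `urllib.robotparser` module. This sketch only approximates the idea: it uses Python's parser, not Google's, so results may differ from what Googlebot actually does. The rules, URLs, and the `blocked_urls` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Hypothetical current rules that inadvertently block a whole site section.
current_rules = """\
User-agent: *
Disallow: /products/
Disallow: /cgi-bin/
"""

# Proposed change, tested here before it is deployed to the live site.
proposed_rules = """\
User-agent: *
Disallow: /cgi-bin/
"""

urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/products/widget.html",
    "http://www.example.com/cgi-bin/search",
]

print("Blocked now:     ", blocked_urls(current_rules, urls))
print("Blocked after:   ", blocked_urls(proposed_rules, urls))
```

Comparing the two lists shows which pages the proposed file would open to crawlers and which remain blocked.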

Similarly, Sitemaps’ error report lists each affected page and the specific problem Googlebot encountered there. Fixing these issues -- such as broken redirects -- can quickly increase your crawl coverage.

Google Sitemaps Beta is available for free.





 


 
Mike Heck is a contributing editor for the InfoWorld Test Center.
 


