Web analytics products such as ClickTracks and HitsLink indicate how sites fare in search rankings, but they do little to help fix poor-performing Web properties. Google Sitemaps fills that void.
Currently in beta, Google Sitemaps allows you to submit URLs to the Google index and inform Google when these pages change. Certainly, having Google index deep pages benefits many organizations; e-tailers, for example, could increase revenue because potential buyers find items that might ordinarily fall into an indexing crevasse.
Google Sitemaps is complemented by Sitemap Generator, a Python script for creating sitemap files. These contain the actual list of URLs in XML. Google also accepts sitemap files in OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) or text.
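To give a sense of what such an XML file holds, here is a minimal Python sketch that builds one from a list of URLs. It is not Sitemap Generator's own code. The function name `build_sitemap` and the example.com URLs are placeholders, and the namespace shown is the one from the sitemaps.org 0.9 protocol, which may differ from the schema the beta expects.

```python
import xml.etree.ElementTree as ET

# Assumption: sitemaps.org 0.9 namespace; the beta's schema may differ.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for an iterable of (url, lastmod) pairs.

    lastmod is a date string such as "2006-01-31", or None to omit it.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        ("http://www.example.com/", "2006-01-31"),
        ("http://www.example.com/catalog?item=12&color=red", None),
    ]))
```

Each `url` entry carries the page address in `loc`; the optional `lastmod` value is how a sitemap tells the crawler that a page has changed.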
Sitemap Generator performed well in my tests. It works by scanning URL lists, access logs, or directory paths. Moreover, pages you submit can be for traditional Web sites plus those that serve content for mobile devices. Flexibility aside, Generator runs on the Web server and requires that you control it through a CLI -- a scenario some IT departments will not allow. However, Google lists various utilities and code snippets (many free) that could be used for building sitemaps remotely.
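As a rough illustration of the access-log approach, the Python sketch below pulls successfully served paths out of log lines in the common Apache log format. It is an assumption about how such scanning could work, not a description of Sitemap Generator's internals; the function name `urls_from_access_log` and the sample log lines are hypothetical.

```python
import re

# Matches the request and status fields of a common-log-format line,
# e.g. "GET /index.html HTTP/1.0" 200
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b'
)


def urls_from_access_log(lines, base_url):
    """Return unique URLs, in first-seen order, that were served with status 200."""
    found = {}
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "200":
            found[base_url.rstrip("/") + match.group("path")] = True
    return list(found)


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [10/Oct/2005:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
        '10.0.0.2 - - [10/Oct/2005:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 512',
    ]
    print(urls_from_access_log(sample, "http://www.example.com"))
```

Keeping only status 200 responses avoids listing pages that visitors requested but the server could not deliver.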
Sitemaps provides very helpful reports. I displayed the top queries on Google that returned pages from my site. Additionally, I saw how many of those results directed traffic to my site. Another report lists common words in your site content.
Equally helpful are troubleshooting reports, including the percentage of pages that were crawled successfully and pages that the Googlebot could not access. Turning to the robots.txt analysis, I quickly discovered that a certain section of my site was inadvertently blocked. Even better: You can simulate changes to the robots.txt file, see how Google crawlers will react, and make sure there are no errors before changing the robots.txt file on your site.
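A similar dry run can be approximated offline. The Python sketch below uses the standard library's `urllib.robotparser` to check a draft robots.txt against a list of URLs before the file goes live. The helper name `blocked_urls`, the draft rules, and the URLs are made up for illustration, and Python's parser may not interpret every rule the way Googlebot does (wildcard patterns, for example), so treat the result as a first check only.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    draft = "User-agent: *\nDisallow: /private/\n"
    pages = [
        "http://www.example.com/private/report.html",
        "http://www.example.com/products/widget.html",
    ]
    for url in blocked_urls(draft, pages):
        print("blocked:", url)
```

Running the check against the pages you most want indexed is a quick way to catch a section that a draft rule would block by accident.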
Similarly, Sitemaps’ error report lists pages and the specific problem Googlebot encountered. Fixing these issues -- such as broken redirects -- can quickly increase your crawl coverage.
Google Sitemaps Beta is available for free.
