These days, despite near-universal acclaim for the technology, I have a real love/hate relationship with RSS. The love part of the relationship derives from the profound changes in my information production and consumption habits during the past year and a half. During that time, I’ve been blogging and producing content with RSS. Whereas my e-mail client, MS Word, and Google used to rule my desktop, I now find myself using Bloglines, Feedster, and Technorati throughout the day and writing to my internal and external blogs using ecto. Although the plumbing is quite simple, I’m still fascinated by all the background pinging (as new Weblog content is posted) and the real-time indexing of fresh content. When Dave Sifry at Technorati reports that the median time from Weblog content posting until that content is available for search on Technorati is seven minutes, I see a paradigm shifting. Despite “only” being XML, RSS is the driving force fulfilling the Web’s original promise: making the Web useful in an exciting, real-time way.
For all these reasons, I’ve been a strong advocate of RSS within InfoWorld, and will continue to be. We have several Weblogs and dozens of RSS feeds on our site, and judging from our server logs, our readers are adopting the feeds at a rapid rate -- and that’s where the hate part of the relationship begins. Beneath the widespread adoration, communications between client and server in RSS implementations are often pretty dumb, and attention to scaling issues that arise when serving RSS has been scant. The situation with RSS reminds me of the early days of the Web, when the excitement of simply having a Web site obscured concerns about scaling the site, and the user base was small enough that you rarely got called on your lack of attention to scaling issues.
Several months ago, I spoke to a Web architect at a large media site and asked why his site didn’t support RSS. He raised the concern that thousands (or even millions) of dumb clients could wreak havoc on a popular Web site. Back when I was at CNN.com, I recall that our servers got needlessly pounded by a dumb client (IE4) requesting RSS-like CDF files at frequent intervals regardless of whether they had changed. As the popularity of RSS feeds at InfoWorld started to surge, I began to notice that most of the RSS clients out there requested and downloaded our feeds regardless of whether the feeds themselves had changed. At the time, we hadn’t quite reached the RSS tipping point, so I filed these thoughts away for later -- but “later” came sooner than I thought.
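To make the waste concrete, here is a minimal sketch of what a better-behaved client might do instead: an HTTP conditional GET, in which the client sends back the validators it saw last time and the server answers 304 Not Modified with no body if nothing has changed. This is an illustration rather than a description of any particular newsreader. The function names `conditional_headers` and `fetch_feed` are hypothetical, and the sketch assumes the server actually emits an `ETag` or `Last-Modified` header on its feeds.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers for a conditional GET from cached validators."""
    headers = {}
    if etag:
        # Ask the server to skip the body if the entity tag still matches.
        headers["If-None-Match"] = etag
    if last_modified:
        # Ask the server to skip the body if the feed has not changed since then.
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None, timeout=10):
    """Fetch a feed only if it changed (hypothetical helper).

    Returns (body, etag, last_modified). body is None when the server
    replies 304 Not Modified, in which case the old validators are kept.
    """
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; for us it means "nothing new".
        if err.code == 304:
            return None, etag, last_modified
        raise
```

A client would store the returned `etag` and `last_modified` alongside its cached copy of the feed and pass them in on the next poll. The request still reaches the server, but an unchanged feed then costs a few hundred bytes of headers rather than the full document.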
Fast forwarding to the present, InfoWorld.com now sees a massive surge of RSS newsreader activity at the top of every hour, presumably because most people configure their newsreaders to wake up at that time to pull their feeds. If I didn’t know how RSS worked, I would think we were being slammed by a bunch of zombies sitting on compromised home PCs. Our hourly RSS surge has all the characteristics of a distributed DoS attack, and although the requests are legitimate and small, the sheer number of requests in that short time period creates some aggravating scaling issues. These issues aren’t enough to make me want to abandon RSS (in fact, I’ll keep pushing it), but its workings can create operational annoyances. If RSS is going to go from fairly big to absolutely huge, we’re all going to need to do a little more work on the plumbing.
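As one example of that plumbing work, the top-of-the-hour spike could in principle be flattened on the client side by giving each newsreader its own slot within the hour instead of having everyone wake up at :00. The sketch below is only an assumption about how a client might do this, not something any existing reader is known to implement. The names `poll_offset_seconds` and `seconds_until_next_poll` and the idea of a per-installation `client_id` are hypothetical.

```python
import hashlib


def poll_offset_seconds(client_id, interval_seconds=3600):
    """Map a client identifier to a stable offset within the polling interval.

    Hashing spreads clients roughly evenly across the interval, and the
    same client always lands on the same slot.
    """
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % interval_seconds


def seconds_until_next_poll(now_epoch, client_id, interval_seconds=3600):
    """Return how long to sleep until this client's next slot.

    now_epoch is the current time in whole seconds since the epoch.
    The result is 0 when now_epoch falls exactly on the client's slot.
    """
    offset = poll_offset_seconds(client_id, interval_seconds)
    position = now_epoch % interval_seconds
    return (offset - position) % interval_seconds
```

A reader would call `seconds_until_next_poll(int(time.time()), client_id)` after each fetch and sleep for that long. The polling frequency stays at once per interval, but the requests arrive spread across the whole hour rather than bunched into its first few seconds.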