August 04, 2009

Critical Windows 7 bug risks derailing product launch

An apparent fatal flaw in the NTFS driver stack may bring Microsoft's impending Windows 7 victory parade to a grinding halt

Oh boy! It appears that Microsoft’s glowing track record with Windows 7 is about to come to an abrupt and unceremonious end. According to various Web sources, the RTM build 7600.16385 includes a potentially fatal bug that, once triggered, could bring down the entire OS in a matter of seconds.

The bug in question -- a massive memory leak involving the chkdsk.exe utility -- appears when you attempt to run the program against a secondary (that is, not the boot partition) hard disk using the "/r" (read and verify all file data) parameter. The problem affects both 32- and 64-bit versions of Windows 7 and is classified as a "showstopper" in that it can cause the OS to crash (Blue Screen of Death) as it runs out of physical memory.


I tested for the bug against three different Windows 7 OS configurations on two different hardware platforms: an Intel Atom-based netbook running the 32-bit version, an Intel Core 2 Duo notebook running the 64-bit version, and a VMware Workstation 6.5.2 virtual machine running the 32-bit version.

In each case, the utility executed the first three stages of the test correctly using modest amounts of memory (several hundred megabytes). Then, when it entered the fourth stage (a read test), the chkdsk.exe utility's memory consumption started to climb rapidly until several gigabytes had been allocated to its process and the test systems in question began to run out of memory.

Note: I did not succeed in causing the systems to “blue screen” as others have reported. However, I did observe chkdsk.exe consume up to 90 percent of the available physical memory on a 2GB VMware virtual machine. After that, the utility appeared to hang while all other operations in the OS slowed to a crawl for lack of RAM.

ctryon 5-Aug-09 9:01am
OK, I'm waiting for all the rabid Randall Bashers to start throwing darts and rotten tomatoes...
mkleinpaste 5-Aug-09 9:19am
"OK, I'm waiting for all the rabid Randall Bashers to start throwing darts and rotten tomatoes..." I'm in! SPLAT! Mwah ha ha ha ha ha ha haaaaaaa! I knew it was too good to be true! I knew it! I said, "Nope. This time they'll get it right. Windows 7 is looking great and amazingly stable in comparison."
mkleinpaste 5-Aug-09 9:22am
But, noooooo. They had to go and dash my dreams of Windows without problems. I should have known this 3rd rate company wouldn't even be able to produce a 3rd rate product.
de-void 5-Aug-09 9:36am
This is a bug that actually lies on both sides of the fence. In versions of Windows prior to Win7, ChkDsk creates a small buffer and caches little data, so it thrashes the living daylights out of your HDD while it scans and repairs it. This results in scans of today's 1TB+ hard drives often taking DAYS to complete! In Win7, ChkDsk caches disk data structures created and returned by the chipset driver. Alas, the bug is that some drivers don't keep track of how much memory they're allocating and basically starve the machine to death. Rest assured that this will be fixed with some urgency! ;)
buggybugbug 5-Aug-09 9:42am
1 reply
Well, it's a bad bug, but "derail" and "grinding halt"? Ehhh. So they patch it and it's done. It's not like this is the first time a bug has been found in an RTM build. There's probably a hotfix mechanism for OEMs so that this never even hits end customers, and if there isn't it's the first update on WU. As for the smarmy "Microsoft product" comment, find me a Linux or OSX release that is completely bug free and I will gladly eat my hat.
gunner@gulftel.com 5-Aug-09 1:39pm
I agree with you 100%. Sounds like a problem that could be fixed via Windows Updates... Maybe they could slipstream a fix... no dealbreaker to me, and I run chkdsk on drives several times a week in my work. One of the fallacies about Linux is that it's "bugfree." Heck, with Redhat I was getting DOZENS of updates to the various packages every week (tho, admittedly many of those "updates" were actually version upgrades). Still, I prefer Linux myself.
chrisjmiller 5-Aug-09 9:42am
Am I alone in thinking this sounds like a non-event? Assuming a patch is available by October, the first thing to do after you install W7 is to run MS Update - ChkDsk is not a utility that I run on new builds. After all, does anyone imagine that no new 'critical' bugs will be found in (e.g.) Internet Explorer between now and October?
SpeedMan 5-Aug-09 9:51am
Okay, so there's a bug. It sounds like the problem is all of one line of code, a missing free statement. Inconvenient to everybody to be sure. Sounds like they should do a new RTM build. Which will put everybody behind schedule. But, doesn't sound like time to jump on the "Windows 7 sucks" bandwagon yet.
MAS 5-Aug-09 10:06am
God only knows that this type of problem hasn't happened before with any other operating system.

Fix it. Re-RTM it. Done.

Better now than later!

DaveN 5-Aug-09 10:11am
Assuming this is really an MS bug, they'll just fix it in normal servicing. During the OS install when it offers to check for updates, this will be one of the fixes, so when the OS comes up for the first time, it'll already be fixed.
FUDyou 5-Aug-09 10:13am
Wow nice FUD Randall! You should be banned from using computers.
seannerd 5-Aug-09 10:44am
What is this - the Fox News of the IT world? Your sensational headline is unwarranted. I don't even like Microsoft, but telling us it will derail the launch, or that you know exactly what the flaw is (I suspect you don't), is disingenuous. It is news - it is likely a bug, but why not report the bug, and leave the speculation to the reader? This isn't reporting, it is editorializing. Bad form. - Sean
FGS 5-Aug-09 11:19am
Who cares about MS? The people who made MS were people like me, the geeks who popularized PCs. And all of us have jumped ship to either Linux or Mac. The question is, will everyone else follow us again? Duh, course they will - otherwise we'll just refuse to sort their computers out for them anymore. So I guess it is down to economics after all - get Linux or OS X and have a nice carefree life, or take your PC down to PC World every five minutes to clear the latest virus off in exchange for a nice fat bill. And then there's the fact that all of these netbooks we're giving to the third world have Linux on them. Why would a recipient want to then spend 200 quid for an inferior OS with a seriously deficient immune system? Do you want to buy a problem or some medicine for your sick mum? So in 10 years' time, half the world will be using Linux by default. And even without that, it's not stock figures or bug reports you want to look at, it's the general internet forums. They're full of people saying "I switched to Linux/OS X and I've never looked back....." Talk about needing to pull something out of the hat, MS is in the position of a philandering husband who's burnt all his bridges, trying to persuade his wife that he's a changed man - maybe he is, but the wife doesn't give a FF.
Gray_Hair 5-Aug-09 11:27am

As much as I love to bash M$oft this does strike me as the ideal time to find this kind of bug. However, I doubt it is as simple as speedman thinks, M$oft has a habit of building ugly architectures that encourage their coders to rely on external code that is often complex enough that unanticipated side-effects eat their lunch. Or as in this case, left at the mercy of behaviors in code of completely unknown vetting, since a non-M$ofty wrote it.

Still it is just another RTM cycle and they have a huge code-base from which to cobble something together. With a little mid-night oil, I doubt it will even have much schedule impact.

BigRonG 5-Aug-09 12:48pm
Perhaps this is related to the IE 8 bug that randomly strikes me. Some process (never seems to be the same one) develops a memory leak leading to slow down of the excruciating kind. When the product is killed using its own 'end' mechanism, the leak process continues. Sometimes it can be killed with the process manager and sometimes it requires a reboot. The processes that I have noticed causing this issue are games, websites, and communication programs (VPNs, etc.). IE7 also appeared to have this fault but it appeared less regularly.
WrldBFree 5-Aug-09 1:00pm
I used to think people were too hard on Randall, but articles like this just give them justification. 1. He blames the NTFS driver stack, presumably because he likes to sound smart, but he has no knowledge of how the NTFS driver stack works or the code behind it, so how you jump to the conclusion of the NTFS driver stack, I don't know. 2. He clearly states he can't get any machine to actually crash, just consume large amounts of memory. Running chkdsk on large drives is a very time-consuming process and could take days on XP/2003/2008/Vista if you have TBs of data. Did you ever think this may be by design, that it caches a lot of the file structures to improve performance, since you should only run chkdsk /r if you need to actually repair the file system and you may not want to wait a day until you can use your drive again? You probably would conclude that SQL and Exchange have "memory leaks" because they also are designed to consume large amounts of memory to improve database performance. Also, all that has been discovered is a symptom, which is a "potential" bug. Until someone looks at the code and verifies whether it is doing this because of an actual bug or because it was the intended purpose, it is not a bug. Especially since the majority who have tried can only get it to consume large amounts of memory but not actually crash, including myself. It also stops consuming additional memory when it reaches 1.1GB of 2GB on my machine, and I still have 10% memory available and my system runs just fine while it is running. SO I REPEAT: THIS COULD VERY WELL BE BY DESIGN TO SPEED UP THE PROCESS OF RECOVERING A CORRUPTED FILE SYSTEM. But of course to sensationalists, it's a "show stopper". Anyone with any credibility would wait for the investigation to take place and do ACTUAL DEBUGGING against the source code, also I'm sure something Randall has never done. So why not wait and find out, instead of jumping the gun for a news story and looking like a moron?
rcprimak 5-Aug-09 1:32pm
WrldBFree makes some very good points. However, I have recently (in the past week or so) read an IDG News report that Intel has admitted to a flaw in their firmware coding and intends to fix it ASAP. This may not have anything to do with the massive amounts of memory usage when running Win-7 Chkdisk on a big hard drive, but one never knows -- at least I do not know.

What I do know is that massive resource hogging is fairly typical of Microsoft Windows 64 bit operating systems. A friend of mine has Vista-64 running on a laptop, and the amount of disk usage grew massively for a month or so, then started going down and up in what appeared to be a cycle. In a shorter-term cycle (per session), Physical Memory usage follows a similar pattern, even without the sort of caching several commenters have suggested.

Whether Intel or Microsoft has something to patch, it all can be done in plenty of time for the October release date for Windows 7. For myself, I'd wait a few months and find out who screams the loudest about which "features", aka bugs, in the new OS version. Then I'd buy in the second round of sales promotions.

Of course, that's for consumer-level buyers. A corporate IT Department will have to be more careful and wait longer, perhaps as Randall says, for Service Pack 1 or even SP2.

ideabrdg 5-Aug-09 3:29pm
1 reply
For a well-researched and reasoned perspective read this: A killer Windows 7 bug? Sorry, no http://blogs.zdnet.com/Bott/?p=1235
rcprimak 7-Aug-09 2:28pm

I saw that post as a link provided by Woody Leonhard in his Askwoody.com Windows patch watch blog. Yes, Randall has overstated what is really a highly technical aspect of chkdsk /r. Randall says that he runs this command for every drive he is about to commit to any critical task, before actually trying to use the disk. That may be best practice for an IT development shop, but most business, IT and home users will never have any reason to use this parameter from the command line. The Windows 7 sky is not falling. And the memory usage is not a memory leak -- it is deliberate and limited, always designed to leave the system with at least 50 MB of RAM overhead, which usually prevents the BSOD crash.

If you want to experience a true Memory Leak, try running the freeware antispyware product Super Antispyware. Run its updater on a fully-patched Windows XP Pro SP3 computer with limited RAM and a single-core processor, running IE8 fully patched. What results is a kernel driver memory leak and a true BSOD. At least that's what the Microsoft crash report output page says.

JimAchuff 6-Aug-09 5:35am
No question that this is a serious problem, but I’m guessing that Microsoft will have a fix for this within a week and it will become a non-issue in no time. Jim Achuff http://virtualizationexchange.blogspot.com/
britkit10 6-Aug-09 10:08am
Microsoft's blue screen of death never ceases to come back and haunt...even if they do fix this problem, it doesn't mean that their other problems will magically disappear also. I can't wait to see how many people will need data recovery despite having Windows 7.
RobMark 6-Aug-09 10:33am
This is a cr@p article! 99.9999% of the people out there will not be running this utility against an external drive before MS has a fix for it! Randall Kennedy: You must like the role of CHICKEN LITTLE! There is no way MS would think about delaying the RTM!
marob 6-Aug-09 11:03am
Randall - I suggest you read http://en.wikipedia.org/wiki/Memory_leak. The "leak" you reported is a diagnostic tool legitimately using memory which it deallocates properly when finished. Calling it a leak is pure sensationalism on your part. And think about the purpose of chkdsk. It's a utility that does a deep sector scan and repair of your hard drive. I'm sure Microsoft didn't build the chkdsk /R feature with the intention of users running it while multitasking. SQL Server can allocate a similar amount system memory during normal startup. Hope you don't write an article about that nasty "ship-stopper" next.
WrldBFree 6-Aug-09 11:52am
I wonder if because it is "blog" it does not require any integrity, since it is an article of opinion and not fact. I would imagine an article like this printed in say the New York times would require a retraction. Randall should we expect a retraction for your false claims?
ticedoff8 6-Aug-09 12:21pm
1 reply
This is pretty funny. What I hear from 90% of these posts is MS apologists. It's like arguing about the color of the bus that ran you over; it misses the point - you were run over by a bus - again. If chkdsk causes a system to run out of memory, how many other utilities cause a similar problem? Not just the disk utilities, what about any other program that has a problem? Windows has no concept of self-preservation.
WrldBFree 6-Aug-09 12:47pm
1 reply
ticedoff8, you are missing the point: you run chkdsk /r because you have a corrupted disk/file system... it is using a lot of memory, per what Steven Sinofsky said, by design to speed up the process of REPAIRING a damaged/corrupted disk/filesystem. What you are also missing here is that your system does NOT run out of memory, and as other applications demand memory chkdsk may release some of the memory it is using back to the O/S for other applications. You can run 2 or 3 chkdsk /r on different disks at the same time, as some have tested, and you never run out of memory.... The BSOD and the system freezing/crashing is not happening for everyone who has tested this, outside of a select handful who said it did.
ticedoff8 6-Aug-09 1:05pm
2 replies
Respectfully, no, it wasn't I that missed the point. It doesn't matter if it's chkdsk, xcopy or the latest version of "Halo" - the point is that Windows can be crashed by a rogue / insane app. That doesn't seem right. MS has had 25+ years "to get it right" (starting with MS-DOS) and Windows 7 still seems to miss that important piece of functionality. And, it seems, there are a lot of people who don't care. More importantly, they will argue forever why it isn't important - which, also, doesn't seem right.
chrisjmiller 6-Aug-09 1:57pm
1 reply
Please tell us of an OS that can't "be crashed by a rogue /insane app". Here's a clue - no operating system yet written falls into this category and Alan Turing's work from 70 years ago strongly suggests it is not possible. But if you doubt this, feel free to write your own OS that will be guaranteed crash-proof. You'll be a billionaire by the time you're 21, which by my guess will be in about 7 years time.
fushigi 10-Aug-09 5:15am
"Please tell us of an OS that can't "be crashed by a rogue /insane app"."

IBM i (formerly OS/400) and z/OS for IBM's Power systems and Mainframe. That's two operating systems for you that aren't going to be brought down by renegade apps. They're server-class OSes only, but I point them out to note that the capability of an OS to maintain stability and availability can be achieved.

WrldBFree 6-Aug-09 2:08pm
ticedoff8 1. You missed the point again: chkdsk DOES NOT CRASH THE SYSTEM. (Please don't miss this point again.) 2. User-mode code (which is where apps run) cannot crash the O/S, only code that runs in the kernel (e.g. drivers). Also, you need to read up on Kernel Patch Protection and a number of other security and validity checks that the kernel implements if you are going to have an intellectual discussion on what Windows can and cannot do in terms of code integrity, exception handling and so on....
tcapun 6-Aug-09 4:27pm
1 reply
I'm only guessing but with 40 years of programming experience behind that guess, I'd say it took an MS programmer less time to locate that bug and provide a fix for it than it took Randall to write this story.

BUT the bigger issue is how poor is MS Qual checking that such a bug could escape to the public domain?

AND, most importantly, where there is smoke, there is fire, so HOW BIG is the iceberg under this tip?
honeymonster 9-Aug-09 11:57pm
There is no bug. See my comment below. MS QA seems to be right where it should be.
MrChen 7-Aug-09 3:57am
Oh no, half of California is going to fall into the ocean!
unclen00b 7-Aug-09 7:33am
I came all the way over here just to register and post this comment. This article is garbage. This guy is clearly a sensationalist and not a technical expert. For those who enjoy "tabloid" style articles, by all means continue to read his crap. YOU sir, are a MO-RON!
Steve_Hauck 8-Aug-09 9:11am
1 reply
So.. What you are saying is that with a second physical hard drive, I would need to run chkdsk.exe E: /r (E: is my secondary drive)? If I NEVER run that command line everything will work fine. I am willing to bet that 99.9% of the people, including techs, will not and have never run chkdsk on a secondary drive.. Windows 7 works GREAT!!! Get a grip pal, you are an idiot...
honeymonster 10-Aug-09 12:01am

Even if you run that command, with that option, against a second physical hard drive, you (and your system) will be fine.

chkdsk.exe performs exactly like it is supposed to. While it uses a lot of RAM it does so to speed up the repair. And it has been designed to not use *all* of the RAM, only the available, physical RAM minus some 50MB. It respects already allocated RAM, and the system remains responsive during the operation.

See my comment below for more info.

honeymonster 9-Aug-09 11:28pm

Please everyone update your info on this issue before commenting.

There is no bug in chkdsk.exe. There is no memory leak. chkdsk.exe does take up a lot of memory when run with the /R option against a non-system partition/drive. This is a deliberate design decision to help speed up the repair process.

Per Microsoft, chkdsk.exe will allocate as much as possible of available physical memory, leaving at least 50MB of physical memory free during the operation with the /R option.

In other words, chkdsk will not allocate memory without bound. Rather, it will measure how much physical memory is available at the time of launch and allocate memory based on that measure. This means that even if the command is invoked on a running server, it will not cause the server to start thrashing, as it will respect the memory allocations already made - and then some.
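
The sizing rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not chkdsk's actual code: the function name `read_cache_budget` and the sample memory figures are hypothetical, and the 50MB reserve is simply the figure attributed to Microsoft above.

```python
MB = 1024 * 1024


def read_cache_budget(available_bytes: int, reserve_bytes: int = 50 * MB) -> int:
    """Return how many bytes a tool could claim while leaving the reserve free.

    Hypothetical helper: it models "available physical memory at launch,
    minus a fixed reserve" and never returns a negative number.
    """
    return max(available_bytes - reserve_bytes, 0)


# Launch on an idle machine with (say) 1800MB of physical memory available.
idle_budget = read_cache_budget(1800 * MB)

# Launch after other applications have already taken most of the RAM.
busy_budget = read_cache_budget(600 * MB)

print(idle_budget // MB, busy_budget // MB)  # prints "1750 550"
```

The point of the sketch is that the budget is fixed from a measurement taken at launch, so a busier machine yields a smaller allocation rather than an ever-growing one.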

I have confirmed this on my own system. It's simple and anyone can run the experiment: First run chkdsk /r on a non-system disk with no other apps started. Use Resource Monitor to see how RAM allocations ramp up. When they level out, take a note of how much RAM is used by chkdsk.exe. When the process completes, repeat the experiment, but this time start a lot of memory-hungry applications and let them allocate memory before launching chkdsk.exe /r. Note how chkdsk.exe uses less memory during this second run. The memory allocated by chkdsk.exe clearly depends on how much RAM is available at the time of launch.

Incidentally, my computer remained responsive during both tests. I was able to launch new programs, open Word, take screenshots (many), paste them into Word etc. While I could tell that the system was working, it did not feel sluggish at all.

Now, the crash (BSOD) was reported by a single user who mistakenly assumed that it was connected to the operation of chkdsk.exe (because it happened while he was running chkdsk.exe). This has now been determined to be a chipset driver issue. Said user has updated drivers from motherboard manufacturer and has reported back that the crash issue was solved.

In conclusion: There is no bug in chkdsk.exe. There is no bug in the NTFS driver stack. chkdsk.exe has been optimized to finish ASAP by allocating as much memory as possible with as minimal impact as possible on other running processes. There may be a chipset driver issue in an earlier version of a 3rd party driver. This issue is not found in current drivers.

Now, what remains to be asked is this:

Even if there had been a "massive memory leak" as originally reported, how can anyone claim that a memory leak in a rarely used utility (chkdsk.exe), only in a more rarely used option, only in an even more rarely occurring scenario (repairing a non-system volume on a multi-volume system) amounts to a "showstopper" bug which risks derailing the Windows 7 launch? It is hilarious! Look at who made that claim and his posting history. Not to mention his totally unsubstantiated assertion about a bug in the "NTFS driver stack". WTF? Mr. Randall C. Kennedy is either grossly incompetent or an extremely cynical professional troll. Either way, FAIL!

Loerps 17-Aug-09 6:40am
mkleinpaste, I'm with you. I'd like to throw a few of each at Randall at this point. honeymonster, thank you for clarifying the situation. This entire blog would actually be kind of funny if it weren't so sad to see that some folks are willing to follow Randall right off the edge of a cliff. How clearly it demonstrated the difficulty of clearing the garbage after someone dumps a negative comment on a company, in this case Microsoft.


©1994-2009 Infoworld, Inc.