What started as a startling first-hand account on myce.com about OneDrive for Business modifying files has spread widely and sometimes wildly. While it's true that OneDrive for Business modifies Office files as it syncs -- and modifying files can cause no end of headaches for, say, forensic investigations -- it's also true that SkyDrive Pro (now known as OneDrive for Business) has been blithely changing the Office files you submit for many years. It's an old problem that's come home to roost.
Here's how Sean Byrne at myce.com summarizes his findings:
Word, Excel, and Publisher files (.docx, .xlsx, and .pub file extensions) ... grew by about 8KB ... these Microsoft Office files had what appears to be uniquely identifiable code added, potentially making it possible to match them to a company and possibly even to a specific user's account. ... Even though OneDrive for Business modified these files, it left the Date Modified attribute in every file unchanged, so to an unsuspecting user who just checks when the files were modified, they appear untouched. For example, the Word file shows a modified time of 16:14:14 for both the original and synced file, even though the file sizes are clearly different. The only files that remain untouched are those that were placed in the synced folder on the original computer, so even if a user checks the files they place in a synced folder, they would not know anything is being modified unless they physically took those files to another computer with the matching synced folder to compare them ... we found that the consumer version of OneDrive (formerly SkyDrive) does not appear to modify any files, whether synced with the desktop product or through the Web interface. We also tested BitTorrent Sync and found that it does not modify any files either, even when testing a 1GB folder with a wide range of file types.
If you aren't expecting it, that kind of behavior can be very disconcerting -- especially the part about Microsoft modifying files without warning and without changing the modify date/time.
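Because the Date Modified attribute stays the same, the only reliable check is to compare contents rather than timestamps. The following is a minimal Python sketch of that idea, not the method Byrne used: it fingerprints each file by size and SHA-256 hash and flags the case where the contents differ while the modification times match. The function names and the file paths in the usage note are hypothetical.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def silently_modified(original_path, synced_path):
    """True if contents differ even though the modification times match.

    Times are compared at one-second resolution (an assumption made here
    because file systems store timestamps with differing precision).
    """
    same_mtime = int(os.path.getmtime(original_path)) == int(
        os.path.getmtime(synced_path)
    )
    differs = file_fingerprint(original_path) != file_fingerprint(synced_path)
    return same_mtime and differs
```

Calling `silently_modified("original/report.docx", "synced/report.docx")` on a copy kept outside the synced folder and the copy that arrived on a second machine would return `True` for the behavior described above.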
Few people seem to realize it, but Microsoft has had that precise problem for years. It's a vestige of SharePoint, the mechanism used to implement OneDrive for Business (but not the consumer version of OneDrive). Yes, any Office file you or your company sticks in SharePoint gets modified, probably without your knowledge or consent, in precisely this way.
Mike Smith, who's been running SharePoint classes for ages, described the problem in this Nov. 20, 2010, blog post after running down a question posted on an MSDN forum:
I uploaded the same file (a Word 2003 document) three different ways, and got two different file sizes in the library, and when downloaded they were all different from the originally uploaded file. SharePoint changed the files! Next I wrote some .Net code to access the documents via the API and the size reported from SPFileItem.File.Length is the same as the downloaded numbers (140,288, 139,776, 139,776). Remember ... the original file on C: was 139,264 bytes. Now I opened each of the "uploaded and then downloaded" files in a HEX viewer:
The file uploaded by clicking Upload Multiple and the file uploaded with dragging from C: (in Windows Explorer) to Open with Windows Explorer: All bytes identical until the end of the file where there is what looks like random bytes (different in both files) and an incomplete fragment of an XML structure (same in both files). (junk in the upload buffer???)
File uploaded by clicking Upload: First byte changed from 00 to D0; bytes/text added at the end of the file with metadata from the library columns!
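The hex-viewer comparison Smith describes can be approximated in a few lines of code. This Python sketch, which is an illustration and not Smith's .Net code, reports the size difference, the offset of the first byte that differs within the shared length, and any bytes appended past the end of the original. The function name `compare_bytes` and the file names in the usage note are hypothetical.

```python
def compare_bytes(original: bytes, modified: bytes) -> dict:
    """Summarize how `modified` differs from `original`."""
    shared = min(len(original), len(modified))
    # Offset of the first differing byte in the overlapping region, if any.
    first_diff = next(
        (i for i in range(shared) if original[i] != modified[i]), None
    )
    return {
        "size_delta": len(modified) - len(original),
        "first_diff_offset": first_diff,
        # Anything past the original's length was appended.
        "appended_tail": modified[len(original):],
    }
```

To apply it to real files, read each one with `pathlib.Path("original.doc").read_bytes()` and pass the two results in. A nonzero `size_delta` with a non-empty `appended_tail` corresponds to the trailing bytes and metadata described in the quote.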
HackerNews has an irreverent and accurate -- if NSFW -- analysis of the situation. SharePoint's cavalier treatment of Office documents can be traced back to SharePoint's progenitor, Office Server. If you're using SharePoint (or, equivalently, OneDrive for Business), it would be well worth your time to understand the context of this problem.
This story, "Microsoft OneDrive for Business changes files," was originally published at InfoWorld.com.