If you're like me, you'd rather buy a few new disks than cull through your stuff to delete files you probably don't need anymore. In my case, whenever I've decided to clean house, I've deleted files that I found myself desperately needing a few weeks later. An FC2 ISO set, for instance: it's large and I'm not planning on installing or using FC2 anytime soon, so out it goes.
Then two weeks later, out of nowhere, I get pulled into a problem with someone's FC2 box, and I need to build a VM to replicate it, or something like that. Back to the Web to pull down another set.
Well, I decided to at least get an idea of the ages of the files in one of my main filestores. Not just creation time, but last access time, and last modified time. This is obviously a job for Perl, and I'm sure there are one hundred similar scripts floating around the Internet like flotsam, but a cursory Google search didn't pull up anything promising. So I fired up vim and typed up fileages.pl. It should run on any POSIX OS with Perl 5, but I've only tested it on Linux and FreeBSD 6.1.
It's very simple: Walk a directory structure with File::Find, and note the mtime, atime, and ctime of each file. Then, compile all that info and dump out a summary. Optionally, dump out the info for every file. The usage is also simple:
Usage: fileages.pl [-dhs] [-t (atime|ctime|mtime)] [-p <path>]
    -d                      detailed output
    -h                      this help
    -s                      supplemental ages
    -t (atime|ctime|mtime)  type of scan. Default is atime
    -p <path>               path to scan

If -p isn't specified, then the current directory is used.
By default it runs on the current directory, looking at atime, and only outputs a summary for files 30, 60, 90, 180, 365, and 730 days old. Optionally, the -s flag records additional ages: 1, 3, 7, and 14 days old. The -d flag displays info for every file seen, so on a large filesystem the output can be enormous. The script is fairly CPU-intensive, but used less than 9MB of RAM on a million-file run.
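To make the mechanics concrete, here's a minimal sketch of the approach in Perl: walk the tree with File::Find, stat each plain file, and tally count and size into age buckets. This is my own simplified illustration, not the actual fileages.pl; the helper names (buckets_for, scan) are made up for this example, and flag handling, ctime/mtime scans, and supplemental ages are omitted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Default age buckets, in days (the -s supplemental ages are omitted here).
my @THRESHOLDS = (30, 60, 90, 180, 365, 730);

# Return the thresholds a file of the given age (in days) exceeds.
sub buckets_for {
    my ($age_days) = @_;
    return grep { $age_days > $_ } @THRESHOLDS;
}

# Walk a tree, stat each plain file, and tally count and bytes per bucket.
sub scan {
    my ($path) = @_;
    my (%count, %bytes);
    my $now = time;
    find(sub {
        return unless -f $_;
        my ($size, $atime) = (stat($_))[7, 8];  # stat slots 7 and 8
        my $age = ($now - $atime) / 86400;      # seconds -> days
        for my $t (buckets_for($age)) {
            $count{$t}++;
            $bytes{$t} += $size;
        }
    }, $path);
    return (\%count, \%bytes);
}

if (@ARGV) {
    my ($count, $bytes) = scan($ARGV[0]);
    for my $t (@THRESHOLDS) {
        printf "Total files older than %d days: %d\n", $t, $count->{$t} || 0;
        printf "Total size: %.2f GB\n", ($bytes->{$t} || 0) / 2**30;
    }
}
```

Note that a file two years stale lands in every bucket it exceeds, which is why the per-bucket counts in a cumulative report overlap.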
I've run this on some large filestores, and found some interesting results (to me, anyway). Walking an 850GB store with over 1 million files resulted in this:
[pvenezia@bop ~]$ fileages.pl -s -p /bigdisk
---------------------------------
Path: /bigdisk
Scanning for atime
Total files scanned: 1006417
-----
Total files older than 1 days: 2929
Total size: 14.20 GB
-----
Total files older than 3 days: 2023
Total size: 17.07 GB
-----
Total files older than 7 days: 991
Total size: 59.58 GB
-----
Total files older than 14 days: 21
Total size: 11.28 GB
-----
Total files older than 30 days: 1803
Total size: 24.35 GB
-----
Total files older than 60 days: 1672
Total size: 42.91 GB
-----
Total files older than 90 days: 52161
Total size: 83.05 GB
-----
Total files older than 180 days: 14011
Total size: 86.97 GB
-----
Total files older than 365 days: 360239
Total size: 146.05 GB
-----
Total files older than 730 days: 561901
Total size: 214.23 GB
Note that this scan took 9 minutes running on a RAID5 array accessed via an NFS mount on a gigabit network. Obviously, I have lots and lots of stuff that I haven't even touched in two or more years.
In the corporate world, I've used this script to prove to a few folks exactly how old their files are, and to make the case for deleting or offlining gigs and gigs of relatively useless data. After all, nothing shortens the backup window more than backing up fewer files.
Might be time for me to bite the bullet too. Or not.