
Over the weekend, I was able to dramatically reduce the file size of my pre-Thurrott.com archives, which dropped from 116 GB to about 48 GB.
At a high level, this work resembled what I did with my loose photos and scans: I assessed what I had, volume- and organizational-wise, and then decided where to start. This was an even more daunting task with the documents archive because it contains an incredible volume of personal and work-related documents and other files and, inevitably, even more photos and images to organize and archive. And it is spread out over two locations, OneDrive and my NAS.
With the loose photos and scans work, I had worked in OneDrive by collecting everything into a central (master) folder of that material, syncing it offline to a PC, and then working with the files locally in File Explorer. But after reconnecting my NAS to the home network last week and assessing what I had and where, I decided to copy part of my NAS-based documents archive to the same local PC to work with that locally at an acceptable rate of speed. (Working with the files directly on the NAS is very slow.) But of course I had to start somewhere.
On my NAS, we have user-based folders (like “Paul”) under Documents. And my “Paul” folder contains several sub-folders, including “_To file” (a massive collection of folders and files that needs to be sorted through), “Books” (backups of my more recent older books), “Other documents” (which includes, among other things, our personal documents archives from the 1990s, including my oldest books), “Penton” (literally my work archive from the 1990s through 2012), “Text files” (mostly out-of-date), “Travel” (mostly out-of-date), “Visual Studio Projects” (out-of-date), and “Web sites” (an archive of sites like the Internet Nexus, Thurrott.com when it was a personal site, and a few other things). “_To file” takes up 835 GB (with over 554,000 files and 30,000 folders!) and the rest (minus “Penton”) is about 83 GB.

“_To file” is going to have to wait and, boy, am I not eager to figure that one out. But two items stood out to me here as places to start: that personal archive (about 8 GB) and Penton (116 GB). So I copied each folder to a different PC to get to work, with the initial goal of just going through each and, where necessary, reorganizing. For example, with the personal archive, I wanted to get my old book files into a more central location so that all of my book files are together (and in multiple places). That was straightforward. But the Penton archive was a lot more complex. And much bigger.
The first step was to organize and clean up the folders, starting with the top-level structure. It has looked like this since 2012, when I finally moved on to my current document archiving structure:

It’s a bit busy and disorganized, but the real mess—and the real volume of files—is in that “SuperSite” folder, which became a dumping ground for most of my document archives. (99 percent-ish of the disk space used by “Penton” is in “SuperSite.”) And it looked pretty organized until I dove into it.

I cleaned this up, went into “Other,” and pulled out topics (“Office,” “Zune,” “Hardware,” etc.) that should be in the top level of “SuperSite,” and did a bunch of culling of unnecessary stuff. The result wasn’t appreciably smaller, just slightly cleaner and better organized. I ended up with this top-level structure in “Penton”:

And this structure in “SuperSite,” which is the mother lode.

The key folder here, of course, is “_Windows,” not just because I have always focused so much on Windows, but also because this became the basis for my current organizational structure for files. I kind of used this folder for “Windows and other very important things.” It is by far the biggest part of “SuperSite” and the most important part of the archive. Here’s how each of these folders broke down, ordered by size:
I wasn’t surprised that Windows was the biggest part—I know I saved everything regardless of size there—but the sizes of the Xbox and Apple (and probably Windows Phone) folders told me that I had a lot of unnecessary videos at the very least. And so it seemed like figuring out what the biggest files were was a good place to start. And for that, I turned to a familiar tool, WinDirStat.

And it found a lot of big files, including ZIP, WMV, MOV, MP4, M4V, and EXE files, most of which I knew I could expunge. I started with the videos (WMV, MOV, MP4, M4V, plus AVI and a few others) to see what I could do there and found that just removing all of them (not necessarily the right approach) would save close to 50 GB(!) of space, mostly from the Xbox, Windows, and Windows Phone folders as expected. So that was my initial focus, and I pulled out about 21 GB of video files for potential posting to my YouTube channel, which I suspect will quickly become a pretty valuable resource: to me, because I can host these videos there for free, and to you, because you may be interested in a lot of what I have. (I’ve since posted over 25 of them and have many more to go.)
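
For anyone who would rather script this kind of triage than click through a treemap, here is a minimal Python sketch of the same idea: total up disk usage by file extension and see how much the video files account for. It is a rough stand-in, not what WinDirStat does internally, and the `VIDEO_EXTENSIONS` set and the function names are assumptions made up for illustration.

```python
import os
from collections import defaultdict

# Hypothetical list of video extensions; adjust to match your own archive.
VIDEO_EXTENSIONS = {".wmv", ".mov", ".mp4", ".m4v", ".avi"}


def size_by_extension(root):
    """Return {extension: total bytes} for every file under root."""
    totals = defaultdict(int)
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            # Lowercase so that .WMV and .wmv land in the same bucket.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            try:
                totals[ext] += os.path.getsize(path)
            except OSError:
                # Skip files that vanish or cannot be read mid-scan.
                continue
    return dict(totals)


def report(root, top=10):
    """Print the biggest extensions and return the total bytes used by videos."""
    totals = size_by_extension(root)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    for ext, size in ranked[:top]:
        print(f"{ext:10} {size / 1024**3:8.2f} GB")
    video_bytes = sum(
        size for ext, size in totals.items() if ext in VIDEO_EXTENSIONS
    )
    print(f"All video files: {video_bytes / 1024**3:.2f} GB")
    return video_bytes
```

Calling `report()` on a local copy of the folder lists the heaviest file types first, which is roughly the question that mattered here: what is taking up the space, and how much would go away if the videos did.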

Part of what I culled was stupefyingly pointless, and there were more installer EXEs in there than I care to admit. For example, I had the original Gears of War for Windows ISO in there, taking up 7.7 GB of space. Just … unbelievable.

Using WinDirStat, I was able to locate, triage, and organize/remove enough cruft that I got my “SuperSite” folder down to just about 48 GB, an incredible disk space savings of over 50 percent. But I got greedy: by this time, I could see that the biggest file types (by total disk space) were all image formats like JPG, BMP, TIF, and PNG. And I wondered if further compression was possible. For example, what if I could bulk convert all of the BMP and PNG files to a high-quality JPG format that would, overall, take up a lot less disk space? How low could I go?
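
A scripted version of that bulk conversion idea might look something like the sketch below. This is not the tool used in this post: it assumes Python with the third-party Pillow library installed, the quality setting of 90 is a guess rather than a recommendation, and the function names are hypothetical. It also flattens any PNG transparency, so it is only a starting point, and only safe to run against a copy.

```python
from pathlib import Path

# Assumption: only these lossless formats get converted.
SOURCE_EXTENSIONS = {".bmp", ".png"}


def plan_conversions(root):
    """Map each BMP/PNG under root to the JPG path it would become.

    A source is skipped if its JPG name already exists on disk or is
    already claimed by another source, so nothing gets overwritten.
    """
    plan = {}
    claimed = set()
    for source in sorted(Path(root).rglob("*")):
        if not source.is_file():
            continue
        if source.suffix.lower() not in SOURCE_EXTENSIONS:
            continue
        target = source.with_suffix(".jpg")
        if target.exists() or target in claimed:
            continue
        plan[source] = target
        claimed.add(target)
    return plan


def convert(plan, quality=90):
    """Write each JPG, then delete the original only if the save succeeded."""
    from PIL import Image  # third-party: pip install Pillow

    for source, target in plan.items():
        with Image.open(source) as image:
            # JPEG has no alpha channel, so convert to plain RGB first.
            image.convert("RGB").save(target, "JPEG", quality=quality)
        source.unlink()
```

Splitting the work into a plan and a conversion step means the plan can be reviewed before any file is touched, and every converted file replaces exactly one original, so the total file count stays the same.
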
Here, I will not bore you with how the second half of my Sunday went: how I researched this topic, tried and failed with several solutions, finally found some software (Pixillion, which I ended up paying for) that I thought would do the trick, and then struggled over many, many hours to figure out how to do this by making multiple copies of my copy of the “Penton” folder and slowly running pass after pass against those copies so that I could arrive at a version that had the exact same number of total files but took up less total disk space. I will likewise not bore you with the fact that this app does not let you specify exactly which file types you want to convert ahead of time, or that culling the list by file type after the initial folder scan is arduous and slow and often crashes the app. Or that I ran one main pass overnight last night that ultimately failed after many long hours.
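
The check described above (same number of files, less disk space) is easy to automate. Here is a minimal Python sketch, using only the standard library, that compares an original folder against a converted copy; the function names and the shape of the result are assumptions for illustration.

```python
import os


def tree_stats(root):
    """Return (file_count, total_bytes) for everything under root."""
    file_count = 0
    total_bytes = 0
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(folder, name))
    return file_count, total_bytes


def compare_trees(original, converted):
    """Summarize whether a converted copy kept every file and what it saved."""
    original_count, original_bytes = tree_stats(original)
    converted_count, converted_bytes = tree_stats(converted)
    saved = original_bytes - converted_bytes
    if original_bytes:
        percent_saved = round(100 * saved / original_bytes, 1)
    else:
        percent_saved = 0.0
    return {
        "same_file_count": original_count == converted_count,
        "bytes_saved": saved,
        "percent_saved": percent_saved,
    }
```

For a sense of scale, the same percentage math applied to the headline numbers in this post, 116 GB down to 48 GB, works out to a savings of about 58.6 percent.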

What I will report is that, in an early test, a single folder of 2.54 GB of BMP files was reduced to 451 MB of seemingly identical JPGs, so this was my inspiration to keep trying. And that I ultimately failed, for whatever reason, to do this reliably and simply decided that 48 GB was better than 116 GB and it was time to move on. This is a project, not a career.
And so that was that: I have an “Archive” folder in my OneDrive’s “Documents” folder (recently renamed from “Work archive”) that now contains “Books,” “Eternal Spring,” “Personal,” and “Work” folders. The “Work” folder has what was there before (year-based folders with all of my work-related writing from 2012 through 2023) plus that newly size-reduced “Penton” folder. The original version of that thing is still sitting out there elsewhere in OneDrive (not that it matters, but in a 238 GB “_Sort and archive” folder in the root of OneDrive that I’m now using as a staging area for everything that needs to be, well, sorted and archived). And assuming I don’t discover any issues with the new version, I will delete the old one in time.

I have also copied this smaller, new version of the “Penton” folder back to the NAS, and I will upload it to Google Drive and Microsoft 365 commercial for archival purposes this week. And so this small but important part of my 2023 digital decluttering initiative is done, and more quickly than I’d imagined.
Of course, this also means I need to pick a new folder structure full of clutter to attack next. I will try to figure that out tonight.
More soon.