Digital Decluttering: Photo Consolidation Mission Accomplished (Premium)

Finally, one photo collection

I thought I was winding down major combat operations in my photo consolidation project in early January. At the time, I had consolidated all three of my overlapping photo collections into a single master collection, and thanks to my growing experience with various automation tools, I expected to finish that up quickly. At the time of my last update, on January 2nd, I conservatively noted only that it would conclude by the end of the month. But I knew it would happen much more quickly than that. I was doing great.

Until, of course, I wasn’t.

Thanks to my ADHD-addled brain, which can reward or betray me depending on variables I will never understand, I had overlooked something important. And the successes I was having in early January—during which I was consolidating photos from 2013 and newer at a torrid pace—were, if not illusory, certainly not as impressive as I had thought.

Here’s what happened.

For the first “half” of this consolidation project (photos through 2012), I first combined two of the three collections (from my OneDrive Camera roll and Photo collection folders) into a single OneDrive collection. And then I consolidated my third collection, from a Google Photos takeout download, into that, creating the final collection. I used a variety of automation tools to reduce the grunt work as much as possible, but the final step was to work manually year-by-year, day-by-day, comparing the photos side-by-side, with the Google collection on the left and the master OneDrive collection on the right. I would move or delete the Google files, folder by folder, until a year was complete and then move on to the next year. Day by day. Tedious, exacting work. But it was happening.

After doing that for several years’ worth of photos, I just moved forward into the second “half” of the project (which was half only by folder count, but was much bigger in size than the first half). And as noted, that work went very quickly, and I was surprised by the progress. Within a few days, it was clear I was going to “finish” this project, at least in the sense that I’d then have a single photo collection of whatever size (I was guessing around 400 GB) that I could replicate in various online services and on my NAS. And going forward, as my wife and I automatically backed up all our phone photos to multiple places, I’d never have to worry too much about this again.

But then I discovered my mistake. In moving forward to those newer years (2013 and up), I had skipped a step. I had never consolidated the OneDrive Camera roll and Photo collection folders into a single OneDrive collection. And so I was just consolidating the Google collection into the OneDrive Photo collection folder. The OneDrive Camera roll folder (which by then contained only photos from 2013 and newer) had never been consolidated. That’s why it had gone so quickly: I was only doing half the work.

What bothers me most here is that this never occurred to me naturally. Maybe I had just spent so much time (so many hours over so many days and over so many months) doing the same repetitive work that I just blanked on it. I don’t know. What I do know is that I noticed my OneDrive storage usage increasing as I did this work. And as it crept above 800 GB (of 1 TB), and then to almost 900 GB, I had to figure out what was happening. As noted, I figured the final size of this collection would be about 400 GB, and I knew that my documents archive in OneDrive was roughly 250 GB. So what was taking up all that space?

Well, it turns out it was that Camera roll folder. The photos in there, from 2013 and newer, took up an additional 250 GB of storage. And because they had never been consolidated—deduplicated, basically—they were eating into my storage allotment as the final collection (in the OneDrive Photo collection folder) grew.
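The storage math is easy to lay out. The short sketch below only restates the rounded figures from the text (the variable names are made up for illustration) and shows why usage near 900 GB pointed to roughly 250 GB of unaccounted-for files.

```python
# All figures are in GB and are the rough estimates quoted in the text.
ONEDRIVE_QUOTA = 1000   # 1 TB of OneDrive storage
expected_photos = 400   # guessed final size of the master photo collection
documents = 250         # documents archive already in OneDrive
camera_roll = 250       # unconsolidated Camera roll folder (2013 and newer)

expected_usage = expected_photos + documents
actual_usage = expected_usage + camera_roll

print(expected_usage)                 # 650
print(actual_usage)                   # 900
print(ONEDRIVE_QUOTA - actual_usage)  # 100
```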

I didn’t figure this out until I was almost done—or, thought I was almost done, on January 8—as I was consolidating 2022. I assume it’s obvious how breathtakingly deflating this realization was. I was on the cusp of finishing this project, supposedly, and now I would have to go through this process yet again for the biggest folders of photos in the two collections (2013 through 2023). All that work would need to happen again.

And so it did.

First, I decided to move the 2013 through 2022 folders out of the OneDrive Camera roll folder to free up the 250 GB of space in OneDrive. That was time-consuming but easy: I made two copies, one a collection of organized folders on the laptop’s desktop that I would consolidate from, the other on a USB-C SSD drive. I then copied the USB-C SSD version, in turn, to my NAS and to my Google Drive for safety (temporarily, just so they were in another place). Once both copies were completed, I deleted the original folders in the OneDrive Camera roll folder, emptied the OneDrive recycle bin (which you have to do from the web), and waited for Microsoft to release the storage. That took a day or so, but it happened.
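For anyone who would rather script this "copy twice, then delete" step, here is a minimal sketch in Python. It is not what was used here (the copies were made by hand), and the function names are hypothetical. It assumes the source is an ordinary local folder and that comparing SHA-256 hashes is an acceptable way to confirm a copy is complete before the original is removed.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_is_complete(source: Path, copy: Path) -> bool:
    """True if every file under source exists under copy with identical content."""
    for src_file in source.rglob("*"):
        if src_file.is_file():
            dst_file = copy / src_file.relative_to(source)
            if not dst_file.is_file() or file_digest(src_file) != file_digest(dst_file):
                return False
    return True


def move_out_safely(source: Path, destinations: list[Path]) -> bool:
    """Copy source to each destination; delete source only if all copies verify."""
    for dest in destinations:
        shutil.copytree(source, dest, dirs_exist_ok=True)
    if all(copy_is_complete(source, dest) for dest in destinations):
        shutil.rmtree(source)
        return True
    return False
```

A call might look like `move_out_safely(Path("Camera roll/2013"), [Path("Desktop/2013"), Path("E:/2013")])`, where the paths are placeholders. Note that this only handles the local side: emptying the OneDrive recycle bin on the web is still a separate, manual step.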

And then I turned back to the consolidation work, which in my mind involved reconsolidating the years I had just worked on. Once again, I pulled out the automated folder organization and file deduplication skills I had honed throughout this project. But this time, things moved slowly. There were far more unique photos on the left side of this side-by-side view—the desktop-based versions of the year folders that used to be in Camera roll—than expected, despite the automated deduplication I had done. In some cases, there were no duplicates at all. This was troubling because the unique files from these newer years should have been in Google Photos, not OneDrive. What was I missing?

I may never know. But after plodding slowly through several years of photos over many days—each year took approximately two and a half days (with an hour or so in the morning and two-ish hours in the afternoon/evening) to consolidate—I rethought the process. And the issue, I saw, was one I had dealt with throughout the project, but now amplified: There are two (or more) versions of a photo, some of higher quality/resolution, some of lower. They sometimes have the same names, sometimes have similar names (like FILE.jpg vs. FILE-1.jpg), and sometimes have completely different names. Comparing these things manually was the obvious way to make sure I got it right, but doing so was also tedious and time-consuming. There had to be a better way.

A week after I started down this terrible path, I finally figured out a better way. I have been using AllDup for deduplicating two folders of files, and this app supports various comparison methods. For the first pass, I would compare the combination of file name, file extension, and file size (and, optionally, file content), because if those criteria were all met, those two files were absolutely duplicates and I could safely delete one of the two. (I configure AllDup to send deleted files to the Recycle bin just in case, of course.) I compared files using other combinations, too, but that was the big one.
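To make the logic of that strict first pass concrete, here is a rough Python sketch. It is not how AllDup works internally, just an illustration of the same criteria: same file name (which includes the extension), same size and, optionally, byte-for-byte identical content. The function names, the case-insensitive name matching, and the "holding" folder that stands in for the Recycle bin are all assumptions.

```python
import filecmp
import shutil
from pathlib import Path


def strict_duplicates(left: Path, right: Path, check_content: bool = True) -> list[Path]:
    """Files in left that have a same-name, same-size twin somewhere in right."""
    # Index the right-hand (master) folder by (lowercased name, size in bytes).
    index: dict[tuple[str, int], list[Path]] = {}
    for f in right.rglob("*"):
        if f.is_file():
            index.setdefault((f.name.lower(), f.stat().st_size), []).append(f)

    dupes = []
    for f in left.rglob("*"):
        if not f.is_file():
            continue
        twins = index.get((f.name.lower(), f.stat().st_size), [])
        if check_content:
            # shallow=False forces a real byte-by-byte comparison.
            twins = [t for t in twins if filecmp.cmp(f, t, shallow=False)]
        if twins:
            dupes.append(f)
    return dupes


def quarantine(files: list[Path], left: Path, holding: Path) -> None:
    """Move files into a holding folder instead of deleting them outright."""
    for f in files:
        target = holding / f.relative_to(left)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(f), str(target))
```

Moving matches to a holding folder rather than deleting them keeps the step reversible, in the same spirit as sending them to the Recycle bin.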

What I kept running into on this pass was a situation where, in manually comparing folders of photos, I would find some of the same photos on both sides of the comparison, but with different names. And so I finally wondered: What if I forgot about the filenames and other criteria and tested only for file extension and file size? This would root out the duplicates with different names, and because I was consolidating into the Photo collection folder on the right, I could delete (or move) these types of duplicates from the former Camera roll folder on the left.
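The looser comparison can be sketched the same way. The hypothetical function below ignores file names entirely and pairs each file on the left with any file on the right that has the same extension and the same size in bytes. It is an illustration of the idea, not AllDup's implementation.

```python
from collections import defaultdict
from pathlib import Path


def size_ext_matches(left: Path, right: Path) -> dict[Path, list[Path]]:
    """Map each file in left to files in right with the same extension and byte size."""
    # Index the right-hand (master) folder by (lowercased extension, size in bytes).
    by_key: dict[tuple[str, int], list[Path]] = defaultdict(list)
    for f in right.rglob("*"):
        if f.is_file():
            by_key[(f.suffix.lower(), f.stat().st_size)].append(f)

    matches = {}
    for f in left.rglob("*"):
        if f.is_file():
            key = (f.suffix.lower(), f.stat().st_size)
            if key in by_key:
                matches[f] = by_key[key]
    return matches
```

Two different photos can, in principle, share an extension and an exact byte count, so a match here is strong evidence rather than proof. Spot-checking the results by hand, or adding a content comparison, is a sensible safeguard.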

I tested this multiple times using subsets of the files and the results were impressive when I manually checked the changes, with no errors and many, many duplicates removed. In one case, 2021, the folder size went from 39.1 GB (over 11,000 files) to just 7.21 GB (over 2,000 files).
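Before-and-after numbers like these are easy to gather with a few lines of Python. This is a small sketch with hypothetical helper names: one function counts files and bytes under a folder, the other expresses a drop as a percentage, applied here to the 2021 sizes quoted above.

```python
from pathlib import Path


def folder_stats(folder: Path) -> tuple[int, int]:
    """Return (file count, total bytes) for everything under folder."""
    sizes = [f.stat().st_size for f in folder.rglob("*") if f.is_file()]
    return len(sizes), sum(sizes)


def percent_reduction(before: float, after: float) -> float:
    """Percentage by which a quantity shrank."""
    return (before - after) / before * 100


# The 2021 folder sizes quoted in the text, in GB.
print(round(percent_reduction(39.1, 7.21), 1))  # 81.6
```

In other words, the looser comparison removed a bit more than four-fifths of that folder by size.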

This almost seemed too easy, and I was worried about losing something important. But then I reminded myself that most of the original photos from these most recent years had been backed up from phones to Google Photos, and those photos were the original quality and were already consolidated into the master collection. And because this process seemed to be accurately removing literally several thousand duplicates per year, leaving just hundreds of photos behind to compare, the process could proceed much more quickly.

And so it did.

I started finishing off one year’s worth of consolidating every day, a much faster pace. By this past Wednesday, I had completed consolidating through the first half of 2019, and I could once again see the light at the end of the tunnel. And by this morning, I had finally done it: I completed the consolidation of our photos from 2023. The collection’s total size on disk is about 454 GB, with a little under 150,000 files, so about 50 GB more than I expected. Not bad.

But now the real work begins.

When it comes to this kind of project, you’re never really done. There are still scans to be sorted and integrated into the collection. I need to upload this updated single collection to at least three other places (Google Photos, Amazon Photos, and my NAS) and ensure that all the metadata is correct as I do so. (There are still scans in there, especially, with the wrong dates.) And before I can do that, I have to remove the existing versions of those collections in the other locations.

I have ideas about how I can semi-automate this and do it in logical stages. I would like to get this done, if possible, before we head to Mexico City on February 3, which is just two weeks away. But I’ve missed so many self-imposed deadlines with this stuff, and I’m mentally beat up from all the work it took to get to this place. There were so many setbacks, so many times that the finish line was in sight and then, suddenly, wasn’t. It’s intruded on other things, personal and professional, and I need that to stop.

As I write this, I’m backing up the “final” collection to the portable SSD, so I can copy it, in turn, over to the NAS using my desktop PC, which is connected via Ethernet. Over the weekend, I will try to start on the next steps.

But for now, I need a bit of a break.


Thurrott