Digital Decluttering: Tagging, Deduping, and Replicating the Photo Collection (Premium)

With my photo collection consolidation project finally completed after several months of work, it was time to move on to the next steps: Cleaning up the photos (which involved ensuring that all of the photos have reasonably good “Date taken” meta-data, with as few duplicates as possible, and so on) and then uploading the collection to various cloud services and my NAS for backup/replication purposes. Given the volume of files involved, I knew I’d need to work in stages. And I figured the skills I developed during the consolidation process would help. Which they did.

This is what the photo collection looks like in OneDrive, which is where I did all this work. The root of the folder has an “Old pictures” folder plus one folder for each year from 2000 to now, and that “Old pictures” folder likewise has one subfolder for each year from 1999 on back.

I began with the photos in “Old pictures” for the obvious, logical reason that this folder represents the beginning years of the collection. But also because this part of the collection is relatively small from a file count/disk size perspective, and it would be easier to experiment with. (As with everything I’ve been doing, I made sure to work with copies of the collection and not the source, just in case.) For example, there are only 90 files in the collection with a 1969 date or older.

The first hurdle was figuring out how I could easily find those photos with no “Date taken” meta-data. But I had an idea.

I had previously used two utilities, Bulk Rename Utility (BRU) and MediaSorter, to organize loose photos into organized, date-based folders, and files with incorrect or missing meta-data were made obvious by the way they were copied: Files without meta-data were dumped into the root of the destination folder (instead of in a date-based folder), and files with wildly wrong meta-data were copied into a sub-folder that didn’t make any sense (like a 2016 folder while I was consolidating 2004 or whatever). So I figured I could use one of those tools to copy some subset of the collection to the desktop, organize the copied files in date-based folders, and then examine the files that weren’t correctly organized: Those photos would need to be fixed in the source (master) collection.

After experimenting with MediaSorter, which is the simpler tool, I ended up relying on the more complex and more powerful BRU instead. This requires a very specific in-app configuration, which it loses every time I close it, but it works perfectly: Working in batches (all folders in “Old pictures” through the 1960s, and then the 1970s, 1980s, and 1990s in turn), I made local, organized copies of these subsets of the photo collection. Some had a few issues. Some had many issues, especially in the middle of the 1980s, where Past Paul had let down Future Paul by scanning tons of photos but not correctly tagging them. But it went surprisingly quickly. By Saturday, I had cleaned up all of “Old pictures.” And by Sunday night, I had cleaned up through 2005.

To do this work, I created “Work” and “Work 2” folders on the laptop’s desktop. Then, I copied whatever subset of photos I was working on at the time—1944 through 1969 in the first pass—to the “Work” folder. And then I configured BRU to scan that folder and copy its contents to “Work 2” as if I was organizing them for the first time. This would copy the files in “Work” into new date-based subfolders in “Work 2.”
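For readers who would rather script this pass than configure an app, here is a minimal Python sketch of the same idea. It is not how BRU works internally. The `get_date_taken` callable is a hypothetical stand-in for whatever reads the “Date taken” value (an EXIF library, for example), and the year/day folder layout is an assumption of mine.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional


def organize_by_date(
    src: Path,
    dest: Path,
    get_date_taken: Callable[[Path], Optional[datetime]],
) -> None:
    """Copy every file under src into dest/YYYY/YYYY-MM-DD/.

    Files for which get_date_taken returns None are copied loose into the
    root of dest, which is what makes them easy to spot afterward.
    Assumes dest is not inside src. Files that share a name and land in
    the same folder overwrite each other in this sketch.
    """
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        taken = get_date_taken(path)
        if taken is None:
            target_dir = dest  # no "Date taken": leave it loose in the root
        else:
            target_dir = dest / f"{taken:%Y}" / f"{taken:%Y-%m-%d}"
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target_dir / path.name)
```

Passing the date reader in as a function keeps the sketch free of third-party dependencies and makes it easy to try with fake dates first.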

Anything organized correctly would be in date-appropriate folders, which I could delete. What was left was some combination of incorrect date-based folders and loose photos that were not tagged with “Date taken” meta-data. And so I would just fix whatever was left. For example, the 1970s batch had 37 loose items (and no incorrect date-based folders). So I snapped the “Work” folder on the left and whatever sub-folder in the OneDrive “Photo collection” on the right and found and fixed each non-tagged photo in turn.
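The “what was left” check can be sketched in a few lines of Python, too. This assumes the organized copy uses one top-level folder per year, named with just the year; the function name and parameters are mine, for illustration only.

```python
from pathlib import Path
from typing import List, Tuple


def find_suspects(
    organized: Path, first_year: int, last_year: int
) -> Tuple[List[Path], List[Path]]:
    """Return (loose_files, out_of_range_folders) found directly under organized.

    Loose files are the ones with no "Date taken" value. Folders whose
    names are not a year between first_year and last_year (inclusive)
    hold files with dates that make no sense for this batch.
    """
    loose: List[Path] = []
    odd_folders: List[Path] = []
    for entry in sorted(organized.iterdir()):
        if entry.is_file():
            loose.append(entry)
        elif entry.is_dir():
            name = entry.name
            in_range = name.isdecimal() and first_year <= int(name) <= last_year
            if not in_range:
                odd_folders.append(entry)
    return loose, odd_folders
```

For the 1970s batch, something like `find_suspects(Path("Work 2"), 1970, 1979)` would list the loose items and any stray folders in one go.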

When I was done, “Work” was empty, so I would then delete the contents of “Work” and move on to the next batch. As noted, some date range subsets were worse than others. There were over 350 loose files from the first half of the 1980s that needed to be tagged, for example, and then a similar number from the late 1980s as well.

Regardless, it went quickly. And once I got out of the 1980s, the number of loose, improperly tagged files declined dramatically. I raced through the 1990s and then moved year-by-year when I hit the 2000s because this is where the number of photos each year escalates. And by Monday morning, I had cleaned up the collection through 2005.

When I finished cleaning up everything in “Old pictures,” I decided that this part of the collection should be replicated in multiple places—Google Photos, Amazon Photos, and the NAS—and that I could likewise do the same in five-year batches, starting with 2000-2005, going forward. Copying to the NAS was straightforward enough. But when I looked at Google Photos, I was reminded that the web interface doesn’t let you drag-and-drop folder structures. You can only copy individual photos that way.

And this meant I would, ironically, need to “de-organize” the photos, copying them in batches out of their organized, date-based folder structures and into individual folders that consisted only of files with no sub-folders. Hilarious.

Here, again, I figured my earlier experiences would pay off, and they did: I used MediaSorter to copy the photos in batches (the 1960s and earlier, the 1970s, the 1980s, and then the 1990s) to new folders on the desktop. This process had a delightful side effect: Because MediaSorter catches and then ignores duplicates, it also de-duplicated the resulting “raw” collection. At first, I used the CSV file that the app generates to root out the duplicates in the source (master) collection. But after a while, I realized this was unimportant and that most of the duplicates in there were not problematic.
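A rough Python equivalent of this “de-organize and skip duplicates” step might look like the following. To be clear, I don't know how MediaSorter decides that two files are duplicates. This sketch assumes “duplicate” means byte-identical content and uses a SHA-256 hash as a stand-in. The function name and the CSV column names are also made up.

```python
import csv
import hashlib
import shutil
from pathlib import Path


def flatten_unique(src: Path, dest: Path, report: Path) -> int:
    """Copy the files in the tree under src into the single folder dest.

    Byte-identical duplicates are skipped and logged to a CSV report.
    Returns the number of files copied. Keep dest and report outside src.
    """
    dest.mkdir(parents=True, exist_ok=True)
    seen = {}  # content hash -> first source path with that content
    copied = 0
    with report.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["duplicate", "kept"])
        for path in sorted(p for p in src.rglob("*") if p.is_file()):
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            if digest in seen:
                writer.writerow([str(path), str(seen[digest])])
                continue
            seen[digest] = path
            # Different photos can share a name across folders, so add a
            # numeric suffix rather than overwrite an earlier copy.
            target = dest / path.name
            n = 1
            while target.exists():
                target = dest / f"{path.stem}_{n}{path.suffix}"
                n += 1
            shutil.copy2(path, target)
            copied += 1
    return copied
```

Hashing catches exact copies only. Two scans of the same print, or a resized copy, would both survive this pass.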

Long story short, I created “de-organized” folders for all the eras in “Old pictures” and was able to easily drag and drop the contents of each into Google Photos (and Amazon Photos). Though some of the larger subsets needed to be broken into two groups: Google Photos won’t let you drag in 5,000-ish files at once, but you can drag in 2,500-ish files back-to-back with no issues.
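Splitting a big flat folder into upload-sized groups is easy to sketch in Python as well. The 2,500 default below is just the “2,500-ish” figure that worked for me, not a documented Google Photos limit, and the function name is hypothetical.

```python
from pathlib import Path
from typing import Iterator, List


def upload_batches(files: List[Path], size: int = 2500) -> Iterator[List[Path]]:
    """Yield consecutive groups of at most `size` files, in order."""
    for start in range(0, len(files), size):
        yield files[start:start + size]
```

Each group could then be moved into its own numbered folder and dragged into the browser one at a time.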

When I moved on to the 2000s, I decided to process each year individually, as noted, and so in time I batched 2000 through 2005 into a single raw dump that I am uploading to the cloud this morning. (There were over 10,200 files in that batch.)

But I also deleted the existing files that were already in Google Photos from that era first because I wasn’t sure how it would handle duplicates and wanted just one copy of each there with the correct meta-data. And deleting from Google Photos also ensured that mis-tagged and untagged scans would likewise be removed. This was easy enough, though each bulk deletion took a few minutes, especially in the latter years when there are far more photos. (Well, that, and it’s curiously scary to delete memories: I had to keep reminding myself that I had multiple copies of everything.)

As of this writing, I have organized and replicated my newly consolidated and cleaned-up photo collection through 2005, and I have since finished cleaning up 2006 and am now working on 2007. Once I finish 2010, I’ll upload that 2006-2010 batch to the NAS, Google Photos, and Amazon Photos. I also decided to keep a copy of the “de-organized” batches on the NAS, and I have copies of everything on two portable SSDs just in case.

And then I’ll just keep going. Obviously, the newer years have many, many more photos than the older years, and I’m sure that volume will introduce its own challenges. But I’d be surprised if any of the newer years had more loose, untagged or mis-tagged files than the 1980s because the collection gets cleaner as I move forward in time. But I’m going to find out. And I still hope to finish this entire project before we leave for Mexico on February 3.
