Over the weekend, I put together a little tool that scans executable files for PNG images containing useless Adobe Extensible Metadata Platform (XMP) metadata. I ran it against a vanilla Windows 10 image and was surprised that Windows contains a lot of this stuff.
Adobe XMP, generally speaking, is an Adobe technology that serializes metadata like titles, internal identifiers, GPS coordinates, and color information into XML and jams it into things, like images. This data can be extremely valuable in some cases but Windows doesn’t need or use this stuff. It just eats up disk space and CPU cycles.
Thanks to horrible Adobe Photoshop defaults, it’s very easy to unknowingly include this metadata in your final image assets. So easy, almost all the images on this site are chock full of it. (Blame Paul.) But you can appreciate my surprise when a bunch of important Windows binaries showed up in my tool.
Windows Explorer, for example, is a critical Shell component in the startup hot path. But despite its importance, it’s comprised of ~20% pure garbage. ApplicationFrame.dll, responsible for Windows app title bars and frame gizmos, is ~41% garbage. Twinui, imageres, and other related components scored with much lower numbers but couldn’t fully escape Adobe XMP.
It sounds like nitpicking but this stuff impacts Windows in all sorts of areas, such as boot time, redistributable and on-disk image size, servicing (delta patches), and runtime Authenticode validation. Funnily enough, these are the exact areas that a special performance team at Microsoft are supposed to be monitoring. Oops.
Thankfully, fixing this should be relatively simple for Microsoft. It just needs to squeeze PNG assets before or at code check in time. Alternatively (or additionally), an optimization pass could be performed at compile time to catch outliers, though could increase overall product build time.
But teams at Microsoft may want to check in with the Microsoft Edge team first. Eric Lawrence – former Microsoft Internet Explorer Program Manager – pointed out the Edge team already runs a set of tools against its assets to strip out garbage, and further optimize its binaries using Google’s ZopFli compression algorithm.
You can find the code for my tool on GitHub. But be sure to also check out Eric Law’s parallel analysis using the venerable PNGDistill tool, showing additional savings are possible.
Technical note: My tool only measures Adobe XMP present in three types of PNG text chunks — iTXt, tEXt, and zTXt. Additional Adobe-specific metadata could be stored in other chunks and is out of scope for this post and tool.