Ok, this is a long one, but worth the read if you enjoy the challenges of debugging and working around problems in code that you have absolutely no control over. It’s also a great read if you just hate IE and want to laugh at the brief bout of insanity it caused me. 🙂
First, a little backgrounder…
Since the creation of Mimeo in 1998 we’ve been using 256-color indexed GIFs with transparency to display the content pages of our users’ documents. PNG has been around for a long time, but it wasn’t supported well (if at all) in the major browsers back then. These days our lowest-common-denominator browser is IE6, but the truth is we still work in IE5.5 as well. Here are our browser stats for the month of April, in descending order of share:
- IE6 66%
- IE7 19.52%
- Mozilla/Firefox ~9%
- Safari ~2%
- IE5.5 ~0.19%
- Netscape 7/8 < 0.1%
- Opera < 0.05%
The whole point of the project was to raise our image quality so that our users get a much better WYSIWYG experience. One of Mimeo’s core features is that you can see exactly what your document is going to look like before you submit the job for printing (a Flash demo is available). This helps you identify issues that you may not have noticed in your native application (Word, PowerPoint, Acrobat, etc.) at design time, as well as issues that may arise during document formatting (e.g. where hole punches or binding will cut through content, how a clear cover will look on your document, etc.). So, we knew we wanted 24-bit color. We also knew that we needed alpha transparency, because we actually layer the content images on top of images that represent the paper stock the user will be printing on, and we need to show, as faithfully as we can, how the color will actually look when printed on that paper in terms of both texture and color. These two simple requirements made it a no-brainer: 24-bit PNG with a full alpha channel was the format for us.
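To make the alpha requirement concrete: when a content image is layered over a paper-stock image, every content pixel gets blended with the paper pixel beneath it. Below is a minimal sketch of the standard “over” operator with straight (non-premultiplied) alpha, roughly what a browser does per pixel. It’s an illustration, not Mimeo code, and the `{ r, g, b, a }` pixel objects are hypothetical. A transparent GIF can only mark a pixel as fully transparent or fully opaque (alpha effectively 0 or 255), which is why anti-aliased edges can’t blend into the paper.

```javascript
// Straight-alpha "over" compositing of one content pixel onto one paper pixel.
// Per channel: result = alpha * content + (1 - alpha) * paper.
// The { r, g, b, a } / { r, g, b } pixel shapes are hypothetical, 0..255 per channel.
function compositeOver(content, paper) {
  const alpha = content.a / 255; // 8-bit alpha channel mapped to 0..1
  const blend = (fg, bg) => Math.round(alpha * fg + (1 - alpha) * bg);
  return {
    r: blend(content.r, paper.r),
    g: blend(content.g, paper.g),
    b: blend(content.b, paper.b),
  };
}
```

With alpha at 255 the content pixel wins outright and with alpha at 0 the paper shows through untouched; everything in between is a mix, which 1-bit GIF transparency can’t express.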
Now, on to the meat of this post…
IE7, Mozilla/Firefox and Safari all just work out of the box: 24-bit PNGs display in all their alpha-transparent glory with no problems whatsoever. That’s great, but remember that 66% of our user base is on IE6 and, as we all probably know by now, IE6 doesn’t support alpha transparency properly out of the box. Why this remains the case, even after so many years and service packs, is beyond me. It’s certainly one of the easiest topics to bring up when you want to beat up on Microsoft, because there’s just no excuse for it. None.
So, you can’t simply do <img src="myAlphaImage.png" />. Luckily, IE 5.5 and later include something that does handle alpha transparency properly, which you can use to hack around this limitation: the AlphaImageLoader filter. I won’t go into the details, but if you search the web you’ll find tons of explanations of how to use it to solve the very problem I’m talking about here. The problem with those explanations is that you very rarely see the images being loaded dynamically; it’s usually just a static image in a page. Well, the document building process I mentioned earlier is a full-blown DHTML control that needs to load these images on the fly as you flip through the pages of your document. The good news is that the filter API is, of course, programmable through the IE DOM, so we can just do something like this:
function ApplyIEPngImageSourceHack(pageImg, imageUrl) {
    pageImg.src = "spacer.png";

    // Get the filter off the img
    var alphaFilter = pageImg.filters["DXImageTransform.Microsoft.AlphaImageLoader"];

    // If the filter doesn't exist, add it now
    if (!alphaFilter) {
        // Initialize the filter with the source provided
        pageImg.style.filter = "progid:DXImageTransform.Microsoft.AlphaImageLoader(sizingMethod='scale', src='" + imageUrl + "')";
        alphaFilter = pageImg.filters["DXImageTransform.Microsoft.AlphaImageLoader"];
    } else {
        // The filter already existed, just update its source
        alphaFilter.src = imageUrl;
    }

    // TODO: it's not this simple, read on...
}
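Because the filter is configured through a CSS string, the nested quoting is easy to get wrong. A small helper that builds the string in one place can help. This is a hypothetical sketch, not part of our code: it checks `sizingMethod` against the three values the filter accepts (`image`, `scale` and `crop`) and rejects URLs containing a single quote rather than guessing how the filter parser would treat an escaped one.

```javascript
// The three sizing modes the AlphaImageLoader filter accepts.
const SIZING_METHODS = ["image", "scale", "crop"];

// Hypothetical helper: builds the value to assign to element.style.filter.
function buildAlphaImageLoaderFilter(imageUrl, sizingMethod = "scale") {
  if (!SIZING_METHODS.includes(sizingMethod)) {
    throw new RangeError("Unknown sizingMethod: " + sizingMethod);
  }
  // The URL sits inside single quotes, so a quote in it would end the value early.
  if (imageUrl.includes("'")) {
    throw new Error("Image URL must not contain a single quote");
  }
  return "progid:DXImageTransform.Microsoft.AlphaImageLoader(" +
    "sizingMethod='" + sizingMethod + "', src='" + imageUrl + "')";
}
```

In IE the result would be assigned with something like `pageImg.style.filter = buildAlphaImageLoaderFilter(imageUrl);`.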
You’d expect something like that to work pretty darn well. In fact, that’s basically the code we originally wrote to solve the problem. Again, I hate to sound like a broken record, but the document building experience is some complex DHTML. We’re positioning, hiding and showing elements as you navigate the document, and we’re changing the src on several images at a time (some with the hack above, some without because they’re GIFs or 8-bit PNGs)… suffice it to say there’s a lot going on. As soon as we started testing we ran into our first problem: as you navigated a document, roughly 40% of the time the majority of the page would completely black out. Immediately everyone panicked and thought we were doomed. My position was that if it worked 60% of the time, there had to be a specific mechanism causing the issue the other 40% of the time. As I began investigating, the problem seemed to stem from the fact that so many things on the screen were changing in terms of the CSS display and visibility properties. If you know anything about filters, you know that they support the notion of transitions (i.e. they animate from one state to the next), and one of the triggers of a transition is a change to the visibility of the element the filter is applied to. As a stab in the dark, I added the following line of code where the TODO is in the last code sample:
alphaFilter.apply();
The apply method is basically used to tell the filter to capture its current state in terms of layout and visibility and to track changes between then and a call to the play method. It really shouldn’t be necessary to call it in the AlphaImageLoader’s case because there’s no transition, but for whatever reason this change alone seemed to make the bug disappear. I say “seemed” because it still appeared every now and then. Later we noticed that once we stopped including the gAMA chunk in the PNGs we generated, the problem went away entirely. Since we didn’t need gamma settings, this was the final solution for us. Whether the apply call is still needed at this point is unknown; there was no sense in removing something that had originally seemed to fix the problem and had no noticeable performance impact.
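Our fix was simply to stop writing the gAMA chunk when generating the PNGs. If you can’t change the generator, one option is to strip the chunk afterwards. Here’s a minimal sketch operating on raw PNG bytes in a `Uint8Array` (a Node `Buffer` works too, since it’s a `Uint8Array` subclass). It relies only on the PNG chunk layout; the function name is hypothetical, and all we can vouch for is that dropping gAMA cleared up the blackouts in our case.

```javascript
// Every PNG file begins with this fixed 8-byte signature.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Returns a copy of the PNG bytes with every chunk of `chunkType` removed.
// Chunk layout: 4-byte big-endian data length, 4-byte ASCII type, data, 4-byte CRC.
// Each CRC covers only its own chunk's type and data, so dropping a whole chunk
// leaves the remaining chunks valid.
function stripPngChunk(bytes, chunkType = "gAMA") {
  PNG_SIGNATURE.forEach((value, i) => {
    if (bytes[i] !== value) throw new Error("Not a PNG file");
  });

  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const kept = [bytes.subarray(0, 8)];
  let offset = 8;

  // Any trailing bytes too short to hold a chunk header are dropped.
  while (offset + 8 <= bytes.length) {
    const length = view.getUint32(offset); // DataView reads big-endian by default
    const type = String.fromCharCode(...bytes.subarray(offset + 4, offset + 8));
    const end = offset + 12 + length; // length + type + data + CRC
    if (end > bytes.length) throw new Error("Truncated " + type + " chunk");
    if (type !== chunkType) kept.push(bytes.subarray(offset, end));
    offset = end;
  }

  // Concatenate the surviving pieces into a fresh array.
  const out = new Uint8Array(kept.reduce((sum, part) => sum + part.length, 0));
  let position = 0;
  for (const part of kept) {
    out.set(part, position);
    position += part.length;
  }
  return out;
}
```

In Node this could be used as `fs.writeFileSync("out.png", stripPngChunk(fs.readFileSync("in.png")))`, with the file names standing in for whatever your pipeline uses.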
Sooo, we had images displaying great in IE6 and looking 100% the same as in the browsers that support the format natively. Victory is ours!!! Pop the champagne and deliver this puppy! Welllllllllllll, not quite. 🙁 Now that the thing was actually usable, we quickly discovered another problem as we expanded to our more complicated test cases. Depending on the number of PNGs being displayed simultaneously on the page with this hack, the browser would occasionally lock up and become completely unresponsive, both in terms of rendering and input. We had no clue what it could be, but I know the symptoms of a locked-up Win32 STA message pump when I see them. The question was, what was IE locking up on? Clearly it had to be the AlphaImageLoader, but what aspect of it? Were we doomed yet again? To figure out this problem I literally had to attach to IE with the native debugger and figure out where we were in the stack. It turned out that the problem lay within the URL loading logic of the AlphaImageLoader; there seems to be some kind of flaw in its URL moniker callback synchronization. What exactly it is I don’t know or care; all I know is I needed a way to work around it.
Once I figured out it was a problem with loading from a URL, it struck me that I could easily have IE download the image into the cache for me via an img element and then just point the AlphaImageLoader at the same URL, which would hopefully make it use the cached image. Originally we used the very same img element that the filter was applied to, but this caused a nasty flash: IE would finish downloading the PNG and display it using the standard img rendering pipeline, which made it flash the image’s bKGd color because it couldn’t render the pixels as alpha transparent. This was obviously too jarring an experience to subject the majority of our user base to, so I quickly came up with an approach that creates a new img element in the DOM for any img element passed to the function, to act as a proxy for downloading the PNG. The code now looked something like this:
function ApplyIEPngImageSourceHack(pageImg, imageUrl) {
    var proxyImg = pageImg.alphaProxyImg;

    // Create the proxy img the first time through
    if (!proxyImg) {
        pageImg.src = "spacer.png";

        proxyImg = document.createElement("IMG");
        proxyImg.id = pageImg.id + "_AlphaLoadingProxy";
        proxyImg.style.display = "none";
        proxyImg.originalImg = pageImg;

        proxyImg.onload = function() {
            var alphaFilter = this.originalImg.filters["DXImageTransform.Microsoft.AlphaImageLoader"];

            if (alphaFilter == null) {
                this.originalImg.style.filter = "progid:DXImageTransform.Microsoft.AlphaImageLoader(sizingMethod='scale', src='" + this.src + "')";
                alphaFilter = this.originalImg.filters["DXImageTransform.Microsoft.AlphaImageLoader"];
            } else {
                // Use the URL the proxy just finished loading, not the
                // imageUrl this closure captured on the very first call
                alphaFilter.src = this.src;
            }

            alphaFilter.apply();
        };

        // Append the proxy image into the parent element of the original
        pageImg.parentElement.appendChild(proxyImg);

        // Associate the proxy image with the original image for future lookups
        pageImg.alphaProxyImg = proxyImg;
    }

    // Kick the download off with the proxy img
    proxyImg.src = imageUrl;
}
Quite a bit more verbose, eh? Just think, that doesn’t even include cancellation of async loads or load error handling. ;P Anyway, it worked great… or at least it seemed to at first. All the JavaScript was executing the way we expected and logically it all made sense, but whenever there was a network hiccup or a slower connection we were still experiencing lock-ups. I was baffled, so I took a look at the traffic being produced using Fiddler and noticed something quite strange: there were now two requests being made for what should have been the same URL, but the two URLs were ever so slightly different. See, our image URLs include characters in the query string that must be encoded, so like good little developers we were using encodeURIComponent to encode them as we built the image URLs. What was happening is that first the proxy img element would request the properly encoded URL. Then, as you can see in the onload event handler above, all we do is assign that exact same URL, taken from the proxy’s own src property, to the AlphaImageLoader’s src property. So what’s the problem? The AlphaImageLoader’s src property actually decodes the properly encoded query string parameters; because the URL is then different, the cache lookup fails, and the AlphaImageLoader has to download the image again itself, which puts us back to square one with the lock-ups. Why the hell AlphaImageLoader decodes the query string is beyond me, but once again we had to find a workaround.
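Going back to the cancellation and error handling that the proxy function leaves out: one common approach is a generation counter, so that when the user flips pages faster than images arrive, only the most recent request for a given page image gets applied. Here’s a hedged, DOM-free sketch. `startLoad`, `applyImage` and `reportError` are hypothetical hooks, not part of our code.

```javascript
// Sketch of "latest request wins" cancellation for the proxy-image pattern.
// startLoad(url, onLoad, onError): hypothetical hook that starts a download.
// applyImage(url): would point the AlphaImageLoader filter at the URL.
// reportError(url): whatever error handling the page wants.
function createLatestOnlyLoader(startLoad, applyImage, reportError) {
  let generation = 0; // incremented for every new request

  return function load(url) {
    const requestId = ++generation;
    const isCurrent = () => requestId === generation;

    startLoad(
      url,
      () => { if (isCurrent()) applyImage(url); },  // ignore loads superseded by a newer request
      () => { if (isCurrent()) reportError(url); }  // likewise for errors
    );
  };
}
```

In the browser, `startLoad` might set the proxy img’s onload and onerror handlers and then its src, assigning the handlers before src so a load served straight from the cache isn’t missed.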
To get around this bug, we ended up no longer encoding the query string components in the IE < 7 hacked code path. We simply leave them as is, but this obviously won’t work for characters that have special meaning. Since the user can supply these characters via a text box in certain scenarios, we had to come up with a way to escape them. For “\r\n” we escape the backslashes like so: “\\r\\n”; otherwise AlphaImageLoader would remove them altogether. For the other special URL characters (e.g. &, =, #, +, etc.) we had to resort to converting them to “well-known” literal escape sequences. We couldn’t use real URL escape sequences because the AlphaImageLoader decodes those too, so we ended up using characters near the top of the 16-bit Unicode range (e.g. \uDFF9) that our users would never actually type. We then simply detect these special sequences on the server and unescape them before any actual processing occurs.
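Here’s a minimal sketch of the sentinel scheme described above. Only `\uDFF9` comes from our actual setup; the other code points, and the choice to also escape `%`, are assumptions for illustration. The unescape half would live on the server; it’s shown in JavaScript here for symmetry. The idea is that once no `%` is left in the value, a filter that decodes percent-escapes has nothing to change, so the proxy img and the AlphaImageLoader end up requesting the same URL string and the cache lookup can succeed.

```javascript
// Hypothetical sentinel table. \uDFF9 is from the post; the other code points
// and escaping "%" (a literal % would otherwise look like an escape) are assumptions.
const TO_SENTINEL = { "&": "\uDFF9", "=": "\uDFFA", "#": "\uDFFB", "+": "\uDFFC", "%": "\uDFFD" };
const FROM_SENTINEL = {};
for (const [ch, sentinel] of Object.entries(TO_SENTINEL)) FROM_SENTINEL[sentinel] = ch;

// Client side: swap URL-significant characters for sentinels instead of
// %XX escapes, which the AlphaImageLoader would decode.
function escapeForAlphaLoader(value) {
  return value.replace(/[&=#+%]/g, (ch) => TO_SENTINEL[ch]);
}

// Server side (shown in JavaScript for symmetry): restore the original characters.
function unescapeFromAlphaLoader(value) {
  return value.replace(/[\uDFF9-\uDFFD]/g, (sentinel) => FROM_SENTINEL[sentinel]);
}
```

A value like `Q&A #1` travels as `Q\uDFF9A \uDFFB1` and comes back intact after `unescapeFromAlphaLoader`.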
So, to wrap up: something that should have just involved changing the extension of a URL on an img src ended up taking us weeks to debug and solve. At one point the project actually had to be put on the back burner, where it stagnated in the source control repository for about two months until I got some time to resurrect it and drill down to the workarounds for the issues above. 🙁 Not trying to toot my own horn or anything, but can you imagine the average web developer having to deal with this? The only reason I was able to solve it was my intimate knowledge of MSHTML and Win32 and some solid native-code debugging skills. This is why Microsoft catches so much heat, and deservedly so. I’m already quite disappointed that we heard nothing about IE v.Next at MIX. Someone needs to speak up over on the IE blog. Perhaps they’re just putting too much focus on Silverlight?
Anyway, the project is finally released and the images it’s providing look great! While I’m sure our customers are delighted with the higher fidelity experience that we’re providing, they’ll never be able to appreciate the insane amount of work it took to get them there and that’s why I figured I’d share the experience with my readers who, hopefully, can. 😉
Instead of instantiating a DOM img element, why not use the native JavaScript Image() object?
Great question, Rich! I should have covered that in my write-up, because it seems like the more straightforward approach. The problem is, I needed to know when the image was done loading so that I could perform the swap. Unfortunately the Image object does not have any events available (at least not in IE6, and I don’t think it does in other browsers either), hence I had to use the DOM element and hook its onload/onerror events.
Heh. I ran into the exact same problem (handling escaped URIs) with the AlphaImageLoader a few years back. Interesting bit of trivia – did you notice that the AlphaImageLoader doesn’t observe the 2-per-server HTTP connection limit? Makes for some damn fast page loading when you’ve got lots of images. 😉
Kevin,

Actually, I didn’t notice that. I never really got the chance because of the URL loading lock-up bug I mentioned. Since we have to load with a proxy img element to work around that issue, we’re stuck with the connection limit. :( I really wish browser vendors would come together and agree to at least double it to four at some point in the near future. With all the AJAX going on today it can really be a hell of a bottleneck.

Cheers,
Drew
Yeah, I never saw the lock-up problem – maybe because my images were super-small.

I totally agree about the connection limit. It seems a bit archaic in this day and age – a remnant from the early days of the web.
Hi,

Can you give one or more examples of how to use your code?

Also, shouldn’t the line “var proxyImg = pageImg.alphaProxyImg;” be “var proxyImg = pageImg.proxyImg;”?

Thank you,
Daniel
Daniel,

Sorry, I actually left out a line. At the very end we need to assign the local variable proxyImg back to the original img element in a slot named alphaProxyImg. I updated the code.

Cheers,
Drew
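For readers who, like Daniel, are wondering how the function actually gets called: a page first has to decide which code path to take. Below is a hedged sketch of one way to do it with user-agent sniffing. The helper name is hypothetical, and IE conditional comments are a common alternative to sniffing. When it returns true, image updates would go through `ApplyIEPngImageSourceHack(img, url)`; otherwise a plain `img.src = url` is enough.

```javascript
// Hypothetical helper: true for IE 5.5 and IE6, the versions that need the
// AlphaImageLoader path (IE7 and the other browsers render alpha PNGs natively).
function needsAlphaImageLoaderHack(userAgent) {
  // Some older Opera builds identified themselves as MSIE, so rule Opera out first.
  if (/Opera/.test(userAgent)) return false;
  const match = /MSIE (\d+(?:\.\d+)?)/.exec(userAgent);
  if (!match) return false;
  const version = parseFloat(match[1]);
  return version >= 5.5 && version < 7;
}
```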