Here’s a tidbit of info that will hopefully help some people out. It took me a couple of stabs at Google keyword combinations to find the answer. Basically, the problem was that we take user input in an HTML page, which can include Unicode characters, and need to pass it on the query string to an ASP.NET handler for processing. What was happening is that by the time the data got to ASP.NET, the Unicode characters seemed to have been stripped out.

Now, being a very Unicode-conscious person, I wondered how the heck this could possibly happen, considering I’ve got my <globalization requestEncoding="utf-8" responseEncoding="utf-8" /> in the web.config. It dawned on me relatively quickly that these settings only tell ASP.NET how to decode requests and encode responses; they have no say in how the URL gets encoded in the first place, which is handled entirely on the client. Therefore the problem had to be that the client wasn’t encoding the URL properly.

Next we found the client script that was building the URL, and it was using the escape function to encode any invalid characters (or so we thought, read on…). I then checked the URL being sent to the server using IEHttpHeaders, and the character was encoded as %E9 (é was the Unicode character in question, btw). Well, %E9 is the ISO Latin-1 encoding. With requestEncoding set to utf-8, the server expects the UTF-8 encoding, which is %C3%A9. The problem is that the escape function is locked into legacy behavior that never produces UTF-8: it emits %XX for code units below 256 and a non-standard %uXXXX form for everything above that (see Annex B of the ECMAScript specification). A newer set of URI encoding/decoding functions that handle Unicode properly was introduced in the 3rd edition of ECMAScript. The one that takes escape’s place is called encodeURIComponent. Needless to say, we quickly did a replace in all files in our ECMAScript codebase to eradicate the undesirable encoding behavior of the legacy escape function.
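To make the difference concrete, here is a minimal sketch comparing the two functions on the same input. The helper name buildQueryUrl, the handler path /Search.ashx and the parameter name q are made up for illustration; they are not from our actual codebase.

```javascript
// Hypothetical helper (not from the original codebase): builds a URL with a
// query string, encoding every key and value as UTF-8 percent-escapes.
function buildQueryUrl(baseUrl, params) {
  var pairs = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
  });
  return pairs.length ? baseUrl + "?" + pairs.join("&") : baseUrl;
}

// "\u00E9" is the character é, written as an escape so the file encoding
// of this script cannot affect the result.
var legacy = escape("\u00E9");             // Latin-1 style: "%E9"
var legacyWide = escape("\u20AC");         // non-standard form: "%u20AC"
var utf8 = encodeURIComponent("\u00E9");   // UTF-8 bytes: "%C3%A9"

// Assumed handler path and parameter name, for illustration only.
var url = buildQueryUrl("/Search.ashx", { q: "caf\u00E9" });

console.log(legacy, legacyWide, utf8);
console.log(url); // "/Search.ashx?q=caf%C3%A9"
```

Note that encodeURIComponent also escapes characters such as & and =, so it should be applied to each key and value separately rather than to the whole URL.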
It took you some keyword combinations to get the answer, but I just typed “unicode characters querystring” and got sent right to your site. Thanks a lot. This was starting to tick me off.