My blog has moved and can now be found at http://blog.aniljohn.com

Monday, May 3, 2004

One of the basic tenets of Secure Coding is that "All input is Evil" until validation proves otherwise.

In both my DevDays presentation and the "Improving Web Application Security" book, one of the Defense in Depth countermeasures for input validation is to set the correct character encoding in your web application. Both recommend the "ISO-8859-1" encoding.

You do this because:

  • "To successfully restrict what data is valid for your Web pages, it is important to limit the ways in which the input data can be represented. This prevents malicious users from using canonicalization and multi-byte escape sequences to trick your input validation routines."
  • Using "safe" character encodings mitigates the possibility of using Unicode and multibyte encodings to disguise harmful characters. For example, an attacker might compromise URL authorizations incorporating the user name "Bob" by employing a legitimate account named "Bxb," where "x" is an oddball encoding of the letter "o".
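To make the "Bxb" example concrete, here is a minimal Python sketch. The book doesn't say which "oddball encoding" it means, so as an assumption this uses a Cyrillic look-alike of the letter "o"; `is_latin1` is a hypothetical helper name. Restricting input to the ISO-8859-1 repertoire rejects this particular spoof, because the Cyrillic letter has no Latin-1 representation.

```python
import unicodedata

def is_latin1(text: str) -> bool:
    """Return True if every character in `text` can be represented in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")  # strict: raises on anything outside Latin-1
    except UnicodeEncodeError:
        return False
    return True

genuine = "Bob"
spoofed = "B\u043eb"  # CYRILLIC SMALL LETTER O in place of the Latin "o"

print(genuine == spoofed)                      # False: different code points
print(unicodedata.name(spoofed[1]))            # CYRILLIC SMALL LETTER O
print(is_latin1(genuine), is_latin1(spoofed))  # True False
```

This only checks the character repertoire; it is one Defense in Depth layer, not a complete input validator.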

"ISO-8859-1", also called Latin-1, works fine for English and most other Western European languages. Safety in this case comes from limiting the character set to what a Western European language can contain. But try to display a language like Russian, Hebrew or Hindi and you'll get a screen full of undisplayable characters.
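A quick Python illustration of that cost: text in these scripts simply has no Latin-1 representation. The sample strings below are hypothetical; a strict encode would raise an error, so this sketch uses `errors="replace"` to show the lossy result instead.

```python
# Hypothetical sample greetings; any non-Latin script behaves the same way.
samples = {
    "Russian": "Привет",
    "Hebrew": "שלום",
    "Hindi": "नमस्ते",
}

for language, text in samples.items():
    # A strict encode would raise UnicodeEncodeError; errors="replace"
    # instead substitutes b"?" for each character Latin-1 cannot hold.
    encoded = text.encode("iso-8859-1", errors="replace")
    # Decoding back shows the original text cannot be recovered.
    print(language, encoded, encoded.decode("iso-8859-1") == text)
```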

Of course, you can use UTF-8 instead, but that expands the number of ways in which input data can be represented, which in turn opens the door to canonicalization attacks.
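If you do accept UTF-8, a common mitigation (not something prescribed above) is to decode strictly and normalize to one canonical form before any validation runs. The Python sketch below assumes NFC normalization via the standard `unicodedata` module; NFKC is another option, and `canonicalize` is a hypothetical helper name.

```python
import unicodedata

def canonicalize(raw: bytes) -> str:
    """Decode strictly as UTF-8, then normalize to NFC so that equivalent
    character sequences collapse to one form before any validation runs."""
    text = raw.decode("utf-8")  # strict by default: malformed or overlong bytes raise
    return unicodedata.normalize("NFC", text)

composed = "caf\u00e9".encode("utf-8")     # "é" as a single code point
decomposed = "cafe\u0301".encode("utf-8")  # "e" followed by COMBINING ACUTE ACCENT

print(composed == decomposed)                              # False: different bytes
print(canonicalize(composed) == canonicalize(decomposed))  # True: same after NFC

# 0xC0 0xAF is an overlong (non-shortest) encoding of "/", a well-known way to
# slip path separators past filters that inspect bytes or use lenient decoders.
try:
    canonicalize(b"..\xc0\xaf..\xc0\xafetc")
except UnicodeDecodeError:
    print("rejected malformed UTF-8")
```

The order matters: a check that runs on the raw input can be fooled by an equivalent form it never saw.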

So the takeaway is not necessarily to use "ISO-8859-1" at all times for all languages, but to limit the character encodings your web app accepts so that you can realistically validate the data that comes in. For example, if you are running a Hebrew-language website, one option may be to use an encoding that limits input to only what is possible in that language (such as ISO-8859-8, the Latin/Hebrew encoding).
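For the Hebrew case, the same idea becomes a check that input fits a language-specific encoding. This Python sketch assumes ISO-8859-8 as that encoding (Windows-1255 is another Hebrew code page); `fits_encoding` and `HEBREW_SITE_ENCODING` are hypothetical names.

```python
def fits_encoding(text: str, encoding: str) -> bool:
    """Return True only if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)  # default errors="strict" raises on unmappable chars
    except UnicodeEncodeError:
        return False
    return True

# ISO-8859-8 (Latin/Hebrew) holds ASCII, the Hebrew letters and a few symbols.
HEBREW_SITE_ENCODING = "iso-8859-8"

for text in ["שלום", "Shalom 123", "Привет"]:
    print(repr(text), fits_encoding(text, HEBREW_SITE_ENCODING))
```

Hebrew and plain ASCII input pass, while Cyrillic input is rejected, which is exactly the narrowing the paragraph describes.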

BTW, to find out more about Unicode and Character Sets, I would point you to the following article by Joel Spolsky, which is probably the most lucid explanation of the topic that I've come across.

“The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)”
http://www.joelonsoftware.com/articles/Unicode.html

[Now Playing: Tujhe Yaad Na Meri Aayee - Kuch Kuch Hota Hai]

Tags: Security
5/3/2004 4:39 PM Eastern Daylight Time
Sunday, May 8, 2005 12:06:46 AM (Eastern Daylight Time, UTC-04:00)
Great article! I've often wondered why articles on security always mention using ISO-8859-1.
However, you didn't mention how to set the ISO-8859-1 character set, and the link below explains exactly how to do it.
http://www.aspnetresources.com/blog/unicode_in_vsnet.aspx
Please note that setting the encoding in the meta tag is very different from setting it in the globalization section of web.config. This difference seems to be VERY, VERY important.
Jimmy Seow