Minh T. Nguyen

        "Enemy's Gate Is Down"

Vietnamese Conversions Update

I updated the Vietnamese Conversions XML Web Service today to address a problem in the UTF-8 to Unicode conversion. Well, it's not really a bug, but here's the scenario:

If you convert Vietnamese Unicode into UTF-8 and then send the verbatim UTF-8 sequences over email, all daggers (byte value 160) are converted into single spaces (byte value 32). To the “naked eye”, daggers look like spaces (in Latin-1 and Windows-1252, byte 160 is the non-breaking space), but they do have different byte values, so if you convert the text back into Unicode, it comes out garbled. I have been told that this affects nearly all conversion utilities, such as VPSKeys, VNI, and VietPad. But don't blame this software, because it does the conversion right. It's the email gateways we need to blame for converting daggers into spaces.

In the Vietnamese Unicode set, only four characters contain a dagger when converted to UTF-8:

Unicode   UTF-8 sequence   Byte values
à         Ã                195 160
Ạ         áº               225 186 160
Ơ         Æ                198 160
Ỡ         á»               225 187 160
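
The list of four can be double-checked with a short script. This is only a sketch, not the web service's code: it assumes the Vietnamese letter set is the twelve vowels combined with the six tones, in both cases, plus đ and Đ. The names `vietnamese_letters` and `letters_containing_byte` are made up for the illustration.

```python
import unicodedata

# Assumption: these twelve vowels, each with six tones, cover every
# accented Vietnamese letter; đ/Đ are added separately.
BASE_VOWELS = "aăâeêioôơuưy"
# Combining marks: no tone, grave, acute, hook above, tilde, dot below.
TONE_MARKS = ["", "\u0300", "\u0301", "\u0309", "\u0303", "\u0323"]


def vietnamese_letters():
    """Build the precomposed Vietnamese letters (both cases) via NFC."""
    letters = {"đ", "Đ"}
    for base in BASE_VOWELS:
        for tone in TONE_MARKS:
            lower = unicodedata.normalize("NFC", base + tone)
            letters.add(lower)
            letters.add(lower.upper())
    return letters


def letters_containing_byte(value):
    """Return the letters whose UTF-8 encoding contains the given byte."""
    return sorted(ch for ch in vietnamese_letters()
                  if value in ch.encode("utf-8"))


AFFECTED = letters_containing_byte(0xA0)  # 0xA0 == 160
for ch in AFFECTED:
    # Print code point, character, and decimal byte values.
    print(f"U+{ord(ch):04X}", ch, list(ch.encode("utf-8")))
```

Under that assumption, the script lists the same four characters and byte values as the table.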


At any rate, I've added some code that converts those spaces back into daggers whenever it encounters one of these four sequences. I've also sent the code to the creators of some of the other software utilities, and I hope they will update their software as well if they think it's necessary.
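
To give a rough idea of what such a repair looks like, here is a minimal sketch in Python. It is not the code from the web service. It assumes the damaged text is available as raw bytes, and the names `restore_byte_160` and `simulate_gateway` are hypothetical. The idea is that, in valid UTF-8, none of the four lead-byte patterns from the table can be followed by byte 32, so wherever that happens the byte 32 is put back to 160.

```python
# Lead bytes of the four affected characters (à, Ơ, Ạ, Ỡ). In intact
# UTF-8 each of these is followed by byte 0xA0 (160) for those letters.
LEAD_SEQUENCES = [b"\xc3", b"\xc6", b"\xe1\xba", b"\xe1\xbb"]


def simulate_gateway(data: bytes) -> bytes:
    """Mimic the damage described above: every byte 160 becomes byte 32."""
    return data.replace(b"\xa0", b"\x20")


def restore_byte_160(data: bytes) -> bytes:
    """Put byte 160 back wherever a lead sequence is followed by byte 32."""
    for lead in LEAD_SEQUENCES:
        data = data.replace(lead + b"\x20", lead + b"\xa0")
    return data


original = "Đà Nẵng".encode("utf-8")
damaged = simulate_gateway(original)
repaired = restore_byte_160(damaged)

print(damaged == original)   # the gateway changed the bytes
print(repaired == original)  # the repair restored them
print(repaired.decode("utf-8"))
```

A lead byte followed by a space is never valid UTF-8, so the replacement only touches sequences that are already broken. A real non-breaking space (bytes 194 160) that a gateway flattened is not one of the four cases and is left alone by this sketch.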

Again, it's not a bug per se, so consider this an enhancement to make the conversion a bit more intelligent.

posted on Tuesday, May 11, 2004 7:52 AM
