I've updated the Vietnamese Conversions XML Web Service today to fix a problem in the UTF-8 to Unicode conversion. It's not really a bug, but here's the scenario:
If you convert Vietnamese Unicode text into UTF-8 and then send the verbatim UTF-8 sequences by email, every dagger (byte value 160, hex 0xA0) is converted into an ordinary space (byte value 32). In Latin-1 and Windows-1252, byte 160 is a non-breaking space, so to the “naked eye” a dagger looks just like a space. The two bytes have different values, though, so the text comes out garbled when you convert it back into Unicode. I have been told that this affects nearly all conversion utilities, such as VPSKeys, VNI and VietPad. Don't blame these programs, because they do the conversion correctly. The culprits are the email gateways that turn daggers into spaces.
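Here is a minimal Python sketch of the failure. It assumes the gateway simply rewrites every byte 160 as byte 32; the sample string and variable names are made up for illustration.

```python
# Encode a Vietnamese phrase as UTF-8; "à" becomes the two bytes 195 160.
original = "Xin chào"
utf8_bytes = original.encode("utf-8")

# Simulate the email gateway (assumption): every byte 160 (0xA0) becomes byte 32 (a space).
mangled = utf8_bytes.replace(b"\xa0", b" ")

# Byte 195 is now followed by a space instead of a continuation byte, so the
# sequence is no longer valid UTF-8 and decodes with a replacement character.
decoded = mangled.decode("utf-8", errors="replace")

print(list(utf8_bytes[-3:]))  # [195, 160, 111]
print(list(mangled[-3:]))     # [195, 32, 111]
print(decoded == original)    # False
```

With the default `errors="strict"`, the same decode raises `UnicodeDecodeError` instead of substituting U+FFFD.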
In the Vietnamese Unicode character set, only four characters contain a dagger when converted to UTF-8:
| Unicode | UTF-8 sequence (displayed as Latin-1; byte 160 is invisible) | Byte values |
|---|---|---|
| à | Ã | 195 160 |
| Ạ | áº | 225 186 160 |
| Ơ | Æ | 198 160 |
| Ỡ | á» | 225 187 160 |
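The four-character claim can be checked mechanically. The sketch below assumes the usual 134 precomposed Vietnamese letters (the Latin-1 vowels, a few Latin Extended-A/B letters, and U+1EA0 to U+1EF9). It lists the letters whose UTF-8 encoding contains byte 160. The names `VIETNAMESE_LETTERS` and `letters_with_byte_160` are hypothetical.

```python
# Precomposed Vietnamese letters outside plain ASCII (assumed set, 134 characters).
LATIN1_PART = "ÀÁÂÃÈÉÊÌÍÒÓÔÕÙÚÝàáâãèéêìíòóôõùúý"
EXTENDED_AB_PART = "ĂăĐđĨĩŨũƠơƯư"
EXTENDED_ADDITIONAL_PART = "".join(chr(cp) for cp in range(0x1EA0, 0x1EFA))

VIETNAMESE_LETTERS = LATIN1_PART + EXTENDED_AB_PART + EXTENDED_ADDITIONAL_PART


def letters_with_byte_160(letters: str) -> list[str]:
    """Return the letters (sorted by code point) whose UTF-8 encoding contains byte 160."""
    return [ch for ch in sorted(letters) if 0xA0 in ch.encode("utf-8")]


for ch in letters_with_byte_160(VIETNAMESE_LETTERS):
    print(ch, list(ch.encode("utf-8")))
```

It prints the same four characters and byte values as the table, ordered by code point (à, Ơ, Ạ, Ỡ).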
At any rate, I've added some code that converts those spaces back into daggers whenever a space follows the leading bytes of one of these four sequences. I've also sent the code to some of the creators of the other utilities, and I hope they will update their software as well if they think it's necessary.
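The service's actual code isn't shown here, so this is only a minimal Python sketch of the same idea. It assumes the damaged text is available as raw bytes, and `repair_daggers` is a hypothetical name. Wherever one of the leading byte sequences from the table is followed by a space, that space is restored to byte 160. Assuming the text was valid UTF-8 before it was mailed, this is safe: in valid UTF-8 those lead bytes must be followed by a continuation byte (128 to 191), never by a space.

```python
# Leading bytes that precede byte 160 in the four affected characters.
# In valid UTF-8, a space (32) can never directly follow any of these.
DAGGER_PREFIXES = (
    b"\xc3",      # à -> 195 160
    b"\xc6",      # Ơ -> 198 160
    b"\xe1\xba",  # Ạ -> 225 186 160
    b"\xe1\xbb",  # Ỡ -> 225 187 160
)


def repair_daggers(data: bytes) -> bytes:
    """Turn a space that follows an affected prefix back into byte 160 (0xA0)."""
    for prefix in DAGGER_PREFIXES:
        data = data.replace(prefix + b" ", prefix + b"\xa0")
    return data


original = "Xin chào, Ơi, Ạ, Ỡ"
mangled = original.encode("utf-8").replace(b"\xa0", b" ")  # what the gateway delivers
repaired = repair_daggers(mangled)
print(repaired.decode("utf-8") == original)  # True
```

Genuine spaces are left alone because they never directly follow one of these prefixes. In "chào " the damaged bytes 195 32 32 become 195 160 32.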
Again, it's not a bug per se, so consider this an enhancement to make the conversion a bit more intelligent.