
Dealing with Shift-JIS

Historically, CJK languages have been hard to handle in software. Japan's notorious Shift-JIS encoding can be a nightmare for younger developers who know only the current global standard, UTF-8. It is still around mainly because Windows and Office software in Japan, for example, were set to Shift-JIS by default. Even today, Excel opens text files such as CSV as Shift-JIS by default, so UTF-8 files show up garbled.

If you are reading this article, perhaps you are trying to integrate with a rather old Japanese system. In the worst case, you have to generate fixed-byte CSV or TXT files.

Here is a quick chart showing how many bytes each type of character takes in Shift-JIS.

Type of character                                            Bytes
English alphabet                                             1
Common special characters like {}, |, ~, [], _, ^, \         1
Half-width Japanese Katakana                                 1
Full-width Japanese characters (Kanji, Hiragana, Katakana)   2
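
The byte counts in the chart can be checked with a few lines of Python. This is only a minimal sketch: the helper name `sjis_byte_length` is made up for illustration, and it assumes Python's built-in `shift_jis` codec matches what the target system expects (files produced on Windows may need the `cp932` codec instead).

```python
def sjis_byte_length(text: str, encoding: str = "shift_jis") -> int:
    """Return how many bytes `text` occupies once encoded."""
    return len(text.encode(encoding))


# One sample per row of the chart above.
samples = ["A", "~", "ｱ", "ア", "あ", "漢"]
for s in samples:
    print(s, sjis_byte_length(s))
# A 1
# ~ 1
# ｱ 1   (half-width Katakana)
# ア 2  (full-width Katakana)
# あ 2  (Hiragana)
# 漢 2  (Kanji)
```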

 

You can run your sample texts through a Character Counter to test your own logic.
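
For the fixed-byte files mentioned earlier, the logic you end up testing is often "fit this value into exactly N bytes". One possible sketch is below. The function name `fit_to_bytes`, the space padding, and the left alignment are all assumptions for illustration; the real layout is whatever the receiving system's specification says.

```python
def fit_to_bytes(text: str, width: int, encoding: str = "shift_jis",
                 pad: bytes = b" ") -> bytes:
    """Encode `text` into exactly `width` bytes.

    Truncates on a character boundary (never in the middle of a 2-byte
    character) and right-pads with `pad`, which must be a single byte.
    """
    out = bytearray()
    for ch in text:
        encoded = ch.encode(encoding)
        if len(out) + len(encoded) > width:
            break  # the next character would overflow the field
        out += encoded
    return bytes(out) + pad * (width - len(out))


field = fit_to_bytes("東京都", 5)
print(len(field))                 # 5
print(field.decode("shift_jis"))  # "東京" followed by one space
```

Truncating character by character matters because slicing the encoded bytes directly could cut a 2-byte character in half and leave an invalid trailing byte.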

Lastly, there are certain characters that do not exist in Shift-JIS (such as Chinese characters that are not used in Japanese), which means they simply won't render. If you are generating a file that includes such text, those characters should be omitted.
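
As a rough sketch of how that omission might look in Python: the helper names `find_unencodable` and `drop_unencodable` are hypothetical, and the example assumes the built-in `shift_jis` codec is the right target. An emoji stands in here for any character that Shift-JIS cannot represent.

```python
def find_unencodable(text: str, encoding: str = "shift_jis") -> list[str]:
    """List the characters in `text` that the encoding cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


def drop_unencodable(text: str, encoding: str = "shift_jis") -> str:
    """Return `text` with every unrepresentable character removed."""
    return text.encode(encoding, errors="ignore").decode(encoding)


sample = "東京😀タワー"
print(find_unencodable(sample))  # ['😀']
print(drop_unencodable(sample))  # 東京タワー
```

Logging the result of `find_unencodable` before dropping anything may be worthwhile, since silently removing characters can change the meaning of names or addresses.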

 
