Dealing with Shift-JIS
Historically, CJK languages have been hard to deal with in the world of software. Japan's notorious Shift-JIS can be a nightmare for young developers who know only the current global standard, UTF-8. This is mainly because Windows and Office software in Japan, for example, were set to Shift-JIS by default. Excel still opens CSV files as Shift-JIS by default, meaning Japanese text in UTF-8 files can show up garbled.
If you are reading this article, perhaps you are trying to integrate with a rather old Japanese system. In the worst case, you have to generate fixed-byte CSV or TXT files, where every field must occupy an exact number of bytes.
Here is a quick chart of how many bytes each type of character takes in Shift-JIS.
Type of character | Bytes |
English alphabet | 1 |
Common special characters like {}, |, ~, [], _, ^, \ | 1 |
Half-width Japanese Katakana | 1 |
Full-width Japanese characters (Kanji, Hiragana, Katakana) | 2 |
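The chart can be checked in code by encoding a string and counting the resulting bytes. The following is a minimal sketch in Python, assuming the built-in `shift_jis` codec matches your target system; the function name `sjis_byte_length` and the sample strings are made up for illustration.

```python
def sjis_byte_length(text: str, encoding: str = "shift_jis") -> int:
    """Return how many bytes `text` occupies once encoded (hypothetical helper)."""
    return len(text.encode(encoding))


# One sample per row of the chart; the samples are illustrative only.
samples = {
    "English alphabet": "A",
    "Special character": "{",
    "Half-width Katakana": "ｱ",
    "Full-width Katakana": "ア",
    "Hiragana": "あ",
    "Kanji": "漢",
}

for label, ch in samples.items():
    print(f"{label}: {ch!r} takes {sjis_byte_length(ch)} byte(s)")
```

Many Windows-based systems actually use the CP932 variant of Shift-JIS. If that applies to you, passing `encoding="cp932"` is likely the safer choice.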
You can run a character counter against your sample text to test your own logic.
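For fixed-byte files, the logic to test is usually "fit this value into exactly N bytes". Below is a rough sketch of one way to do that in Python. It assumes fields are left-aligned and padded with ASCII spaces, which your target system may not do (some expect full-width spaces or right alignment), and the name `fit_to_bytes` is hypothetical.

```python
def fit_to_bytes(text: str, width: int, encoding: str = "shift_jis") -> bytes:
    """Encode `text` into exactly `width` bytes (hypothetical helper).

    Characters are added one at a time so a 2-byte character is never
    cut in half. Any remaining room is padded with ASCII spaces, which
    is an assumption about the target file format.
    """
    out = b""
    for ch in text:
        encoded = ch.encode(encoding)
        if len(out) + len(encoded) > width:
            break  # the next character would overflow the field
        out += encoded
    return out.ljust(width, b" ")


field = fit_to_bytes("東京都", 5)
print(len(field))                 # byte length of the finished field
print(field.decode("shift_jis"))  # text that actually fits in the field
```

Truncating character by character matters here: slicing the encoded bytes directly could leave half of a 2-byte character at the end of the field, which the receiving system would read as garbage.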
Lastly, there are certain characters that do not exist in Shift-JIS (many Chinese characters, for example), which means they simply won't render or show up. If you are generating a file that includes such text, those characters should be omitted.
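One way to omit such characters is to try encoding each one and keep only those that succeed. This is a minimal Python sketch, assuming the built-in `shift_jis` codec reflects what the target system accepts; `split_encodable` is a hypothetical name.

```python
def split_encodable(text: str, encoding: str = "shift_jis") -> tuple[str, list[str]]:
    """Split `text` into the part that can be encoded and the characters that cannot.

    Hypothetical helper: returns (kept_text, dropped_characters).
    """
    kept: list[str] = []
    dropped: list[str] = []
    for ch in text:
        try:
            ch.encode(encoding)
            kept.append(ch)
        except UnicodeEncodeError:
            # The character has no representation in the target encoding.
            dropped.append(ch)
    return "".join(kept), dropped


kept, dropped = split_encodable("東京한😀A")
print(kept)     # text that is safe to write to the file
print(dropped)  # characters that were left out
```

Returning the dropped characters makes it possible to log or report what was removed instead of losing it silently. If you would rather keep a visible placeholder, Python's `str.encode(encoding, errors="replace")` substitutes `?` for each character that cannot be encoded.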