
Dealing with Shift-JIS

Historically, CJK languages have been hard to deal with in software. Japan's notorious Shift-JIS can be a nightmare for younger developers who know only the current global standard, UTF-8. This is mainly because Windows and Office software in Japan, for example, were set to Shift-JIS by default. Excel still opens files as Shift-JIS by default, which means UTF-8 files can appear corrupted.

If you are reading this article, you are probably trying to integrate with a rather old Japanese system. In the worst case, you have to generate fixed-byte CSV or TXT files.

Here is a quick chart showing how many bytes each type of character takes in Shift-JIS.

Type of character                                        Bytes per character   Example value
English alphabet                                         1                     flagship
Common special characters like {}, |, ~, [], _, ^, \     1                     __^
Half-width characters                                    1                     ﾌﾗｯｸﾞｼｯﾌﾟ, 123
Full-width characters (Kanji, Hiragana, Katakana)        2                     フラッグシップ, １２３
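To check the chart against real strings, here is a minimal Python sketch. It assumes Python's built-in `shift_jis` codec matches what your target system expects (some Windows-based systems actually use the `cp932` variant, so treat the codec name as something to verify). The helper name `sjis_len` is made up for illustration.

```python
def sjis_len(text: str) -> int:
    """Return the number of bytes `text` occupies once encoded as Shift-JIS."""
    return len(text.encode("shift_jis"))


# One sample per row of the chart (half-width and full-width variants).
samples = [
    ("English alphabet", "flagship"),
    ("Special characters", "__^"),
    ("Half-width katakana", "ﾌﾗｯｸﾞｼｯﾌﾟ"),
    ("Half-width digits", "123"),
    ("Full-width katakana", "フラッグシップ"),
    ("Full-width digits", "１２３"),
]

for label, text in samples:
    print(f"{label}: {len(text)} characters, {sjis_len(text)} bytes")
```

Note that the half-width katakana sample comes out as 9 characters and 9 bytes, because the voiced marks ﾞ and ﾟ are separate half-width characters, while the full-width version of the same word is 7 characters and 14 bytes.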

 

Half-width and Full-width Characters in Japanese

Japanese text uses both half-width and full-width characters, and the difference is more than visual — it affects how data is encoded and stored.

In Shift-JIS, full-width characters (kanji, hiragana, katakana, and full-width symbols) are represented as two bytes, while half-width characters (ASCII letters, digits, punctuation, and half-width katakana) use one byte. This means that “one character” on screen might take either one or two bytes once encoded. When working with legacy systems or fixed-width files, this distinction can lead to byte-count mismatches, misaligned data, or even garbled text if strings are cut at arbitrary byte positions.

Developers dealing with Shift-JIS or other legacy encodings should always remember: byte length ≠ character length. Use string-handling functions that cut on character boundaries rather than raw byte offsets, normalize full/half-width forms where possible, and validate text after conversions to prevent encoding errors.
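As a rough illustration of the advice above, the following Python sketch fits a string into a fixed number of Shift-JIS bytes without ever cutting a two-byte character in half, and shows one possible way to normalize width. The function names (`normalize_width`, `fit_to_bytes`), the space padding, and the use of the `shift_jis` codec are all assumptions for the example; your target system may define padding and encoding differently. NFKC normalization also changes more than width (for instance, it rewrites some compatibility symbols), so check it against your own data before relying on it.

```python
import unicodedata

ENCODING = "shift_jis"  # assumption: swap for "cp932" if the target system needs it


def normalize_width(text: str) -> str:
    """Fold half-width katakana to full-width and full-width ASCII to half-width (NFKC)."""
    return unicodedata.normalize("NFKC", text)


def fit_to_bytes(text: str, width: int, pad: bytes = b" ") -> bytes:
    """Encode `text` into exactly `width` bytes, truncating only on character boundaries.

    `pad` must be a single byte. Raises UnicodeEncodeError if a character
    does not exist in the target encoding.
    """
    out = bytearray()
    for ch in text:
        encoded = ch.encode(ENCODING)
        if len(out) + len(encoded) > width:
            break  # the next character would not fit; stop before splitting it
        out += encoded
    out += pad * (width - len(out))  # right-pad to the fixed width
    return bytes(out)


field = fit_to_bytes("フラッグシップ", 5)
print(len(field), field.decode(ENCODING))
```

With a 5-byte field, only the first two full-width characters (4 bytes) fit, and the remaining byte is filled with padding instead of half of the third character. Keep in mind that normalizing half-width katakana to full-width doubles its byte size, so normalize before measuring, not after.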

 

You can use a Character Counter against your sample texts so that you can test your own logic.

Lastly, certain characters (some Chinese characters, for example) do not exist in Shift-JIS, which means they simply cannot be rendered. If you are generating a file that includes such text, those characters should be omitted.
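One way to omit such characters is to test each one against the encoder and keep track of what was dropped, sketched below in Python. The function name `split_encodable` is hypothetical, and the example uses an emoji as a stand-in for a character that is certain to be outside Shift-JIS; whether a particular Chinese character is missing depends on the character, so test with your real data.

```python
def split_encodable(text: str, encoding: str = "shift_jis") -> tuple[str, list[str]]:
    """Return (text with unencodable characters removed, list of removed characters)."""
    kept: list[str] = []
    dropped: list[str] = []
    for ch in text:
        try:
            ch.encode(encoding)
            kept.append(ch)
        except UnicodeEncodeError:
            dropped.append(ch)  # character does not exist in the target encoding
    return "".join(kept), dropped


clean, dropped = split_encodable("東京😀タワー")
print(clean)    # text that is safe to write to a Shift-JIS file
print(dropped)  # characters that were omitted
```

Returning the dropped characters alongside the cleaned text makes it easy to log or report what was lost, rather than silently discarding it.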

 
