Locale-Sensitive String Sorting in Java

In Java, the Arrays.sort (for arrays) and Collections.sort (for lists) methods from the class library are used to sort strings. When no comparator is given, these methods rely on the compareTo method of the Comparable interface. The String class implements compareTo by comparing the numeric Unicode (UTF-16) values of the characters one by one. For unaccented English text in a single letter case this matches alphabetical order, so it works well there.
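To make the character-value comparison concrete, here is a minimal sketch. The class name CompareToDemo and the helper naturallyBefore are made up for this illustration; the non-ASCII letter is written as a Unicode escape so the source file encoding does not matter.

```java
class CompareToDemo {
    // Hypothetical helper: true if a sorts before b in String's natural order.
    static boolean naturallyBefore(String a, String b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        String cCedilla = "\u00E7"; // the Turkish letter c with cedilla

        // 'c' is U+0063 and 'd' is U+0064, so "c" sorts before "d".
        System.out.println(naturallyBefore("c", "d"));      // true

        // The cedilla letter is U+00E7, which is larger than every
        // unaccented Latin letter, so it sorts after "d" (and after "z").
        System.out.println(naturallyBefore(cCedilla, "d")); // false

        // For single differing characters, compareTo returns the
        // difference of their values: 0xE7 - 0x64.
        System.out.println(cCedilla.compareTo("d"));        // 131
    }
}
```

The comparison knows nothing about any alphabet; it only sees numbers, which is why letters outside the basic Latin range end up in unexpected places.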

When you need to sort strings in a language other than English or, more commonly, when you write software for global use, you can use the Collator class from the java.text package instead.

Below is the Turkish alphabet in order.

abcçdefgğhıijklmnoöprsştuüvyz

Let’s sort the letters of the alphabet with the Arrays.sort method.

String[] alphabet = "abcçdefgğhıijklmnoöprsştuüvyz".split("");

Arrays.sort(alphabet);

As expected, the result is not in Turkish alphabetical order. The letters specific to Turkish end up after z because their Unicode values are higher.

abcdefghijklmnoprstuvyzçöüğış

Let’s try the Collator class.

Arrays.sort(alphabet, Collator.getInstance(Locale.forLanguageTag("tr")));
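For reference, the one-liner above can be put into a complete program roughly as follows. This is only a sketch: the class name TurkishSortDemo and the helper sortLetters are my own names, and the exact ordering you get depends on the collation data bundled with your JDK.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class TurkishSortDemo {
    // The Turkish alphabet from the text; the non-ASCII letters are written
    // as Unicode escapes so the source file encoding does not matter.
    static final String ALPHABET =
            "abc\u00E7defg\u011Fh\u0131ijklmno\u00F6prs\u015Ftu\u00FCvyz";

    // Hypothetical helper: sorts the individual characters of text
    // using the collation rules of the given locale.
    static String sortLetters(String text, Locale locale) {
        String[] letters = text.split("");
        Arrays.sort(letters, Collator.getInstance(locale));
        return String.join("", letters);
    }

    public static void main(String[] args) {
        String sorted = sortLetters(ALPHABET, Locale.forLanguageTag("tr"));
        System.out.println(sorted);
        // Shows whether the collator reproduced the alphabet order exactly.
        System.out.println(sorted.equals(ALPHABET));
    }
}
```

Collator implements Comparator, which is why it can be passed directly as the second argument of Arrays.sort.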

Unsurprisingly, this gives the correct order, but only for Turkish. When your code needs to follow the language of whoever runs it, use Locale.getDefault() instead of Locale.forLanguageTag("tr").

Arrays.sort(alphabet, Collator.getInstance(Locale.getDefault()));
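The same idea applies to lists sorted with Collections.sort. Below is a small sketch under my own naming (DefaultLocaleSortDemo and sortedCopy are hypothetical); it takes the locale as a parameter so the caller can pass Locale.getDefault() or a fixed locale.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class DefaultLocaleSortDemo {
    // Hypothetical helper: returns a sorted copy of words, ordered by the
    // collation rules of the given locale. The input list is left untouched.
    static List<String> sortedCopy(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("zebra", "apple", "mango");
        // Uses the locale of the running JVM, whatever it happens to be.
        System.out.println(sortedCopy(words, Locale.getDefault()));
    }
}
```

Calling Collator.getInstance() with no argument also returns a collator for the default locale, so the explicit Locale.getDefault() is mostly a matter of readability.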

I will write about internationalization in more detail later. Until then, have a look at “What’s Wrong With Turkey?” by Jeff Atwood.