Locale: English (United States)
Calendar: (default); collation: (default); currency: (default)
Sublocales: United States under other languages: Spanish, Hawaiian

This example demonstrates sorting (collation) in this locale.

Original

00: bad
01: Bad
02: Bat
03: bat
04: bäd
05: Bäd
06: bät
07: Bät
08: côté
09: coté
10: côte
11: cote
12: black-bird
13: blackbird
14: black-birds
15: blackbirds

Collated

00: bad
01: Bad
04: bäd
05: Bäd
03: bat
02: Bat
06: bät
07: Bät
12: black-bird
14: black-birds
13: blackbird
15: blackbirds
11: cote
09: coté
10: côte
08: côté
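The Collated column can be reproduced with a toy, three-level sort key. This Python sketch is an illustration under simplifying assumptions, not ICU's implementation or the full Unicode Collation Algorithm: it assumes NFD decomposition cleanly separates base letters from accents, uses lowercased characters compared by code point as level 1 (so the hyphen sorts before letters, matching the result above), the accent marks as level 2, and case, lowercase first, as level 3. The function name `sort_key` is hypothetical.

```python
import unicodedata

def sort_key(s):
    """Toy three-level key (hypothetical); an illustration, not ICU's implementation."""
    primary, secondary, tertiary = [], [], []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch) and secondary:
            # Level 2: a combining accent belongs to the preceding base letter.
            secondary[-1] += ch
        else:
            primary.append(ch.lower())                  # level 1: base character
            secondary.append("")                        # level 2: no accent (yet)
            tertiary.append(1 if ch.isupper() else 0)   # level 3: lowercase first
    # Tuples compare level by level: L2 only matters if L1 ties, and so on.
    return (tuple(primary), tuple(secondary), tuple(tertiary))

original = ["bad", "Bad", "Bat", "bat", "bäd", "Bäd", "bät", "Bät",
            "côté", "coté", "côte", "cote",
            "black-bird", "blackbird", "black-birds", "blackbirds"]

# Sort (index, word) pairs so each output line keeps its original number.
for i, word in sorted(enumerate(original), key=lambda p: sort_key(p[1])):
    print(f"{i:02}: {word}")
```

Running it prints the words in the same order as the Collated column, each prefixed with its index in the Original column. Comparing whole tuples is what makes the sort multi-level: a later level is only consulted when every earlier level ties.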


Instructions:

  • Type in the lines of text you want to sort under Input Text.
  • Select the Options you want, and hit Sort.
  • The two output columns show the original order and the sorted order, each numbered according to the original line. Lines in the same box (with the same color) compare as equal under the options you chose.
  • To experiment with the sorting rules, hit Edit Rules. This inserts the rules for the current locale, which you can then alter and sort with. The rules have their own syntax: see Collation in the ICU User Guide for details.
    • Note: if you hit Edit Rules again, it will replace whatever you have altered!
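The grouping into boxes can be sketched by sorting the lines, then grouping neighbors whose keys match once the key is truncated to the chosen number of levels. This is a hypothetical toy in Python (the names `sort_key` and `boxes` are illustrative, not ICU API); it assumes three simplified levels in the spirit of the Unicode Collation Algorithm: base characters, then accents found by NFD decomposition, then case.

```python
import unicodedata
from itertools import groupby

def sort_key(s, strength=3):
    """Toy key truncated to the first `strength` levels (1-3); not ICU."""
    primary, secondary, tertiary = [], [], []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch) and secondary:
            secondary[-1] += ch                         # level 2: accent
        else:
            primary.append(ch.lower())                  # level 1: base character
            secondary.append("")
            tertiary.append(1 if ch.isupper() else 0)   # level 3: case
    return (tuple(primary), tuple(secondary), tuple(tertiary))[:strength]

def boxes(lines, strength=3):
    """Sort lines, then group adjacent lines that are equal at `strength`."""
    def key(s):
        return sort_key(s, strength)
    ordered = sorted(lines, key=key)  # stable: equal lines keep input order
    return [list(group) for _, group in groupby(ordered, key=key)]

words = ["bad", "Bad", "bäd", "Bäd", "bat"]
print(boxes(words, strength=1))  # [['bad', 'Bad', 'bäd', 'Bäd'], ['bat']]
print(boxes(words, strength=2))  # [['bad', 'Bad'], ['bäd', 'Bäd'], ['bat']]
```

Lowering the strength merges boxes: at level 1 accents and case are ignored, so all four spellings of "bad" land together.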

Options:

  • ICU implements the Unicode Collation Algorithm, which is a multi-level sort.
    1. If there are any differences in base letters (L1), that determines the result.
    2. Otherwise, if there are any differences in accents* (L2), that determines the result.
    3. Otherwise, if there are any differences in case* (L3), that determines the result.
    4. Otherwise, if there are any differences in punctuation* (L4), that determines the result.
  • The Level option determines which of the above levels to take into account when sorting.
  • Force Case changes the normal case order, e.g. from lowercase first (a < A) to uppercase first (A < a).
  • If Punctuation = Base, punctuation is treated like base letters. If Punctuation = Shifted, it is ignored except at L4.
  • A Case level keeps case differences significant even when the strength is L1 or L2.
  • A Hiragana level adds a special level for JIS compatibility. It is only used when the level is L4 or L5.
  • With French Accents, accent differences are compared backwards, starting from the end of the string.
  • With Full Normalization, all strings are compared as if normalized to NFD, so canonically equivalent strings compare as equal.
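Some of these options can be imitated by extending a toy sort key. The Python sketch below is an assumption-laden illustration, not how ICU implements them: `strength` keeps levels 1 to 4, `upper_first` stands in for Force Case, `shifted` moves non-alphanumeric characters to level 4 to approximate Punctuation = Shifted, and `french` reverses the accent level. All parameter names are hypothetical. The Case level, Hiragana level, and Full Normalization options are not modeled, although the key always decomposes strings to NFD first.

```python
import unicodedata

def sort_key(s, strength=3, upper_first=False, shifted=False, french=False):
    """Toy multi-level key imitating a few demo options (hypothetical; not ICU)."""
    primary, secondary, tertiary, quaternary = [], [], [], []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch) and secondary:
            secondary[-1] += ch                 # L2: accent on previous base
        elif shifted and not ch.isalnum():
            quaternary.append((0, ch))          # shifted punctuation: L4 only
        else:
            primary.append(ch.lower())          # L1: base character
            secondary.append("")
            # L3: lowercase first by default; upper_first flips the case order.
            tertiary.append(int(ch.isupper() != upper_first))
            quaternary.append((1, ""))          # L4: ordinary characters sort high
    if french:
        secondary.reverse()                     # compare accents from the end
    levels = (tuple(primary), tuple(secondary), tuple(tertiary), tuple(quaternary))
    return levels[:strength]

accents = ["côté", "coté", "côte", "cote"]
print(sorted(accents, key=sort_key))                            # ['cote', 'coté', 'côte', 'côté']
print(sorted(accents, key=lambda w: sort_key(w, french=True)))  # ['cote', 'côte', 'coté', 'côté']
print(sorted(["bad", "Bad"], key=lambda w: sort_key(w, upper_first=True)))  # ['Bad', 'bad']
print(sort_key("black-bird", shifted=True) == sort_key("blackbird", shifted=True))  # True
```

With `shifted=True` at the default three levels, black-bird and blackbird compare as equal; only at level 4 does the hyphen break the tie.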


For more information, see the ICU User Guide.

Powered by ICU 3.9.3
Sunday, May 11, 2008 11:42:34 AM PT