JavaTM 2 Platform Std. Ed. v1.3.1
java.lang.Object
  |
  +--java.text.Collator
        |
        +--java.text.RuleBasedCollator
The RuleBasedCollator class is a concrete subclass of Collator that provides a simple, data-driven, table collator. With this class you can create a customized table-based Collator. RuleBasedCollator maps characters to sort keys.
RuleBasedCollator has the following restrictions for efficiency (other subclasses may be used for more complex languages):
The collation table is composed of a list of collation rules, where each rule takes one of three forms:
    <modifier>
    <relation> <text-argument>
    <reset> <text-argument>
The following demonstrates how to create your own collation rules:
Whitespace inside a text-argument is ignored: b c is treated as bc.
'@' : Indicates that accents are sorted backwards, as in French.
'&' : Indicates that the next rule follows the position to where the reset text-argument would be sorted.
This sounds more complicated than it is in practice. For example, the following are equivalent ways of expressing the same thing:
    a < b < c
    a < b & b < c
    a < c & a < b
Notice that the order is important, as the subsequent item goes immediately after the text-argument. The following are not equivalent:
    a < b & a < c
    a < c & a < b
Either the text-argument must already be present in the sequence, or some initial substring of the text-argument must be present. (For example, "a < b & ae < e" is valid since "a" is present in the sequence before "ae" is reset.) In this latter case, "ae" is not entered and treated as a single character; instead, "e" is sorted as if it were expanded to two characters: "a" followed by an "e". This difference appears in natural languages: in traditional Spanish "ch" is treated as though it contracts to a single character (expressed as "c < ch < d"), while in traditional German a-umlaut is treated as though it expanded to two characters (expressed as "a,A < b,B ... & ae;ä & AE;Ä"). [ä and Ä are a-umlaut; in Java source they can be written as the escape sequences \u00e4 and \u00c4.]
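The effect of a reset can be checked directly by building small collators and comparing two strings. The following is a minimal sketch, not part of the original documentation: the class name ResetRuleDemo and its helper methods are hypothetical, and every rule string starts with an explicit "<" because the fragments above omit the leading relation.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class ResetRuleDemo {
    /** Builds a collator from the rules, converting the checked exception. */
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    /** Returns -1, 0, or 1 for comparing s with t under the given rules. */
    static int sign(String rules, String s, String t) {
        return Integer.signum(build(rules).compare(s, t));
    }

    public static void main(String[] args) {
        // Three spellings of the same order: a, then b, then c.
        System.out.println(sign("< a < b < c", "b", "c"));
        System.out.println(sign("< a < b & b < c", "b", "c"));
        System.out.println(sign("< a < c & a < b", "b", "c"));

        // A different order: c is inserted directly after a, ahead of b.
        System.out.println(sign("< a < b & a < c", "b", "c"));

        // Contraction: "ch" is a single element between c and d,
        // so "ch" sorts after any string that starts with a plain c.
        System.out.println(sign("< c < ch < d", "ch", "cz"));
        System.out.println(sign("< c < ch < d", "ch", "d"));
    }
}
```

The first three calls should agree with each other, while the fourth should have the opposite sign, which is the non-equivalence described above.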
Ignorable Characters
For ignorable characters, the first rule must start with a relation (the examples we have used above are really fragments; "a < b" really should be "< a < b"). If, however, the first relation is not "<", then all text-arguments up to the first "<" are ignorable. For example, ", - < a < b" makes "-" an ignorable character, as we saw earlier in the word "black-birds". In the samples for different languages, you see that most accents are ignorable.
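A minimal sketch of an ignorable hyphen follows. It rests on two assumptions that the text above does not state: the hyphen is written in single quotes on the assumption that it counts as a rule syntax character, and the collator is set to PRIMARY strength on the assumption that an ignorable character can still produce a lower-level difference at the default strength. The class name IgnorableDemo and the method comparePrimary are hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class IgnorableDemo {
    /** Compares s and t at PRIMARY strength under the given rules. */
    static int comparePrimary(String rules, String s, String t) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            // Only primary (letter) differences count at this strength.
            collator.setStrength(Collator.PRIMARY);
            return collator.compare(s, t);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // The hyphen appears before the first "<", so it is ignorable.
        String rules = ", '-' < a < b < c < d < i < k < l < r < s";
        System.out.println(comparePrimary(rules, "black-birds", "blackbirds"));
    }
}
```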
Normalization and Accents
RuleBasedCollator automatically processes its rule table to include both pre-composed and combining-character versions of accented characters. Even if the provided rule string contains only base characters and separate combining accent characters, the pre-composed accented characters matching all canonical combinations of characters from the rule string will be entered in the table.
This allows you to use a RuleBasedCollator to compare accented strings even when the collator is set to NO_DECOMPOSITION. There are two caveats, however. First, if the strings to be collated contain combining sequences that may not be in canonical order, you should set the collator to CANONICAL_DECOMPOSITION or FULL_DECOMPOSITION to enable sorting of combining sequences. Second, if the strings contain characters with compatibility decompositions (such as full-width and half-width forms), you must use FULL_DECOMPOSITION, since the rule tables only include canonical mappings. For more information, see The Unicode Standard, Version 2.0.
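The decomposition mode is set through the inherited setDecomposition method. The sketch below is only an illustration: it uses the en_US collator from the factory method rather than a hand-written rule string, and the class name DecompositionDemo and the method compareCanonical are hypothetical. It compares a pre-composed e-acute with the base letter followed by a combining acute accent.

```java
import java.text.Collator;
import java.util.Locale;

final class DecompositionDemo {
    /** Compares s and t with canonical decomposition switched on. */
    static int compareCanonical(String s, String t) {
        Collator collator = Collator.getInstance(Locale.US);
        // Decompose combining sequences canonically before comparing.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(s, t);
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";  // e-acute as a single character
        String combining = "e\u0301";   // e followed by a combining acute
        System.out.println(compareCanonical(precomposed, combining));
    }
}
```

For strings with compatibility forms, the same call would take Collator.FULL_DECOMPOSITION instead, as the second caveat above requires.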
Errors
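The following are errors:
A text-argument that contains an unquoted rule syntax character (e.g. "a < ? < d", where the '?' is not quoted).
A reset whose text-argument is not already in the sequence, and none of whose initial substrings is in the sequence either.
If the rules contain one of these errors, RuleBasedCollator throws a ParseException.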
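Because ParseException is a checked exception, code that builds a collator from untrusted rules has to handle it. The following sketch shows one way to do so; the class name RuleErrorDemo and the method isValid are hypothetical, and the rejected rule string is the unquoted '?' example given in the constructor documentation, with a leading "<" added.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class RuleErrorDemo {
    /** Returns true if the rules build a collator, false if they are rejected. */
    static boolean isValid(String rules) {
        try {
            new RuleBasedCollator(rules);
            return true;
        } catch (ParseException e) {
            // getErrorOffset() gives the position reported by the parser.
            System.err.println(e.getMessage() + " at offset " + e.getErrorOffset());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("< a < b < c < d"));   // well-formed rules
        System.out.println(isValid("< a < ? < d"));       // '?' is not quoted
        System.out.println(isValid("< a < '?' < d"));     // '?' is quoted
    }
}
```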
Examples
Simple: "< a < b < c < d"
Norwegian: "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J < k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T < u,U< v,V< w,W< x,X< y,Y< z,Z < å=a\u030A,Å=A\u030A ;aa,AA< æ,Æ< ø,Ø"
Normally, to create a rule-based Collator object, you will use Collator's factory method getInstance. However, to create a rule-based Collator object with specialized rules tailored to your needs, you construct the RuleBasedCollator with the rules contained in a String object. For example:
    String Simple = "< a < b < c < d";
    RuleBasedCollator mySimple = new RuleBasedCollator(Simple);
Or:
    String Norwegian = "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J"
                     + "< k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T"
                     + "< u,U< v,V< w,W< x,X< y,Y< z,Z"
                     + "< å=a\u030A,Å=A\u030A"
                     + ";aa,AA< æ,Æ< ø,Ø";
    RuleBasedCollator myNorwegian = new RuleBasedCollator(Norwegian);
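Once constructed, a RuleBasedCollator can be passed wherever a Comparator is accepted, since Collator implements Comparator. The following sketch sorts a list with a deliberately reversed alphabet so that the effect of the rules is visible; the class name SimpleRulesSortDemo, the method sortWith, and the rule string are illustrative choices rather than anything from the documentation above.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SimpleRulesSortDemo {
    /** Returns a sorted copy of words, ordered by the given collation rules. */
    static List<String> sortWith(String rules, List<String> words) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            List<String> sorted = new ArrayList<>(words);
            // Collator implements Comparator, so it plugs straight into sort.
            Collections.sort(sorted, collator);
            return sorted;
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // A reversed four-letter alphabet: d sorts first, a sorts last.
        String reversed = "< d < c < b < a";
        System.out.println(sortWith(reversed, Arrays.asList("abc", "bad", "cab", "dab")));
    }
}
```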
Combining Collators is as simple as concatenating strings. Here's an example that combines two Collators from two different locales:
    // Create an en_US Collator object
    RuleBasedCollator en_USCollator = (RuleBasedCollator)
        Collator.getInstance(new Locale("en", "US", ""));
    // Create a da_DK Collator object
    RuleBasedCollator da_DKCollator = (RuleBasedCollator)
        Collator.getInstance(new Locale("da", "DK", ""));
    // Combine the two
    // First, get the collation rules from en_USCollator
    String en_USRules = en_USCollator.getRules();
    // Second, get the collation rules from da_DKCollator
    String da_DKRules = da_DKCollator.getRules();
    RuleBasedCollator newCollator =
        new RuleBasedCollator(en_USRules + da_DKRules);
    // newCollator has the combined rules
Another more interesting example would be to make changes on an existing table to create a new Collator object. For example, add "& C < ch, cH, Ch, CH" to the rules of the en_USCollator object to create your own:
    // Create a new Collator object with additional rules
    String addRules = "& C < ch, cH, Ch, CH";
    RuleBasedCollator myCollator =
        new RuleBasedCollator(en_USCollator.getRules() + addRules);
    // myCollator contains the new rules
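A self-contained version of this tailoring is sketched below, with a comparison that makes the contraction visible. It assumes, as the snippets in this section do, that Collator.getInstance returns a RuleBasedCollator for the en_US locale; the class name TailoringDemo and the method withChContraction are hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

final class TailoringDemo {
    /** Returns an en_US-based collator in which "ch" sorts between c and d. */
    static RuleBasedCollator withChContraction() {
        // Assumption: the factory returns a RuleBasedCollator for en_US.
        RuleBasedCollator base =
            (RuleBasedCollator) Collator.getInstance(Locale.US);
        try {
            // Append the extra rule to the existing rule string.
            return new RuleBasedCollator(base.getRules() + "& C < ch, cH, Ch, CH");
        } catch (ParseException e) {
            throw new IllegalStateException("tailoring failed", e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator tailored = withChContraction();
        // "ch" is now one element after c, so it follows "cz" but precedes "d".
        System.out.println(tailored.compare("ch", "cz"));
        System.out.println(tailored.compare("ch", "d"));
    }
}
```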
The following example demonstrates how to change the order of non-spacing accents:
    // old rule
    String oldRules = "=?;?;?;?"    // main accents
                    + ";?;?;?;?"    // main accents
                    + ";?;?;?;?"    // main accents
                    + ";?;?;?;?"    // main accents
                    + ";?;?;?;?"    // main accents
                    + "< a , A ; ae, AE ; æ , Æ"
                    + "< b , B < c, C < e, E & C < d, D";
    // change the order of accent characters
    String addOn = "& ? ; ? ; ?";
    RuleBasedCollator myCollator = new RuleBasedCollator(oldRules + addOn);
The last example shows how to put new primary ordering in before the default setting. For example, in a Japanese Collator, you can sort English characters either before or after Japanese characters:
    // get en_US Collator rules
    RuleBasedCollator en_USCollator =
        (RuleBasedCollator) Collator.getInstance(Locale.US);
    // add a few Japanese characters to sort before English characters
    // suppose the last character before the first base letter 'a' in
    // the English collation rule is ?
    String jaString = "& ? < ?, ? < ?, ?";
    RuleBasedCollator myJapaneseCollator =
        new RuleBasedCollator(en_USCollator.getRules() + jaString);
See Also: Collator, CollationElementIterator
Fields inherited from class java.text.Collator
    CANONICAL_DECOMPOSITION, FULL_DECOMPOSITION, IDENTICAL, NO_DECOMPOSITION, PRIMARY, SECONDARY, TERTIARY
Constructor Summary
RuleBasedCollator(String rules)
    RuleBasedCollator constructor.
Method Summary
Object clone()
    Standard override; no change in semantics.
int compare(String source, String target)
    Compares the character data stored in two different strings based on the collation rules.
boolean equals(Object obj)
    Compares the equality of two collation objects.
CollationElementIterator getCollationElementIterator(CharacterIterator source)
    Return a CollationElementIterator for the given CharacterIterator.
CollationElementIterator getCollationElementIterator(String source)
    Return a CollationElementIterator for the given String.
CollationKey getCollationKey(String source)
    Transforms the string into a series of characters that can be compared with CollationKey.compareTo.
String getRules()
    Gets the table-based rules for the collation object.
int hashCode()
    Generates the hash code for the table-based collation object.
Methods inherited from class java.text.Collator
    compare, equals, getAvailableLocales, getDecomposition, getInstance, getInstance, getStrength, setDecomposition, setStrength
Methods inherited from class java.lang.Object
    finalize, getClass, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
public RuleBasedCollator(String rules) throws ParseException
    RuleBasedCollator constructor.
    Parameters: rules - the collation rules to build the collation table from.
    Throws: ParseException - A format exception will be thrown if the build process of the rules fails. For example, the build rule "a < ? < d" will cause the constructor to throw the ParseException because the '?' is not quoted.
    See Also: Locale
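Method Detail
public String getRules()
    Gets the table-based rules for the collation object.
public CollationElementIterator getCollationElementIterator(String source)
    Return a CollationElementIterator for the given String.
    See Also: CollationElementIterator
public CollationElementIterator getCollationElementIterator(CharacterIterator source)
    Return a CollationElementIterator for the given CharacterIterator.
    See Also: CollationElementIterator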
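As an illustration of what the iterator exposes, the following sketch walks a string and collects the primary order of each collation element. The class name ElementIteratorDemo, the method primaryOrders, and the rule strings are hypothetical; the calls used are next(), the NULLORDER constant, and the static primaryOrder(int) helper on CollationElementIterator.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

final class ElementIteratorDemo {
    /** Returns the primary order of each collation element in text. */
    static List<Integer> primaryOrders(String rules, String text) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
        CollationElementIterator it = collator.getCollationElementIterator(text);
        List<Integer> orders = new ArrayList<>();
        // next() returns NULLORDER once the end of the text is reached.
        for (int e = it.next(); e != CollationElementIterator.NULLORDER; e = it.next()) {
            orders.add(CollationElementIterator.primaryOrder(e));
        }
        return orders;
    }

    public static void main(String[] args) {
        // One element per letter, with increasing primary orders.
        System.out.println(primaryOrders("< a < b < c", "abc"));
        // With a contraction, "ch" is reported as a single element.
        System.out.println(primaryOrders("< c < ch < d", "ch"));
    }
}
```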
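public int compare(String source, String target)
    Compares the character data stored in two different strings based on the collation rules.
    Overrides: compare in class Collator
    Following copied from class: java.text.Collator
    Parameters: source - the source string.
                target - the target string.
    See Also: CollationKey, Collator.getCollationKey(java.lang.String)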
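public CollationKey getCollationKey(String source)
    Transforms the string into a series of characters that can be compared with CollationKey.compareTo.
    Overrides: getCollationKey in class Collator
    Following copied from class: java.text.Collator
    Parameters: source - the string to be transformed into a collation key.
    See Also: CollationKey, Collator.compare(java.lang.String, java.lang.String)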
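Collation keys are typically useful when the same strings are compared many times, since each key is computed once and then compared directly. The sketch below only checks that comparing keys agrees with comparing the strings themselves; the class name CollationKeyDemo and the method compareKeys are hypothetical.

```java
import java.text.CollationKey;
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class CollationKeyDemo {
    /** Compares s and t through their collation keys; returns -1, 0, or 1. */
    static int compareKeys(RuleBasedCollator collator, String s, String t) {
        CollationKey left = collator.getCollationKey(s);
        CollationKey right = collator.getCollationKey(t);
        return Integer.signum(left.compareTo(right));
    }

    /** Builds a collator, converting the checked exception. */
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator collator = build("< a < b < c");
        // Both lines should report the same sign.
        System.out.println(compareKeys(collator, "ab", "ac"));
        System.out.println(Integer.signum(collator.compare("ab", "ac")));
    }
}
```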
public Object clone()
    Standard override; no change in semantics.
    Overrides: clone in class Collator
    Following copied from class: java.lang.Object
    Throws: CloneNotSupportedException - if the object's class does not support the Cloneable interface. Subclasses that override the clone method can also throw this exception to indicate that an instance cannot be cloned.
            OutOfMemoryError - if there is not enough memory.
    See Also: Cloneable
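public boolean equals(Object obj)
    Compares the equality of two collation objects.
    Overrides: equals in class Collator
    Parameters: obj - the table-based collation object to be compared with this.
public int hashCode()
    Generates the hash code for the table-based collation object.
    Overrides: hashCode in class Collator
    Following copied from class: java.lang.Object
    See Also: Object.equals(java.lang.Object), Hashtable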
Java, Java 2D, and JDBC are trademarks or registered trademarks of Oracle and/or its affiliates, in the US and other countries.
Copyright © 1995, 2010 Oracle and/or its affiliates. All rights reserved.