org.apache.solr.analysis
Class CapitalizationFilterFactory

java.lang.Object
  extended by org.apache.solr.analysis.BaseTokenFilterFactory
      extended by org.apache.solr.analysis.CapitalizationFilterFactory
All Implemented Interfaces:
TokenFilterFactory

public class CapitalizationFilterFactory
extends BaseTokenFilterFactory

A filter that applies normal capitalization rules to tokens: the first letter is made uppercase and the remaining letters lowercase.

This filter is particularly useful for building nice-looking facet parameters. It is not appropriate if you intend to use a prefix query.

The factory takes the following parameters:
"onlyFirstWord" - true or false. If true, only the first word of the token is capitalized; if false, every word is capitalized.
"keep" - a keep word list: a whitespace-separated list of words to keep as they are.
"keepIgnoreCase" - true or false. If true, the keep list is treated as case-insensitive.
"forceFirstLetter" - force the first letter to be capitalized even if the word is in the keep list.
"okPrefix" - do not change a word's capitalization if it begins with an entry in this list. For example, if "McK" is on the okPrefix list, the word "McKinley" is not changed to "Mckinley".
"minWordLength" - the minimum length a word must have for capitalization to be applied. If minWordLength is 3, "and" becomes "And" but "or" stays "or".
"maxWordCount" - if the token contains more than maxWordCount words, its capitalization is assumed to be correct.

 <fieldType name="text_cptlztn" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.CapitalizationFilterFactory" onlyFirstWord="true"
             keep="java solr lucene" keepIgnoreCase="false"
             okPrefix="McK McD McA"/>   
   </analyzer>
 </fieldType>
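
The parameter descriptions above can be read as a small decision procedure per word. The following is a minimal plain-Java sketch of that reading, not the filter's actual implementation: the class name CapitalizationSketch, the use of String instead of a token buffer, splitting words on a single space, the order in which the rules are checked, and the default values are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the documented rules; not the Solr class itself.
class CapitalizationSketch {
    // Stand-ins for the factory parameters. The defaults are assumptions.
    public boolean onlyFirstWord = true;
    public boolean keepIgnoreCase = false;
    public boolean forceFirstLetter = true;
    public int minWordLength = 0;
    public int maxWordCount = Integer.MAX_VALUE;
    public Set<String> keep = new HashSet<>();
    public List<String> okPrefix = new ArrayList<>();

    /** Applies the rules to every space-separated word of one token. */
    public String capitalize(String token) {
        String[] words = token.split(" ");
        if (words.length > maxWordCount) {
            return token; // too many words: capitalization is assumed correct
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) out.append(' ');
            out.append(processWord(words[i], i));
        }
        return out.toString();
    }

    /** Applies the rules to one word; wordCount is its zero-based position. */
    public String processWord(String word, int wordCount) {
        if (word.isEmpty()) return word;
        if (onlyFirstWord && wordCount > 0) {
            return word.toLowerCase(Locale.ROOT); // later words are not capitalized
        }
        if (isKept(word)) {
            if (wordCount == 0 && forceFirstLetter) {
                return Character.toUpperCase(word.charAt(0)) + word.substring(1);
            }
            return word; // keep-list words are left as they are
        }
        if (word.length() < minWordLength) return word;
        for (String prefix : okPrefix) {
            if (word.startsWith(prefix)) return word; // e.g. "McK" protects "McKinley"
        }
        return Character.toUpperCase(word.charAt(0))
                + word.substring(1).toLowerCase(Locale.ROOT);
    }

    private boolean isKept(String word) {
        for (String k : keep) {
            if (keepIgnoreCase ? k.equalsIgnoreCase(word) : k.equals(word)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Mirrors the attribute values of the field type shown above.
        CapitalizationSketch s = new CapitalizationSketch();
        s.keep.addAll(Arrays.asList("java", "solr", "lucene"));
        s.okPrefix.addAll(Arrays.asList("McK", "McD", "McA"));
        System.out.println(s.capitalize("McKinley")); // prints "McKinley"
        System.out.println(s.capitalize("mckinley")); // prints "Mckinley"
    }
}
```

With a whitespace tokenizer each token is a single word, so onlyFirstWord and maxWordCount only matter when an upstream tokenizer emits multi-word tokens.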

Since:
solr 1.3
Version:
$Id: CapitalizationFilterFactory.java 1073344 2011-02-22 14:35:02Z koji $

Field Summary
protected  Map<String,String> args
          The init args
static int DEFAULT_MAX_WORD_COUNT
           
static String FORCE_FIRST_LETTER
           
static String KEEP
           
static String KEEP_IGNORE_CASE
           
protected  Version luceneMatchVersion
          the luceneVersion arg
static String MAX_TOKEN_LENGTH
           
static String MAX_WORD_COUNT
           
static String MIN_WORD_LENGTH
           
static String OK_PREFIX
           
static String ONLY_FIRST_WORD
           
 
Fields inherited from class org.apache.solr.analysis.BaseTokenFilterFactory
log
 
Constructor Summary
CapitalizationFilterFactory()
           
 
Method Summary
protected  void assureMatchVersion()
          This method can be called from TokenizerFactory.create(java.io.Reader) or TokenFilterFactory.create(org.apache.lucene.analysis.TokenStream) to inform the user that this factory requires a luceneMatchVersion.
 org.apache.solr.analysis.CapitalizationFilter create(TokenStream input)
          Transform the specified input TokenStream
 Map<String,String> getArgs()
           
protected  boolean getBoolean(String name, boolean defaultVal)
           
protected  boolean getBoolean(String name, boolean defaultVal, boolean useDefault)
           
protected  int getInt(String name)
           
protected  int getInt(String name, int defaultVal)
           
protected  int getInt(String name, int defaultVal, boolean useDefault)
           
protected  CharArraySet getSnowballWordSet(ResourceLoader loader, String wordFiles, boolean ignoreCase)
          Same as getWordSet(ResourceLoader, String, boolean), except the input is in snowball format.
protected  CharArraySet getWordSet(ResourceLoader loader, String wordFiles, boolean ignoreCase)
           
 void init(Map<String,String> args)
          init will be called just once, immediately after creation.
 void processWord(char[] buffer, int offset, int length, int wordCount)
           
protected  void warnDeprecated(String message)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.solr.analysis.TokenFilterFactory
getArgs
 

Field Detail

DEFAULT_MAX_WORD_COUNT

public static final int DEFAULT_MAX_WORD_COUNT
See Also:
Constant Field Values

KEEP

public static final String KEEP
See Also:
Constant Field Values

KEEP_IGNORE_CASE

public static final String KEEP_IGNORE_CASE
See Also:
Constant Field Values

OK_PREFIX

public static final String OK_PREFIX
See Also:
Constant Field Values

MIN_WORD_LENGTH

public static final String MIN_WORD_LENGTH
See Also:
Constant Field Values

MAX_WORD_COUNT

public static final String MAX_WORD_COUNT
See Also:
Constant Field Values

MAX_TOKEN_LENGTH

public static final String MAX_TOKEN_LENGTH
See Also:
Constant Field Values

ONLY_FIRST_WORD

public static final String ONLY_FIRST_WORD
See Also:
Constant Field Values

FORCE_FIRST_LETTER

public static final String FORCE_FIRST_LETTER
See Also:
Constant Field Values

args

protected Map<String,String> args
The init args


luceneMatchVersion

protected Version luceneMatchVersion
the luceneVersion arg

Constructor Detail

CapitalizationFilterFactory

public CapitalizationFilterFactory()
Method Detail

init

public void init(Map<String,String> args)
Description copied from interface: TokenFilterFactory
init will be called just once, immediately after creation.

The args are user-level initialization parameters that may be specified when declaring the factory in schema.xml.
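
As an illustration of what such an args map might contain, the sketch below builds a Map<String,String> from the attribute values used in the class description and splits the whitespace-separated list values. The class InitArgsSketch and both of its methods are hypothetical helpers, and the splitting shown is an assumption about how list-valued parameters such as "keep" and "okPrefix" could be interpreted, not the factory's own code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; only illustrates the shape of the init args.
class InitArgsSketch {
    /** Builds a map equivalent to the filter attributes in a schema.xml declaration. */
    public static Map<String, String> exampleArgs() {
        Map<String, String> args = new LinkedHashMap<>();
        args.put("onlyFirstWord", "true");
        args.put("keep", "java solr lucene");
        args.put("keepIgnoreCase", "false");
        args.put("okPrefix", "McK McD McA");
        return args;
    }

    /** Splits a whitespace-separated attribute value into its entries. */
    public static List<String> splitOnWhitespace(String value) {
        List<String> entries = new ArrayList<>();
        if (value == null) {
            return entries; // attribute not specified
        }
        for (String s : value.trim().split("\\s+")) {
            if (!s.isEmpty()) {
                entries.add(s);
            }
        }
        return entries;
    }

    public static void main(String[] unused) {
        Map<String, String> args = exampleArgs();
        System.out.println(splitOnWhitespace(args.get("keep")));     // prints [java, solr, lucene]
        System.out.println(splitOnWhitespace(args.get("okPrefix"))); // prints [McK, McD, McA]
    }
}
```

All values arrive as strings, so numeric and boolean parameters still have to be parsed by the factory.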

Specified by:
init in interface TokenFilterFactory

processWord

public void processWord(char[] buffer,
                        int offset,
                        int length,
                        int wordCount)

create

public org.apache.solr.analysis.CapitalizationFilter create(TokenStream input)
Description copied from interface: TokenFilterFactory
Transform the specified input TokenStream


getArgs

public Map<String,String> getArgs()

assureMatchVersion

protected final void assureMatchVersion()
This method can be called from TokenizerFactory.create(java.io.Reader) or TokenFilterFactory.create(org.apache.lucene.analysis.TokenStream) to inform the user that this factory requires a luceneMatchVersion.


warnDeprecated

protected final void warnDeprecated(String message)

getInt

protected int getInt(String name)

getInt

protected int getInt(String name,
                     int defaultVal)

getInt

protected int getInt(String name,
                     int defaultVal,
                     boolean useDefault)

getBoolean

protected boolean getBoolean(String name,
                             boolean defaultVal)

getBoolean

protected boolean getBoolean(String name,
                             boolean defaultVal,
                             boolean useDefault)
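
The getInt and getBoolean overloads above carry no description. One plausible reading of the signatures is sketched below in plain Java: the value is looked up in the init args by name, and defaultVal is returned when the parameter is absent and useDefault is true. The class ArgReaderSketch, the exception type, and the assumption that the shorter overloads delegate to the three-argument form are illustrative guesses, not the documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the protected helpers; behavior is assumed.
class ArgReaderSketch {
    private final Map<String, String> args;

    public ArgReaderSketch(Map<String, String> args) {
        this.args = args;
    }

    /** Assumed: the parameter is required when no default is given. */
    public int getInt(String name) {
        return getInt(name, -1, false);
    }

    public int getInt(String name, int defaultVal) {
        return getInt(name, defaultVal, true);
    }

    public int getInt(String name, int defaultVal, boolean useDefault) {
        String s = args.get(name);
        if (s == null) {
            if (useDefault) return defaultVal;
            // Assumed error handling; the real exception type may differ.
            throw new IllegalArgumentException("missing parameter '" + name + "'");
        }
        return Integer.parseInt(s);
    }

    public boolean getBoolean(String name, boolean defaultVal) {
        return getBoolean(name, defaultVal, true);
    }

    public boolean getBoolean(String name, boolean defaultVal, boolean useDefault) {
        String s = args.get(name);
        if (s == null) {
            if (useDefault) return defaultVal;
            throw new IllegalArgumentException("missing parameter '" + name + "'");
        }
        return Boolean.parseBoolean(s);
    }

    public static void main(String[] unused) {
        Map<String, String> m = new HashMap<>();
        m.put("minWordLength", "3");
        ArgReaderSketch reader = new ArgReaderSketch(m);
        System.out.println(reader.getInt("minWordLength"));          // prints 3
        System.out.println(reader.getBoolean("onlyFirstWord", true)); // prints true
    }
}
```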

getWordSet

protected CharArraySet getWordSet(ResourceLoader loader,
                                  String wordFiles,
                                  boolean ignoreCase)
                           throws IOException
Throws:
IOException

getSnowballWordSet

protected CharArraySet getSnowballWordSet(ResourceLoader loader,
                                          String wordFiles,
                                          boolean ignoreCase)
                                   throws IOException
Same as getWordSet(ResourceLoader, String, boolean), except the input is in snowball format.

Throws:
IOException