org.apache.solr.analysis
Interface TokenizerFactory

All Known Implementing Classes:
ArabicLetterTokenizerFactory, BaseTokenizerFactory, ChineseTokenizerFactory, CJKTokenizerFactory, ClassicTokenizerFactory, EdgeNGramTokenizerFactory, ICUTokenizerFactory, KeywordTokenizerFactory, LetterTokenizerFactory, LowerCaseTokenizerFactory, NGramTokenizerFactory, PathHierarchyTokenizerFactory, PatternTokenizerFactory, RussianLetterTokenizerFactory, SmartChineseSentenceTokenizerFactory, StandardTokenizerFactory, TrieTokenizerFactory, UAX29URLEmailTokenizerFactory, WhitespaceTokenizerFactory, WikipediaTokenizerFactory

public interface TokenizerFactory

A TokenizerFactory creates Tokenizers, which break up a stream of characters into tokens.

TokenizerFactories are registered for FieldTypes with the IndexSchema through the schema.xml file.

Example schema.xml entry that registers a TokenizerFactory implementation to tokenize fields of type "cool":

  <fieldtype name="cool" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      ...
 

A single instance of any registered TokenizerFactory is created via the default constructor and is reused for each FieldType.
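
The lifecycle described above can be sketched in plain Java. This is only an illustration, not Solr's actual loading code: SketchFactory, CountingFactory, the load helper and the "exampleParam" argument are all hypothetical names, and the stand-in interface omits create so that the sketch needs nothing beyond the JDK. It shows a factory being built through its default constructor, initialized once, and then kept for reuse.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class LifecycleSketch {
    // Hypothetical stand-in for this sketch; NOT the real
    // org.apache.solr.analysis.TokenizerFactory interface.
    interface SketchFactory {
        void init(Map<String, String> args);
        Map<String, String> getArgs();
    }

    // Hypothetical factory that records how many times init was called.
    static class CountingFactory implements SketchFactory {
        int initCalls = 0;
        private Map<String, String> args = Collections.emptyMap();

        public CountingFactory() { } // default (no-arg) constructor

        @Override
        public void init(Map<String, String> args) {
            initCalls++;
            this.args = args;
        }

        @Override
        public Map<String, String> getArgs() {
            return args;
        }
    }

    // Assumed loader step: instantiate by class name through the no-arg
    // constructor, then call init exactly once.
    static SketchFactory load(String className, Map<String, String> args) throws Exception {
        SketchFactory factory = (SketchFactory) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
        factory.init(args);
        return factory;
    }

    public static void main(String[] argv) throws Exception {
        Map<String, String> args = new HashMap<>();
        args.put("exampleParam", "exampleValue"); // made-up parameter

        SketchFactory factory = load(CountingFactory.class.getName(), args);

        // The same instance is kept and reused; init is not called again.
        System.out.println(((CountingFactory) factory).initCalls); // prints 1
        System.out.println(factory.getArgs()); // prints {exampleParam=exampleValue}
    }
}
```

Because instantiation goes through the no-argument constructor, an implementation has to take all of its configuration through init rather than through constructor parameters.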

Version:
$Id: TokenizerFactory.java 929782 2010-04-01 02:15:27Z rmuir $

Method Summary
 org.apache.lucene.analysis.Tokenizer create(Reader input)
          Creates a Tokenizer (a kind of TokenStream) for the specified input.
 Map<String,String> getArgs()
          Accessor method for reporting the args used to initialize this factory.
 void init(Map<String,String> args)
          init will be called just once, immediately after creation.
 

Method Detail

init

void init(Map<String,String> args)
init will be called just once, immediately after creation.

The args are user-level initialization parameters that may be specified when declaring the factory in the schema.xml file.


getArgs

Map<String,String> getArgs()
Accessor method for reporting the args used to initialize this factory.

Implementations are strongly encouraged to return the contents of the Map passed to the init method.
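
A minimal sketch of that recommendation follows. MinLengthFactory and its "minLength" parameter are made up for illustration, and the class deliberately does not implement the real interface, so that it compiles with only the JDK. The factory keeps the Map it receives in init, reads one setting from it, and returns the same contents from getArgs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ArgsSketch {
    // Hypothetical factory; only init and getArgs are sketched here.
    static class MinLengthFactory {
        private Map<String, String> args = Collections.emptyMap();
        private int minLength = 1; // default used when the parameter is absent

        // Called once, right after construction, with the user-level parameters.
        public void init(Map<String, String> args) {
            this.args = args;
            String raw = args.get("minLength"); // "minLength" is a made-up parameter
            if (raw != null) {
                minLength = Integer.parseInt(raw);
            }
        }

        // Reports the args this factory was initialized with.
        public Map<String, String> getArgs() {
            return args;
        }

        public int getMinLength() {
            return minLength;
        }
    }

    public static void main(String[] argv) {
        Map<String, String> args = new HashMap<>();
        args.put("minLength", "3");

        MinLengthFactory factory = new MinLengthFactory();
        factory.init(args);

        System.out.println(factory.getArgs());      // prints {minLength=3}
        System.out.println(factory.getMinLength()); // prints 3
    }
}
```

All values arrive as strings, so any numeric or boolean setting has to be parsed inside init.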


create

org.apache.lucene.analysis.Tokenizer create(Reader input)
Creates a Tokenizer (a kind of TokenStream) for the specified input.
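
The shape of create can be sketched with JDK-only stand-ins. WhitespaceSketchTokenizer and WhitespaceSketchFactory below are hypothetical and are not the Lucene or Solr classes; in particular, the real Tokenizer exposes tokens through the TokenStream API rather than a next() method returning a String. The point illustrated is that one factory instance hands out a fresh tokenizer for each Reader it is given.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class CreateSketch {
    // Hypothetical stand-in for org.apache.lucene.analysis.Tokenizer.
    static class WhitespaceSketchTokenizer {
        private final Reader input;

        WhitespaceSketchTokenizer(Reader input) {
            this.input = input;
        }

        // Returns the next whitespace-delimited token, or null at end of input.
        String next() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) {
                        return sb.toString();
                    }
                } else {
                    sb.append((char) c);
                }
            }
            return sb.length() > 0 ? sb.toString() : null;
        }
    }

    // Hypothetical factory: one shared instance, a new tokenizer per input.
    static class WhitespaceSketchFactory {
        WhitespaceSketchTokenizer create(Reader input) {
            return new WhitespaceSketchTokenizer(input);
        }
    }

    // Helper that drains a tokenizer created for the given text.
    static List<String> tokens(WhitespaceSketchFactory factory, String text) throws IOException {
        WhitespaceSketchTokenizer tokenizer = factory.create(new StringReader(text));
        List<String> out = new ArrayList<>();
        for (String tok = tokenizer.next(); tok != null; tok = tokenizer.next()) {
            out.add(tok);
        }
        return out;
    }

    public static void main(String[] argv) throws IOException {
        WhitespaceSketchFactory factory = new WhitespaceSketchFactory();
        System.out.println(tokens(factory, "hello  solr world")); // prints [hello, solr, world]
        System.out.println(tokens(factory, "second input"));      // prints [second, input]
    }
}
```

Keeping per-input state in the tokenizer, not in the factory, is what lets the shared factory instance be reused safely across inputs.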



Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.