org.apache.solr.analysis
Class PatternTokenizerFactory

java.lang.Object
  extended by org.apache.solr.analysis.BaseTokenizerFactory
      extended by org.apache.solr.analysis.PatternTokenizerFactory
All Implemented Interfaces:
TokenizerFactory

public class PatternTokenizerFactory
extends BaseTokenizerFactory

Factory for PatternTokenizer. This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".

group=-1 (the default) is equivalent to "split". In this case, the tokens are the same as the output of String.split(java.lang.String), except that empty tokens are dropped.
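The split behavior described above can be approximated with plain java.util.regex. This is a minimal sketch, not the tokenizer's actual implementation: the class name SplitModeSketch, the helper splitTokens, and the sample pattern and input are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class; it does not exist in Solr.
final class SplitModeSketch {

    // Splits the input on the regex and drops zero-length pieces,
    // approximating what group=-1 is documented to do.
    static List<String> splitTokens(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(regex).split(input)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The doubled ';' would yield an empty piece, which is filtered out.
        System.out.println(splitTokens("[,;]\\s*", "a, b;;c,")); // prints [a, b, c]
    }
}
```

Filtering after the split keeps the sketch short; a real tokenizer works on a character stream and also tracks offsets, which this example ignores.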

Using group >= 0 selects the matching group as the token. For example, if you have:

  pattern = \'([^\']+)\'
  group = 0
  input = aaa 'bbb' 'ccc'
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output would be bbb and ccc (no ' marks).

NOTE: This Tokenizer does not output tokens that are of zero length.
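The group example above can be reproduced with java.util.regex directly. The sketch below is an illustration under stated assumptions, not the tokenizer's code: GroupModeSketch and groupTokens are hypothetical names, and the backslashes before the ' marks are dropped because a Java regex does not need them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; it does not exist in Solr.
final class GroupModeSketch {

    // Collects the requested capture group of every match,
    // skipping zero-length results as the NOTE above describes.
    static List<String> groupTokens(String regex, int group, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            String text = matcher.group(group);
            if (text != null && !text.isEmpty()) {
                tokens.add(text);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String regex = "'([^']+)'";
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(groupTokens(regex, 0, input)); // prints ['bbb', 'ccc']
        System.out.println(groupTokens(regex, 1, input)); // prints [bbb, ccc]
    }
}
```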

 <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
   </analyzer>
 </fieldType>

Since:
solr1.2
Version:
$Id:$
See Also:
PatternTokenizer

Field Summary
protected  Map<String,String> args
          The init args
protected  int group
           
static String GROUP
           
protected  org.apache.lucene.util.Version luceneMatchVersion
          the luceneVersion arg
protected  Pattern pattern
           
static String PATTERN
           
 
Fields inherited from class org.apache.solr.analysis.BaseTokenizerFactory
log
 
Constructor Summary
PatternTokenizerFactory()
           
 
Method Summary
protected  void assureMatchVersion()
          This method can be called in the TokenizerFactory.create(java.io.Reader) or TokenFilterFactory.create(org.apache.lucene.analysis.TokenStream) methods to inform the user that this factory requires a luceneMatchVersion.
 org.apache.lucene.analysis.Tokenizer create(Reader in)
          Split the input using the configured pattern
 Map<String,String> getArgs()
           
protected  boolean getBoolean(String name, boolean defaultVal)
           
protected  boolean getBoolean(String name, boolean defaultVal, boolean useDefault)
           
protected  int getInt(String name)
           
protected  int getInt(String name, int defaultVal)
           
protected  int getInt(String name, int defaultVal, boolean useDefault)
           
protected  org.apache.lucene.analysis.CharArraySet getWordSet(ResourceLoader loader, String wordFiles, boolean ignoreCase)
           
static List<org.apache.lucene.analysis.Token> group(Matcher matcher, String input, int group)
          Deprecated.  
 void init(Map<String,String> args)
          Require a configured pattern
static List<org.apache.lucene.analysis.Token> split(Matcher matcher, String input)
          Deprecated.  
protected  void warnDeprecated(String message)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.solr.analysis.TokenizerFactory
getArgs
 

Field Detail

PATTERN

public static final String PATTERN
See Also:
Constant Field Values

GROUP

public static final String GROUP
See Also:
Constant Field Values

pattern

protected Pattern pattern

group

protected int group

args

protected Map<String,String> args
The init args


luceneMatchVersion

protected org.apache.lucene.util.Version luceneMatchVersion
the luceneVersion arg

Constructor Detail

PatternTokenizerFactory

public PatternTokenizerFactory()
Method Detail

init

public void init(Map<String,String> args)
Require a configured pattern

Specified by:
init in interface TokenizerFactory
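As a rough illustration of what "require a configured pattern" could mean in practice, the sketch below reads the two documented arguments from an init-style map. It is an assumption-laden example, not the factory's code: InitArgsSketch is a hypothetical class, the key names "pattern" and "group" are taken from the class description, and the choice of IllegalArgumentException is ours.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for the factory's argument handling; not Solr code.
final class InitArgsSketch {
    final Pattern pattern;
    final int group;

    InitArgsSketch(Map<String, String> args) {
        String regex = args.get("pattern");
        if (regex == null) {
            // Assumed failure mode: the real factory may report this differently.
            throw new IllegalArgumentException("missing required argument: pattern");
        }
        this.pattern = Pattern.compile(regex);

        // group is optional; -1 (split mode) is the documented default.
        String groupArg = args.get("group");
        this.group = (groupArg == null) ? -1 : Integer.parseInt(groupArg);
    }

    public static void main(String[] unused) {
        Map<String, String> args = new HashMap<>();
        args.put("pattern", "'([^']+)'");
        args.put("group", "1");
        InitArgsSketch sketch = new InitArgsSketch(args);
        System.out.println(sketch.pattern.pattern() + " group=" + sketch.group);
    }
}
```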

create

public org.apache.lucene.analysis.Tokenizer create(Reader in)
Split the input using the configured pattern


split

@Deprecated
public static List<org.apache.lucene.analysis.Token> split(Matcher matcher,
                                                                      String input)
Deprecated. 

This behaves just like String.split(), but returns a list of Tokens rather than an array of strings. NOTE: This method is not used in 1.4.


group

@Deprecated
public static List<org.apache.lucene.analysis.Token> group(Matcher matcher,
                                                                      String input,
                                                                      int group)
Deprecated. 

Create tokens from the matches in a matcher. NOTE: This method is not used in 1.4.


getArgs

public Map<String,String> getArgs()

assureMatchVersion

protected final void assureMatchVersion()
This method can be called in the TokenizerFactory.create(java.io.Reader) or TokenFilterFactory.create(org.apache.lucene.analysis.TokenStream) methods to inform the user that this factory requires a luceneMatchVersion.


warnDeprecated

protected final void warnDeprecated(String message)

getInt

protected int getInt(String name)

getInt

protected int getInt(String name,
                     int defaultVal)

getInt

protected int getInt(String name,
                     int defaultVal,
                     boolean useDefault)

getBoolean

protected boolean getBoolean(String name,
                             boolean defaultVal)

getBoolean

protected boolean getBoolean(String name,
                             boolean defaultVal,
                             boolean useDefault)

getWordSet

protected org.apache.lucene.analysis.CharArraySet getWordSet(ResourceLoader loader,
                                                             String wordFiles,
                                                             boolean ignoreCase)
                                                      throws IOException
Throws:
IOException


Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.