java.lang.Object
  org.apache.solr.analysis.BaseTokenizerFactory
    org.apache.solr.analysis.PatternTokenizerFactory
public class PatternTokenizerFactory
Factory for PatternTokenizer.
This tokenizer uses regex pattern matching to construct distinct tokens
from the input stream. It takes two arguments: "pattern" and "group".

group=-1 (the default) is equivalent to "split". In this case, the tokens are
the same as the output of String.split(java.lang.String), except that empty
tokens are omitted.

Using group >= 0 selects the matching group as the token. For example, if you have:

  pattern = \'([^\']+)\'
  group = 0
  input = aaa 'bbb' 'ccc'

the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks).
NOTE: This Tokenizer does not output tokens that are of zero length.
  <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
    </analyzer>
  </fieldType>
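
The split and group modes described above can be illustrated with plain java.util.regex. The following is only a minimal sketch under stated assumptions: the class PatternTokenSketch and its tokenize method are hypothetical names introduced for illustration, and the code emulates the documented behavior rather than reproducing the actual PatternTokenizer implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in built on java.util.regex; NOT the Lucene/Solr PatternTokenizer.
class PatternTokenSketch {

    // group < 0: split the input on the pattern.
    // group >= 0: emit that capture group of every match.
    static List<String> tokenize(String regex, int group, String input) {
        Pattern pattern = Pattern.compile(regex);
        List<String> tokens = new ArrayList<>();
        if (group < 0) {
            for (String piece : pattern.split(input)) {
                if (!piece.isEmpty()) {              // drop zero-length tokens
                    tokens.add(piece);
                }
            }
        } else {
            Matcher matcher = pattern.matcher(input);
            while (matcher.find()) {
                String token = matcher.group(group);
                if (token != null && !token.isEmpty()) {  // drop unmatched or zero-length groups
                    tokens.add(token);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String regex = "'([^']+)'";
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(tokenize(regex, 0, input));   // prints ['bbb', 'ccc']
        System.out.println(tokenize(regex, 1, input));   // prints [bbb, ccc]
        System.out.println(tokenize(regex, -1, input));  // the text between matches
    }
}
```

With group=-1 the sketch returns the text between matches, here "aaa " and " ", because the split pieces are not trimmed.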
See Also: PatternTokenizer
Field Summary | |
---|---|
protected Map<String,String> | args: The init args |
protected int | group |
static String | GROUP |
protected Version | luceneMatchVersion: the luceneVersion arg |
protected Pattern | pattern |
static String | PATTERN |
Fields inherited from class org.apache.solr.analysis.BaseTokenizerFactory |
---|
log |
Constructor Summary |
---|
PatternTokenizerFactory() |
Method Summary | |
---|---|
protected void | assureMatchVersion(): Can be called in the TokenizerFactory.create(java.io.Reader) or TokenFilterFactory.create(org.apache.lucene.analysis.TokenStream) methods to inform the user that this factory requires a luceneMatchVersion. |
Tokenizer | create(Reader in): Split the input using the configured pattern. |
Map<String,String> | getArgs() |
protected boolean | getBoolean(String name, boolean defaultVal) |
protected boolean | getBoolean(String name, boolean defaultVal, boolean useDefault) |
protected int | getInt(String name) |
protected int | getInt(String name, int defaultVal) |
protected int | getInt(String name, int defaultVal, boolean useDefault) |
protected CharArraySet | getSnowballWordSet(ResourceLoader loader, String wordFiles, boolean ignoreCase): Same as getWordSet(ResourceLoader, String, boolean), except the input is in snowball format. |
protected CharArraySet | getWordSet(ResourceLoader loader, String wordFiles, boolean ignoreCase) |
static List<Token> | group(Matcher matcher, String input, int group): Deprecated. |
void | init(Map<String,String> args): Require a configured pattern. |
static List<Token> | split(Matcher matcher, String input): Deprecated. |
protected void | warnDeprecated(String message) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface org.apache.solr.analysis.TokenizerFactory |
---|
getArgs |
Field Detail |
---|
public static final String PATTERN
public static final String GROUP
protected Pattern pattern
protected int group
protected Map<String,String> args
protected Version luceneMatchVersion
Constructor Detail |
---|
public PatternTokenizerFactory()
Method Detail |
---|
public void init(Map<String,String> args)
Specified by: init in interface TokenizerFactory
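
The class description names two arguments, "pattern" and "group", with group defaulting to -1, and the summary describes init as requiring a configured pattern. One way such argument handling could look is sketched below. This is an assumption-laden illustration, not the Solr source: the class PatternArgsSketch, the string values given to its PATTERN and GROUP constants, and the choice of IllegalArgumentException are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper; names and exception type are assumptions, not Solr's implementation.
class PatternArgsSketch {
    static final String PATTERN = "pattern"; // assumed argument key
    static final String GROUP = "group";     // assumed argument key

    final Pattern pattern;
    final int group;

    PatternArgsSketch(Map<String, String> args) {
        String regex = args.get(PATTERN);
        if (regex == null) {
            // A configured pattern is required.
            throw new IllegalArgumentException("missing required argument: " + PATTERN);
        }
        this.pattern = Pattern.compile(regex);

        // group is optional; -1 (split mode) is the documented default.
        String groupArg = args.get(GROUP);
        this.group = (groupArg == null) ? -1 : Integer.parseInt(groupArg);
    }

    public static void main(String[] argv) {
        Map<String, String> args = new HashMap<>();
        args.put(PATTERN, "'([^']+)'");
        args.put(GROUP, "1");
        PatternArgsSketch parsed = new PatternArgsSketch(args);
        System.out.println(parsed.pattern.pattern() + " group=" + parsed.group);
    }
}
```

Failing fast in the constructor keeps a misconfigured field type from surfacing later as an error during analysis.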
public Tokenizer create(Reader in)
@Deprecated public static List<Token> split(Matcher matcher, String input)
@Deprecated public static List<Token> group(Matcher matcher, String input, int group)
public Map<String,String> getArgs()
protected final void assureMatchVersion()
Can be called in the TokenizerFactory.create(java.io.Reader) or TokenFilterFactory.create(org.apache.lucene.analysis.TokenStream) methods to inform the user that this factory requires a luceneMatchVersion.
protected final void warnDeprecated(String message)
protected int getInt(String name)
protected int getInt(String name, int defaultVal)
protected int getInt(String name, int defaultVal, boolean useDefault)
protected boolean getBoolean(String name, boolean defaultVal)
protected boolean getBoolean(String name, boolean defaultVal, boolean useDefault)
protected CharArraySet getWordSet(ResourceLoader loader, String wordFiles, boolean ignoreCase) throws IOException
Throws: IOException
protected CharArraySet getSnowballWordSet(ResourceLoader loader, String wordFiles, boolean ignoreCase) throws IOException
Same as getWordSet(ResourceLoader, String, boolean), except the input is in snowball format.
Throws: IOException