org.biojava.bio.search
Class KnuthMorrisPrattSearch

java.lang.Object
  extended by org.biojava.bio.search.KnuthMorrisPrattSearch

public final class KnuthMorrisPrattSearch
extends Object

An object to find exact subsequences within a sequence.

Reference: KNUTH D.E., MORRIS (Jr) J.H., PRATT V.R., 1977, Fast pattern matching in strings, SIAM Journal on Computing 6(1):323-350. USAGE:

When the object is constructed the findMatches() method would be called. This will return an int[] giving the offsets (ie the location of the first symbol of each match in the text). The getKMPNextTable() returns the table of border lengths used in the algorithm. This method is protected as it is unlikely it will be needed except for debugging.

The algorithm finds exact matches therefore ambiguity symbols will match only themselves. The class cannot perform regular expressions. The class operates on all alphabets thus if searching for a DNA pattern you should compile both the pattern and its reverse complement.

WARNING the behaviour of a pattern containing gaps is undefined. It's not recommended that you try it.

Copyright: Copyright (c) 2002

Company: AgResearch

Version:
1.0
Author:
Mark Schreiber

Constructor Summary
KnuthMorrisPrattSearch(SymbolList pattern)
          Constructs a KMP matcher to find exact occurances of pattern in text using the Knuth-Morris-Pratt algorithm.
 
Method Summary
 int[] findMatches(SymbolList text)
          This will return an int[] giving the offsets of the matches in text (ie the location of the first symbol of each match in the text).
protected  int[] getKmpNextTable()
          Returns the table of border lengths
 SymbolList getPattern()
           
static void main(String[] args)
          Demo and Test method
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

KnuthMorrisPrattSearch

public KnuthMorrisPrattSearch(SymbolList pattern)
Constructs a KMP matcher to find exact occurances of pattern in text using the Knuth-Morris-Pratt algorithm.

A new class should be constructed for each occurance of pattern.

Parameters:
pattern - the pattern to search for
Method Detail

findMatches

public int[] findMatches(SymbolList text)
This will return an int[] giving the offsets of the matches in text (ie the location of the first symbol of each match in the text).

Parameters:
text - the text or sequence to search for the pattern
Returns:
an int[] giving the offsets to the matches or a zero length array if there are no matches.

getKmpNextTable

protected int[] getKmpNextTable()
Returns the table of border lengths

Returns:
an int[] of border lenghts

getPattern

public SymbolList getPattern()
Returns:
the pattern being searched for

main

public static void main(String[] args)
                 throws Exception
Demo and Test method

Parameters:
args - no arguments required
Throws:
Exception - if the test fails