supermatcher
    % supermatcher tembl:ec\* tembl:eclac -word 50 -sbegin2 101 -send2 -101
    Finds a match of a large sequence against one or more sequences
    Gap opening penalty [10.0]: 3.0
    Gap extension penalty [0.5]:
    Output alignment [eclac.supermatcher]:
    Mandatory qualifiers:
      [-seqa]      seqall   Sequence database USA
      [-seqb]      seqset   Sequence set USA
       -gapopen    float    Gap opening penalty
       -gapextend  float    Gap extension penalty
       -outfile    align    Output alignment file name

    Optional qualifiers:
       -datafile   matrixf  This is the scoring matrix file used when
                            comparing sequences. By default it is the file
                            'EBLOSUM62' (for proteins) or the file
                            'EDNAFULL' (for nucleic sequences). These files
                            are found in the 'data' directory of the EMBOSS
                            installation.
       -width      integer  Alignment width
       -wordlen    integer  word length for initial matching
       -errorfile  outfile  Error file to be written to

    Advanced qualifiers: (none)

    General qualifiers:
       -help       boolean  Report command line options. More information
                            on associated and general qualifiers can be
                            found with -help -verbose
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Mandatory qualifiers | | | |
[-seqa] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required |
[-seqb] (Parameter 2) | Sequence set USA | Readable sequences | Required |
-gapopen | Gap opening penalty | Number from 1.000 to 100.000 | 10.0 for any sequence type |
-gapextend | Gap extension penalty | Number from 0.100 to 10.000 | 0.5 for any sequence type |
-outfile | Output alignment file name | Alignment file | |
Optional qualifiers | | | |
-datafile | This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation. | Comparison matrix file in EMBOSS data path | EBLOSUM62 for protein, EDNAFULL for DNA |
-width | Alignment width | Any integer value | 16 |
-wordlen | Word length for initial matching | Integer 3 or more | 6 |
-errorfile | Error file to be written to | Output file | supermatcher.error |
Advanced qualifiers | | | |
(none) | | | |
The output is a standard EMBOSS alignment file.
The results can be output in one of several styles by using the command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of the required format. Some of the alignment formats can cope with an unlimited number of sequences, while others are only for pairs of sequences.
The available multiple alignment format names are: unknown, multiple, simple, fasta, msf, trace, srs
The available pairwise alignment format names are: pair, markx0, markx1, markx2, markx3, markx10, srspair, score
See: http://www.uk.embnet.org/Software/EMBOSS/Themes/AlignFormats.html for further information on alignment formats.
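As an illustration of the -aformat qualifier, the following Python sketch assembles a supermatcher command line and rejects format names that are not in the two lists above. The helper `build_command` is hypothetical and not part of EMBOSS; it only builds the argument list and does not run the program. The qualifier names are the ones documented on this page.

```python
# Format names copied from the two lists above.
MULTIPLE_FORMATS = {"unknown", "multiple", "simple", "fasta", "msf", "trace", "srs"}
PAIRWISE_FORMATS = {"pair", "markx0", "markx1", "markx2", "markx3", "markx10",
                    "srspair", "score"}


def build_command(seqa, seqb, outfile, aformat, gapopen=10.0, gapextend=0.5):
    """Return an argv list for supermatcher (hypothetical helper, not part of EMBOSS)."""
    if aformat not in MULTIPLE_FORMATS | PAIRWISE_FORMATS:
        raise ValueError(f"unknown alignment format: {aformat!r}")
    return [
        "supermatcher", seqa, seqb,
        "-gapopen", str(gapopen),
        "-gapextend", str(gapextend),
        "-outfile", outfile,
        "-aformat", aformat,
    ]


cmd = build_command("tembl:ec*", "tembl:eclac", "eclac.supermatcher",
                    "markx3", gapopen=3.0)
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd)` on a machine where EMBOSS is installed; passing a list rather than a shell string avoids having to escape the `*` in the database USA.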
The output from the example follows:
    ########################################
    # Program: supermatcher
    # Rundate: Mon May 20 16:32:00 2002
    # Report_file: eclac.supermatcher
    ########################################

    #=======================================
    #
    # Aligned_sequences: 2
    # 1: ECLAC
    # 2: ECLAC
    # Matrix: EDNAFULL
    # Gap_penalty: 3.0
    # Extend_penalty: 0.5
    #
    # Length: 7277
    # Identity: 7277/7277 (100.0%)
    # Similarity: 7277/7277 (100.0%)
    # Gaps: 0/7277 ( 0.0%)
    # Score: 36385.0
    #
    #
    #=======================================

    ECLAC 101 atgtcgcagagtatgccggtgtctcttatcagaccgtttcccgcgtggtg 150
              ||||||||||||||||||||||||||||||||||||||||||||||||||
    ECLAC 101 atgtcgcagagtatgccggtgtctcttatcagaccgtttcccgcgtggtg 150

    ECLAC 151 aaccaggccagccacgtttctgcgaaaacgcgggaaaaagtggaagcggc 200
              ||||||||||||||||||||||||||||||||||||||||||||||||||
    ECLAC 151 aaccaggccagccacgtttctgcgaaaacgcgggaaaaagtggaagcggc 200

    ECLAC 201 gatggcggagctgaattacattcccaaccgcgtggcacaacaactggcgg 250
              ||||||||||||||||||||||||||||||||||||||||||||||||||
    ECLAC 201 gatggcggagctgaattacattcccaaccgcgtggcacaacaactggcgg 250

    ECLAC 251 gcaaacagtcgttgctgattggcgttgccacctccagtctggccctgcac 300
              ||||||||||||||||||||||||||||||||||||||||||||||||||
    ECLAC 251 gcaaacagtcgttgctgattggcgttgccacctccagtctggccctgcac 300

    ECLAC 301 gcgccgtcgcaaattgtcgcggcgattaaatctcgcgccgatcaactggg 350
              ||||||||||||||||||||||||||||||||||||||||||||||||||
    ECLAC 301 gcgccgtcgcaaattgtcgcggcgattaaatctcgcgccgatcaactggg 350

    ECLAC 351 tgccagcgtggtggtgtcgatggtagaacgaagcggcgtcgaagcctgta 400
              ||||||||||||||||||||||||||||||||||||||||||||||||||
    ECLAC 351 tgccagcgtggtggtgtcgatggtagaacgaagcggcgtcgaagcctgta 400

    ECLAC 401 aagcggcggtgcacaatcttctcgcgcaacgcgtcagtgggctgatcatt 450
              ||||||||||||||||||||||||||||||||||||||||||||||||||
    ECLAC 401 aagcggcggtgcacaatcttctcgcgcaacgcgtcagtgggctgatcatt 450

    ECLAC 451 aactatccgctggatgaccaggatgccattgctgtggaagctgcctgcac 500
              ||||||||||||||||||||||||||||||||||||||||||||||||||
    ECLAC 451 aactatccgctggatgaccaggatgccattgctgtggaagctgcctgcac 500

    ECLAC 501 taatgttccggcgttatttcttgatgtctctgaccagacacccatcaaca 550
              ||||||||||||||||||||||||||||||||||||||||||||||||||
    ECLAC 501 taatgttccggcgttatttcttgatgtctctgaccagacacccatcaaca 550

    ........... etc. .............
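The '# Key: value' comment lines in the report header are easy to read back programmatically. The Python sketch below is one possible way to do so; `parse_header` and `fraction` are hypothetical helper names, and the sample text is copied from the example output above. It assumes the header layout shown here and has not been checked against other alignment formats.

```python
import re

# Header lines copied from the example output above.
SAMPLE = """\
# Aligned_sequences: 2
# 1: ECLAC
# 2: ECLAC
# Matrix: EDNAFULL
# Gap_penalty: 3.0
# Extend_penalty: 0.5
#
# Length: 7277
# Identity: 7277/7277 (100.0%)
# Similarity: 7277/7277 (100.0%)
# Gaps: 0/7277 ( 0.0%)
# Score: 36385.0
"""


def parse_header(text):
    """Collect '# Key: value' comment lines into a dict (hypothetical helper).

    Lines whose key is numeric, such as '# 1: ECLAC', are skipped.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"#\s*([A-Za-z_]+):\s*(.*\S)", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def fraction(value):
    """Turn a value such as '7277/7277 (100.0%)' into the tuple (7277, 7277)."""
    num, den = re.match(r"(\d+)/(\d+)", value).groups()
    return int(num), int(den)


header = parse_header(SAMPLE)
print(header["Matrix"], fraction(header["Identity"]), float(header["Score"]))
```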
The file 'supermatcher.error' will contain any errors that occurred while the program was running. A typical cause is that wordmatch could not find any matches, so no suitable start point was found for the Smith-Waterman calculation.
Because it does a Smith & Waterman alignment (albeit in a narrow region around the diagonal shown to be the 'best' by a word match), this program can use huge amounts of memory if the sequences are large.
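To make the idea of a 'best' diagonal concrete, here is a minimal Python sketch of a word match between two sequences. It is an illustration only: the criterion used (the diagonal with the most exact word hits) is an assumption, since this page does not say how supermatcher scores diagonals, and `best_diagonal` is a hypothetical name. A diagonal is expressed as the offset i - j between a position i in the first sequence and a position j in the second.

```python
from collections import Counter, defaultdict


def best_diagonal(seq_a, seq_b, wordlen=6):
    """Return (diagonal, hits) for the diagonal with the most exact word matches.

    Returns None when the sequences share no word of length `wordlen`.
    """
    # Index every word of seq_a by its start position.
    positions = defaultdict(list)
    for i in range(len(seq_a) - wordlen + 1):
        positions[seq_a[i:i + wordlen]].append(i)

    # Count word hits per diagonal (offset between the two positions).
    hits = Counter()
    for j in range(len(seq_b) - wordlen + 1):
        for i in positions.get(seq_b[j:j + wordlen], ()):
            hits[i - j] += 1

    if not hits:
        return None  # no word match, so no start point for the alignment
    return hits.most_common(1)[0]


print(best_diagonal("ggggacgtacctgaca", "acgtacctgaca"))  # (4, 7)
print(best_diagonal("aaaaaaaa", "cccccccc"))              # None
```

The `None` case corresponds to the situation in which no start point can be found for the Smith-Waterman step.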
Because the alignment is made within a narrow area each side of the 'best' diagonal, if there are sufficient indels between the two sequences, then the path of the Smith & Waterman alignment can wander outside of this area. Making the width larger can avoid this problem, but you then use more memory.
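The effect of the width can be seen in a small banded local alignment. The Python sketch below is not supermatcher's implementation: it computes a score only (no traceback), uses a simple linear gap penalty instead of separate opening and extension penalties, and assumes scores of 5 for a match and -4 for a mismatch. Cells outside the band are treated as score 0. The function name `banded_local_score` and the test sequences are made up for the illustration.

```python
def banded_local_score(a, b, diagonal, width, match=5, mismatch=-4, gap=-3):
    """Best local alignment score using only cells with |(i - j) - diagonal| <= width."""
    best = 0
    prev = {}  # scores of the previous row, keyed by column j
    for i in range(1, len(a) + 1):
        curr = {}
        lo = max(1, i - diagonal - width)       # leftmost column inside the band
        hi = min(len(b), i - diagonal + width)  # rightmost column inside the band
        for j in range(lo, hi + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score = max(
                0,                          # start a new local alignment
                prev.get(j - 1, 0) + sub,   # extend along the diagonal
                prev.get(j, 0) + gap,       # base of a aligned to a gap
                curr.get(j - 1, 0) + gap,   # base of b aligned to a gap
            )
            curr[j] = score
            best = max(best, score)
        prev = curr
    return best


x, y = "acgtgcatta", "ggctcaagtc"
seq_a = x + y              # 20 bases
seq_b = x + "ttttt" + y    # the same sequence with a 5-base insertion

narrow = banded_local_score(seq_a, seq_b, diagonal=0, width=2)
wide = banded_local_score(seq_a, seq_b, diagonal=0, width=8)
print(narrow, wide)  # 50 85
```

With a width of 2 the band cannot follow the 5-base shift, so only the first 10 bases are aligned (score 50). With a width of 8 the path can cross the insertion and align both halves (100 minus 15 for the gap). The wider band stores more cells per row, which is the memory cost described above.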
The longer the sequences and the wider the specified alignment width, the more memory will be used.
If the program terminates due to lack of memory you can try the following:
Run the UNIX command 'limit' (a csh/tcsh built-in) to see whether your stack or memory usage has been limited. If so, run 'unlimit' (e.g. '% unlimit stacksize').
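The same limits can also be inspected from a script. The Python sketch below uses the standard-library `resource` module, which is available on Unix only. Mapping the shell's 'stacksize' and 'memory' entries to `RLIMIT_STACK` and `RLIMIT_AS` is an assumption and may not correspond exactly to what a given shell reports.

```python
import resource  # Unix-only standard-library module


def describe(limit):
    """Format a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


# Assumed mapping from shell limit names to resource constants.
for name, key in (("stacksize", resource.RLIMIT_STACK),
                  ("memory", resource.RLIMIT_AS)):
    soft, hard = resource.getrlimit(key)
    print(f"{name}: soft={describe(soft)} hard={describe(hard)}")
```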
Program name | Description |
---|---|
matcher | Finds the best local alignments between two sequences |
seqmatchall | Does an all-against-all comparison of a set of sequences |
water | Smith-Waterman local alignment |
wordmatch | Finds all exact matches of a given size between 2 sequences |