wordmatch
This program takes two sequences and finds regions where they are identical. These regions are reported in the output file and, optionally, in GFF (General Feature Format) files.
It does not report identical regions shorter than the specified word size (-wordsize).
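To illustrate what "identical regions no shorter than the word size" means, here is a minimal Python sketch. It is not how wordmatch is implemented: it simply walks every diagonal of the two sequences and keeps maximal runs of identical characters. The function name `exact_matches` and the tuple layout are assumptions made for this example, and its results may differ from those of wordmatch on repetitive sequences.

```python
from typing import List, Tuple


def exact_matches(a: str, b: str, wordsize: int = 4) -> List[Tuple[int, int, int]]:
    """Return (length, start_in_a, start_in_b) for each maximal identical run.

    Positions are 1-based. Illustrative only; not the EMBOSS algorithm.
    """
    if wordsize < 2:
        raise ValueError("wordsize must be 2 or more")

    hits = []
    # A diagonal is a fixed offset (i - j) between positions in a and b.
    for offset in range(-(len(b) - 1), len(a)):
        i = max(offset, 0)
        j = i - offset
        run = 0  # length of the current run of identical characters
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
            else:
                if run >= wordsize:
                    hits.append((run, i - run + 1, j - run + 1))
                run = 0
            i += 1
            j += 1
        # A run may extend to the end of either sequence.
        if run >= wordsize:
            hits.append((run, i - run + 1, j - run + 1))

    # Longest matches first, then by position in the first sequence.
    return sorted(hits, key=lambda h: (-h[0], h[1]))
```

For example, `exact_matches("AAAWXYZBBB", "CCWXYZDD", 4)` returns `[(4, 4, 3)]`: one run of length 4 starting at position 4 of the first string and position 3 of the second. With a word size of 5 the same call returns an empty list.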
    % wordmatch tsw:hba_human tsw:hbb_human
    Finds all exact matches of a given size between 2 sequences
    Word size [4]:
    Output alignment [hba_human.wordmatch]:

    Mandatory qualifiers:
      [-asequence]   sequence   Sequence USA
      [-bsequence]   sequence   Sequence USA
       -wordsize     integer    Word size
      [-outfile]     align      Output alignment file name

    Optional qualifiers: (none)

    Advanced qualifiers:
       -afeatout     featout    File for output of normal tab delimited GFF features
       -bfeatout     featout    File for output of normal tab delimited GFF features

    General qualifiers:
       -help         boolean    Report command line options. More information on associated and general qualifiers can be found with -help -verbose
Mandatory qualifiers | Description | Allowed values | Default |
---|---|---|---|
[-asequence] (Parameter 1) | Sequence USA | Readable sequence | Required |
[-bsequence] (Parameter 2) | Sequence USA | Readable sequence | Required |
-wordsize | Word size | Integer 2 or more | 4 |
[-outfile] (Parameter 3) | Output alignment file name | Alignment file | |
Optional qualifiers | | | |
(none) | | | |
Advanced qualifiers | | | |
-afeatout | File for output of normal tab delimited GFF features | Writeable feature table | unknown.gff |
-bfeatout | File for output of normal tab delimited GFF features | Writeable feature table | unknown.gff |
The output is a standard EMBOSS alignment file.
The results can be output in one of several styles by using the command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of the required format. Some of the alignment formats can cope with an unlimited number of sequences, while others are only for pairs of sequences.
The available multiple alignment format names are: unknown, multiple, simple, fasta, msf, trace, srs
The available pairwise alignment format names are: pair, markx0, markx1, markx2, markx3, markx10, srspair, score
See: http://www.uk.embnet.org/Software/EMBOSS/Themes/AlignFormats.html for further information on alignment formats.
The file produced in the above example is:
    ########################################
    # Program: wordmatch
    # Rundate: Mon May 20 16:36:46 2002
    # Report_file: hba_human.wordmatch
    ########################################

    #=======================================
    #
    # Aligned_sequences: 2
    # 1: HBA_HUMAN
    # 2: HBB_HUMAN
    #=======================================

         5 HBA_HUMAN     58..62 HBB_HUMAN     63..67
         4 HBA_HUMAN     14..17 HBB_HUMAN     15..18
         4 HBA_HUMAN   116..119 HBB_HUMAN   121..124

    #---------------------------------------
    #---------------------------------------
The normal 'report' header is output. It contains the details of the program run and the input sequences.
The data lines consist of five columns separated by spaces or TAB characters. Each line describes one identical region. The first column is the length of the match. The second column is the name of the first sequence. The third column is the start and end position of the match in the first sequence, written as start..end. The fourth and fifth columns give the name of the second sequence and the start and end position of the match in it.
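The five-column layout described above can be read with a short script. The following Python sketch is one possible way to do it, not part of EMBOSS. The names `Match` and `parse_wordmatch` are hypothetical, and the sketch assumes that sequence names contain no whitespace and that positions are always written as `start..end`.

```python
import re
from typing import List, NamedTuple


class Match(NamedTuple):
    """One data line of the report (hypothetical container for this example)."""
    length: int
    a_name: str
    a_start: int
    a_end: int
    b_name: str
    b_start: int
    b_end: int


# length, name A, start..end in A, name B, start..end in B
_DATA_LINE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\d+)\.\.(\d+)\s+(\S+)\s+(\d+)\.\.(\d+)\s*$"
)


def parse_wordmatch(text: str) -> List[Match]:
    """Extract the match records from the text of a wordmatch report."""
    matches = []
    for line in text.splitlines():
        # Header and footer lines of the report start with '#'.
        if line.lstrip().startswith("#"):
            continue
        m = _DATA_LINE.match(line)
        if m is None:
            continue  # blank or unrecognised line
        length, a_name, a_start, a_end, b_name, b_start, b_end = m.groups()
        matches.append(
            Match(int(length), a_name, int(a_start), int(a_end),
                  b_name, int(b_start), int(b_end))
        )
    return matches
```

Because the matches are exact, each record should satisfy `a_end - a_start + 1 == length`, and likewise for the second sequence, which gives a quick sanity check on parsed files.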
Program name | Description |
---|---|
matcher | Finds the best local alignments between two sequences |
seqmatchall | Does an all-against-all comparison of a set of sequences |
supermatcher | Finds a match of a large sequence against one or more sequences |
water | Smith-Waterman local alignment |