etandem
Input sequences are converted into ACGT or N (so ambiguity codes are ignored).
The score is +1 for a match, -1 for a mismatch.
The first copy of a repeat is ignored.
The highest score is kept for each start position and repeat size.
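The scoring rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not etandem's implementation. It assumes the consensus unit is already known, although etandem derives its own consensus. It compares each base after the first copy with the matching consensus column. It also scores an N in the sequence as an ordinary -1 mismatch, which may differ from etandem's default handling of N (see the -mismatch qualifier). The names `normalise` and `best_score` are hypothetical.

```python
from __future__ import annotations

ACGT = set("ACGT")


def normalise(seq: str) -> str:
    """Upper-case the sequence and replace every non-ACGT code (ambiguity codes) with N."""
    return "".join(base if base in ACGT else "N" for base in seq.upper())


def best_score(seq: str, start: int, consensus: str) -> tuple[int, int]:
    """Best score for a repeat of `consensus` starting at 0-based `start`.

    Returns (score, end) where `end` is the exclusive end of the best-scoring
    region. The first copy is not scored; each later base scores +1 if it
    matches its consensus column and -1 otherwise, so an N in the sequence
    never matches an A/C/G/T consensus column (an assumption of this sketch).
    """
    seq, consensus = normalise(seq), normalise(consensus)
    size = len(consensus)
    score = best = 0
    best_end = min(start + size, len(seq))  # only the unscored first copy
    for i in range(start + size, len(seq)):
        score += 1 if seq[i] == consensus[(i - start) % size] else -1
        if score > best:  # keep the highest score for this start and repeat size
            best, best_end = score, i + 1
    return best, best_end


print(best_score("aaccct" * 5, 0, "aaccct"))             # (24, 30)
print(best_score("aaccct" * 4 + "aacgct", 0, "aaccct"))  # (22, 30)
```

A perfect 5-copy repeat of a 6-base unit scores 24, because only the 24 bases after the first copy are scored. A single mismatch in the last copy lowers this to 22, consistent with the rule that a mismatch scores 2 less than a match.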
The lowest score to be reported is set by the threshold score, which can be set on the command line with the -threshold qualifier (default 20). For a perfect repeat, the score is the length of the repeat, not counting the first copy. To allow mismatches, reduce the threshold score a little: each mismatch scores -1 instead of +1, so it scores 2 less than a perfect match over the same number of bases.
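As a worked example derived only from the scoring rules stated here, consider a region of $L$ bases built from copies of a $k$-base unit. Let $m$ be the number of mismatches, none of which are counted in the unscored first copy. The score is then

$$
S = (L - k - m) - m = L - k - 2m
$$

For a perfect repeat of $n$ copies ($L = nk$, $m = 0$) this gives $S = (n-1)k$. With the default threshold of 20 and a 6-base repeat, the minimum number of copies is therefore

$$
(n-1)\cdot 6 \ge 20 \iff n - 1 \ge 4 \iff n \ge 5 \qquad (n \in \mathbb{Z})
$$

This assumes that a score equal to the threshold is reported. Under that assumption, a 5-copy repeat ($S = 24$) is still reported with up to two mismatches ($24 - 2 \cdot 2 = 20$).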
Running with a wide range of repeat sizes is inefficient, which is why equicktandem was written: it gives a rapid estimate of the major repeat sizes.
    % etandem
    Input sequence: embl:hhtetra
    Output file [hhtetra.tan]:
    Minimum repeat size [10]: 6
    Maximum repeat size [6]:
       Mandatory qualifiers:
      [-sequence]          sequence   Sequence USA
       -minrepeat          integer    Minimum repeat size
       -maxrepeat          integer    Maximum repeat size
      [-outfile]           report     Output report file name

       Optional qualifiers: (none)

       Advanced qualifiers:
       -threshold          integer    Threshold score
       -mismatch           boolean    Allow N as a mismatch
       -uniform            boolean    Allow uniform consensus
       -origfile           outfile    Output file name

       General qualifiers:
       -help               boolean    Report command line options. More
                                      information on associated and general
                                      qualifiers can be found with -help -verbose
Mandatory qualifiers | Description | Allowed values | Default
---|---|---|---
[-sequence] (Parameter 1) | Sequence USA | Readable sequence | Required
-minrepeat | Minimum repeat size | Integer, 2 or higher | 10
-maxrepeat | Maximum repeat size | Integer, same as -minrepeat or higher | Same as -minrepeat
[-outfile] (Parameter 2) | Output report file name | Report file |

Optional qualifiers | Description | Allowed values | Default
---|---|---|---
(none) | | |

Advanced qualifiers | Description | Allowed values | Default
---|---|---|---
-threshold | Threshold score | Any integer value | 20
-mismatch | Allow N as a mismatch | Yes/No | No
-uniform | Allow uniform consensus | Yes/No | No
-origfile | Output file name | Output file | <sequence>.etandem
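The following minimal Python sketch shows how the qualifiers in the table map onto a command line. `build_etandem_args` is a hypothetical helper, not part of EMBOSS. It only checks the ranges in the "Allowed values" column and fills in the documented defaults. It leaves out -origfile and the report-format qualifiers.

```python
from __future__ import annotations


def build_etandem_args(
    sequence: str,
    outfile: str,
    minrepeat: int = 10,
    maxrepeat: int | None = None,
    threshold: int = 20,
    mismatch: bool = False,
    uniform: bool = False,
) -> list[str]:
    """Hypothetical helper: turn the qualifiers in the table into an etandem argv.

    Range checks mirror the "Allowed values" column; defaults mirror "Default".
    """
    if minrepeat < 2:
        raise ValueError("-minrepeat must be 2 or higher")
    if maxrepeat is None:
        maxrepeat = minrepeat  # documented default: same as -minrepeat
    if maxrepeat < minrepeat:
        raise ValueError("-maxrepeat must be the same as -minrepeat or higher")
    args = [
        "etandem",
        "-sequence", sequence,
        "-outfile", outfile,
        "-minrepeat", str(minrepeat),
        "-maxrepeat", str(maxrepeat),
        "-threshold", str(threshold),
    ]
    # Boolean qualifiers are only passed when switched on (both default to No).
    if mismatch:
        args.append("-mismatch")
    if uniform:
        args.append("-uniform")
    return args


print(" ".join(build_etandem_args("embl:hhtetra", "hhtetra.tan", minrepeat=6)))
# etandem -sequence embl:hhtetra -outfile hhtetra.tan -minrepeat 6 -maxrepeat 6 -threshold 20
```

If EMBOSS is installed, the returned list could be passed to Python's `subprocess.run(args, check=True)`. Supplying every mandatory qualifier on the command line should avoid the interactive prompts shown in the sample session.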
The output is a standard EMBOSS report file.
The results can be output in one of several styles by using the command-line qualifier -rformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: embl, genbank, gff, pir, swiss, trace, listfile, dbmotif, diffseq, excel, feattable, motif, regions, seqtable, simple, srs, table, tagseq.
See: http://www.uk.embnet.org/Software/EMBOSS/Themes/ReportFormats.html for further information on report formats.
By default etandem writes a 'table' report file.
The output from the above example is:
    ########################################
    # Program: etandem
    # Rundate: Thu Apr 11 13:31:10 2002
    # Report_file: stdout
    ########################################

    #=======================================
    #
    # Sequence: HHTETRA from: 1 to: 1272
    # HitCount: 5
    #
    # Threshold: 20
    # Minrepeat: 6
    # Maxrepeat: 6
    # Mismatch: No
    # Uniform: No
    #
    #=======================================

      Start     End  Score  Size  Count  Identity  Consensus
        793     936    120     6     24      93.8  acccta
        283     420     90     6     23      84.8  taaccc
        432     485     38     6      9      90.7  ccctaa
        494     529     26     6      6      94.4  ccctaa
        568     597     24     6      5     100.0  aaccct

    #---------------------------------------
    #---------------------------------------
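This page does not define the Count and Identity columns. Every row of the sample report is nevertheless consistent with the following reading, which is inferred from the numbers rather than taken from etandem's source. A region of L = End - Start + 1 bases holds Count = L / Size copies. With the first copy unscored and each mismatch costing 2 points, the implied number of mismatches is m = (L - Size - Score) / 2. Identity is then 100 (L - m) / L, which counts the first copy as matching. The function names below are hypothetical; the row values are copied from the report above.

```python
from __future__ import annotations


def implied_mismatches(start: int, end: int, score: int, size: int) -> int:
    """Mismatches implied by a report row, assuming +1/-1 scoring with the first copy unscored."""
    length = end - start + 1
    scored = length - size          # bases after the first copy
    diff = scored - score           # each mismatch costs 2 relative to a match
    if diff < 0 or diff % 2:
        raise ValueError("row is not consistent with +1/-1 scoring")
    return diff // 2


def implied_identity(start: int, end: int, score: int, size: int) -> float:
    """Percent identity over the whole region, first copy counted as matching (inferred)."""
    length = end - start + 1
    return 100.0 * (length - implied_mismatches(start, end, score, size)) / length


# Start, End, Score, Size, Count, Identity columns from the sample report.
ROWS = [
    (793, 936, 120, 6, 24, 93.8),
    (283, 420, 90, 6, 23, 84.8),
    (432, 485, 38, 6, 9, 90.7),
    (494, 529, 26, 6, 6, 94.4),
    (568, 597, 24, 6, 5, 100.0),
]

for start, end, score, size, count, reported in ROWS:
    m = implied_mismatches(start, end, score, size)
    calc = implied_identity(start, end, score, size)
    print(f"{start}-{end}: {(end - start + 1) // size} copies, {m} mismatches, "
          f"identity {calc:.1f} (reported {reported:.1f})")
```

For example, the first row spans 144 bases, so 138 bases are scored. A score of 120 then implies 9 mismatches, and 135/144 gives the reported 93.8%.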
Program name | Description |
---|---|
einverted | Finds DNA inverted repeats |
equicktandem | Finds tandem repeats |
palindrome | Looks for inverted repeats in a nucleotide sequence |
This application was modified for inclusion in EMBOSS by Peter Rice (pmr@sanger.ac.uk), Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.