stretcher

 

Function

Finds the best global alignment between two sequences

Description


 stretcher calculates a global alignment of two sequences.

 Please cite: Myers and Miller, CABIOS (1989)
 version 2.0u. Modified for EMBOSS May 1999 
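
As a rough illustration of the linear-space idea behind the cited method, the following Python sketch computes only the score of the best global alignment while keeping a single row of the dynamic-programming table in memory. It is not stretcher's code: the function name, the toy match/mismatch scores (standing in for a matrix such as EBLOSUM62) and the affine gap cost gap_open + k * gap_extend are assumptions made for this example. Myers and Miller additionally use divide and conquer to recover the alignment itself in linear space; that step is omitted here.

```python
def global_score(a, b, match=1, mismatch=-1, gap_open=12, gap_extend=2):
    """Score of the best global alignment of a and b, using O(len(b)) memory.

    Assumption: a gap of length k costs gap_open + k * gap_extend.
    """
    neg_inf = float("-inf")
    n = len(b)
    # cc[j]: best score for a[:i] against b[:j] (row i, overwritten in place)
    # dd[j]: best score for such an alignment ending with a[i-1] against a gap
    cc = [0] + [-(gap_open + gap_extend * j) for j in range(1, n + 1)]
    dd = [neg_inf] * (n + 1)
    for i in range(1, len(a) + 1):
        diag = cc[0]                          # value of row i-1, column 0
        cc[0] = -(gap_open + gap_extend * i)  # a[:i] against an empty prefix
        ins = neg_inf                         # best score ending with b[j-1] against a gap
        for j in range(1, n + 1):
            ins = max(ins, cc[j - 1] - gap_open) - gap_extend
            dd[j] = max(dd[j], cc[j] - gap_open) - gap_extend
            sub = diag + (match if a[i - 1] == b[j - 1] else mismatch)
            diag = cc[j]                      # keep row i-1 value for the next column
            cc[j] = max(sub, ins, dd[j])
    return cc[n]


if __name__ == "__main__":
    # Three matches and one gap of length 1: 3 - (12 + 2) = -11
    print(global_score("ACGT", "AGT"))
```

Only two arrays of length len(b) + 1 are kept, so memory grows with the shorter dimension rather than with the product of the two sequence lengths.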

Usage

Here is a sample session with stretcher.

% stretcher tsw:hba_human tsw:hbb_human
Finds the best global alignment between two sequences
Output alignment [hba_human.stretcher]: 

Command line arguments

   Mandatory qualifiers:
  [-sequencea]         sequence   Sequence USA
  [-sequenceb]         sequence   Sequence USA
  [-outfile]           align      Output alignment file name

   Optional qualifiers:
   -datafile           matrix     This is the scoring matrix file used when
                                  comparing sequences. By default it is the
                                  file 'EBLOSUM62' (for proteins) or the file
                                  'EDNAFULL' (for nucleic sequences). These
                                  files are found in the 'data' directory of
                                  the EMBOSS installation.
   -gappenalty         integer    Gap penalty
   -gaplength          integer    Gap length penalty

   Advanced qualifiers: (none)
   General qualifiers:
  -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Mandatory qualifiers   Description                  Allowed values              Default
[-sequencea]           Sequence USA                 Readable sequence           Required
(Parameter 1)
[-sequenceb]           Sequence USA                 Readable sequence           Required
(Parameter 2)
[-outfile]             Output alignment file name   Alignment file
(Parameter 3)

Optional qualifiers    Description                  Allowed values              Default
-datafile              Scoring matrix file used     Comparison matrix file in   EBLOSUM62 for protein,
                       when comparing sequences     EMBOSS data path            EDNAFULL for DNA
-gappenalty            Gap penalty                  Positive integer            12 for protein, 16 for nucleic
-gaplength             Gap length penalty           Positive integer            2 for protein, 4 for nucleic

Advanced qualifiers    Description                  Allowed values              Default
(none)
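
When stretcher is driven from a script, the qualifiers listed above can be assembled into an argument list. The Python sketch below uses only the qualifier names documented in this section; the function name stretcher_argv is hypothetical, and the function only builds the list without running anything.

```python
def stretcher_argv(seq_a, seq_b, outfile,
                   datafile=None, gappenalty=None, gaplength=None):
    """Build an argument list for stretcher (hypothetical helper, nothing is run)."""
    argv = ["stretcher",
            "-sequencea", seq_a,
            "-sequenceb", seq_b,
            "-outfile", outfile]
    if datafile is not None:
        argv += ["-datafile", datafile]
    # Both penalties are documented as positive integers.
    for flag, value in (("-gappenalty", gappenalty), ("-gaplength", gaplength)):
        if value is None:
            continue  # leave the program default in place
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{flag} must be a positive integer, got {value!r}")
        argv += [flag, str(value)]
    return argv


if __name__ == "__main__":
    print(" ".join(stretcher_argv("tsw:hba_human", "tsw:hbb_human",
                                  "hba_human.stretcher",
                                  gappenalty=12, gaplength=2)))
```

The resulting list could be passed to subprocess.run on a system where EMBOSS is installed; omitted options fall back to the defaults in the table.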

Input file format

Any 2 sequence USAs of the same type (DNA or protein).

Output file format

The output is a standard EMBOSS alignment file.

The results can be output in one of several styles by using the command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of the required format. Some of the alignment formats can cope with an unlimited number of sequences, while others are only for pairs of sequences.

The available multiple alignment format names are: unknown, multiple, simple, fasta, msf, trace, srs

The available pairwise alignment format names are: pair, markx0, markx1, markx2, markx3, markx10, srspair, score

See: http://www.uk.embnet.org/Software/EMBOSS/Themes/AlignFormats.html for further information on alignment formats.
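
A script that accepts a format name from a user may want to check it against the two lists above before passing it to -aformat. The following Python sketch does that; the function name aformat_args is hypothetical and the format names are copied from this section.

```python
MULTIPLE_FORMATS = ("unknown", "multiple", "simple", "fasta", "msf", "trace", "srs")
PAIRWISE_FORMATS = ("pair", "markx0", "markx1", "markx2", "markx3",
                    "markx10", "srspair", "score")


def aformat_args(name):
    """Return (['-aformat', name], kind) where kind is 'pairwise' or 'multiple'."""
    if name in PAIRWISE_FORMATS:
        kind = "pairwise"
    elif name in MULTIPLE_FORMATS:
        kind = "multiple"
    else:
        raise ValueError(f"unknown alignment format: {name!r}")
    return ["-aformat", name], kind
```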

The output from the example follows:

########################################
# Program:  stretcher
# Rundate:  Mon May 20 16:25:11 2002
# Report_file: hba_human.stretcher
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: HBA_HUMAN
# 2: HBB_HUMAN
# Matrix: EBLOSUM62
# Gap_penalty: 12
# Extend_penalty: 2
#
# Length: 148
# Identity:      64/148 (43.2%)
# Similarity:    89/148 (60.1%)
# Gaps:           9/148 ( 6.1%)
# Score: 272
#
#
#=======================================


                10        20        30        40
HBA_HU V-LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DL
       : :.: .:. : : ::::  .. : :.::: :... .: :. .:  : ::
HBB_HU VHLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDL
               10          20        30        40

       50             60        70        80        90
HBA_HU SH-----GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRV
       :      :. .::.:::::  :.....::.:.. .....::.::. ::.:
HBB_HU STPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHV
       50        60        70        80        90

           100       110       120       130       140
HBA_HU DPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
       :: ::.::.. :. .:: :.  :::: :.:. .: .:.:...:. ::.
HBB_HU DPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
      100       110       120       130       140

#---------------------------------------
#---------------------------------------         
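
The header of the report is a series of '# Key: value' lines, which makes it easy to read back in a script. The Python sketch below parses the header fields from the example above and recomputes the percentages from the raw counts; the function names are hypothetical, and the layout is assumed to match this example rather than taken from a format specification.

```python
import re

HEADER = """\
# Aligned_sequences: 2
# 1: HBA_HUMAN
# 2: HBB_HUMAN
# Matrix: EBLOSUM62
# Gap_penalty: 12
# Extend_penalty: 2
#
# Length: 148
# Identity:      64/148 (43.2%)
# Similarity:    89/148 (60.1%)
# Gaps:           9/148 ( 6.1%)
# Score: 272
"""


def parse_header(text):
    """Collect the '# Key: value' fields of an alignment header into a dict."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"#\s*([A-Za-z_0-9]+):\s*(.*\S)\s*$", line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def fraction(value):
    """Split 'n/total (..%)' into (n, total, percentage recomputed to 1 decimal)."""
    m = re.match(r"(\d+)/(\d+)", value)
    n, total = int(m.group(1)), int(m.group(2))
    return n, total, round(100.0 * n / total, 1)


if __name__ == "__main__":
    fields = parse_header(HEADER)
    for key in ("Identity", "Similarity", "Gaps"):
        print(key, fraction(fields[key]))
```

The recomputed values agree with the percentages printed in the report (43.2, 60.1 and 6.1).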

Data files

For protein sequences, EBLOSUM62 is used as the substitution matrix. For nucleotide sequences, EDNAFULL is used. Other matrices can be specified with -datafile.

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by EMBOSS environment variable EMBOSS_DATA.

Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".

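The directories are searched in the following order:

  1. . (your current directory)
  2. .embossdata (under your current directory)
  3. ~/ (your home directory)
  4. ~/.embossdata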

Notes

None.

References

  1. Myers and Miller, CABIOS (1989)

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It exits with a status of 0.

Known bugs

None.

See also

Program name   Description
alignwrap      Aligns a set of sequences to a seed alignment
est2genome     Align EST and genomic DNA sequences
needle         Needleman-Wunsch global alignment

Author(s)

This application was modified for inclusion in EMBOSS by Ian Longden (il@sanger.ac.uk), Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

History

 Completed 13th May 1999.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments