plotcon

 

Function

Plots the quality of conservation of a sequence alignment

Description

Displays a graphical representation of the similarity along a set of aligned sequences.

The similarity is calculated by moving a window of a specified length along the aligned sequences. Within the window, the similarity of any one position is taken to be the average of all the possible pairwise scores of the bases or residues at that position. The pairwise scores are taken from the specified similarity matrix. The average of the position similarities within the window is plotted.
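
The calculation described above can be sketched in a few lines of Python. This is only an illustration, not plotcon's source code: the function names, the toy scoring function standing in for a matrix such as EBLOSUM62 or EDNAFULL, the zero score for gaps, and the choice to report full windows only are all assumptions made for the example.

```python
from itertools import combinations


def column_similarity(column, score):
    """Average score over all possible pairs of symbols in one alignment column."""
    pairs = list(combinations(column, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return sum(score(a, b) for a, b in pairs) / len(pairs)


def windowed_similarity(alignment, score, winsize=4):
    """Average the per-column similarities over each full window of winsize columns."""
    length = len(alignment[0])
    per_column = [
        column_similarity([seq[i] for seq in alignment], score)
        for i in range(length)
    ]
    return [
        sum(per_column[start:start + winsize]) / winsize
        for start in range(length - winsize + 1)
    ]


def toy_score(a, b):
    # Hypothetical stand-in for a similarity matrix lookup:
    # +1 for a match, -1 for a mismatch, 0 if either symbol is a gap (assumption).
    if a == "-" or b == "-":
        return 0.0
    return 1.0 if a == b else -1.0


alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]
profile = windowed_similarity(alignment, toy_score, winsize=4)
print(profile)  # one value per window position; these are the values that would be plotted
```

A larger winsize averages over more columns, which is why the resulting curve becomes smoother.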

The program is useful for identifying the regions of an alignment that are well conserved and the regions where the alignment is poor.

Usage

Here is a sample session with plotcon:

% plotcon -sformat msf alignment.msf

Command line arguments

   Mandatory qualifiers (* if not always prompted):
  [-msf]               seqset     File containing a sequence alignment
   -winsize            integer    Number of columns to average alignment
                                  quality over. The larger this value is, the
                                  smoother the plot will be.
*  -graph              xygraph    Graph type
*  -outfile            outfile    Display as data

   Optional qualifiers:
   -scorefile          matrix     This is the scoring matrix file used when
                                  comparing sequences. By default it is the
                                  file 'EBLOSUM62' (for proteins) or the file
                                  'EDNAFULL' (for nucleic sequences). These
                                  files are found in the 'data' directory of
                                  the EMBOSS installation.

   Advanced qualifiers:
   -data               boolean    Output the match data to a file instead of
                                  plotting it

   General qualifiers:
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Mandatory qualifiers
   [-msf] (Parameter 1)
      Description:    File containing a sequence alignment
      Allowed values: Readable sequences
      Default:        Required
   -winsize
      Description:    Number of columns to average alignment quality over. The larger this value is, the smoother the plot will be.
      Allowed values: Any integer value
      Default:        4
   -graph
      Description:    Graph type
      Allowed values: EMBOSS has a list of known devices, including postscript, ps, hpgl, hp7470, hp7580, meta, colourps, cps, xwindows, x11, tektronics, tekt, tek4107t, tek, none, null, text, data, xterm, png
      Default:        EMBOSS_GRAPHICS value, or x11
   -outfile
      Description:    Display as data
      Allowed values: Output file
      Default:        <sequence>.plotcon

Optional qualifiers
   -scorefile
      Description:    This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation.
      Allowed values: Comparison matrix file in EMBOSS data path
      Default:        EBLOSUM62 for protein, EDNAFULL for DNA

Advanced qualifiers
   -data
      Description:    Output the match data to a file instead of plotting it
      Allowed values: Yes/No
      Default:        No
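
When plotcon is driven from a script, the qualifiers listed above can be assembled into an argument list. The following is a minimal sketch: the helper name build_plotcon_command and its keyword arguments are hypothetical, and only qualifiers documented on this page (plus the -sformat qualifier from the sample session) are used. The sketch builds the command but does not run it.

```python
import shlex


def build_plotcon_command(alignment, sformat=None, winsize=4, graph=None,
                          scorefile=None, data=False, outfile=None):
    """Return a plotcon argument list (hypothetical helper, for illustration only)."""
    args = ["plotcon"]
    if sformat:
        # Same position as in the sample session: before the alignment file.
        args += ["-sformat", sformat]
    args.append(alignment)
    args += ["-winsize", str(winsize)]
    if scorefile:
        args += ["-scorefile", scorefile]
    if data:
        # Write the match data to a file instead of plotting it.
        args.append("-data")
        if outfile:
            args += ["-outfile", outfile]
    elif graph:
        args += ["-graph", graph]
    return args


cmd = build_plotcon_command("alignment.msf", sformat="msf", winsize=10, graph="png")
print(shlex.join(cmd))
```

The resulting list could be handed to subprocess.run on a system where EMBOSS is installed. Keeping the arguments as a list avoids shell quoting problems with unusual file names.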

Input file format

A set of gapped, aligned sequences.

Output file format

A graph of the quality of the alignment is plotted. If -data is specified, the match data are written to the output file instead of being plotted.

Data files

plotcon reads the similarity matrix specified with -scorefile.

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.

To see the available EMBOSS data files, run:

% embossdata -showall

To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:


% embossdata -fetch -file Exxx.dat

Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".

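The directories are searched in the following order:

   . (your current directory)
   .embossdata (under your current directory)
   ~/ (your home directory)
   ~/.embossdata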
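
As a rough illustration of such a lookup, the sketch below returns the first directory that contains a given data file. The order it uses (current directory, its ".embossdata" subdirectory, the home directory, its ".embossdata" subdirectory, then the standard data directory as a fallback) is an assumption based on the description above, and the function name find_data_file is hypothetical. It is not EMBOSS's own implementation.

```python
from pathlib import Path


def find_data_file(name, cwd, home, emboss_data=None):
    """Return the first existing copy of a data file, or None (illustrative only)."""
    candidates = [
        Path(cwd),                   # project-specific files
        Path(cwd) / ".embossdata",   # tidier project-specific location
        Path(home),                  # files for all runs
        Path(home) / ".embossdata",  # tidier per-user location
    ]
    if emboss_data:
        # Assumed fallback: the standard EMBOSS data directory.
        candidates.append(Path(emboss_data))
    for directory in candidates:
        path = directory / name
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    # 'Exxx.dat' is the placeholder file name used in the text above.
    print(find_data_file("Exxx.dat", Path.cwd(), Path.home()))
```

With this order, a matrix fetched into the current directory and edited there would take precedence over the copy distributed with EMBOSS.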

Notes

None.

References

None.

Warnings

If you give it a set of unaligned sequences, it will plot the (poor!) quality of these as if they were aligned.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name   Description
emma           Multiple alignment program - interface to ClustalW program
infoalign      Information on a multiple sequence alignment
prettyplot     Displays aligned sequences, with colouring and boxing
showalign      Displays a multiple sequence alignment
tranalign      Align nucleic coding regions given the aligned proteins

Author(s)

This application was written by Tim Carver (tcarver@hgmp.mrc.ac.uk).

History

Written (Sept 2000) - Tim Carver.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments