![]() |
tranalign |
tranalign is a simple program that allows you to produce aligned cDNA sequences from aligned protein sequences. This can be very useful for phylogeny programs, e.g. in PHYLIP - dnadist, dnapars, dnaml, etc. In general, it is better to use protein sequences for multiple alignments, but to use DNA sequences for phylogeny. This can be time consuming when there are gaps in the aligned protein sequences.
tranalign takes a set of (unaligned) nucleic sequences and a set of aligned protein sequences. It reads the first nucleic sequence and the first protein sequence, translates the nucleic sequence in each of the three forward frames, compares the protein sequence to the translated nucleic sequence to find the protein coding region, and then writes out the nucleic sequence that encoded the protein.
The sequences must be in the same order in both sets of sequences. A common problem you should be aware of is that some alignment program (including clustalw/emma) will re-order the aligned sequences to group similar sequences together.
The protein library may include '-' characters to specify alignments. Each '-' character in the protein library is ignored during the sequence comparison but replaced by '---' in the nucleic sequence output to form the aligned nucleic sequences.
tranalign finds the coding regions for contiguous sequences only. It will not splice together different exons to produce a coding sequence. You should therefore use either mRNA sequences, or nucleic sequences which you have constructed to hold a contiguous coding region (maybe using extractseq or yank and union?).
% tranalign tranalign.seq tranalign.pep tranalign2.seq
Mandatory qualifiers: [-nsequence] seqall Nucleotide sequences to be aligned [-psequence] seqset Protein sequence alignment [-outseq] seqoutset Output sequence set USA Optional qualifiers: -table menu Code to use Advanced qualifiers: (none) General qualifiers: -help boolean Report command line options. More information on associated and general qualifiers can be found with -help -verbose |
Mandatory qualifiers | Allowed values | Default | |||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
[-nsequence] (Parameter 1) |
Nucleotide sequences to be aligned | Readable sequence(s) | Required | ||||||||||||||||||||||||||||||||||||
[-psequence] (Parameter 2) |
Protein sequence alignment | Readable sequences | Required | ||||||||||||||||||||||||||||||||||||
[-outseq] (Parameter 3) |
Output sequence set USA | Writeable sequences | <sequence>.format | ||||||||||||||||||||||||||||||||||||
Optional qualifiers | Allowed values | Default | |||||||||||||||||||||||||||||||||||||
-table | Code to use |
|
0 | ||||||||||||||||||||||||||||||||||||
Advanced qualifiers | Allowed values | Default | |||||||||||||||||||||||||||||||||||||
(none) |
The ID names of the nucleic acid and protein sequences are NOT checked to see if they correspond to each other. They can have any names.
There must be at least as many protein sequences as nucleic acid sequence - extra protein sequences are ignored.
Each of the nucleic acid sequences must have a corresponding protein sequence which is derived from the coding region of that nucleic acid sequence. The two sets of sequences must be in the same order.
"Guide protein sequence xxx not found in nucleic sequence xxx" - the region of the nucleic sequence which codes for the protein was not found. The coding region in the nucleic acid sequence must be a single contiguous sequence. The protein sequence might not be the corresponding one for this nucleic acid sequence if they are out of order.
Program name | Description |
---|---|
emma | Multiple alignment program - interface to ClustalW program |
infoalign | Information on a multiple sequence alignment |
plotcon | Plots the quality of conservation of a sequence alignment |
prettyplot | Displays aligned sequences, with colouring and boxing |
showalign | Displays a multiple sequence alignment |
tranalign was written in EMBOSS code by Gary Williams using the description of mrtrans as a guide (gwilliam@hgmp.mrc.ac.uk)
tranalign written (March 2002) - Gary Williams