backtranseq

 

Function

Back translate a protein sequence

Description

backtranseq takes a protein sequence and makes a best estimate of the nucleic acid sequence it is most likely to have come from. It does this by using a codon frequency table. For each amino acid, the most frequently occurring corresponding codon is used to construct the nucleic acid sequence.
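
The selection rule described above can be sketched in a few lines of Python. This is an illustration of the idea only, not the backtranseq source: the function names and the small CODON_USAGE table are hypothetical, and the frequencies are made-up placeholders rather than values from 'Ehum.cut'.

```python
# Hypothetical, truncated usage table: amino acid -> {codon: relative frequency}.
# The numbers are illustrative placeholders, NOT values taken from Ehum.cut.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "N": {"AAC": 0.54, "AAT": 0.46},
    "G": {"GGC": 0.34, "GGA": 0.25, "GGG": 0.25, "GGT": 0.16},
    "T": {"ACC": 0.36, "ACA": 0.28, "ACT": 0.24, "ACG": 0.12},
}


def most_frequent_codons(usage):
    """Map each amino acid to its single most frequent codon."""
    return {aa: max(codons, key=codons.get) for aa, codons in usage.items()}


def back_translate(protein, usage):
    """Back-translate a protein by substituting the top codon per residue."""
    best = most_frequent_codons(usage)
    try:
        return "".join(best[aa] for aa in protein.upper())
    except KeyError as err:
        # Residues missing from the table cannot be back-translated.
        raise ValueError(f"no codon known for residue {err.args[0]!r}") from None


print(back_translate("MNGT", CODON_USAGE))  # prints ATGAACGGCACC
```

Because only the single most frequent codon is ever chosen, the same protein and the same table always give the same nucleotide sequence.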

Codon usage table name

backtranseq reads the codon frequency table from a data file. The default codon frequency table is 'Ehum.cut', the human codon frequency table. It is important to use a codon frequency table that is appropriate for the species that your protein comes from. See the Data files section below for more details on these files.

Usage

Here is a sample session with backtranseq. Note that this is a human protein, so -cfile is not specified and the default (human) codon frequency file is used.

% backtranseq
Back translate a protein sequence
Input sequence: sw:opsd_human
Output sequence [opsd_human.fasta]: 

Here is a session using a Drosophila sequence and codon table:

% backtranseq -cfile Edrosophila.cut
Back translate a protein sequence
Input sequence: sw:ach2_drome
Output sequence [ach2_drome.fasta]: 

Command line arguments

   Mandatory qualifiers:
  [-sequence]          sequence   Sequence USA
  [-outfile]           seqout     Output sequence USA

   Optional qualifiers:
   -cfile              codon      Codon usage table name

   Advanced qualifiers: (none)
   General qualifiers:
  -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Qualifier                   Description              Allowed values                          Default

Mandatory qualifiers
[-sequence] (Parameter 1)   Sequence USA             Readable sequence                       Required
[-outfile] (Parameter 2)    Output sequence USA      Writeable sequence                      <sequence>.format

Optional qualifiers
-cfile                      Codon usage table name   Codon usage file in EMBOSS data path    Ehum.cut

Advanced qualifiers
(none)

Input file format

Any protein sequence USA.

Output file format

The output is a nucleotide sequence containing the most favoured back translation of the specified protein, using the specified codon usage table (which defaults to human).

The output from the backtranslation of the human protein sw:opsd_human follows:


% more opsd_human.fasta
>OPSD_HUMAN P08100 RHODOPSIN.
ATGAACGGCACCGAGGGCCCCAACTTCTACGTGCCCTTCAGCAACGCCACCGGCGTGGTG
AGAAGCCCCTTCGAGTACCCCCAGTACTACCTGGCCGAGCCCTGGCAGTTCAGCATGCTG
GCCGCCTACATGTTCCTGCTGATCGTGCTGGGCTTCCCCATCAACTTCCTGACCCTGTAC
GTGACCGTGCAGCACAAGAAGCTGAGAACCCCCCTGAACTACATCCTGCTGAACCTGGCC
GTGGCCGACCTGTTCATGGTGCTGGGCGGCTTCACCAGCACCCTGTACACCAGCCTGCAC
GGCTACTTCGTGTTCGGCCCCACCGGCTGCAACCTGGAGGGCTTCTTCGCCACCCTGGGC
GGCGAGATCGCCCTGTGGAGCCTGGTGGTGCTGGCCATCGAGAGATACGTGGTGGTGTGC
AAGCCCATGAGCAACTTCAGATTCGGCGAGAACCACGCCATCATGGGCGTGGCCTTCACC
TGGGTGATGGCCCTGGCCTGCGCCGCCCCCCCCCTGGCCGGCTGGAGCAGATACATCCCC
GAGGGCCTGCAGTGCAGCTGCGGCATCGACTACTACACCCTGAAGCCCGAGGTGAACAAC
GAGAGCTTCGTGATCTACATGTTCGTGGTGCACTTCACCATCCCCATGATCATCATCTTC
TTCTGCTACGGCCAGCTGGTGTTCACCGTGAAGGAGGCCGCCGCCCAGCAGCAGGAGAGC
GCCACCACCCAGAAGGCCGAGAAGGAGGTGACCAGAATGGTGATCATCATGGTGATCGCC
TTCCTGATCTGCTGGGTGCCCTACGCCAGCGTGGCCTTCTACATCTTCACCCACCAGGGC
AGCAACTTCGGCCCCATCTTCATGACCATCCCCGCCTTCTTCGCCAAGAGCGCCGCCATC
TACAACCCCGTGATCTACATCATGATGAACAAGCAGTTCAGAAACTGCATGCTGACCACC
ATCTGCTGCGGCAAGAACCCCCTGGGCGACGACGAGGCCAGCGCCACCGTGAGCAAGACC
GAGACCAGCCAGGTGGCCCCCGCC
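
A script that consumes this output may want to sanity-check it. The sketch below is one possible check, not part of EMBOSS: read_fasta and count_codons are hypothetical helper names, and the check assumes the output contains only the unambiguous bases A, C, G and T, as in the example above.

```python
def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # stop when a second record begins
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    return header, "".join(chunks)


def count_codons(seq):
    """Check that seq is whole codons of A/C/G/T and return the codon count."""
    unexpected = set(seq.upper()) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return len(seq) // 3


# First line of the example output shown above.
sample = """>OPSD_HUMAN P08100 RHODOPSIN.
ATGAACGGCACCGAGGGCCCCAACTTCTACGTGCCCTTCAGCAACGCCACCGGCGTGGTG
"""
header, seq = read_fasta(sample)
print(header.split()[0], count_codons(seq))
```

The codon count should equal the number of residues in the input protein, since each amino acid is replaced by exactly one codon.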

Data files

The codon usage table is read by default from "Ehum.cut" in the 'data/CODONS' directory of the EMBOSS distribution. If the name of a codon usage file is specified on the command line, then this file will first be searched for in the current directory and then in the 'data/CODONS' directory of the EMBOSS distribution.
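
The search order described above (current directory first, then 'data/CODONS') can be illustrated as follows. This is a minimal sketch of the lookup order only, not how EMBOSS implements it; find_codon_file is a hypothetical name, and the location of the EMBOSS 'data/CODONS' directory is passed in by the caller because it depends on the installation.

```python
import os
import tempfile


def find_codon_file(name, emboss_codons_dir, cwd="."):
    """Return the first existing path for name: cwd first, then data/CODONS."""
    for directory in (cwd, emboss_codons_dir):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(f"The file '{name}' does not exist")


# Demonstration with two temporary directories standing in for the current
# directory and the EMBOSS data/CODONS directory (both are stand-ins).
with tempfile.TemporaryDirectory() as cwd, tempfile.TemporaryDirectory() as data:
    open(os.path.join(data, "Emus.cut"), "w").close()
    # Only the data directory has the file, so it is found there.
    print(find_codon_file("Emus.cut", data, cwd) == os.path.join(data, "Emus.cut"))
    open(os.path.join(cwd, "Emus.cut"), "w").close()
    # A local copy now exists, so it takes precedence.
    print(find_codon_file("Emus.cut", data, cwd) == os.path.join(cwd, "Emus.cut"))
```

One consequence of this order is that a modified copy of a table in the current directory overrides the distributed table of the same name.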

To see the available EMBOSS codon usage files, run:


% embossdata -showall

To fetch one of the codon usage tables (for example 'Emus.cut') into your current directory for you to inspect or modify, run:


% embossdata -fetch -file Emus.cut

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

"Corrupt codon index file" - the codon usage file is incomplete or empty.

"The file 'drosoph.cut' does not exist" - the codon usage file cannot be opened.

Exit status

This program always exits with a status of 0, unless the codon usage table cannot be opened.

Known bugs

None.

See also

Program name    Description
charge          Protein charge plot
checktrans      Reports STOP codons and ORF statistics of a protein sequence
coderet         Extract CDS, mRNA and translations from feature tables
compseq         Counts the composition of dimer/trimer/etc words in a sequence
emowse          Protein identification by mass spectrometry
freak           Residue/base frequency table or plot
iep             Calculates the isoelectric point of a protein
mwcontam        Shows molwts that match across a set of files
mwfilter        Filter noisy molwts from mass spec output
octanol         Displays protein hydropathy
pepinfo         Plots simple amino acid properties in parallel
pepstats        Protein statistics
pepwindow       Displays protein hydropathy
pepwindowall    Displays protein hydropathy of a set of sequences
plotorf         Plot potential open reading frames
prettyseq       Output sequence with translated ranges
remap           Display a sequence with restriction cut sites, translation etc
showorf         Pretty output of DNA translations
showseq         Display a sequence with features, translation etc
transeq         Translate nucleic acid sequences

Author(s)

This application was written by Alan Bleasby (ableasby@hgmp.mrc.ac.uk)

History

Completed 6 Oct 1999

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments