garnier

 

Function

Predicts protein secondary structure

Description

This is an implementation of the original Garnier Osguthorpe Robson algorithm (GOR I) for predicting protein secondary structure.

Secondary structure prediction is notoriously difficult to do accurately. The GOR I algorithm is one of the first semi-successful methods.

The Garnier method is not regarded as the most accurate prediction, but is simple to calculate on most workstations.

The accuracy of any secondary structure prediction program is not much better than 70% to 80% at best. This is an early algorithm and will probably not predict with much better than about 65% accuracy.

The Web servers for PHD, DSC, and others are generally preferred.

Do not rely on this (or any other) program alone to make your predictions. Use several programs and take a consensus of the results.
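One simple way to take a consensus is a per-residue majority vote. The sketch below is only an illustration of that idea: the function name `consensus`, the `?` tie marker, and the example strings are assumptions made for this example and are not part of garnier or EMBOSS. It assumes every program's prediction has already been reduced to one state letter per residue over the same sequence.

```python
from collections import Counter


def consensus(predictions, tie_symbol="?"):
    """Per-residue majority vote over equal-length prediction strings."""
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same residues")

    result = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        # A tie between the two most frequent states is reported, not resolved.
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append(tie_symbol)
        else:
            result.append(ranked[0][0])
    return "".join(result)


# Three made-up predictions for the same 8 residues (illustrative only).
print(consensus(["HHHHCCEE", "HHHCCCEE", "HHHHCEEE"]))
```

Positions marked with the tie symbol are the ones where the programs disagree most and deserve a closer look.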

Usage

Here is a sample session with garnier.

% garnier
Input sequence: sw:amic_pseae
Output file [amic_pseae.garnier]: 

Command line arguments

   Mandatory qualifiers:
  [-sequencea]         seqall     Sequence database USA
  [-outfile]           report     Output report file name

   Optional qualifiers: (none)
   Advanced qualifiers:
   -idc                integer    idc param

   General qualifiers:
  -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


   Qualifier                    Description               Allowed values         Default

   Mandatory qualifiers
   [-sequencea] (Parameter 1)   Sequence database USA     Readable sequence(s)   Required
   [-outfile] (Parameter 2)     Output report file name   Report file

   Optional qualifiers
   (none)

   Advanced qualifiers
   -idc                         idc param                 Integer from 0 to 6    0

The meaning and use of the parameter 'idc' is currently being investigated. The original author, Bill Pearson, writes:

"In their paper, GOR mention that if you know something about the secondary structure content of the protein you are analyzing, you can do better in prediction. "idc" is an index into a set of arrays, dharr[] and dsarr[], which provide "decision constants" (dch, dcs), which are offsets that are applied to the weights for the helix and sheet (extend) terms. So, idc=0 says don't use the decision constant offsets, and idc=1 to 6 indicates that various combinations of dch,dcs offsets should be used. I don't remember what they are, but I must have gotten the values from their paper."
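The description above can be sketched roughly as follows. This is not the garnier source code: the contents of `DHARR` and `DSARR` are placeholder numbers invented for illustration (the real values are not given on this page), the function `pick_state` is hypothetical, and subtracting the offsets from the helix and sheet scores is an assumed sign convention. The only facts taken from this page are that idc ranges from 0 to 6 and that idc=0 applies no offsets (the example report shows DCH = 0, DCS = 0).

```python
# Placeholder decision-constant tables indexed by idc (0..6).
# These numbers are NOT the values used by garnier; index 0 means "no offset".
DHARR = [0, 20, 40, 0, 0, 20, 40]   # hypothetical helix offsets (dch)
DSARR = [0, 0, 0, 20, 40, 20, 40]   # hypothetical sheet offsets (dcs)


def pick_state(scores, idc=0):
    """Choose the state with the highest score after applying the offsets.

    scores maps each of 'H', 'E', 'T', 'C' to a number for one residue.
    """
    if not 0 <= idc <= 6:
        raise ValueError("idc must be an integer from 0 to 6")
    dch, dcs = DHARR[idc], DSARR[idc]
    adjusted = dict(scores)
    adjusted["H"] -= dch  # assumed sign: the offset penalises helix
    adjusted["E"] -= dcs  # assumed sign: the offset penalises sheet
    return max("HETC", key=lambda state: adjusted[state])


residue_scores = {"H": 100, "E": 95, "T": 20, "C": 90}
print(pick_state(residue_scores, idc=0))  # no offsets applied
print(pick_state(residue_scores, idc=1))  # helix offset only (placeholder)
```

With offsets like these, a protein known to be poor in helix or sheet would have those states called less readily, which is the effect the quotation describes.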

Input file format

Any protein sequence.

Output file format

The output is a standard EMBOSS report file.

The results can be output in one of several styles by using the command-line qualifier -rformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: embl, genbank, gff, pir, swiss, trace, listfile, dbmotif, diffseq, excel, feattable, motif, regions, seqtable, simple, srs, table, tagseq

See: http://www.uk.embnet.org/Software/EMBOSS/Themes/ReportFormats.html for further information on report formats.

By default garnier writes a 'tagseq' report file.
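For scripted use, the qualifiers documented above can be assembled into a command line. The helper below is a minimal sketch: `build_garnier_command` is a hypothetical name, and it only builds and validates the argument list using the qualifiers and report format names listed on this page. It does not run anything itself.

```python
# Report format names as listed in this document.
REPORT_FORMATS = {
    "embl", "genbank", "gff", "pir", "swiss", "trace", "listfile",
    "dbmotif", "diffseq", "excel", "feattable", "motif", "regions",
    "seqtable", "simple", "srs", "table", "tagseq",
}


def build_garnier_command(sequence, outfile, idc=0, rformat="tagseq"):
    """Return the garnier argument list for the given options."""
    if not 0 <= idc <= 6:
        raise ValueError("idc must be an integer from 0 to 6")
    if rformat not in REPORT_FORMATS:
        raise ValueError("unknown report format: " + rformat)
    return [
        "garnier",
        "-sequencea", sequence,
        "-outfile", outfile,
        "-idc", str(idc),
        "-rformat", rformat,
    ]


print(" ".join(build_garnier_command("sw:amic_pseae", "amic_pseae.garnier")))
```

On a machine where EMBOSS is installed, the returned list could be passed to Python's `subprocess.run(cmd, check=True)`.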

Here is the output from the example run.


########################################
# Program: garnier
# Rundate: Mon Feb 11 13:42:25 2002
# Report_file: amic_pseae.garnier
########################################

#=======================================
#
# Sequence: AMIC_PSEAE from: 1 to: 384
# HitCount: 113
#
# DCH = 0, DCS = 0
#
# Please cite:
# Garnier, Osguthorpe and Robson (1978) J. Mol. Biol. 120:97-120
#
#
#
#=======================================

. 10 . 20 . 30 . 40 . 50
GSHQERPLIGLLFSETGVTADIERSQRYGALLAVEQLNREGGVGGRPIET
helix HHHHH HHHHH
sheet EE EEEEE EE EEEE
turns T TTTT TTTT
coil CCCCC CCCCCC CC C CCCC

. 60 . 70 . 80 . 90 . 100
LSQDPGGDPDRYRLCAEDFIRNRGVRFLVGCYMSHTRKAVMPVVERADAL
helix HHHHHH HHHH H HHHHHH
sheet E EEEE EEEE EEEE E
turns TT TT T TTTTT TTT T T
coil C CCC

. 110 . 120 . 130 . 140 . 150
LCYPTPYEGFEYSPNIVYGGPAPNQNSAPLAAYLIRHYGERVVFIGSDYI
helix HHH
sheet EEE E EE E EEEE EEEEE
turns T TTT TT T TT TT T TTTT
coil CCC CC CCCCC CCC C C

. 160 . 170 . 180 . 190 . 200
YPRESNHVMRHLYRQHGGTVLEEIYIPLYPSDDDVQRAVERIYQARADVV
helix HHHH HHHHHHHHHHHHH
sheet EEE EEEEEEE EEEE
turns TTT TTT TTTT
coil CC C CCCC CC

. 210 . 220 . 230 . 240 . 250
FSTVVGTGTAELYRAIARRYGDGRRPPIASLTTSEAEVAKMESDVAEGQV
helix HHHHHHH HHHHHHHHHHHHHHHHH
sheet EEEE EE EEE E
turns TTTTTT
coil CCCCC CCC CC

. 260 . 270 . 280 . 290 . 300
VVAPYFSSIDTAASRAFVQACHGFFPENATITAWAEAAYWQTLLLGRAAQ
helix HHHHHHH HHHHHHHHHHHHH HHHH
sheet EEEE EEE EE E
turns TT TTT TT
coil CC CCC C CCC

. 310 . 320 . 330 . 340 . 350
AAGSWRVEDVQRHLYDICIDAPQGPVRVERQNNHSRLSSRIAEIDARGVF
helix H HHHH HHH
sheet EEEE EEEEE EEE EE
turns TTTTTT T TT T TTT
coil CCCCC C CCC CCC CCC

. 360 . 370 . 380
QVRWQSPEPIRPDPYVVVHNLDDWSASMGGGALP
helix
sheet EE EEEEEEE E E
turns TT TT TTT TTT

Data files

None.

Notes

The 3D structure for the example sequence is known, although the 2D structure elements were not in the SwissProt feature table for release 38 when the test data was extracted.

DSSP shows:

 From     To   Structure
    9     13   E beta sheet
   21     39   H alpha helix
   50     54   E beta sheet
   60     72   H alpha helix
   78     81   E beta sheet
   85     97   H alpha helix
  101    104   E beta sheet
  117    119   E beta sheet
  128    136   H alpha helix
  142    148   E beta sheet
  151    166   H alpha helix
  170    177   E beta sheet
  183    196   H alpha helix
  200    204   E beta sheet
  208    221   H alpha helix
  229    231   E beta sheet
  236    239   H alpha helix
  244    247   H alpha helix
  251    254   E beta sheet
  263    273   H alpha helix
  284    303   H alpha helix
  308    315   H alpha helix
  320    322   E beta sheet
  325    329   E beta sheet
  336    337   E beta sheet
  341    345   E beta sheet
  351    356   E beta sheet
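To compare a prediction with the DSSP assignment, the segment table above can be expanded into one state letter per residue and scored position by position. The sketch below makes several assumptions that are not stated on this page: residues outside any listed segment are treated as coil (`C`), garnier's turn state (`T`) is folded into coil for a three-state comparison, and the function names are hypothetical. The segment coordinates are copied from the table; the sequence length of 384 comes from the example report.

```python
# (from, to, state) rows copied from the DSSP table above, 1-based inclusive.
DSSP_SEGMENTS = [
    (9, 13, "E"), (21, 39, "H"), (50, 54, "E"), (60, 72, "H"),
    (78, 81, "E"), (85, 97, "H"), (101, 104, "E"), (117, 119, "E"),
    (128, 136, "H"), (142, 148, "E"), (151, 166, "H"), (170, 177, "E"),
    (183, 196, "H"), (200, 204, "E"), (208, 221, "H"), (229, 231, "E"),
    (236, 239, "H"), (244, 247, "H"), (251, 254, "E"), (263, 273, "H"),
    (284, 303, "H"), (308, 315, "H"), (320, 322, "E"), (325, 329, "E"),
    (336, 337, "E"), (341, 345, "E"), (351, 356, "E"),
]


def segments_to_states(segments, length, default="C"):
    """Expand segments into a per-residue string; gaps get `default`."""
    states = [default] * length
    for start, end, state in segments:
        for pos in range(start, end + 1):
            states[pos - 1] = state
    return "".join(states)


def three_state(prediction):
    """Fold turns into coil (an assumption made for this comparison)."""
    return prediction.replace("T", "C")


def agreement(predicted, observed):
    """Fraction of residues whose predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


observed = segments_to_states(DSSP_SEGMENTS, 384)
print(observed[:60])
```

Given a per-residue garnier prediction string `pred` for the same sequence, `agreement(three_state(pred), observed)` would give the fraction of matching residues.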

References

Garnier J, Osguthorpe DJ, Robson B. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J Mol Biol 1978 Mar 25;120(1):97-120.

Warnings

The accuracy of any secondary structure prediction program is not much better than 70% to 80% at best. This is an early algorithm and will probably not predict with much better than about 65% accuracy.

You are advised to use several of the latest Web-based prediction sites and combine them to make a consensus prediction.

Diagnostic Error Messages

None.

Exit status

It always exits with a status of 0.

Known bugs

None.

See also

Program name     Description
helixturnhelix   Report nucleic acid binding motifs
hmoment          Hydrophobic moment calculation
pepcoil          Predicts coiled coil regions
pepnet           Displays proteins as a helical net
pepwheel         Shows protein sequences as helices
tmap             Displays membrane spanning regions

Author(s)

This program ('GARNIER') was originally written by William Pearson (wrp@virginia.edu) and released as part of his FASTA package.

This application was modified for inclusion in EMBOSS by Rodrigo Lopez (rls@ebi.ac.uk) European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

History

None.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments