antigenic


Function

Finds antigenic sites in proteins

Description

Antigenic predicts potentially antigenic regions of a protein sequence, using the method of Kolaskar and Tongaonkar.

Analysis of data from experimentally determined antigenic sites on proteins has revealed that the hydrophobic residues Cys, Leu and Val, if they occur on the surface of a protein, are more likely to be part of antigenic sites. Kolaskar and Tongaonkar developed a semi-empirical method that uses the physicochemical properties of amino acid residues and their frequencies of occurrence in experimentally known segmental epitopes to predict antigenic determinants on proteins. Applied to a large number of proteins, the method predicted antigenic determinants with about 75% accuracy, which the authors report is better than most previously known methods. The method is based on a single parameter (the antigenic propensity of each residue, listed in the data file Eantigenic.dat) and is therefore very simple to use.
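The propensity scan described above can be sketched in a few lines of Python. This is a minimal illustration, not the EMBOSS implementation: it takes the A(p) values from the default Eantigenic.dat file (see Data files), while the 7-residue smoothing window, the threshold rule (1.0, or the whole-protein average if that is lower) and the function names are assumptions made for the example, so its regions and scores need not match antigenic's own output.

```python
# Antigenic propensity A(p) per residue, copied from the default Eantigenic.dat.
PROPENSITY = {
    "A": 1.064, "C": 1.412, "D": 0.866, "E": 0.851, "F": 1.091,
    "G": 0.874, "H": 1.105, "I": 1.152, "K": 0.930, "L": 1.250,
    "M": 0.826, "N": 0.776, "P": 1.064, "Q": 1.015, "R": 0.873,
    "S": 1.012, "T": 0.909, "V": 1.383, "W": 0.893, "Y": 1.161,
}

WINDOW = 7  # assumed smoothing window; not stated in this document


def smoothed_propensity(seq, window=WINDOW):
    """Average A(p) over a centred window. Positions near either end use
    only the part of the window that lies inside the sequence."""
    values = [PROPENSITY[res] for res in seq.upper()]
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def antigenic_regions(seq, minlen=6, window=WINDOW):
    """Return (start, end, max_score, max_pos) tuples, 1-based and inclusive,
    for runs of at least `minlen` residues whose smoothed score exceeds
    the threshold."""
    values = [PROPENSITY[res] for res in seq.upper()]
    protein_avg = sum(values) / len(values)
    threshold = min(1.0, protein_avg)  # assumed threshold rule
    scores = smoothed_propensity(seq, window)
    regions = []
    start = None
    # A -inf sentinel closes a run that reaches the end of the sequence.
    for i, score in enumerate(scores + [float("-inf")]):
        if score > threshold and start is None:
            start = i
        elif score <= threshold and start is not None:
            if i - start >= minlen:
                run = scores[start:i]
                best = max(range(len(run)), key=run.__getitem__)
                regions.append((start + 1, i, round(run[best], 3), start + best + 1))
            start = None
    return regions
```

Unlike the report shown under Output file format, which is sorted by score, this sketch returns regions in sequence order, and it raises KeyError for any residue missing from the table (for example an ambiguity code).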

Usage

Here is a sample session with antigenic.

% antigenic
Finds antigenic sites in proteins
Input sequence: sw:act1_fugru
Minimum length [6]: 
Output file [act1_fugru.antigenic]: 

Command line arguments

   Mandatory qualifiers:
  [-sequence]          seqall     Sequence database USA
   -minlen             integer    Minimum length
  [-outfile]           report     Output report file name

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


   Qualifier                  Description               Allowed values         Default
   Mandatory qualifiers:
   [-sequence] (Parameter 1)  Sequence database USA     Readable sequence(s)   Required
   -minlen                    Minimum length            Integer from 1 to 50   6
   [-outfile] (Parameter 2)   Output report file name   Report file
   Optional qualifiers: (none)
   Advanced qualifiers: (none)
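When antigenic is run from a script, supplying every mandatory qualifier on the command line avoids the interactive prompts seen in the sample session. The Python helper below is hypothetical: it only assembles an argument list from the qualifiers documented on this page (-sequence, -minlen, -outfile and the report qualifier -rformat) and checks the documented -minlen range of 1 to 50.

```python
import shlex


def antigenic_argv(sequence, minlen=6, outfile=None, rformat=None):
    """Build an argument list for antigenic (hypothetical helper).

    Only qualifiers documented for antigenic are used. Pass `outfile`
    explicitly to avoid the output file prompt.
    """
    if not isinstance(minlen, int) or not 1 <= minlen <= 50:
        raise ValueError("-minlen must be an integer from 1 to 50")
    argv = ["antigenic", "-sequence", sequence, "-minlen", str(minlen)]
    if outfile is not None:
        argv += ["-outfile", outfile]
    if rformat is not None:
        argv += ["-rformat", rformat]
    return argv


if __name__ == "__main__":
    # Print the equivalent shell command line.
    print(shlex.join(antigenic_argv("sw:act1_fugru",
                                    outfile="act1_fugru.antigenic",
                                    rformat="gff")))
```

On a system where EMBOSS is installed, the returned list can be passed directly to subprocess.run.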

Input file format

The input sequence can be one or more protein sequences.

Output file format

The output is a standard EMBOSS report file.

The results can be output in one of several styles by using the command-line qualifier -rformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: embl, genbank, gff, pir, swiss, trace, listfile, dbmotif, diffseq, excel, feattable, motif, regions, seqtable, simple, srs, table, tagseq.

See: http://www.uk.embnet.org/Software/EMBOSS/Themes/ReportFormats.html for further information on report formats.

By default antigenic writes a 'motif' report file.

The output from the above example is:


########################################
# Program: antigenic
# Rundate: Mon Feb 11 12:01:10 2002
# Report_file: act1_fugru.antigenic
########################################

#=======================================
#
# Sequence: ACT1_FUGRU     from: 1   to: 375
# HitCount: 18
#=======================================

Max_score_pos at "*"

(1) Score 1.207 length 9 at residues 214->222
               *
 Sequence: EKLCYVALD
           |       |
         214       222

(2) Score 1.187 length 15 at residues 131->145
                 *
 Sequence: AMYVAIQAVLSLYAS
           |             |
         131             145

(3) Score 1.166 length 8 at residues 5->12
              *
 Sequence: IAALVVDN
           |      |
           5      12

(4) Score 1.164 length 12 at residues 27->38
                *
 Sequence: PRAVFPSIVGRP
           |          |
          27          38

(5) Score 1.136 length 24 at residues 160->183
                        *
 Sequence: THTVPIYEGYALPHAILRLDLAGR
           |                      |
         160                      183

(6) Score 1.135 length 6 at residues 367->372
                *
 Sequence: PSIVHR
           |    |
         367    372

(7) Score 1.116 length 16 at residues 93->108
                     *
 Sequence: ELRVAPEEHPVLLTEA
           |              |
          93              108

(8) Score 1.113 length 7 at residues 295->301
            *
 Sequence: ANTVLSG
           |     |
         295     301

(9) Score 1.110 length 11 at residues 256->266
                   *
 Sequence: RCPEALFQPSF
           |         |
         256         266

(10) Score 1.107 length 17 at residues 336->352
                      *
 Sequence: KYSVWIGGSILASLSTF
           |               |
         336               352

(11) Score 1.102 length 15 at residues 62->76        
                 *
 Sequence: RGILTLKYPIEHGIV
           |             |
          62             76

(12) Score 1.086 length 19 at residues 232->250
                        *
 Sequence: SSSSLEKSYELPDGQVITI
           |                 |
         232                 250

(13) Score 1.083 length 6 at residues 327->332
              *
 Sequence: IKIIAP
           |    |
         327    332

(14) Score 1.074 length 7 at residues 317->323
              *
 Sequence: ITALAPS
           |     |
         317     323

(15) Score 1.068 length 7 at residues 186->192
                *
 Sequence: TDYLMKI
           |     |
         186     192

(16) Score 1.066 length 7 at residues 40->46
              *
 Sequence: HQGVMVG
           |     |
          40     46

(17) Score 1.045 length 7 at residues 269->275
           *
 Sequence: MESCGIH
           |     |
         269     275

(18) Score 1.034 length 7 at residues 51->57
            *
 Sequence: DSYVGDE
           |     |
          51     57


#---------------------------------------
#---------------------------------------            
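The 'motif' report is meant for reading, but its layout is regular enough to parse. The sketch below assumes exactly the layout of the sample output above: a header line per hit, a line with '*' over the residue with the maximum score, and a 'Sequence:' line. The function name parse_motif_report is hypothetical; for machine processing, the GFF output described next is the simpler choice.

```python
import re

# Header line of one hit, e.g. "(1) Score 1.207 length 9 at residues 214->222"
HEADER = re.compile(r"\((\d+)\) Score ([\d.]+) length (\d+) at residues (\d+)->(\d+)")


def parse_motif_report(text):
    """Parse hit blocks from a 'motif'-style antigenic report.

    Assumes the layout of the sample output: header line, a line with '*'
    above the residue of maximum score, then the ' Sequence: ...' line.
    """
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        m = HEADER.match(line.strip())
        if not m or i + 2 >= len(lines):
            continue
        number, score, length, start, end = m.groups()
        star_line, seq_line = lines[i + 1], lines[i + 2]
        prefix, _, rest = seq_line.partition("Sequence:")
        seq = rest.strip()
        # Column at which the residues begin in the ' Sequence:' line.
        seq_col = len(prefix) + len("Sequence:") + (len(rest) - len(rest.lstrip()))
        offset = star_line.index("*") - seq_col
        hits.append({
            "hit": int(number),
            "score": float(score),
            "length": int(length),
            "start": int(start),
            "end": int(end),
            "sequence": seq,
            "max_score_pos": int(start) + offset,
        })
    return hits
```

Applied to hit (1) above, the '*' sits four columns into EKLCYVALD, which gives residue 218, the same position reported as "*pos 218" in the GFF output.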

By using the '-rformat gff' qualifier, a GFF file of the predicted regions can be produced. For example:

% antigenic -rformat gff
Finds antigenic sites in proteins
Input sequence(s): sw:act1_fugru
Minimum length [6]: 
Output file [act1_fugru.antigenic]: 

% more act1_fugru.antigenic
##gff-version 2.0
##date 2002-02-11
##Type Protein ACT1_FUGRU
ACT1_FUGRU      antigenic       site    214     222     1.207   +       .	Sequence "ACT1_FUGRU.1" ; note "*pos 218"
ACT1_FUGRU      antigenic       site    131     145     1.187   +       .	Sequence "ACT1_FUGRU.2" ; note "*pos 137"
ACT1_FUGRU      antigenic       site    5       12      1.166   +       .	Sequence "ACT1_FUGRU.3" ; note "*pos 8"
ACT1_FUGRU      antigenic       site    27      38      1.164   +       .	Sequence "ACT1_FUGRU.4" ; note "*pos 32"
ACT1_FUGRU      antigenic       site    160     183     1.136   +       .	Sequence "ACT1_FUGRU.5" ; note "*pos 173"
ACT1_FUGRU      antigenic       site    367     372     1.135   +       .	Sequence "ACT1_FUGRU.6" ; note "*pos 372"
ACT1_FUGRU      antigenic       site    93      108     1.116   +       .	Sequence "ACT1_FUGRU.7" ; note "*pos 103"
ACT1_FUGRU      antigenic       site    295     301     1.113   +       .	Sequence "ACT1_FUGRU.8" ; note "*pos 296"
ACT1_FUGRU      antigenic       site    256     266     1.110   +       .	Sequence "ACT1_FUGRU.9" ; note "*pos 264"
ACT1_FUGRU      antigenic       site    336     352     1.107   +       .	Sequence "ACT1_FUGRU.10" ; note "*pos 347"
ACT1_FUGRU      antigenic       site    62      76      1.102   +       .	Sequence "ACT1_FUGRU.11" ; note "*pos 68"
ACT1_FUGRU      antigenic       site    232     250     1.086   +       .	Sequence "ACT1_FUGRU.12" ; note "*pos 245"
ACT1_FUGRU      antigenic       site    327     332     1.083   +       .	Sequence "ACT1_FUGRU.13" ; note "*pos 330"
ACT1_FUGRU      antigenic       site    317     323     1.074   +       .	Sequence "ACT1_FUGRU.14" ; note "*pos 320"
ACT1_FUGRU      antigenic       site    186     192     1.068   +       .	Sequence "ACT1_FUGRU.15" ; note "*pos 191"
ACT1_FUGRU      antigenic       site    40      46      1.066   +       .	Sequence "ACT1_FUGRU.16" ; note "*pos 43"
ACT1_FUGRU      antigenic       site    269     275     1.045   +       .	Sequence "ACT1_FUGRU.17" ; note "*pos 269"
ACT1_FUGRU      antigenic       site    51      57      1.034   +       .	Sequence "ACT1_FUGRU.18" ; note "*pos 52"
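The GFF lines are easier to process programmatically. A minimal Python sketch, assuming the column layout shown above (sequence name, source, feature, start, end, score, strand, frame, then attributes) and a hypothetical function name parse_antigenic_gff, extracts each site together with the position given in its "*pos" note:

```python
import re
from typing import List, NamedTuple, Optional

# Matches the note attribute, e.g. note "*pos 218"
POS_NOTE = re.compile(r'note "\*pos (\d+)"')


class Site(NamedTuple):
    seqid: str
    start: int
    end: int
    score: float
    max_pos: Optional[int]


def parse_antigenic_gff(text: str) -> List[Site]:
    """Parse feature lines from 'antigenic -rformat gff' output.

    Fields are split on any run of whitespace because the sample output
    mixes spaces and tabs; everything after the eighth field is treated
    as the attribute string.
    """
    sites = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and ## header lines
        fields = line.split(None, 8)
        if len(fields) < 8:
            continue
        seqid, _source, _feature, start, end, score = fields[:6]
        attrs = fields[8] if len(fields) > 8 else ""
        m = POS_NOTE.search(attrs)
        sites.append(Site(seqid, int(start), int(end), float(score),
                          int(m.group(1)) if m else None))
    return sites
```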

Data files

Antigenic uses a data file called Eantigenic.dat.

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.

Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory or, again, in a subdirectory called ".embossdata".
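The lookup for a data file such as Eantigenic.dat can be sketched in Python. The helper names are hypothetical, and the search order is an assumption: current directory, ./.embossdata, home directory, ~/.embossdata, and finally the directory named by EMBOSS_DATA. Any installation default that applies when EMBOSS_DATA is unset is not modelled.

```python
import os
from pathlib import Path
from typing import List, Optional


def candidate_dirs(cwd: Path, home: Path, emboss_data: Optional[str]) -> List[Path]:
    """Directories to search, in the order assumed for this sketch."""
    dirs = [cwd, cwd / ".embossdata", home, home / ".embossdata"]
    if emboss_data:
        dirs.append(Path(emboss_data))  # distributed data files come last
    return dirs


def find_data_file(name: str, cwd=None, home=None, emboss_data=None) -> Optional[Path]:
    """Return the first existing copy of `name`, or None (hypothetical helper).

    A user's own copy therefore overrides the distributed one.
    """
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)
    if emboss_data is None:
        emboss_data = os.environ.get("EMBOSS_DATA")
    for directory in candidate_dirs(cwd, home, emboss_data):
        path = directory / name
        if path.is_file():
            return path
    return None
```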

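The directories are searched in the following order:

   . (your current directory)
   .embossdata (under your current directory)
   ~/ (your home directory)
   ~/.embossdata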

Here is the default Eantigenic.dat file:


#                                               Antigenic  Surface  Antigenic
# Amino     -- Occurrence of amino acids in --   frequency frequency propensity
# Acid       Epitopes      Surface     Protein   f(Ag)    f(s)      A(p)
  A             135          328         524     0.065    0.061     1.064
  C              53           97         186     0.026    0.018     1.412
  D             118          352         414     0.057    0.066     0.866
  E             132          401         499     0.064    0.075     0.851
  F              76          180         365     0.037    0.034     1.091
  G             116          343         487     0.056    0.064     0.874
  H              59          138         191     0.029    0.026     1.105
  I              86          193         437     0.042    0.036     1.152
  K             158          439         523     0.076    0.082     0.930
  L             149          308         684     0.072    0.058     1.250
  M              23           72         152     0.011    0.013     0.826
  N              94          313         407     0.045    0.058     0.776
  P             135          328         411     0.065    0.061     1.064
  Q              99          252         332     0.048    0.047     1.015
  R             106          314         394     0.051    0.058     0.873
  S             168          429         553     0.081    0.080     1.012
  T             141          401         522     0.068    0.075     0.909
  V             128          239         515     0.062    0.045     1.383
  W              19           55         103     0.009    0.010     0.893
  Y              71          158         245     0.034    0.029     1.161
Total          2066         5340        7944
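The columns of this file are consistent with A(p) = f(Ag) / f(s), where f(Ag) is a residue's share of the 2066 residues in epitopes and f(s) its share of the 5340 surface residues; for Cys, (53/2066) / (97/5340) ≈ 1.412, the tabulated value. The Python sketch below, with hypothetical function names, parses the default table and recomputes A(p) from the raw counts. The relationship was checked numerically against the table rather than taken from the program source; the recomputed values agree to within 0.002 (Val gives 1.384 against the tabulated 1.383).

```python
# Default Eantigenic.dat rows, copied from the table above.
DEFAULT_TABLE = """\
# Acid       Epitopes      Surface     Protein   f(Ag)    f(s)      A(p)
  A             135          328         524     0.065    0.061     1.064
  C              53           97         186     0.026    0.018     1.412
  D             118          352         414     0.057    0.066     0.866
  E             132          401         499     0.064    0.075     0.851
  F              76          180         365     0.037    0.034     1.091
  G             116          343         487     0.056    0.064     0.874
  H              59          138         191     0.029    0.026     1.105
  I              86          193         437     0.042    0.036     1.152
  K             158          439         523     0.076    0.082     0.930
  L             149          308         684     0.072    0.058     1.250
  M              23           72         152     0.011    0.013     0.826
  N              94          313         407     0.045    0.058     0.776
  P             135          328         411     0.065    0.061     1.064
  Q              99          252         332     0.048    0.047     1.015
  R             106          314         394     0.051    0.058     0.873
  S             168          429         553     0.081    0.080     1.012
  T             141          401         522     0.068    0.075     0.909
  V             128          239         515     0.062    0.045     1.383
  W              19           55         103     0.009    0.010     0.893
  Y              71          158         245     0.034    0.029     1.161
Total          2066         5340        7944
"""


def parse_propensity_table(text):
    """Parse rows of an Eantigenic.dat-style table into a dict keyed by residue.

    Comment lines start with '#' and the 'Total' line is skipped. Each data
    row is assumed to hold: residue, epitope count, surface count, protein
    count, f(Ag), f(s), A(p).
    """
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#") or fields[0] == "Total":
            continue
        res, ep, surf, prot, f_ag, f_s, a_p = fields
        rows[res] = {
            "epitope": int(ep), "surface": int(surf), "protein": int(prot),
            "f_ag": float(f_ag), "f_s": float(f_s), "a_p": float(a_p),
        }
    return rows


def recompute_propensity(rows):
    """A(p) = f(Ag) / f(s), each frequency being a residue's count divided
    by the total of its column."""
    total_ep = sum(r["epitope"] for r in rows.values())
    total_surf = sum(r["surface"] for r in rows.values())
    return {
        res: (r["epitope"] / total_ep) / (r["surface"] / total_surf)
        for res, r in rows.items()
    }
```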

Notes

References

  1. Kolaskar,AS and Tongaonkar,PC (1990). A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Letters 276: 172-174.
  2. Parker,JMR, Guo,D and Hodges,RS (1986). Biochemistry 25: 5425-5432.

Warnings

The program will warn you if the sequence is not a protein or has ambiguity codes.

Diagnostic Error Messages

Exit status

It exits with status 0 unless a region is badly constructed.

Known bugs

None.

See also

Program name      Description
digest            Protein proteolytic enzyme or reagent cleavage digest
fuzzpro           Protein pattern search
fuzztran          Protein pattern search after translation
helixturnhelix    Report nucleic acid binding motifs
oddcomp           Finds protein sequence regions with a biased composition
patmatdb          Search a protein sequence with a motif
patmatmotifs      Search a PROSITE motif database with a protein sequence
pepcoil           Predicts coiled coil regions
preg              Regular expression search of a protein sequence
pscan             Scans proteins using PRINTS
sigcleave         Reports protein signal cleavage sites

Author(s)

This application was written by Alan Bleasby (ableasby@hgmp.mrc.ac.uk)

Original program "ANTIGENIC" by Peter Rice (EGCG 1991)

History

Completed 9th March 1999

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments