prosextract

 

Function

Builds the PROSITE motif database for patmatmotifs to search

Description

prosextract takes the contents of the ID (identity), AC (accession number) and PA (motif pattern) lines from PROSITE entries. It also converts each pattern into a regular expression and writes these four items to an output file, which is called 'prosite.lines' by default.
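The pattern-to-regular-expression step can be illustrated with a minimal Python sketch. This is not the code used by prosextract. It only reproduces the conventions visible in the sample output in the Usage section: a leading '^', 'x' written as [^BJOUXZ], {..} written as a negated class, and (n) written as {n}. The function name prosite_to_regex and the set of supported pattern elements are assumptions made for illustration.

```python
import re

# Assumption: "x" (any residue) is written as [^BJOUXZ], mirroring the
# sample prosite.lines output rather than the prosextract source code.
ANY_RESIDUE = "[^BJOUXZ]"

# One pattern element: x, a single residue, [allowed] or {excluded},
# optionally followed by a repeat count such as (2) or (2,4).
_ELEMENT = re.compile(
    r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?$"
)


def prosite_to_regex(pattern):
    """Convert a PROSITE PA pattern into a regular expression string."""
    pattern = pattern.strip().rstrip(".")  # PA lines end with a full stop
    parts = ["^"]                          # the sample output starts with ^
    for element in pattern.split("-"):
        match = _ELEMENT.match(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, repeat = match.groups()
        if residue == "x":
            parts.append(ANY_RESIDUE)
        elif residue.startswith("{"):
            parts.append("[^" + residue[1:-1] + "]")  # {P} becomes [^P]
        else:
            parts.append(residue)                     # N or [ST] unchanged
        if repeat:
            parts.append("{" + repeat + "}")          # (2) becomes {2}
    return "".join(parts)
```

Under these assumptions, prosite_to_regex("[RK](2)-x-[ST]") gives the same string as the fourth line of the CAMP_PHOSPHO_SITE record in the sample output. Pattern elements outside this small subset raise a ValueError instead of being guessed at.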

Usage

Here is a sample session with prosextract.

% prosextract
Extracting ID, AC & PA lines from the Prosite motif Database.
Enter name of prosite directory: data/PROSITE
	
% more prosite.lines
ASN_GLYCOSYLATION PS00001
N-glycosylation
N-{P}-[ST]-{P}
^N[^P][ST][^P]

CAMP_PHOSPHO_SITE PS00004
cAMP-
[RK](2)-x-[ST]
^[RK]{2}[^BJOUXZ][ST]

PKC_PHOSPHO_SITE PS00005
Protein
[ST]-x-[RK]
^[ST][^BJOUXZ][RK]

CK2_PHOSPHO_SITE PS00006
Casein
[ST]-x(2)-[DE]
^[ST][^BJOUXZ]{2}[DE]

etc.......

The output files named after the prosite accession numbers can now also be seen in the prosite directory. These files are created automatically when prosextract is run.

Command line arguments

   Mandatory qualifiers:
  [-infdat]            string     Enter name of prosite directory

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Mandatory qualifiers                                        Allowed values           Default
[-infdat] (Parameter 1)   Enter name of prosite directory   Any string is accepted   An empty string is accepted

Optional qualifiers                                         Allowed values           Default
(none)

Advanced qualifiers                                         Allowed values           Default
(none)

Input file format

The input files must be the "prosite.dat" and "prosite.doc" files of a PROSITE distribution, containing all current PROSITE data.
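As a rough illustration of what is read from "prosite.dat", the following Python sketch collects the ID, AC and PA line contents of each entry. It is not the prosextract implementation. It assumes the usual PROSITE flat-file layout: a two-letter line code, data starting at column 6, entries terminated by a "//" line, and long patterns continued over several PA lines. The function name extract_motifs is hypothetical.

```python
def extract_motifs(dat_text):
    """Return (identifier, accession, pattern) for each entry with a PA line."""
    motifs = []
    identifier = accession = pattern = ""
    for line in dat_text.splitlines():
        # Assumed layout: two-letter line code, then data from column 6.
        code, value = line[:2], line[5:].strip()
        if code == "ID":
            identifier = value.split(";")[0]   # drop the entry type
        elif code == "AC":
            accession = value.rstrip(";")
        elif code == "PA":
            pattern += value                   # patterns may span several lines
        elif code == "//":
            # End of entry: keep it only if it carried a pattern.
            if identifier and pattern:
                motifs.append((identifier, accession, pattern.rstrip(".")))
            identifier = accession = pattern = ""
    return motifs
```

Entries without a PA line, such as profile entries, are skipped in this sketch because they have no pattern to convert.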

Output file format

The output files are held in the prosite subdirectory of the EMBOSS data directory. The default names are "prosite.lines" and "PS*****" (documentation files named after the accession numbers).
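Judging from the sample in the Usage section, "prosite.lines" consists of four-line records separated by blank lines: the identity and accession number, a short label, the pattern, and the regular expression. The Python sketch below reads such records. The record layout is inferred from that sample only, and the names MotifRecord and parse_prosite_lines are hypothetical.

```python
from typing import NamedTuple


class MotifRecord(NamedTuple):
    identifier: str   # e.g. ASN_GLYCOSYLATION
    accession: str    # e.g. PS00001
    label: str        # second line of the record, as written
    pattern: str      # PROSITE pattern
    regex: str        # regular expression form of the pattern


def parse_prosite_lines(text):
    """Parse blank-line-separated four-line records into MotifRecord objects."""
    records = []
    block = []
    # The extra empty string flushes the final record.
    for line in text.splitlines() + [""]:
        line = line.strip()
        if line:
            block.append(line)
            continue
        if len(block) == 4:
            identifier, accession = block[0].split()
            records.append(MotifRecord(identifier, accession, *block[1:]))
        block = []  # blocks of any other length are ignored
    return records
```

Blocks that do not have exactly four lines are skipped, which keeps the sketch tolerant of stray text but also means malformed records are dropped silently.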

Data files

See Input file format above.

Notes

This program is most useful when used as a prerequisite for patmatmotifs.

References

  1. Bairoch, A., Bucher, P. (1994) PROSITE: recent developments. Nucleic Acids Research, Vol 22, No. 17, 3583-3589.
  2. Bairoch, A. (1992) PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Research, Vol 20, Supplement, 2013-2018.
  3. Peek, J., O'Reilly, T., Loukides, M. (1997) Unix Power Tools, 2nd Edition.

Warnings

The program will warn the user if the input file is incorrectly formatted.

Diagnostic Error Messages

See Warnings above.

Exit status

The program always exits with status 0.

Known bugs

See also

Program name     Description
aaindexextract   Extract data from AAINDEX
cutgextract      Extract data from CUTG
domainer         Reads protein coordinate files and writes domains coordinate files
funky            Reads clean coordinate files and writes file of protein-heterogen contact data
groups           Removes redundant hits from a scop families file
hetparse         Converts raw dictionary of heterogen groups to a file in embl-like format
nrscope          Converts redundant EMBL-format SCOP file to non-redundant one
pdbparse         Parses pdb files and writes cleaned-up protein coordinate files
pdbtosp          Convert raw swissprot:pdb equivalence file to embl-like format
printsextract    Extract data from PRINTS
rebaseextract    Extract data from REBASE
scope            Convert raw scop classification file to embl-like format
scopnr           Removes redundant domains from a scop classification file
scopparse        Converts raw scop classification files to a file in embl-like format
scopseqs         Adds pdb and swissprot sequence records to a scop classification file
tfextract        Extract data from TRANSFAC

Author(s)

This application was written by Sinead O'Leary (soleary@hgmp.mrc.ac.uk)

History

Completed March 24 1999.

Target users

This program is intended to be used by administrators responsible for software and database installation and maintenance.

Comments