scopalign

 

Function

Generate alignments for families in a scop classification file by using STAMP

Description

scopalign parses a SCOP classification file in EMBL-like format generated by the EMBOSS applications scope or nrscope, and domain coordinate files generated by the EMBOSS application domainer, and calls stamp to generate structural alignments for each SCOP family in turn.

VERY IMPORTANT NOTE

scopalign will only run with a version of stamp that has been modified to accept PDB ID codes longer than 4 characters. This involves a trivial change to the stamp module getdomain.c (around line 155): a 4 must be changed to a 7. The original line
temp=getfile(domain[0].id,dirfile,4,OUTPUT);
becomes
temp=getfile(domain[0].id,dirfile,7,OUTPUT);

The modified code is kept on the HGMP file system in /packages/stamp/src2. WHEN RUNNING SCOPALIGN AT THE HGMP IT IS ESSENTIAL THAT THE COMMAND 'use stamp2' (which runs the script /packages/menu/USE/stamp2) IS GIVEN BEFORE SCOPALIGN IS RUN. This ensures that the modified version of stamp is used.

Usage

Here is a sample session with scopalign:

% scopalign

Command line arguments

   Mandatory qualifiers:
  [-scopf]             infile     Name of scop classification file (embl
                                  format input)
  [-path]              string     Location of scop structure-based sequence
                                  alignment files (output)
  [-extn]              string     Extension of scop structure-based sequence
                                  alignment files (output)
  [-pathc]             string     Location of scop structure alignment files
                                  (output)
  [-extnc]             string     Extension of scop structure alignment files
                                  (output)

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Qualifier                Description                                 Allowed values          Default

Mandatory qualifiers
[-scopf] (Parameter 1)   Name of scop classification file (embl      Input file              Escop.dat
                         format input)
[-path] (Parameter 2)    Location of scop structure-based sequence   Any string is accepted  ./
                         alignment files (output)
[-extn] (Parameter 3)    Extension of scop structure-based sequence  Any string is accepted  .salign
                         alignment files (output)
[-pathc] (Parameter 4)   Location of scop structure alignment files  Any string is accepted  ./
                         (output)
[-extnc] (Parameter 5)   Extension of scop structure alignment       Any string is accepted  .palign
                         files (output)

Optional qualifiers
(none)

Advanced qualifiers
(none)
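
Because the sample session above shows no arguments, the following Python sketch illustrates how the five mandatory qualifiers might be supplied explicitly from a script. It is only an illustration: the helper names scopalign_command and run_scopalign are hypothetical, the default values are simply those listed in the qualifier table, and it assumes that scopalign is on the PATH and accepts the usual EMBOSS "-qualifier value" form.

```python
import shutil
import subprocess


def scopalign_command(scopf="Escop.dat", path="./", extn=".salign",
                      pathc="./", extnc=".palign"):
    """Build the argument list for a scopalign run.

    The defaults mirror the values listed in the qualifier table.
    """
    return ["scopalign",
            "-scopf", scopf,
            "-path", path,
            "-extn", extn,
            "-pathc", pathc,
            "-extnc", extnc]


def run_scopalign(**qualifiers):
    """Run scopalign if it is installed; otherwise report the command."""
    cmd = scopalign_command(**qualifiers)
    if shutil.which(cmd[0]) is None:
        # Assumption: the program is found through PATH.
        print("scopalign not found; would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)
```

Calling run_scopalign(path="./salign/", pathc="./palign/") would write the two kinds of alignment file to separate directories. At the HGMP the modified stamp must still be selected beforehand, as described in the note at the top of this page.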

Input file format

scopalign parses a SCOP classification file in EMBL-like format generated by the EMBOSS applications scope or nrscope, and domain coordinate files generated by the EMBOSS application domainer.

Output file format

The output files are named after the families given in the SCOP classification records. If a file of that name already exists, a suffix "_1", "_2", etc. is appended as appropriate.
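
The naming rule can be sketched in Python as follows. This is an illustration, not the scopalign source: the function name unique_output_name is hypothetical, and placing the numeric suffix between the family name and the file extension is an assumption, since the text does not say exactly where the suffix goes.

```python
import os


def unique_output_name(directory, family, extension):
    """Return a path for `family` that does not clash with an existing file.

    Assumption: the suffix "_1", "_2", ... is inserted between the
    family name and the extension.
    """
    candidate = os.path.join(directory, family + extension)
    counter = 0
    while os.path.exists(candidate):
        counter += 1
        candidate = os.path.join(
            directory, "{}_{}{}".format(family, counter, extension))
    return candidate
```

With an empty output directory, the family "Globins" and the extension ".salign" give "Globins.salign"; once that file exists, the next call returns "Globins_1.salign".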

The format of the scopalign output file (Figure 1) is similar to the output file generated by stamp when issued with the following three types of command:

(1) stamp -l ./stamps_file.dom -s -n 2 -slide 5 -prefix ./stamps_file -d ./stamps_file.set;sorttrans -f ./stamps_file.scan -s Sc 2.5 > ./stamps_file.sort;stamp -l ./stamps_file.sort -prefix ./stamps_file > ./stamps_file.log

(2) poststamp -f ./stamps_file.3 -min 0.5

(3) ver2hor -f ./stamps_file.3.post > ./stamps_file.out

However, the SCOP classification records for the appropriate family are written above the alignment, no dssp assignments are given, and only the 'Post similar' line is given. Also, the 7-character domain identifier codes taken from the scop classification file are used.

Figure 1 Example of scopalign output file

CL   All alpha proteins
XX
FO   Globin-like
XX
SF   Globin-like
XX
FA   Globins
XX
Number               10        20        30        40        50    
d1vrea_              LSAAQRQVVASTWKDIAgsdngAGVGKECFTKFLSAHHDMAAV f gFS
d3sdhb_      svydaaaqLTADVKKDLRDSWKVIG sd kKGNGVALMTTLFADNQETIGYfkrlGN
d3hbia_      svydaaaqLTADVKKDLRDSWKVIG sd kKGNGVALMTTLFADNQETIGYfkrlGN
d3sdha_      svydaaaqLTADVKKDLRDSWKVIG sd kKGNGVALMTTLFADNQETIGYfkrlGN
Post_similar --------11111111111111111-00-1111111111111111111111-0-111

Number        60        70        80        90       100       110 
d1vrea_      GAS   dpGVADLGAKVLAQIGVAVSHLgDEGKMVAEMKAVGVRHKgygnkhIKAEY
d3sdhb_      VSQgmandKLRGHSITLMYALQNFIDQLdNPDDLVCVVEKFAVNHI  t rkISAAE
d3hbia_      VSQgmandKLRGHSITLMYALQNFIDQLdNPDDLVCVVEKLAVNHI  t rkISAAE
d3sdha_      VSQgmandKLRGHSITLMYALQNFIDQLdNPDDLVCVVEKFAVNHI  t rkISAAE
Post_similar 111---0011111111111111111111011111111111111111--0-0011111

Number          120       130       140       150       160
d1vrea_      FEPlGASL LSAMEhriggkMNAAAKDAWAAAYADisgalisglqs
d3sdhb_      FGK INGPiKKVLA s k nFGDKYANAWAKLVAVvqa al     
d3hbia_      FGK INGPiKKVLA s k nFGDKYANAWAKLVAVvqa al     
d3sdha_      FGK INGPiKKVLA s k nFGDKYANAWAKLVAVvqa al     
Post_similar 111-1111-11111-0-0-1111111111111111100-00-----
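
A file laid out like Figure 1 could be read with a short Python routine such as the one below. It is a sketch based only on the layout visible in the figure, not a definitive parser: the function name parse_scopalign is hypothetical, and it assumes that every alignment block begins with a "Number" line, ends with a "Post_similar" line, and that the Post_similar scores contain no internal spaces, so that their position marks the column where the aligned residues start.

```python
def parse_scopalign(text):
    """Parse scopalign-style output into (records, alignment).

    records   : dict mapping the codes CL, FO, SF, FA to their text
    alignment : dict mapping each row label (domain identifier or
                'Post_similar') to its concatenated aligned string
    """
    records = {}
    alignment = {}
    block = []  # rows collected for the current alignment block

    for line in text.splitlines():
        if not line.strip() or line.startswith("XX"):
            continue  # blank lines and XX separators carry no data
        if line[:2] in ("CL", "FO", "SF", "FA") and line[2:5] == "   ":
            records[line[:2]] = line[5:].strip()
            continue
        if line.startswith("Number"):
            block = []  # the ruler line opens a new block
            continue
        block.append(line)
        if line.startswith("Post_similar"):
            parts = line.split()
            if len(parts) < 2:
                block = []
                continue
            scores = parts[1]
            # Column at which the aligned characters begin.
            start = line.index(scores, len("Post_similar"))
            for row in block:
                label = row[:start].strip()
                # Spaces inside a row are alignment gaps, so slice by
                # column and pad rows whose trailing gaps were trimmed.
                cells = row[start:start + len(scores)].ljust(len(scores))
                alignment[label] = alignment.get(label, "") + cells
            block = []
    return records, alignment
```

Slicing by column rather than splitting on whitespace matters here because the spaces within a sequence row represent gaps in the alignment.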

Data files

None

Notes

See the VERY IMPORTANT NOTE at the top of this page: scopalign requires the modified version of stamp described there.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name  Description
contacts      Reads coordinate files and writes files of intra-chain residue-residue contact data
dichet        Parse dictionary of heterogen groups
hmmgen        Generates a hidden Markov model for each alignment in a directory
interface     Reads coordinate files and writes files of inter-chain residue-residue contact data
profgen       Generates various profiles for each alignment in a directory
psiblasts     Runs PSI-BLAST given scopalign alignments
scoprep       Reorder scop classification file so that the representative structure of each family is given first
scopreso      Removes low resolution domains from a scop classification file
seqalign      Generate extended alignments for families in a scop families file by using CLUSTALW with seed alignments
seqsearch     Generate files of hits for families in a scop classification file by using PSI-BLAST with seed alignments
seqsort       Reads multiple files of hits and writes a non-ambiguous file of hits (scop families file) plus a validation file
seqwords      Generate file of hits for scop families by searching swissprot with keywords
siggen        Generates a sparse protein signature from an alignment and residue contact data
sigscan       Scans a signature against swissprot and writes a signature hits file

Author(s)

This application was written by Jon Ison (jison@hgmp.mrc.ac.uk)

History

Written (May 2001) - Jon Ison

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments