scopseqs

 

Function

Adds pdb and swissprot sequence records to a scop classification file

Description

This is part of Jon Ison's protein structure analysis package.

This package is still being developed.

Please ignore this program until further details can be documented.

All further queries should go to Jon Ison.

Usage

Here is a sample session with scopseqs:

% scopseqs

Command line arguments

   Mandatory qualifiers:
  [-scopin]            infile     Name of scop file for input (embl-like
                                  format)
  [-pdbtosp]           infile     Name of the pdbcodes to swissprot indexing
                                  (embl-like format)
  [-dpdb]              string     Location of clean domain coordinate files
                                  for input (embl-like format)
  [-extn]              string     File extension of clean domain coordinate
                                  files
  [-scopout]           outfile    Name of processed file for output (embl-like
                                  format)
  [-errf]              outfile    Name of log file for the build

   Optional qualifiers: (none)
   Advanced qualifiers:
   -datafile           matrixf    This is the scoring matrix file used when
                                  comparing sequences.
   -gapopen            float      The gap insertion penalty is the score taken
                                  away when a gap is created. The best value
                                  depends on the choice of comparison matrix.
                                  The default value assumes you are using the
                                  EBLOSUM62 matrix for protein sequences, and
                                  the EDNAFULL matrix for nucleotide
                                  sequences.
   -gapextend          float      The gap extension penalty is added to the
                                  standard gap penalty for each base or
                                  residue in the gap. This is how long gaps
                                  are penalized. Usually you will expect a few
                                  long gaps rather than many short gaps, so
                                  the gap extension penalty should be lower
                                  than the gap penalty. An exception is where
                                  one or both sequences are single reads with
                                  possible sequencing errors in which case you
                                  would expect many single base gaps. You can
                                  get this result by setting the gap open
                                  penalty to zero (or very low) and using the
                                  gap extension penalty to control gap
                                  scoring.

   General qualifiers:
  -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


   Mandatory qualifiers:
  [-scopin]  (Parameter 1)
      Description:     Name of scop file for input (embl-like format)
      Allowed values:  Input file
      Default:         Escop.dat
  [-pdbtosp]  (Parameter 2)
      Description:     Name of the pdbcodes to swissprot indexing (embl-like
                       format)
      Allowed values:  Input file
      Default:         Epdbtosp.dat
  [-dpdb]  (Parameter 3)
      Description:     Location of clean domain coordinate files for input
                       (embl-like format)
      Allowed values:  Any string is accepted
      Default:         ./
  [-extn]  (Parameter 4)
      Description:     File extension of clean domain coordinate files
      Allowed values:  Any string is accepted
      Default:         .pxyz
  [-scopout]  (Parameter 5)
      Description:     Name of processed file for output (embl-like format)
      Allowed values:  Output file
      Default:         Escop.dat.out
  [-errf]  (Parameter 6)
      Description:     Name of log file for the build
      Allowed values:  Output file
      Default:         scopseqs.log

   Optional qualifiers: (none)

   Advanced qualifiers:
   -datafile
      Description:     This is the scoring matrix file used when comparing
                       sequences.
      Allowed values:  Comparison matrix file in EMBOSS data path
      Default:         EBLOSUM62
   -gapopen
      Description:     The gap insertion penalty is the score taken away when
                       a gap is created. The best value depends on the choice
                       of comparison matrix. The default value assumes you are
                       using the EBLOSUM62 matrix for protein sequences, and
                       the EDNAFULL matrix for nucleotide sequences.
      Allowed values:  Floating point number from 1.0 to 100.0
      Default:         10.0 for any sequence
   -gapextend
      Description:     The gap extension penalty is added to the standard gap
                       penalty for each base or residue in the gap. This is
                       how long gaps are penalized. Usually you will expect a
                       few long gaps rather than many short gaps, so the gap
                       extension penalty should be lower than the gap penalty.
                       An exception is where one or both sequences are single
                       reads with possible sequencing errors in which case you
                       would expect many single base gaps. You can get this
                       result by setting the gap open penalty to zero (or very
                       low) and using the gap extension penalty to control gap
                       scoring.
      Allowed values:  Floating point number from 0.0 to 10.0
      Default:         0.5 for any sequence
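
The defaults and allowed ranges listed above can be checked before a run is launched. The following Python sketch is not part of EMBOSS. The helper build_scopseqs_command and the two dictionaries are hypothetical names, and the sketch assumes that every qualifier is passed in the usual "-name value" form. It only builds the argument list and does not run the program.

```python
# Defaults and numeric ranges copied from the qualifier listing above.
DEFAULTS = {
    "scopin": "Escop.dat",
    "pdbtosp": "Epdbtosp.dat",
    "dpdb": "./",
    "extn": ".pxyz",
    "scopout": "Escop.dat.out",
    "errf": "scopseqs.log",
    "datafile": "EBLOSUM62",
    "gapopen": 10.0,
    "gapextend": 0.5,
}
RANGES = {"gapopen": (1.0, 100.0), "gapextend": (0.0, 10.0)}


def build_scopseqs_command(**overrides):
    """Return an argument list for scopseqs (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown qualifiers: {sorted(unknown)}")
    values = {**DEFAULTS, **overrides}
    # Reject numeric values outside the documented ranges.
    for name, (low, high) in RANGES.items():
        if not low <= float(values[name]) <= high:
            raise ValueError(f"-{name} must be between {low} and {high}")
    argv = ["scopseqs"]
    for name, value in values.items():
        argv += [f"-{name}", str(value)]
    return argv


print(" ".join(build_scopseqs_command(gapopen=12.0)))
```

If scopseqs is installed, the returned list could be handed to subprocess.run. Note that the listed range for -gapopen starts at 1.0, so this check rejects a gap open penalty of zero.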

Input file format

Output file format

Data files

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name     Description
aaindexextract   Extract data from AAINDEX
cutgextract      Extract data from CUTG
domainer         Reads protein coordinate files and writes domains coordinate files
funky            Reads clean coordinate files and writes file of protein-heterogen contact data
groups           Removes redundant hits from a scop families file
hetparse         Converts raw dictionary of heterogen groups to a file in embl-like format
nrscope          Converts redundant EMBL-format SCOP file to non-redundant one
pdbparse         Parses pdb files and writes cleaned-up protein coordinate files
pdbtosp          Convert raw swissprot:pdb equivalence file to embl-like format
printsextract    Extract data from PRINTS
prosextract      Builds the PROSITE motif database for patmatmotifs to search
rebaseextract    Extract data from REBASE
scope            Convert raw scop classification file to embl-like format
scopnr           Removes redundant domains from a scop classification file
scopparse        Converts raw scop classification files to a file in embl-like format
tfextract        Extract data from TRANSFAC

Author(s)

This application was written by Jon Ison (jison@hgmp.mrc.ac.uk)

History

Written (date) - author.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments