tfextract
TRANSFAC started in 1988 as a printed compilation (Nucleic Acids Res. 16: 1879-1902, 1988) and was transferred into computer-readable format in 1990 (BioTechForum - Advances in Molecular Genetics (J. Collins, A.J. Driesel, eds.) 4:95-108, 1991). The basic structures of Tables 1 and 2 of the compilation were taken as the core of the emergent database. The aim of the early compilation as well as of the TRANSFAC database is
The program tfextract extracts data from the TRANSFAC database file site.dat. This file contains information on individual (putatively) regulatory protein binding sites. About half of these refer to sites within eukaryotic genes. Just under half resulted from mutagenesis studies, from in vitro selection procedures starting from random oligonucleotide mixtures, or from specific theoretical considerations. Finally, about 5% are consensus binding sequences given in IUPAC code, many of them taken from the compilation of Faisst and Meyer (Nucleic Acids Res. 20:3-26, 1992). A number of consensus sequences have been generated by the TRANSFAC team, generally derived from the profiles stored in the MATRIX table.
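To illustrate what a consensus binding sequence in IUPAC code means in practice, here is a minimal Python sketch that expands the standard IUPAC nucleotide ambiguity codes into a regular expression and reports where a consensus occurs in a sequence. The function names, the example consensus and the example sequence are illustrative only; they are not taken from site.dat, and this is not how tfextract or tfscan are implemented.

```python
import re

# Standard IUPAC nucleotide ambiguity codes and the bases each one covers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]  # raises KeyError on a non-IUPAC character
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_sites(consensus, sequence):
    """Return 0-based start positions of every (possibly overlapping) match."""
    # A lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Illustrative consensus and sequence, not real database entries.
    print(consensus_to_regex("TGASTCA"))
    print(find_sites("TGASTCA", "ccTGACTCAggTGAGTCAtt"))
```

The sketch only scans the given strand; a real search would normally also check the reverse complement.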
The data is split up by taxonomic group (fungi, insect, plant, vertebrate and other) and placed in individual files (tffungi, tfinsect, tfplant, tfvertebrate and tfother).
These files are stored in the EMBOSS data directory, see Data Files below.
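The splitting step can be pictured with a short Python sketch. It assumes the usual TRANSFAC flat-file layout (a two-letter line code at the start of each line, records terminated by `//`, `AC` for the accession and `OC` for the organism classification). The keyword table that maps a classification to an output file is hypothetical, as are the sample records; the rules tfextract actually applies may differ.

```python
from collections import defaultdict

# Hypothetical keyword -> output file mapping (an assumption for this sketch).
GROUP_KEYWORDS = [
    ("fungi", "tffungi"),
    ("insecta", "tfinsect"),
    ("viridiplantae", "tfplant"),
    ("vertebrata", "tfvertebrate"),
]
DEFAULT_GROUP = "tfother"


def parse_records(lines):
    """Yield one dict per record, mapping line code -> list of values."""
    record = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):  # end of the current record
            if record:
                yield dict(record)
            record = defaultdict(list)
            continue
        if not line[:2].strip():  # blank or indented line, ignore
            continue
        record[line[:2]].append(line[2:].strip())
    if record:  # last record without a closing //
        yield dict(record)


def classify(record):
    """Pick an output file name from the record's OC line(s)."""
    taxonomy = " ".join(record.get("OC", [])).lower()
    for keyword, group in GROUP_KEYWORDS:
        if keyword in taxonomy:
            return group
    return DEFAULT_GROUP


def split_by_group(lines):
    """Return {output file name: [accession, ...]} for all records with an AC."""
    groups = defaultdict(list)
    for record in parse_records(lines):
        if "AC" not in record:  # e.g. a file header block
            continue
        groups[classify(record)].append(record["AC"][0])
    return dict(groups)


# Made-up records for demonstration only.
SAMPLE = """\
AC  X00001
OC  eukaryota; animalia; metazoa; chordata; vertebrata; mammalia
//
AC  X00002
OC  eukaryota; fungi; ascomycota
//
AC  X00003
OC  viridae; dsDNA viruses
//
""".splitlines()

if __name__ == "__main__":
    for group, accessions in sorted(split_by_group(SAMPLE).items()):
        print(group, accessions)
```

Records whose classification matches none of the keywords fall through to tfother, which mirrors the role of the "other" file in the list above.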
% tfextract
Extract data from TRANSFAC
Full pathname of transfac SITE.DAT: /data/transfac/site.dat
   Mandatory qualifiers:
  [-inf]               infile     Full pathname of transfac SITE.DAT

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                boolean    Report command line options. More information on associated and general qualifiers can be found with -help -verbose
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Mandatory qualifiers | | | |
[-inf] (Parameter 1) | Full pathname of transfac SITE.DAT | Input file | Required |
Optional qualifiers | | | |
(none) | | | |
Advanced qualifiers | | | |
(none) | | | |
The TRANSFAC file site.dat is available from:
http://transfac.gbf.de/cgi-bin/download/download.pl
The files written by tfextract are used by the tfscan program to search for TRANSFAC sites in sequences.
% ls -1s emboss/data/tf*
 18 emboss/data/tffungi
 17 emboss/data/tfinsect
 56 emboss/data/tfother
  4 emboss/data/tfplant
112 emboss/data/tfvertebrate
EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.
To see the available EMBOSS data files, run:
% embossdata -showall
To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:
% embossdata -fetch -file Exxx.dat
Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory or, again, in a subdirectory called ".embossdata".
The directories are searched in the following order: the current directory, ".embossdata" under the current directory, the user's home directory, and ".embossdata" under the home directory.
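The lookup described above can be sketched in Python as follows. This is an illustration, not EMBOSS code: the function names are invented, the four user locations are tried in the order in which the text names them, and falling back to the directory given by EMBOSS_DATA last is an assumption of this sketch.

```python
import os
from pathlib import Path


def candidate_dirs(cwd=None, home=None, env=None):
    """List the directories to try, most specific first."""
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()
    env = os.environ if env is None else env
    dirs = [
        cwd,                   # project-specific files
        cwd / ".embossdata",   # tidier project-specific location
        home,                  # files for all runs
        home / ".embossdata",  # tidier per-user location
    ]
    # Assumption: the standard data directory is consulted last.
    emboss_data = env.get("EMBOSS_DATA")
    if emboss_data:
        dirs.append(Path(emboss_data))
    return dirs


def find_data_file(name, cwd=None, home=None, env=None):
    """Return the first existing copy of a data file, or None if absent."""
    for directory in candidate_dirs(cwd, home, env):
        path = directory / name
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    # Looks for one of the tfextract output files named in this document.
    print(find_data_file("tfvertebrate"))
```

With this ordering, a copy of a data file fetched into the current directory and edited there would take precedence over the distributed copy.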
Program name | Description |
---|---|
aaindexextract | Extract data from AAINDEX |
cutgextract | Extract data from CUTG |
domainer | Reads protein coordinate files and writes domains coordinate files |
funky | Reads clean coordinate files and writes file of protein-heterogen contact data |
groups | Removes redundant hits from a scop families file |
hetparse | Converts raw dictionary of heterogen groups to a file in embl-like format |
nrscope | Converts redundant EMBL-format SCOP file to non-redundant one |
pdbparse | Parses pdb files and writes cleaned-up protein coordinate files |
pdbtosp | Convert raw swissprot:pdb equivalence file to embl-like format |
printsextract | Extract data from PRINTS |
prosextract | Builds the PROSITE motif database for patmatmotifs to search |
rebaseextract | Extract data from REBASE |
scope | Convert raw scop classification file to embl-like format |
scopnr | Removes redundant domains from a scop classification file |
scopparse | Converts raw scop classification files to a file in embl-like format |
scopseqs | Adds pdb and swissprot sequence records to a scop classification file |