pepstats
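DayhoffStat is the amino acid's molar percentage divided by its Dayhoff statistic. The Dayhoff statistic is the amino acid's relative occurrence per 1000 aa, normalised to 100 (by rls@ebi.ac.uk; original work from 1993). In the sample output for LACI_ECOLI, for example, Ala has a Mole% of 12.222 and a DayhoffStat of 1.421, which corresponds to a Dayhoff statistic of about 8.6.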
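The sample output for LACI_ECOLI is consistent with DayhoffStat = Mole% / Dayhoff statistic (for Ala, 12.222 / 8.6 ≈ 1.421). The following minimal sketch illustrates that arithmetic. The three Dayhoff values in `DAYHOFF` are back-calculated from that sample output and are assumptions for illustration, not values read from an EMBOSS data file; the function names are hypothetical.

```python
# Dayhoff statistics (relative occurrence normalised to 100).
# ASSUMPTION: back-calculated from the LACI_ECOLI sample output,
# not taken from an EMBOSS data file.
DAYHOFF = {"A": 8.6, "C": 2.9, "Q": 3.9}


def mole_percent(count, total):
    """Percentage of residues in the sequence that are this amino acid."""
    return 100.0 * count / total


def dayhoff_stat(count, total, dayhoff_value):
    """Mole% divided by the amino acid's Dayhoff statistic."""
    return mole_percent(count, total) / dayhoff_value


if __name__ == "__main__":
    total_residues = 360  # length of LACI_ECOLI in the sample output
    for residue, count in (("A", 44), ("C", 3), ("Q", 28)):
        stat = dayhoff_stat(count, total_residues, DAYHOFF[residue])
        print(f"{residue}: mole% = {mole_percent(count, total_residues):.3f}, "
              f"DayhoffStat = {stat:.3f}")
```

A DayhoffStat above 1 means the residue is more frequent in this sequence than the Dayhoff reference frequency; below 1 means it is less frequent.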
The probability of expression in inclusion bodies is sometimes referred to as a type of solubility measure. A recombinant protein expressed in Escherichia coli can, however, end up either soluble in the cytosol or insoluble in inclusion bodies. If the Harrison model predicts that a given protein will probably be expressed in inclusion bodies, this does not mean that it cannot be obtained in soluble form in the cytosol. One example: Thermotoga maritima cell division protein FtsA with a C-terminal His-tag has a 58% Harrison probability of being expressed in inclusion bodies, yet there was plenty of soluble protein in the E. coli cytosol (F. van den Ent and J. Löwe, EMBO J. 19, 5300-5307, 2000). Whether a protein is expressed in inclusion bodies depends not only on its sequence but also on many other factors, such as the E. coli strain, incubation temperature, type of expression vector, promoter strength and growth medium.
    % pepstats
    Protein statistics
    Input sequence: sw:laci_ecoli
    Output file [laci_ecoli.pepstats]:
    Mandatory qualifiers:
      [-sequencea]   sequence   Sequence USA
       -outfile      outfile    Output file name

    Optional qualifiers: (none)

    Advanced qualifiers:
       -[no]termini  boolean    Include charge at N and C terminus
       -aadata       string     Molecular weight data for amino acids

    General qualifiers:
       -help         boolean    Report command line options. More
                                information on associated and general
                                qualifiers can be found with -help -verbose
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Mandatory qualifiers | | | |
[-sequencea] (Parameter 1) | Sequence USA | Readable sequence | Required |
-outfile | Output file name | Output file | <sequence>.pepstats |
Optional qualifiers | | | |
(none) | | | |
Advanced qualifiers | | | |
-[no]termini | Include charge at N and C terminus | Yes/No | Yes |
-aadata | Molecular weight data for amino acids | Any string is accepted | Eamino.dat |
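As a minimal sketch of how the qualifiers above combine on a command line, the hypothetical helper below assembles an argument list from them. The qualifier names are taken as they appear in the table (including `-sequencea`); other EMBOSS releases may name them differently, so check `pepstats -help` on your installation. The helper only builds the list and does not run anything.

```python
import shlex


def build_pepstats_cmd(sequence, outfile, termini=True, aadata=None):
    """Assemble a pepstats argument list from the documented qualifiers.

    build_pepstats_cmd is a hypothetical helper, not part of EMBOSS.
    """
    cmd = ["pepstats", "-sequencea", sequence, "-outfile", outfile]
    if not termini:
        # -[no]termini is a boolean qualifier: the "no" prefix switches it off.
        cmd.append("-notermini")
    if aadata is not None:
        # Alternative molecular weight data file (default: Eamino.dat).
        cmd += ["-aadata", aadata]
    return cmd


if __name__ == "__main__":
    cmd = build_pepstats_cmd("sw:laci_ecoli", "laci_ecoli.pepstats")
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
```

If pepstats is installed, the returned list could be passed to `subprocess.run(cmd, check=True)`; that step is left out here because it depends on the local installation and its configured databases.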
    PEPSTATS of LACI_ECOLI from 1 to 360

    Molecular weight = 38563.97        Residues = 360
    Average Residue Weight = 107.122   Charge = 1.5
    Isoelectric Point = 6.8820
    Improbability of expression in inclusion bodies = 0.670

    Residue     Number    Mole%     DayhoffStat
    A = Ala     44        12.222    1.421
    B = Asx     0         0.000     0.000
    C = Cys     3         0.833     0.287
    D = Asp     17        4.722     0.859
    E = Glu     15        4.167     0.694
    F = Phe     4         1.111     0.309
    G = Gly     22        6.111     0.728
    H = His     7         1.944     0.972
    I = Ile     18        5.000     1.111
    K = Lys     11        3.056     0.463
    L = Leu     40        11.111    1.502
    M = Met     10        2.778     1.634
    N = Asn     12        3.333     0.775
    P = Pro     14        3.889     0.748
    Q = Gln     28        7.778     1.994
    R = Arg     19        5.278     1.077
    S = Ser     33        9.167     1.310
    T = Thr     19        5.278     0.865
    V = Val     34        9.444     1.431
    W = Trp     2         0.556     0.427
    X = Xaa     0         0.000     0.000
    Y = Tyr     8         2.222     0.654
    Z = Glx     0         0.000     0.000

    Property    Residues                   Number    Mole%
    Tiny        (A+C+G+S+T)                121       33.611
    Small       (A+B+C+D+G+N+P+S+T+V)      198       55.000
    Aliphatic   (I+L+V)                    92        25.556
    Aromatic    (F+H+W+Y)                  21        5.833
    Non-polar   (A+C+F+G+I+L+M+P+V+W+Y)    199       55.278
    Polar       (D+E+H+K+N+Q+R+S+T+Z)      161       44.722
    Charged     (B+D+E+H+K+R+Z)            69        19.167
    Basic       (H+K+R)                    37        10.278
    Acidic      (B+D+E+Z)                  32        8.889
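The property rows in the sample output above are sums over the per-residue counts. The sketch below recomputes them from the LACI_ECOLI counts and the residue groups printed in the output; it illustrates the arithmetic only and is not EMBOSS's own code. The names `COUNTS`, `PROPERTIES` and `property_table` are hypothetical.

```python
# Residue counts copied from the LACI_ECOLI sample output.
COUNTS = {
    "A": 44, "B": 0, "C": 3, "D": 17, "E": 15, "F": 4, "G": 22, "H": 7,
    "I": 18, "K": 11, "L": 40, "M": 10, "N": 12, "P": 14, "Q": 28, "R": 19,
    "S": 33, "T": 19, "V": 34, "W": 2, "X": 0, "Y": 8, "Z": 0,
}

# Residue groups as printed in the "Property" section of the output.
PROPERTIES = {
    "Tiny": "ACGST",
    "Small": "ABCDGNPSTV",
    "Aliphatic": "ILV",
    "Aromatic": "FHWY",
    "Non-polar": "ACFGILMPVWY",
    "Polar": "DEHKNQRSTZ",
    "Charged": "BDEHKRZ",
    "Basic": "HKR",
    "Acidic": "BDEZ",
}


def property_table(counts, properties):
    """Return {property: (number of residues, mole%)} for each group."""
    total = sum(counts.values())
    table = {}
    for name, residues in properties.items():
        number = sum(counts.get(residue, 0) for residue in residues)
        table[name] = (number, 100.0 * number / total)
    return table


if __name__ == "__main__":
    for name, (number, mole) in property_table(COUNTS, PROPERTIES).items():
        print(f"{name:<10} {number:>4} {mole:8.3f}")
```

The same counts also reproduce the header line: 38563.97 / 360 residues gives the reported average residue weight of 107.122.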
EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.
Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".
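The directories are searched in the following order: the current directory, ".embossdata" under the current directory, the user's home directory, then ".embossdata" under the home directory.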
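The lookup can be pictured with the short sketch below. It assumes the directories are tried in the order the text describes them (current directory, its ".embossdata" subdirectory, home directory, its ".embossdata" subdirectory) and that the standard EMBOSS data directory is the final fallback; both points are assumptions made for illustration. The function names are hypothetical and this is not how EMBOSS itself is implemented.

```python
from pathlib import Path


def candidate_dirs(cwd, home, emboss_data=None):
    """Directories to try, in the ASSUMED search order."""
    cwd, home = Path(cwd), Path(home)
    dirs = [cwd, cwd / ".embossdata", home, home / ".embossdata"]
    if emboss_data:
        # ASSUMPTION: the standard data directory is consulted last.
        dirs.append(Path(emboss_data))
    return dirs


def find_data_file(name, cwd, home, emboss_data=None):
    """Return the first existing copy of a data file, or None."""
    for directory in candidate_dirs(cwd, home, emboss_data):
        path = directory / name
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    import os

    found = find_data_file(
        "Eamino.dat",
        cwd=Path.cwd(),
        home=Path.home(),
        emboss_data=os.environ.get("EMBOSS_DATA"),
    )
    print(found if found else "Eamino.dat not found in any searched directory")
```

Under these assumptions, a copy of Eamino.dat in the project directory would take precedence over one in the home directory, which in turn would take precedence over the distributed file.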
Program name | Description |
---|---|
backtranseq | Back translate a protein sequence |
charge | Protein charge plot |
checktrans | Reports STOP codons and ORF statistics of a protein sequence |
compseq | Counts the composition of dimer/trimer/etc words in a sequence |
emowse | Protein identification by mass spectrometry |
freak | Residue/base frequency table or plot |
iep | Calculates the isoelectric point of a protein |
mwcontam | Shows molwts that match across a set of files |
mwfilter | Filter noisy molwts from mass spec output |
octanol | Displays protein hydropathy |
pepinfo | Plots simple amino acid properties in parallel |
pepwindow | Displays protein hydropathy |
pepwindowall | Displays protein hydropathy of a set of sequences |