cpgreport
CpG refers to a C nucleotide immediately followed by a G. The 'p' in 'CpG' refers to the phosphate group linking the two bases.
Detection of regions of genomic sequences that are rich in the CpG pattern is important because such regions are resistant to methylation and tend to be associated with genes which are frequently switched on. Regions rich in the CpG pattern are known as CpG islands.
This program does not find CpG islands as normally defined: "a region of greater than 200 bp with a %GC of greater than 50% and observed/expected CpG > 0.6", where observed/expected CpG is the number of CpG dinucleotides multiplied by the region length, divided by the product of the numbers of C and G bases. Instead of a fixed window, cpgreport builds its score from a running sum: at each position i that is not a CpG the running sum is decremented, and at each CpG it is incremented by the CpG score (the value of -score). Spans scoring above the threshold are searched for recursively.
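The running-sum idea can be illustrated with a minimal sketch. It rests on several assumptions that the description above does not settle: each CG dinucleotide is treated as one unit worth +score, every other base costs 1, the best span is found with a Kadane-style running sum that restarts whenever it falls to zero, and the regions to the left and right of each accepted span are then searched again. The function names (`tokenize`, `best_span`, `cpg_spans`) and the default threshold (equal to the CpG score) are hypothetical, so this is not cpgreport's actual code and its scores need not match the sample output.

```python
from typing import List, Optional, Tuple

# (start, end, weight, is_cpg) with 0-based inclusive positions
Unit = Tuple[int, int, int, bool]
# (begin, end, score, cpg_count) with 1-based inclusive positions, as in the report
Span = Tuple[int, int, int, int]


def tokenize(seq: str, cpg_score: int) -> List[Unit]:
    """Treat each CG dinucleotide as one unit worth +cpg_score; every other
    base is a unit worth -1 (ASSUMED decrement size)."""
    s = seq.upper()
    units: List[Unit] = []
    i = 0
    while i < len(s):
        if s.startswith("CG", i):
            units.append((i, i + 1, cpg_score, True))
            i += 2
        else:
            units.append((i, i, -1, False))
            i += 1
    return units


def best_span(units: List[Unit], lo: int, hi: int) -> Tuple[int, int, int]:
    """Kadane-style running sum over units[lo:hi]; the sum restarts whenever it
    falls to zero or below. Returns (score, first, last); first == -1 if none."""
    best = (0, -1, -1)
    run, start = 0, lo
    for k in range(lo, hi):
        if run <= 0:
            run, start = 0, k
        run += units[k][2]
        if run > best[0]:
            best = (run, start, k)
    return best


def cpg_spans(seq: str, cpg_score: int = 17,
              threshold: Optional[int] = None) -> List[Span]:
    """HYPOTHETICAL re-creation of running-sum CpG scoring, not cpgreport's code."""
    units = tokenize(seq, cpg_score)
    limit = cpg_score if threshold is None else threshold  # ASSUMED default
    found: List[Span] = []

    def search(lo: int, hi: int) -> None:
        score, first, last = best_span(units, lo, hi)
        if first < 0 or score < limit:
            return
        cpgs = sum(1 for u in units[first:last + 1] if u[3])
        found.append((units[first][0] + 1, units[last][1] + 1, score, cpgs))
        search(lo, first)      # look again to the left of the accepted span
        search(last + 1, hi)   # ... and to the right of it

    search(0, len(units))
    return sorted(found)
```

For example, `cpg_spans("ttCGCGaaaCGtt")` returns `[(3, 11, 48, 3)]`: the three CpGs (3 × 17) outweigh the three A bases between them, so they merge into one span. The recursion depth grows with the number of spans found, which is fine for a sketch; an explicit stack would be safer for chromosome-length input.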
% cpgreport embl:rnu68037
Reports CpG rich regions
CpG score [17]:
Output file [rnu68037.cpgreport]:
Mandatory qualifiers:
  [-sequence]   seqall    Sequence database USA
   -score       integer   This sets the score for each CG sequence found. A value of 17 is more sensitive, but 28 has also been used with some success.
  [-outfile]    outfile   Output file name

Optional qualifiers: (none)

Advanced qualifiers:
   -featout     featout   File for output features

General qualifiers:
   -help        boolean   Report command line options. More information on associated and general qualifiers can be found with -help -verbose
| Mandatory qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| [-sequence] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required |
| -score | This sets the score for each CG sequence found. A value of 17 is more sensitive, but 28 has also been used with some success. | Integer from 1 to 200 | 17 |
| [-outfile] (Parameter 2) | Output file name | Output file | <sequence>.cpgreport |
| Optional qualifiers | Description | Allowed values | Default |
| (none) | | | |
| Advanced qualifiers | Description | Allowed values | Default |
| -featout | File for output features | Writeable feature table | unknown.gff |
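When cpgreport is driven from a script, giving the score and output file on the command line should avoid the interactive prompts shown in the sample session. The sketch below is a hypothetical wrapper (`cpgreport_args` and `run_cpgreport` are illustrative names, not part of EMBOSS); the qualifiers it emits (`-score`, `-outfile`, `-featout`) and the 1 to 200 range for `-score` come from the qualifier table, and it assumes the EMBOSS `cpgreport` binary is on `PATH`.

```python
import shutil
import subprocess
from typing import List, Optional


def cpgreport_args(sequence: str, outfile: str, score: int = 17,
                   featout: Optional[str] = None) -> List[str]:
    """Build a cpgreport command line (hypothetical helper)."""
    if not 1 <= score <= 200:
        raise ValueError("-score must be an integer from 1 to 200")
    args = ["cpgreport", sequence, "-score", str(score), "-outfile", outfile]
    if featout is not None:
        args += ["-featout", featout]  # optional feature table output
    return args


def run_cpgreport(sequence: str, outfile: str, score: int = 17,
                  featout: Optional[str] = None) -> None:
    """Run cpgreport, failing early if EMBOSS is not installed."""
    if shutil.which("cpgreport") is None:
        raise FileNotFoundError("cpgreport (EMBOSS) is not on PATH")
    subprocess.run(cpgreport_args(sequence, outfile, score, featout), check=True)
```

For instance, `run_cpgreport("embl:rnu68037", "rnu68037.cpgreport", score=28)` would run the less sensitive setting mentioned for `-score`.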
CPGREPORT of RNU68037 from 1 to 1218

Sequence  Begin   End  Score  CpG   %CG  CG/GC
RNU68037     12    13     17    1 100.0      -
RNU68037     47    48     17    1 100.0      -
RNU68037     96  1032    630   87  66.1   0.65
RNU68037   1072  1100     26    3  62.1   0.00
RNU68037   1139  1140     17    1 100.0      -
RNU68037   1183  1193     26    2  72.7   2.00
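Because the report is plain whitespace-separated text (a title line, a heading line, then one row per region), it is easy to load into a program. A minimal sketch, assuming exactly that layout with seven fields per data row; the names `CpgRegion` and `parse_cpgreport` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class CpgRegion(NamedTuple):
    sequence: str
    begin: int
    end: int
    score: int
    cpg: int
    percent_cg: float
    cg_gc: Optional[float]  # None when the report shows '-'


def parse_cpgreport(text: str) -> List[CpgRegion]:
    """Parse cpgreport output text into CpgRegion rows (ASSUMED layout)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    regions: List[CpgRegion] = []
    for fields in rows[2:]:  # rows[0] is the title, rows[1] the headings
        if len(fields) != 7:
            continue  # skip anything that is not a data row
        name, begin, end, score, cpg, pct, ratio = fields
        regions.append(CpgRegion(name, int(begin), int(end), int(score), int(cpg),
                                 float(pct), None if ratio == "-" else float(ratio)))
    return regions
```

Filtering is then a one-liner, e.g. `[r for r in parse_cpgreport(text) if r.score >= 28]`.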
The first non-blank line of the output file is the title line, giving the program name, the name of the sequence being analysed and the start and end positions of the sequence.
The second non-blank line contains the headings of the columns.
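Subsequent lines contain columns with the following information:

- Sequence: name of the sequence
- Begin: start position of the CpG-rich region
- End: end position of the region
- Score: score of the region
- CpG: number of CpG dinucleotides in the region
- %CG: percentage of G plus C bases in the region
- CG/GC: ratio of the number of CpG to the number of GpC dinucleotides in the region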
If the count of GpC in the region is zero, then the ratio of CG/GC is reported as '-'.
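The per-region columns can be recomputed from the sequence itself. The sketch below assumes that %CG is the percentage of G plus C bases (this agrees with the sample output, e.g. 8 of the 11 bases in 1183 to 1193 gives 72.7) and that only dinucleotides lying wholly inside the region are counted; `region_stats` is a hypothetical name, not an EMBOSS function.

```python
from typing import Tuple


def region_stats(seq: str, begin: int, end: int) -> Tuple[int, str, str]:
    """Return (CpG count, %CG, CG/GC) for seq[begin..end], 1-based inclusive."""
    if not 1 <= begin <= end <= len(seq):
        raise ValueError("region must lie within the sequence")
    region = seq.upper()[begin - 1:end]
    cpg = region.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gpc = region.count("GC")  # likewise for "GC"
    gc_bases = region.count("G") + region.count("C")
    percent = 100.0 * gc_bases / len(region)
    ratio = "-" if gpc == 0 else f"{cpg / gpc:.2f}"  # '-' when there is no GpC
    return cpg, f"{percent:.1f}", ratio
```

A lone CpG, as in the rows at 12 to 13 and 47 to 48, gives `region_stats("CG", 1, 2) == (1, "100.0", "-")`, matching the `-` convention described above.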
Program name | Description |
---|---|
cpgplot | Plot CpG rich areas |
geecee | Calculates the fractional GC content of nucleic acid sequences |
newcpgreport | Report CpG rich areas |
newcpgseek | Reports CpG rich regions |
This program was modified for inclusion in EGCG under the name 'CPGSPANS' by Rodrigo Lopez S. (E-mail: rls@ebi.ac.uk Post: EMBL Outstation Hinxton, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK).
This application was modified for inclusion in EMBOSS by Alan Bleasby (ableasby@hgmp.mrc.ac.uk).