degapseq
degapseq removes gap characters from sequences. In fact, it does more than this: it removes ANY non-alphabetic character from the input sequence, so as well as removing gap characters it also removes such things as the '*' in protein sequences that marks the position of a translated STOP codon.
There are many different formats for storing sequences in files. Some sequence formats allow you to store aligned sequences, including the information on where gaps have been introduced to make the sequences align properly. A gap at a given position is indicated by a special character. Different sequence formats use different gap characters, and some formats use more than one character to distinguish different types of gap (e.g. gaps at the ends of the sequences, internal gaps, gaps introduced by a program or by a person editing the alignment, etc.). Typical gap characters are '.', '-' and '~'.
When EMBOSS programs read in a sequence that contains gap characters, all of them are changed internally to '-', i.e. EMBOSS has only one type of gap character. Any distinction between gap types made by the original format is therefore lost.
degapseq removes any non-alphabetic character in the sequence; in effect, this means that gaps and '*' characters are removed. The sequence is then written out.
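The gap normalisation described above can be illustrated with a minimal Python sketch. This is not EMBOSS code: the helper name `normalize_gaps` and the `GAP_CHARS` set are hypothetical, and the set is assumed to be just the three characters mentioned in the text.

```python
# Illustrative sketch only; EMBOSS performs this step internally in C.
# GAP_CHARS is an assumption based on the characters listed in the text.
GAP_CHARS = ".~-"


def normalize_gaps(seq: str, gap_chars: str = GAP_CHARS) -> str:
    """Return seq with every character in gap_chars replaced by '-'."""
    table = str.maketrans({c: "-" for c in gap_chars})
    return seq.translate(table)


print(normalize_gaps("ATG....CTG~~~~A-C"))  # ATG----CTG----A-C
```

After this step the different gap styles can no longer be told apart, which mirrors the single gap type described above.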
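As a rough illustration of the rule "keep only alphabetic characters", the following Python sketch applies it to a single sequence string. The function name `degap` is hypothetical, and restricting the test to ASCII letters is an assumption; this is not how degapseq itself is implemented.

```python
def degap(seq: str) -> str:
    """Keep only ASCII letters; gaps, '*' and any other symbols are dropped."""
    # Assumption: "alphabetic" means A-Z and a-z only.
    return "".join(c for c in seq if c.isascii() and c.isalpha())


print(degap("MKV--LA*"))          # MKVLA
print(degap("ATG....CTGA-CG~~"))  # ATGCTGACG
```

Because the filter keeps letters rather than listing the characters to delete, it needs no knowledge of which gap characters a particular format uses.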
% degapseq alignment.seq nogaps.seq
Mandatory qualifiers:
  [-sequence]  seqall     Sequence database USA
  [-outseq]    seqoutall  Output sequence(s) USA
Optional qualifiers: (none)
Advanced qualifiers: (none)
General qualifiers:
  -help        boolean    Report command line options. More information on associated and general qualifiers can be found with -help -verbose
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Mandatory qualifiers | | | |
[-sequence] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required |
[-outseq] (Parameter 2) | Output sequence(s) USA | Writeable sequence(s) | <sequence>.format |
Optional qualifiers | (none) | | |
Advanced qualifiers | (none) | | |
The input sequence can be nucleic or protein.
The input sequence can be gapped or ungapped.
An example of a sequence with gaps might be:
>dgshsh
ATGCGCAGGTACGTATG....CTGACGGTACGTGATCGA-GCTGA-CGAGCGTATGC-----
>hsf1
--------TGACTGATGCTGA~~~~CTG-ACGTGACTGATGCTGATCGTGACTGATCGTGAC
>myclone1
ATGCGCAGGTACGTATGCTGACGGTACGTGATCGA-GCTGA-CGAGCGTATGC-----
The output for the above input sequences is:
>dgshsh
ATGCGCAGGTACGTATGCTGACGGTACGTGATCGAGCTGACGAGCGTATGC
>hsf1
TGACTGATGCTGACTGACGTGACTGATGCTGATCGTGACTGATCGTGAC
>myclone1
ATGCGCAGGTACGTATGCTGACGGTACGTGATCGAGCTGACGAGCGTATGC
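To show how the input example maps to the output example, here is a small Python sketch that applies the same rule to FASTA-formatted text. The names `read_fasta` and `degap_fasta` are hypothetical, the parser handles only plain FASTA, and none of this reflects the actual EMBOSS implementation, which reads and writes many other formats.

```python
from typing import Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def degap_fasta(text: str) -> str:
    """Return FASTA text with all non-letter characters removed from sequences."""
    records = []
    for name, seq in read_fasta(text):
        # Assumption: "alphabetic" means ASCII letters only.
        cleaned = "".join(c for c in seq if c.isascii() and c.isalpha())
        records.append(f">{name}\n{cleaned}\n")
    return "".join(records)


example = """>dgshsh
ATGCGCAGGTACGTATG....CTGACGGTACGTGATCGA-GCTGA-CGAGCGTATGC-----
>hsf1
--------TGACTGATGCTGA~~~~CTG-ACGTGACTGATGCTGATCGTGACTGATCGTGAC
"""

print(degap_fasta(example), end="")
```

Run on the first two records of the input example, this prints the corresponding two records of the output example.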
Program name | Description |
---|---|
biosed | Replace or delete sequence sections |
cutseq | Removes a specified section from a sequence |
descseq | Alter the name or description of a sequence |
entret | Reads and writes (returns) flatfile entries |
extractfeat | Extract features from a sequence |
extractseq | Extract regions from a sequence |
listor | Writes a list file of the logical OR of two sets of sequences |
maskfeat | Mask off features of a sequence |
maskseq | Mask off regions of a sequence |
newseq | Type in a short new sequence |
noreturn | Removes carriage return from ASCII files |
notseq | Excludes a set of sequences and writes out the remaining ones |
nthseq | Writes one sequence from a multiple set of sequences |
pasteseq | Insert one sequence into another |
revseq | Reverse and complement a sequence |
seqret | Reads and writes (returns) sequences |
seqretsplit | Reads and writes (returns) sequences in individual files |
skipseq | Reads and writes (returns) sequences, skipping the first few |
splitter | Split a sequence into (overlapping) smaller sequences |
swissparse | Retrieves sequences from swissprot using keyword search |
trimest | Trim poly-A tails off EST sequences |
trimseq | Trim ambiguous bits off the ends of sequences |
union | Reads sequence fragments and builds one sequence |
vectorstrip | Strips out DNA between a pair of vector sequences |
yank | Reads a sequence range, appends the full USA to a list file |