Class Bio::GFF::GFF3::Record::Gap
In: lib/bio/db/gff.rb
Parent: Object

Bio::GFF::GFF3::Record::Gap is a class to store the data of a "Gap" attribute.

Methods

Classes and Modules

Class Bio::GFF::GFF3::Record::Gap::Code

Constants

Code = Struct.new(:code, :length)   Code is a class that stores a single-letter code and its length.

Attributes

data  [R]  Internal data. Users must not use it.

Public Class methods

Creates a new Gap object.


Arguments:

  • str: a formatted string, or nil.

[Source]

      # File lib/bio/db/gff.rb, line 1275
      def initialize(str = nil)
        if str then
          @data = str.split(/ +/).collect do |x|
            if /\A([A-Z])([0-9]+)\z/ =~ x.strip then
              Code.new($1.intern, $2.to_i)
            else
              warn "ignored unknown token: #{x.inspect}" if $VERBOSE
              nil
            end
          end
          @data.compact!
        else
          @data = []
        end
      end
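The tokenizing step above can be tried without the library. The following is a minimal sketch, not the library's own code: GapCode and parse_gap are hypothetical stand-ins for Code and Gap.new, and the to_s override on the struct is an assumption made so the tokens can be printed back.

```ruby
# Hypothetical stand-in for the library's Code struct.
# The to_s override is an assumption, added only for printing.
GapCode = Struct.new(:code, :length) do
  def to_s
    "#{code}#{length}"
  end
end

# Splits a Gap attribute value on spaces and keeps only tokens that
# look like one uppercase letter followed by digits (e.g. "M8").
def parse_gap(str)
  return [] unless str
  parsed = str.split(/ +/).map do |token|
    if /\A([A-Z])([0-9]+)\z/ =~ token.strip
      GapCode.new($1.intern, $2.to_i)
    else
      warn "ignored unknown token: #{token.inspect}" if $VERBOSE
      nil
    end
  end
  parsed.compact
end

codes = parse_gap("M8 D3 M6 I1 M6")
puts codes.map(&:to_s).join(" ")
```

Unrecognized tokens are dropped rather than raising an error, which matches the warn-and-skip behavior of the source shown above.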

Creates a new Gap object from given sequence alignment.

Note that sites where both the reference and the target are gaps are silently removed.


Arguments:

  • reference: reference sequence (nucleotide sequence)
  • target: target sequence (nucleotide sequence)
  • gap_regexp: regexp to identify gap

[Source]

      # File lib/bio/db/gff.rb, line 1391
      def self.new_from_sequences_na(reference, target,
                                     gap_regexp = /[^a-zA-Z]/)
        gap = self.new
        gap.instance_eval {
          __initialize_from_sequences_na(reference, target,
                                         gap_regexp)
        }
        gap
      end
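The private helper that does the actual work is not shown on this page. As a rough illustration of the idea only, the sketch below run-length encodes an aligned pair of nucleotide sequences using the GFF3 convention (M = match, I = gap in the reference, D = gap in the target). The function name gap_string_from_alignment is hypothetical, and equal-length inputs are assumed.

```ruby
# Hypothetical helper, not part of the library: turns two aligned
# sequences into a Gap-style string such as "M8 D3 M6 I1 M6".
def gap_string_from_alignment(reference, target, gap_regexp = /[^a-zA-Z]/)
  runs = []
  reference.chars.zip(target.chars).each do |r, t|
    r_gap = r.nil? || gap_regexp.match?(r)
    t_gap = t.nil? || gap_regexp.match?(t)
    next if r_gap && t_gap            # both gaps: column is dropped

    code = if r_gap then :I           # gap in reference
           elsif t_gap then :D        # gap in target
           else :M                    # aligned residues
           end

    if runs.last && runs.last[0] == code
      runs.last[1] += 1               # extend the current run
    else
      runs << [code, 1]               # start a new run
    end
  end
  runs.map { |code, len| "#{code}#{len}" }.join(" ")
end

puts gap_string_from_alignment("CAAGACCTAAACTGGAT-TCCAAT",
                               "CAAGACCT---CTGGATATCCAAT")
```

The default gap_regexp treats every non-letter as a gap, so "-", "." and similar characters are all accepted.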

Creates a new Gap object from given sequence alignment.

Note that sites where both the reference and the target are gaps are silently removed.

For incorrect alignments that break the 3:1 rule (three nucleotides per amino acid), gap positions are moved inside codons, unwanted gaps are removed, and forward or reverse frameshifts are inserted.

For example,

   atgg-taagac-att
   M  V  K  -  I

is treated as:

   atggt<aagacatt
   M  V  K  >>I

An incorrect combination of a frameshift with another frameshift or with a gap may cause undefined behavior.

Indicating forward frameshifts in the target sequence is recommended. Reverse frameshifts can be indicated in either the reference sequence or the target sequence.

Priority of regular expressions:

  space > forward/reverse frameshift > gap

Arguments:

  • reference: reference sequence (nucleotide sequence)
  • target: target sequence (amino acid sequence)
  • gap_regexp: regexp to identify gap
  • space_regexp: regexp to identify space character which is completely ignored
  • forward_frameshift_regexp: regexp to identify forward frameshift
  • reverse_frameshift_regexp: regexp to identify reverse frameshift

[Source]

      # File lib/bio/db/gff.rb, line 1587
      def self.new_from_sequences_na_aa(reference, target,
                                        gap_regexp = /[^a-zA-Z]/,
                                        space_regexp = /\s/,
                                        forward_frameshift_regexp = /\>/,
                                        reverse_frameshift_regexp = /\</)
        gap = self.new
        gap.instance_eval {
          __initialize_from_sequences_na_aa(reference, target,
                                            gap_regexp,
                                            space_regexp,
                                            forward_frameshift_regexp,
                                            reverse_frameshift_regexp)
        }
        gap
      end
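The priority order matters because the default gap_regexp also matches spaces, ">" and "<". The sketch below shows one way such a priority could be applied to a single character. classify_column is a hypothetical name, and the order of the forward and reverse checks relative to each other is an assumption, since the documentation only ranks them together.

```ruby
# Hypothetical classifier, not the library's code: checks the patterns
# in the documented priority order (space > frameshift > gap).
def classify_column(ch,
                    gap_regexp = /[^a-zA-Z]/,
                    space_regexp = /\s/,
                    forward_frameshift_regexp = />/,
                    reverse_frameshift_regexp = /</)
  case ch
  when space_regexp              then :space
  when forward_frameshift_regexp then :forward_frameshift
  when reverse_frameshift_regexp then :reverse_frameshift
  when gap_regexp                then :gap
  else                                :residue
  end
end

[" ", ">", "<", "-", "M"].each do |ch|
  puts "#{ch.inspect} => #{classify_column(ch)}"
end
```

Checking gap_regexp last keeps a space or a frameshift character from being counted as an ordinary gap.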

Same as new(str).

[Source]

      # File lib/bio/db/gff.rb, line 1292
      def self.parse(str)
        self.new(str)
      end

Public Instance methods

Returns true if self == other; otherwise, returns false.

[Source]

      # File lib/bio/db/gff.rb, line 1615
      def ==(other)
        if other.class == self.class and
            @data == other.data then
          true
        else
          false
        end
      end

Processes nucleotide sequences and returns gapped sequences as an array of sequences.

Note on frameshifts: a forward frameshift is simply treated as a gap insertion into the target sequence, and a reverse frameshift as a gap insertion into the reference sequence.


Arguments:

  • reference: reference sequence (nucleotide sequence)
  • target: target sequence (nucleotide sequence)
  • gap_char: gap character

[Source]

      # File lib/bio/db/gff.rb, line 1715
      def process_sequences_na(reference, target, gap_char = '-')
        s_ref, s_tgt = dup_seqs(reference, target)

        s_ref, s_tgt = __process_sequences(s_ref, s_tgt,
                                           gap_char, gap_char,
                                           1, 1,
                                           gap_char, gap_char)

        if $VERBOSE and s_ref.length != s_tgt.length then
          warn "returned sequences not equal length"
        end
        return s_ref, s_tgt
      end
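The private __process_sequences helper is not shown on this page. To illustrate the general idea of turning a Gap string and two ungapped nucleotide sequences back into an alignment, here is a minimal standalone sketch. apply_gap is a hypothetical name; it handles only the M, I and D codes and skips anything else, so it is not a substitute for the library method.

```ruby
# Hypothetical helper, not the library's code: rebuilds gapped
# sequences from a Gap string. Only M, I and D are handled.
def apply_gap(gap_string, reference, target, gap_char = '-')
  ref_out = String.new
  tgt_out = String.new
  r = 0   # read position in the ungapped reference
  t = 0   # read position in the ungapped target

  gap_string.split(/ +/).each do |token|
    m = /\A([A-Z])([0-9]+)\z/.match(token)
    next unless m
    n = m[2].to_i
    case m[1]
    when 'M'   # aligned block: copy n residues from both
      ref_out << reference[r, n].to_s
      tgt_out << target[t, n].to_s
      r += n
      t += n
    when 'I'   # gap in the reference, residues in the target
      ref_out << gap_char * n
      tgt_out << target[t, n].to_s
      t += n
    when 'D'   # residues in the reference, gap in the target
      ref_out << reference[r, n].to_s
      tgt_out << gap_char * n
      r += n
    end
  end
  [ref_out, tgt_out]
end

ref, tgt = apply_gap("M8 D3 M6 I1 M6",
                     "CAAGACCTAAACTGGATTCCAAT",
                     "CAAGACCTCTGGATATCCAAT")
puts ref
puts tgt
```

Both returned strings have the same length whenever the Gap string is consistent with the input sequences, which is the condition the library method warns about under $VERBOSE.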

Processes sequences and returns gapped sequences as an array of sequences. reference must be a nucleotide sequence, and target must be an amino acid sequence.

Note on reverse frameshifts: reverse frameshift characters are inserted in the reference sequence. For example, the alignment of "Gap=M3 R1 M2" is:

    atgaagat<aatgtc
    M  K  I  N  V

Alignment of "Gap=M3 R3 M3" is:

    atgaag<<<attaatgtc
    M  K  I  I  N  V

Arguments:

  • reference: reference sequence (nucleotide sequence)
  • target: target sequence (amino acid sequence)
  • gap_char: gap character
  • space_char: space character inserted into the amino acid sequence to match the na-aa alignment
  • forward_frameshift: forward frameshift character
  • reverse_frameshift: reverse frameshift character

[Source]

      # File lib/bio/db/gff.rb, line 1752
      def process_sequences_na_aa(reference, target,
                                  gap_char = '-',
                                  space_char = ' ',
                                  forward_frameshift = '>',
                                  reverse_frameshift = '<')
        s_ref, s_tgt = dup_seqs(reference, target)
        s_tgt = s_tgt.gsub(/./, "\\0#{space_char}#{space_char}")
        ref_increment = 3
        tgt_increment = 1 + space_char.length * 2
        ref_gap = gap_char * 3
        tgt_gap = "#{gap_char}#{space_char}#{space_char}"
        return __process_sequences(s_ref, s_tgt,
                                   ref_gap, tgt_gap,
                                   ref_increment, tgt_increment,
                                   forward_frameshift,
                                   reverse_frameshift)
      end
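The padding step in the source above is what lets one amino acid line up with three nucleotides. The sketch below isolates that single gsub call. pad_amino_acids is a hypothetical wrapper name used only for illustration.

```ruby
# Hypothetical wrapper around the gsub call used in the source above:
# every residue is followed by two space characters, so each amino
# acid occupies three columns, the width of one codon.
def pad_amino_acids(target, space_char = ' ')
  target.gsub(/./, "\\0#{space_char}#{space_char}")
end

puts "atgaagatt"
puts pad_amino_acids("MKI")
```

With the default single-character space_char, each padded residue is 1 + 2 = 3 characters wide, which is the value the source assigns to tgt_increment.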

Returns the string representation.

[Source]

      # File lib/bio/db/gff.rb, line 1604
      def to_s
        @data.collect { |x| x.to_s }.join(" ")
      end
