Class Bio::PhyloXML::Parser
In: lib/bio/db/phyloxml/phyloxml_parser.rb
Parent: Object

Description

Bio::PhyloXML::Parser is for parsing phyloXML format files.

Requirements

The libxml2 XML parser is required. Install the libxml-ruby bindings from libxml.rubyforge.org, or run:

  gem install -r libxml-ruby
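The parser cannot work without these bindings, so a script may want to check for them before loading bio. The sketch below uses a hypothetical helper, libxml_available?, which is not part of BioRuby. It assumes that the libxml-ruby gem is loaded with require 'libxml', which is the library name that gem provides.

```ruby
# Hypothetical helper (not part of BioRuby): reports whether the
# libxml-ruby bindings can be loaded. The gem is required as 'libxml'.
def libxml_available?
  require 'libxml'
  true
rescue LoadError
  false
end

if libxml_available?
  puts 'libxml-ruby found: Bio::PhyloXML::Parser can be used'
else
  warn 'libxml-ruby not found; install it with: gem install -r libxml-ruby'
end
```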

Usage

  require 'bio'

  # Create a new phyloxml parser
  phyloxml = Bio::PhyloXML::Parser.open('example.xml')

  # Print the names of all trees in the file
  phyloxml.each do |tree|
    puts tree.name
  end

  # Release the underlying reader and file
  phyloxml.close

References

www.phyloxml.org/documentation/version_100/phyloxml.xsd.html

Methods

[]   close   closed?   each   for_io   new   next_tree   open   open_uri  

Included Modules

LibXML

Attributes

other  [R]  After all trees have been parsed, any remaining elements in other XML formats are stored in this array as PhyloXML::Other objects.

Public Class methods

Initializes LibXML::Reader and reads from the IO until it reaches the first phylogeny element.

Create a new Bio::PhyloXML::Parser object.

  p = Bio::PhyloXML::Parser.for_io($stdin)

Arguments:

  • (required) io: IO object
  • (optional) validate: For IO reader, the "validate" option is ignored and no validation is executed.
Returns: Bio::PhyloXML::Parser object


     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 218
     def self.for_io(io, validate=true)
       obj = new(nil, validate)
       obj.instance_eval {
         @reader = XML::Reader.io(io,
                                  { :options =>
                                    LibXML::XML::Parser::Options::NONET })
         _skip_leader
       }
       obj
     end
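for_io relies on a small construction trick: it calls new(nil, validate), which returns early from initialize because str is nil, and then assigns @reader inside instance_eval. The stdlib-only sketch below reproduces that pattern with a hypothetical LineSource class, using StringIO in place of LibXML::XML::Reader. It illustrates the pattern only and is not the BioRuby implementation.

```ruby
require 'stringio'

# Hypothetical class mirroring the shape of Parser.for_io: the constructor
# accepts nil to skip its own setup, and the class method then injects an
# IO-backed reader through instance_eval.
class LineSource
  def initialize(str)
    return unless str            # nil: leave @reader unset for the factory
    @reader = StringIO.new(str)
  end

  def self.for_io(io)
    obj = new(nil)
    obj.instance_eval { @reader = io }  # set private state on the new object
    obj
  end

  # Returns the next line without its newline, or nil at end of input.
  def next_line
    line = @reader.gets
    line && line.chomp
  end
end

src = LineSource.for_io(StringIO.new("alpha\nbeta\n"))
puts src.next_line   # prints alpha
```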

Initializes LibXML::Reader and reads the PhyloXML-formatted string until it reaches the first phylogeny element.

Create a new Bio::PhyloXML::Parser object.

  str = File.read("./phyloxml_examples.xml")
  p = Bio::PhyloXML::Parser.new(str)

Deprecated usage: read data from a file, where str is a filename.

  p = Bio::PhyloXML::Parser.new("./phyloxml_examples.xml")

Passing a filename is deprecated and emits a warning; use Bio::PhyloXML::Parser.open(filename) instead.

Arguments:

  • (required) str: PhyloXML-formatted string
  • (optional) validate: Whether to validate against the schema. Default value is true. In the source below, validation is only invoked for the deprecated filename form.
Returns: Bio::PhyloXML::Parser object


     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 318
     def initialize(str, validate=true)

       @other = []

       return unless str

       # For compatibility, if filename-like string is given,
       # treat it as a filename.
       if /[\<\>\r\n]/ !~ str and File.exist?(str) then
         # assume that str is filename
         warn "Bio::PhyloXML::Parser.new(filename) is deprecated. Use Bio::PhyloXML::Parser.open(filename)."
         filename = _secure_filename(str)
         _validate(:file, filename) if validate
         @reader = XML::Reader.file(filename)
         _skip_leader
         return
       end

       # initialize for string
       @reader = XML::Reader.string(str,
                                    { :options =>
                                      LibXML::XML::Parser::Options::NONET })
       _skip_leader
     end
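The compatibility branch in initialize decides between filename and XML text with a single condition. The hypothetical helper treat_as_filename? below copies that condition so it can be tried in isolation, without the deprecation warning or the reader setup.

```ruby
require 'tempfile'

# Hypothetical helper copying the condition in Parser#initialize: a string
# is treated as a filename only if it contains no '<', '>', CR or LF and
# names an existing file. Anything else is parsed as XML text.
def treat_as_filename?(str)
  /[<>\r\n]/ !~ str && File.exist?(str)
end

Tempfile.create(['example', '.xml']) do |f|
  p treat_as_filename?(f.path)                # existing path: true
end
p treat_as_filename?('<phyloxml></phyloxml>') # markup: false
p treat_as_filename?("line1\nline2")          # contains a newline: false
```

One consequence of this heuristic is that a one-line string without angle brackets that happens to match an existing path is read as a file.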

Initializes LibXML::Reader and reads the file until it reaches the first phylogeny element.

Example: Create a new Bio::PhyloXML::Parser object.

  p = Bio::PhyloXML::Parser.open("./phyloxml_examples.xml")

If a block is given, the Bio::PhyloXML::Parser object is passed to it as an argument. When the block terminates, the parser is closed automatically (even if the block raises an exception), and open returns the value of the block.

Example: Get the first tree in the file.

  tree = Bio::PhyloXML::Parser.open("example.xml") do |px|
    px.next_tree
  end

Arguments:

  • (required) filename: Path to the file to parse.
  • (optional) validate: Whether to validate the file against schema or not. Default value is true.
Returns: (without block) Bio::PhyloXML::Parser object
Returns: (with block) the value of the block


     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 102
     def self.open(filename, validate=true)
       obj = new(nil, validate)
       obj.instance_eval {
         filename = _secure_filename(filename)
         _validate(:file, filename) if validate
         # XML::Parser::Options::NONET for security reason
         @reader = XML::Reader.file(filename,
                                    { :options =>
                                      LibXML::XML::Parser::Options::NONET })
         _skip_leader
       }
       if block_given? then
         begin
           ret = yield obj
         ensure
           obj.close if obj and !obj.closed?
         end
         ret
       else
         obj
       end
     end
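The block form follows the usual Ruby open-with-block idiom: the object is yielded, the close call sits in an ensure clause so it runs even when the block raises, and the block's value is returned. Here is a minimal sketch with a hypothetical Resource class standing in for the parser.

```ruby
# Hypothetical class standing in for the parser: it only tracks whether
# it has been closed.
class Resource
  def initialize
    @closed = false
  end

  def close
    @closed = true
    nil
  end

  def closed?
    @closed
  end

  # Same shape as Parser.open: without a block, return the object; with a
  # block, yield it, close it in ensure (also on exceptions) and return
  # the block's value.
  def self.open
    obj = new
    return obj unless block_given?

    begin
      yield obj
    ensure
      obj.close unless obj.closed?
    end
  end
end

seen = nil
value = Resource.open { |r| seen = r; :first_tree }
p value         # => :first_tree
p seen.closed?  # => true
```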

Initializes LibXML::Reader and reads the data at the given URI until it reaches the first phylogeny element.

Create a new Bio::PhyloXML::Parser object.

  p = Bio::PhyloXML::Parser.open_uri("http://www.phyloxml.org/examples/apaf.xml")

If a block is given, the Bio::PhyloXML::Parser object is passed to it as an argument. When the block terminates, the parser is closed automatically, and open_uri returns the value of the block.

Arguments:

  • (required) uri: (URI or String) URI to the data to parse
  • (optional) validate: For the URI reader, the "validate" option is ignored and no validation is executed.
Returns: (without block) Bio::PhyloXML::Parser object
Returns: (with block) the value of the block


     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 143
     def self.open_uri(uri, validate=true)
       case uri
       when URI
         uri = uri.to_s
       else
         # raises error if not a String
         uri = uri.to_str
         # raises error if invalid URI
         URI.parse(uri)
       end

       obj = new(nil, validate)
       obj.instance_eval {
         @reader = XML::Reader.file(uri)
         _skip_leader
       }
       if block_given? then
         begin
           ret = yield obj
         ensure
           obj.close if obj and !obj.closed?
         end
       else
         obj
       end
     end
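open_uri checks its argument before creating the reader: a URI object is converted with to_s, while any other value must respond to to_str and must be parseable by URI.parse. The hypothetical helper normalize_uri below extracts that check so the error cases can be explored without network access.

```ruby
require 'uri'

# Hypothetical helper extracting the argument check of Parser.open_uri.
# URI objects are converted with to_s; other values must respond to
# to_str (NoMethodError otherwise) and must parse as a URI
# (URI::InvalidURIError otherwise).
def normalize_uri(uri)
  case uri
  when URI
    uri.to_s
  else
    str = uri.to_str
    URI.parse(str)   # called only for its error checking
    str
  end
end

puts normalize_uri(URI('http://www.phyloxml.org/examples/apaf.xml'))
puts normalize_uri('http://www.phyloxml.org/examples/apaf.xml')
```

Passing an Integer raises NoMethodError (from to_str), and a string such as "http://bad host/" raises URI::InvalidURIError.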

Public Instance methods

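Access the specified tree in the file by parsing trees until it is reached. Because the method calls next_tree i+1 times, the index is counted from the parser's current position, not from the beginning of the file.

  # Get the 3rd tree in the file (counting starts from 0).
  parser = Bio::PhyloXML::Parser.open('phyloxml_examples.xml')
  tree = parser[2]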


     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 364
     def [](i)
       tree = nil
       (i+1).times do
         tree = self.next_tree
       end
       return tree
     end
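Because [] simply calls next_tree i+1 times on a forward-only reader, repeated calls keep advancing. The sketch below uses a hypothetical FakeParser whose next_tree shifts names off an array; its [] method is the same loop as the source above.

```ruby
# Hypothetical stand-in for the parser: next_tree hands out items in order
# and returns nil when exhausted. [] is the same loop as Parser#[], so the
# index counts from the current read position.
class FakeParser
  def initialize(names)
    @queue = names.dup
  end

  def next_tree
    @queue.shift
  end

  def [](i)
    tree = nil
    (i + 1).times { tree = next_tree }
    tree
  end
end

parser = FakeParser.new(%w[t0 t1 t2 t3 t4 t5])
p parser[2]   # => "t2"
p parser[2]   # => "t5": three more trees were consumed
```

If trees must be addressed by absolute position, one option is to collect them once into an Array, or to open a fresh parser for each lookup.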

Closes the LibXML::Reader inside the object. If the object was created with Bio::PhyloXML::Parser.open, the underlying file is closed as well.

Closing an already closed object, or using a closed object, raises LibXML::XML::Error.

Returns: nil


     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 188
     def close
       @reader.close
       @reader = ClosedPhyloXMLParser.new
       nil
     end

Returns true if the object has been closed with the close method (or equivalent); otherwise returns false.

Returns: true or false


     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 198
     def closed?
       if @reader.kind_of?(ClosedPhyloXMLParser) then
         true
       else
         false
       end
     end
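close and closed? cooperate through a sentinel: close replaces @reader with a ClosedPhyloXMLParser instance, and closed? tests for that class. The sketch below shows the same design with hypothetical Wrapper and ClosedReader classes. As an assumption made to keep the example self-contained, the sentinel raises IOError, whereas the documentation above says the real parser raises LibXML::XML::Error.

```ruby
require 'stringio'

# Hypothetical sentinel standing in for ClosedPhyloXMLParser: any method
# called on it raises. IOError is an assumption made to keep the example
# self-contained; the real parser raises LibXML::XML::Error.
class ClosedReader
  def method_missing(name, *args)
    raise IOError, "#{name} called on a closed reader"
  end
end

# Hypothetical wrapper with the same close/closed? logic as the parser.
class Wrapper
  def initialize(io)
    @reader = io
  end

  def read_all
    @reader.read
  end

  def close
    @reader.close
    @reader = ClosedReader.new   # swap in the sentinel
    nil
  end

  def closed?
    @reader.kind_of?(ClosedReader)
  end
end

w = Wrapper.new(StringIO.new('<phyloxml/>'))
p w.read_all   # => "<phyloxml/>"
w.close
p w.closed?    # => true
```

The sentinel avoids a separate closed flag: every later call on the reader fails loudly instead of touching a released handle, and a second close fails the same way.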

Iterate through all trees in the file.

  phyloxml = Bio::PhyloXML::Parser.open('example.xml')
  phyloxml.each do |tree|
    puts tree.name
  end


     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 351
     def each
       while tree = next_tree
         yield tree
       end
     end
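each stops as soon as next_tree returns nil, and the source above has no fallback when no block is given. The following sketch, built on a hypothetical TreeSource class, keeps the same while loop and adds a to_enum line (not present in the BioRuby source shown) so that Enumerable methods such as map can be chained.

```ruby
# Hypothetical source with a next_tree method that returns nil when
# exhausted. each uses the same loop as Parser#each; the to_enum line is an
# addition that returns an Enumerator when no block is given.
class TreeSource
  def initialize(names)
    @queue = names.dup
  end

  def next_tree
    @queue.shift
  end

  def each
    return to_enum(:each) unless block_given?
    while (tree = next_tree)
      yield tree
    end
  end
end

names = TreeSource.new(%w[apaf bcl2 tree3]).each.map(&:upcase)
p names   # => ["APAF", "BCL2", "TREE3"]
```

The iteration is single-pass: once the trees are consumed, a second call to each yields nothing.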

Parses and returns the next phylogeny tree. If there are no more phylogeny elements, nil is returned. Elements other than phylogeny, which always follow the phylogeny elements, are saved in Bio::PhyloXML::Parser#other; when next_tree encounters such an element, it stores it there and returns nil.

  p = Bio::PhyloXML::Parser.open("./phyloxml_examples.xml")
  tree = p.next_tree

Returns: Bio::PhyloXML::Tree, or nil


     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 381
     def next_tree()

       if not is_element?('phylogeny')
         if @reader.node_type == XML::Reader::TYPE_END_ELEMENT
           if is_end_element?('phyloxml')
             return nil
           else
             @reader.read
             @reader.read
             if is_end_element?('phyloxml')
               return nil
             end
           end
         end
         # phyloxml can hold only phylogeny and "other" elements. If this is not
         # a phylogeny element, then it is "other". Also, "other" always comes
         # after all phylogenies.
         @other << parse_other
         # return nil for tree, since this is not a valid phyloxml tree.
         return nil
       end

       tree = Bio::PhyloXML::Tree.new

       # keep track of current node in clades array/stack. Current node is the
       # last element in the clades array
       clades = []
       clades.push tree

       # keep track of current edge to be able to parse branch_length tag
       current_edge = nil

       # we are going to parse clade iteratively by pointing (and changing) to
       # the current node in the tree. Since the property element is both in
       # clade and in the phylogeny, we need some boolean to know if we are
       # parsing the clade (there can be only max 1 clade in phylogeny) or
       # parsing phylogeny
       parsing_clade = false

       while not is_end_element?('phylogeny') do
         break if is_end_element?('phyloxml')

         # parse phylogeny elements, except clade
         if not parsing_clade

           if is_element?('phylogeny')
             @reader["rooted"] == "true" ? tree.rooted = true : tree.rooted = false
             @reader["rerootable"] == "true" ? tree.rerootable = true : tree.rerootable = false
             parse_attributes(tree, ["branch_length_unit", 'type'])
           end

           parse_simple_elements(tree, [ "name", 'description', "date"])

           if is_element?('confidence')
             tree.confidences << parse_confidence
           end

         end

         if @reader.node_type == XML::Reader::TYPE_ELEMENT
           case @reader.name
           when 'clade'
             # parse clade element

             parsing_clade = true

             node = Bio::PhyloXML::Node.new

             branch_length = @reader['branch_length']

             parse_attributes(node, ["id_source"])

             # add new node to the tree
             tree.add_node(node)
             # The first clade will always be root since by xsd schema phyloxml can
             # have 0 to 1 clades in it.
             if tree.root == nil
               tree.root = node
             else
               current_edge = tree.add_edge(clades[-1], node,
                                            Bio::Tree::Edge.new(branch_length))
             end
             clades.push node
             # end if clade element
           else
             parse_clade_elements(clades[-1], current_edge) if parsing_clade
           end
         end

         # end clade element, go one parent up
         if is_end_element?('clade')

           # if we have reached the closing tag of the top-most clade, then our
           # current node should point to the root. If that's the case, we are
           # done parsing the clade element
           if clades[-1] == tree.root
             parsing_clade = false
           else
             # set current node (clades[-1]) to the previous clade in the array
             clades.pop
           end
         end

         # parsing phylogeny elements
         if not parsing_clade

           if @reader.node_type == XML::Reader::TYPE_ELEMENT
             case @reader.name
             when 'property'
               tree.properties << parse_property

             when 'clade_relation'
               clade_relation = CladeRelation.new
               parse_attributes(clade_relation, ["id_ref_0", "id_ref_1", "distance", "type"])

               #@ add unit test for this
               if not @reader.empty_element?
                 @reader.read
                 if is_element?('confidence')
                   clade_relation.confidence = parse_confidence
                 end
               end
               tree.clade_relations << clade_relation

             when 'sequence_relation'
               sequence_relation = SequenceRelation.new
               parse_attributes(sequence_relation, ["id_ref_0", "id_ref_1", "distance", "type"])
               if not @reader.empty_element?
                 @reader.read
                 if is_element?('confidence')
                   sequence_relation.confidence = parse_confidence
                 end
               end
               tree.sequence_relations << sequence_relation
             when 'phylogeny'
               # do nothing
             else
               tree.other << parse_other
               #puts "Not recognized element. #{@reader.name}"
             end
           end
         end
         # go to next element
         @reader.read
       end # end while not </phylogeny>
       # move on to the next tag after </phylogeny>, which is text: since the
       # phylogeny end tag is an empty element whose value is nil, we need to
       # move to the next meaningful element (therefore @reader.read twice)
       @reader.read
       @reader.read

       return tree
     end
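The core of next_tree is a stack of clades: the innermost open clade is always clades[-1], a clade start tag creates a node and links it to that parent (or makes it the root), and a clade end tag goes one level up. This simplified sketch reproduces only that bookkeeping on a hypothetical pre-tokenized event list instead of a LibXML::Reader stream. Node, build_tree and the event format are assumptions, and unlike the source it also pops the root.

```ruby
# Hypothetical node type: a label plus an ordered list of children.
Node = Struct.new(:label, :children)

# events: [:start, label] for a clade start tag, [:end] for </clade>.
def build_tree(events)
  root = nil
  stack = []              # stack.last is the clade currently being parsed
  events.each do |type, label|
    case type
    when :start           # <clade>: create the node and attach it
      node = Node.new(label, [])
      if root.nil?
        root = node       # the first clade is the root
      else
        stack.last.children << node   # edge from current parent to new node
      end
      stack.push(node)
    when :end             # </clade>: go one parent up
      stack.pop
    end
  end
  root
end

# Tree ((B,C)A2,D)A1 written as start/end events
events = [[:start, 'A1'], [:start, 'A2'], [:start, 'B'], [:end],
          [:start, 'C'], [:end], [:end], [:start, 'D'], [:end], [:end]]
root = build_tree(events)
p root.children.map(&:label)                # => ["A2", "D"]
p root.children.first.children.map(&:label) # => ["B", "C"]
```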
