Module Bio::Alignment::EnumerableExtension
In: lib/bio/alignment.rb  (CVS)

The module Bio::Alignment::EnumerableExtension is a set of useful methods for multiple sequence alignment. It can be included in any class or used to extend any object. The class or object must provide the methods defined in Enumerable and must have an each method that iterates over the sequences, yielding one sequence (or string) object at a time.

Optionally, if an each_seq method is defined that iterates over the sequences and yields each sequence (or string) object, it is used instead of each.

Note that the each or each_seq method may be called multiple times, so the module is not suitable for IO objects. In addition, break may be used inside the given block, and destructive methods may be applied to the sequences.

For Array or Hash objects, it is better to use the ArrayExtension or HashExtension module, respectively. They provide a built-in each_seq method and/or redefine some methods.
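As a rough illustration of the contract described above, the following plain-Ruby sketch (it does not load BioRuby) defines a hypothetical TinyAlignment class whose each method yields one string per sequence and can be iterated repeatedly. The class name and its contents are assumptions made for this example; with BioRuby loaded, the module would be included where the comment indicates.

```ruby
# Hypothetical container; the class name and its contents are illustrative only.
class TinyAlignment
  include Enumerable
  # With BioRuby loaded, one would also write:
  #   include Bio::Alignment::EnumerableExtension

  def initialize(*seqs)
    @seqs = seqs
  end

  # Must be re-iterable: the extension may call it several times per operation.
  def each(&block)
    @seqs.each(&block)
    self
  end
end

aln = TinyAlignment.new('ATGC-', 'AT-CA')
first_pass  = aln.map(&:length)  # one full iteration
second_pass = aln.to_a           # a second iteration sees the same sequences
```

An IO object would fail this requirement because its contents are consumed by the first pass.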

Methods

Included Modules

PropertyMethods Output

Public Instance methods

Iterates over each sequence, collects the results of running the block, and returns them as a new alignment in a Bio::Alignment::SequenceArray object.

Redefine this method if you want to change the class of the return value.

[Source]

# File lib/bio/alignment.rb, line 445
      def alignment_collect
        a = SequenceArray.new
        a.set_all_property(get_all_property)
        each_seq do |str|
          a << yield(str)
        end
        a
      end

Concatenates the given alignment to this one, sequence by sequence. align must have an each_seq or each method.

Returns self.

Note that it is a destructive method.

For Hash objects, use this method carefully, because the order of the sequences is not guaranteed and key information is ignored completely.

[Source]

# File lib/bio/alignment.rb, line 849
      def alignment_concat(align)
        flag = nil
        a = []
        each_seq { |s| a << s }
        i = 0
        begin
          align.each_seq do |seq|
            flag = true
            a[i].concat(seq) if a[i] and seq
            i += 1
          end
          return self
        rescue NoMethodError, ArgumentError => evar
          raise evar if flag
        end
        align.each do |seq|
          a[i].concat(seq) if a[i] and seq
          i += 1
        end
        self
      end
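The position-wise behavior can be sketched in plain Ruby without the library. The helper name concat_alignments! is hypothetical, and the sketch assumes both alignments are plain arrays of mutable strings.

```ruby
# Plain-Ruby sketch of position-wise concatenation; not the library code.
def concat_alignments!(left, right)
  # Append right[i] to left[i] in place; sequences without a partner are skipped.
  right.each_with_index do |seq, i|
    left[i].concat(seq) if left[i] && seq
  end
  left
end

a = [+'ATG', +'AT-']
b = ['CCA', 'C-A', 'GGG']   # the third sequence has no partner and is ignored
concat_alignments!(a, b)
```

Because sequences are paired purely by iteration order, a container with unstable ordering would pair them unpredictably.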

Returns the alignment length, which is the length of the longest sequence in the alignment.

[Source]

# File lib/bio/alignment.rb, line 366
      def alignment_length
        maxlen = 0
        each_seq do |s|
          x = s.length
          maxlen = x if x > maxlen
        end
        maxlen
      end

Removes excess gaps at the head of the sequences. If nothing is removed, returns nil. Otherwise, returns self.

Note that it is a destructive method.

[Source]

# File lib/bio/alignment.rb, line 752
      def alignment_lstrip!
        #(String-like)
        pos = 0
        each_site do |a|
          a.remove_gaps!
          if a.empty?
            pos += 1
          else
            break
          end
        end
        return nil if pos <= 0
        each_seq { |s| s[0, pos] = '' }
        self
      end

Appends gaps to the tail of each sequence that is shorter than the alignment length, so that all sequences end up with the same length.

Note that it is a destructive method.

[Source]

# File lib/bio/alignment.rb, line 712
      def alignment_normalize!
        #(original)
        len = alignment_length
        each_seq do |s|
          s << (gap_char * (len - s.length)) if s.length < len
        end
        self
      end
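The padding step can be sketched in plain Ruby as follows. The helper name pad_to_longest! and the use of '-' as the gap character are assumptions made for this example.

```ruby
GAP = '-' # assumed gap character

# Pad every sequence in place to the length of the longest one.
def pad_to_longest!(seqs)
  len = seqs.map(&:length).max || 0
  seqs.each { |s| s << GAP * (len - s.length) if s.length < len }
  seqs
end

seqs = [+'ATGCA', +'AT', +'ATG-']
pad_to_longest!(seqs)
```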

Removes excess gaps at the tail of the sequences. If nothing is removed, returns nil. Otherwise, returns self.

Note that it is a destructive method.

[Source]

# File lib/bio/alignment.rb, line 727
      def alignment_rstrip!
        #(String-like)
        len = alignment_length
        newlen = len
        each_site_step(len - 1, 0, -1) do |a|
          a.remove_gaps!
          if a.empty? then
            newlen -= 1
          else
            break
          end
        end
        return nil if newlen >= len
        each_seq do |s|
          s[newlen..-1] = '' if s.length > newlen
        end
        self
      end

Gets a site of the position. Returns a Bio::Alignment::Site object.

If the position is out of range, it returns a site that consists entirely of gaps.

[Source]

# File lib/bio/alignment.rb, line 403
      def alignment_site(position)
        site = _alignment_site(position)
        site.set_all_property(get_all_property)
        site
      end

Returns the specified range of the alignment. For each sequence, the 'slice' method (which may be String#slice, the same as String#[]) is called, and the results are returned as a new alignment in a Bio::Alignment::SequenceArray object.

Unlike the alignment_window method, the resulting alignment may contain nil.

If you want to change the class of the return value, redefine the alignment_collect method.

[Source]

# File lib/bio/alignment.rb, line 807
      def alignment_slice(*arg)
        #(String-like)
        #(BioPerl) AlignI::slice like method
        alignment_collect do |s|
          s.slice(*arg)
        end
      end

Removes excess gaps at both the head and the tail of the sequences. If nothing is removed, returns nil. Otherwise, returns self.

Note that it is a destructive method.

[Source]

# File lib/bio/alignment.rb, line 774
      def alignment_strip!
        #(String-like)
        r = alignment_rstrip!
        l = alignment_lstrip!
        (r or l)
      end
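The idea behind the strip methods, removing leading and trailing columns that contain nothing but gaps, can be sketched in plain Ruby. The helper names are hypothetical, and the sketch assumes '-' is the only gap character and that positions past the end of a short sequence count as gaps.

```ruby
# Assumption: '-' is the only gap character.
def gap_only_column?(seqs, i)
  seqs.all? { |s| s[i].nil? || s[i] == '-' }
end

# Remove gap-only columns from both ends, in place.
# Returns nil when nothing was removed, otherwise the array itself.
def strip_gap_columns!(seqs)
  len = seqs.map(&:length).max || 0
  head = 0
  head += 1 while head < len && gap_only_column?(seqs, head)
  tail = len
  tail -= 1 while tail > head && gap_only_column?(seqs, tail - 1)
  return nil if head.zero? && tail == len
  seqs.each do |s|
    s.slice!(tail..-1) if s.length > tail  # cut the tail first
    s.slice!(0, head)                      # then the head
  end
  seqs
end

seqs = [+'--ATG-', +'-CATG-']
result = strip_gap_columns!(seqs)
again  = strip_gap_columns!(seqs)  # a second call finds nothing to remove
```

Only the first and last columns are removed here; the second column is kept because one sequence has a residue there.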

For each sequence, the 'subseq' method (Bio::Sequence::Common#subseq is expected) is called, and the results are returned as a new alignment in a Bio::Alignment::SequenceArray object.

All sequences in the alignment are expected to be Bio::Sequence::NA or Bio::Sequence::AA objects.

Unlike the alignment_window method, the resulting alignment may contain nil.

If you want to change the class of the return value, redefine the alignment_collect method.

[Source]

# File lib/bio/alignment.rb, line 829
      def alignment_subseq(*arg)
        #(original)
        alignment_collect do |s|
          s.subseq(*arg)
        end
      end

Returns the specified range of the alignment. For each sequence, the '[]' method (which may be String#[]) is called, and the results are returned as a new alignment in a Bio::Alignment::SequenceArray object.

Unlike the alignment_slice method, the resulting alignment is guaranteed to contain String objects even if the specified range is out of range.

If you want to change the class of the return value, redefine the alignment_collect method.

[Source]

# File lib/bio/alignment.rb, line 466
      def alignment_window(*arg)
        alignment_collect do |s|
          s[*arg] or seqclass.new('')
        end
      end
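The difference between the slice-style and window-style results comes from how String#slice and String#[] treat a start position beyond the end of the string. A plain-Ruby sketch on ordinary strings (no BioRuby classes involved) shows it:

```ruby
seqs = ['ATGCA', 'AT']

# slice-like behavior: String#slice returns nil when the start is past the end.
sliced = seqs.map { |s| s.slice(3, 2) }

# window-like behavior: substitute an empty string for nil.
windowed = seqs.map { |s| s[3, 2] || '' }
```

Here sliced is ["CA", nil] while windowed is ["CA", ""], so code that consumes a window never has to check for nil.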

Iterates over each site of the alignment, yielding a Bio::Alignment::Site object, and returns the results of running the block as an array.

[Source]

# File lib/bio/alignment.rb, line 503
      def collect_each_site
        ary = []
        each_site do |site|
          ary << yield(site)
        end
        ary
      end

Helper method for calculating a consensus sequence. It iterates over each site of the alignment and yields a Bio::Alignment::Site object. Depending on opt[:gap_mode], gaps are removed from the site before it is yielded. The results of running the block (String objects are expected) are joined into a single string, which is returned. If the block returns nil or false, opt[:missing_char] (or the alignment's missing_char) is used for that site.

 opt[:gap_mode] ==> 0 -- gaps are regarded as normal characters
                    1 -- a site containing gaps is regarded as a gap
                   -1 -- gaps are eliminated from consensus calculation
     default: 0

[Source]

# File lib/bio/alignment.rb, line 523
      def consensus_each_site(opt = {})
        mchar = (opt[:missing_char] or self.missing_char)
        gap_mode = opt[:gap_mode]
        case gap_mode
        when 0, nil
          collect_each_site do |a|
            yield(a) or mchar
          end.join('')
        when 1
          collect_each_site do |a|
            a.has_gap? ? gap_char : (yield(a) or mchar)
          end.join('')
        when -1
          collect_each_site do |a|
            a.remove_gaps!
            a.empty? ? gap_char : (yield(a) or mchar)
          end.join('')
        else
          raise ':gap_mode must be 0, 1 or -1'
        end
      end
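The three gap modes can be sketched in plain Ruby as shown next. This is an approximation under stated assumptions, not the library code: the method name consensus_by_site is hypothetical, sites are plain arrays of one-character strings rather than Bio::Alignment::Site objects, and '-' and '?' are assumed for the gap and missing characters.

```ruby
GAP = '-'      # assumed gap character
MISSING = '?'  # assumed missing character

# Yields each column (an array of characters) to the block, following gap_mode.
def consensus_by_site(seqs, gap_mode = 0)
  len = seqs.map(&:length).max || 0
  (0...len).map do |i|
    site = seqs.map { |s| s[i] || GAP }
    case gap_mode
    when 0
      yield(site) || MISSING
    when 1
      site.include?(GAP) ? GAP : (yield(site) || MISSING)
    when -1
      residues = site.reject { |c| c == GAP }
      residues.empty? ? GAP : (yield(residues) || MISSING)
    else
      raise ArgumentError, 'gap_mode must be 0, 1 or -1'
    end
  end.join
end

# Block: return the character if the whole site agrees, otherwise nil.
unanimous = ->(site) { site.uniq.size == 1 ? site.first : nil }

seqs = ['AT-C', 'ATGC', 'A-GG']
mode_zero  = consensus_by_site(seqs, 0, &unanimous)
mode_one   = consensus_by_site(seqs, 1, &unanimous)
mode_minus = consensus_by_site(seqs, -1, &unanimous)
```

With this input the three modes give "A???", "A--?" and "ATG?" respectively, which shows how the treatment of gaps changes the outcome at the second and third sites.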

Returns the IUPAC consensus string of the alignment of nucleic-acid sequences.

It resembles BioPerl's AlignI::consensus_iupac method.

Please refer to the consensus_each_site method for opt.

[Source]

# File lib/bio/alignment.rb, line 565
      def consensus_iupac(opt = {})
        consensus_each_site(opt) do |a|
          a.consensus_iupac
        end
      end

Returns the consensus string of the alignment. 0.0 <= threshold <= 1.0 is expected.

It resembles BioPerl's AlignI::consensus_string method.

Please refer to the consensus_each_site method for opt.

[Source]

# File lib/bio/alignment.rb, line 552
      def consensus_string(threshold = 1.0, opt = {})
        consensus_each_site(opt) do |a|
          a.consensus_string(threshold)
        end
      end

This method is similar to BioPerl's AlignI::match.

Replaces each site in the second through last sequences with match_char (default: '.') when it equals the corresponding site of the first sequence. Sites where the first sequence has a gap are left unchanged.

Note that it is a destructive method.

For Hash objects, use this method carefully, because the order of the sequences is not guaranteed.

[Source]

# File lib/bio/alignment.rb, line 662
      def convert_match(match_char = '.')
        #(BioPerl) AlignI::match like method
        len = alignment_length
        firstseq = nil
        each_seq do |s|
          unless firstseq then
            firstseq = s
          else
            (0...len).each do |i|
              if s[i] and firstseq[i] == s[i] and !is_gap?(firstseq[i..i])
                s[i..i] = match_char
              end
            end
          end
        end
        self
      end

This method is similar to BioPerl's AlignI::unmatch.

Replaces each match_char (default: '.') in the second through last sequences with the character at the corresponding site of the first sequence.

Note that it is a destructive method.

For Hash objects, use this method carefully, because the order of the sequences is not guaranteed.

[Source]

# File lib/bio/alignment.rb, line 690
      def convert_unmatch(match_char = '.')
        #(BioPerl) AlignI::unmatch like method
        len = alignment_length
        firstseq = nil
        each_seq do |s|
          unless firstseq then
            firstseq = s
          else
            (0...len).each do |i|
              if s[i..i] == match_char then
                s[i..i] = (firstseq[i..i] or match_char)
              end
            end
          end
        end
        self
      end
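The two conversions are inverses of each other as long as the sequences do not already contain match_char. A plain-Ruby sketch of the round trip follows; the helper names to_match! and from_match! are hypothetical, and '-' is assumed to be the only gap character.

```ruby
GAP = '-' # assumed gap character

# Replace residues identical to the first sequence with match_char.
def to_match!(seqs, match_char = '.')
  first, *rest = seqs
  rest.each do |s|
    (0...s.length).each do |i|
      s[i] = match_char if first[i] == s[i] && first[i] != GAP
    end
  end
  seqs
end

# Restore the residues of the first sequence wherever match_char appears.
def from_match!(seqs, match_char = '.')
  first, *rest = seqs
  rest.each do |s|
    (0...s.length).each do |i|
      s[i] = first[i] if s[i] == match_char && first[i]
    end
  end
  seqs
end

seqs = [+'ATG-C', +'ATC-C', +'TTG-A']
to_match!(seqs)
matched = seqs.map(&:dup)  # snapshot of the dotted form
from_match!(seqs)
```

The first sequence serves as the reference in both directions, which is why a stable sequence order matters.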

Iterates over each sequence and yields it. By default, it behaves the same as each.

Redefine this method as appropriate for the class or object.

[Source]

# File lib/bio/alignment.rb, line 340
      def each_seq(&block) #:yields: seq
        each(&block)
      end

Iterates over each site of the alignment. It yields a Bio::Alignment::Site object (which inherits from Array). It returns self.

[Source]

# File lib/bio/alignment.rb, line 412
      def each_site
        cp = get_all_property
        (0...alignment_length).each do |i|
          site = _alignment_site(i)
          site.set_all_property(cp)
          yield(site)
        end
        self
      end

Iterates over the sites of the alignment from start to stop, advancing by step. It yields a Bio::Alignment::Site object (which inherits from Array). It returns self. It is the same as start.step(stop, step) { |i| yield alignment_site(i) }.

[Source]

# File lib/bio/alignment.rb, line 428
      def each_site_step(start, stop, step = 1)
        cp = get_all_property
        start.step(stop, step) do |i|
          site = _alignment_site(i)
          site.set_all_property(cp)
          yield(site)
        end
        self
      end
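The stepping itself relies on Ruby's Integer#step, so a negative step walks the sites backwards. The following plain-Ruby sketch uses arrays of characters in place of Bio::Alignment::Site objects and assumes '-' as the gap character for positions past the end of a sequence.

```ruby
GAP = '-' # assumed gap character
seqs = ['ATGC', 'A-GC', 'ATG']

# Column i as an array of characters; positions past a sequence's end become gaps.
column = ->(i) { seqs.map { |s| s[i] || GAP } }

# Walk the columns from the last position towards the first, two at a time.
visited = []
3.step(0, -2) { |i| visited << [i, column.call(i).join] }
```

This visits positions 3 and 1 in that order, which is the same traversal pattern alignment_rstrip! uses with a step of -1.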

Iterates over each sliding window of the alignment. window_size is the size of the sliding window, and step_size is the distance the window moves at each step. It yields a Bio::Alignment::SequenceArray object containing the current window. It returns a Bio::Alignment::SequenceArray object containing the remainder of the alignment beyond the last window. If window_size is smaller than 0, it returns nil.

[Source]

# File lib/bio/alignment.rb, line 481
      def each_window(window_size, step_size = 1)
        return nil if window_size < 0
        if step_size >= 0 then
          last_step = nil
          0.step(alignment_length - window_size, step_size) do |i|
            yield alignment_window(i, window_size)
            last_step = i
          end
          alignment_window((last_step + window_size)..-1)
        else
          i = alignment_length - window_size
          while i >= 0
            yield alignment_window(i, window_size)
            i += step_size
          end
          alignment_window(0...(i-step_size))
        end
      end
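For a positive step size, the windowing logic can be sketched in plain Ruby as follows. The method name sliding_windows is hypothetical, the alignment is a plain array of strings, and returning a copy of the input when no full window fits is a choice made for this sketch only.

```ruby
# Yields each window (an array of substrings) and returns the remainder.
# Only positive step sizes are handled in this sketch.
def sliding_windows(seqs, window_size, step_size = 1)
  len = seqs.map(&:length).max || 0
  last = nil
  0.step(len - window_size, step_size) do |i|
    yield seqs.map { |s| s[i, window_size] || '' }
    last = i
  end
  return seqs.map(&:dup) if last.nil?  # no full window fit (sketch's own choice)
  seqs.map { |s| s[(last + window_size)..-1] || '' }
end

seqs = ['ATGCAT', 'AT-CAT']
wins = []
rest = sliding_windows(seqs, 3, 2) { |w| wins << w }
```

With a window of 3 and a step of 2 over six columns, windows start at positions 0 and 2, and the final column is returned as the remainder.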
lstrip!()

Alias for alignment_lstrip!

Returns the match line string of the alignment of nucleic-acid or amino-acid sequences. The sequence type is determined automatically, or you can specify it with opt[:type].

It resembles BioPerl's AlignI::match_line method.

  opt[:type] ==> :na or :aa (or determined by sequence class)
  opt[:match_line_char]   ==> 100% equal    default: '*'
  opt[:strong_match_char] ==> strong match  default: ':'
  opt[:weak_match_char]   ==> weak match    default: '.'
  opt[:mismatch_char]     ==> mismatch      default: ' '
    :strong_ and :weak_match_char are used only in amino mode (:aa)

Other options are also accepted. Please refer to the consensus_each_site method for opt.

[Source]

# File lib/bio/alignment.rb, line 624
      def match_line(opt = {})
        case opt[:type]
        when :aa
          amino = true
        when :na, :dna, :rna
          amino = false
        else
          if seqclass == Bio::Sequence::AA then
            amino = true
          elsif seqclass == Bio::Sequence::NA then
            amino = false
          else
            amino = nil
            self.each_seq do |x|
              if /[EFILPQ]/i =~ x
                amino = true
                break
              end
            end
          end
        end
        if amino then
          match_line_amino(opt)
        else
          match_line_nuc(opt)
        end
      end
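A simplified picture of what a match line looks like for nucleotide sequences is sketched below in plain Ruby. The helper names are hypothetical, and the rule used here (a site matches only when every sequence carries the same non-gap character) is an assumption of the sketch; the library delegates the per-site decision to Bio::Alignment::Site, whose exact rules are not shown on this page. The second helper mirrors the /[EFILPQ]/i heuristic visible in the source above.

```ruby
GAP = '-' # assumed gap character

# '*' where every sequence has the same non-gap residue, ' ' elsewhere.
def simple_match_line(seqs, match_char = '*', mismatch_char = ' ')
  len = seqs.map(&:length).max || 0
  (0...len).map do |i|
    site = seqs.map { |s| s[i] || GAP }
    site.uniq.size == 1 && site.first != GAP ? match_char : mismatch_char
  end.join
end

# Letters that do not occur in nucleotide codes suggest amino-acid sequences.
def looks_like_amino?(seqs)
  seqs.any? { |s| s =~ /[EFILPQ]/i }
end

seqs = ['ATG-C', 'ATC-C', 'ATGAC']
line = simple_match_line(seqs)
```

The resulting line can be printed directly under the alignment, one character per column.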

Returns the match line string of the alignment of amino-acid sequences.

It resembles BioPerl's AlignI::match_line method.

  opt[:match_line_char]   ==> 100% equal    default: '*'
  opt[:strong_match_char] ==> strong match  default: ':'
  opt[:weak_match_char]   ==> weak match    default: '.'
  opt[:mismatch_char]     ==> mismatch      default: ' '

Other options are also accepted. Please refer to the consensus_each_site method for opt.

[Source]

# File lib/bio/alignment.rb, line 584
      def match_line_amino(opt = {})
        collect_each_site do |a|
          a.match_line_amino(opt)
        end.join('')
      end

Returns the match line string of the alignment of nucleic-acid sequences.

It resembles BioPerl's AlignI::match_line method.

  opt[:match_line_char]   ==> 100% equal    default: '*'
  opt[:mismatch_char]     ==> mismatch      default: ' '

Other options are also accepted. Please refer to the consensus_each_site method for opt.

[Source]

# File lib/bio/alignment.rb, line 601
      def match_line_nuc(opt = {})
        collect_each_site do |a|
          a.match_line_nuc(opt)
        end.join('')
      end

normalize!()

Alias for alignment_normalize!

Returns the number of sequences in this alignment.

[Source]

# File lib/bio/alignment.rb, line 1315
      def number_of_sequences
        i = 0
        self.each_seq { |s| i += 1 }
        i
      end

Completely removes ALL gaps in the sequences. If nothing is removed, returns nil. Otherwise, returns self.

Note that it is a destructive method.

[Source]

# File lib/bio/alignment.rb, line 787
      def remove_all_gaps!
        ret = nil
        each_seq do |s|
          x = s.gsub!(gap_regexp, '')
          ret ||= x
        end
        ret ? self : nil
      end
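The nil-or-self return value follows from String#gsub!, which returns nil when it makes no substitution. A plain-Ruby sketch, with a hypothetical helper name and an assumed gap pattern of '-' and '.':

```ruby
GAP_REGEXP = /[-.]/ # assumed gap pattern

# Remove every gap character in place; nil means nothing changed.
def remove_gaps!(seqs)
  changed = false
  seqs.each { |s| changed = true if s.gsub!(GAP_REGEXP, '') }
  changed ? seqs : nil
end

seqs = [+'A-TG.', +'ATG']
first  = remove_gaps!(seqs)  # gaps were found and removed
second = remove_gaps!(seqs)  # nothing left to remove
```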
rstrip!()

Alias for alignment_rstrip!

seq_length()

Alias for alignment_length

Returns the class of the sequences. If the instance variable @seqclass (which can be set with the 'seqclass=' method) is set, simply returns its value. Otherwise, returns the class of the first sequence. If no sequences are found, returns String.

[Source]

# File lib/bio/alignment.rb, line 349
      def seqclass
        if (defined? @seqclass) and @seqclass then
          @seqclass
        else
          klass = nil
          each_seq do |s|
            if s then
              klass = s.class
              break if klass
            end
          end
          (klass or String)
        end
      end

Returns an array of sequence names. The order of the names must be the same as the order of each_seq.

[Source]

# File lib/bio/alignment.rb, line 1324
      def sequence_names
        (0...(self.number_of_sequences)).to_a
      end

slice(*arg)

Alias for alignment_slice

strip!()

Alias for alignment_strip!

subseq(*arg)

Alias for alignment_subseq

window(*arg)

Alias for alignment_window
