The Computer Language Benchmarks Game

regex-dna Ruby JRuby #8 program

source code

# The Computer Language Benchmarks Game
# http://benchmarksgame.alioth.debian.org
#
# contributed by jose fco. gonzalez
# optimized & parallelized by Rick Branson
# optimized & parallelized by Aaron Tavistock

def count_pattern_matches(seq, matchers)
  threads = []
  results = {}
  matchers.each do |matcher|
    threads << Thread.new do
      read, write = IO.pipe
      pid = Process.fork do
        read.close
        count = 0
        seq.scan( Regexp.new(matcher) ) { count += 1 }
        write.print(count)
      end
      Process.wait(pid)
      write.close
      results[matcher] = read.read.to_i
    end
  end
  threads.each { |t| t.join }
  results
end

seq = STDIN.read
origin_len = seq.size

seq.gsub!(/>[^\n]+\n|\n/,'')
clean_len = seq.size

matchers = [
  'agggtaaa|tttaccct',
  '[cgt]gggtaaa|tttaccc[acg]',
  'a[act]ggtaaa|tttacc[agt]t',
  'ag[act]gtaaa|tttac[agt]ct',
  'agg[act]taaa|ttta[agt]cct',
  'aggg[acg]aaa|ttt[cgt]ccct',
  'agggt[cgt]aa|tt[acg]accct',
  'agggta[cgt]a|t[acg]taccct',
  'agggtaa[cgt]|[acg]ttaccct'
]

match_counts = count_pattern_matches(seq, matchers)

replacements = {
  'B' => '(c|g|t)',
  'D' => '(a|g|t)',
  'H' => '(a|c|t)',
  'K' => '(g|t)',
  'M' => '(a|c)',
  'N' => '(a|c|g|t)',
  'R' => '(a|g)',
  'S' => '(c|g)',
  'V' => '(a|c|g)',
  'W' => '(a|t)',
  'Y' => '(c|t)'
}

seq.gsub!(/[BDHKMNRSVWY]/, replacements)

matchers.each do |matcher|
  print "#{matcher} #{match_counts[matcher]}\n"
end
print "\n#{origin_len}\n#{clean_len}\n#{seq.size}\n"
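
To make the three printed lengths concrete, here is a minimal worked example of the same clean, count, and expand steps on a tiny input. The sample string and the two-entry `iupac` table are hypothetical, chosen only for illustration; they are not benchmark data.

```ruby
# Hypothetical miniature FASTA input (not part of the benchmark data).
sample = ">ONE tiny sequence\nagggtaaa\ntttacccB\n"
original_len = sample.size

# Strip the header line and all newlines, as the program above does.
cleaned = sample.gsub(/>[^\n]+\n|\n/, '')
clean_len = cleaned.size

# Count matches of the first variant pattern.
count = cleaned.scan(/agggtaaa|tttaccct/).size

# Expand IUPAC codes through a hash lookup. The character class lists
# exactly the hash keys, because String#gsub replaces a match that has
# no hash entry with the empty string.
iupac = { 'B' => '(c|g|t)', 'N' => '(a|c|g|t)' }
expanded = cleaned.gsub(/[BN]/, iupac)

print "#{count}\n#{original_len}\n#{clean_len}\n#{expanded.size}\n"
```

Here the header and newlines account for the difference between the first two lengths, and the single `B` grows the sequence by six characters when it is replaced by `(c|g|t)`.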
    

notes, command-line, and program output

NOTES:
32-bit Ubuntu one core
jruby 9.1.0.0 (2.3.0) 2016-05-02 a633c63 Java HotSpot(TM) Server VM 25.92-b14 on 1.8.0_92-b14 +jit [linux-i386]



Wed, 04 May 2016 05:16:37 GMT

MAKE:
mv regexdna.jruby-8.jruby regexdna.rb
0.01s to complete and log all make actions

COMMAND LINE:
/usr/local/src/jruby-9.1.0.0/bin/jruby -Xcompile.fastest=true -Xcompile.invokedynamic=true -J-server -J-Xmn512m -J-Xms2048m -J-Xmx2048m regexdna.rb 0 < regexdna-input50000.txt

PROGRAM FAILED 


PROGRAM OUTPUT:

NotImplementedError: fork is not available on this platform
                            fork at org/jruby/RubyKernel.java:1738
                            fork at org/jruby/RubyProcess.java:1547
  block in count_pattern_matches at regexdna.rb:14
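
The failure above comes from Process.fork, which JRuby does not implement. A minimal sketch of a fork-free variant follows, assuming that plain threads are acceptable: JRuby threads map to JVM threads, so the regex scans can likely run in parallel without child processes. The method name `count_pattern_matches_threaded` and the demo data are hypothetical, and this variant has not been measured here.

```ruby
# Sketch: count matches with threads only, so no Process.fork is needed.
# Each thread returns a [matcher, count] pair as its value.
def count_pattern_matches_threaded(seq, matchers)
  threads = matchers.map do |matcher|
    Thread.new do
      count = 0
      seq.scan(Regexp.new(matcher)) { count += 1 }
      [matcher, count]
    end
  end
  # Thread#value joins the thread and returns the block's result.
  threads.map(&:value).to_h
end

# Hypothetical demo input: three 8-character fragments joined together.
demo_seq = 'agggtaaacgggtaaatttaccct'
demo_matchers = ['agggtaaa|tttaccct', '[cgt]gggtaaa|tttaccc[acg]']
demo_counts = count_pattern_matches_threaded(demo_seq, demo_matchers)

demo_matchers.each do |matcher|
  print "#{matcher} #{demo_counts[matcher]}\n"
end
```

On an interpreter with a global lock the threads would mostly take turns, which is presumably why the original program forks; the sketch trades that for portability to platforms without fork.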