The Computer Language Benchmarks Game

regex-redux Ruby #2 program

source code

# The Computer Language Benchmarks Game
# http://benchmarksgame.alioth.debian.org
#
# regex-dna program contributed by jose fco. gonzalez
# optimized & parallelized by Rick Branson
# optimized for ruby2 by Aaron Tavistock
# converted from regex-dna program

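# Read the whole FASTA input and record its original length.
seq = STDIN.readlines.join
ilen = seq.size

# Strip the FASTA header lines and all newlines, then record the cleaned length.
seq.gsub!(/>.*\n|\n/,"")
clen = seq.length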

MATCHERS = [
  /agggtaaa|tttaccct/,
  /[cgt]gggtaaa|tttaccc[acg]/,
  /a[act]ggtaaa|tttacc[agt]t/,
  /ag[act]gtaaa|tttac[agt]ct/,
  /agg[act]taaa|ttta[agt]cct/,
  /aggg[acg]aaa|ttt[cgt]ccct/,
  /agggt[cgt]aa|tt[acg]accct/,
  /agggta[cgt]a|t[acg]taccct/,
  /agggtaa[cgt]|[acg]ttaccct/
]

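# Count the matches of each pattern in its own thread; each thread stores
# its "pattern count" line in a thread-local variable.
threads = MATCHERS.map do |f|
  Thread.new do
    Thread.current[:result] = "#{f.source} #{seq.scan(f).size}"
  end
end

# Wait for every thread to finish before reading its result.
threads.each do |t|
  t.join
end

match_results = threads.map do |t|
  t[:result]
end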

# Apply the replacements in order; each one runs on the result of the previous one.
{
  /tHa[Nt]/ => '<4>',
  /aND|caN|Ha[DS]|WaS/ => '<3>',
  /a[NSt]|BY/ => '<2>',
  /<[^>]*>/ => '|',
  /\|[^|][^|]*\|/ => '-'
}.each { |f,r| seq.gsub!(f,r) }

# Print the match counts, a blank line, then the original, cleaned, and substituted lengths.
puts "#{match_results.join("\n")}\n\n#{ilen}\n#{clen}\n#{seq.length}"
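
The per-pattern counting above can also be sketched with `Thread#value`, which joins a thread and returns the last expression of its block, so the thread-local `:result` slot is not needed. This is only an illustrative variant, not the submitted program: the `sample` string and the two-element `patterns` list are made up for the example and are not the benchmark input.

```ruby
# Hypothetical sample sequence, not the benchmark input.
sample = "agggtaaacattttaccctgggtaaa"

# Two of the nine patterns from the program above.
patterns = [
  /agggtaaa|tttaccct/,
  /[cgt]gggtaaa|tttaccc[acg]/
]

# Start one thread per pattern, then collect each thread's return value.
# Thread#value waits for the thread to finish before returning.
counts = patterns.map do |re|
  Thread.new { "#{re.source} #{sample.scan(re).size}" }
end.map(&:value)

puts counts
```

On this sample the first pattern is found twice and the second once. The result order follows the order of `patterns`, because the values are collected from the thread list rather than in completion order.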
    

notes, command-line, and program output

NOTES:
64-bit Ubuntu quad core
ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]


Mon, 20 Mar 2017 22:59:53 GMT

COMMAND LINE:
/usr/local/src/ruby/bin/ruby -W0 regexredux.yarv-2.yarv 0 < regexredux-input5000000.txt

PROGRAM OUTPUT:
agggtaaa|tttaccct 356
[cgt]gggtaaa|tttaccc[acg] 1250
a[act]ggtaaa|tttacc[agt]t 4252
ag[act]gtaaa|tttac[agt]ct 2894
agg[act]taaa|ttta[agt]cct 5435
aggg[acg]aaa|ttt[cgt]ccct 1537
agggt[cgt]aa|tt[acg]accct 1431
agggta[cgt]a|t[acg]taccct 1608
agggtaa[cgt]|[acg]ttaccct 2178

50833411
50000000
27388361
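
The last three numbers are the input length as read, the length after headers and newlines are stripped, and the length after the ordered replacements. In the run above, 50833411 - 50000000 = 833411 characters were header text and newlines. A minimal sketch of the same three measurements on a made-up, miniature FASTA input (the `input` string is hypothetical, not the benchmark data):

```ruby
# Hypothetical miniature FASTA input, not the benchmark data.
input = ">ONE sample\ntHat\naND\n"
ilen = input.size                   # length as read, header and newlines included

seq = input.gsub(/>.*\n|\n/, "")    # drop the header line and every newline
clen = seq.size

# Same ordered replacements as the program above.
{
  /tHa[Nt]/ => '<4>',
  /aND|caN|Ha[DS]|WaS/ => '<3>',
  /a[NSt]|BY/ => '<2>',
  /<[^>]*>/ => '|',
  /\|[^|][^|]*\|/ => '-'
}.each { |f, r| seq.gsub!(f, r) }

puts ilen, clen, seq.size
```

Here the cleaned sequence `tHataND` first becomes `<4><3>`, and the fourth rule then collapses each bracketed token to `|`, which shows why the order of the replacements matters.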