 performance measurements

Each table row shows performance measurements for this Scala program with a particular command-line input value N.

        N  CPU secs  Elapsed secs  Memory KB  Code B  ≈ CPU Load
   50,000      0.94          0.63     27,624     611  6% 68% 35% 43%
  500,000      5.08          4.07    194,768     611  8% 10% 97% 12%
5,000,000     39.01         35.12    747,980     611  18% 61% 5% 29%

Read the make, command-line, and program output logs below to see how this program was run.

Read the regex-dna benchmark description to see what this program should do.

 notes

java version "1.8.0"
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) Server VM (build 25.0-b70, mixed mode)

Scala compiler version 2.10.3 -- Copyright 2002-2013, LAMP/EPFL

 regex-dna Scala program source code

/* The Computer Language Benchmarks Game
   http://benchmarksgame.alioth.debian.org/
  contributed by Isaac Gouy
  modified and updated for 2.8 by Rex Kerr
*/

import java.io._

object regexdna {
  def main(args: Array[String]) {

    var sequence = readFully()
    val initialLength = sequence.length

    def matching(s: String) = java.util.regex.Pattern.compile(s).matcher(sequence)

    // remove FASTA sequence descriptions and new-lines
    sequence = matching(">.*\n|\n").replaceAll("")
    val codeLength = sequence.length

    // regex match
    Array(
      "agggtaaa|tttaccct",
      "[cgt]gggtaaa|tttaccc[acg]",
      "a[act]ggtaaa|tttacc[agt]t",
      "ag[act]gtaaa|tttac[agt]ct",
      "agg[act]taaa|ttta[agt]cct",
      "aggg[acg]aaa|ttt[cgt]ccct",
      "agggt[cgt]aa|tt[acg]accct",
      "agggta[cgt]a|t[acg]taccct",
      "agggtaa[cgt]|[acg]ttaccct"
    ).map(v => {
      var count = 0
      val m = matching(v)
      while (m.find()) count += 1
      println(v + " " + count)
    })

    // regex substitution
    Array(
      ("B", "(c|g|t)"),
      ("D", "(a|g|t)"),
      ("H", "(a|c|t)"),
      ("K", "(g|t)"),
      ("M", "(a|c)"),
      ("N", "(a|c|g|t)"),
      ("R", "(a|g)"),
      ("S", "(c|g)"),
      ("V", "(a|c|g)"),
      ("W", "(a|t)"),
      ("Y", "(c|t)")
    ).foreach(iub => sequence = matching(iub._1).replaceAll(iub._2) )

    println("\n" + initialLength + "\n" + codeLength + "\n" + sequence.length)
  }

  def readFully() = {
    val block = new Array[Char](10240)
    val buffer = new StringBuffer
    val r = new InputStreamReader(System.in)

    Iterator.
      continually(r.read(block)).
      takeWhile(_ > -1).
      foreach(n => buffer.append(block,0,n))

   r.close
   buffer.toString
  }
}
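
The three numbers the program prints last are, in order, the input length, the length after FASTA descriptions and new-lines are removed, and the length after the IUB substitutions. The following is a minimal sketch of those steps on a tiny in-memory input, not the benchmarked program itself. The object name RegexDnaSketch, the helpers lengths and countMatches, and the sample input are hypothetical, and only two of the eleven substitutions are applied.

```scala
import java.util.regex.Pattern

object RegexDnaSketch {
  // Hypothetical helper: returns (initialLength, codeLength, finalLength)
  // for a FASTA-formatted string, mirroring the order the program prints them.
  def lengths(input: String): (Int, Int, Int) = {
    val initialLength = input.length

    // Same cleanup pattern as the program: drop description lines and new-lines.
    var sequence = Pattern.compile(">.*\n|\n").matcher(input).replaceAll("")
    val codeLength = sequence.length

    // Two of the eleven IUB substitutions, enough to show the sequence growing.
    for ((code, alternatives) <- Seq("B" -> "(c|g|t)", "N" -> "(a|c|g|t)"))
      sequence = Pattern.compile(code).matcher(sequence).replaceAll(alternatives)

    (initialLength, codeLength, sequence.length)
  }

  // Hypothetical helper: counts non-overlapping matches with the same
  // find() loop the program uses.
  def countMatches(pattern: String, sequence: String): Int = {
    val m = Pattern.compile(pattern).matcher(sequence)
    var count = 0
    while (m.find()) count += 1
    count
  }

  def main(args: Array[String]): Unit = {
    // Assumed sample input: one description line and two sequence lines.
    val input = ">ONE\nagggtaaaB\ntttaccctN\n"
    println(lengths(input))                                           // (25,18,32)
    println(countMatches("agggtaaa|tttaccct", "agggtaaaBtttaccctN"))  // 2
  }
}
```

Each IUB code is a single character replaced by a longer alternation, which is why the final length (32 here, 66800214 in the logged run) exceeds the cleaned sequence length.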

 make, command-line, and program output logs

Wed, 19 Mar 2014 07:46:34 GMT

MAKE:
mv regexdna.scala regexdna.scala
mv: ‘regexdna.scala’ and ‘regexdna.scala’ are the same file
make: [regexdna.scala_run] Error 1 (ignored)
/usr/local/src/scala-2.10.3/bin/scalac -optimise -target:jvm-1.7 regexdna.scala
3.80s to complete and log all make actions

COMMAND LINE:
env JAVA_OPTS=-Xmx1024m /usr/local/src/jdk1.8.0/bin/java -server -XX:+TieredCompilation -XX:+AggressiveOpts  -Xbootclasspath/a:/usr/local/src/scala-2.10.3/lib/scala-library.jar:/usr/local/src/scala-2.10.3/lib/akka-actors.jar:/usr/local/src/scala-2.10.3/lib/typesafe-config.jar regexdna 0 < regexdna-input5000000.txt

PROGRAM OUTPUT:
agggtaaa|tttaccct 356
[cgt]gggtaaa|tttaccc[acg] 1250
a[act]ggtaaa|tttacc[agt]t 4252
ag[act]gtaaa|tttac[agt]ct 2894
agg[act]taaa|ttta[agt]cct 5435
aggg[acg]aaa|ttt[cgt]ccct 1537
agggt[cgt]aa|tt[acg]accct 1431
agggta[cgt]a|t[acg]taccct 1608
agggtaa[cgt]|[acg]ttaccct 2178

50833411
50000000
66800214

Revised BSD license
