performance measurements

Each table row shows performance measurements for this Scala program with a particular command-line input value N.

        N  CPU secs  Elapsed secs  Memory KB  Code B  ≈ CPU Load
   50,000      1.56          0.63     39,720     668  79% 69% 55% 47%
  500,000      4.75          2.12    142,332     668  43% 47% 94% 42%
5,000,000     36.34         16.40    864,252     668  46% 68% 41% 68%
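
One way to read the CPU secs and Elapsed secs columns together is to divide one by the other. The sketch below does that for the three rows; it assumes that the four CPU Load percentages are per-core figures for a four-core machine, which the page does not state, and `LoadRatio` and `busyCores` are names made up for this illustration.

```scala
object LoadRatio {
  // (N, CPU secs, elapsed secs) copied from the measurement table
  val rows = Seq(
    (50000, 1.56, 0.63),
    (500000, 4.75, 2.12),
    (5000000, 36.34, 16.40))

  // CPU seconds consumed per elapsed second: a rough estimate of how
  // many cores were busy on average (assumption: a multi-core machine)
  def busyCores(cpuSecs: Double, elapsedSecs: Double): Double =
    cpuSecs / elapsedSecs

  def main(args: Array[String]): Unit =
    for ((n, cpu, elapsed) <- rows)
      println(f"$n%9d  ${busyCores(cpu, elapsed)}%.2f")
}
```

The ratio comes out between about 2.2 and 2.5 for all three rows, which would fit a program that runs its nine pattern counts concurrently but does the header stripping and the IUB replacement on a single thread.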

Read the make, command-line, and program output logs below to see how this program was run.

Read the regex-dna benchmark description to see what this program should do.

 notes

java version "1.8.0"
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) Server VM (build 25.0-b70, mixed mode)

Scala compiler version 2.10.3 -- Copyright 2002-2013, LAMP/EPFL

 regex-dna Scala #2 program source code

/* The Computer Language Benchmarks Game
   http://benchmarksgame.alioth.debian.org/

   Contributed by The Anh Tran
   Updated for 2.8 by Rex Kerr
   Modified by Michael Peng for 2.10
*/

import scala.concurrent.duration.Duration
import java.util.regex.Pattern
import scala.concurrent._
import ExecutionContext.Implicits.global
import scala.io.Source

object regexdna {
  def main(args : Array[String]) {
    // load data from stdin
    var initInput = Source.stdin.mkString
    val init_len = initInput length

    // strip header & newline
    val input = ">.*\n|\n".r replaceAllIn(initInput, "")
    val strip_len = input length

    // counting patterns
    val patterns  = Seq(
      "agggtaaa|tttaccct" ,
      "[cgt]gggtaaa|tttaccc[acg]",
      "a[act]ggtaaa|tttacc[agt]t",
      "ag[act]gtaaa|tttac[agt]ct",
      "agg[act]taaa|ttta[agt]cct",
      "aggg[acg]aaa|ttt[cgt]ccct",
      "agggt[cgt]aa|tt[acg]accct",
      "agggta[cgt]a|t[acg]taccct",
      "agggtaa[cgt]|[acg]ttaccct")

    // queue tasks; each runs as a future on the global ExecutionContext's thread pool
    val count_results  = patterns map( pt =>
      future(
        (pt, pt.r.findAllIn(input).length)
      )
    )

    // replace IUB
    val iub = Map(
      "B" -> "(c|g|t)",
      "D" -> "(a|g|t)",
      "H" -> "(a|c|t)",
      "K" -> "(g|t)",
      "M" -> "(a|c)",
      "N" -> "(a|c|g|t)",
      "R" -> "(a|g)",
      "S" -> "(c|g)",
      "V" -> "(a|c|g)",
      "W" -> "(a|t)",
      "Y" -> "(c|t)")

    val replace_result  = {
      val buffer  = new StringBuffer((input.length * 3) / 2)
      val matcher  = Pattern compile "[BDHKMNRSVWY]" matcher input

      while ( matcher find )
        matcher appendReplacement( buffer, iub(matcher group))

      matcher appendTail buffer
      buffer length
    }

    // print results
    Await.result(Future.sequence(count_results), Duration.Inf) foreach (v => printf("%s %d\n", v._1, v._2))
    printf( "\n%d\n%d\n%d\n", init_len, strip_len, replace_result )
  }
}
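
The three steps of the program above (strip, count, replace) can be tried on a small in-memory string. This is only a sketch: the sample input, the object name `RegexDnaSketch`, and the helper names are made up for illustration, only two of the eleven IUB codes are included, and the counting runs sequentially rather than in futures.

```scala
import java.util.regex.Pattern

object RegexDnaSketch {
  // Strip FASTA header lines (">...") and all newlines, as the program does
  def strip(raw: String): String = ">.*\n|\n".r.replaceAllIn(raw, "")

  // Count non-overlapping matches of one variant pattern
  def count(pattern: String, seq: String): Int =
    pattern.r.findAllIn(seq).length

  // Two of the IUB codes from the program, enough for the sample below
  val iub = Map("B" -> "(c|g|t)", "N" -> "(a|c|g|t)")

  // Replace each IUB code with its alternation, using the same
  // appendReplacement / appendTail loop as the program
  def expand(seq: String): String = {
    val buffer = new StringBuffer
    val matcher = Pattern.compile("[BN]").matcher(seq)
    while (matcher.find())
      matcher.appendReplacement(buffer, iub(matcher.group()))
    matcher.appendTail(buffer)
    buffer.toString
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input: one header line and two sequence lines
    val raw = ">ONE sample\nagggtaaaN\ntttaccctB\n"
    val seq = strip(raw)
    println(seq)
    println(count("agggtaaa|tttaccct", seq))
    println(expand(seq).length)
  }
}
```

The benchmark program reports only lengths for the strip and replace steps, which is why the last line prints the length of the expanded string instead of the string itself.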

 make, command-line, and program output logs

Wed, 19 Mar 2014 07:43:22 GMT

MAKE:
mv regexdna.scala-2.scala regexdna.scala
/usr/local/src/scala-2.10.3/bin/scalac -optimise -target:jvm-1.7 regexdna.scala
warning: there were 5 feature warning(s); re-run with -feature for details
warning: there were 5 inliner warning(s); re-run with -Yinline-warnings for details
two warnings found
3.96s to complete and log all make actions

COMMAND LINE:
env JAVA_OPTS=-Xmx1024m /usr/local/src/jdk1.8.0/bin/java -server -XX:+TieredCompilation -XX:+AggressiveOpts  -Xbootclasspath/a:/usr/local/src/scala-2.10.3/lib/scala-library.jar:/usr/local/src/scala-2.10.3/lib/akka-actors.jar:/usr/local/src/scala-2.10.3/lib/typesafe-config.jar regexdna 0 < regexdna-input5000000.txt

PROGRAM OUTPUT:
agggtaaa|tttaccct 356
[cgt]gggtaaa|tttaccc[acg] 1250
a[act]ggtaaa|tttacc[agt]t 4252
ag[act]gtaaa|tttac[agt]ct 2894
agg[act]taaa|ttta[agt]cct 5435
aggg[acg]aaa|ttt[cgt]ccct 1537
agggt[cgt]aa|tt[acg]accct 1431
agggta[cgt]a|t[acg]taccct 1608
agggtaa[cgt]|[acg]ttaccct 2178

50833411
50000000
66800214

Revised BSD license
