performance measurements

Each table row shows performance measurements for this Scala program with a particular command-line input value N.

        N  CPU secs  Elapsed secs  Memory KB  Code B  ≈ CPU Load
   50,000      1.36          0.57     35,800     668  70% 48% 72% 49%
  500,000      3.76          1.45    132,792     668  79% 73% 52% 57%
5,000,000     24.85          8.96    676,292     668  82% 78% 58% 61%
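
One way to read the table: dividing CPU seconds by elapsed seconds gives a rough estimate of how many cores were busy on average, and that estimate should land close to the sum of the four per-core load percentages. The sketch below only redoes that arithmetic on the three rows; the object and method names (`LoadRatioSketch`, `busyCores`) are illustrative and not part of the benchmark.

```scala
object LoadRatioSketch {
  // (N, CPU seconds, elapsed seconds) copied from the measurement table
  val rows: Seq[(Int, Double, Double)] = Seq(
    (50000, 1.36, 0.57),
    (500000, 3.76, 1.45),
    (5000000, 24.85, 8.96))

  // CPU time divided by wall-clock time: a rough estimate of average busy cores
  def busyCores(cpuSecs: Double, elapsedSecs: Double): Double =
    cpuSecs / elapsedSecs

  def main(args: Array[String]): Unit =
    for ((n, cpu, elapsed) <- rows)
      println(f"N=$n%d  CPU/elapsed=${busyCores(cpu, elapsed)}%.2f")
}
```

For these rows the ratio comes out between roughly 2.4 and 2.8, which suggests the nine pattern counts keep about two to three of the four cores busy.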

Read the make, command-line, and program output logs to see how this program was run.

Read the regex-dna benchmark description to see what this program should do.

 notes

java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

Scala compiler version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL

 regex-dna Scala #2 program source code

/* The Computer Language Benchmarks Game
   http://benchmarksgame.alioth.debian.org/

   Contributed by The Anh Tran
   Updated for 2.8 by Rex Kerr
   Modified by Michael Peng for 2.10
*/

import scala.concurrent.duration.Duration
import java.util.regex.Pattern
import scala.concurrent._
import ExecutionContext.Implicits.global
import scala.io.Source

object regexdna {
  def main(args : Array[String]) {
    // load data from stdin
    var initInput = Source.stdin.mkString
    val init_len = initInput length

    // strip header & newline
    val input = ">.*\n|\n".r replaceAllIn(initInput, "")
    val strip_len = input length

    // counting patterns
    val patterns  = Seq(
      "agggtaaa|tttaccct" ,
      "[cgt]gggtaaa|tttaccc[acg]",
      "a[act]ggtaaa|tttacc[agt]t",
      "ag[act]gtaaa|tttac[agt]ct",
      "agg[act]taaa|ttta[agt]cct",
      "aggg[acg]aaa|ttt[cgt]ccct",
      "agggt[cgt]aa|tt[acg]accct",
      "agggta[cgt]a|t[acg]taccct",
      "agggtaa[cgt]|[acg]ttaccct")

    // queue tasks; each one runs as a future on the global execution context's thread pool
    val count_results  = patterns map( pt =>
      future(
        (pt, pt.r.findAllIn(input).length)
      )
    )

    // replace IUB
    val iub = Map(
      "B" -> "(c|g|t)",
      "D" -> "(a|g|t)",
      "H" -> "(a|c|t)",
      "K" -> "(g|t)",
      "M" -> "(a|c)",
      "N" -> "(a|c|g|t)",
      "R" -> "(a|g)",
      "S" -> "(c|g)",
      "V" -> "(a|c|g)",
      "W" -> "(a|t)",
      "Y" -> "(c|t)")

    val replace_result  = {
      val buffer  = new StringBuffer((input.length * 3) / 2)
      val matcher  = Pattern compile "[BDHKMNRSVWY]" matcher input

      while ( matcher find )
        matcher appendReplacement( buffer, iub(matcher group))

      matcher appendTail buffer
      buffer length
    }

    // print results
    Await.result(Future.sequence(count_results), Duration.Inf) foreach (v => printf("%s %d\n", v._1, v._2))
    printf( "\n%d\n%d\n%d\n", init_len, strip_len, replace_result )
  }
}
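
The make log for this program reports one deprecation warning without naming its cause. A plausible source, stated here as an assumption, is the lowercase `future(...)` call, which Scala 2.11 deprecates in favour of `Future { ... }`. The sketch below shows the same counting step written that way. It is not the measured program, and `FutureCountSketch` and `countAll` are hypothetical names.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object FutureCountSketch {
  // Count the matches of each pattern in its own Future; the global
  // execution context decides which pool thread runs each task.
  def countAll(patterns: Seq[String], input: String): Seq[(String, Int)] = {
    val tasks = patterns.map { pt =>
      Future { (pt, pt.r.findAllIn(input).size) }
    }
    // Future.sequence keeps the results in the order of the patterns
    Await.result(Future.sequence(tasks), Duration.Inf)
  }

  def main(args: Array[String]): Unit = {
    // tiny made-up input, only to exercise the function
    val sample = "agggtaaacctttaccctgg"
    countAll(Seq("agggtaaa|tttaccct", "agggtaa[cgt]|[acg]ttaccct"), sample)
      .foreach { case (pt, n) => println(s"$pt $n") }
  }
}
```

Because `Future.sequence` preserves order, the counts print in the same order as the pattern list regardless of which task finishes first.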

 make, command-line, and program output logs

Tue, 19 May 2015 20:07:14 GMT

MAKE:
mv regexdna.scala-2.scala regexdna.scala
/usr/local/src/scala-2.11.6/bin/scalac -optimise -target:jvm-1.8 regexdna.scala
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar 
warning: there was one deprecation warning; re-run with -deprecation for details
warning: there were 5 feature warnings; re-run with -feature for details
two warnings found
4.96s to complete and log all make actions

COMMAND LINE:
env JAVA_OPTS=-Xmx1024m /usr/local/src/jdk1.8.0_45/bin/java -server -XX:+TieredCompilation -XX:+AggressiveOpts  -Xbootclasspath/a:/usr/local/src/scala-2.11.6/lib/scala-library.jar:/usr/local/src/scala-2.11.6/lib/akka-actors.jar:/usr/local/src/scala-2.11.6/lib/typesafe-config.jar regexdna 0 < regexdna-input5000000.txt

PROGRAM OUTPUT:
agggtaaa|tttaccct 356
[cgt]gggtaaa|tttaccc[acg] 1250
a[act]ggtaaa|tttacc[agt]t 4252
ag[act]gtaaa|tttac[agt]ct 2894
agg[act]taaa|ttta[agt]cct 5435
aggg[acg]aaa|ttt[cgt]ccct 1537
agggt[cgt]aa|tt[acg]accct 1431
agggta[cgt]a|t[acg]taccct 1608
agggtaa[cgt]|[acg]ttaccct 2178

50833411
50000000
66800214

Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar 
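
The three numbers at the end of the program output are the input length as read, the length after headers and newlines are stripped, and the length after each IUB code is expanded. For the run above, stripping removed 50,833,411 - 50,000,000 = 833,411 characters. As a minimal sketch of how those three lengths arise, the code below applies the same two regular expressions to a tiny made-up input. It uses `Regex.replaceAllIn` with a function argument rather than the program's `appendReplacement` loop, and `LengthsSketch` and `lengths` are hypothetical names.

```scala
object LengthsSketch {
  // Same IUB code table as the benchmark program
  val iub: Map[String, String] = Map(
    "B" -> "(c|g|t)", "D" -> "(a|g|t)", "H" -> "(a|c|t)",
    "K" -> "(g|t)",   "M" -> "(a|c)",   "N" -> "(a|c|g|t)",
    "R" -> "(a|g)",   "S" -> "(c|g)",   "V" -> "(a|c|g)",
    "W" -> "(a|t)",   "Y" -> "(c|t)")

  // Returns (initial length, stripped length, length after IUB expansion)
  def lengths(raw: String): (Int, Int, Int) = {
    // drop FASTA header lines and all newlines
    val stripped = ">.*\n|\n".r.replaceAllIn(raw, "")
    // expand each IUB code into its alternation
    val replaced = "[BDHKMNRSVWY]".r.replaceAllIn(stripped, m => iub(m.matched))
    (raw.length, stripped.length, replaced.length)
  }

  def main(args: Array[String]): Unit = {
    // made-up input: one header line and two sequence lines
    val (init, strip, repl) = lengths(">ONE test\nacgtB\nggN\n")
    printf("%d\n%d\n%d\n", init, strip, repl)
  }
}
```

On the sample string the lengths are 20, 8, and 22: the header and three newlines account for 12 characters, and `B` and `N` grow from one character each to 7 and 9.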

Revised BSD license
