performance measurements

Each table row shows performance measurements for this Clojure program with a particular command-line input value N.

N          CPU secs  Elapsed secs  Memory KB  Code B  ≈ CPU Load
50,000     3.34      3.34          75,888     710     2% 1% 0% 100%
500,000    7.52      7.53          279,180    710     0% 1% 1% 100%
5,000,000  Failed                             710

Read the make, command-line, and program output logs below to see how this program was run.

Read the regex-dna benchmark description to see what this program should do.


java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) Server VM (build 25.45-b02, mixed mode)

Clojure 1.7.0

 regex-dna Clojure #3 program source code

;;   The Computer Language Benchmarks Game

;; contributed by Andy Fingerhut

(ns regexdna
  (:gen-class)
  (:require [clojure.string :as str])
  (:import (java.util.regex Pattern)))

;; Slightly modified from standard library slurp so that it can read
;; from standard input.

(defn slurp-std-input
  ;; Reads the standard input using the encoding enc into a string and
  ;; returns it.
  ([] (slurp-std-input (.name (java.nio.charset.Charset/defaultCharset))))
  ([#^String enc]
     (with-open [r (new java.io.BufferedReader *in*)]
       (let [sb (new StringBuilder)]
         (loop [c (.read r)]
           (if (neg? c)
             (str sb)
             (do
               (.append sb (char c))
               (recur (.read r)))))))))

(def dna-seq-regexes '(    "agggtaaa|tttaccct"
		       "agggtaa[cgt]|[acg]ttaccct" ))

(def iub-codes '( [ "B"  "(c|g|t)"   ]
		  [ "D"  "(a|g|t)"   ]
		  [ "H"  "(a|c|t)"   ]
		  [ "K"  "(g|t)"     ]
		  [ "M"  "(a|c)"     ]
		  [ "N"  "(a|c|g|t)" ]
		  [ "R"  "(a|g)"     ]
		  [ "S"  "(c|g)"     ]
		  [ "V"  "(a|c|g)"   ]
		  [ "W"  "(a|t)"     ]
		  [ "Y"  "(c|t)"     ] ))

(defn one-replacement [str [iub-str iub-replacement]]
  (str/replace str (. Pattern (compile iub-str)) iub-replacement))

(defn count-regex-occurrences [re s]
  ;; Prepending (?i) to the regexp in Java makes it
  ;; case-insensitive.
  [re (count (re-seq (. Pattern (compile (str "(?i)" re)))
                     s))])

(defn -main
  [& args]
  (let [content (slurp-std-input)
        original-len (count content)
        ;; I'd prefer if I could use the regexp #"(^>.*)?\n" like the
        ;; Perl benchmark does, but that only matches ^ at the beginning
        ;; of the string, not at the beginning of a line in the middle
        ;; of the string.
        content (str/replace content #"(^>.*|\n>.*)?\n" "")
        dna-seq-only-len (count content)]
    (doseq [[re num-matches] (pmap #(count-regex-occurrences % content)
                                   dna-seq-regexes)]
      (printf "%s %d\n" re num-matches))
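    (let [content (reduce one-replacement content iub-codes)]
      (printf "\n%d\n%d\n%d\n" original-len dna-seq-only-len (count content))))
  ;; printf does not flush *out*, so flush before exiting.
  (flush)
  ;; pmap runs on the agent thread pool; shut it down so the JVM can exit promptly.
  (shutdown-agents))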
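
The comment in -main about ^ can be checked on a small input. The sketch below is not part of the benchmark program: the two-record FASTA string is made up for illustration, and the (?m) variant is only one option that Java's regex syntax offers, with no claim about how it would perform on the benchmark input.

```clojure
(require '[clojure.string :as str])

;; Hypothetical two-record FASTA input, made up for this example.
(def sample ">ONE first record\nacgt\nACGT\n>TWO second record\nggcc\n")

;; Default mode: ^ matches only at the very start of the string, so the
;; second header line is kept and only its trailing newline is removed.
(def no-flag (str/replace sample #"(^>.*)?\n" ""))

;; The program's workaround: also accept a header that follows a newline.
(def workaround (str/replace sample #"(^>.*|\n>.*)?\n" ""))

;; MULTILINE mode: (?m) lets ^ match just after each line terminator too.
(def multiline (str/replace sample #"(?m)(^>.*)?\n" ""))

(println no-flag)     ; acgtACGT>TWO second recordggcc
(println workaround)  ; acgtACGTggcc
(println multiline)   ; acgtACGTggcc
```

On this input the workaround and the (?m) form give the same sequence-only string, while the plain Perl-style pattern leaves the second header in place.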

 make, command-line, and program output logs

Wed, 01 Jul 2015 01:40:46 GMT

mv regexdna.clojure-3.clojure regexdna.clj
/usr/local/src/jdk1.8.0_45/bin/java -Dclojure.compile.path=. -cp .:/usr/local/src/clojure/clojure-1.7.0.jar: clojure.lang.Compile regexdna
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar 
Compiling regexdna to .
1.71s to complete and log all make actions

/usr/local/src/jdk1.8.0_45/bin/java -server -XX:+TieredCompilation -XX:+AggressiveOpts -Xmx512m -cp .:/usr/local/src/clojure/clojure-1.7.0.jar: regexdna 0 < regexdna-input5000000.txt



Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar 
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOfRange(
	at java.lang.StringBuffer.toString(
	at java.util.regex.Matcher.replaceAll(
	at clojure.string$replace.invoke(string.clj:104)
	at regexdna$one_replacement.invoke(regexdna.clj:55)
	at clojure.lang.PersistentList.reduce(
	at clojure.core$reduce.invoke(core.clj:6518)
	at regexdna$_main.doInvoke(regexdna.clj:80)
	at clojure.lang.RestFn.applyTo(
	at regexdna.main(Unknown Source)
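
The trace shows the heap limit being hit inside one-replacement, called from the reduce over iub-codes. As a rough illustration of why that stage is memory-hungry, the sketch below uses reductions to expose every intermediate string that reduce would build. The input string and the three-entry subset of the table are assumptions chosen to keep the output short, and one-replacement here is a simplified stand-in that uses re-pattern.

```clojure
(require '[clojure.string :as str])

;; A three-entry subset of the program's iub-codes table, for illustration.
(def sample-codes [["B" "(c|g|t)"]
                   ["D" "(a|g|t)"]
                   ["N" "(a|c|g|t)"]])

;; Simplified stand-in for the program's one-replacement.
(defn one-replacement [s [iub-str iub-replacement]]
  (str/replace s (re-pattern iub-str) iub-replacement))

;; reductions returns the initial value followed by every intermediate
;; value that reduce would produce along the way.
(def steps (reductions one-replacement "aBcDgN" sample-codes))

(doseq [s steps]
  (println (count s) s))
;; 6 aBcDgN
;; 12 a(c|g|t)cDgN
;; 18 a(c|g|t)c(a|g|t)gN
;; 26 a(c|g|t)c(a|g|t)g(a|c|g|t)
```

Each pass that finds a match yields a complete new string, longer than the one before it, so the full sequence is copied once per matching IUB code rather than once overall.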

Revised BSD license
