The Computer Language
Benchmarks Game

regex-redux Clojure #4 program

source code

;;   The Computer Language Benchmarks Game
;;   http://benchmarksgame.alioth.debian.org/
;;
;;   regex-dna program contributed by: Alex Miller
;;   converted from regex-dna program

(ns regexredux
  (:import [java.io FileInputStream FileDescriptor]
           [java.nio ByteBuffer]
           [java.util.regex Pattern])
  (:gen-class))

(set! *warn-on-reflection* true)
(set! *unchecked-math* true)

(defn -main [& args]
  (let [cin (-> FileDescriptor/in FileInputStream. .getChannel)
        bb (-> cin .size int ByteBuffer/allocate)]
    (.read cin bb)
    (let [input (String. (.array bb) "US-ASCII")
          sequence (.replaceAll input ">.*\n|\n" "")
          replacements (array-map "tHa[Nt]" "<4>"
                                  "aND|caN|Ha[DS]|WaS" "<3>"
                                  "a[NSt]|BY" "<2>"
                                  "<[^>]*>" "|"
                                  "\\|[^|][^|]*\\|" "-")
          buflen (future-call #(let [buf (StringBuffer.)
                                     m (.matcher (Pattern/compile "[WYKMSRBDVHN]") sequence)]
                                 (loop []
                                   (when (.find m)
                                     (.appendReplacement m buf "")
                                     (.append buf (get replacements (.group m)))
                                     (recur)))
                                 (.appendTail m buf)
                                 (.length buf)))
          variants ["agggtaaa|tttaccct", "[cgt]gggtaaa|tttaccc[acg]",
                    "a[act]ggtaaa|tttacc[agt]t", "ag[act]gtaaa|tttac[agt]ct",
                    "agg[act]taaa|ttta[agt]cct", "aggg[acg]aaa|ttt[cgt]ccct",
                    "agggt[cgt]aa|tt[acg]accct", "agggta[cgt]a|t[acg]taccct",
                    "agggtaa[cgt]|[acg]ttaccct"]
          match-fn (fn [^String v]
                     (let [m (.matcher (Pattern/compile v) sequence)]
                       (loop [c 0]
                         (if (.find m)
                           (recur (inc c))
                           [v c]))))
          results (pmap match-fn variants)]
      (doall (for [[variant c] results] (println variant c)))
      (println)
      (println (.length input))
      (println (.length sequence))
      (println @buflen)
      (shutdown-agents))))
    

notes, command-line, and program output

NOTES:
64-bit Ubuntu quad core
Clojure 1.8.0
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)


Mon, 20 Mar 2017 18:41:03 GMT

MAKE:
mv regexredux.clojure-4.clojure regexredux.clj
/usr/local/src/jdk1.8.0_121/bin/java -Dclojure.compiler.direct-linking=true -Dclojure.compile.path=. -cp .:/usr/local/src/clojure/clojure-1.8.0.jar clojure.lang.Compile regexredux
Compiling regexredux to .
1.21s to complete and log all make actions

COMMAND LINE:
/usr/local/src/jdk1.8.0_121/bin/java -Xmx1024m -cp .:/usr/local/src/clojure/clojure-1.8.0.jar regexredux 0 < regexredux-input50000.txt

UNEXPECTED OUTPUT 

13c13
< 599189
---
> 273927

PROGRAM OUTPUT:
agggtaaa|tttaccct 3
[cgt]gggtaaa|tttaccc[acg] 12
a[act]ggtaaa|tttacc[agt]t 43
ag[act]gtaaa|tttac[agt]ct 27
agg[act]taaa|ttta[agt]cct 58
aggg[acg]aaa|ttt[cgt]ccct 16
agggt[cgt]aa|tt[acg]accct 15
agggta[cgt]a|t[acg]taccct 18
agggtaa[cgt]|[acg]ttaccct 20

508411
500000
599189