
 performance measurements

Each table row shows performance measurements for this Perl program with a particular command-line input value N.

        N  CPU secs  Elapsed secs  Memory KB  Code B  ≈ CPU Load
   50,000      0.03          0.06          ?     567  0% 0% 0% 100%
  500,000      0.44          0.46     17,540     567  4% 0% 0% 100%
5,000,000      4.39          4.41    139,804     567  0% 0% 0% 100%

Read the make, command-line, and program output logs below to see how this program was run.

Read the regex-dna benchmark description to see what this program should do.

 notes

This is perl 5, version 18, subversion 0 (v5.18.0) built for i686-linux

Compile-time options: HAS_TIMES PERLIO_LAYERS PERL_DONT_CREATE_GVSV
                        PERL_HASH_FUNC_ONE_AT_A_TIME_HARD PERL_MALLOC_WRAP
                        PERL_PRESERVE_IVUV PERL_SAWAMPERSAND USE_LARGE_FILES
                        USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE
                        USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF

Don't split pattern at |
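
The note above presumably refers to the change credited to Daniel Green in the source header, which moves the alternation out of each regex and counts the two halves separately. The sketch below contrasts the two approaches on a small, made-up string (the sample text and variable names are illustrative assumptions, not benchmark data). The two counts agree here because the matches do not overlap. In general, a single alternation pattern resumes scanning after each match, so the counts can differ when a match of one half overlaps a match of the other.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the benchmark data.
my $dna = 'ccagggtaaaccgtttaccctaaagggtaaatt';

# One pass, with the alternation kept inside the pattern.
my $combined = () = $dna =~ /agggtaaa|tttaccct/gi;

# Two passes, one per alternative, summed in program logic.
my $left  = () = $dna =~ /agggtaaa/gi;
my $right = () = $dna =~ /tttaccct/gi;
my $split = $left + $right;

print "combined=$combined split=$split\n";   # combined=3 split=3
```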

 regex-dna Perl #7 program source code

# The Computer Language Benchmarks Game
# http://benchmarksgame.alioth.debian.org/
# contributed by Danny Sauer
# completely rewritten and cleaned up for speed and fun by Mirco Wahab
# improved STDIN read, regex clean up by Jake Berner
# more speed and multithreading by Andrew Rodland
# moved alternation out of the regexes into the program logic for speed by Daniel Green

use strict;
use warnings;

my $l_file  = -s STDIN;
my $content; read STDIN, $content, $l_file;
# this is significantly faster than using <> in this case

$content =~ s/^>.*//mg;
$content =~ tr/\n//d;
my $l_code  =  length $content;

my @seq = ( ['agggtaaa', 'tttaccct'],
        ['[cgt]gggtaaa', 'tttaccc[acg]'],
        ['a[act]ggtaaa', 'tttacc[agt]t'],
        ['ag[act]gtaaa', 'tttac[agt]ct'],
        ['agg[act]taaa', 'ttta[agt]cct'],
        ['aggg[acg]aaa', 'ttt[cgt]ccct'],
        ['agggt[cgt]aa', 'tt[acg]accct'],
        ['agggta[cgt]a', 't[acg]taccct'],
        ['agggtaa[cgt]', '[acg]ttaccct'] );

my @procs;
for my $s (@seq) {
  my ($pat_l, $pat_r) = (qr/$s->[0]/, qr/$s->[1]/);
  my $pid = open my $fh, '-|';
  defined $pid or die "Error creating process";
  unless ($pid) {
    my $cnt = 0;
    ++$cnt while $content =~ /$pat_l/gi;
    ++$cnt while $content =~ /$pat_r/gi;
    print "$s->[0]|$s->[1] $cnt\n";
    exit 0;
  }
  push @procs, $fh;
}

for my $proc (@procs) {
  print <$proc>;
  close $proc;
}

my %iub = (         B => '(c|g|t)',  D => '(a|g|t)',
  H => '(a|c|t)',   K => '(g|t)',    M => '(a|c)',
  N => '(a|c|g|t)', R => '(a|g)',    S => '(c|g)',
  V => '(a|c|g)',   W => '(a|t)',    Y => '(c|t)' );

# We could cheat here by using $& in the subst and doing it inside a string
# eval to "hide" the fact that we're using $& from the rest of the code... but
# it's only worth 0.4 seconds on my machine.
my $findiub = '(['.(join '', keys %iub).'])';

$content =~ s/$findiub/$iub{$1}/g;

printf "\n%d\n%d\n%d\n", $l_file, $l_code, length $content;
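
The listing above starts one child process per pattern pair with the forking form of open ('-|'), then reads each child's output in creation order. A minimal sketch of that idiom follows, assuming a Unix-like system where fork is available. The sample text and patterns are hypothetical stand-ins for the DNA input.

```perl
use strict;
use warnings;

my $text     = 'abcabcabd';        # hypothetical sample input
my @patterns = ('abc', 'abd');     # hypothetical patterns

my @readers;
for my $p (@patterns) {
    # open with '-|' forks: it returns the child's pid in the parent,
    # 0 in the child, and undef on failure.
    my $pid = open my $fh, '-|';
    defined $pid or die "fork failed: $!";
    if ($pid == 0) {
        # Child: STDOUT is connected to the parent's $fh.
        my $n = () = $text =~ /$p/g;
        print "$p $n\n";
        exit 0;
    }
    push @readers, $fh;
}

# Parent: read the handles in creation order so output order is stable.
my @lines;
for my $fh (@readers) {
    push @lines, <$fh>;
    close $fh;
}
print @lines;   # abc 2, then abd 1
```

Reading the handles in the order they were opened keeps the report deterministic even if the children finish in a different order.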

 make, command-line, and program output logs

Tue, 21 May 2013 20:37:47 GMT

COMMAND LINE:
/usr/local/src/perl-5.18.0_no_ithreads_no_multi/bin/perl regexdna.perl-7.perl 0 < regexdna-input5000000.txt

PROGRAM OUTPUT:
agggtaaa|tttaccct 356
[cgt]gggtaaa|tttaccc[acg] 1250
a[act]ggtaaa|tttacc[agt]t 4252
ag[act]gtaaa|tttac[agt]ct 2894
agg[act]taaa|ttta[agt]cct 5435
aggg[acg]aaa|ttt[cgt]ccct 1537
agggt[cgt]aa|tt[acg]accct 1431
agggta[cgt]a|t[acg]taccct 1608
agggtaa[cgt]|[acg]ttaccct 2178

50833411
50000000
66800214
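
According to the printf call in the source, the three numbers above are the input size in bytes, the sequence length after header lines and newlines are stripped, and the length after IUB codes are expanded. The third number is larger than the second because each one-letter code becomes a parenthesized alternation. The sketch below shows that growth on a short, made-up string (the sample and the reduced table are illustrative assumptions).

```perl
use strict;
use warnings;

# Reduced, hypothetical subset of the IUB table from the program source.
my %iub = ( B => '(c|g|t)', N => '(a|c|g|t)' );

# Build a capturing character class from the keys, as the program does.
my $findiub = '([' . join('', sort keys %iub) . '])';

my $sample = 'aBcN';                       # hypothetical input, length 4
(my $expanded = $sample) =~ s/$findiub/$iub{$1}/g;

print "$expanded\n";                       # a(c|g|t)c(a|c|g|t)
print length($sample), ' -> ', length($expanded), "\n";   # 4 -> 18
```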

Revised BSD license
