The Computer Language Benchmarks Game

regex-dna description

(Not included in summary comparisons)

Variance

Some language implementations have regex built-in; some provide a regex library; some use a third-party regex library. Some – coincidentally – reduce this work to substring matching. (Remember Hennessy and Patterson's warning.)

Different libraries very likely implement different regex algorithms.

The work

The work is to use the same simple regex patterns and actions to manipulate FASTA format data. Don't optimize away the work.
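As a rough illustration of the kind of work described above, here is a minimal Python sketch. It is not a contributed program or a reference implementation. The sample data, the function names `strip_fasta` and `count_matches`, and the single pattern shown are assumptions chosen for illustration. The sketch keeps the work inside the regex engine rather than reducing it to substring matching.

```python
import re

# Hypothetical sample in FASTA format: a description line starting with '>'
# followed by sequence lines.
SAMPLE = (
    ">ONE sample description\n"
    "GGCCGGGCGCG\n"
    "agggtaaaCC\n"
    ">TWO another description\n"
    "tttaccctAA\n"
)


def strip_fasta(text: str) -> str:
    """Remove description lines and all linefeeds with one regex replace."""
    return re.sub(r">[^\n]*\n|\n", "", text)


def count_matches(pattern: str, sequence: str) -> int:
    """Count non-overlapping matches of one regex pattern in the sequence."""
    return len(re.findall(pattern, sequence))


if __name__ == "__main__":
    sequence = strip_fasta(SAMPLE)
    # Illustrative pattern (an assumption): an 8-mer and its reverse complement.
    pattern = "agggtaaa|tttaccct"
    print(len(SAMPLE), len(sequence))
    print(pattern, count_matches(pattern, sequence))
```

Each pattern is applied one at a time through the regex library, so the measured cost is that of the regex implementation itself.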

How to implement

We ask that contributed programs not only give the correct result, but also use the same algorithm to calculate that result.

Each program should:

Before you contribute your program, diff its output for this 10KB input file (generated with the fasta program, N = 1000) against this output file to check that the output has the correct format.
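The output check can also be scripted. The sketch below is one possible way to do it in Python with the standard `difflib` module. The function name `diff_output` is hypothetical, and the two strings are assumed to be the program's captured output and the contents of the reference output file.

```python
import difflib
from typing import List


def diff_output(actual: str, expected: str) -> List[str]:
    """Return unified-diff lines; an empty list means the outputs match."""
    return list(
        difflib.unified_diff(
            expected.splitlines(keepends=True),
            actual.splitlines(keepends=True),
            fromfile="expected",
            tofile="actual",
        )
    )
```

An empty result means the program output matches the reference exactly. Otherwise the returned lines show where the format or the values differ.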

Generate a larger input file (using one of the fasta programs with command line arguments: 5000000 > input5000000.txt) to check program performance.