Please don't optimize the cumulative-probabilities lookup (for example, by using a scaling factor) or the naïve LCG arithmetic; programs that do will not be accepted.
How to implement
We ask that contributed programs not only give the correct result, but also use the same algorithm to calculate that result.
Each program should:
generate DNA sequences, by copying from a given sequence
generate DNA sequences, by weighted random selection from 2 alphabets
convert the expected probability of selecting each nucleotide into cumulative probabilities
match a random number against those cumulative probabilities to select each nucleotide (use linear search or binary search)
use this naïve linear congruential generator to calculate a random number each time a nucleotide needs to be selected (don't cache the random number sequence)
IM = 139968
IA = 3877
IC = 29573
Seed = 42

Random (Max)
   Seed = (Seed * IA + IC) modulo IM
   = Max * Seed / IM
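The steps listed above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the class and function names are made up for this example, and the three-letter alphabet in the usage note is a hypothetical stand-in for the real alphabets. The lookup uses a plain linear search and the generator is advanced once per nucleotide, with no scaling and no cached random numbers.

```python
# Constants of the naive linear congruential generator from the task text.
IM = 139968
IA = 3877
IC = 29573


class NaiveLCG:
    """Hypothetical wrapper that holds the Seed state (starts at 42)."""

    def __init__(self, seed=42):
        self.seed = seed

    def random(self, max_value):
        # Seed = (Seed * IA + IC) modulo IM, then scale into [0, max_value).
        self.seed = (self.seed * IA + IC) % IM
        return max_value * self.seed / IM


def cumulative(alphabet):
    """Turn (symbol, probability) pairs into (symbol, cumulative) pairs."""
    total = 0.0
    table = []
    for symbol, probability in alphabet:
        total += probability
        table.append((symbol, total))
    return table


def select(table, r):
    """Linear search: first symbol whose cumulative probability exceeds r."""
    for symbol, bound in table:
        if r < bound:
            return symbol
    # Assumption: fall back to the last symbol if rounding leaves the
    # final cumulative value slightly below 1.0.
    return table[-1][0]


def random_sequence(alphabet, n, rng):
    """Pick n nucleotides, calling the generator once for each one."""
    table = cumulative(alphabet)
    return "".join(select(table, rng.random(1.0)) for _ in range(n))


def repeat_sequence(source, n):
    """Produce n characters by copying cyclically from a given sequence."""
    return "".join(source[i % len(source)] for i in range(n))
```

For example, with the hypothetical alphabet `[("a", 0.5), ("c", 0.3), ("g", 0.2)]`, calling `random_sequence(alphabet, 2, NaiveLCG())` draws the seeds 52439 and 102040, which scale to roughly 0.375 and 0.729 and therefore select "a" and then "c".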
Diff your program's output for N = 1000 against this 10KB output file to check that the output has the correct format before you contribute your program.
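If a diff tool is not at hand, the same check can be approximated with a short script. This is only a sketch: the function name is hypothetical, and the file names mentioned in the comments are placeholders for wherever you saved the two outputs.

```python
from itertools import zip_longest


def first_difference(actual_lines, expected_lines):
    """Return the 1-based number of the first differing line, or None.

    Both arguments are iterables of lines, for example two open text
    files. A missing line on either side counts as a difference.
    """
    pairs = zip_longest(actual_lines, expected_lines)
    for number, (actual, expected) in enumerate(pairs, start=1):
        if actual != expected:
            return number
    return None


# Possible usage (placeholder file names, not part of the task):
#   with open("my-output.txt") as mine, open("expected-output.txt") as ref:
#       print(first_difference(mine, ref))
```

A result of `None` means the two outputs match line for line; any other value is the first line to inspect.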
Use a larger command line argument (25000000) to check program performance.