Ubuntu : Intel® Q6600® one core
Each table row shows performance measurements for this Python 3 program with a particular command-line input value N.
| N | CPU secs | Elapsed secs | Memory KB | Code B | ≈ CPU Load |
|---|---|---|---|---|---|
| 250,000 | 1.31 | 1.47 | 44,896 | 624 | 1% 0% 1% 93% |
| 2,500,000 | 11.81 | 12.01 | 52,848 | 624 | 1% 0% 0% 100% |
| 25,000,000 | 123.50 | 124.75 | 475,820 | 624 | 1% 0% 0% 100% |
Read the ↓ make, command line, and program output logs to see how this program was run.
Read k-nucleotide benchmark to see what this program should do.
Python 3.3.1 (default, Apr 11 2013, 12:56:47) [GCC 4.7.2] on linux
Notice that the sought sequences (GGT, GGTA, GGTATT, GGTATTTTAATT, GGTATTTTAATTTATAGT) never overlap, so a new function find_seq() counts them with the built-in str.count() method. The previous program iterated over all the data in sort_seq(), character by character.
Improved gen_freq(): removed the "if" statement and used defaultdict().
BUT it skips the sequence generation in the sequence finder step:

    def find_seq(seq, nucleo):
        count = seq.count(nucleo)
        return nucleo, count
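The "never overlap" remark above can be checked with a small sketch. It assumes that "overlap" means an occurrence of a fragment overlapping another occurrence of the same fragment; the helper names `count_overlapping` and `can_self_overlap` and the short test string are hypothetical and are not part of the submitted program. `str.count()` returns the number of non-overlapping occurrences, which equals the sliding-window count only when a fragment cannot overlap itself.

```python
def count_overlapping(seq, fragment):
    """Count every window of len(fragment) in seq that equals fragment."""
    k = len(fragment)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == fragment)


def can_self_overlap(fragment):
    """Return True if some proper prefix of fragment is also a suffix of it."""
    return any(fragment[:i] == fragment[-i:] for i in range(1, len(fragment)))


# Hypothetical short sequence, used only for illustration.
seq = "GGTATTGGTAGGTGGTATT"
fragments = "GGT GGTA GGTATT GGTATTTTAATT GGTATTTTAATTTATAGT".split()

for fragment in fragments:
    # None of the sought fragments can overlap itself ...
    assert not can_self_overlap(fragment)
    # ... so str.count() agrees with the window-by-window count.
    assert seq.count(fragment) == count_overlapping(seq, fragment)

# Counter-example: a fragment that can overlap itself is undercounted.
assert can_self_overlap("AA")
assert "AAAA".count("AA") == 2
assert count_overlapping("AAAA", "AA") == 3
```

The shortcut is therefore valid for these five fragments, but it would not be a general replacement for window-by-window counting.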
    # The Computer Language Benchmarks Game
    # http://benchmarksgame.alioth.debian.org/
    #
    # submitted by Ian Osgood
    # modified by Sokolov Yura
    # modified by bearophile
    # modified by jacek2v: few changes in algorytm, added multiprocessing, used str.count (nucleo newer overlapping)

    from sys import stdin
    from collections import defaultdict
    from multiprocessing import Process, Pool, cpu_count

    def gen_freq(seq, frame):
        frequences = defaultdict(int)
        ns = len(seq) + 1 - frame
        for ii in range(ns):
            frequences[seq[ii:ii + frame]] += 1
        return ns, frequences

    def sort_seq(seq, length):
        n, frequences = gen_freq(seq, length)
        #l = sorted(frequences.items(), reverse=True, key=lambda (seq,freq): (freq,seq))
        l = sorted(list(frequences.items()), reverse=True, key=lambda seq_freq: (seq_freq[1],seq_freq[0]))
        return [(st, 100.0*fr/n) for st, fr in l]

    def find_seq(seq, nucleo):
        count = seq.count(nucleo)
        return nucleo, count

    def load():
        for line in stdin:
            if line[0:3] == ">TH":
                break
        seq = []
        for line in stdin:
            if line[0] in ">;":
                break
            seq.append( line[:-1] )
        return seq

    def main():
        nucleos = "GGT GGTA GGTATT GGTATTTTAATT GGTATTTTAATTTATAGT"
        sequence = "".join(load()).upper()
        plres = []
        pl = Pool(processes=cpu_count() + 1)

        for nl in 1,2:
            plres.append(pl.apply_async(sort_seq, (sequence, nl, )))
        for se in nucleos.split():
            plres.append(pl.apply_async(find_seq, (sequence, se, )))
        pl.close()
        pl.join()

        for ii in 0,1:
            print('\n'.join("%s %.3f" % (st, fr) for st,fr in plres[ii].get()))
            print('')
        for ii in range(2, len(nucleos.split()) + 2):
            print("%d\t%s" % (plres[ii].get()[1], plres[ii].get()[0]))

    main()
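The gen_freq() change mentioned in the notes (dropping the "if" in favour of defaultdict) can be illustrated with a minimal sketch. The earlier version is not shown on this page, so `gen_freq_with_if` below is only a guess at what an "if"-based counter might have looked like; `gen_freq` mirrors the listing above, and the input string is made up.

```python
from collections import defaultdict


def gen_freq_with_if(seq, frame):
    """Hypothetical 'before' version: explicit membership test per window."""
    frequences = {}
    ns = len(seq) + 1 - frame  # number of windows of width `frame`
    for ii in range(ns):
        key = seq[ii:ii + frame]
        if key in frequences:
            frequences[key] += 1
        else:
            frequences[key] = 1
    return ns, frequences


def gen_freq(seq, frame):
    """Version from the listing: defaultdict(int) starts missing keys at 0."""
    frequences = defaultdict(int)
    ns = len(seq) + 1 - frame
    for ii in range(ns):
        frequences[seq[ii:ii + frame]] += 1
    return ns, frequences


# Both variants produce the same window count and the same frequencies.
for frame in (1, 2):
    assert gen_freq("GGTATT", frame) == gen_freq_with_if("GGTATT", frame)

ns, freq = gen_freq("GGTATT", 1)
print(ns, dict(freq))
```

Both variants do the same work per window; the defaultdict form simply moves the "is this key new?" check out of the Python-level loop body.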
    Thu, 11 Apr 2013 20:47:24 GMT

    MAKE:
    mv knucleotide.python3-2.python3 knucleotide.python3-2.py
    0.01s to complete and log all make actions

    COMMAND LINE:
    /usr/local/src/Python-3.3.1/bin/python3.3 knucleotide.python3-2.py 0 < knucleotide-input25000000.txt

    PROGRAM OUTPUT:
    A 30.295
    T 30.151
    C 19.800
    G 19.754

    AA 9.177
    TA 9.132
    AT 9.131
    TT 9.091
    CA 6.002
    AC 6.001
    AG 5.987
    GA 5.984
    CT 5.971
    TC 5.971
    GT 5.957
    TG 5.956
    CC 3.917
    GC 3.911
    CG 3.909
    GG 3.902

    1471758	GGT
    446535	GGTA
    47336	GGTATT
    893	GGTATTTTAATT
    893	GGTATTTTAATTTATAGT