Having just built a solver for KenKen puzzles, this is a good opportunity to take a look at performance tuning in Python. Today we’ll look at some basic instrumentation, and begin searching for bottlenecks in the solver.
Preliminaries
For the purposes of this post, I’m assuming that:
* The solver code is stored in a file named neknek_1.py
* The neknek_1.py file is on the import path
* A 9×9 test puzzle is stored in a file named 9×9.txt
* The 9×9.txt file is in the current working directory
Here is the 9×9 test puzzle I’ll be using for performance measurement:
# 9
* 240 A1 B1 A2
+ 12 B2 A3 B3
! 9 A4
- 5 A5 A6
* 72 A7 B7
+ 7 A8 B8 C8
/ 2 A9 B9
* 56 C3 B4 C4
+ 11 B5 B6
/ 2 C1 C2
+ 10 C5 C6 D6
* 42 C7 D7 D8
+ 27 C9 D9 E9 F9 G9
/ 2 D1 E1
+ 19 D2 D3 D4
+ 16 D5 E5 F5
! 3 E2
- 1 E3 E4
+ 24 E6 E7 E8
* 252 F1 G1 G2 G3
+ 6 F2 F3 F4
* 216 F6 G5 G6
+ 15 F7 G7 F8
+ 16 G4 H4 I4
* 90 G8 H8 I8
+ 23 H1 I1 I2
+ 15 H2 H3 I3
- 3 H5 H6
* 12 H7 I7
- 7 H9 I9
- 2 I5 I6
Timeit
The simplest Python tuning tool is probably the timeit
module. From PythonWin it would be invoked this way:
>>> import timeit
>>> timeit.Timer('neknek_1.solve(neknek_1.Puzzle("9x9.txt"))', 'import neknek_1').timeit(number=10)/10
This code will load and solve the 9×9 puzzle 10 times, and return the average time (as a float, in seconds) that a single load-and-solve loop took.
Timeit is a crude but effective way to see the impact of optimizations on a macro level. On my machine, at the moment, it returns an estimate of 0.45s per 9×9 puzzle. (Note that this is significantly lower than what I observed yesterday, indicating that some care must be used when evaluating the results of this module.)
Hotshot
The hotshot
module a true profiler; it will log individual function entry/exit information during program execution, writing the results to a file. The output file can be parsed by tools packaged with the hotshot
module. From PythonWin, the hotshot
profiler would be invoked this way:
>>> import hotshot
>>> import neknek_1
>>> prof = hotshot.Profile('hotshot_nn_stats')
>>> prof.run('neknek_1.solve(neknek_1.Puzzle("9x9.txt"))').close()
To view a summary of the profiler’s results, use these commands:
>>> import hotshot.stats
>>> hotshot.stats.load('hotshot_nn_stats').strip_dirs().sort_stats('time').print_stats()
which will output:
297518 function calls (36546 primitive calls) in 3111.408 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
133583/11855 1221.719 0.009 2028.554 0.171 neknek_1.py:18()
115588/4363 923.962 0.008 2145.681 0.492 neknek_1.py:15(can_make_sum_p)
44 207.178 4.709 3086.981 70.159 neknek_1.py:112(constrain)
14470/1329 150.414 0.010 283.972 0.214 neknek_1.py:21(can_make_product_p)
18308/3559 133.558 0.007 247.684 0.070 neknek_1.py:24()
640 131.804 0.206 2760.608 4.313 neknek_1.py:44(apply)
1000 118.984 0.119 118.984 0.119 neknek_1.py:78(apply)
2450 59.254 0.024 2570.519 1.049 neknek_1.py:48()
2528 58.285 0.023 58.285 0.023 neknek_1.py:45()
4378 54.718 0.012 2200.400 0.503 neknek_1.py:56(_test_component)
1392 18.611 0.013 302.583 0.217 neknek_1.py:67(_test_component)
44/1 7.363 0.167 2112.332 2112.332 neknek_1.py:132(search)
1213 6.575 0.005 6.575 0.005 neknek_1.py:137()
910 5.532 0.006 5.532 0.006 neknek_1.py:63(_test_component)
66/2 2.864 0.043 2111.620 1055.810 neknek_1.py:139()
288 2.750 0.010 2.750 0.010 neknek_1.py:74(_test_component)
1 2.416 2.416 3109.565 3109.565 neknek_1.py:100(solve)
1 1.215 1.215 1.740 1.740 neknek_1.py:91(__init__)
... clipped ...
Initial Observations
The first thing to notice is the absolutely stupendous amount of time devoted to can_make_sum_p()
– nearly 2/3rds of the execution time is devoted to this single utility function. If you recall, it’s source code is:
# Can we select exactly one member from each set s.t. the sum of all selected members is t?
def can_make_sum_p(t, sets):
if (not sets): return (t == 0)
head = sets[0]; tail = sets[1:]
return any(can_make_sum_p(t-e, tail) for e in head if e <= t)
Let's see if we can speed this up a little.
Base Case
The simplest thing to address is the base case. Right now, the base case of the recursive call is reached when sets
is empty; at that point, the function returns True
iff t
equals 0. This is a bit silly; an (almost) equally good base case is reached when sets
contains only one iterable; the function should return True
iff t
is a member of this iterable. The code:
# Can we select exactly one member from each set s.t. the sum of all selected members is t?
def can_make_sum_p(t, sets):
if (len(sets) == 1): return (t in sets[0])
head = sets[0]; tail = sets[1:]
return any(can_make_sum_p(t-e, tail) for e in head if e <= t)
With this single change, timeit
now reports a per-loop time of 0.32s, and the hotshot
profiler reports these results:
94797 function calls (35465 primitive calls) in 1377.959 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
21749/4177 282.142 0.013 456.155 0.109 neknek_1.py:15(can_make_sum_p)
44 179.129 4.071 1354.233 30.778 neknek_1.py:112(constrain)
24660/11003 174.013 0.007 340.939 0.031 neknek_1.py:18()
14592/1357 160.274 0.011 308.033 0.227 neknek_1.py:21(can_make_product_p)
18409/3670 147.759 0.008 271.829 0.074 neknek_1.py:24()
635 121.681 0.192 1072.522 1.689 neknek_1.py:44(apply)
1039 102.325 0.098 102.325 0.098 neknek_1.py:78(apply)
2424 57.304 0.024 897.517 0.370 neknek_1.py:48()
2489 53.324 0.021 53.324 0.021 neknek_1.py:45()
4192 49.542 0.012 505.696 0.121 neknek_1.py:56(_test_component)
1422 18.537 0.013 326.570 0.230 neknek_1.py:67(_test_component)
44/1 7.163 0.163 1072.548 1072.548 neknek_1.py:132(search)
1213 6.361 0.005 6.361 0.005 neknek_1.py:137()
947 5.310 0.006 5.310 0.006 neknek_1.py:63(_test_component)
256 2.637 0.010 2.637 0.010 neknek_1.py:74(_test_component)
66/2 2.626 0.040 1071.892 535.946 neknek_1.py:139()
1 2.513 2.513 1376.102 1376.102 neknek_1.py:100(solve)
1 1.228 1.228 1.761 1.761 neknek_1.py:91(__init__)
... clipped ...
As you can see, changing the base case results in an ~84% reduction in the calls to can_make_sum_p()
, and an ~29% reduction in execution time.
Coming Attractions
Tomorrow we'll explore further optimizations, using the same tools.