Let’s continue tuning our KenKen puzzle solver. When we left off yesterday, we’d cut execution time by 30% with a one-line change to the program, but there is still much room for improvement.
Recap
After our last change, the top few items in the hotshot
stats looked like this:
94797 function calls (35465 primitive calls) in 1377.959 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
21749/4177 282.142 0.013 456.155 0.109 neknek_1.py:15(can_make_sum_p)
44 179.129 4.071 1354.233 30.778 neknek_1.py:112(constrain)
24660/11003 174.013 0.007 340.939 0.031 neknek_1.py:18()
14592/1357 160.274 0.011 308.033 0.227 neknek_1.py:21(can_make_product_p)
18409/3670 147.759 0.008 271.829 0.074 neknek_1.py:24()
635 121.681 0.192 1072.522 1.689 neknek_1.py:44(apply)
... clipped ...
Memoization
The can_make_sum_p()
function is still consuming the plurality of the time. My guess is that it’s called with fairly repetitive arguments, so it’s a good candidate for memoization. The natural (quick-and-dirty) way to code this would be:
# Can we select exactly one member from each set s.t. the sum of all selected members is t?
d_sum_queries = {}
def can_make_sum_p(t, sets):
if (len(sets) == 1): return (t in sets[0])
r = d_sum_queries.get((t, sets))
if (r == None):
head = sets[0]; tail = sets[1:]
r = any(can_make_sum_p(t-e, tail) for e in head if e <= t)
d_sum_queries[(t, sets)] = r
return r
However, this doesn't work. The sets
argument is a list of lists, and lists aren't hashable. We'll need to revise Constraint.apply()
:
class Constraint(object):
# ... snip ...
def apply(self, solution):
d_sets = dict((c, tuple(map(int, solution[c]))) for c in self.cells); d_bad = {}
for k,v in d_sets.items():
others = tuple(ov for ok,ov in d_sets.items() if ok != k)
d_bad[k] = ''.join(str(e) for e in v if not self._test_component(e, others))
return d_bad
We should also clear the memoization dictionary from time to time; A good time to do so seems to be when the solver is first invoked:
def solve(puzzle):
# ... snip ...
# Solve
d_sum_queries = {}
symbols = string.digits[1:1+puzzle.size]
return search(constrain(dict((c,symbols) for c in d_constraints.keys()), *puzzle.cages))
Timeit reports that these changes save around 0.03s per loop (a change from ~0.29s/loop before the change to ~0.26s/loop afterwards). This is a gain of a further 10%. The hotshot
stats tell a similarly encouraging story:
53889 function calls (29397 primitive calls) in 901.370 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
44 166.225 3.778 879.568 19.990 neknek_1.py:117(constrain)
613 159.227 0.260 615.020 1.003 neknek_1.py:49(apply)
12704/1182 116.667 0.009 221.817 0.188 neknek_1.py:26(can_make_product_p)
16039/3214 105.150 0.007 194.698 0.061 neknek_1.py:29()
1028 98.114 0.095 98.114 0.095 neknek_1.py:83(apply)
2397 53.202 0.022 53.202 0.022 neknek_1.py:50()
2333 49.252 0.021 378.650 0.162 neknek_1.py:53()
4368/4352 44.640 0.010 44.814 0.010 neknek_1.py:16(can_make_sum_p)
4366 41.279 0.009 86.093 0.020 neknek_1.py:61(_test_component)
5608 23.941 0.004 23.941 0.004 neknek_1.py:52()
1247 13.506 0.011 235.324 0.189 neknek_1.py:72(_test_component)
44/1 6.579 0.150 748.179 748.179 neknek_1.py:137(search)
898 5.546 0.006 5.546 0.006 neknek_1.py:68(_test_component)
1213 5.536 0.005 5.536 0.005 neknek_1.py:142()
66/2 2.452 0.037 747.517 373.758 neknek_1.py:144()
278 2.435 0.009 2.435 0.009 neknek_1.py:79(_test_component)
1 2.338 2.338 899.578 899.578 neknek_1.py:105(solve)
1 1.144 1.144 1.685 1.685 neknek_1.py:96(__init__)
... clipped ...
Generalization
Since can_make_product_p()
is also consuming a lot of time, and since it will yield to exactly the same techniques applied to can_make_sum_p()
, let's change its base case, and memoize it as well:
# Can we select exactly one member from each set s.t. the product of all selected members is t?
d_prod_queries = {}
def can_make_product_p(t, sets):
if (len(sets) == 1): return (t in sets[0])
r = d_prod_queries.get((t, sets))
if (r == None):
head = sets[0]; tail = sets[1:]
r = any(can_make_product_p(t/e, tail) for e in head if not t%e)
d_prod_queries[(t, sets)] = r
return r
We should also clear the product memoization dictionary when appropriate:
def solve(puzzle):
# ... snip ...
# Solve
d_sum_queries = {}
d_prod_queries = {}
symbols = string.digits[1:1+puzzle.size]
return search(constrain(dict((c,symbols) for c in d_constraints.keys()), *puzzle.cages))
Timeit reports that these changes save around another 0.03s per loop (a change from ~0.26s/loop before the change to ~0.23s/loop afterwards). This is a gain of a further 11%. The hotshot
stats tell a similarly encouraging story:
27560 function calls (27332 primitive calls) in 682.838 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
638 163.215 0.256 415.008 0.650 neknek_1.py:54(apply)
44 156.274 3.552 661.793 15.041 neknek_1.py:122(constrain)
959 90.298 0.094 90.298 0.094 neknek_1.py:88(apply)
2526 54.629 0.022 54.629 0.022 neknek_1.py:55()
2437 49.478 0.020 171.874 0.071 neknek_1.py:58()
4637/4588 46.830 0.010 47.337 0.010 neknek_1.py:16(can_make_sum_p)
4601 42.676 0.009 90.013 0.020 neknek_1.py:66(_test_component)
6086 25.290 0.004 25.290 0.004 neknek_1.py:57()
1249 12.826 0.010 25.222 0.020 neknek_1.py:77(_test_component)
1188 12.396 0.010 12.396 0.010 neknek_1.py:27(can_make_product_p)
44/1 6.278 0.143 608.917 608.917 neknek_1.py:142(search)
1213 5.042 0.004 5.042 0.004 neknek_1.py:147()
897 4.843 0.005 4.843 0.005 neknek_1.py:73(_test_component)
66/2 2.379 0.036 608.273 304.137 neknek_1.py:149()
1 2.337 2.337 680.882 680.882 neknek_1.py:110(solve)
275 2.317 0.008 2.317 0.008 neknek_1.py:84(_test_component)
1 1.301 1.301 1.847 1.847 neknek_1.py:101(__init__)
... clipped ...
Safety
At this point, we should attend to a minor corner case. Since we changed the base cases of can_make_sum_p()
and can_make_product_p()
from 0 to 1, these functions can no longer be safely called with empty sets
arguments. In turn, this means that they can no longer be used by Sum
and Prod
constraints applied to only one cell. (Such constraints would invoke _test_component()
with an empty context.) Therefore, let's code the Sum
and Prod
constructors to throw exceptions if these constraints aren't being applied to at least two cells:
class Sum(Constraint):
def __init__(self, value, *cells):
Constraint.__init__(self, value, *cells)
if (len(cells) < 2): raise Exception('Sum constraints must be applied to 2 or more cells')
def _test_component(self, component, context):
return (self.value>=component) and can_make_sum_p(self.value-component, context)
class Prod(Constraint):
def __init__(self, value, *cells):
Constraint.__init__(self, value, *cells)
if (len(cells) < 2): raise Exception('Prod constraints must be applied to 2 or more cells')
def _test_component(self, component, context):
return (not self.value%component) and can_make_product_p(self.value/component, context)
Coming Attractions
Tomorrow we'll consider some slightly larger-scale optimizations. Until then, you can view the latest version of the solver.