How can two ordered lists be synchronized with the minimum of insert and delete operations, given that only those operations are allowed? Let’s take a cut at this problem, using Python lists for illustration.
(Editorial note: There’s probably an elegant, well-known solution to this problem, but I can’t find it right now.)
Terms
Let’s clarify the problem a bit: Our goal is to produce one or more lists of operations that, when applied to a first, subject list will render it identical to a second, target list. Both the subject and target lists are ordered, and may contain duplicates. The subject and target lists may be of different lengths.
Representation
To begin with, let’s set out how we’ll represent the results of our algorithm: the insert and delete operations to be performed.
An insert will be represented as an (index, value)
tuple, where index
represents the final position of the element after synchronization. I.e., if we need to insert a 5
to create the final list [1, 3, 5, 7]
, the insert would always be represented by the tuple (2, 5)
, irrespective of any other operations to be performed during the synchronization.
A delete will be represented as an index
, representing the original position of the element before synchronization. I.e., if we need to remove a 4
from the original list [2, 4, 6, 8]
, the delete would always be represented by the index 1
, irrespective of any other operations to be performed during the synchronization.
Obviously, operations can’t be applied in an arbitrary order. The order in which the operations produced by this algorithm are intended to be applied is:
- First, all deletes, ordered from the highest to the lowest index
- Then, all inserts, ordered from the lowest to the highest index
Code
Here’s the algorithm in its entirety:
def sync_ordered_list(a, b):
# a is the subject list
# b is the target list
# x is the "current position" in the subject list
# y is the "current position" in the target list
# i is the list of inserts
# d is the list of deletes
# Initialize iterators and accumulators
x = 0; y = 0; i = []; d = []
# While there are unexamined elements in either the subject or target list:
while (x < len(a)) or (y < len(b)):
# If the target list is exhausted,
# delete the current element from the subject list
if y >= len(b): d.append(x); x += 1
# O/w, if the subject list is exhausted,
# insert the current element from the target list
elif x >= len(a): i.append((y, b[y])); y += 1
# O/w, if the current subject element precedes the current target element,
# delete the current subject element.
elif a[x] < b[y]: d.append(x); x += 1
# O/w, if the current subject element follows the current target element,
# insert the current target element.
elif a[x] > b[y]: i.append((y, b[y])); y += 1
# O/w the current elements match; consider the next pair
else: x += 1; y += 1
# Return accumulators
return (i,d)
Analysis
To understand why this works, assume that for any pass through the loop the operations in i
and d
, when applied to a[:x]
, will produce b[:y]
. (This is trivially true when x
and y
are 0, and i
and d
are empty.) Therefore:
- If both lists are exhausted, you’re done.
- O/w, if the target list is exhausted,
a[x]
can’t appear in it, and should be deleted. The operations ini
andd
now synchronizea[:x+1]
andb[:y]
. Incrementx
and loop. Note that our loop invariant holds on the next iteration. - O/w, if the subject list is exhausted,
b[y]
can’t appear in it, and should be inserted. The operations ini
andd
now synchronizea[:x]
andb[:y+1]
. Incrementy
and loop. Note that our loop invariant holds on the next iteration. - O/w, if
a[x]
is ordered beforeb[y]
, thena[x]
cannot appear inb[y:]
, and should be deleted. The operations ini
andd
now synchronizea[:x+1]
andb[:y]
. Incrementx
and loop. Note that our loop invariant holds on the next iteration. - O/w, if
a[x]
is ordered afterb[y]
, thenb[y]
cannot appear ina[x:]
, and must be inserted. The operations ini
andd
now synchronizea[:x]
andb[:y+1]
. Incrementy
and loop. Note that our loop invariant holds on the next iteration. - O/w,
a[x]
andb[y]
match, so the operations ini
andd
actually synchronizea[:x+1]
andb[:y+1]
. Incrementx
andy
and loop. Note that our loop invariant holds on the next iteration.
Efficiency
I’m not going to present an argument that the operation set produced by this algorithm is really minimal, because that doesn’t seem like it would be fun. It does seem pretty hard to see how any operation produced by this algorithm could be omitted, though.
Pingback: Things that were not immediately obvious to me » Blog Archive » Synchronization Efficiency Proof