Heckel

Japaneses description (Myers comming soon.)

Let's think about getting difference between Array O and Array N.

In Heckel Algorithm, these three datas are important.

symbol table
old element references
new element references

First, symbol table are described. Symbol table is Dictionary whose key is the elements included in both Array O and N. Like below, it's key is a hash value of each elements and value is a symbol table entry implementally.

Because the key is the hash value of each elements, you can get same value if elements are same. Using this feature, symbol table manages all elements appearing in Array O and N.

Symbol table entry is a dictionary value which has two Counters, Counter for Array O and N, and index of the key element in Array O, called OLNO.

In fact, .zero .one, .many cases are enough for Counter. In Heckel Algorithm, the elements which included in both arrays and the number of them in each arrays is just one, called shared unique element, will be source for getting difference. So Counter does not need having actual number.

Implementally, OLNO can be an associated value of case .one, like below.

// 1 is the only shared unique element.
let O: [Int] = [1, 2, 3, 3] // 1 and 2: unique, 3: not unique
let N: [Int] = [1, 2, 2, 3] // 1 and 3: unique, 2: not unique

<E: Hashable>

var symbolTable: [Int: SymbolTableEntry] = [:]

enum Counter {
  case zero
  case one(index: Int) // OLNO
  case many

  mutating func increment(withIndex index: Int) {
    switch self {
      case .zero:
        self = .one(index: index)

        default:
        self = .many
      }
  }
}

class SymbolTableEntry {
  var oldCounter: Counter
  var newCounter: Counter
}

Summarize for symbol table

How many elements are included in both arrays? -> Counter
Where is the element included in array O? -> associated value of case .one

symbol table is a dictionary data manages these information

Next, describing about 2, 3: old element references, new element references. First of all, these two arrays are correspoding to array O and N one to one. Let each arrays OA and NA.

Now, only symbol table has information of array O and N except O and N themselves. It has the information as a key. Newly, arrays OA and NA will manage a reference where the correspoding element exists. The candidates of reference are .theOther or .symbolTable. For array O, .theOther reference means array N and vice versa.

.symbolTable reference has the entry for symbol table as an associated value.
.theOther reference has the index of the other array as an associated value.

enum ElementReference {
  case symbolTable(entry: SymbolTableEntry)
  case theOther(index: Int)
}

Here, Summarize all characters.

symbol table :
- Dictionary whose key is the hash value of elements in array O and N
- only one table are shared by O and N.
- symbol table entry: the value of symbol table
  - How many elements are included in both arrays?
  - Where is the element included in array O?
Old
- O: original old array (oldArray)
- OA: element reference array is correspoding to array O one to one (oldElementReferences)
New
- N: original new array (newArray)
- NA: element reference array is correspoding to array N one to one (newElementReferences)

The name inside parenthesis is a variable name written in Swift. No other characters appears from this. The essense of Heckel Algorithm is combining these data and the steps described below.

<a name="6steps"> 6-Steps

Step-1

Let's proceed to Step-1. Here, all elements in array N are registered to symbol table.

newArray.forEach { element in
  let entry = symbolTable[element.hashValue] ?? SymbolTableEntry()
  entry.newCounter.increment(withIndex: 0)
  newElementReferences.append(.symbolTable(entry: entry))
  symbolTable[element.hashValue] = entry
}

The index 0 passing to increment(withindex:) has no meaning. The argument is OLNO, so for array N, it is meaingless.

Step-2

In step-2, same operation will be done for array O. Here, OLNO must be considered, so counter is incremented with index.

oldArray.enumerated().forEach { index, element
  let entry = symbolTable[element.hashValue] ?? TableEntry()
  entry.oldCounter.increment(withIndex: index)
  oldElementReferences.append(.symbolTable(entry: entry))
  symbolTable[element.hashValue] = entry
}

The index passing to incremented(withIndex:) is OLNO and it is retained as an associated value of case .one

Two steps out of six have already done.

Next, let's proceed to step-3, 4, 5.

In step-1, 2, only .symbolTable is registered to newElementReferences and oldElementReferences. So all references are .symbolTable, of course.

Here, we will shift the reference to .theOther as much as possible, called shifting operation. If the element does not exist in the other array, the shifting operation cannot be done. Then, the reference remains as it was, or .symbolTable.

Finally,

.symbolTable -> elements does not exist in the other.
.theOther -> elements are appeared in both arrays.

these relationship are built. The above relationship is equivalent to the below relationship.

.symbolTable -> delete, insert
.theOther -> move

Step-3

In step-3, shifting operation is performed for shared unique element.

newElementReferences.enumerated().forEach { newIndex, reference in
  guard case let .symbolTable(entry: entry) = reference,
    case .one(let oldIndex) = entry.oldCounter,
    case .one = entry.newCounter else { return }

  newElementReferences[newIndex] = .theOther(index: oldIndex)
  oldElementReferences[oldIndex] = .theOther(index: newIndex)
}

It seems that finding a shared element needs a loop inside a loop like below.

for n in newArray {
  for o in oldArray {
    // finding shared element
  }
}

But, in Heckel Algorithm, sharing symbol table and managing OLNO, only single loop will be enough for finding shared element. The former method takes O(N x M) time to accomplish calculation. Avoiding that is very important point.

Step-4

Next, step-4. In step-3 shifting operation was performed only for shared unique element. Here, the operation will be done for immediately adjacent to the .theOther pairs.

newElementReferences.enumerated().forEach { newIndex, _ in
  guard case let .theOther(index: oldIndex) = newElementReferences[newIndex],
    oldIndex < oldElementReferences.count - 1, newIndex < newElementReferences.count - 1,
    case let .symbolTable(entry: newEntry) = newElementReferences[newIndex + 1],
    case let .symbolTable(entry: oldEntry) = oldElementReferences[oldIndex + 1],
    newEntry === oldEntry else { return }

  newElementReferences[newIndex + 1] = .theOther(index: oldIndex + 1)
  oldElementReferences[oldIndex + 1] = .theOther(index: newIndex + 1)
}

Implementally, the next index of elements which have .theOther reference will be the target of the operation.

Step-5

Same operation will be done in descending order. This means the previous index will be the target.

newElementReferences.enumerated().reversed().forEach { newIndex, _ in
  guard case let .theOther(index: oldIndex) = newElementReferences[newIndex],
    oldIndex > 0, newIndex > 0,
    case let .symbolTable(entry: newEntry) = newElementReferences[newIndex - 1],
    case let .symbolTable(entry: oldEntry) = oldElementReferences[oldIndex - 1],
    newEntry === oldEntry else { return }

  newElementReferences[newIndex - 1] = .theOther(index: oldIndex - 1)
  oldElementReferences[oldIndex - 1] = .theOther(index: newIndex - 1)
}

Five steps out of six have already done.

Step-6

In step-1 to step-5, arrays OA and NA were determined. Difference scripts, .delete, .insert, .move, are converted from the references contained in OA and NA.

reference to .symbolTable = not shared element
reference to .theOther = shared element

Apparently, the above relationship is built.

Not shared element included in array O must be deleted.
Not shared element included in array N must be inserted.
Shared element must be moved to adjust index.

References can be converted like above. In step-6, the convertion will be done.

enum Difference<E> {
  case delete(element: E, index: Int)
  case insert(element: E, index: Int)
  case move(element: E, fromIndex: Int, toIndex: Int)
}

var differences: [Difference<T>] = []

oldElementReferences.enumerated().forEach { oldIndex, reference in
  guard case .symbolTable = reference else { return }
  differences.append(.delete(element: oldArray[oldIndex], index: oldIndex))
}

newElementReferences.enumerated().forEach { newIndex, reference in
  switch reference {
    case .symbolTable:
      differences.append(.insert(element: newArray[newIndex], index: newIndex))

    case let .theOther(index: oldIndex):
      differences.append(.move(element: newArray[newIndex], fromIndex: oldIndex, toIndex: newIndex))
  }
}

Here, we need be careful for move script. Assuming that array O and N are completely same. In this case, all references will be .theOther, so all scripts will be move. But this move's fromIndex and toIndex are same.

They are useless move scripts. Some index restrictions will be needed for removing such verbose move commands.

Like this, the difference got by Heckel Algorithm cannot be shortest. The major characteristics of it is linear-time calculation.

Myers Difference Algorithm

This algorithm was merged into swift-algorithm-club

Myers Difference Algorithm is an algorithm that finds a longest common subsequence(LCS) or shortest edit scripts(SES) of two sequences. MDA can accomplish this in O(ND) time, where N is the sum of the lengths of the two sequences. The common subsequence of two sequences is the sequence of elements that appear in the same order in

DifferenceAlgorithmComparison

Install / Use

README