Next: Levenshtein edit distance, Previous: Determine isomorphism, Up: Library modules - Utilities [Contents][Index]

`util.lcs`

- The longest common subsequence- Module:
**util.lcs** -
This module implements the algorithm to find the longest common subsequence of two given sequences. The implemented algorithm is based on Eugene Myers’ O(ND) algorithm (Myers86).

One of the applications of this algorithm is to calculate the difference of two text streams; see Calculate difference of text streams.

- Function:
**lcs***seq-a seq-b :optional eq-fn* Calculates and returns the longest common sequence of two lists,

`seq-a`and`seq-b`. Optional`eq-fn`specifies the comparison predicate; if omitted,`equal?`

is used.(lcs '(x a b y) '(p a q b)) ⇒ (a b)

- Function:
**lcs-with-positions***seq-a seq-b :optional eq-fn* This is the detailed version of

`lcs`

. The arguments are the same.Returns a list of the following structure:

(

`length`((`elt``a-pos``b-pos`) …))`Length`is an integer showing the length of the found LCS. What follows is a list of elements of LCS; each sublist consists of the element, the integer position of the element in`seq-a`, then the integer position of the element in`seq-b`.(lcs-with-positions '(a) '(a)) ⇒ (1 ((a 0 0))) (lcs-with-positions '(x a b y) '(p q a b)) ⇒ (2 ((a 1 2) (b 2 3))) (lcs-with-positions '(x a b y) '(p a q b)) ⇒ (2 ((a 1 1) (b 2 3))) (lcs-with-positions '(x y) '(p q)) ⇒ (0 ())

- Function:
**lcs-fold***a-proc b-proc both-proc seed a b :optional eq-fn* A fundamental iterator over the "edit list" derived from two lists

`a`and`b`.`A-proc`,`b-proc`,`both-proc`are all procedures that take two arguments. The second argument is a intermediate state value of the calculation. The first value is an element only in`a`for`a-proc`, or an element only in`b`for`b-proc`, or an element in both`a`and`b`for`both-proc`. The return value of each procedure is used as the state value of the next call of either one of the procedures.`Seed`is used as the initial value of the state value. The last state value is returned from`lcs-fold`

.The three procedures are called in the following order: Suppose the sequence

`a`consists of`a’``c``a”`, and`b`consists of`b’``c``b”`, where`a’`,`b’`,`a”`, and`b”`are subsequences, and`c`is the head of the LCS of`a`and`b`. Then`a-proc`is called first on each element in`a’`,`b-proc`is called second on each element in`b’`, then`both-proc`is called on`c`. Afterwards, the process is repeated using`a”`and`b”`.

- Function:
**lcs-edit-list***a b :optional eq-fn* Calculates ’edit-list’ from two lists

`a`and`b`, which is the smallest set of commands (additions and deletions) that changes`a`into`b`. This procedure is built on top of`lcs-fold`

above.Returns a list of

*hunk*s, which is a contiguous section of additions and deletions. Each hunk consists of a list of directives, which is a form of:(

`+`|`-``position``element`)Here’s an example. Suppose

`a`and`b`are the following lists, respectively.`a`≡ ("A" "B" "C" "E" "H" "J" "L" "M" "N" "P")`b`≡ ("B" "C" "D" "E" "F" "J" "K" "L" "M" "R" "S" "T")Then,

`(lcs-edit-list a b equal?)`

returns the following list.(((- 0 "A")) ((+ 2 "D")) ((- 4 "H") (+ 4 "F")) ((+ 6 "K")) ((- 8 "N") (- 9 "P") (+ 9 "R") (+ 10 "S") (+ 11 "T")) )

The result consists of five hunks. The first hunk consists of one directive,

`(- 0 "A")`

, which means the element`"A"`

at the position 0 of list`a`has to be deleted. The second hunk also consists of one directive,`(+ 2 "D")`

, meaning the element`"D"`

at the position 2 of list`b`has to be added. The third hunk means`"H"`

at the position 4 of list`a`should be removed and`"F"`

at the position 4 of list`b`should be added, and so on.If you are familiar with Perl’s Algorithm::Diff module, you may notice that this is the same structure that its

`diff`

procedure returns.