100% found this document useful (4 votes)

3K views5 pages

A Technique For Isolating Differences Between Files

A simple algorithm is described for isolating the differences between two files. One application is the comparing of two versions of a source program or other file in order to display all differences. The algorithm isolates differences in a way that corresponds closely to our intuitive notion of difference, is easy to implement, and is computationally efficient, with time linear in the file length. For most applications the algorithm isolates differences similar to those isolated by the longest common subsequence. Another application of this algorithm merges files containing independently generated changes into a single file. The algorithm can also be used to generate efficient encodings of a file in the form of the differences between itself and a given “datum” file, permitting reconstruction of the original file from the diference and datum files.

Uploaded by

joyceschan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

100% found this document useful (4 votes)

3K views5 pages

A Technique For Isolating Differences Between Files

Uploaded by

joyceschan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 5

Programming S.L. G r a h a m , R.L.

Rivest,
Techniques Editors

A Technique for 1. Introduction

Isolating Differences The problem of finding all differences between two

files occurs in several contexts. This discussion is
Between Files directed toward building a tool that locates differences
in source programs and other text files. Once under-
Paul Heckel stood, the technique described is transferable to related
Interactive Systems Consultants applications.
A tool that compares different versions of text files
can be used for source program maintenance in at
least four ways: (1) as an editing aid, to verify modifi-
cations and detect spurious edits; (2) as a debugging
aid, to find the differences between a version of a
program that is known to work and one that does not
A simple algorithm is described for isolating the work; (3) as a program maintenance aid for locating
differences between two files. One application is the and merging program changes introduced by several
comparing of two versions of a source program or different people (or organizations); and (4) as a quality
other file in order to display all differences. The control tool to highlight the differences between the
algorithm isolates differences in a way that output of the latest version of a program and the
corresponds closely to our intuitive notion of output of an earlier version.
difference, is easy to implement, and is This tool is also useful when documents, or any
computationally efficient, with time linear in the file text files, are edited and maintained on a computer.
length. For most applications the algorithm isolates Text-editing and word-processing systems such as QEO
differences similar to those isolated by the longest [2] and WYLBVR [4] rarely have a file-compare capabil-
common subsequence. Another application of this ity. This capability is useful when new editions of
algorithm merges files containing independently documents are issued in which all changes from a
generated changes into a single f'de. The algorithm can previous edition are to be underlined or otherwise
also be used to generate efficient encodings of a file in identified.
the form of the differences between itself and a given
"datum" file, permitting reconstruction of the original 2. Current Methods of File Comparison
file from the difference and datum files.
Key Words and Phrases: difference isolation, word Two other methods of file comparison are known.
processing, text editing, program maintenance, In the first, the files are scanned, and whenever a
hashcoding, file compression, bandwidth compression, difference is detected, the next M lines are scanned
longest common subsequence, file comparison, until N consecutive identical lines are found. D E C ' s
molecular evolution F I L E C O M [3] is implemented this way. This technique
C R Categories: 3.63, 3.73, 3.81, 4.43 has two disadvantages: (1) M and/or N usually have to
be tuned by the user to get useful output, and (2) if a
whole block of text is moved, then all of it, rather than
just the beginning and end, is detected as changed.
This can be misleading, for any changes made within a
moved block are not highlighted.
General permission to make fair use in teaching or research of
all or part of this material is granted to individual readers and to The algorithm described here avoids these difficul-
nonprofit libraries acting for them provided that A C M ' s copyright ties. It detects differences that correspond very closely
notice is given and that reference is made to the publication, to its to our intuitive notion of difference. In particular, if a
date of issue, and to the fact that reprinting privileges were granted
by permission of the Association for Computing Machinery. To block of text is moved, only its beginning and end are
otherwise reprint a figure, table, other substantial excerpt, or the detected as different, and any other differences within
entire work requires specific permission as does republication, or the moved block will be highlighted.
systematic or multiple reproduction.
Author's Address: Interactive Systems Consultants, P.O. Box 2345, The second method is the longest c o m m o n subse-
Palo Alto, CA 94305. quence (LCS) method, which has received some atten-
© 1978 ACM 0001-0782/78/0400-0264 $00.75 tion in the literature [1, 6, 7, 10]. Given strings (or
264 Communications April 1978
of Volume 21
the ACM Number 4
files) A = ala2 • • • a , , B = bib2 • • • b , , , and C = clcz takes worst-case O ( m n l o g n ) time and O ( m n ) space. As
• . . c,,, C is a c o m m o n subsequence of A and B if n - a practical LCS implementation, it uses several clever
p characters can be deleted from string A to produce techniques and is neither concise nor easy to under-
string C and m - p characters can be deleted from stand.
string B to produce string C. Given strings A and B, The method described here avoids these computa-
the LCS problem is to find the longest string C that is tional problems. It is concise and easy to understand
a c o m m o n subsequence of A and B. When this is and takes linear time and space for all cases. It also
done, the file comparison output can be generated by produces output that is probably as good as or better
simultaneously scanning strings A, B, and C, flagging than the LCS method in most practical cases.
characters that appear in A but not in C one way, and
flagging those that appear in B but not in C another
way. 3. The Algorithm
The LCS method has strong appeal: It is a simple
formal statement of the problem that yields good To compare two files, it is usually convenient to
results. However, it has two basically different disad- take a line as the basic unit, although other units are
vantages. The first is that it is not necessarily the possible, such as word, sentence, paragraph, or char-
correct formalization of the problem. In the two ex- acter. We approach the problem by finding similarities
amples below, the longest common subsequence is until o n l y d i f f e r e n c e s r e m a i n . We m a k e two
probably not what is desired. In these examples, the observations:
differences are displayed by underlining the inserted
characters and putting minus signs through the deleted 1. A line that o c c u r s o n c e a n d o n l y o n c e in each file
ones. In the second example, the parentheses indicate m u s t be the s a m e line ( u n c h a n g e d b u t p o s s i b l y
a block of characters that has been moved relative to m o v e d ) . This "finds" most lines and thus excludes
its neighboring blocks. them from further consideration.
2. I f in each file i m m e d i a t e l y a d j a c e n t to a " f o u n d "
line p a i r there are lines identical to each other, these
lines m u s t be the s a m e line. Repeated application
OLD STRING NEW STRING LCS METHOD "'BETTER"
will "find" sequences of unchanged lines.
AXCYDWEABE ABCDE AX-BCYDI~EABE AXCYDWE,~,BCDE
ABCDEG DEFGAC ~..~CDEFGAC (DEFG) (AS-C) The algorithm acts on two files, which we call the
"old" file O and the " n e w " file N. Our conception is
that O was changed to produce N , although the algo-
rithm is virtually symmetric. There are three data
In the first example, the LCS finds four deleted structures: a symbol table and two arrays O A and N A .
strings and one inserted one, while the " b e t t e r " display We use the text of the line as a symbol to key the
shows one deleted string and one inserted one. (Note: symbol table entries. Each entry has two counters O C
Minimization of the n u m b e r of inserts plus deletes is and N C specifying the n u m b e r of copies of that line in
not a good formulation of the problem since the files O and N , respectively. These counters need only
differences between any two strings can be trivially contain the values 0, 1, and many. The symbol table
displayed as one deleted string and one inserted one.) entry also has a field O L N O , which contains the line
In the second example, the LCS method does not find n u m b e r of the line in the " o l d " file. It is of interest
a moved block of characters (it never does), but the only if O C = 1.
" b e t t e r " display both shows the m o v e d block and finds The array O A ( N A ) has one entry for each line of
edits within the blocks. The algorithm described here file O (N); it contains either a pointer to the line's
produces the " b e t t e r " display in both examples. symbol table entry or the line's number in file N (O)
Counterexamples in which the algorithm described and a bit to specify which.
here produces poorer results are easy to construct. The algorithm consists of six simple passes. Pass 1
The author's opinion is that there is no "correct" comprises the following: (a) each line i of file N is read
formulation of the problem, just as there is no "cor- in sequence; (b) a symbol table entry for each line i is
rect" way to determine which of two equivalent alge- created if it does not already exist; (c) N C for the
braic expressions is simpler. In both cases it depends line's symbol table entry is incremented; and (d) N A [i]
on the preferences of the person reading it. is set to point to the symbol table entry of line i.
The second disadvantage of the LCS method con- Pass 2 is identical to pass 1 except that it acts on
cerns the computational problems associated with it file O, array O A , and counter O C , and O L N O for the
[1, 6, 7]. In the worst case, it can take time O ( m n ) - symbol table entry is set to the line's number.
that is, time proportional to the product of the lengths In pass 3 we use observation 1 and process only
of the two strings. The only implementation of LCS those lines having N C = O C = 1. Since each represents
for file comparison of which the author is aware [7] (we assume) the same unmodified line, for each we
takes linear time and space for most practical cases but replace the symbol table pointers in N A and O A by
265 Communications April 1978
of Volume 21
the ACM Number 4
the number of the line in the other file. For example, Fig. l. Difference isolation. The hashcodes for the lines in the files
being compared are the corresponding entries of arrays OA and
if N A [i] corresponds to such a line, we look N A [i] up N A . The first and last locations are used for unique generated begin
in the symbol table and set NA[i] to O L N O and and end lines. In pass 3, all unique lines are connected (solid lines).
O A [ O L N O ] to i. In pass 3 we also "find" unique In passes 4 and 5, identical but n o n u n i q u e lines are found (dashed
lines). T h e file comparison can then be generated. Each insert (I),
virtual lines immediately before the first and immedi- delete (D), unchanged line (U), or moved block ([ and ]) can be
ately after the last lines of the files. detected and printed out. The output could be printed as:
In pass 4, we apply observation 2 and process each
line in N A in ascending order: If NA[i] points to FILE N FILE 0
OA[1"] and NA[i + 1] and OA[j + 1] contain identical
Array NA Array OA
symbol table entry pointers, then OA [j + 1] is set to
line i + 1 and NA[i + 1] is set to l i n e j + 1.
In pass 5, we also apply observation 2 and process BEGIN BEGIN
each entry in descending order: if NA[i] points to [U A MUCH D
OA [1"]and N A [i - 1] and OA [j - 1] contain identical U MASS WRITING D
symbol table pointers, then N A [i - 1] is replaced by
U OF IS D
j - 1 and OA[1" - 1] is replaced by i - 1.
Array N A now contains the information needed to I LATIN LIKE
list (or encode) the differences: If NA[i] points to a U WORDS SNOW
symbol table entry, then we assume that line i is an U FALLS
insert, and we can flag it as new text. If it points to UPON A
U
OA[j], but NA[i + 1] does not point to OA[1" + 1],
U THE MASS
then line i is at the boundary of a deletion or block
move and can be flagged as such. In the final pass, the U RELEVANT OF
file is output with its changes described in a form U] FACTS LONG D
appropriate to a particular application environment. [U LIKE WORDS
Figure 1 illustrates how this works. SOFT AND D
I
A formal version of this algorithm is presented in
U SNOW PHRASES D
an earlier version of this paper [5]. It outputs each line
flagged by the type of change it finds: insert, delete, U] FALLS
beginning of block, end of block, or unchanged. [U COVERING UPON

U UP THE

U THE RELEVANT
4. Analysis of Potential Problems
U DETAILS FACTS

The technique described here is prone to detecting U COVERING

false differences. Consider an unchanged sequence of END UP
lines none of which is unique. If the lines immediately THE
above and below the sequence are changed, they will DETAI LS
be (correctly) detected as different, but the unchanged
sequence of lines between them will also be (falsely)
marked as different. END

Because they occur at points to which attention is

directed, false differences, if few in number, are annoy-
ing rather than serious. For applications where the
the number of undetected changes small. For example,
files in question have less convenient characteristics, a
a good 20-bit hashcode would mean that about one
different basic unit or a hierarchy of basic units could
change in a million would go undetected. If a single
be chosen. 1
line is changed, the chance it will be changed to a line
The technique described can be modified by using
with the same hashcode is 1/22°. A change to a line
a hashcode [9] as a surrogate for the characters in the
having the same hashcode as some other line would, at
line. Hashcoding buys greater speed, program simplic-
worst, detect the line as moved when, in fact, it had
ity, and storage efficiency at the cost of letting some
been changed. A complete analysis is beyond the scope
changes go undetected. A good hashcoding algorithm
of this paper.
with a large number of potential hashcodes will keep
This technique can be used on very large files. The
For example, a three-line block could be chosen as the basic symbol table is the only randomly accessed data struc-
unit, with the exclusive or of the hashcodes for lines 1 to 3, 2 to 4, 3 ture that grows with the size of the files. The size of
to 5, etc. forming the indices into the hash table. Since the index for each symbol table entry can be reduced to two bits by
lines i + 1 t o j + 1 can be computed with two exclusive ors from the
index for lines i t o j and the hashcode for line i, the time it takes to combining the functions of N C and OC into one field
perform the algorithm for a k-line block is independent of k. and eliminating O L N O .

266 Communications April 1978

of Volume 21
the A C M Number 4
Fig. 2. File merging. We have three files, P, D1, and D2; P is the iterating through NA to generate lines that are: unchanged (U);
parent file, D1 and D2 are the descendant files, with D1 being the deleted from file 1 (D1), file 2 (D2), both files (DB); inserted into
dominant file. Differences are isolated between D1 (in NA) and D2 file 1 (I1), file 2 (I2), both files (IB); or on block boundaries
(in OA), and between D1 (in NA2) and P (in OA2). The result is ([ and ]). The output could be printed as:
shown by the solid lines. The merged file can be generated by

O R I G I N A L FILE M A I N MERGE FILE M A I N MERGE FILE SECOND MERGE FILE

File P File D1 File D1 File D2
Array OA2 Array N A 2 Array NA Array OA

BEGIN BEGIN BEGIN

DB MUCH A [U A

DB WRITING MASS U MASS

DB IS OF U OF
LIKE LATIN I1 LATIN
SNOW WORDS U WORDS
FALLS U FALLS
J
A UPON U UPON
MASS THE U THE D1
OF RELEVANT D2 RELEVANT
LONG FACTS U] FACTS I_AND D1
WORDS LIKE [U LIKE D1
AND SOFT IB SOFT //\ \\'1. FALLs
PHRASES SNOW U SNOW ' \\'1- UPON
FALLS J U] J ~. THE
UPON COVERING [U COVERING FACTS
THE UP U UP BLURRING 12
RELEVANT THE U THE THE 12
FACTS DETAILS U DETAILS OUTLINE 12
COVERING U AND 12
\
UP END END COVERING
THE \ UP
DETAILS ALL 12
\
THE
\
END DETAILS
\
\
END

One seems to face a choice between a hashcode 5. Encoding Files

size having a large number of bits and minimizing
undetected changes and one that has a small number Difference isolation can be used to produce a
and allows direct addressing of the hash table entry. differential encoding of one of two versions of a file.
However, one can get the advantages of both by using This encoding can be stored or transmitted more
a double hashcode technique, where a long hashcode efficiently than the complete file. The original file can
is stored in N A and O A and is rehashed into a smaller later be reconstructed from its differential encoding
sized hashcode for accessing the symbol table directly. and the other version. Two applications are: (1) A file
The number of extra entries required for such a symbol system can keep several versions of a file by storing
table depends on the number of differences in the files one version in toto and the others as differential
being compared. If the number of symbol table entries encodings; and (2) copies of files on remote computers
is two or three times the number of different lines, can be updated with minimum communications cost.
very few false differences will be detected unless the The analysis of the problem of undetected changes
files have a great many changes. This modification for the encoding process is different from that for
means that the marginal storage cost can be reduced source comparison since the decoding algorithm will
to three to six bits per unique line. not tolerate an occasional false moved block, while the

267 Communications April 1978

of Volume 21
the ACM Number 4
user who reads a source comparison listing will. Two was implemented on an XDS-940 about 5 years ago.
techniques can be used to mitigate this problem. First, More recently, Bill Frantz of T y m s h a r e implemented
checksums of the original and reconstructed files can both PL/1 and 370 machine language versions. Codie
be computed and compared, and, if they are not the Wells made some technical suggestions including the
same, the original file can be transmitted in full. double hashcode and recursive schemes. Mark Hal-
Second, the extra pass, which "unfinds" any single pern, Butler L a m p s o n , T o m Weston, Glenn Manacher,
line blocks, can be inserted into the algorithm. Most and Bonnie Simrell made several useful editorial sug-
undetected changes, since they look like single line gestions which have improved the presentation of the
blocks, would be eliminated at the small cost of trans- ideas in this paper. Glenn Manacher also pointed out
mitting all single line blocks. the relevance of the longest c o m m o n subsequence
Efficient encoding can be done even if the old problem [1, 6, 7, 10].
version is not saved on the "new file" computer. The
encoding algorithm does not require the text of the old Received April 1976; revised December 1976
file, but only the array of hashcodes OA. The array of
References
hashcodes can be transmitted from the "old file" 1. Aho, A., Hirschberg, D., and Ullman, J. Bounds on the
computer to the new file computer and the encoding complexity of the longest common subsequence problem. J. A C M
sent back. In general, this procedure would require 23, 1 (Jan. 1976), 1-12.
2. Deutsch, P., and Lampson, B. An online editor. Comm. A C M
less bandwidth than would be needed to transmit the 10, 12 (Dec. 1967), 793-799.
whole file. 3. Digital Equipment Corp. DEC System 10 Assembly Language
The bandwidth used can be reduced even more by Handbook, 3d ed., 1972, pp. 931-942.
4. Fajman, R., and Borgelt, J. WYLBUR: An interactive text
using a recursive scheme with hierarchy of basic units editing and remote job entry system. Comm. A C M 16, 5 (May
(e.g. chapters, sections, paragraphs, and sentences). 1973), 314-322.
The "new file" computer would load the array NA, 5. Heckel, P. A technique for isolating differences between files.
Tech. Pub. 73, Interactive Systems Consultants, Palo Alto, Calif.
and the "old file" computer would load the array OA 6. Hirschberg, D. A linear space algorithm for computing maximal
with hashcodes. During phase 1, the "old file" com- common subsequences. Comm. A C M 18, 6 (June 1975), 342-343.
puter would scan OA to calculate the hashcodes for all 7. Hunt, J., and McIlroy, M. An algorithm for differential file
comparison. Compt. Sci. Techn. Rep. 41, Bell Telephone Labs,
the chapters and then transmit them with the beginning Murray Hill, N.J., Aug. 1976.
and ending line numbers. The "new file" computer 8. IBM Corp. IBM Virtual Machine Facility/370 Command
would then compare these hashcodes with those that it Language Guide for General Users, Release 2,225-226.1
(UPDATE).
calculated for the chapters; differences can be isolated 9. Knuth, D.E. The Art o f Computer Programming, Vol. 3:
at the chapter level, with each line of the chapter Sorting and Searching. Addison-Wesley, Reading, Mass., 1973, p.
linked for each identical chapter found. In the second 509.
10. Wagner, R., and Fischer, M. The string-to-string correction
phase, only the changed chapters are divided into problem. J. A C M 21, 1 (Jan. 1974), 168-173.
sections, and their hashcodes computed and transmit-
ted in like manner. Only during the last phase is the
unchanged basic unit (lines) transmitted if it is detected
as changed.

6. Merging Files

A n o t h e r application of difference isolation involves

the merging of text changes. This can happen when an
organization maintains its own version of a vendor's
program and wishes to merge vendor changes with its
own whenever a new version of the program is distrib-
uted. IBM has such a syste m [8], but all changes must
be especially encoded by the user. The method for
generating the merged output should be fairly easy to
understand from Figure 2; an algorithm for this pur-
pose is given in the appendix to [5]. Where blocks of
lines have been moved, the order of the output lines is
determined by choosing one of the files as the main
file. Thus two possible merged files can be generated,
depending on which of the modified files is chosen as
dominant.

Acknowledgments. The algorithm described here

268 Communications April 1978
of Volume 21
the ACM Number 4

An Algorithm For Differential File Comparison: J. W. Hunt
No ratings yet
An Algorithm For Differential File Comparison: J. W. Hunt
9 pages
Efficient File Comparison Algorithm
No ratings yet
Efficient File Comparison Algorithm
9 pages
Longest Common Subsequence Explained
No ratings yet
Longest Common Subsequence Explained
4 pages
An O (ND) Difference Algorithm and Its Variations
No ratings yet
An O (ND) Difference Algorithm and Its Variations
15 pages
O(ND) Algorithm for Sequence Comparison
No ratings yet
O(ND) Algorithm for Sequence Comparison
16 pages
Semester Final Project Report
No ratings yet
Semester Final Project Report
11 pages
Longest Common Subsequence Explained
No ratings yet
Longest Common Subsequence Explained
3 pages
Exp 7
No ratings yet
Exp 7
5 pages
Algorithms, Fall 2005. (Massachusetts Institute of Technology: MIT
No ratings yet
Algorithms, Fall 2005. (Massachusetts Institute of Technology: MIT
11 pages
C Queen
No ratings yet
C Queen
90 pages
8 Dynamic Programming
No ratings yet
8 Dynamic Programming
75 pages
Enhancing Signature File Efficiency
No ratings yet
Enhancing Signature File Efficiency
9 pages
Longest Common Subsequence Algorithm
No ratings yet
Longest Common Subsequence Algorithm
5 pages
Searching and Sorting Algorithms Explained
No ratings yet
Searching and Sorting Algorithms Explained
31 pages
DAA (Lecture 5)
No ratings yet
DAA (Lecture 5)
52 pages
Sequence Analysis 1
No ratings yet
Sequence Analysis 1
7 pages
Dynamic Programming Lecture Notes
No ratings yet
Dynamic Programming Lecture Notes
8 pages
Intro To Dynamic Programming
No ratings yet
Intro To Dynamic Programming
7 pages
Arrays:: Problem 1
No ratings yet
Arrays:: Problem 1
4 pages
Ra2311026050228 Sundaranandhan.r.j
No ratings yet
Ra2311026050228 Sundaranandhan.r.j
4 pages
Lec2 PDF
No ratings yet
Lec2 PDF
38 pages
Trie and String Algorithms Overview
0% (1)
Trie and String Algorithms Overview
39 pages
B60 Exp07 Aoa
No ratings yet
B60 Exp07 Aoa
8 pages
Dsbig 5
No ratings yet
Dsbig 5
17 pages
DS May 19 Solved
No ratings yet
DS May 19 Solved
24 pages
LCS Time Complexity Analysis
No ratings yet
LCS Time Complexity Analysis
4 pages
DAA (Lecture 5)
No ratings yet
DAA (Lecture 5)
52 pages
159.102 Computer Science Fundamentals - Massey - Exam - S2 2012
No ratings yet
159.102 Computer Science Fundamentals - Massey - Exam - S2 2012
6 pages
Dynamic Programming: LCIS Guide
No ratings yet
Dynamic Programming: LCIS Guide
7 pages
SET A - Coding Questions - DIFFICULT - VVCE
No ratings yet
SET A - Coding Questions - DIFFICULT - VVCE
11 pages
Unit III Daa
No ratings yet
Unit III Daa
96 pages
11339AoA - EX-7
No ratings yet
11339AoA - EX-7
7 pages
1 s2.0 S1570866712001633 Main
No ratings yet
1 s2.0 S1570866712001633 Main
9 pages
9457lab Manual Expt No. 7 AOA - Longest Common Subsequence
No ratings yet
9457lab Manual Expt No. 7 AOA - Longest Common Subsequence
9 pages
Understanding Longest Common Subsequence
No ratings yet
Understanding Longest Common Subsequence
11 pages
CSE - Report - 1291 - Reedee
No ratings yet
CSE - Report - 1291 - Reedee
76 pages
Longest Common Subsequence
No ratings yet
Longest Common Subsequence
11 pages
Longest Common Subsequence Algorithm
No ratings yet
Longest Common Subsequence Algorithm
4 pages
String Manipulation and Comparison Problems
No ratings yet
String Manipulation and Comparison Problems
13 pages
CSE 205 Lab Manual 13 LCS
No ratings yet
CSE 205 Lab Manual 13 LCS
5 pages
1 s2.0 S0020019015000411 Main
No ratings yet
1 s2.0 S0020019015000411 Main
3 pages
Coding Questions IN C++
No ratings yet
Coding Questions IN C++
18 pages
IV B BCS401 Assignment-1
No ratings yet
IV B BCS401 Assignment-1
37 pages
LCS Algorithm in Dynamic Programming
No ratings yet
LCS Algorithm in Dynamic Programming
3 pages
%SASDIFF: A SAS Macro For Differential File Comparison
No ratings yet
%SASDIFF: A SAS Macro For Differential File Comparison
8 pages
Brute Force and Exhaustive Search Techniques
No ratings yet
Brute Force and Exhaustive Search Techniques
99 pages
Longest Common Subsequences
No ratings yet
Longest Common Subsequences
8 pages
Assignment
No ratings yet
Assignment
16 pages
Detecting Source Code Plagiarism With CodeMatch
No ratings yet
Detecting Source Code Plagiarism With CodeMatch
25 pages
Chp3 Strings
No ratings yet
Chp3 Strings
40 pages
A Two Way Pattern Matching Algorithm Using Sliding Patterns
No ratings yet
A Two Way Pattern Matching Algorithm Using Sliding Patterns
5 pages
Computer Science HSSC II Model Paper Solution
100% (1)
Computer Science HSSC II Model Paper Solution
13 pages
Outlier Mining Using N-Gram Technique With Compression Models
No ratings yet
Outlier Mining Using N-Gram Technique With Compression Models
7 pages
File Organization
No ratings yet
File Organization
49 pages
Question 01
No ratings yet
Question 01
113 pages
CS201 19
No ratings yet
CS201 19
31 pages
Ds Unit-5
No ratings yet
Ds Unit-5
5 pages
A* Algorithm Implementation Report
No ratings yet
A* Algorithm Implementation Report
3 pages
W1PA Sep22
No ratings yet
W1PA Sep22
11 pages
SVM & Naive Bayes Assignment
No ratings yet
SVM & Naive Bayes Assignment
3 pages
Bootstrapping Time Series Models
No ratings yet
Bootstrapping Time Series Models
43 pages
Queuing Theory Chapter
No ratings yet
Queuing Theory Chapter
22 pages
2.9.binomial and Poisson Probability Distributions
No ratings yet
2.9.binomial and Poisson Probability Distributions
23 pages
The Riccati-Bernoulli sub-ODE Method
No ratings yet
The Riccati-Bernoulli sub-ODE Method
9 pages
An Automated Software Failure Prediction Technique Using Hybrid Machine Learning Algorithms
No ratings yet
An Automated Software Failure Prediction Technique Using Hybrid Machine Learning Algorithms
5 pages
Paper TTS+Conversion
No ratings yet
Paper TTS+Conversion
13 pages
Understanding Deadlock in Operating Systems
No ratings yet
Understanding Deadlock in Operating Systems
3 pages
0478 - 23 Paper 2 May - Jun 2024
No ratings yet
0478 - 23 Paper 2 May - Jun 2024
1 page
Synthetic Text Detection in Images
No ratings yet
Synthetic Text Detection in Images
10 pages
Signal Module
No ratings yet
Signal Module
81 pages
Multivariate Approach Towards Financial Market Prediction: Guided By:-Mrs. Shraddha Ovale
No ratings yet
Multivariate Approach Towards Financial Market Prediction: Guided By:-Mrs. Shraddha Ovale
8 pages
Grade 9 Polynomial Math Test
No ratings yet
Grade 9 Polynomial Math Test
2 pages
Variables
No ratings yet
Variables
11 pages
German University in Cairo Faculty of Media Engineering and Technology Dr. Mahmoud Khalil Eng. Nourhan Ehab Eng. Noha Youssef
No ratings yet
German University in Cairo Faculty of Media Engineering and Technology Dr. Mahmoud Khalil Eng. Nourhan Ehab Eng. Noha Youssef
3 pages
Tripos Guide
No ratings yet
Tripos Guide
8 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
21 pages
The Use of Gaussian Processes in System Identification
No ratings yet
The Use of Gaussian Processes in System Identification
13 pages
Qns Exam2
No ratings yet
Qns Exam2
11 pages
Advantage of Finite Element Method
No ratings yet
Advantage of Finite Element Method
2 pages
9.extreme Values of Functions-1
No ratings yet
9.extreme Values of Functions-1
18 pages
Cox-Ross-Rubinstein to Black-Scholes
No ratings yet
Cox-Ross-Rubinstein to Black-Scholes
4 pages
2.correlation Regression Summary Notes by Pranav Popat 1
No ratings yet
2.correlation Regression Summary Notes by Pranav Popat 1
4 pages
Nov 22 DSP
No ratings yet
Nov 22 DSP
4 pages
BFS, DFS, and Best First Search Overview
No ratings yet
BFS, DFS, and Best First Search Overview
15 pages
5.Work-Energy-and-PowerPROBLEM-SOLVING-TACTICSFormulae-sheet 2
No ratings yet
5.Work-Energy-and-PowerPROBLEM-SOLVING-TACTICSFormulae-sheet 2
1 page
Transformations for Normal Distribution
No ratings yet
Transformations for Normal Distribution
6 pages
State Space Model Recipes in R
No ratings yet
State Space Model Recipes in R
27 pages

A Technique For Isolating Differences Between Files

Uploaded by

A Technique For Isolating Differences Between Files

Uploaded by

Programming S.L. G r a h a m , R.L.

A Technique for 1. Introduction

Isolating Differences The problem of finding all differences between two

The technique described here is prone to detecting U COVERING

Because they occur at points to which attention is

266 Communications April 1978

O R I G I N A L FILE M A I N MERGE FILE M A I N MERGE FILE SECOND MERGE FILE

BEGIN BEGIN BEGIN

DB WRITING MASS U MASS

One seems to face a choice between a hashcode 5. Encoding Files

267 Communications April 1978

A n o t h e r application of difference isolation involves

Acknowledgments. The algorithm described here

You might also like