String differencing

Post by kimmov » Sun Mar 01, 2009 8:02 pm

So I've finally added a few tests for the stringdiff with Google Test. I've only added some very basic tests so far, but I already have one interesting observation about words.

If we have these two lines in two files:
Code:
abcde fghij
ABcde fGhij

I'd expect two differences, since two words are different. To my surprise, the current implementation returns only one difference covering both words. This is test StringDiffsTest / WordBreak2Words3.

If we have three words:
Code:
abcde fghij KLmno
abcDE fghij klmno

Then we get two differences when there is one identical word in between. This is test StringDiffsTest / WordBreak3Words5.

I've set the tests to pass with the code as it currently works. But I'm wondering whether we really should detect two differences when two words differ.
Re: String differencing

Post by galh » Mon Apr 06, 2009 12:59 pm

I'll try to explain (based on what I know from the code I worked with last year; I don't know whether things have changed since then).

In the following example (word diff, case-sensitive):
Code:
abcde fghij
ABcde fGhij

There are two diffs, but then the stringdiffs::PopulateDiffs() method joins them into one diff. I think this is nice behavior because it is like using a marker to highlight a difference on paper: you usually don't care about the whitespace, so you mark the two words as one diff. Of course, if you do care about whitespace, you might not want to connect the words.

In your second example, the identical word between the diffs "breaks" the marker into two diffs.
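
To make the "marker" idea concrete, here is a minimal sketch of the joining step. It is not the actual stringdiffs::PopulateDiffs() code: the names WordDiff, SplitWords and DiffWords are made up for illustration, and it assumes both lines have the same number of words and simply compares them position by position (the real code has to handle inserted and deleted words too).

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type for illustration only; not a WinMerge structure.
// A diff covers the words firstWord..lastWord (inclusive, 0-based).
struct WordDiff {
    std::size_t firstWord;
    std::size_t lastWord;
};

// Split a line into words on whitespace.
std::vector<std::string> SplitWords(const std::string& line)
{
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : line) {
        if (std::isspace(ch)) {
            if (!current.empty()) {
                words.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(static_cast<char>(ch));
        }
    }
    if (!current.empty())
        words.push_back(current);
    return words;
}

// Case-sensitive, position-by-position word comparison.
// Assumption: both lines have the same word count; extra words are ignored.
// With joinAdjacent == true, a differing word that directly follows another
// differing word extends the previous diff instead of starting a new one.
std::vector<WordDiff> DiffWords(const std::string& left,
                                const std::string& right,
                                bool joinAdjacent)
{
    const std::vector<std::string> a = SplitWords(left);
    const std::vector<std::string> b = SplitWords(right);
    const std::size_t count = std::min(a.size(), b.size());

    std::vector<WordDiff> diffs;
    for (std::size_t i = 0; i < count; ++i) {
        if (a[i] == b[i])
            continue; // an identical word "breaks the marker"
        if (joinAdjacent && !diffs.empty() && diffs.back().lastWord + 1 == i)
            diffs.back().lastWord = i; // only whitespace in between: join
        else
            diffs.push_back(WordDiff{i, i});
    }
    return diffs;
}
```

With joinAdjacent set, "abcde fghij" against "ABcde fGhij" gives one diff covering both words, while "abcde fghij KLmno" against "abcDE fghij klmno" still gives two, because the identical middle word separates them. Turning joinAdjacent off gives one diff per differing word, which is roughly what you would want if whitespace mattered.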
