Coloring stringdifference

Postby matthias1955 » Wed Feb 25, 2009 11:12 am

I had some discussion with Kimmo, and now we would like to share it with the community.

Many bug reports are related to this: the coloring of string differences.

I will explain what currently happens within one line (string).

WM creates blocks (called words in the options), which are split according to the option 'Break at whitespace' or 'at whitespace or punctuation'. Then WM tests each word against the corresponding word in the second string. If there is a difference, WM tries to find the same word later in the string. If it is found, we get a problem, because WM continues merging the rest from that position.
Example: bug #1989811, "Difference Coloring at Character level".

Left:  abcdef,abcdef,abcdef,
Right: abcdef,abccef,abcdef,
Option 'Break at whitespace' => we get one word on both sides => no problem.
Option 'at whitespace or punctuation' => we get 6 words on both sides.
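The following minimal C++ sketch makes the two break options concrete by splitting a line into words. It is not WinMerge's code. The names `BreakMode` and `splitWords` are hypothetical. Whitespace is simply dropped rather than kept as a word, and `std::ispunct` stands in for whatever punctuation set WinMerge actually uses.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical break modes mirroring the two options named in the post.
enum class BreakMode { Whitespace, WhitespaceOrPunct };

// Split a line into "words". Whitespace only separates words (it is dropped);
// in WhitespaceOrPunct mode every punctuation character becomes its own word.
std::vector<std::string> splitWords(const std::string& line, BreakMode mode) {
    std::vector<std::string> words;
    std::string current;
    auto flush = [&] {
        if (!current.empty()) { words.push_back(current); current.clear(); }
    };
    for (char c : line) {
        unsigned char uc = static_cast<unsigned char>(c);  // ctype functions need unsigned char
        if (std::isspace(uc)) {
            flush();
        } else if (mode == BreakMode::WhitespaceOrPunct && std::ispunct(uc)) {
            flush();
            words.push_back(std::string(1, c));
        } else {
            current += c;
        }
    }
    flush();
    return words;
}
```

With the example line, `BreakMode::Whitespace` yields one word. `BreakMode::WhitespaceOrPunct` yields six: `abcdef`, `,`, `abcdef`, `,`, `abcdef`, `,`.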

Now word1 is the same in both.
Word2 ',' is the same in both.
Word3 on the left is found as word5 on the right => the difference starts at word3 on the left and at word5 on the right.
Now WM is confused: it will also mark word5 on the left and word3 on the right. That’s buggy.
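A deliberately naive resynchronization reproduces the misalignment described above. This is a hypothetical C++ sketch, not the actual stringdiffs.cpp logic. On a mismatch it searches only forward on the right side for the current left word, jumps there, and treats any leftovers as unmatched. `Marks` and `greedyForwardDiff` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Marks {
    std::vector<bool> left, right;  // true = word is highlighted
};

// Naive resync (assumed for illustration): on a mismatch, look for the current
// left word further ahead on the right; if found, treat the skipped right words
// as inserted and continue from there.
Marks greedyForwardDiff(const std::vector<std::string>& l,
                        const std::vector<std::string>& r) {
    Marks m{std::vector<bool>(l.size(), false), std::vector<bool>(r.size(), false)};
    std::size_t i = 0, j = 0;
    while (i < l.size() && j < r.size()) {
        if (l[i] == r[j]) { ++i; ++j; continue; }
        std::size_t k = j + 1;
        while (k < r.size() && r[k] != l[i]) ++k;
        if (k < r.size()) {
            // Jump ahead on the right: r[j..k-1] are marked as inserted.
            for (std::size_t x = j; x < k; ++x) m.right[x] = true;
            j = k;
        } else {
            // No later match: mark this pair as changed and move on.
            m.left[i] = true;
            m.right[j] = true;
            ++i; ++j;
        }
    }
    // Whatever is left over on either side has no partner.
    for (; i < l.size(); ++i) m.left[i] = true;
    for (; j < r.size(); ++j) m.right[j] = true;
    return m;
}
```

Consider the six words of bug #1989811. This sketch marks words 3 and 4 on the right and words 5 and 6 on the left, although only word3 actually differs.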

I will create a small patch for anyone who would like to test this.

With this patch, WM will also try to find a synchronization point by searching backwards.
More precisely: it searches from the current position up to 30% of the number of words in the string.
(It does not separate multiple single-character differences within one word. So if a word contains more than one difference,
it will still mark everything from the first to the last difference inside that word!)

So WM will find that word5 is the same on the left and the right => only word3 differs.
This means WM will treat words in a string the same way it treats lines in a file when the option
'Match similar lines' is on.
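The bounded search can be sketched as follows. This is only an illustration under assumptions. The window of 30% of the word count (at least 1) comes from the post. Searching both sides at once and the tie-break rule (smallest total skip first, then the most diagonal pair) are assumptions of this sketch, not necessarily what the patch does. `windowedDiff` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Marks {
    std::vector<bool> left, right;  // true = word is highlighted
};

// Hypothetical bounded resync: the window is 30% of the word count (minimum 1).
Marks windowedDiff(const std::vector<std::string>& l,
                   const std::vector<std::string>& r) {
    Marks m{std::vector<bool>(l.size(), false), std::vector<bool>(r.size(), false)};
    std::size_t window = (l.size() > r.size() ? l.size() : r.size()) * 3 / 10;
    if (window < 1) window = 1;

    std::size_t i = 0, j = 0;
    while (i < l.size() && j < r.size()) {
        if (l[i] == r[j]) { ++i; ++j; continue; }

        // Look for the nearest equal pair (i + a, j + b) inside the window;
        // ties on a + b go to the pair closest to the diagonal (a == b).
        bool found = false;
        std::size_t bestA = 0, bestB = 0;
        for (std::size_t a = 0; a <= window && i + a < l.size(); ++a) {
            for (std::size_t b = 0; b <= window && j + b < r.size(); ++b) {
                if ((a == 0 && b == 0) || l[i + a] != r[j + b]) continue;
                std::size_t skew = a > b ? a - b : b - a;
                std::size_t bestSkew = bestA > bestB ? bestA - bestB : bestB - bestA;
                if (!found || a + b < bestA + bestB ||
                    (a + b == bestA + bestB && skew < bestSkew)) {
                    found = true; bestA = a; bestB = b;
                }
            }
        }

        if (found) {
            // Words skipped on either side before the sync point are differences.
            for (std::size_t x = 0; x < bestA; ++x) m.left[i + x] = true;
            for (std::size_t x = 0; x < bestB; ++x) m.right[j + x] = true;
            i += bestA; j += bestB;
        } else {
            // No sync point nearby: mark this pair as changed and step both sides.
            m.left[i] = true; m.right[j] = true;
            ++i; ++j;
        }
    }
    for (; i < l.size(); ++i) m.left[i] = true;
    for (; j < r.size(); ++j) m.right[j] = true;
    return m;
}
```

On the bug #1989811 example the window is 1 word. The nearest matching pair is the `,` after word3 on both sides, so only word3 is marked on each side. A pure insertion such as `talk about` against `lets talk about` marks only `lets`.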


With a new option in 'Line difference coloring', 'fix word to word',
the user can choose between merging and a word-to-word comparison.

Bug #1996269, "Fails to detect a change", explains this well.

In my patch, look at line 91 in stringdiffs.cpp:
change 'bool m_matchblock (false);' to true and test with the files from ID 1996269.

You will see the difference.

I have also extended the option 'at whitespace or punctuation'
so that we get more, smaller words in big mathematical functions.
What do you think?

a) Do we need a new coloring option, 'fix word to word'?

b) Do we need a new option 'break at whitespace or non-alphanumeric character',
or should we just extend the characters as the patch shows?

You can find the patch at #2636551, "stringdiffcolor".

Edit by kimmov: use winmerge.org URLs for tracker items.
matthias1955
 
Posts: 162
Joined: Wed Dec 17, 2008 1:55 pm

Re: Coloring stringdifference

Postby gerundt » Wed Feb 25, 2009 6:25 pm

matthias1955 wrote:
a) Do we need a new coloring option, 'fix word to word'?

b) Do we need a new option 'break at whitespace or non-alphanumeric character',
or should we just extend the characters as the patch shows?


I personally use Line Difference Coloring only at Character level and don't need the Word level.

I also think that many people don't understand what Line Difference Coloring, Character level and Word level mean for them. More options in this area would confuse them even more!

Maybe we can simplify the option this way? Just use a drop-down list "Coloring line difference" with entries like "None", "All characters", "Full words", "Full words without whitespace" or something like that. Just make it simpler for users who are not power users. ;)

But I like that you are working on this area! Thanks! :mrgreen:
gerundt
Site Admin
 
Posts: 193
Joined: Wed Sep 24, 2008 8:47 am
Location: Germany

Re: Coloring stringdifference

Postby kimmov » Wed Feb 25, 2009 6:44 pm

gerundt wrote:
I personally use Line Difference Coloring only at Character level and don't need the Word level.

I also think that many people don't understand what Line Difference Coloring, Character level and Word level mean for them. More options in this area would confuse them even more!

Completely agreed! And that is what I've said every time this area has been discussed: no more options!

There are lots and lots of users who could not care less about character or word level. They don't even open the Options dialog. By its nature, this character/word level option has to be chosen according to the data you have, so even having it in the Options dialog is wrong. If we want to keep this option at all, it belongs in the menu, where it is easy to switch.

One very strong argument for not having the option is simpler code: we'd have just one (not two) code paths to determine in-line differences. More options means more complex code. For what advantage?

So I'm probably going to just remove the GUI for the word/character level and make character level the default. People who have used WinMerge for a few years remember the mess we had before and how I solved it just by removing unneeded choices. Nobody complained about making the WinMerge GUI easier to use.
kimmov
 
Posts: 562
Joined: Thu Sep 11, 2008 8:51 pm
Location: Finland

Re: Coloring stringdifference

Postby kimmov » Tue Mar 03, 2009 6:04 pm

I've now committed the patch to SVN trunk. It looked like a nice improvement on some files I tested it with. However, there are now many unit test failures, so I will most probably have to back out the patch soon.

Re: Coloring stringdifference

Postby matthias1955 » Sat Mar 07, 2009 6:19 pm

With the last patch for the whitespace break it's OK now, so one option is done. It's different from mine, but now customizable, and also better. Thanks. :=)

The main option I'm talking about is a different one.
Currently we compare while merging inline.
So with
'lets talk abut' and 'comunity lets talk about'
we can detect the word 'comunity' as an insertion and mark it as such.
'abut' and 'about' are written differently, so they are marked too.
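A standard way to get such inline results is a longest-common-subsequence (LCS) diff over the words. It yields one edit script from left to right. This minimal C++ sketch uses a textbook dynamic-programming LCS for comparison. It is not the algorithm WinMerge uses, and `Op`, `Edit` and `lcsDiff` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Op { Equal, Delete, Insert };  // Delete = left only, Insert = right only

struct Edit {
    Op op;
    std::string word;
};

// Textbook LCS word diff: one edit script turning the left words into the right words.
std::vector<Edit> lcsDiff(const std::vector<std::string>& l,
                          const std::vector<std::string>& r) {
    const std::size_t n = l.size(), m = r.size();
    // len[i][j] = length of the longest common subsequence of l[i..] and r[j..].
    std::vector<std::vector<std::size_t>> len(n + 1, std::vector<std::size_t>(m + 1, 0));
    for (std::size_t i = n; i-- > 0;)
        for (std::size_t j = m; j-- > 0;)
            len[i][j] = (l[i] == r[j]) ? len[i + 1][j + 1] + 1
                                       : std::max(len[i + 1][j], len[i][j + 1]);

    // Walk the table from the start, emitting one operation per step.
    std::vector<Edit> script;
    std::size_t i = 0, j = 0;
    while (i < n && j < m) {
        if (l[i] == r[j]) {
            script.push_back({Op::Equal, l[i]}); ++i; ++j;
        } else if (len[i + 1][j] >= len[i][j + 1]) {
            script.push_back({Op::Delete, l[i]}); ++i;
        } else {
            script.push_back({Op::Insert, r[j]}); ++j;
        }
    }
    for (; i < n; ++i) script.push_back({Op::Delete, l[i]});
    for (; j < m; ++j) script.push_back({Op::Insert, r[j]});
    return script;
}
```

Comparing 'lets talk abut' against 'comunity lets talk about' gives this script:
- insert `comunity`
- equal `lets`
- equal `talk`
- delete `abut`
- insert `about`

A delete directly followed by an insert is what a viewer would color as a changed word.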

This kind of comparison will not work with the next sample.
Let's take an easy one:
      0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0030  2d d0 00 00 80 00 00 00 00 00 c0 a8 48 c2 00 00  -...........H...
0040  00 00 c0 a8 48 c1 00 14 e8 5b 5d 66 00 00 00 00  ....H....[]f....
and
0030  56 91 00 00 00 00 c0 a8 48 c2 c0 a8 48 c2 00 00  V.......H...H...
0040  00 00 c0 a8 48 c1 00 14 e8 5b 5d 66 00 00 00 00  ....H....[]f....
How would you compare this?

WM finds 80 at position 34 (offset 0x34) on the left and finds no counterpart on the right, so the rest is marked as different!
With a byte-to-byte option,
WM would always compare the correct values without problems.
So I see two possibilities:
a new option in line coloring, or the user must use the full contents file compare method.
That can give the same result, but I think these two methods have nothing to do with each other. :=(
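A "byte to byte" (or word-to-word) comparison pairs token k on the left with token k on the right. One differing byte therefore cannot shift the alignment of the rest. Below is a minimal C++ sketch of this. It assumes the input is only the hex-byte part of each row, with no offset column and no ASCII column. `tokens` and `positionalDiff` are hypothetical names, not WinMerge functions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split on whitespace only, so each hex byte is one token.
std::vector<std::string> tokens(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> out;
    for (std::string t; in >> t;) out.push_back(t);
    return out;
}

// Positional comparison: token k on the left is only ever compared with token k
// on the right. Returns the positions that differ.
std::vector<std::size_t> positionalDiff(const std::string& left, const std::string& right) {
    std::vector<std::string> l = tokens(left), r = tokens(right);
    std::size_t common = l.size() < r.size() ? l.size() : r.size();
    std::size_t longest = l.size() > r.size() ? l.size() : r.size();
    std::vector<std::size_t> diffs;
    for (std::size_t k = 0; k < common; ++k)
        if (l[k] != r[k]) diffs.push_back(k);
    // Extra tokens on the longer side have no partner and count as different.
    for (std::size_t k = common; k < longest; ++k) diffs.push_back(k);
    return diffs;
}
```

For row 0030 above it reports byte positions 0, 1, 4, 6, 7, 8 and 9. Position 4 is the `80` at offset 0x34. For the identical 0040 rows it reports nothing.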

Re: Coloring stringdifference

Postby galh » Mon Apr 06, 2009 12:27 pm

Please note that I believe the problem with this feature comes from a wrong basic assumption. When I rewrote the code (more than a year ago :oops:), I found that a diff algorithm between two strings returns a list of diffs between the left string and the right string. It is not expected to return the diffs the way WinMerge tries to do today: left-right diffs AND right-left diffs.

Fix that and the rest of the bugs will either be gone or be easier to fix.

Of course, I might be wrong :). I'd be glad if someone could point me to such a diff algorithm.
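The single-list argument can be illustrated in code. If the algorithm returns one left-to-right edit script, the coloring of both panes follows from it directly, and the two sides cannot disagree. Below is a minimal C++ sketch with hypothetical names (`Op`, `Highlights`, `fromScript`). It assumes an edit script made of Equal, Delete (left only) and Insert (right only) steps, as produced by common LCS-style diff algorithms.

```cpp
#include <cassert>
#include <vector>

enum class Op { Equal, Delete, Insert };  // Delete = left only, Insert = right only

struct Highlights {
    std::vector<bool> left, right;  // one flag per word, true = highlighted
};

// Derive the coloring of BOTH panes from ONE left-to-right edit script,
// instead of computing left->right and right->left differences separately.
Highlights fromScript(const std::vector<Op>& script) {
    Highlights h;
    for (Op op : script) {
        switch (op) {
        case Op::Equal:   // present on both sides, not highlighted
            h.left.push_back(false);
            h.right.push_back(false);
            break;
        case Op::Delete:  // only in the left string
            h.left.push_back(true);
            break;
        case Op::Insert:  // only in the right string
            h.right.push_back(true);
            break;
        }
    }
    return h;
}
```

Take the script Insert, Equal, Equal, Delete, Insert, which turns 'lets talk abut' into 'comunity lets talk about'. It highlights `abut` on the left and `comunity` and `about` on the right.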
galh
 
Posts: 2
Joined: Tue Feb 10, 2009 9:30 am

