New compare type(s)?

Main development forum.

Postby kimmov » Wed Jan 07, 2009 8:32 pm

This post was inspired by lengthy discussions I've had with Matthias about encoding detection.

Currently we have three compare types in WinMerge:
  • full contents compare is the most complete compare type. It handles Unicode, encodings, codepages, EOLs etc. It is also always used in file compare; the other compare types are only for folder compare (for faster results).
  • quick contents compare is basically a byte-by-byte check of the file data. It also handles EOL differences, but not encodings, codepages or the like.
  • file time and/or size compares are the fastest compares. They only check file attributes and don't read a single bit of file content.

Each compare type has its use, though quick contents compare is too easy to misunderstand. That is our (the developers') fault, as we haven't documented it better.

But the real question is: do we need new compare types? Quick compare is a bit odd, as it is a byte-by-byte compare with tweaks.

At least two new kinds of compare types come to mind:
  • a plain and simple byte-by-byte compare without even the EOL check. It would be the fastest possible content compare and ideal for binary files.
  • a smarter quick compare which would handle EOL types, encodings and perhaps more.
  • some combination of these?
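
To make the difference between the first two ideas concrete, here is a minimal Python sketch. It is not WinMerge code (WinMerge is written in C++), and the function names `files_equal_bytewise` and `files_equal_ignoring_eol` are made up for illustration. The EOL-tolerant variant reads whole files into memory for simplicity, which a real quick compare would probably avoid.

```python
import os


def files_equal_bytewise(path_a, path_b, chunk_size=64 * 1024):
    """Return True only if both files contain exactly the same bytes."""
    # Files of different size can never be byte-identical, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files ended at the same point
                return True


def files_equal_ignoring_eol(path_a, path_b):
    """Return True if the files match after mapping CRLF and CR to LF."""

    def normalized(path):
        with open(path, "rb") as handle:
            data = handle.read()
        return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

    return normalized(path_a) == normalized(path_b)
```

The plain variant never looks at what the bytes mean, which is why it suits binary files. The second one already has to interpret the data, and handling encodings would add even more work per file.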

Full contents compare always gives the most accurate results. The other compare types are faster but lose some accuracy.
kimmov
 
Posts: 562
Joined: Thu Sep 11, 2008 8:51 pm
Location: Finland

Re: New compare type(s)?

Postby matthias1955 » Thu Jan 08, 2009 4:02 pm

I would approach this a little differently.
Let's have:
a class for file handling
a class for Unicode handling
a class for plugins
a class or method for the compare method

Right now we have many things completely duplicated.

I mean, let's move to another structure, like Geek has done.

I understand there will be much more to discuss if we redesign WM.
And it's not a small job. Maybe for version 3.00.0
matthias1955
 
Posts: 162
Joined: Wed Dec 17, 2008 1:55 pm

Re: New compare type(s)?

Postby kimmov » Thu Jan 08, 2009 5:50 pm

matthias1955 wrote:I would approach this a little differently.


Oh yes. Current WinMerge is just crap and should be replaced by some old and hacky version from some web page.

Seriously, how was this post related to the topic?

WARNING: I will (as moderator) remove unrelated posts from this thread. Do not continue this crap. This thread is about compare engines and their features, not about a WinMerge redesign, rejected patches or WinMerge 3.0.

Re: New compare type(s)?

Postby gerundt » Mon Jan 19, 2009 8:04 pm

Maybe a "Checksum" compare type? It would create checksums of the files and compare them!?
gerundt
Site Admin
 
Posts: 193
Joined: Wed Sep 24, 2008 8:47 am
Location: Germany

Re: New compare type(s)?

Postby kimmov » Mon Jan 19, 2009 8:21 pm

gerundt wrote:Maybe a "Checksum" compare type? It would create checksums of the files and compare them!?

Interesting idea. I think this has been suggested a few times in the past and we had some initial ideas about it back then.

Some properties of checksum compare are (feel free to correct me and/or add more):
Pros
  • can be used to verify file checksums
  • practically idiot-proof with strong hash algorithms (two different files having the same checksum is possible in theory but extremely unlikely)
  • the compare result can be stored easily (as it is just a hash or a set of hashes)
  • compare results are easy to send over e-mail etc.

Neutral
  • can be even faster than full contents compare

Cons
  • creating checksums is relatively slow (compared to e.g. byte-by-byte compare)
  • cannot handle EOL styles or encodings
  • cannot have line filters
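
As a rough illustration of what such a compare would do per file, here is a minimal Python sketch using the standard `hashlib` module. The helper names `file_checksum` and `checksums_match` and the choice of SHA-256 are assumptions made for this example; nothing in the thread settles on an algorithm.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=64 * 1024):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_match(path_a, path_b):
    """Treat two files as identical when their checksums are equal."""
    return file_checksum(path_a) == file_checksum(path_b)
```

The sketch also shows where the cons come from: every byte of both files is read and hashed, and the hash sees raw bytes, so two files that differ only in EOL style or encoding get different checksums.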

Re: New compare type(s)?

Postby gerundt » Thu Jan 22, 2009 9:55 am

Today I had to copy a folder structure (with more than 70,000 files) from one server to another. Unfortunately, Windows stumbled over an open file after 10% of the copy. So now I am using WinMerge to compare and copy the files. I use the "Size" compare type in the hope that it is the fastest method. :D

With this background, would an "Exists" compare type be faster? A compare type that just checks whether a file exists or not?

Re: New compare type(s)?

Postby kimmov » Thu Jan 22, 2009 3:10 pm

gerundt wrote:I use the "Size" compare type in the hope that it is the fastest method. :D


It is. It only checks the files' attributes and doesn't even open the files. Time-based compare should be about as fast, but as comparing times is a bit more work, there might be a difference with a large set of files.
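
For illustration, an attribute-only compare could look like the following Python sketch. The function name `compare_by_attributes` and the `time_tolerance` parameter are assumptions for this example and do not describe how WinMerge implements it. The tolerance is there because some file systems store modification times coarsely (FAT uses 2-second steps).

```python
import os


def compare_by_attributes(path_a, path_b, use_size=True, use_time=False,
                          time_tolerance=2.0):
    """Compare two files by size and/or modification time only.

    The file contents are never opened or read.
    """
    stat_a = os.stat(path_a)
    stat_b = os.stat(path_b)
    if use_size and stat_a.st_size != stat_b.st_size:
        return False
    # time_tolerance (seconds) is a hypothetical knob for coarse timestamps.
    if use_time and abs(stat_a.st_mtime - stat_b.st_mtime) > time_tolerance:
        return False
    return True
```

Only `os.stat` is called here, so the cost per file pair is a couple of metadata lookups no matter how large the files are.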

gerundt wrote:With this background, would an "Exists" compare type be faster? A compare type that just checks whether a file exists or not?


Interesting idea. It's not comparing anymore, but a synchronization tool, for cases where you only need to check whether both folders (folder structures) have the same files.

These synchronization features are what lots of people need (perhaps more than comparing content), and WinMerge is quite badly lacking in them. WinMerge does not detect moved files, cannot select newer files to copy over older ones, etc.

So I probably wouldn't add "check for existence" as a separate compare method but as part of synchronization features/improvements.
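
A minimal Python sketch of such an existence check, written as a synchronization helper rather than a compare method, might look like this. The names `relative_files` and `existence_report` are hypothetical, and the sketch compares file names exactly, ignoring platform details such as case-insensitive file systems.

```python
import os


def relative_files(root):
    """Return the set of file paths below root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.add(os.path.relpath(full_path, root))
    return found


def existence_report(left_root, right_root):
    """Report which files exist on one side only and which exist on both."""
    left = relative_files(left_root)
    right = relative_files(right_root)
    return {
        "left_only": sorted(left - right),
        "right_only": sorted(right - left),
        "both": sorted(left & right),
    }
```

For the interrupted-copy case described above, the `left_only` list is exactly the set of files that still needs to be copied.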

Re: New compare type(s)?

Postby kimmov » Thu Jan 22, 2009 3:20 pm

gerundt wrote:Maybe a "Checksum" compare type? It would create checksums of the files and compare them!?


I just had an idea about using checksums. Given that checksums are effectively unique (with a strong hash, two different files having the same checksum is extremely unlikely), we could easily create a map (instead of a list) of files and find matching files from that map (by checksum). So we can basically turn the compare around: find the files that match and show whether the paths of those files are identical or different.

Basically the compare would go like this (just one possibility):
  1. get the lists of items to compare (like in the current compare methods)
  2. calculate a checksum for each file
  3. compare the checksums:
    1. if the checksums match, the files are in the same place and are identical
    2. if the file is missing on the other side, check whether the checksum is found elsewhere in the folder structure (moved file)
    3. if the checksums don't match, check whether there are files with matching checksums in other folders (moved/replaced files)

Now the checksum compare produces results we simply cannot produce with the other (current) methods, and it has suddenly become a great new feature to have.
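
The steps above can be sketched in Python roughly as follows. This is only one possible reading of the idea: the names `index_tree` and `classify`, the result labels and the use of SHA-256 are all assumptions for this example, and the sketch only walks the left side (files that exist only on the right are not reported).

```python
import hashlib
import os
from collections import defaultdict


def file_checksum(path, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_tree(root):
    """Steps 1 and 2: map each relative file path to its checksum."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            index[os.path.relpath(full_path, root)] = file_checksum(full_path)
    return index


def classify(left_root, right_root):
    """Step 3: classify every left-side file against the right side."""
    left = index_tree(left_root)
    right = index_tree(right_root)
    # The map from checksum to paths is what lets us find moved files.
    right_by_checksum = defaultdict(list)
    for rel_path, checksum in right.items():
        right_by_checksum[checksum].append(rel_path)

    results = {}
    for rel_path, checksum in left.items():
        if right.get(rel_path) == checksum:
            results[rel_path] = ("identical", rel_path)      # step 3.1
        elif checksum in right_by_checksum:
            match = sorted(right_by_checksum[checksum])[0]
            results[rel_path] = ("moved", match)             # steps 3.2 and 3.3
        elif rel_path in right:
            results[rel_path] = ("different", rel_path)
        else:
            results[rel_path] = ("left only", None)
    return results
```

The lookup by checksum is a dictionary access, so finding a moved file costs no more than finding one in the same place. The expensive part remains hashing every file once.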

Re: New compare type(s)?

Postby gerundt » Thu Jan 22, 2009 3:36 pm

And with checksums we can find duplicates too, right? For example duplicated MP3s in your music collection or duplicated images in your photo album?

Re: New compare type(s)?

Postby kimmov » Thu Jan 22, 2009 5:05 pm

gerundt wrote:And with checksums we can find duplicates too, right? For example duplicated MP3s in your music collection or duplicated images in your photo album?


Yep. Good point. :!:
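
A duplicate finder along these lines could be sketched in Python as below. The name `find_duplicates` is hypothetical, and grouping by size first is an optimization added for this example, not something proposed in the thread: files with a unique size cannot have a duplicate, so they are never hashed.

```python
import hashlib
import os
from collections import defaultdict


def file_checksum(path, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths below root that share the same content."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    by_checksum = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size, so no duplicate is possible
        for path in paths:
            by_checksum[file_checksum(path)].append(path)

    return [sorted(paths) for paths in by_checksum.values() if len(paths) > 1]
```

Note that this only finds byte-identical copies. Two MP3s of the same song with different tags, or the same photo saved twice with different compression, would not be reported.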

