fdupes does a great job if it's just one filesystem. It can delete the dupes or just hard-link them, which is sometimes more useful.

But before I bothered to look and see that someone had already solved my problem, I wrote a bash script that would list all the md5sums and paths in two columns, sort by md5, and delete all but the first of each. It wouldn't be hard to extend this to 11 filesystems: include the drive each file came from (and perhaps the date), apply whatever logic you want to the lists, and generate 11 scripts as output to run back on the original machines.
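For illustration, here's a minimal sketch of that approach (not the original script; the demo files and paths are made up): hash everything, sort so duplicates land next to each other, then remove every path after the first in each hash group. It assumes paths without spaces.

```shell
#!/usr/bin/env bash
# Sketch: keep the first file of each md5 group, delete the rest.
set -euo pipefail

# Demo setup: three files, two with identical content.
dir=$(mktemp -d)
echo "same content"      > "$dir/a.jpg"
echo "same content"      > "$dir/b.jpg"
echo "different content" > "$dir/c.jpg"

# Two columns: md5sum then path. Sorting puts dupes adjacent;
# awk prints every path whose hash has already been seen once.
find "$dir" -type f -exec md5sum {} + \
  | sort \
  | awk 'seen[$1]++ { print $2 }' \
  | while read -r dupe; do
      rm -- "$dupe"   # or hard-link back to the kept copy instead
    done
```

To adapt it across machines, you'd emit `rm` (or `ln`) commands into a per-host script instead of deleting directly.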



On 24 January 2014 15:16, Peter Reutemann <fracpete@waikato.ac.nz> wrote:
"Imagine having thousands of images on disparate machines. many are
dupes, even among the disparate machines. It's impossible to delete
all the dupes manually and create a singular, accurate photo image
base? Is there an app out there that can scan a file system, perhaps a
target sub-folder system, and suck in the images-- WITHOUT creating
duplicates? Perhaps by reading EXIF info or hashes? I have eleven file
systems saved, and the task of eliminating dupes seems impossible."

-- source: http://linux.slashdot.org/story/14/01/23/2227241

Worthwhile reading the comments, mentioning various tools.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cms.waikato.ac.nz/~fracpete/          Ph. +64 (7) 858-5174
_______________________________________________
wlug mailing list | wlug@list.waikato.ac.nz
Unsubscribe: http://list.waikato.ac.nz/mailman/listinfo/wlug