
24 Jan
2014
3:26 a.m.
On Fri, 24 Jan 2014 15:58:09 +1300, Bruce Kingsbury wrote:
But before I bothered to look and see that someone had already solved my problem, I wrote a script (in bash) that would list all the md5sums and paths in two columns, sort by md5 and delete all but the first of each.
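The quoted approach can be sketched in Python roughly as follows. This is a minimal sketch, not Bruce's actual script. It uses MD5 as in the quoted description, and it walks the tree with os.walk. The function names are hypothetical. The sketch pairs every file with its MD5 and sorts by hash. It then keeps the first path in each run of identical hashes and returns the rest. Deletion is left to the caller.

```python
import hashlib
import os
from itertools import groupby


def md5_of(path, chunk_size=1 << 16):
    """Return the hex MD5 digest of a file's contents, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicates_by_md5(root):
    """Hypothetical helper: build (md5, path) pairs, sort by md5, and
    return every path except the first in each group of equal hashes."""
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Regular files only; symlinks are skipped.
            if os.path.isfile(path) and not os.path.islink(path):
                pairs.append((md5_of(path), path))
    pairs.sort()  # sorts by md5, then by path within equal hashes
    extras = []
    for _digest, group in groupby(pairs, key=lambda p: p[0]):
        paths = [p for _, p in group]
        extras.extend(paths[1:])  # keep the first, flag the rest
    return extras
```

Every file is hashed here, including files that cannot possibly have a twin.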
If you want to speed things up, you could be lazy about computing the hash of each file’s contents. Start by recording only the length of each file. Only when two or more files have the same length do you need to compute their content hashes to check for a match. This is the point where I would give up on bash and resort to Python.
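Here is a minimal sketch of the size-first idea in Python. It assumes MD5 as the content hash, to match the quoted script, and it walks the tree with os.walk. The names `file_hash` and `find_duplicates` are hypothetical. Files are first grouped by size, and only groups with two or more members get hashed.

```python
import hashlib
import os
from collections import defaultdict


def file_hash(path, chunk_size=1 << 16):
    """Hypothetical helper: hex MD5 of a file's contents (MD5 assumed)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files by size first; hash only files whose size is shared.
    Returns a list of groups (sorted path lists) with identical hashes."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Regular files only; symlinks are skipped.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_hash(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Getting a file's size is a cheap metadata lookup, so most files never get opened. The expensive reads are spent only on files whose size matches another file.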