Finding duplicate files

viking60
Über-Berserk
Posts: 9351
Joined: 14 Mar 2010, 16:34

Postby viking60 » 16 Nov 2013, 09:58

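When cleaning up the computer, I want to find and delete duplicate files. This one-liner will find them (it needs GNU find and coreutils, as found on Linux, because find -printf and uniq -w/--all-repeated are GNU extensions):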

Code:

find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} find . -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
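
To see how the pipeline works, here is a sketch that runs it in two halves on a throwaway directory. The first half (find -printf, sort -rn, uniq -d) lists the file sizes that occur more than once, so only files that share a size ever get hashed. The second half hashes those candidates with md5sum and keeps the lines whose 32-character MD5 digest repeats. The file names and variables (demo_dir, shared_sizes, dup_groups) are made up for the demo, and it assumes GNU find, xargs and coreutils.

```shell
#!/bin/sh
# Sketch: run the duplicate finder in two halves on a throwaway directory.
# Assumes GNU find, xargs and coreutils (find -printf, uniq -w and
# uniq --all-repeated are GNU extensions). All names here are made up.
set -eu

demo_dir=$(mktemp -d)              # scratch directory, deleted at the end
cd "$demo_dir"

printf 'hello\n' > a.txt           # a.txt and b.txt: same content, 6 bytes each
printf 'hello\n' > b.txt
printf 'world\n' > c.txt           # same size as a.txt, different content
printf 'hi\n'    > d.txt           # no other file has this size
: > empty1                         # empty files are skipped by -not -empty
: > empty2

# First half: sizes (in bytes) shared by more than one non-empty file.
# sort -rn puts equal sizes next to each other so uniq -d can spot them.
shared_sizes=$(find . -not -empty -type f -printf '%s\n' | sort -rn | uniq -d)

# Second half: hash only the files that have one of those sizes, sort by hash,
# and print every line whose first 32 characters (the MD5 digest) repeat.
# Groups of identical files are separated by a blank line.
dup_groups=$(printf '%s\n' "$shared_sizes" \
  | xargs -I{} find . -type f -size {}c -print0 \
  | xargs -0 md5sum \
  | sort \
  | uniq -w32 --all-repeated=separate)

echo "Sizes shared by several files: $shared_sizes"
printf '%s\n' "$dup_groups"

cd / && rm -rf "$demo_dir"
```

Only a.txt and b.txt end up in the output: c.txt has the same size but a different checksum, and d.txt is never hashed because no other file shares its size.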


Easy to remember, right? :-D If not :shock:, maybe it is time to make an alias:

Code:

alias duplicates='find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} find . -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate'


Now you can go to the directory of your choice and check it for duplicate files by typing:

Code:

duplicates

This will check all subdirectories too.
This is good for finding duplicate pictures, MP3s and so on. It is not a good idea to delete duplicate files from themes, icon sets and the like.
Only delete files that you put there yourself.
I always find a lot of duplicates in my Downloads directory.
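
The alias always scans the current directory. If you would rather pass the directory as an argument, a shell function can do that. This is a sketch, not part of the original tip: the name finddups is made up (a different name avoids clashing with the duplicates alias), and it assumes GNU find, xargs and coreutils. It could go in ~/.bashrc next to, or instead of, the alias.

```shell
# Hypothetical finddups function (not from the original post): the same
# pipeline as the duplicates alias, but the directory to scan is an argument.
# Assumes GNU find, xargs and coreutils.
finddups() {
  local dir=${1:-.}                # default to the current directory
  # Same stages as the alias; xargs -r skips md5sum when no sizes repeat.
  find "$dir" -not -empty -type f -printf '%s\n' \
    | sort -rn \
    | uniq -d \
    | xargs -I{} find "$dir" -type f -size {}c -print0 \
    | xargs -0 -r md5sum \
    | sort \
    | uniq -w32 --all-repeated=separate
}
```

Usage: finddups ~/Downloads, or plain finddups for the current directory. Like the alias, it also searches subdirectories.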

Now, if you want to find and delete duplicate files in one operation, you could enter this command:

Code:

find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} find . -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate | cut -f3-100 -d ' ' | tr '\n' '\t' | sed 's/\t\t/\n/g' | cut -f2-100 | tr '\t' '\n' | perl -pe 's/([ (){}-])/\\$1/g' | perl -pe 's/'\''/\\'\''/g' | xargs -pr rm -v

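xargs -p shows the rm command and waits for you to confirm it. Enter y to delete all the files it lists; the first file of each set of duplicates is left out of that list, so one copy is kept. Deleting like this is probably not a smart thing to do, and it is dangerous.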
No risk - no fun though :berserkf
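
If you want a little less risk, a dry run first is a good idea. The sketch below is a hypothetical would_delete function, not part of the original tip: it uses the same search and then lets awk print every file of each duplicate set except the first, which for ordinary file names is the same list the command above passes to rm. Nothing is deleted. It assumes GNU tools and md5sum's default "HASH  NAME" line format; names containing newlines or backslashes are not handled.

```shell
# Hypothetical would_delete function (not from the original post): list the
# files the delete command would remove, i.e. every file in a set of
# duplicates except the first one. Nothing is deleted.
# Assumes GNU tools and md5sum's default "HASH  NAME" lines (32 hex digits,
# two spaces, then the name).
would_delete() {
  local dir=${1:-.}
  # awk: a blank line starts a new set; the first line of each set is the
  # copy to keep; for the others, print the name that follows the 32-char
  # hash and the two spaces (character 35 onwards).
  find "$dir" -not -empty -type f -printf '%s\n' \
    | sort -rn \
    | uniq -d \
    | xargs -I{} find "$dir" -type f -size {}c -print0 \
    | xargs -0 -r md5sum \
    | sort \
    | uniq -w32 --all-repeated=separate \
    | awk 'BEGIN { keep = 1 } /^$/ { keep = 1; next } keep { keep = 0; next } { print substr($0, 35) }'
}
```

Run would_delete ~/Downloads and read the list. If it looks right, piping it into xargs -d '\n' -p rm -v would delete those files after asking, much like the original command.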
Manjaro 64bit on the main box -Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz and nVidia Corporation GT200b [GeForce GTX 275] (rev a1. + Centos on the server - Arch on the laptop.
"There are no stupid questions - Only stupid answers!"

Re: Finding duplicate files

Postby viking60 » 29 Oct 2017, 10:07

To find and delete duplicate files, you can also install fdupes.

You need to give it a directory to scan:

Code:

fdupes ~/Downloads

This will find all duplicates in your Downloads directory. Without the -r switch, subdirectories are not searched.

To delete them, you can run:

Code:

fdupes -d ~/Downloads

For each set of duplicates, fdupes lists the files with a number in front of each and asks which ones to preserve. Typing 1 (one) keeps the first file in the list and deletes the others.

You will have to do this for every set of duplicates found.

If you want to scan and remove recursively, use the -r switch:

Code:

fdupes -rd ~/Downloads


To see the size of the duplicates, you can use the -S switch:

Code:

fdupes -S ~/Downloads