Search for duplicated files using C# and LINQ

Over the years I have downloaded, copied, and moved my files around; sometimes I made a lot of copies or put them in different directories. Now the time has come to clean up the duplicates. I took the easy way and quickly wrote a small application that scans my drive and flags files that share the same name and length.
The solution is simple. First I create a list of FileInfo objects and fill it by walking the directory tree recursively, skipping any directory that raises a permission violation. Then I search the list for duplicates and write the possible duplicates to a file.
Here is the basic code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;

namespace ConsoleApplication1
{
    public class DuplicateFileFinderClass
    {
        public static List<FileInfo> files = new List<FileInfo> ();
        public static void ListDrive (string drive, bool enumerateFolders)
        {
            try
            {
                DirectoryInfo di = new DirectoryInfo (drive);
                foreach (FileInfo fi in di.EnumerateFiles ())
                {
                    files.Add (fi);
                }

                if (enumerateFolders)
                {
                    foreach (DirectoryInfo sdi in di.EnumerateDirectories ())
                    {
                        ListDrive (sdi.FullName, enumerateFolders);
                    }
                }


            }

            catch (UnauthorizedAccessException) { }
        }

        public static void ListDuplicates ()
        {
            // Files that share both name and length are treated as probable duplicates.
            var duplicatedFiles = files.GroupBy (x => new { x.Name, x.Length}).Where (t => t.Count () > 1).ToList ();

            Console.WriteLine ("Total items: {0}", files.Count);
            Console.WriteLine ("Probably duplicates {0}", duplicatedFiles.Count ());

            // The using block flushes and closes the log even if a write throws.
            using (StreamWriter duplicatesFoundLog = new StreamWriter ("DuplicatedFileList.txt"))
            {
                foreach (var filter in duplicatedFiles)
                {
                    duplicatesFoundLog.WriteLine ("Probably duplicated item: Name: {0}, Length: {1}",
                        filter.Key.Name,
                        filter.Key.Length);

                    // Collect every file that matches this group's name and length.
                    var items = files.Where (x => x.Name == filter.Key.Name &&
                        x.Length == filter.Key.Length).ToList ();

                    int c = 1;
                    foreach (var suspected in items)
                    {
                        duplicatesFoundLog.WriteLine ("{3}, {0} - {1}, Creation date {2}",
                            suspected.Name,
                            suspected.FullName,
                            suspected.CreationTime,
                            c);
                        c++;
                    }

                    duplicatesFoundLog.WriteLine ();
                }
            }
        }
    }
}
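
The log loop in the code above calls files.Where for every group, which scans the whole list again each time. Each group produced by GroupBy already holds its own members, so a variant could write the log straight from the groups. The following is only a minimal sketch of that idea: the names GroupedDuplicateLog, FindCandidateGroups and WriteLog are hypothetical and not part of the original application, and the effect on the running time has not been measured here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper class, not part of the original application.
public static class GroupedDuplicateLog
{
    // Groups files by (Name, Length) and keeps only groups with more than one member.
    public static List<List<FileInfo>> FindCandidateGroups(IEnumerable<FileInfo> files)
    {
        return files
            .GroupBy(f => new { f.Name, f.Length })
            .Where(g => g.Count() > 1)
            .Select(g => g.ToList())
            .ToList();
    }

    // Writes each group from its own members, without querying the full list again.
    public static void WriteLog(IEnumerable<List<FileInfo>> groups, TextWriter log)
    {
        foreach (List<FileInfo> group in groups)
        {
            FileInfo first = group[0];
            log.WriteLine("Probably duplicated item: Name: {0}, Length: {1}",
                first.Name, first.Length);

            int c = 1;
            foreach (FileInfo suspected in group)
            {
                log.WriteLine("{0}, {1} - {2}, Creation date {3}",
                    c, suspected.Name, suspected.FullName, suspected.CreationTime);
                c++;
            }

            log.WriteLine();
        }
    }
}
```

With the list filled by ListDrive, this could be called as GroupedDuplicateLog.WriteLog (GroupedDuplicateLog.FindCandidateGroups (DuplicateFileFinderClass.files), writer), where writer is any TextWriter such as a StreamWriter or Console.Out.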

From the console application I first call the ListDrive method, then the ListDuplicates method. I don’t claim it’s the best or most elegant way, but it quickly served my needs. The whole process took around 31 seconds (6 to compile the list, 25 to create the log) on a 500 GB HDD with over 6600 duplicates, all in less than 100 lines of code.
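
Matching on name and length only yields probable duplicates; two different files can share both. One way to confirm a group is to compare a hash of the file contents. The sketch below is an assumption added for illustration and not something the application above does: it uses SHA-256 from System.Security.Cryptography, and the names ContentCheck, HashFile and ConfirmByContent are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper class, not part of the original application.
public static class ContentCheck
{
    // Returns the SHA-256 of a file's content as a hyphen-separated hex string.
    public static string HashFile(string path)
    {
        using (SHA256 sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream));
        }
    }

    // Splits one name-and-length group into sub-groups with identical content hashes,
    // keeping only sub-groups that contain more than one file.
    public static List<List<string>> ConfirmByContent(IEnumerable<string> paths)
    {
        return paths
            .GroupBy(p => HashFile(p))
            .Where(g => g.Count() > 1)
            .Select(g => g.ToList())
            .ToList();
    }
}
```

Hashing reads every candidate file in full, so it is best applied only to the groups that already match on name and length rather than to the whole drive.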

4 thoughts on “Search for duplicated files using C# and LINQ”

  1. I truly love your site.. Very nice colors & theme. Did you make this web site yourself?
    Please reply back as I’m wanting to create my own site and would love to know where you got this from or what the theme is called. Thanks!

  2. Kevin Sheth says:

    This is very nice and concise. I wrote something a little more verbose that also does MD5 hashing to confirm the duplicates. I’ve also attempted a WPF GUI, but a lot of work is still needed. Check it out at https://github.com/kns98/ndupfinder
