How to detect duplicate images with ASP.NET Core

Learn to detect exact and near-duplicate images in ASP.NET Core using hashing, perceptual hashing, and machine learning. Includes code examples and best practices.

  • Sunday, May 25, 2025

Applications that accept image uploads, from social media platforms to e-commerce sites, tend to accumulate duplicate images that waste storage and clutter the user experience. This guide covers methods for detecting duplicate images in ASP.NET Core, both exact duplicates and near-duplicates, with practical implementations and best practices.

Understanding Image Duplicates

Exact Duplicates

Exact duplicates are identical files, byte for byte. A change to a single pixel or to the embedded metadata (for example, EXIF tags) produces a different file, so it no longer counts as an exact duplicate.

Near-Duplicates

Near-duplicates look alike but differ in format, size, or quality, or have had minor edits applied (e.g., resizing, cropping, filters). Detecting them requires comparing image content rather than file bytes.

Exact Duplicate Detection Using Hashing

Hashing reduces a file to a short, fixed-size fingerprint. With a collision-resistant hash function, two files with the same hash can be treated as exact duplicates for practical purposes.

Cryptographic Hashing

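  • MD5: Fast, but cryptographically broken. Colliding files can be crafted deliberately, so avoid it when uploads come from untrusted users.

  • SHA256: Somewhat slower, but collision-resistant. A sensible default.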

Implementation:

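using System;
using System.IO;
using System.Security.Cryptography;

// Returns the SHA-256 digest of the stream as a lowercase hex string.
public string ComputeHash(Stream stream)
{
    using var sha256 = SHA256.Create();
    byte[] hashBytes = sha256.ComputeHash(stream);
    // Convert.ToHexString requires .NET 5 or later.
    return Convert.ToHexString(hashBytes).ToLowerInvariant();
}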
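
To illustrate why byte-level hashing only catches exact copies, the following sketch hashes small in-memory buffers. The names `HashDemo` and `Sha256Hex` are illustrative and not part of the code above, and the byte values are made up for the example.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class HashDemo
{
    // Returns the SHA-256 digest of the stream as lowercase hex (.NET 5+).
    public static string Sha256Hex(Stream stream)
    {
        using var sha256 = SHA256.Create();
        return Convert.ToHexString(sha256.ComputeHash(stream)).ToLowerInvariant();
    }

    // Hashes a buffer, an identical copy, and a copy with one byte changed.
    public static void Demo()
    {
        byte[] original = { 0xFF, 0xD8, 0xFF, 0xE0, 0x01, 0x02 };
        byte[] copy = (byte[])original.Clone();
        byte[] edited = (byte[])original.Clone();
        edited[5] = 0x03; // a single changed byte, e.g. a metadata edit

        string h1 = Sha256Hex(new MemoryStream(original));
        string h2 = Sha256Hex(new MemoryStream(copy));
        string h3 = Sha256Hex(new MemoryStream(edited));

        Console.WriteLine(h1 == h2); // True: identical bytes, identical hash
        Console.WriteLine(h1 == h3); // False: one byte differs
    }
}
```

Because one changed byte yields an unrelated digest, re-saved or resized copies of the same picture will not be caught by this check alone.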

Storing and Querying Hashes

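Store each hash in an indexed database column so lookups stay fast as the table grows. Making the index unique also prevents two concurrent uploads of the same file from both being recorded.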

public class ImageRecord
{
    public int Id { get; set; }
    public string Hash { get; set; }
    public string FilePath { get; set; }
}

public async Task<bool> IsDuplicateAsync(string hash)
{
    return await _context.ImageRecords.AnyAsync(i => i.Hash == hash);
}
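
One way to get the indexed column is to declare it in the EF Core model. The sketch below assumes EF Core is the data access layer (the `_context` field suggests it, but the article does not say so); the context name `ImageDbContext` is hypothetical.

```csharp
using Microsoft.EntityFrameworkCore;

public class ImageRecord
{
    public int Id { get; set; }
    public string Hash { get; set; } = string.Empty;
    public string FilePath { get; set; } = string.Empty;
}

// Hypothetical context name; the article only shows a `_context` field.
public class ImageDbContext : DbContext
{
    public ImageDbContext(DbContextOptions<ImageDbContext> options) : base(options) { }

    public DbSet<ImageRecord> ImageRecords => Set<ImageRecord>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<ImageRecord>(entity =>
        {
            // A SHA-256 digest written as hex is always 64 characters.
            entity.Property(i => i.Hash).HasMaxLength(64).IsRequired();

            // Unique index: fast lookups, and the database itself rejects
            // a second row with the same hash.
            entity.HasIndex(i => i.Hash).IsUnique();
        });
    }
}
```

With a unique index in place, a concurrent insert of the same hash fails at SaveChanges with a DbUpdateException, which the upload code can treat as "duplicate" instead of relying only on the check-then-insert query.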

Near-Duplicate Detection with Perceptual Hashing

Perceptual hashing (pHash) derives a short fingerprint from what an image looks like rather than from its bytes, so visually similar images produce similar hashes.

Using ImageSharp for Processing

Install the SixLabors.ImageSharp NuGet package (dotnet add package SixLabors.ImageSharp) for image decoding and manipulation.

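using System.Linq;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;

// Simplified average hash (aHash), a basic perceptual hash: 8x8 grayscale
// thumbnail, one bit per pixel. Returns a 64-bit value as 16 hex characters.
public async Task<string> ComputePerceptualHashAsync(Stream stream)
{
    // Loading as L8 converts every pixel to 8-bit luminance.
    using var image = await Image.LoadAsync<L8>(stream);
    image.Mutate(ctx => ctx.Resize(8, 8));

    var values = new byte[64];
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            values[y * 8 + x] = image[x, y].PackedValue;

    // Set one bit for each pixel at least as bright as the average.
    double average = values.Average(v => (double)v);
    ulong hash = 0;
    for (int i = 0; i < 64; i++)
        if (values[i] >= average) hash |= 1UL << i;

    return hash.ToString("x16");
}

Comparing pHashes

Use the Hamming distance, the number of bits in which two hashes differ, to measure similarity. A small distance suggests a near-duplicate; for a 64-bit hash, a threshold somewhere between 5 and 10 bits is a common starting point, to be tuned on your own images.

using System.Globalization;
using System.Numerics;

// Counts the differing bits between two 64-bit hashes given as hex strings.
public int HammingDistance(string hash1, string hash2)
{
    ulong a = ulong.Parse(hash1, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    ulong b = ulong.Parse(hash2, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    return BitOperations.PopCount(a ^ b);
}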
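
In practice a new upload is compared against many stored hashes, not just one. The following is a minimal sketch of that lookup, assuming perceptual hashes are kept as 64-bit values; `NearDuplicateSearch`, `FindMatches`, and the default `maxDistance` of 10 are illustrative choices rather than anything prescribed here.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

public static class NearDuplicateSearch
{
    // Number of bit positions in which two 64-bit hashes differ.
    public static int HammingDistance(ulong a, ulong b) => BitOperations.PopCount(a ^ b);

    // Returns the stored hashes within maxDistance bits of the candidate, closest first.
    // maxDistance is an assumed tuning parameter; calibrate it on your own images.
    public static IReadOnlyList<(ulong Hash, int Distance)> FindMatches(
        ulong candidate, IEnumerable<ulong> storedHashes, int maxDistance = 10)
    {
        return storedHashes
            .Select(h => (Hash: h, Distance: HammingDistance(candidate, h)))
            .Where(m => m.Distance <= maxDistance)
            .OrderBy(m => m.Distance)
            .ToList();
    }
}
```

This is a linear scan, which is fine for modest collections. Very large collections usually need an index structure designed for Hamming-distance queries, which is beyond this sketch.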

Handling Different Image Formats

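Perceptual hashing operates on decoded pixels, so ImageSharp's automatic format detection already copes with mixed input formats. If you also want to store images in one standard format (e.g., JPEG), convert them after computing the exact hash, since re-encoding changes the bytes: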

public async Task<Stream> ConvertToJpegAsync(Stream input)
{
    using var image = await Image.LoadAsync(input);
    var output = new MemoryStream();
    await image.SaveAsJpegAsync(output);
    output.Position = 0;
    return output;
}

Performance Optimization

Asynchronous Processing

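Avoid blocking request threads by reading and hashing the upload asynchronously. Wrapping the work in Task.Run inside a request only moves it to another thread-pool thread; for expensive jobs, hand the work to a background service instead.

using System.Security.Cryptography;
using Microsoft.AspNetCore.Mvc;

[HttpPost]
public async Task<IActionResult> UploadImage(IFormFile file)
{
    await using var stream = file.OpenReadStream();
    using var sha256 = SHA256.Create();
    // ComputeHashAsync (.NET 5+) reads the upload without blocking the request thread.
    byte[] hashBytes = await sha256.ComputeHashAsync(stream);
    var hash = Convert.ToHexString(hashBytes).ToLowerInvariant();

    if (await _service.IsDuplicateAsync(hash))
        return Conflict("Duplicate image.");
    return Ok(new { hash });
}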
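
For work that is too slow to finish inside the request, such as perceptual hashing of large images, one option is an in-process queue drained by a background worker. The sketch below uses System.Threading.Channels; `HashJob`, `HashJobQueue`, and the capacity of 100 are hypothetical names and values chosen for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical work item: where the uploaded file was saved.
public sealed record HashJob(string FilePath);

public sealed class HashJobQueue
{
    // Bounded, so a burst of uploads applies back-pressure instead of exhausting memory.
    private readonly Channel<HashJob> _channel =
        Channel.CreateBounded<HashJob>(new BoundedChannelOptions(100)
        {
            FullMode = BoundedChannelFullMode.Wait
        });

    // Called from the upload action once the file has been saved.
    public ValueTask EnqueueAsync(HashJob job, CancellationToken ct = default)
        => _channel.Writer.WriteAsync(job, ct);

    // Signals that no more jobs will arrive, which lets ProcessAsync finish.
    public void Complete() => _channel.Writer.Complete();

    // Consumer loop; in ASP.NET Core this would typically run inside a BackgroundService.
    public async Task ProcessAsync(Func<HashJob, Task> handler, CancellationToken ct = default)
    {
        await foreach (HashJob job in _channel.Reader.ReadAllAsync(ct))
        {
            await handler(job);
        }
    }
}
```

The queue would be registered as a singleton so the controller and the worker share one instance. Jobs held only in memory are lost on restart, so a durable queue is the safer choice when every upload must be processed.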

Caching

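Cache frequently requested hashes in Redis to avoid a database round trip on every upload. Register the distributed cache at startup (this requires the Microsoft.Extensions.Caching.StackExchangeRedis package):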

services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = "localhost:6379";
});
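
Once the cache is registered, lookups can go through IDistributedCache before falling back to the database. The following is a sketch of that pattern, not a definitive implementation: the class name `CachedDuplicateChecker`, the `imghash:` key prefix, and the one-hour lifetime are all assumptions, and the database check is passed in as a delegate so the example stays self-contained.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Distributed;

public class CachedDuplicateChecker
{
    private readonly IDistributedCache _cache;
    private readonly Func<string, Task<bool>> _existsInDatabase; // e.g. IsDuplicateAsync

    public CachedDuplicateChecker(IDistributedCache cache, Func<string, Task<bool>> existsInDatabase)
    {
        _cache = cache;
        _existsInDatabase = existsInDatabase;
    }

    public async Task<bool> IsDuplicateAsync(string hash)
    {
        string key = $"imghash:{hash}"; // assumed key format

        // Cache hit: the hash is already known to exist.
        if (await _cache.GetStringAsync(key) is not null)
            return true;

        bool exists = await _existsInDatabase(hash);
        if (exists)
        {
            // Only positive results are cached. A "not found" answer can go stale
            // as soon as another request uploads the same image.
            await _cache.SetStringAsync(key, "1", new DistributedCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(1) // assumed lifetime
            });
        }
        return exists;
    }
}
```

Caching only positive results keeps the cache from hiding a newly uploaded image, at the cost of still querying the database for every first-time hash.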

Security Considerations

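  • Validate File Types: Restrict uploads to allowed MIME types and verify the file's actual content, since the client-supplied Content-Type can be spoofed.

  • Sanitize Filenames: Never use the client-supplied filename as a storage path; generate the stored name on the server to prevent path traversal attacks.

  • Secure Storage: Use encrypted storage for sensitive images.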
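
The first two points can be sketched with standard library calls only. This example assumes only JPEG and PNG uploads are accepted; `UploadValidation` and its method names are hypothetical, and a production check would likely cover more formats.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class UploadValidation
{
    private static readonly byte[] JpegMagic = { 0xFF, 0xD8, 0xFF };
    private static readonly byte[] PngMagic = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    private static readonly HashSet<string> AllowedExtensions =
        new(StringComparer.OrdinalIgnoreCase) { ".jpg", ".jpeg", ".png" };

    // Checks the leading "magic bytes" instead of trusting the client-supplied Content-Type.
    public static bool LooksLikeJpegOrPng(ReadOnlySpan<byte> header)
        => header.StartsWith(JpegMagic) || header.StartsWith(PngMagic);

    // Produces a server-generated storage name, keeping only an allow-listed extension.
    // Nothing else from the client-supplied name is used, so "../" segments cannot leak through.
    public static bool TryGetSafeStorageName(string clientFileName, out string storageName)
    {
        string extension = Path.GetExtension(clientFileName);
        if (!AllowedExtensions.Contains(extension))
        {
            storageName = string.Empty;
            return false;
        }

        storageName = Guid.NewGuid().ToString("N") + extension.ToLowerInvariant();
        return true;
    }
}
```

A matching signature only shows that the file starts like an image. Decoding it with an image library remains the stronger check.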

Advanced Techniques

Machine Learning with ML.NET

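Cluster images by their feature vectors so that likely near-duplicates fall into the same group. K-means is unsupervised, so it needs no labeled duplicate pairs.

using Microsoft.ML;

var context = new MLContext();
// "PixelValues" must be a numeric vector column in the input data.
var pipeline = context.Transforms.Concatenate("Features", "PixelValues")
    .Append(context.Clustering.Trainers.KMeans(numberOfClusters: 10));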
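
Whatever produces the feature vectors, two images in the same cluster still need a direct comparison before being called near-duplicates. A common measure for that is cosine similarity, sketched below in plain C#. This helper is not part of ML.NET, and the name `FeatureSimilarity` is illustrative.

```csharp
using System;

public static class FeatureSimilarity
{
    // Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal vectors.
    public static double Cosine(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // Undefined for zero vectors; treated here as "not similar".
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

The similarity above which two images count as near-duplicates depends on the features used and has to be chosen empirically.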

Feature Matching with OpenCvSharp

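Use OpenCV for keypoint detection and descriptor matching.

using OpenCvSharp;

using var img1 = Cv2.ImRead("image1.jpg", ImreadModes.Grayscale);
using var img2 = Cv2.ImRead("image2.jpg", ImreadModes.Grayscale);
using var detector = ORB.Create();
using var descriptors1 = new Mat();
using var descriptors2 = new Mat();
detector.DetectAndCompute(img1, null, out KeyPoint[] keypoints1, descriptors1);
detector.DetectAndCompute(img2, null, out KeyPoint[] keypoints2, descriptors2);

// ORB descriptors are binary, so match them using Hamming distance.
using var matcher = new BFMatcher(NormTypes.Hamming, crossCheck: true);
DMatch[] matches = matcher.Match(descriptors1, descriptors2);
// Many low-distance matches suggest the two images show the same content.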

Real-World Use Cases

  • Social Media: Prevent users from re-uploading the same image.

  • E-Commerce: Ensure product images are unique.

  • Healthcare: Avoid redundant medical imaging storage.

Best Practices

  • Regular Audits: Remove hash records for images that have since been deleted.

  • Monitor Performance: Log processing times and collisions.

  • Hybrid Approaches: Run the cheap exact-hash check first and fall back to perceptual hashing only when it finds no match.
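
A hybrid check can be sketched as a two-stage decision: exact hash first, perceptual hash second. The example assumes SHA-256 digests are stored as hex strings and perceptual hashes as 64-bit values; `DuplicateKind`, `HybridDuplicateCheck`, and the default `maxDistance` are hypothetical.

```csharp
using System.Collections.Generic;
using System.Numerics;

public enum DuplicateKind { None, Exact, Near }

public static class HybridDuplicateCheck
{
    // maxDistance is an assumed tuning parameter for 64-bit perceptual hashes.
    public static DuplicateKind Classify(
        string sha256Hex,
        ulong perceptualHash,
        ISet<string> knownSha256,
        IEnumerable<ulong> knownPerceptual,
        int maxDistance = 10)
    {
        // Stage 1: exact match on the cryptographic hash (a cheap set lookup).
        if (knownSha256.Contains(sha256Hex))
            return DuplicateKind.Exact;

        // Stage 2: near match, counting differing bits against each stored perceptual hash.
        foreach (ulong known in knownPerceptual)
        {
            if (BitOperations.PopCount(perceptualHash ^ known) <= maxDistance)
                return DuplicateKind.Near;
        }

        return DuplicateKind.None;
    }
}
```

Returning the kind of match rather than a plain boolean lets the caller reject exact duplicates outright while only flagging near-duplicates for review.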

Conclusion

Detecting duplicate images in ASP.NET Core involves choosing the right hashing strategy, optimizing performance, and securing the upload process. Whether you target exact duplicates or near-duplicates, libraries like ImageSharp and OpenCV can streamline the implementation. By following these guidelines, developers can improve both application efficiency and user experience.
