Recognizing images with the same content in Java
Some time ago, in this question, I looked for ways to determine whether two images are the same. I now face a slightly different problem: I have about 2000 images, some of which have the same content but are scaled and/or rotated versions of each other (the rotation is always a multiple of 90°), and which also differ in compression and image format (mainly JPG, some PNG, nothing else). The scale difference should not exceed approximately 2:1. What I want to do is eliminate the duplicates while retaining the highest-quality instance of each. Since Java is the only language I am proficient in, I need to use Java.
The answers to another question provide many useful links, but none of the approaches they describe looks able to identify duplicates once scaling or rotation is involved.
This question, along with its answers, suggests first scaling all images down to a very small size (for example, 32x32 or 16x16), then computing a hash of each and comparing the images by their hashes. That sounds smart enough to me: the images can be sorted by hash before comparison, which makes the comparison an O(n) problem after sorting. However, since the images may be rotated, I don't know how to deal with that. Since what they depict has a clear orientation (the human eye can easily decide which way "should be" up), one option would be to go through all the images manually and rotate them as needed. I want to avoid that if possible.
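To make the scale-then-hash idea concrete, here is a minimal sketch of how the rotation problem might be worked around without manual sorting. It is not taken from the linked answers: the class name RotationInvariantHash, the 8x8 grid, and the "average hash" (one bit per cell, set when the cell is brighter than the mean) are all assumptions made for illustration. Each image is averaged down to an 8x8 grayscale grid, and one image's hash is compared against the hashes of all four 90° rotations of the other image's grid, keeping the smallest Hamming distance.

```java
import java.awt.image.BufferedImage;

final class RotationInvariantHash {
    static final int SIZE = 8; // 8x8 grid, so the hash fits in one 64-bit long

    /** Averages the image down to a SIZE x SIZE grid of luminance values (0..255). */
    static int[][] thumbnail(BufferedImage src) {
        int w = src.getWidth(), h = src.getHeight();
        long[][] sum = new long[SIZE][SIZE];
        int[][] count = new int[SIZE][SIZE];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                int lum = (r * 299 + g * 587 + b * 114) / 1000;
                int cx = x * SIZE / w, cy = y * SIZE / h; // grid cell this pixel falls into
                sum[cy][cx] += lum;
                count[cy][cx]++;
            }
        }
        int[][] gray = new int[SIZE][SIZE];
        for (int y = 0; y < SIZE; y++)
            for (int x = 0; x < SIZE; x++)
                gray[y][x] = count[y][x] == 0 ? 0 : (int) (sum[y][x] / count[y][x]);
        return gray;
    }

    /** Average hash: one bit per cell, set when the cell is brighter than the grid mean. */
    static long hash(int[][] gray) {
        long total = 0;
        for (int[] row : gray) for (int v : row) total += v;
        long bits = 0;
        for (int[] row : gray)
            for (int v : row)
                bits = (bits << 1) | ((long) v * SIZE * SIZE > total ? 1L : 0L);
        return bits;
    }

    /** Rotates the grid by 90 degrees. */
    static int[][] rotate90(int[][] m) {
        int[][] r = new int[SIZE][SIZE];
        for (int y = 0; y < SIZE; y++)
            for (int x = 0; x < SIZE; x++)
                r[x][SIZE - 1 - y] = m[y][x];
        return r;
    }

    /** Hashes of the image at 0, 90, 180 and 270 degrees. */
    static long[] rotationHashes(BufferedImage img) {
        int[][] grid = thumbnail(img);
        long[] hashes = new long[4];
        for (int k = 0; k < 4; k++) {
            hashes[k] = hash(grid);
            grid = rotate90(grid);
        }
        return hashes;
    }

    /** Smallest Hamming distance between a's hash and any rotation of b. */
    static int distance(BufferedImage a, BufferedImage b) {
        long ha = hash(thumbnail(a));
        int best = Long.SIZE;
        for (long hb : rotationHashes(b))
            best = Math.min(best, Long.bitCount(ha ^ hb));
        return best;
    }
}
```

Two files would then be treated as candidate duplicates when distance(a, b) is below some small threshold; the right threshold is something to find by experiment on the actual collection, since JPG compression can flip a few bits. Note that sorting by hash only finds exact hash matches, so with a tolerance like this the comparison is pairwise, which is still cheap for about 2000 images. Stretching every image onto a square grid is deliberate: it makes the grid of a rotated copy line up with the rotated grid of the original.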
Is there a method / algorithm (such as SSIM) that deals with this problem, or can anyone propose a better approach than the one above? Maybe someone knows of Java libraries that would be suitable for the task (the linked question mentions the Java wrapper for OpenCV, as well as ImageJ and imgscalr)? Any help is appreciated.
Solution
I think the general answer to this question requires an unsupervised machine learning method that can produce locally invariant features (basically a fancy way of saying a hash that does not change when the image is scaled or rotated), followed by running a clustering algorithm on those features. Here are some papers that may be relevant:
- Clustering Near-Duplicate Images in Large Collections
- A Novel Duplicate Images Detection Method Based on PLSA Model
- Efficient Image Duplicate Detection Based on Image Analysis (there is a lot in this one, because it is somebody's entire doctoral thesis)