Java – why HashSet RemoveAll requires secondary operations?
I have this code to generate a HashSet and call removeall() I made a Class A, which is just an int wrapper. It records the number equal to the number of calls - the program outputs this number
import java.util.*; class A { int x; static int equalsCalls; A(int x) { this.x = x; } @Override public int hashCode() { return x; } @Override public boolean equals(Object o) { equalsCalls++; if (!(o instanceof A)) return false; return x == ((A)o).x; } public static void main(String[] args) { int setSize = Integer.parseInt(args[0]); int listSize = Integer.parseInt(args[1]); Set<A> s = new HashSet<A>(); for (int i = 0; i < setSize; i ++) s.add(new A(i)); List<A> l = new LinkedList<A>(); for (int i = 0; i < listSize; i++) l.add(new A(i)); s.removeAll(l); System.out.println(A.equalsCalls); } }
It turns out that the removeAll operation is not linear. I think:
$java A 10 10 55 $java A 100 100 5050 $java A 1000 1000 500500
In fact, it seems to be secondary Why?
Replace line s.removeall (L); And (a B: l) s. remove (b); Fix it to act the way I want it to:
$java A 10 10 10 $java A 100 100 100 $java A 1000 1000 1000
Why?
This is a graph showing a conic:
The X and Y axes are two parameters of a java program The Z axis is the number of a.equals calls
The figure is generated by this asymptote program:
import graph3; import grid3; import palette; currentprojection=orthographic(0.8,1,1); size(400,300,IgnoreAspect); defaultrender.merge=true; real[][] a = new real[][]{ new real[]{0,0},new real[]{0,1},3,3},2,6,6},10,10},4,15,15},5,21,21},28,28},7,36,36},8,45,45},9,55,55},66,66},11,78,78},12,91,91},13,105,105},14,120,120},136,136},16,153,153},17,171,171},18,190,190},19,210},}; surface s=surface(a,(-1/2,-1/2),(1/2,1/2),Spline); draw(s,mean(palette(s.map(zpart),Rainbow())),black); //grid3(XYZgrid);
Array A is generated by Haskell program:
import System.Process import Data.List f :: Integer -> Integer -> IO Integer f x y = fmap read $readProcess "/usr/bin/java" ["A",show x,show y] "" g :: [[Integer]] -> [String] g xss = map (\xs -> "new real[]{" ++ intercalate "," (map show xs) ++ "},") xss main = do let n = 20 xs <- mapM (\x -> mapM (\y -> f x y) [0..n]) [0..n] putStrLn $unlines $g xs
Solution
The time it takes for removeAll to work depends on what collection you pass through When you look at the implementation of removeAll:
public boolean removeAll(Collection<?> c) { boolean modified = false; if (size() > c.size()) { for (Iterator<?> i = c.iterator(); i.hasNext(); ) modified |= remove(i.next()); } else { for (Iterator<?> i = iterator(); i.hasNext(); ) { if (c.contains(i.next())) { i.remove(); modified = true; } } } return modified; }
You can see that when the HashSet and the collection have the same size, it iterates over the HashSet and calls c.contains.for each element Since you use LinkedList as a parameter, this is the O (n) operation of each element Since this operation needs to be performed for each of the N deleted elements, the result is an O (N2) operation
If you replace the LinkedList with a collection that provides more efficient content, the performance of removeAll will improve accordingly If you make the HashSet larger than the collection, the performance should also be significantly improved because it traverses the collection