RFR: 8193031: Collections.addAll is likely to perform worse than Collection.addAll

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR: 8193031: Collections.addAll is likely to perform worse than Collection.addAll

Martin Buchholz-3
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8193031: Collections.addAll is likely to perform worse than Collection.addAll

Stuart Marks


On 12/4/17 7:50 PM, Martin Buchholz wrote:
> https://bugs.openjdk.java.net/browse/JDK-8193031
> http://cr.openjdk.java.net/~martin/webrevs/jdk/Collections-addAll/

       * Adds all of the specified elements to the specified collection.
       * Elements to be added may be specified individually or as an array.
       * The behavior of this convenience method is identical to that of
-     * {@code c.addAll(Arrays.asList(elements))}, but this method is likely
-     * to run significantly faster under most implementations.
+     * {@code c.addAll(Arrays.asList(elements))}.
       *
       * <p>When elements are specified individually, this method provides a
       * convenient way to add a few elements to an existing collection:

I agree with the removal of "significantly faster." In some real cases, it's
definitely not! Usually the behavior wording isn't "is identical to". I'd prefer
something like "the behavior is as if...."

      public static <T> boolean addAll(Collection<? super T> c, T... elements) {
-        boolean result = false;
-        for (T element : elements)
-            result |= c.add(element);
-        return result;
+        return c.addAll(Arrays.asList(elements));
      }

So, this seems sensible, but the tradeoffs aren't obvious.

If the destination is an ArrayList, the first thing that addAll() does is call
toArray() on the argument. And Arrays$ArrayList.toArray() makes a copy, as
required by spec. So this redundantly copies every element, compared to the
for-loop.

But I did some quick benchmarks and I found that bulk copies are so fast that
even doing two of them is quite a bit faster than a single copy in a for-loop,
ranging from 4x-6x faster over a variety of sizes.

This also requires temp space the size of the input array. This gives me (and
the garbage collector) a bit of a pause.

Turning to other collections, the AbstractCollection.addAll() implementation
runs essentially the same loop as here, except that it runs it over its
Collection argument instead of an array. Thus it uses an Iterator to loop
instead of indexing over an array. Here, converting to a list has a slight
performance disadvantage, around 10-15% (for HashSet, which uses
AbstractCollection.addAll).

You point out in the bug report that the implementation of Collections.addAll()
misses out on optimizations that implementations can provide for bulk updates.
This is true, but my feeling is that loses something by converting the array
into a Collection before calling the virtual addAll(Collection) method.

How about this alternative: do the same thing we did with sort().

Add a new default method Collection.addEach(T...) whose default implementation
is either the original implementation of Collections.addAll() or your proposed
modification. (We should use a name other than addAll, since overloading with a
varargs method is a bad idea.) Add overrides for key implementations like
ArrayList. Then, have Collections.addAll(c, T... elements) delegate to
c.addEach(elements).

For ArrayList, at least, this would avoid the redundant copy and temporary
allocation, which seems nice.

s'marks
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8193031: Collections.addAll is likely to perform worse than Collection.addAll

Remi Forax
+1

being also able to write,
  List<String> l = ...
  l.addEach("foo", "bar", "baz");
is a nice addition.

Rémi

----- Mail original -----
> De: "Stuart Marks" <[hidden email]>
> À: "Martin Buchholz" <[hidden email]>
> Cc: "core-libs-dev" <[hidden email]>
> Envoyé: Mardi 5 Décembre 2017 08:27:41
> Objet: Re: RFR: 8193031: Collections.addAll is likely to perform worse than Collection.addAll

> On 12/4/17 7:50 PM, Martin Buchholz wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8193031
>> http://cr.openjdk.java.net/~martin/webrevs/jdk/Collections-addAll/
>
>       * Adds all of the specified elements to the specified collection.
>       * Elements to be added may be specified individually or as an array.
>       * The behavior of this convenience method is identical to that of
> -     * {@code c.addAll(Arrays.asList(elements))}, but this method is likely
> -     * to run significantly faster under most implementations.
> +     * {@code c.addAll(Arrays.asList(elements))}.
>       *
>       * <p>When elements are specified individually, this method provides a
>       * convenient way to add a few elements to an existing collection:
>
> I agree with the removal of "significantly faster." In some real cases, it's
> definitely not! Usually the behavior wording isn't "is identical to". I'd prefer
> something like "the behavior is as if...."
>
>      public static <T> boolean addAll(Collection<? super T> c, T... elements) {
> -        boolean result = false;
> -        for (T element : elements)
> -            result |= c.add(element);
> -        return result;
> +        return c.addAll(Arrays.asList(elements));
>      }
>
> So, this seems sensible, but the tradeoffs aren't obvious.
>
> If the destination is an ArrayList, the first thing that addAll() does is call
> toArray() on the argument. And Arrays$ArrayList.toArray() makes a copy, as
> required by spec. So this redundantly copies every element, compared to the
> for-loop.
>
> But I did some quick benchmarks and I found that bulk copies are so fast that
> even doing two of them is quite a bit faster than a single copy in a for-loop,
> ranging from 4x-6x faster over a variety of sizes.
>
> This also requires temp space the size of the input array. This gives me (and
> the garbage collector) a bit of a pause.
>
> Turning to other collections, the AbstractCollection.addAll() implementation
> runs essentially the same loop as here, except that it runs it over its
> Collection argument instead of an array. Thus it uses an Iterator to loop
> instead of indexing over an array. Here, converting to a list has a slight
> performance disadvantage, around 10-15% (for HashSet, which uses
> AbstractCollection.addAll).
>
> You point out in the bug report that the implementation of Collections.addAll()
> misses out on optimizations that implementations can provide for bulk updates.
> This is true, but my feeling is that loses something by converting the array
> into a Collection before calling the virtual addAll(Collection) method.
>
> How about this alternative: do the same thing we did with sort().
>
> Add a new default method Collection.addEach(T...) whose default implementation
> is either the original implementation of Collections.addAll() or your proposed
> modification. (We should use a name other than addAll, since overloading with a
> varargs method is a bad idea.) Add overrides for key implementations like
> ArrayList. Then, have Collections.addAll(c, T... elements) delegate to
> c.addEach(elements).
>
> For ArrayList, at least, this would avoid the redundant copy and temporary
> allocation, which seems nice.
>
> s'marks