RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map

classic Classic list List threaded Threaded
32 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map

Patrick Reinhart
Am 02.11.2017 um 20:08 schrieb Roger Riggs:
> Hi,
>
> It would also support a more fluent use of the APIs.  Though if we
> start down that road
> it may become littered with convenience methods.  Something to ponder
> for a bit.
>
One one hand I see your obligation, on the other hand we would open the
chance to write
more optimal implementations for a library implementers though. That
seems to me also
an considerable point here. For instance an unmodifiable List could
basically return a fast
copy of itself here using an internal constructor passing the values
from itself or a Set
implementation has not to check for duplicate entries or similar.

-Patrick

>
>
> On 11/1/2017 4:29 PM, Patrick Reinhart wrote:
>> In this case I would prefer a non static copyOf() method on the list
>> to create a unmodifiable list/set/map, where the optimal factory
>> method can be called. This would also solve the problem of a
>> concurrent implementation.
>>
>> -Patrick
>>
>>
>>> Am 01.11.2017 um 21:05 schrieb Louis Wasserman <[hidden email]>:
>>>
>>> I disagree, actually.  Collections with size zero and one are
>>> significantly more common than bigger collections.
>>>
>>> In Guava's immutable collection factories (ImmutableList.of(...)
>>> etc.), we observed a roughly exponential decline in the number of
>>> users of factory methods of each size: if N people created empty
>>> lists, ~N/2 created singleton lists, ~N/4 created two-element lists,
>>> etc.  We got noticeable pushback from RAM-sensitive customers when
>>> we eliminated some specialized singleton collection implementations.
>>>
>>> Our experience has been that specializing for N=0 and N=1 does pay
>>> off.  Probably not N=2, though?
>>>
>>> On Wed, Nov 1, 2017 at 12:45 PM Patrick Reinhart <[hidden email]
>>> <mailto:[hidden email]>> wrote:
>>>
>>>> Am 01.11.2017 um 20:25 schrieb Stuart Marks
>>>> <[hidden email] <mailto:[hidden email]>>:
>>>>
>>>> On 10/31/17 5:52 PM, Bernd Eckenfels wrote:
>>>>> Having a List.of(List) copy constructor would save an additional
>>>>> array copy in the collector Which uses 
>>>>> (List<T>)List.of(list.toArray())
>>>> The quickest way to get all the elements from the source collection
>>>> is to call toArray() on it. Some copy constructors (like
>>>> ArrayList's) simply use the returned array for internal storage.
>>>> This saves a copy, but it relies on the source collection's
>>>> toArray() implementation to be correct. In particular, the returned
>>>> array must be freshly created, and that array must be of type
>>>> Object[]. If either of these is violated, it will break the ArrayList.
>>>>
>>>> The "immutable" collections behind List.of/copyOf/etc. are
>>>> fastidious about allocating their internal arrays in order to
>>>> ensure that they are referenced only from within the newly created
>>>> collection. This requires making an „extra" copy of the array
>>>> returned by toArray().
>>>>
>>>> An alternative is for the new collection to preallocate its
>>>> internal array using the source's size(), and then to copy the
>>>> elements out. But the source’s
>>>> size might change during the copy (e.g., if it’s a concurrent
>>>> collection) so this complicates things.
>>> I think the array „overhead“ would be only for the cases of zero,
>>> one and two value implementations. That seems to me not worth of
>>> optimizing…
>>>
>>>> I think the only safe way to avoid the extra copy is to create a
>>>> private interface that can be used by JDK implementations.
>>>>
>>>> s'marks
>>> -Patrick
>


Reply | Threaded
Open this post in threaded view
|

Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map

Patrick Reinhart
In reply to this post by Stuart Marks
Hi Stuart,

After having thought over your arguments about copyOf versus unmodifiableCopyOf length discussion (also with my colleague) I was thinking about why not create additional X.of(X) methods instead of X.copyOf(X). It would seem to me a logical enhancement in the sense of the existing API though.

-Patrick

> Am 02.11.2017 um 23:04 schrieb Stuart Marks <[hidden email]>:
>
>
>> Why not using:
>>
>>     coll.stream().collect(Collectors.toImmutableSet())
>>
>> As Collectors.toImmutableSet() is currently implemented, with serial Stream it will create a single HashSet, add all the elements to it and call Set.of(HashSet.toArray()) with it. Pretty much the same as what Tagir proposes, but the Collector could be made more efficient in the future and with it, the optimization would automatically extend to Set.copyOf()...
> This is mainly about whether Set.copyOf() is implemented in terms of Collectors.toUnmodifiableSet(), or vice-versa, which then calls Set.of(T[]) to do the actual creation. Some future optimization will probably replace both of these implementations with calls to JDK internal methods that can bypass the extra copying, so it doesn't really matter which one of these calls the other right now.
>
> s'marks
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map

Peter Levart
Hi Patrick,

On 11/10/2017 09:49 AM, Patrick Reinhart wrote:
> Hi Stuart,
>
> After having thought over your arguments about copyOf versus unmodifiableCopyOf length discussion (also with my colleague) I was thinking about why not create additional X.of(X) methods instead of X.copyOf(X). It would seem to me a logical enhancement in the sense of the existing API though.
>
> -Patrick

I'm sure Stuart thought of that, but decided against it because of usual
ambiguity problems that occur when two candidate .of() methods apply at
the same time, for example:

List<String> strings = List.of("a", "b", "c");

List<?> foo = List.of(strings);

What did programmer want to end up in foo, List<List<String>> or
List<String>? What did javac choose? Answers may differ.

Regards, Peter

>
>> Am 02.11.2017 um 23:04 schrieb Stuart Marks <[hidden email]>:
>>
>>
>>> Why not using:
>>>
>>>      coll.stream().collect(Collectors.toImmutableSet())
>>>
>>> As Collectors.toImmutableSet() is currently implemented, with serial Stream it will create a single HashSet, add all the elements to it and call Set.of(HashSet.toArray()) with it. Pretty much the same as what Tagir proposes, but the Collector could be made more efficient in the future and with it, the optimization would automatically extend to Set.copyOf()...
>> This is mainly about whether Set.copyOf() is implemented in terms of Collectors.toUnmodifiableSet(), or vice-versa, which then calls Set.of(T[]) to do the actual creation. Some future optimization will probably replace both of these implementations with calls to JDK internal methods that can bypass the extra copying, so it doesn't really matter which one of these calls the other right now.
>>
>> s'marks
>>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map

Funda Secgin
In reply to this post by Stuart Marks
--

Fs
Reply | Threaded
Open this post in threaded view
|

Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map

John Rose-3
In reply to this post by Stuart Marks
Late to the party, but these lines rub me the wrong way:

@return the new {@code List}
@return the new {@code Set}
@return the new {@code Map}

The word "new" is a loaded term, which usually means
(or can be easily mistaken to mean) that a new object
identity is guaranteed.  Thus, "new" shouldn't be used
to specify the behavior of value-based classes.

Given that that the underlying objects are of VBCs,
and that we are encouraging programmers to rely on
the efficiency of chained copies, it should say something
like this instead:

@return a {@code List} containing the same elements as the given collection
@return a {@code Set} containing the same elements as the given collection
@return a {@code Map} containing the same mappings as the given map

(Or even s/return a/return an unmodifiable/.)

— John
Reply | Threaded
Open this post in threaded view
|

Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map

John Rose-3
In reply to this post by Patrick Reinhart
On Nov 3, 2017, at 1:26 AM, Patrick Reinhart <[hidden email]> wrote:
>
>>
>> On 11/1/2017 4:29 PM, Patrick Reinhart wrote:
>>> In this case I would prefer a non static copyOf() method on the list
>>> to create a unmodifiable list/set/map, where the optimal factory
>>> method can be called. This would also solve the problem of a
>>> concurrent implementation.

Such methods would, unfortunately, be a disastrous mistake.

We are encouraging users to rely on the unmodifiability of the
value-based lists, sets, and maps returned by copyOf, the
same way they rely on the unmodifiability of Strings.  But
if we made copyOf be non-static, then an attacker (or just
a buggy program) could introduce a broken implementation
of List that returned a time-varying List.  This would break
the contract of copyOf, but it would be impossible to detect.
The result would be a less reliable copyOf API, and
eventually some TOCTTOU security vulnerabilities.
This kind of surprise mutability is red meat for security
researchers.

So it's got to be a static method, although if List were a
class we could also choose to make copyOf be a final method.

I said "unfortunately" above because I too would prefer
fluent syntax like ls.unmodifiableCopy() or ls.frozen()
instead of List.copyOf(ls), since fluent syntax reads
and composes cleanly from left to right.

I think a good way to get what we want may be to
introduce a way for selected methods of an interface
to mark themselves (somehow) as "fluent statics",
meaning that they can be invoked with their first
argument in receiver position, before the dot.
In effect, they would act like "final default" methods
but without having to damage interfaces with final
methods.

I don't have a syntax in mind for defining a "fluent
static", but I have thought for some time that allowing
such sugar for interface statics is the best way to
adapt the notion of class final methods to interfaces.

This is not easy to do, since it is a language change,
and only applies to a small (though influential) number of
methods, those which should *not* be overridable but
*should* (for some reason of API design) support virtual
(fluent, postfix, left-to-right) notation.

For more details please see JDK-8191530 and if you
think of a good use case for these thingies, let me know
so I can add it to the write-up.

— John
Reply | Threaded
Open this post in threaded view
|

Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map

Remi Forax
Hi John,
i strongly believe that static fluent methods have no place in a blue collar language, we can do better, in fact the have already done better by introducing default method instead of extension methods like in C# or Kotlin*.

The main issue with this kind of method is that it quacks like a duck but it's not a duck, i.e. you call those static methods as virtual methods but there are no virtual methods because you can not override them.

Adding extension methods as feature is like a black hole, extension methods allow you to write clunky DSLs so it's a feature that will force the language to move to add more new features to be able to write decent DSLs (think implicit object as an example), and so on. So it sucks your design budget like a black hole.

In term of performance, the code inside a static extension methods tends to become megamorphic, in C#, Data Link (Link for Collection API) is slow because of that, in Kotlin you can workaround that by using the keyword inline which will copy the code into the callsite avoiding profile pollution.

So in my opinion, the only possible option is to introduce final default methods, i fully understand that this is non trivial to introduce this kind of methods in the VM but this discussion is a good use case for their introduction.

regards,
Rémi

* i will not criticize Kotlin designers on that, they wanted to enhance Java APIs without owning them so they had little options.

----- Mail original -----
> De: "John Rose" <[hidden email]>
> À: "Patrick Reinhart" <[hidden email]>
> Cc: "core-libs-dev" <[hidden email]>
> Envoyé: Samedi 18 Novembre 2017 07:40:05
> Objet: Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map

> On Nov 3, 2017, at 1:26 AM, Patrick Reinhart <[hidden email]> wrote:
>>
>>>
>>> On 11/1/2017 4:29 PM, Patrick Reinhart wrote:
>>>> In this case I would prefer a non static copyOf() method on the list
>>>> to create a unmodifiable list/set/map, where the optimal factory
>>>> method can be called. This would also solve the problem of a
>>>> concurrent implementation.
>
> Such methods would, unfortunately, be a disastrous mistake.
>
> We are encouraging users to rely on the unmodifiability of the
> value-based lists, sets, and maps returned by copyOf, the
> same way they rely on the unmodifiability of Strings.  But
> if we made copyOf be non-static, then an attacker (or just
> a buggy program) could introduce a broken implementation
> of List that returned a time-varying List.  This would break
> the contract of copyOf, but it would be impossible to detect.
> The result would be a less reliable copyOf API, and
> eventually some TOCTTOU security vulnerabilities.
> This kind of surprise mutability is red meat for security
> researchers.
>
> So it's got to be a static method, although if List were a
> class we could also choose to make copyOf be a final method.
>
> I said "unfortunately" above because I too would prefer
> fluent syntax like ls.unmodifiableCopy() or ls.frozen()
> instead of List.copyOf(ls), since fluent syntax reads
> and composes cleanly from left to right.
>
> I think a good way to get what we want may be to
> introduce a way for selected methods of an interface
> to mark themselves (somehow) as "fluent statics",
> meaning that they can be invoked with their first
> argument in receiver position, before the dot.
> In effect, they would act like "final default" methods
> but without having to damage interfaces with final
> methods.
>
> I don't have a syntax in mind for defining a "fluent
> static", but I have thought for some time that allowing
> such sugar for interface statics is the best way to
> adapt the notion of class final methods to interfaces.
>
> This is not easy to do, since it is a language change,
> and only applies to a small (though influential) number of
> methods, those which should *not* be overridable but
> *should* (for some reason of API design) support virtual
> (fluent, postfix, left-to-right) notation.
>
> For more details please see JDK-8191530 and if you
> think of a good use case for these thingies, let me know
> so I can add it to the write-up.
>
> — John
Reply | Threaded
Open this post in threaded view
|

Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map

Brian Goetz-2
I'm with Remi on this.  

Sent from my MacBook Wheel

> On Nov 18, 2017, at 5:41 AM, Remi Forax <[hidden email]> wrote:
>
> Hi John,
> i strongly believe that static fluent methods have no place in a blue collar language, we can do better, in fact the have already done better by introducing default method instead of extension methods like in C# or Kotlin*.
>
> The main issue with this kind of method is that it quacks like a duck but it's not a duck, i.e. you call those static methods as virtual methods but there are no virtual methods because you can not override them.
>
> Adding extension methods as feature is like a black hole, extension methods allow you to write clunky DSLs so it's a feature that will force the language to move to add more new features to be able to write decent DSLs (think implicit object as an example), and so on. So it sucks your design budget like a black hole.
>
> In term of performance, the code inside a static extension methods tends to become megamorphic, in C#, Data Link (Link for Collection API) is slow because of that, in Kotlin you can workaround that by using the keyword inline which will copy the code into the callsite avoiding profile pollution.
>
> So in my opinion, the only possible option is to introduce final default methods, i fully understand that this is non trivial to introduce this kind of methods in the VM but this discussion is a good use case for their introduction.
>
> regards,
> Rémi
>
> * i will not criticize Kotlin designers on that, they wanted to enhance Java APIs without owning them so they had little options.
>
> ----- Mail original -----
>> De: "John Rose" <[hidden email]>
>> À: "Patrick Reinhart" <[hidden email]>
>> Cc: "core-libs-dev" <[hidden email]>
>> Envoyé: Samedi 18 Novembre 2017 07:40:05
>> Objet: Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map
>
>>> On Nov 3, 2017, at 1:26 AM, Patrick Reinhart <[hidden email]> wrote:
>>>
>>>>
>>>>> On 11/1/2017 4:29 PM, Patrick Reinhart wrote:
>>>>> In this case I would prefer a non static copyOf() method on the list
>>>>> to create a unmodifiable list/set/map, where the optimal factory
>>>>> method can be called. This would also solve the problem of a
>>>>> concurrent implementation.
>>
>> Such methods would, unfortunately, be a disastrous mistake.
>>
>> We are encouraging users to rely on the unmodifiability of the
>> value-based lists, sets, and maps returned by copyOf, the
>> same way they rely on the unmodifiability of Strings.  But
>> if we made copyOf be non-static, then an attacker (or just
>> a buggy program) could introduce a broken implementation
>> of List that returned a time-varying List.  This would break
>> the contract of copyOf, but it would be impossible to detect.
>> The result would be a less reliable copyOf API, and
>> eventually some TOCTTOU security vulnerabilities.
>> This kind of surprise mutability is red meat for security
>> researchers.
>>
>> So it's got to be a static method, although if List were a
>> class we could also choose to make copyOf be a final method.
>>
>> I said "unfortunately" above because I too would prefer
>> fluent syntax like ls.unmodifiableCopy() or ls.frozen()
>> instead of List.copyOf(ls), since fluent syntax reads
>> and composes cleanly from left to right.
>>
>> I think a good way to get what we want may be to
>> introduce a way for selected methods of an interface
>> to mark themselves (somehow) as "fluent statics",
>> meaning that they can be invoked with their first
>> argument in receiver position, before the dot.
>> In effect, they would act like "final default" methods
>> but without having to damage interfaces with final
>> methods.
>>
>> I don't have a syntax in mind for defining a "fluent
>> static", but I have thought for some time that allowing
>> such sugar for interface statics is the best way to
>> adapt the notion of class final methods to interfaces.
>>
>> This is not easy to do, since it is a language change,
>> and only applies to a small (though influential) number of
>> methods, those which should *not* be overridable but
>> *should* (for some reason of API design) support virtual
>> (fluent, postfix, left-to-right) notation.
>>
>> For more details please see JDK-8191530 and if you
>> think of a good use case for these thingies, let me know
>> so I can add it to the write-up.
>>
>> — John

Reply | Threaded
Open this post in threaded view
|

Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map

John Rose-3
On Nov 18, 2017, at 11:10 AM, Brian Goetz <[hidden email]> wrote:
>
> I'm with Remi on this.  

Which suggestion of his two?  Blue-collar language with bad DSLs,
or final defaults?

— John
Reply | Threaded
Open this post in threaded view
|

Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map

Brian Goetz-2
Use-site static extension methods = bad.  

Sent from my MacBook Wheel

> On Nov 18, 2017, at 11:12 AM, John Rose <[hidden email]> wrote:
>
>> On Nov 18, 2017, at 11:10 AM, Brian Goetz <[hidden email]> wrote:
>>
>> I'm with Remi on this.  
>
> Which suggestion of his two?  Blue-collar language with bad DSLs,
> or final defaults?
>
> — John
Reply | Threaded
Open this post in threaded view
|

Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map

John Rose-3
On Nov 18, 2017, at 11:16 AM, Brian Goetz <[hidden email]> wrote:
>
> Use-site static extension methods = bad.  

Right; that wasn't on the table from either Remy or me.
All def-site stuff, so the designer of the defining interface
has full control of the DSL.  JDK-8191530 is def-site extension
methods, a compromise that meets most or all of our requirements.

(Side note:  We could enable third-party DSL stuff with value types
and specialization by using the value object as a view, and the
real type of the extended object/value as a generic parameter.
The value type acts just like a C++ smart pointer and can warp
the original API around however it likes.)


> Sent from my MacBook Wheel
>
> On Nov 18, 2017, at 11:12 AM, John Rose <[hidden email] <mailto:[hidden email]>> wrote:
>
>> On Nov 18, 2017, at 11:10 AM, Brian Goetz <[hidden email] <mailto:[hidden email]>> wrote:
>>>
>>> I'm with Remi on this.  
>>
>> Which suggestion of his two?  Blue-collar language with bad DSLs,
>> or final defaults?
>>
>> — John

Reply | Threaded
Open this post in threaded view
|

Re: RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map

John Rose-3
In reply to this post by Stuart Marks
On Oct 30, 2017, at 6:50 PM, Stuart Marks <[hidden email]> wrote:
>
> (also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)

Now I'm going to be picky about the names of the Collectors;
please bear with me a moment.  Consider `toUnmodifiableList`
and its two cousins (`toUSet`, `toUMap`).

The most natural names `toList` (etc.) are already taken.
They mean "do whatever is most useful for the collector,
perhaps returning unmodifiable or modifiable results".

So the problem here is giving the user the option to say,
"no, I really want a value-based-class-style list here; I know
you can do this for me".

`Collectors.toList` can't ever be upgraded in place, since
the programmer is getting a Collector's Choice result,
and programmers will be inadvertently relying on whatever
that (undocumented) choice happens to be.

I think this makes the name `toList` significantly less useful,
and I for one will never want to use `toList` again, once
`toUnmodifiableList` is available.

Meanwhile, `toUnmodifiableList` is so ugly to the ear,
eye, and fingers that it will be less used just because of
its long spelling.

Users should be allowed to code as if unmodifiability is a
common practice, not something that must be re-announced
every time I make a new temporary list.  That's why `copyOf`
and not `unmodifiableCopyOf` is the right move, and
`toUnmodifiableList` is not.

(See also Gafter's blog[1] on making NEW FEATURES look NEW.
The NEW UNMODIFIABILITY will soon cease to be new but in
its present form it will shout forever like an aging rocker.)

To get out of this bind, I suggest picking a shorter name that
emphasizes that the result is the _elements_ in the list, not the
list object per se.  This is a VBC[2] move, one which we need to
learn to make more often, especially as value types arrive.
So I suggest these names for the collectors:  `elements` (for
a list), `elementSet`, and `elementMap`.  Same signatures,
of course.

Another approach to this, which recycles the old names but
doesn't have the VBC vibe, is to create a second set of
factories for list/set/map collectors that look just like the
old ones, but behave more predictably.

For example, one might try to add methods to `Collector`
named `asModifiable` and `asUnmodifiable`, so people can
write `toList().asModifiable()`.  I don't think this works, nor
is it very pretty (looks too "NEW"), although it provides a
more principled fix to the Collector's Choice problem.

If we go forward with the change as written, I think people will
only want to remember the one `toList` and forget about his ugly
brother.  Only zealots like me will remember to add the extra
incantation ("NEW!  now Unmodifiable!").

Another, better way to recycle the name `toList` is to provide
`Collectors.Modifiable` and `Collectors.Unmodifiable`, and then
have users import `Collectors.Unmodifiable.toList`.  This at least
would move the shouting up to the static imports list and out of
real code.

But I think `toList` is broken and should be avoided.  So I
think it should be `elements`, `elementSet`, and `elementMap`.

Even if you think that unmodifiability should be mentioned every
time a collector collects some stuff, observe that `toUnmodifiableList`
is not a strong parallel with the older API point, `Collections.unmodifiableList`.
That guy creates a _view_ while `copyOf` and `elements` (my name)
create non-view unmodifiable (VBC, shallowly immutable) lists.
So `Collectors.toUnmodifiableList` is both ugly and ill-sorted with `copyOf`.

— John

[1]: http://gafter.blogspot.com/2017/06/making-new-language-features-stand-out.html
[2]: https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html

12