Re: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object

Richard Reingruber
On Mon, 15 Feb 2021 09:22:00 GMT, Xin Liu <[hidden email]> wrote:

> There are 3 nodes involving in the construction of a java.lang.String object.
> 1. Allocate of itself, aka. alloc
> 2. AllocateArray of a byte array, which is value:byte[], aka. aa
> 3. ArrayCopyNode which copys in the contents of value, aka. ac
>
> Lemma
> When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`.
>
> Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by external world, therefore it must not be global escaped. Because the buffer is marked as stable, it is safe to assume its contents are whatever ac copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well.
>
> It is possible to rewire `aa` to the source of ac with the correct offset. That is to say, we can replace both `aa` and `ac` with a “shallow copy” of the source of `ac`. It’s safe if C2 keeps a reference of the source oop for all safepoints.

Hi,

this is a smart optimization.

>
>
> There are 3 nodes involving in the construction of a java.lang.String object.
>
>     1. Allocate of itself, aka. alloc
>
>     2. AllocateArray of a byte array, which is value:byte[], aka. aa
>
>     3. ArrayCopyNode which copys in the contents of value, aka. ac
>
>
> Lemma
> When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`.
>
> Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by external world, therefore it must not be global escaped. Because the buffer is marked as stable, it is safe to assume its contents are whatever ac copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well.

Are you saying that the source of `ac` cannot be accessed by another thread because of the cloning in the constructor? But the resulting string instance which is used to construct the non-escape instance can be GlobalEscape and then the source of `ac` is accessible to other threads, isn't it?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2570
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object

Xin Liu
On Tue, 16 Feb 2021 08:14:22 GMT, Richard Reingruber <[hidden email]> wrote:

> Hi,
>
> this is a smart optimization.
>
> > There are 3 nodes involving in the construction of a java.lang.String object.
> > ```
> > 1. Allocate of itself, aka. alloc
> >
> > 2. AllocateArray of a byte array, which is value:byte[], aka. aa
> >
> > 3. ArrayCopyNode which copys in the contents of value, aka. ac
> > ```
> >
> >
> > Lemma
> > When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`.
> > Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by external world, therefore it must not be global escaped. Because the buffer is marked as stable, it is safe to assume its contents are whatever ac copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well.
>
> Are you saying that the source of `ac` cannot be accessed by another thread because of the cloning in the constructor? But the resulting string instance which is used to construct the non-escape instance can be GlobalEscape and then the source of `ac` is accessible to other threads, isn't it?

Hi, @reinrich
You might not know, but I learn how to reallocate an object in deoptimization from your previous patches. Thank you!

You are right. The source of ac (`src`) might be escaped.  I didn't say other threads can't access it. I said we need to guarantee the `src` is stable,  or  it's a "frozen" array in JDK-8261007.

To be honest, I didn't check the src is frozen. In practice, I only see ArrayCopy in construction [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/StringLatin1.java#L769).This method is creating a new substring from an established array, so its value is `stable`.

After I read your comment, I went through String.java. I do find one open-ended constructor. yes, it's a problem!
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L395

If the frozen attribute is not present, what I can come up with. an array is "stable". What's do you think?
1) has annotation "stable" or
2) it's non-escaped and
3) can't find any store nodes along its mem stream.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2570
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object

Richard Reingruber
On Tue, 16 Feb 2021 20:56:32 GMT, Xin Liu <[hidden email]> wrote:

>> Hi,
>>
>> this is a smart optimization.
>>
>>>
>>>
>>> There are 3 nodes involving in the construction of a java.lang.String object.
>>>
>>>     1. Allocate of itself, aka. alloc
>>>
>>>     2. AllocateArray of a byte array, which is value:byte[], aka. aa
>>>
>>>     3. ArrayCopyNode which copys in the contents of value, aka. ac
>>>
>>>
>>> Lemma
>>> When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`.
>>>
>>> Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by external world, therefore it must not be global escaped. Because the buffer is marked as stable, it is safe to assume its contents are whatever ac copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well.
>>
>> Are you saying that the source of `ac` cannot be accessed by another thread because of the cloning in the constructor? But the resulting string instance which is used to construct the non-escape instance can be GlobalEscape and then the source of `ac` is accessible to other threads, isn't it?
>
>> Hi,
>>
>> this is a smart optimization.
>>
>> > There are 3 nodes involving in the construction of a java.lang.String object.
>> > ```
>> > 1. Allocate of itself, aka. alloc
>> >
>> > 2. AllocateArray of a byte array, which is value:byte[], aka. aa
>> >
>> > 3. ArrayCopyNode which copys in the contents of value, aka. ac
>> > ```
>> >
>> >
>> > Lemma
>> > When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`.
>> > Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by external world, therefore it must not be global escaped. Because the buffer is marked as stable, it is safe to assume its contents are whatever ac copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well.
>>
>> Are you saying that the source of `ac` cannot be accessed by another thread because of the cloning in the constructor? But the resulting string instance which is used to construct the non-escape instance can be GlobalEscape and then the source of `ac` is accessible to other threads, isn't it?
>
> Hi, @reinrich
> You might not know, but I learn how to reallocate an object in deoptimization from your previous patches. Thank you!
>
> You are right. The source of ac (`src`) might be escaped.  I didn't say other threads can't access it. I said we need to guarantee the `src` is stable,  or  it's a "frozen" array in JDK-8261007.
>
> To be honest, I didn't check the src is frozen. In practice, I only see ArrayCopy in construction [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/StringLatin1.java#L769).This method is creating a new substring from an established array, so its value is `stable`.
>
> After I read your comment, I went through String.java. I do find one open-ended constructor. yes, it's a problem!
> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L395
>
> If the frozen attribute is not present, what I can come up with. an array is "stable". What's do you think?
> 1) has annotation "stable" or
> 2) it's non-escaped and
> 3) can't find any store nodes along its mem stream.

Hi @navyxliu,

> You might not know, but I learn how to reallocate an object in deoptimization from your previous patches. Thank you!

Oh, great! :)

> You are right. The source of ac (`src`) might be escaped. I didn't say other threads can't access it. I said we need to guarantee the `src` is stable, or it's a "frozen" array in JDK-8261007.
>
> To be honest, I didn't check the src is frozen. In practice, I only see ArrayCopy in construction [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/StringLatin1.java#L769).This method is creating a new substring from an established array, so its value is `stable`.

Sorry, I wasn't aware of the @Stable annotation on the value array.

https://github.com/openjdk/jdk/blob/d19503353e5c347ce393544a3a30d5caec53d133/src/java.base/share/classes/java/lang/String.java#L154

So my concern was that another thread could for example use reflection to modify
the value array but I reckon this is illegal then (I wonder if it is checked in
reflection...).

Also there is already UseStringDeduplication that relies on the value array being stable.

> After I read your comment, I went through String.java. I do find one open-ended constructor. yes, it's a problem!
> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L395

I see this one is deprecated. Why do you think there's a problem?

> If the frozen attribute is not present, what I can come up with. an array is "stable". What's do you think?
>
>     1. has annotation "stable" or
>
>     2. it's non-escaped and
>
>     3. can't find any store nodes along its mem stream.

Currently I'd think this is not necessary. `@Stable` is strong enough.

The optimization reminds me a bit of earlier Java versions where String
instances had an offset field and the value array could be shared among String
instances.

I guess the new String needs to be scalarized mostly because you cannot take
care of the offset otherwise.

Coincidentally I'm currently looking at
[LoadNode::can_see_arraycopy_value()](https://github.com/openjdk/jdk/blob/b955f85e03bafe8ce39677d0af06bf1ceb7e2cbb/src/hotspot/share/opto/memnode.cpp#L951)
which does something similar you want to do. There I don't see any concerns
about the src being stable. In fact I don't think this is correct. Just a side
mark...

These are my (somewhat random) thoughts so far. I'd think it is a legal and useful optimization (though haven't yet looked at the code yet).

Thanks, Richard.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2570
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object

Volker Simonis-3
On Wed, 17 Feb 2021 09:00:25 GMT, Richard Reingruber <[hidden email]> wrote:

>>> Hi,
>>>
>>> this is a smart optimization.
>>>
>>> > There are 3 nodes involving in the construction of a java.lang.String object.
>>> > ```
>>> > 1. Allocate of itself, aka. alloc
>>> >
>>> > 2. AllocateArray of a byte array, which is value:byte[], aka. aa
>>> >
>>> > 3. ArrayCopyNode which copys in the contents of value, aka. ac
>>> > ```
>>> >
>>> >
>>> > Lemma
>>> > When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`.
>>> > Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by external world, therefore it must not be global escaped. Because the buffer is marked as stable, it is safe to assume its contents are whatever ac copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well.
>>>
>>> Are you saying that the source of `ac` cannot be accessed by another thread because of the cloning in the constructor? But the resulting string instance which is used to construct the non-escape instance can be GlobalEscape and then the source of `ac` is accessible to other threads, isn't it?
>>
>> Hi, @reinrich
>> You might not know, but I learn how to reallocate an object in deoptimization from your previous patches. Thank you!
>>
>> You are right. The source of ac (`src`) might be escaped.  I didn't say other threads can't access it. I said we need to guarantee the `src` is stable,  or  it's a "frozen" array in JDK-8261007.
>>
>> To be honest, I didn't check the src is frozen. In practice, I only see ArrayCopy in construction [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/StringLatin1.java#L769).This method is creating a new substring from an established array, so its value is `stable`.
>>
>> After I read your comment, I went through String.java. I do find one open-ended constructor. yes, it's a problem!
>> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L395
>>
>> If the frozen attribute is not present, what I can come up with. an array is "stable". What's do you think?
>> 1) has annotation "stable" or
>> 2) it's non-escaped and
>> 3) can't find any store nodes along its mem stream.
>
> Hi @navyxliu,
>
>> You might not know, but I learn how to reallocate an object in deoptimization from your previous patches. Thank you!
>
> Oh, great! :)
>
>> You are right. The source of ac (`src`) might be escaped. I didn't say other threads can't access it. I said we need to guarantee the `src` is stable, or it's a "frozen" array in JDK-8261007.
>>
>> To be honest, I didn't check the src is frozen. In practice, I only see ArrayCopy in construction [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/StringLatin1.java#L769).This method is creating a new substring from an established array, so its value is `stable`.
>
> Sorry, I wasn't aware of the @Stable annotation on the value array.
>
> https://github.com/openjdk/jdk/blob/d19503353e5c347ce393544a3a30d5caec53d133/src/java.base/share/classes/java/lang/String.java#L154
>
> So my concern was that another thread could for example use reflection to modify
> the value array but I reckon this is illegal then (I wonder if it is checked in
> reflection...).
>
> Also there is already UseStringDeduplication that relies on the value array being stable.
>
>> After I read your comment, I went through String.java. I do find one open-ended constructor. yes, it's a problem!
>> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L395
>
> I see this one is deprecated. Why do you think there's a problem?
>
>> If the frozen attribute is not present, what I can come up with. an array is "stable". What's do you think?
>>
>>     1. has annotation "stable" or
>>
>>     2. it's non-escaped and
>>
>>     3. can't find any store nodes along its mem stream.
>
> Currently I'd think this is not necessary. `@Stable` is strong enough.
>
> The optimization reminds me a bit of earlier Java versions where String
> instances had an offset field and the value array could be shared among String
> instances.
>
> I guess the new String needs to be scalarized mostly because you cannot take
> care of the offset otherwise.
>
> Coincidentally I'm currently looking at
> [LoadNode::can_see_arraycopy_value()](https://github.com/openjdk/jdk/blob/b955f85e03bafe8ce39677d0af06bf1ceb7e2cbb/src/hotspot/share/opto/memnode.cpp#L951)
> which does something similar you want to do. There I don't see any concerns
> about the src being stable. In fact I don't think this is correct. Just a side
> mark...
>
> These are my (somewhat random) thoughts so far. I'd think it is a legal and useful optimization (though haven't yet looked at the code yet).
>
> Thanks, Richard.

What about GC? What happens if the original String isn't reachable any more? Do you put a reference to the byte array into the corresponding oop maps to make sure it can't be garbage collected?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2570
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object

Xin Liu
On Wed, 17 Feb 2021 17:46:54 GMT, Volker Simonis <[hidden email]> wrote:

> What about GC? What happens if the original String isn't reachable any more? Do you put a reference to the byte array into the corresponding oop maps to make sure it can't be garbage collected?

hi, Volker,

I don't know OopMap very much.  IMHO, it should be discovered by GC because I put the byte array oop to `jvm->locals`
I do that for all safepoints nodes.

          // 1. src oop
          sfpt->add_req(ac->in(ArrayCopyNode::Src));
https://github.com/openjdk/jdk/pull/2570/files#diff-2faebd05d08f9115f8d9ef771644cf05087a6986c2f9013d7163c6aa720169c3R995

-------------

PR: https://git.openjdk.java.net/jdk/pull/2570