RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

Mikael Vidstedt-3

Warning: It may be wise to stock up on coffee or tea before reading this.

Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/ <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>

* Background (from the JBS description)

x86 is normally very forgiving when it comes to dereferencing unaligned pointers. Even if the pointer isn’t aligned on the natural size of the element being accessed, the hardware will do the Right Thing(tm) and nobody gets hurt. However, turns out there are exceptions to this. Specifically, SSA2 introduced the movdqa instruction, which is a 128-bit load/store which *does* require that the pointer is 128-bit aligned.

Normally this isn’t a problem, because after all we don’t typically use 128-bit data types in our C/C++ code. However, just because we don’t use any such data types explicitly, there’s nothing preventing the C compiler to do so under the covers. Specifically, the C compiler tends to do loop unrolling and vectorization, which can turn pretty much any data access into vectorized SIMD accesses.

We’ve actually run into a variation on this exact same problem a while back when upgrading to gcc 4.9.2. That time the problem (as captured in JDK-8141491) was in nio/Bits.c, and it was fixed by moving the copy functionality into hotspot (copy.[ch]pp), making sure the copy logic does the relevant alignment checks etc.

This time the problem is with ClassFileParser. Or more accurately, it’s in the methods ClassFileParser makes use of. Specifically, the problem is with the copy_u2_with_conversion method, used to copy various data from the class file and put it in the “native” endian order in memory. It, in turn, uses Bytes::get_Java_u2 to read and potentially byte swap a 16-bit entry from the class file. bytes_x86.hpp has this to say about its implementation:

// Efficient reading and writing of unaligned unsigned data in platform-specific byte ordering
// (no special code is needed since x86 CPUs can access unaligned data)

While that is /almost/ always true for the x86 architecture in itself, the C standard still expects accesses to be aligned, and the C compiler is free to make use of that expectation to, for example, vectorize operations and make use of the movdqa instruction.

I noticed this when working on the portola/musl port, and in that environment the bug is tickled immediately. Why is it only a problem with musl? Turns out it sorta isn’t. It seems that this isn't an actual problem on any of the current platforms/toolchains we're using, but it's a latent bug which may be triggered at any point.

bytes_x86.hpp will, in the end, actually use system library functions to do byte swapping. Specifically, on linux_x86 it will come down to bytes_linux_x86.inline.hpp which, on AMD64, uses the system <byteswap.h> functions/macros swap_u{16,32,64} to do the actual byte swapping. Now here’s the “funny” & interesting part:

With glibc, the swap_u{16,32,64} methods are implemented using inline assembly - in the end it comes down to an inline rotate “rorw” instruction. Since GCC can’t see through the inline assembly, it will not realize that there are loop unrolling/vectorization opportunities, and of specific interest to us: the movdqa instruction will not be used. The code will potentially not be as efficient as it could be, but it will be functional.

With musl, the swap methods are instead implemented as normal macros, shifting bits around to achieve the desired effect. GCC recognizes the bit shifting patterns, will realize that it’s just byte swapping a bunch of values, will vectorize the loop, and *will* make use of the movdqa instruction. Kaboom.

To recap: dereferencing unaligned pointers in C/C++ is a no-no, even in cases where you think it should be okay. With the existing compilers and header files we are not currently running into this problem, but even a small change in the byte swap implementation exposes the problem.



* About the change

The key changes are in three different areas:

1. copy.[ch]pp

Introducing: conjoint_swap_if_needed

conjoint_swap_if_needed copies data, and bytes wap it on-the-fly if the specified endianness differs from the native/CPU endianness. It does this by either delegating to conjoint_swap (on endian mismatch), or conjoint_copy (on match). In copy.cpp, the changes all boil down to making the innermost do_conjoint_swap method more flexible so that it can be reused for both cases (straight copy as well as copy+swap).

2. classFile{Parser,Stream}

The key change is in classFileParser.cpp, switching to copying data from the class file using the new conjoint_swap_if_needed method, replacing the loop implemented in copy_u2_with_conversion/Bytes::get_Java_u2.

However, in addition to that change, I noticed that there are a lot of u2* passed around in the code, pointers which are not necessarily 16-bit aligned. While there’s nothing wrong with *having* an unaligned pointer in C - as long as it’s not dereferenced everything is peachy - it made me uneasy to see it passed around and used the way it is. Specifically, ClassFileStream::get_u2_buffer() could, to the untrained eye, be a bit misleading. One could accidentally and incorrectly assume that the returned pointer is, in fact, 16-bit aligned and start dereferencing it directly, where in fact there is no such guarantee. Perhaps even use it as an array and attract the wrath of the C compiler.

Changing to void* may or may not be the right thing to do here. In a way I’d actually like to “carry” the type information, but in some way still prevent the pointer from being directly dereferenced. Taking suggestions.


3. bytes_x86.hpp

This is addressing the wider concern that other parts of hotspot may use the same primitives in much the same (potentially broken) way, and in particular the fact that the get/put primitives aren’t checking whether the pointer argument is aligned before they dereference it. It may well be that a simple assert or two would do the trick here. That said:

It turns out that the various platforms all have their own unique ways of implementing bytes.hpp, duplicating some logic which could/should be platform independent. I tried to clean up and unify it all a bit while at it by introducing an Endian helper class in bytes.hpp. The primitives for accessing data in memory now check for alignment and either perform the raw memory access (when the pointer is aligned), or does a memcpy (if unaligned). There’s some template “magic” in there avoid duplicating code, but hopefully the magic is relatively straightforward.


* Testing

I’ve run some basic testing on this, but I’m very much looking for advice on which tests to run. Let me know if you have any suggestions!

Cheers,
Mikael

Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

David Holmes
Hi Mikael,

On 10/05/2017 8:40 AM, Mikael Vidstedt wrote:
>
> Warning: It may be wise to stock up on coffee or tea before reading this.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/ <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>

Overall this looks good to me. I like the refactoring from Bytes to
Endian - that simplifies things a lot I think.

The actual "copy/swap" changes and templates I'm not an expert on but I
get the gist and they seemed okay.

Was a little unsure about all the changes to void* from u2*/u1* in
classFileParser.h/cpp - does that just simplify use of the copy/swap
code? Though I see some casts to u2* are no longer necessary as well.

A couple of oddities I noticed:

src/share/vm/classfile/classFileStream.hpp

Without the get_u2_buffer/get_u1_buffer distinction get_u1_buffer seems
superfluous and all uses can be replaced by the existing current() accessor.

---

src/share/vm/classfile/classFileParser.cpp

We do we have void* here:

1707   const void* const exception_table_start = cfs->get_u1_buffer();

but u1* here:

1845   const u1* const localvariable_table_start = cfs->get_u1_buffer();

Thanks,
David

> * Background (from the JBS description)
>
> x86 is normally very forgiving when it comes to dereferencing unaligned pointers. Even if the pointer isn’t aligned on the natural size of the element being accessed, the hardware will do the Right Thing(tm) and nobody gets hurt. However, turns out there are exceptions to this. Specifically, SSA2 introduced the movdqa instruction, which is a 128-bit load/store which *does* require that the pointer is 128-bit aligned.
>
> Normally this isn’t a problem, because after all we don’t typically use 128-bit data types in our C/C++ code. However, just because we don’t use any such data types explicitly, there’s nothing preventing the C compiler to do so under the covers. Specifically, the C compiler tends to do loop unrolling and vectorization, which can turn pretty much any data access into vectorized SIMD accesses.
>
> We’ve actually run into a variation on this exact same problem a while back when upgrading to gcc 4.9.2. That time the problem (as captured in JDK-8141491) was in nio/Bits.c, and it was fixed by moving the copy functionality into hotspot (copy.[ch]pp), making sure the copy logic does the relevant alignment checks etc.
>
> This time the problem is with ClassFileParser. Or more accurately, it’s in the methods ClassFileParser makes use of. Specifically, the problem is with the copy_u2_with_conversion method, used to copy various data from the class file and put it in the “native” endian order in memory. It, in turn, uses Bytes::get_Java_u2 to read and potentially byte swap a 16-bit entry from the class file. bytes_x86.hpp has this to say about its implementation:
>
> // Efficient reading and writing of unaligned unsigned data in platform-specific byte ordering
> // (no special code is needed since x86 CPUs can access unaligned data)
>
> While that is /almost/ always true for the x86 architecture in itself, the C standard still expects accesses to be aligned, and the C compiler is free to make use of that expectation to, for example, vectorize operations and make use of the movdqa instruction.
>
> I noticed this when working on the portola/musl port, and in that environment the bug is tickled immediately. Why is it only a problem with musl? Turns out it sorta isn’t. It seems that this isn't an actual problem on any of the current platforms/toolchains we're using, but it's a latent bug which may be triggered at any point.
>
> bytes_x86.hpp will, in the end, actually use system library functions to do byte swapping. Specifically, on linux_x86 it will come down to bytes_linux_x86.inline.hpp which, on AMD64, uses the system <byteswap.h> functions/macros swap_u{16,32,64} to do the actual byte swapping. Now here’s the “funny” & interesting part:
>
> With glibc, the swap_u{16,32,64} methods are implemented using inline assembly - in the end it comes down to an inline rotate “rorw” instruction. Since GCC can’t see through the inline assembly, it will not realize that there are loop unrolling/vectorization opportunities, and of specific interest to us: the movdqa instruction will not be used. The code will potentially not be as efficient as it could be, but it will be functional.
>
> With musl, the swap methods are instead implemented as normal macros, shifting bits around to achieve the desired effect. GCC recognizes the bit shifting patterns, will realize that it’s just byte swapping a bunch of values, will vectorize the loop, and *will* make use of the movdqa instruction. Kaboom.
>
> To recap: dereferencing unaligned pointers in C/C++ is a no-no, even in cases where you think it should be okay. With the existing compilers and header files we are not currently running into this problem, but even a small change in the byte swap implementation exposes the problem.
>
>
>
> * About the change
>
> The key changes are in three different areas:
>
> 1. copy.[ch]pp
>
> Introducing: conjoint_swap_if_needed
>
> conjoint_swap_if_needed copies data, and bytes wap it on-the-fly if the specified endianness differs from the native/CPU endianness. It does this by either delegating to conjoint_swap (on endian mismatch), or conjoint_copy (on match). In copy.cpp, the changes all boil down to making the innermost do_conjoint_swap method more flexible so that it can be reused for both cases (straight copy as well as copy+swap).
>
> 2. classFile{Parser,Stream}
>
> The key change is in classFileParser.cpp, switching to copying data from the class file using the new conjoint_swap_if_needed method, replacing the loop implemented in copy_u2_with_conversion/Bytes::get_Java_u2.
>
> However, in addition to that change, I noticed that there are a lot of u2* passed around in the code, pointers which are not necessarily 16-bit aligned. While there’s nothing wrong with *having* an unaligned pointer in C - as long as it’s not dereferenced everything is peachy - it made me uneasy to see it passed around and used the way it is. Specifically, ClassFileStream::get_u2_buffer() could, to the untrained eye, be a bit misleading. One could accidentally and incorrectly assume that the returned pointer is, in fact, 16-bit aligned and start dereferencing it directly, where in fact there is no such guarantee. Perhaps even use it as an array and attract the wrath of the C compiler.
>
> Changing to void* may or may not be the right thing to do here. In a way I’d actually like to “carry” the type information, but in some way still prevent the pointer from being directly dereferenced. Taking suggestions.
>
>
> 3. bytes_x86.hpp
>
> This is addressing the wider concern that other parts of hotspot may use the same primitives in much the same (potentially broken) way, and in particular the fact that the get/put primitives aren’t checking whether the pointer argument is aligned before they dereference it. It may well be that a simple assert or two would do the trick here. That said:
>
> It turns out that the various platforms all have their own unique ways of implementing bytes.hpp, duplicating some logic which could/should be platform independent. I tried to clean up and unify it all a bit while at it by introducing an Endian helper class in bytes.hpp. The primitives for accessing data in memory now check for alignment and either perform the raw memory access (when the pointer is aligned), or does a memcpy (if unaligned). There’s some template “magic” in there avoid duplicating code, but hopefully the magic is relatively straightforward.
>
>
> * Testing
>
> I’ve run some basic testing on this, but I’m very much looking for advice on which tests to run. Let me know if you have any suggestions!
>
> Cheers,
> Mikael
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

Mikael Vidstedt-3

New webrevs:

full: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01/hotspot/webrev/
incremental (from webrev.00): http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01.incr/hotspot/webrev/

I definitely want a second reviewer of this, and I’m (still) taking suggestions on tests to run etc.!

Also, comments inline below..

> On May 9, 2017, at 5:12 PM, David Holmes <[hidden email]> wrote:
>
> Hi Mikael,
>
> On 10/05/2017 8:40 AM, Mikael Vidstedt wrote:
>>
>> Warning: It may be wise to stock up on coffee or tea before reading this.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
>> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/ <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>
>
> Overall this looks good to me. I like the refactoring from Bytes to Endian - that simplifies things a lot I think.

Thanks, I like it too ;)

> The actual "copy/swap" changes and templates I'm not an expert on but I get the gist and they seemed okay.
>
> Was a little unsure about all the changes to void* from u2*/u1* in classFileParser.h/cpp - does that just simplify use of the copy/swap code? Though I see some casts to u2* are no longer necessary as well.

Right, I could go either way here, but when I personally see a u2* I feel tempted to just dereference it, and that’s not valid in these cases. With void* it’s more obvious that you need to do something else to get the data, but the width/type of the underlying data is lost. It may make sense to introduce a few helpful typedefs of void* which include in their names the underlying data type, or something along those lines.

I suggest wrapping up and pushing what I have and working on improving the story here as a separate change. Reasonable?

> A couple of oddities I noticed:
>
> src/share/vm/classfile/classFileStream.hpp
>
> Without the get_u2_buffer/get_u1_buffer distinction get_u1_buffer seems superfluous and all uses can be replaced by the existing current() accessor.

Good point, get_u1_buffer can be removed (and it’s gone in the new webrev).

>
> ---
>
> src/share/vm/classfile/classFileParser.cpp
>
> We do we have void* here:
>
> 1707   const void* const exception_table_start = cfs->get_u1_buffer();
>
> but u1* here:
>
> 1845   const u1* const localvariable_table_start = cfs->get_u1_buffer();

Good catch. Changed to void*.

Cheers,
Mikael

>
> Thanks,
> David
>
>> * Background (from the JBS description)
>>
>> x86 is normally very forgiving when it comes to dereferencing unaligned pointers. Even if the pointer isn’t aligned on the natural size of the element being accessed, the hardware will do the Right Thing(tm) and nobody gets hurt. However, turns out there are exceptions to this. Specifically, SSA2 introduced the movdqa instruction, which is a 128-bit load/store which *does* require that the pointer is 128-bit aligned.
>>
>> Normally this isn’t a problem, because after all we don’t typically use 128-bit data types in our C/C++ code. However, just because we don’t use any such data types explicitly, there’s nothing preventing the C compiler to do so under the covers. Specifically, the C compiler tends to do loop unrolling and vectorization, which can turn pretty much any data access into vectorized SIMD accesses.
>>
>> We’ve actually run into a variation on this exact same problem a while back when upgrading to gcc 4.9.2. That time the problem (as captured in JDK-8141491) was in nio/Bits.c, and it was fixed by moving the copy functionality into hotspot (copy.[ch]pp), making sure the copy logic does the relevant alignment checks etc.
>>
>> This time the problem is with ClassFileParser. Or more accurately, it’s in the methods ClassFileParser makes use of. Specifically, the problem is with the copy_u2_with_conversion method, used to copy various data from the class file and put it in the “native” endian order in memory. It, in turn, uses Bytes::get_Java_u2 to read and potentially byte swap a 16-bit entry from the class file. bytes_x86.hpp has this to say about its implementation:
>>
>> // Efficient reading and writing of unaligned unsigned data in platform-specific byte ordering
>> // (no special code is needed since x86 CPUs can access unaligned data)
>>
>> While that is /almost/ always true for the x86 architecture in itself, the C standard still expects accesses to be aligned, and the C compiler is free to make use of that expectation to, for example, vectorize operations and make use of the movdqa instruction.
>>
>> I noticed this when working on the portola/musl port, and in that environment the bug is tickled immediately. Why is it only a problem with musl? Turns out it sorta isn’t. It seems that this isn't an actual problem on any of the current platforms/toolchains we're using, but it's a latent bug which may be triggered at any point.
>>
>> bytes_x86.hpp will, in the end, actually use system library functions to do byte swapping. Specifically, on linux_x86 it will come down to bytes_linux_x86.inline.hpp which, on AMD64, uses the system <byteswap.h> functions/macros swap_u{16,32,64} to do the actual byte swapping. Now here’s the “funny” & interesting part:
>>
>> With glibc, the swap_u{16,32,64} methods are implemented using inline assembly - in the end it comes down to an inline rotate “rorw” instruction. Since GCC can’t see through the inline assembly, it will not realize that there are loop unrolling/vectorization opportunities, and of specific interest to us: the movdqa instruction will not be used. The code will potentially not be as efficient as it could be, but it will be functional.
>>
>> With musl, the swap methods are instead implemented as normal macros, shifting bits around to achieve the desired effect. GCC recognizes the bit shifting patterns, will realize that it’s just byte swapping a bunch of values, will vectorize the loop, and *will* make use of the movdqa instruction. Kaboom.
>>
>> To recap: dereferencing unaligned pointers in C/C++ is a no-no, even in cases where you think it should be okay. With the existing compilers and header files we are not currently running into this problem, but even a small change in the byte swap implementation exposes the problem.
>>
>>
>>
>> * About the change
>>
>> The key changes are in three different areas:
>>
>> 1. copy.[ch]pp
>>
>> Introducing: conjoint_swap_if_needed
>>
>> conjoint_swap_if_needed copies data, and bytes wap it on-the-fly if the specified endianness differs from the native/CPU endianness. It does this by either delegating to conjoint_swap (on endian mismatch), or conjoint_copy (on match). In copy.cpp, the changes all boil down to making the innermost do_conjoint_swap method more flexible so that it can be reused for both cases (straight copy as well as copy+swap).
>>
>> 2. classFile{Parser,Stream}
>>
>> The key change is in classFileParser.cpp, switching to copying data from the class file using the new conjoint_swap_if_needed method, replacing the loop implemented in copy_u2_with_conversion/Bytes::get_Java_u2.
>>
>> However, in addition to that change, I noticed that there are a lot of u2* passed around in the code, pointers which are not necessarily 16-bit aligned. While there’s nothing wrong with *having* an unaligned pointer in C - as long as it’s not dereferenced everything is peachy - it made me uneasy to see it passed around and used the way it is. Specifically, ClassFileStream::get_u2_buffer() could, to the untrained eye, be a bit misleading. One could accidentally and incorrectly assume that the returned pointer is, in fact, 16-bit aligned and start dereferencing it directly, where in fact there is no such guarantee. Perhaps even use it as an array and attract the wrath of the C compiler.
>>
>> Changing to void* may or may not be the right thing to do here. In a way I’d actually like to “carry” the type information, but in some way still prevent the pointer from being directly dereferenced. Taking suggestions.
>>
>>
>> 3. bytes_x86.hpp
>>
>> This is addressing the wider concern that other parts of hotspot may use the same primitives in much the same (potentially broken) way, and in particular the fact that the get/put primitives aren’t checking whether the pointer argument is aligned before they dereference it. It may well be that a simple assert or two would do the trick here. That said:
>>
>> It turns out that the various platforms all have their own unique ways of implementing bytes.hpp, duplicating some logic which could/should be platform independent. I tried to clean up and unify it all a bit while at it by introducing an Endian helper class in bytes.hpp. The primitives for accessing data in memory now check for alignment and either perform the raw memory access (when the pointer is aligned), or does a memcpy (if unaligned). There’s some template “magic” in there avoid duplicating code, but hopefully the magic is relatively straightforward.
>>
>>
>> * Testing
>>
>> I’ve run some basic testing on this, but I’m very much looking for advice on which tests to run. Let me know if you have any suggestions!
>>
>> Cheers,
>> Mikael
>>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

harold seigel
Hi Mikael,

The changes look good.  One minor typo in bytes_sparc.hpp: "nativ byte".

It might be good to run the RBT tier2 - tier5 tests on one big endian
and one little endian platform.

Thanks, Harold


On 5/15/2017 2:34 PM, Mikael Vidstedt wrote:

> New webrevs:
>
> full: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01/hotspot/webrev/
> incremental (from webrev.00): http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01.incr/hotspot/webrev/
>
> I definitely want a second reviewer of this, and I’m (still) taking suggestions on tests to run etc.!
>
> Also, comments inline below..
>
>> On May 9, 2017, at 5:12 PM, David Holmes <[hidden email]> wrote:
>>
>> Hi Mikael,
>>
>> On 10/05/2017 8:40 AM, Mikael Vidstedt wrote:
>>> Warning: It may be wise to stock up on coffee or tea before reading this.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
>>> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/ <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>
>> Overall this looks good to me. I like the refactoring from Bytes to Endian - that simplifies things a lot I think.
> Thanks, I like it too ;)
>
>> The actual "copy/swap" changes and templates I'm not an expert on but I get the gist and they seemed okay.
>>
>> Was a little unsure about all the changes to void* from u2*/u1* in classFileParser.h/cpp - does that just simplify use of the copy/swap code? Though I see some casts to u2* are no longer necessary as well.
> Right, I could go either way here, but when I personally see a u2* I feel tempted to just dereference it, and that’s not valid in these cases. With void* it’s more obvious that you need to do something else to get the data, but the width/type of the underlying data is lost. It may make sense to introduce a few helpful typedefs of void* which include in their names the underlying data type, or something along those lines.
>
> I suggest wrapping up and pushing what I have and working on improving the story here as a separate change. Reasonable?
>
>> A couple of oddities I noticed:
>>
>> src/share/vm/classfile/classFileStream.hpp
>>
>> Without the get_u2_buffer/get_u1_buffer distinction get_u1_buffer seems superfluous and all uses can be replaced by the existing current() accessor.
> Good point, get_u1_buffer can be removed (and it’s gone in the new webrev).
>
>> ---
>>
>> src/share/vm/classfile/classFileParser.cpp
>>
>> We do we have void* here:
>>
>> 1707   const void* const exception_table_start = cfs->get_u1_buffer();
>>
>> but u1* here:
>>
>> 1845   const u1* const localvariable_table_start = cfs->get_u1_buffer();
> Good catch. Changed to void*.
>
> Cheers,
> Mikael
>
>> Thanks,
>> David
>>
>>> * Background (from the JBS description)
>>>
>>> x86 is normally very forgiving when it comes to dereferencing unaligned pointers. Even if the pointer isn’t aligned on the natural size of the element being accessed, the hardware will do the Right Thing(tm) and nobody gets hurt. However, turns out there are exceptions to this. Specifically, SSA2 introduced the movdqa instruction, which is a 128-bit load/store which *does* require that the pointer is 128-bit aligned.
>>>
>>> Normally this isn’t a problem, because after all we don’t typically use 128-bit data types in our C/C++ code. However, just because we don’t use any such data types explicitly, there’s nothing preventing the C compiler to do so under the covers. Specifically, the C compiler tends to do loop unrolling and vectorization, which can turn pretty much any data access into vectorized SIMD accesses.
>>>
>>> We’ve actually run into a variation on this exact same problem a while back when upgrading to gcc 4.9.2. That time the problem (as captured in JDK-8141491) was in nio/Bits.c, and it was fixed by moving the copy functionality into hotspot (copy.[ch]pp), making sure the copy logic does the relevant alignment checks etc.
>>>
>>> This time the problem is with ClassFileParser. Or more accurately, it’s in the methods ClassFileParser makes use of. Specifically, the problem is with the copy_u2_with_conversion method, used to copy various data from the class file and put it in the “native” endian order in memory. It, in turn, uses Bytes::get_Java_u2 to read and potentially byte swap a 16-bit entry from the class file. bytes_x86.hpp has this to say about its implementation:
>>>
>>> // Efficient reading and writing of unaligned unsigned data in platform-specific byte ordering
>>> // (no special code is needed since x86 CPUs can access unaligned data)
>>>
>>> While that is /almost/ always true for the x86 architecture in itself, the C standard still expects accesses to be aligned, and the C compiler is free to make use of that expectation to, for example, vectorize operations and make use of the movdqa instruction.
>>>
>>> I noticed this when working on the portola/musl port, and in that environment the bug is tickled immediately. Why is it only a problem with musl? Turns out it sorta isn’t. It seems that this isn't an actual problem on any of the current platforms/toolchains we're using, but it's a latent bug which may be triggered at any point.
>>>
>>> bytes_x86.hpp will, in the end, actually use system library functions to do byte swapping. Specifically, on linux_x86 it will come down to bytes_linux_x86.inline.hpp which, on AMD64, uses the system <byteswap.h> functions/macros swap_u{16,32,64} to do the actual byte swapping. Now here’s the “funny” & interesting part:
>>>
>>> With glibc, the swap_u{16,32,64} methods are implemented using inline assembly - in the end it comes down to an inline rotate “rorw” instruction. Since GCC can’t see through the inline assembly, it will not realize that there are loop unrolling/vectorization opportunities, and of specific interest to us: the movdqa instruction will not be used. The code will potentially not be as efficient as it could be, but it will be functional.
>>>
>>> With musl, the swap methods are instead implemented as normal macros, shifting bits around to achieve the desired effect. GCC recognizes the bit shifting patterns, will realize that it’s just byte swapping a bunch of values, will vectorize the loop, and *will* make use of the movdqa instruction. Kaboom.
>>>
>>> To recap: dereferencing unaligned pointers in C/C++ is a no-no, even in cases where you think it should be okay. With the existing compilers and header files we are not currently running into this problem, but even a small change in the byte swap implementation exposes the problem.
>>>
>>>
>>>
>>> * About the change
>>>
>>> The key changes are in three different areas:
>>>
>>> 1. copy.[ch]pp
>>>
>>> Introducing: conjoint_swap_if_needed
>>>
>>> conjoint_swap_if_needed copies data, and bytes wap it on-the-fly if the specified endianness differs from the native/CPU endianness. It does this by either delegating to conjoint_swap (on endian mismatch), or conjoint_copy (on match). In copy.cpp, the changes all boil down to making the innermost do_conjoint_swap method more flexible so that it can be reused for both cases (straight copy as well as copy+swap).
>>>
>>> 2. classFile{Parser,Stream}
>>>
>>> The key change is in classFileParser.cpp, switching to copying data from the class file using the new conjoint_swap_if_needed method, replacing the loop implemented in copy_u2_with_conversion/Bytes::get_Java_u2.
>>>
>>> However, in addition to that change, I noticed that there are a lot of u2* passed around in the code, pointers which are not necessarily 16-bit aligned. While there’s nothing wrong with *having* an unaligned pointer in C - as long as it’s not dereferenced everything is peachy - it made me uneasy to see it passed around and used the way it is. Specifically, ClassFileStream::get_u2_buffer() could, to the untrained eye, be a bit misleading. One could accidentally and incorrectly assume that the returned pointer is, in fact, 16-bit aligned and start dereferencing it directly, where in fact there is no such guarantee. Perhaps even use it as an array and attract the wrath of the C compiler.
>>>
>>> Changing to void* may or may not be the right thing to do here. In a way I’d actually like to “carry” the type information, but in some way still prevent the pointer from being directly dereferenced. Taking suggestions.
>>>
>>>
>>> 3. bytes_x86.hpp
>>>
>>> This is addressing the wider concern that other parts of hotspot may use the same primitives in much the same (potentially broken) way, and in particular the fact that the get/put primitives aren’t checking whether the pointer argument is aligned before they dereference it. It may well be that a simple assert or two would do the trick here. That said:
>>>
>>> It turns out that the various platforms all have their own unique ways of implementing bytes.hpp, duplicating some logic which could/should be platform independent. I tried to clean up and unify it all a bit while at it by introducing an Endian helper class in bytes.hpp. The primitives for accessing data in memory now check for alignment and either perform the raw memory access (when the pointer is aligned), or does a memcpy (if unaligned). There’s some template “magic” in there avoid duplicating code, but hopefully the magic is relatively straightforward.
>>>
>>>
>>> * Testing
>>>
>>> I’ve run some basic testing on this, but I’m very much looking for advice on which tests to run. Let me know if you have any suggestions!
>>>
>>> Cheers,
>>> Mikael
>>>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

Vladimir Kozlov
In reply to this post by Mikael Vidstedt-3
I am concern about using libc call memcpy() for few bytes coping. Mikael says that he verified that it is intrinsified by most C++ compilers. What about Solaris C++ which is less advance then gcc?
Also what happens in fasdebug JVM?

May we should also put TODO comments into bytes_aarch64.hpp and bytes_s390.hpp. Or file bugs.

Thanks,
Vladimir

On 5/15/17 11:34 AM, Mikael Vidstedt wrote:

>
> New webrevs:
>
> full: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01/hotspot/webrev/
> incremental (from webrev.00): http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01.incr/hotspot/webrev/
>
> I definitely want a second reviewer of this, and I’m (still) taking suggestions on tests to run etc.!
>
> Also, comments inline below..
>
>> On May 9, 2017, at 5:12 PM, David Holmes <[hidden email]> wrote:
>>
>> Hi Mikael,
>>
>> On 10/05/2017 8:40 AM, Mikael Vidstedt wrote:
>>>
>>> Warning: It may be wise to stock up on coffee or tea before reading this.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
>>> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/ <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>
>>
>> Overall this looks good to me. I like the refactoring from Bytes to Endian - that simplifies things a lot I think.
>
> Thanks, I like it too ;)
>
>> The actual "copy/swap" changes and templates I'm not an expert on but I get the gist and they seemed okay.
>>
>> Was a little unsure about all the changes to void* from u2*/u1* in classFileParser.h/cpp - does that just simplify use of the copy/swap code? Though I see some casts to u2* are no longer necessary as well.
>
> Right, I could go either way here, but when I personally see a u2* I feel tempted to just dereference it, and that’s not valid in these cases. With void* it’s more obvious that you need to do something else to get the data, but the width/type of the underlying data is lost. It may make sense to introduce a few helpful typedefs of void* which include in their names the underlying data type, or something along those lines.
>
> I suggest wrapping up and pushing what I have and working on improving the story here as a separate change. Reasonable?
>
>> A couple of oddities I noticed:
>>
>> src/share/vm/classfile/classFileStream.hpp
>>
>> Without the get_u2_buffer/get_u1_buffer distinction get_u1_buffer seems superfluous and all uses can be replaced by the existing current() accessor.
>
> Good point, get_u1_buffer can be removed (and it’s gone in the new webrev).
>
>>
>> ---
>>
>> src/share/vm/classfile/classFileParser.cpp
>>
>> We do we have void* here:
>>
>> 1707   const void* const exception_table_start = cfs->get_u1_buffer();
>>
>> but u1* here:
>>
>> 1845   const u1* const localvariable_table_start = cfs->get_u1_buffer();
>
> Good catch. Changed to void*.
>
> Cheers,
> Mikael
>
>>
>> Thanks,
>> David
>>
>>> * Background (from the JBS description)
>>>
>>> x86 is normally very forgiving when it comes to dereferencing unaligned pointers. Even if the pointer isn’t aligned on the natural size of the element being accessed, the hardware will do the Right Thing(tm) and nobody gets hurt. However, turns out there are exceptions to this. Specifically, SSA2 introduced the movdqa instruction, which is a 128-bit load/store which *does* require that the pointer is 128-bit aligned.
>>>
>>> Normally this isn’t a problem, because after all we don’t typically use 128-bit data types in our C/C++ code. However, just because we don’t use any such data types explicitly, there’s nothing preventing the C compiler to do so under the covers. Specifically, the C compiler tends to do loop unrolling and vectorization, which can turn pretty much any data access into vectorized SIMD accesses.
>>>
>>> We’ve actually run into a variation on this exact same problem a while back when upgrading to gcc 4.9.2. That time the problem (as captured in JDK-8141491) was in nio/Bits.c, and it was fixed by moving the copy functionality into hotspot (copy.[ch]pp), making sure the copy logic does the relevant alignment checks etc.
>>>
>>> This time the problem is with ClassFileParser. Or more accurately, it’s in the methods ClassFileParser makes use of. Specifically, the problem is with the copy_u2_with_conversion method, used to copy various data from the class file and put it in the “native” endian order in memory. It, in turn, uses Bytes::get_Java_u2 to read and potentially byte swap a 16-bit entry from the class file. bytes_x86.hpp has this to say about its implementation:
>>>
>>> // Efficient reading and writing of unaligned unsigned data in platform-specific byte ordering
>>> // (no special code is needed since x86 CPUs can access unaligned data)
>>>
>>> While that is /almost/ always true for the x86 architecture in itself, the C standard still expects accesses to be aligned, and the C compiler is free to make use of that expectation to, for example, vectorize operations and make use of the movdqa instruction.
>>>
>>> I noticed this when working on the portola/musl port, and in that environment the bug is tickled immediately. Why is it only a problem with musl? Turns out it sorta isn’t. It seems that this isn't an actual problem on any of the current platforms/toolchains we're using, but it's a latent bug which may be triggered at any point.
>>>
>>> bytes_x86.hpp will, in the end, actually use system library functions to do byte swapping. Specifically, on linux_x86 it will come down to bytes_linux_x86.inline.hpp which, on AMD64, uses the system <byteswap.h> functions/macros swap_u{16,32,64} to do the actual byte swapping. Now here’s the “funny” & interesting part:
>>>
>>> With glibc, the swap_u{16,32,64} methods are implemented using inline assembly - in the end it comes down to an inline rotate “rorw” instruction. Since GCC can’t see through the inline assembly, it will not realize that there are loop unrolling/vectorization opportunities, and of specific interest to us: the movdqa instruction will not be used. The code will potentially not be as efficient as it could be, but it will be functional.
>>>
>>> With musl, the swap methods are instead implemented as normal macros, shifting bits around to achieve the desired effect. GCC recognizes the bit shifting patterns, will realize that it’s just byte swapping a bunch of values, will vectorize the loop, and *will* make use of the movdqa instruction. Kaboom.
>>>
>>> To recap: dereferencing unaligned pointers in C/C++ is a no-no, even in cases where you think it should be okay. With the existing compilers and header files we are not currently running into this problem, but even a small change in the byte swap implementation exposes the problem.
>>>
>>>
>>>
>>> * About the change
>>>
>>> The key changes are in three different areas:
>>>
>>> 1. copy.[ch]pp
>>>
>>> Introducing: conjoint_swap_if_needed
>>>
>>> conjoint_swap_if_needed copies data, and bytes wap it on-the-fly if the specified endianness differs from the native/CPU endianness. It does this by either delegating to conjoint_swap (on endian mismatch), or conjoint_copy (on match). In copy.cpp, the changes all boil down to making the innermost do_conjoint_swap method more flexible so that it can be reused for both cases (straight copy as well as copy+swap).
>>>
>>> 2. classFile{Parser,Stream}
>>>
>>> The key change is in classFileParser.cpp, switching to copying data from the class file using the new conjoint_swap_if_needed method, replacing the loop implemented in copy_u2_with_conversion/Bytes::get_Java_u2.
>>>
>>> However, in addition to that change, I noticed that there are a lot of u2* passed around in the code, pointers which are not necessarily 16-bit aligned. While there’s nothing wrong with *having* an unaligned pointer in C - as long as it’s not dereferenced everything is peachy - it made me uneasy to see it passed around and used the way it is. Specifically, ClassFileStream::get_u2_buffer() could, to the untrained eye, be a bit misleading. One could accidentally and incorrectly assume that the returned pointer is, in fact, 16-bit aligned and start dereferencing it directly, where in fact there is no such guarantee. Perhaps even use it as an array and attract the wrath of the C compiler.
>>>
>>> Changing to void* may or may not be the right thing to do here. In a way I’d actually like to “carry” the type information, but in some way still prevent the pointer from being directly dereferenced. Taking suggestions.
>>>
>>>
>>> 3. bytes_x86.hpp
>>>
>>> This is addressing the wider concern that other parts of hotspot may use the same primitives in much the same (potentially broken) way, and in particular the fact that the get/put primitives aren’t checking whether the pointer argument is aligned before they dereference it. It may well be that a simple assert or two would do the trick here. That said:
>>>
>>> It turns out that the various platforms all have their own unique ways of implementing bytes.hpp, duplicating some logic which could/should be platform independent. I tried to clean up and unify it all a bit while at it by introducing an Endian helper class in bytes.hpp. The primitives for accessing data in memory now check for alignment and either perform the raw memory access (when the pointer is aligned), or does a memcpy (if unaligned). There’s some template “magic” in there avoid duplicating code, but hopefully the magic is relatively straightforward.
>>>
>>>
>>> * Testing
>>>
>>> I’ve run some basic testing on this, but I’m very much looking for advice on which tests to run. Let me know if you have any suggestions!
>>>
>>> Cheers,
>>> Mikael
>>>
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

Kim Barrett
In reply to this post by Mikael Vidstedt-3
> On May 9, 2017, at 6:40 PM, Mikael Vidstedt <[hidden email]> wrote:
>
>
> Warning: It may be wise to stock up on coffee or tea before reading this.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/ <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>

Not a review, just a question.

------------------------------------------------------------------------------
src/cpu/x86/vm/bytes_x86.hpp
  40   template <typename T>
  41   static inline T get_native(const void* p) {
  42     assert(p != NULL, "null pointer");
  43
  44     T x;
  45
  46     if (is_ptr_aligned(p, sizeof(T))) {
  47       x = *(T*)p;
  48     } else {
  49       memcpy(&x, p, sizeof(T));
  50     }
  51
  52     return x;

I'm looking at this and wondering if there's a good reason to not just
unconditionally use memcpy here.  gcc -O will generate a single move
instruction for that on x86_64.  I'm not sure what happens on 32bit
with an 8 byte value, but I suspect it will do something similarly
sensible, e.g. 2 4 byte memory to memory transfers.

------------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

David Holmes
In reply to this post by Mikael Vidstedt-3
On 16/05/2017 4:34 AM, Mikael Vidstedt wrote:
>
> New webrevs:
>
> full: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01/hotspot/webrev/
> incremental (from webrev.00): http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01.incr/hotspot/webrev/
>
> I definitely want a second reviewer of this, and I’m (still) taking suggestions on tests to run etc.!
>
> Also, comments inline below..

Response inline below..

>> On May 9, 2017, at 5:12 PM, David Holmes <[hidden email]> wrote:
>>
>> Hi Mikael,
>>
>> On 10/05/2017 8:40 AM, Mikael Vidstedt wrote:
>>>
>>> Warning: It may be wise to stock up on coffee or tea before reading this.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
>>> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/ <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>
>>
>> Overall this looks good to me. I like the refactoring from Bytes to Endian - that simplifies things a lot I think.
>
> Thanks, I like it too ;)
>
>> The actual "copy/swap" changes and templates I'm not an expert on but I get the gist and they seemed okay.
>>
>> Was a little unsure about all the changes to void* from u2*/u1* in classFileParser.h/cpp - does that just simplify use of the copy/swap code? Though I see some casts to u2* are no longer necessary as well.
>
> Right, I could go either way here, but when I personally see a u2* I feel tempted to just dereference it, and that’s not valid in these cases. With void* it’s more obvious that you need to do something else to get the data, but the width/type of the underlying data is lost. It may make sense to introduce a few helpful typedefs of void* which include in their names the underlying data type, or something along those lines.

That seems quite reasonable. I agree de-referencing a void* is not
something that would be done without thinking.

> I suggest wrapping up and pushing what I have and working on improving the story here as a separate change. Reasonable?

Yes.

>> A couple of oddities I noticed:
>>
>> src/share/vm/classfile/classFileStream.hpp
>>
>> Without the get_u2_buffer/get_u1_buffer distinction get_u1_buffer seems superfluous and all uses can be replaced by the existing current() accessor.
>
> Good point, get_u1_buffer can be removed (and it’s gone in the new webrev).

Looks good.

Thanks,
David

>>
>> ---
>>
>> src/share/vm/classfile/classFileParser.cpp
>>
>> We do we have void* here:
>>
>> 1707   const void* const exception_table_start = cfs->get_u1_buffer();
>>
>> but u1* here:
>>
>> 1845   const u1* const localvariable_table_start = cfs->get_u1_buffer();
>
> Good catch. Changed to void*.
>
> Cheers,
> Mikael
>
>>
>> Thanks,
>> David
>>
>>> * Background (from the JBS description)
>>>
>>> x86 is normally very forgiving when it comes to dereferencing unaligned pointers. Even if the pointer isn’t aligned on the natural size of the element being accessed, the hardware will do the Right Thing(tm) and nobody gets hurt. However, turns out there are exceptions to this. Specifically, SSA2 introduced the movdqa instruction, which is a 128-bit load/store which *does* require that the pointer is 128-bit aligned.
>>>
>>> Normally this isn’t a problem, because after all we don’t typically use 128-bit data types in our C/C++ code. However, just because we don’t use any such data types explicitly, there’s nothing preventing the C compiler to do so under the covers. Specifically, the C compiler tends to do loop unrolling and vectorization, which can turn pretty much any data access into vectorized SIMD accesses.
>>>
>>> We’ve actually run into a variation on this exact same problem a while back when upgrading to gcc 4.9.2. That time the problem (as captured in JDK-8141491) was in nio/Bits.c, and it was fixed by moving the copy functionality into hotspot (copy.[ch]pp), making sure the copy logic does the relevant alignment checks etc.
>>>
>>> This time the problem is with ClassFileParser. Or more accurately, it’s in the methods ClassFileParser makes use of. Specifically, the problem is with the copy_u2_with_conversion method, used to copy various data from the class file and put it in the “native” endian order in memory. It, in turn, uses Bytes::get_Java_u2 to read and potentially byte swap a 16-bit entry from the class file. bytes_x86.hpp has this to say about its implementation:
>>>
>>> // Efficient reading and writing of unaligned unsigned data in platform-specific byte ordering
>>> // (no special code is needed since x86 CPUs can access unaligned data)
>>>
>>> While that is /almost/ always true for the x86 architecture in itself, the C standard still expects accesses to be aligned, and the C compiler is free to make use of that expectation to, for example, vectorize operations and make use of the movdqa instruction.
>>>
>>> I noticed this when working on the portola/musl port, and in that environment the bug is tickled immediately. Why is it only a problem with musl? Turns out it sorta isn’t. It seems that this isn't an actual problem on any of the current platforms/toolchains we're using, but it's a latent bug which may be triggered at any point.
>>>
>>> bytes_x86.hpp will, in the end, actually use system library functions to do byte swapping. Specifically, on linux_x86 it will come down to bytes_linux_x86.inline.hpp which, on AMD64, uses the system <byteswap.h> functions/macros swap_u{16,32,64} to do the actual byte swapping. Now here’s the “funny” & interesting part:
>>>
>>> With glibc, the swap_u{16,32,64} methods are implemented using inline assembly - in the end it comes down to an inline rotate “rorw” instruction. Since GCC can’t see through the inline assembly, it will not realize that there are loop unrolling/vectorization opportunities, and of specific interest to us: the movdqa instruction will not be used. The code will potentially not be as efficient as it could be, but it will be functional.
>>>
>>> With musl, the swap methods are instead implemented as normal macros, shifting bits around to achieve the desired effect. GCC recognizes the bit shifting patterns, will realize that it’s just byte swapping a bunch of values, will vectorize the loop, and *will* make use of the movdqa instruction. Kaboom.
>>>
>>> To recap: dereferencing unaligned pointers in C/C++ is a no-no, even in cases where you think it should be okay. With the existing compilers and header files we are not currently running into this problem, but even a small change in the byte swap implementation exposes the problem.
>>>
>>>
>>>
>>> * About the change
>>>
>>> The key changes are in three different areas:
>>>
>>> 1. copy.[ch]pp
>>>
>>> Introducing: conjoint_swap_if_needed
>>>
>>> conjoint_swap_if_needed copies data, and bytes wap it on-the-fly if the specified endianness differs from the native/CPU endianness. It does this by either delegating to conjoint_swap (on endian mismatch), or conjoint_copy (on match). In copy.cpp, the changes all boil down to making the innermost do_conjoint_swap method more flexible so that it can be reused for both cases (straight copy as well as copy+swap).
>>>
>>> 2. classFile{Parser,Stream}
>>>
>>> The key change is in classFileParser.cpp, switching to copying data from the class file using the new conjoint_swap_if_needed method, replacing the loop implemented in copy_u2_with_conversion/Bytes::get_Java_u2.
>>>
>>> However, in addition to that change, I noticed that there are a lot of u2* passed around in the code, pointers which are not necessarily 16-bit aligned. While there’s nothing wrong with *having* an unaligned pointer in C - as long as it’s not dereferenced everything is peachy - it made me uneasy to see it passed around and used the way it is. Specifically, ClassFileStream::get_u2_buffer() could, to the untrained eye, be a bit misleading. One could accidentally and incorrectly assume that the returned pointer is, in fact, 16-bit aligned and start dereferencing it directly, where in fact there is no such guarantee. Perhaps even use it as an array and attract the wrath of the C compiler.
>>>
>>> Changing to void* may or may not be the right thing to do here. In a way I’d actually like to “carry” the type information, but in some way still prevent the pointer from being directly dereferenced. Taking suggestions.
>>>
>>>
>>> 3. bytes_x86.hpp
>>>
>>> This is addressing the wider concern that other parts of hotspot may use the same primitives in much the same (potentially broken) way, and in particular the fact that the get/put primitives aren’t checking whether the pointer argument is aligned before they dereference it. It may well be that a simple assert or two would do the trick here. That said:
>>>
>>> It turns out that the various platforms all have their own unique ways of implementing bytes.hpp, duplicating some logic which could/should be platform independent. I tried to clean up and unify it all a bit while at it by introducing an Endian helper class in bytes.hpp. The primitives for accessing data in memory now check for alignment and either perform the raw memory access (when the pointer is aligned), or does a memcpy (if unaligned). There’s some template “magic” in there avoid duplicating code, but hopefully the magic is relatively straightforward.
>>>
>>>
>>> * Testing
>>>
>>> I’ve run some basic testing on this, but I’m very much looking for advice on which tests to run. Let me know if you have any suggestions!
>>>
>>> Cheers,
>>> Mikael
>>>
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

Robbin Ehn
In reply to this post by Kim Barrett
Hi,

On 05/17/2017 03:46 AM, Kim Barrett wrote:

>> On May 9, 2017, at 6:40 PM, Mikael Vidstedt <[hidden email]> wrote:
>>
>>
>> Warning: It may be wise to stock up on coffee or tea before reading this.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
>> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/ <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>
>
> Not a review, just a question.
>
> ------------------------------------------------------------------------------
> src/cpu/x86/vm/bytes_x86.hpp
>    40   template <typename T>
>    41   static inline T get_native(const void* p) {
>    42     assert(p != NULL, "null pointer");
>    43
>    44     T x;
>    45
>    46     if (is_ptr_aligned(p, sizeof(T))) {
>    47       x = *(T*)p;
>    48     } else {
>    49       memcpy(&x, p, sizeof(T));
>    50     }
>    51
>    52     return x;
>
> I'm looking at this and wondering if there's a good reason to not just
> unconditionally use memcpy here.  gcc -O will generate a single move
> instruction for that on x86_64.  I'm not sure what happens on 32bit
> with an 8 byte value, but I suspect it will do something similarly
> sensible, e.g. 2 4 byte memory to memory transfers.

Unconditionally memcpy would be nice!

Are going to look into that Mikael?

/Robbin

>
> ------------------------------------------------------------------------------
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

Mikael Vidstedt-3

> On May 18, 2017, at 2:59 AM, Robbin Ehn <[hidden email]> wrote:
>
> Hi,
>
> On 05/17/2017 03:46 AM, Kim Barrett wrote:
>>> On May 9, 2017, at 6:40 PM, Mikael Vidstedt <[hidden email]> wrote:
>>>
>>>
>>> Warning: It may be wise to stock up on coffee or tea before reading this.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
>>> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/ <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>
>> Not a review, just a question.
>> ------------------------------------------------------------------------------
>> src/cpu/x86/vm/bytes_x86.hpp
>>   40   template <typename T>
>>   41   static inline T get_native(const void* p) {
>>   42     assert(p != NULL, "null pointer");
>>   43
>>   44     T x;
>>   45
>>   46     if (is_ptr_aligned(p, sizeof(T))) {
>>   47       x = *(T*)p;
>>   48     } else {
>>   49       memcpy(&x, p, sizeof(T));
>>   50     }
>>   51
>>   52     return x;
>> I'm looking at this and wondering if there's a good reason to not just
>> unconditionally use memcpy here.  gcc -O will generate a single move
>> instruction for that on x86_64.  I'm not sure what happens on 32bit
>> with an 8 byte value, but I suspect it will do something similarly
>> sensible, e.g. 2 4 byte memory to memory transfers.
>
> Unconditionally memcpy would be nice!
>
> Are going to look into that Mikael?

It’s complicated…

We may be able to switch, but there is (maybe) a subtle reason why the alignment check is in there: to avoid word tearing..

Think of two threads racing:

* thread 1 is writing to the memory location X
* thread 2 is reading from the same memory location X

Will thread 2 always see a consistent value (either the original value or the fully updated value)?

In the unaligned/memcpy case I think we can agree that there’s nothing preventing the compiler from doing individual loads/stores of the bytes making up the data. Especially in something like slowdebug that becomes more or less obvious - memcpy most likely isn’t intrinsified and is quite likely just copying a byte at a time. Given that the data is, in fact, unaligned, there is really no simple way to prevent word tearing, so I’m pretty sure that we never depend on it - if needed, we’re likely to already have some higher level synchronization in place guarding the accesses. And the fact that the other, non-x86 platforms already do individual byte loads/stores when the pointer is unaligned indicates is a further indication that that’s the case.

However, the aligned case is where stuff gets more interesting. I don’t think the C/C++ spec guarantees that accessing a memory location using a pointer of type T will result in code which does a single load/store of size >= sizeof(T), but for all the compilers we *actually* use that’s likely to be the case. If it’s true that the compilers don’t splits the memory accesses, that means we won’t have word tearing when using the Bytes::get/put methods with *aligned* pointers.

If I switch to always using memcpy, there’s a risk that it introduces tearing problems where earlier we had none. Two questions come to mind:

* For the cases where the get/put methods get used *today*, is that a problem?
* What happens if somebody in the *future* decides that put_Java_u4 seems like a great thing to use to write to a Java int field on the Java heap, and a Java thread is racing to read that same data?


All that said though, I think this is worth exploring and it may well turn out that word tearing really isn’t a problem. Also, I believe there may be opportunities to further clean up this code and perhaps unify it a bit across the various platforms.

And *that* said, I think the change as it stands is still an improvement, so I’m leaning towards pushing it and filing an enhancement and following up on it separately. Let me know if you strongly feel that this should be looked into and addressed now and I may reconsider :)

Cheers,
Mikael

>
> /Robbin
>
>> ------------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

Mikael Vidstedt-3
In reply to this post by harold seigel

> On May 16, 2017, at 12:07 PM, harold seigel <[hidden email]> wrote:
>
> Hi Mikael,
>
> The changes look good.  One minor typo in bytes_sparc.hpp: "nativ byte”.

I believe you’re referring to the removed code, in which case it will go away automatically if/when I push this :)

>
> It might be good to run the RBT tier2 - tier5 tests on one big endian and one little endian platform.

Thanks, I will do that!

Cheers,
Mikael

>
> Thanks, Harold
>
>
> On 5/15/2017 2:34 PM, Mikael Vidstedt wrote:
>> New webrevs:
>>
>> full: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01/hotspot/webrev/
>> incremental (from webrev.00): http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01.incr/hotspot/webrev/
>>
>> I definitely want a second reviewer of this, and I’m (still) taking suggestions on tests to run etc.!
>>
>> Also, comments inline below..
>>
>>> On May 9, 2017, at 5:12 PM, David Holmes <[hidden email]> wrote:
>>>
>>> Hi Mikael,
>>>
>>> On 10/05/2017 8:40 AM, Mikael Vidstedt wrote:
>>>> Warning: It may be wise to stock up on coffee or tea before reading this.
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
>>>> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/ <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>
>>> Overall this looks good to me. I like the refactoring from Bytes to Endian - that simplifies things a lot I think.
>> Thanks, I like it too ;)
>>
>>> The actual "copy/swap" changes and templates I'm not an expert on but I get the gist and they seemed okay.
>>>
>>> Was a little unsure about all the changes to void* from u2*/u1* in classFileParser.h/cpp - does that just simplify use of the copy/swap code? Though I see some casts to u2* are no longer necessary as well.
>> Right, I could go either way here, but when I personally see a u2* I feel tempted to just dereference it, and that’s not valid in these cases. With void* it’s more obvious that you need to do something else to get the data, but the width/type of the underlying data is lost. It may make sense to introduce a few helpful typedefs of void* which include in their names the underlying data type, or something along those lines.
>>
>> I suggest wrapping up and pushing what I have and working on improving the story here as a separate change. Reasonable?
>>
>>> A couple of oddities I noticed:
>>>
>>> src/share/vm/classfile/classFileStream.hpp
>>>
>>> Without the get_u2_buffer/get_u1_buffer distinction get_u1_buffer seems superfluous and all uses can be replaced by the existing current() accessor.
>> Good point, get_u1_buffer can be removed (and it’s gone in the new webrev).
>>
>>> ---
>>>
>>> src/share/vm/classfile/classFileParser.cpp
>>>
>>> We do we have void* here:
>>>
>>> 1707   const void* const exception_table_start = cfs->get_u1_buffer();
>>>
>>> but u1* here:
>>>
>>> 1845   const u1* const localvariable_table_start = cfs->get_u1_buffer();
>> Good catch. Changed to void*.
>>
>> Cheers,
>> Mikael
>>
>>> Thanks,
>>> David
>>>
>>>> * Background (from the JBS description)
>>>>
>>>> x86 is normally very forgiving when it comes to dereferencing unaligned pointers. Even if the pointer isn’t aligned on the natural size of the element being accessed, the hardware will do the Right Thing(tm) and nobody gets hurt. However, turns out there are exceptions to this. Specifically, SSA2 introduced the movdqa instruction, which is a 128-bit load/store which *does* require that the pointer is 128-bit aligned.
>>>>
>>>> Normally this isn’t a problem, because after all we don’t typically use 128-bit data types in our C/C++ code. However, just because we don’t use any such data types explicitly, there’s nothing preventing the C compiler to do so under the covers. Specifically, the C compiler tends to do loop unrolling and vectorization, which can turn pretty much any data access into vectorized SIMD accesses.
>>>>
>>>> We’ve actually run into a variation on this exact same problem a while back when upgrading to gcc 4.9.2. That time the problem (as captured in JDK-8141491) was in nio/Bits.c, and it was fixed by moving the copy functionality into hotspot (copy.[ch]pp), making sure the copy logic does the relevant alignment checks etc.
>>>>
>>>> This time the problem is with ClassFileParser. Or more accurately, it’s in the methods ClassFileParser makes use of. Specifically, the problem is with the copy_u2_with_conversion method, used to copy various data from the class file and put it in the “native” endian order in memory. It, in turn, uses Bytes::get_Java_u2 to read and potentially byte swap a 16-bit entry from the class file. bytes_x86.hpp has this to say about its implementation:
>>>>
>>>> // Efficient reading and writing of unaligned unsigned data in platform-specific byte ordering
>>>> // (no special code is needed since x86 CPUs can access unaligned data)
>>>>
>>>> While that is /almost/ always true for the x86 architecture in itself, the C standard still expects accesses to be aligned, and the C compiler is free to make use of that expectation to, for example, vectorize operations and make use of the movdqa instruction.
>>>>
>>>> I noticed this when working on the portola/musl port, and in that environment the bug is tickled immediately. Why is it only a problem with musl? Turns out it sorta isn’t. It seems that this isn't an actual problem on any of the current platforms/toolchains we're using, but it's a latent bug which may be triggered at any point.
>>>>
>>>> bytes_x86.hpp will, in the end, actually use system library functions to do byte swapping. Specifically, on linux_x86 it will come down to bytes_linux_x86.inline.hpp which, on AMD64, uses the system <byteswap.h> functions/macros swap_u{16,32,64} to do the actual byte swapping. Now here’s the “funny” & interesting part:
>>>>
>>>> With glibc, the swap_u{16,32,64} methods are implemented using inline assembly - in the end it comes down to an inline rotate “rorw” instruction. Since GCC can’t see through the inline assembly, it will not realize that there are loop unrolling/vectorization opportunities, and of specific interest to us: the movdqa instruction will not be used. The code will potentially not be as efficient as it could be, but it will be functional.
>>>>
>>>> With musl, the swap methods are instead implemented as normal macros, shifting bits around to achieve the desired effect. GCC recognizes the bit shifting patterns, will realize that it’s just byte swapping a bunch of values, will vectorize the loop, and *will* make use of the movdqa instruction. Kaboom.
>>>>
>>>> To recap: dereferencing unaligned pointers in C/C++ is a no-no, even in cases where you think it should be okay. With the existing compilers and header files we are not currently running into this problem, but even a small change in the byte swap implementation exposes the problem.
>>>>
>>>>
>>>>
>>>> * About the change
>>>>
>>>> The key changes are in three different areas:
>>>>
>>>> 1. copy.[ch]pp
>>>>
>>>> Introducing: conjoint_swap_if_needed
>>>>
>>>> conjoint_swap_if_needed copies data, and bytes wap it on-the-fly if the specified endianness differs from the native/CPU endianness. It does this by either delegating to conjoint_swap (on endian mismatch), or conjoint_copy (on match). In copy.cpp, the changes all boil down to making the innermost do_conjoint_swap method more flexible so that it can be reused for both cases (straight copy as well as copy+swap).
>>>>
>>>> 2. classFile{Parser,Stream}
>>>>
>>>> The key change is in classFileParser.cpp, switching to copying data from the class file using the new conjoint_swap_if_needed method, replacing the loop implemented in copy_u2_with_conversion/Bytes::get_Java_u2.
>>>>
>>>> However, in addition to that change, I noticed that there are a lot of u2* passed around in the code, pointers which are not necessarily 16-bit aligned. While there’s nothing wrong with *having* an unaligned pointer in C - as long as it’s not dereferenced everything is peachy - it made me uneasy to see it passed around and used the way it is. Specifically, ClassFileStream::get_u2_buffer() could, to the untrained eye, be a bit misleading. One could accidentally and incorrectly assume that the returned pointer is, in fact, 16-bit aligned and start dereferencing it directly, where in fact there is no such guarantee. Perhaps even use it as an array and attract the wrath of the C compiler.
>>>>
>>>> Changing to void* may or may not be the right thing to do here. In a way I’d actually like to “carry” the type information, but in some way still prevent the pointer from being directly dereferenced. Taking suggestions.
>>>>
>>>>
>>>> 3. bytes_x86.hpp
>>>>
>>>> This is addressing the wider concern that other parts of hotspot may use the same primitives in much the same (potentially broken) way, and in particular the fact that the get/put primitives aren’t checking whether the pointer argument is aligned before they dereference it. It may well be that a simple assert or two would do the trick here. That said:
>>>>
>>>> It turns out that the various platforms all have their own unique ways of implementing bytes.hpp, duplicating some logic which could/should be platform independent. I tried to clean up and unify it all a bit while at it by introducing an Endian helper class in bytes.hpp. The primitives for accessing data in memory now check for alignment and either perform the raw memory access (when the pointer is aligned), or does a memcpy (if unaligned). There’s some template “magic” in there avoid duplicating code, but hopefully the magic is relatively straightforward.
>>>>
>>>>
>>>> * Testing
>>>>
>>>> I’ve run some basic testing on this, but I’m very much looking for advice on which tests to run. Let me know if you have any suggestions!
>>>>
>>>> Cheers,
>>>> Mikael
>>>>
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

David Holmes
In reply to this post by Mikael Vidstedt-3
Hi Mikael,

On 19/05/2017 8:15 AM, Mikael Vidstedt wrote:

>
>> On May 18, 2017, at 2:59 AM, Robbin Ehn <[hidden email]> wrote:
>>
>> Hi,
>>
>> On 05/17/2017 03:46 AM, Kim Barrett wrote:
>>>> On May 9, 2017, at 6:40 PM, Mikael Vidstedt <[hidden email]> wrote:
>>>>
>>>>
>>>> Warning: It may be wise to stock up on coffee or tea before reading this.
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
>>>> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/ <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>
>>> Not a review, just a question.
>>> ------------------------------------------------------------------------------
>>> src/cpu/x86/vm/bytes_x86.hpp
>>>   40   template <typename T>
>>>   41   static inline T get_native(const void* p) {
>>>   42     assert(p != NULL, "null pointer");
>>>   43
>>>   44     T x;
>>>   45
>>>   46     if (is_ptr_aligned(p, sizeof(T))) {
>>>   47       x = *(T*)p;
>>>   48     } else {
>>>   49       memcpy(&x, p, sizeof(T));
>>>   50     }
>>>   51
>>>   52     return x;
>>> I'm looking at this and wondering if there's a good reason to not just
>>> unconditionally use memcpy here.  gcc -O will generate a single move
>>> instruction for that on x86_64.  I'm not sure what happens on 32bit
>>> with an 8 byte value, but I suspect it will do something similarly
>>> sensible, e.g. 2 4 byte memory to memory transfers.
>>
>> Unconditionally memcpy would be nice!
>>
>> Are going to look into that Mikael?
>
> It’s complicated…
>
> We may be able to switch, but there is (maybe) a subtle reason why the alignment check is in there: to avoid word tearing..
>
> Think of two threads racing:
>
> * thread 1 is writing to the memory location X
> * thread 2 is reading from the same memory location X
>
> Will thread 2 always see a consistent value (either the original value or the fully updated value)?

We're talking about internal VM load and stores rights? For those we
need to use appropriate atomic routine if there are potential races. But
we should never be mixing these kind of accesses with Java level field
accesses - that would be very broken.

For classFileparser we should no concurrency issues.

David

> In the unaligned/memcpy case I think we can agree that there’s nothing preventing the compiler from doing individual loads/stores of the bytes making up the data. Especially in something like slowdebug that becomes more or less obvious - memcpy most likely isn’t intrinsified and is quite likely just copying a byte at a time. Given that the data is, in fact, unaligned, there is really no simple way to prevent word tearing, so I’m pretty sure that we never depend on it - if needed, we’re likely to already have some higher level synchronization in place guarding the accesses. And the fact that the other, non-x86 platforms already do individual byte loads/stores when the pointer is unaligned indicates is a further indication that that’s the case.
>
> However, the aligned case is where stuff gets more interesting. I don’t think the C/C++ spec guarantees that accessing a memory location using a pointer of type T will result in code which does a single load/store of size >= sizeof(T), but for all the compilers we *actually* use that’s likely to be the case. If it’s true that the compilers don’t splits the memory accesses, that means we won’t have word tearing when using the Bytes::get/put methods with *aligned* pointers.
>
> If I switch to always using memcpy, there’s a risk that it introduces tearing problems where earlier we had none. Two questions come to mind:
>
> * For the cases where the get/put methods get used *today*, is that a problem?
> * What happens if somebody in the *future* decides that put_Java_u4 seems like a great thing to use to write to a Java int field on the Java heap, and a Java thread is racing to read that same data?
>
>
> All that said though, I think this is worth exploring and it may well turn out that word tearing really isn’t a problem. Also, I believe there may be opportunities to further clean up this code and perhaps unify it a bit across the various platforms.
>
> And *that* said, I think the change as it stands is still an improvement, so I’m leaning towards pushing it and filing an enhancement and following up on it separately. Let me know if you strongly feel that this should be looked into and addressed now and I may reconsider :)
>
> Cheers,
> Mikael
>
>>
>> /Robbin
>>
>>> ------------------------------------------------------------------------------
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

Mikael Vidstedt-3

> On May 18, 2017, at 3:50 PM, David Holmes <[hidden email]> wrote:
>
> Hi Mikael,
>
> On 19/05/2017 8:15 AM, Mikael Vidstedt wrote:
>>
>>> On May 18, 2017, at 2:59 AM, Robbin Ehn <[hidden email]> wrote:
>>>
>>> Hi,
>>>
>>> On 05/17/2017 03:46 AM, Kim Barrett wrote:
>>>>> On May 9, 2017, at 6:40 PM, Mikael Vidstedt <[hidden email]> wrote:
>>>>>
>>>>>
>>>>> Warning: It may be wise to stock up on coffee or tea before reading this.
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
>>>>> Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/ <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>
>>>> Not a review, just a question.
>>>> ------------------------------------------------------------------------------
>>>> src/cpu/x86/vm/bytes_x86.hpp
>>>>  40   template <typename T>
>>>>  41   static inline T get_native(const void* p) {
>>>>  42     assert(p != NULL, "null pointer");
>>>>  43
>>>>  44     T x;
>>>>  45
>>>>  46     if (is_ptr_aligned(p, sizeof(T))) {
>>>>  47       x = *(T*)p;
>>>>  48     } else {
>>>>  49       memcpy(&x, p, sizeof(T));
>>>>  50     }
>>>>  51
>>>>  52     return x;
>>>> I'm looking at this and wondering if there's a good reason to not just
>>>> unconditionally use memcpy here.  gcc -O will generate a single move
>>>> instruction for that on x86_64.  I'm not sure what happens on 32bit
>>>> with an 8 byte value, but I suspect it will do something similarly
>>>> sensible, e.g. 2 4 byte memory to memory transfers.
>>>
>>> Unconditionally memcpy would be nice!
>>>
>>> Are going to look into that Mikael?
>>
>> It’s complicated…
>>
>> We may be able to switch, but there is (maybe) a subtle reason why the alignment check is in there: to avoid word tearing..
>>
>> Think of two threads racing:
>>
>> * thread 1 is writing to the memory location X
>> * thread 2 is reading from the same memory location X
>>
>> Will thread 2 always see a consistent value (either the original value or the fully updated value)?
>
> We're talking about internal VM load and stores rights? For those we need to use appropriate atomic routine if there are potential races. But we should never be mixing these kind of accesses with Java level field accesses - that would be very broken.

That seems reasonable, but for my untrained eye it’s not trivially true that relaxing the implementation is correct for all the uses of the get/put primitives. I am therefore a bit reluctant to do so without understanding the implications.

> For classFileparser we should no concurrency issues.

That seems reasonable. What degree of certainty does your “should” come with? :)

Cheers,
Mikael

>
> David
>
>> In the unaligned/memcpy case I think we can agree that there’s nothing preventing the compiler from doing individual loads/stores of the bytes making up the data. Especially in something like slowdebug that becomes more or less obvious - memcpy most likely isn’t intrinsified and is quite likely just copying a byte at a time. Given that the data is, in fact, unaligned, there is really no simple way to prevent word tearing, so I’m pretty sure that we never depend on it - if needed, we’re likely to already have some higher level synchronization in place guarding the accesses. And the fact that the other, non-x86 platforms already do individual byte loads/stores when the pointer is unaligned indicates is a further indication that that’s the case.
>>
>> However, the aligned case is where stuff gets more interesting. I don’t think the C/C++ spec guarantees that accessing a memory location using a pointer of type T will result in code which does a single load/store of size >= sizeof(T), but for all the compilers we *actually* use that’s likely to be the case. If it’s true that the compilers don’t splits the memory accesses, that means we won’t have word tearing when using the Bytes::get/put methods with *aligned* pointers.
>>
>> If I switch to always using memcpy, there’s a risk that it introduces tearing problems where earlier we had none. Two questions come to mind:
>>
>> * For the cases where the get/put methods get used *today*, is that a problem?
>> * What happens if somebody in the *future* decides that put_Java_u4 seems like a great thing to use to write to a Java int field on the Java heap, and a Java thread is racing to read that same data?
>>
>>
>> All that said though, I think this is worth exploring and it may well turn out that word tearing really isn’t a problem. Also, I believe there may be opportunities to further clean up this code and perhaps unify it a bit across the various platforms.
>>
>> And *that* said, I think the change as it stands is still an improvement, so I’m leaning towards pushing it and filing an enhancement and following up on it separately. Let me know if you strongly feel that this should be looked into and addressed now and I may reconsider :)
>>
>> Cheers,
>> Mikael
>>
>>>
>>> /Robbin
>>>
>>>> ------------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

David Holmes
On 19/05/2017 9:19 AM, Mikael Vidstedt wrote:

>
>> On May 18, 2017, at 3:50 PM, David Holmes <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>> Hi Mikael,
>>
>> On 19/05/2017 8:15 AM, Mikael Vidstedt wrote:
>>>
>>>> On May 18, 2017, at 2:59 AM, Robbin Ehn <[hidden email]
>>>> <mailto:[hidden email]>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 05/17/2017 03:46 AM, Kim Barrett wrote:
>>>>>> On May 9, 2017, at 6:40 PM, Mikael Vidstedt
>>>>>> <[hidden email] <mailto:[hidden email]>>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Warning: It may be wise to stock up on coffee or tea before
>>>>>> reading this.
>>>>>>
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/
>>>>>> <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>
>>>>> Not a review, just a question.
>>>>> ------------------------------------------------------------------------------
>>>>> src/cpu/x86/vm/bytes_x86.hpp
>>>>>  40   template <typename T>
>>>>>  41   static inline T get_native(const void* p) {
>>>>>  42     assert(p != NULL, "null pointer");
>>>>>  43
>>>>>  44     T x;
>>>>>  45
>>>>>  46     if (is_ptr_aligned(p, sizeof(T))) {
>>>>>  47       x = *(T*)p;
>>>>>  48     } else {
>>>>>  49       memcpy(&x, p, sizeof(T));
>>>>>  50     }
>>>>>  51
>>>>>  52     return x;
>>>>> I'm looking at this and wondering if there's a good reason to not just
>>>>> unconditionally use memcpy here.  gcc -O will generate a single move
>>>>> instruction for that on x86_64.  I'm not sure what happens on 32bit
>>>>> with an 8 byte value, but I suspect it will do something similarly
>>>>> sensible, e.g. 2 4 byte memory to memory transfers.
>>>>
>>>> Unconditionally memcpy would be nice!
>>>>
>>>> Are going to look into that Mikael?
>>>
>>> It’s complicated…
>>>
>>> We may be able to switch, but there is (maybe) a subtle reason why
>>> the alignment check is in there: to avoid word tearing..
>>>
>>> Think of two threads racing:
>>>
>>> * thread 1 is writing to the memory location X
>>> * thread 2 is reading from the same memory location X
>>>
>>> Will thread 2 always see a consistent value (either the original
>>> value or the fully updated value)?
>>
>> We're talking about internal VM load and stores rights? For those we
>> need to use appropriate atomic routine if there are potential races.
>> But we should never be mixing these kind of accesses with Java level
>> field accesses - that would be very broken.
>
> That seems reasonable, but for my untrained eye it’s not trivially true
> that relaxing the implementation is correct for all the uses of the
> get/put primitives. I am therefore a bit reluctant to do so without
> understanding the implications.

If a Copy routine doesn't have Atomic in its name then I don't expect
atomicity. Even then unaligned accesses are not atomic even in the
Atomic routine!

But I'm not clear exactly how all these routines get used.

>> For classFileparser we should no concurrency issues.
>
> That seems reasonable. What degree of certainty does your “should” come
> with? :)

Pretty high. We're parsing a stream of bytes and writing values into
local structures that will eventually be passed across to a klass
instance, which in turn will eventually be published via the SD as a
loaded class. The actual parsing phase is purely single-threaded.

David

> Cheers,
> Mikael
>
>>
>> David
>>
>>> In the unaligned/memcpy case I think we can agree that there’s
>>> nothing preventing the compiler from doing individual loads/stores of
>>> the bytes making up the data. Especially in something like slowdebug
>>> that becomes more or less obvious - memcpy most likely isn’t
>>> intrinsified and is quite likely just copying a byte at a time. Given
>>> that the data is, in fact, unaligned, there is really no simple way
>>> to prevent word tearing, so I’m pretty sure that we never depend on
>>> it - if needed, we’re likely to already have some higher level
>>> synchronization in place guarding the accesses. And the fact that the
>>> other, non-x86 platforms already do individual byte loads/stores when
>>> the pointer is unaligned indicates is a further indication that
>>> that’s the case.
>>>
>>> However, the aligned case is where stuff gets more interesting. I
>>> don’t think the C/C++ spec guarantees that accessing a memory
>>> location using a pointer of type T will result in code which does a
>>> single load/store of size >= sizeof(T), but for all the compilers we
>>> *actually* use that’s likely to be the case. If it’s true that the
>>> compilers don’t splits the memory accesses, that means we won’t have
>>> word tearing when using the Bytes::get/put methods with *aligned*
>>> pointers.
>>>
>>> If I switch to always using memcpy, there’s a risk that it introduces
>>> tearing problems where earlier we had none. Two questions come to mind:
>>>
>>> * For the cases where the get/put methods get used *today*, is that a
>>> problem?
>>> * What happens if somebody in the *future* decides that put_Java_u4
>>> seems like a great thing to use to write to a Java int field on the
>>> Java heap, and a Java thread is racing to read that same data?
>>>
>>>
>>> All that said though, I think this is worth exploring and it may well
>>> turn out that word tearing really isn’t a problem. Also, I believe
>>> there may be opportunities to further clean up this code and perhaps
>>> unify it a bit across the various platforms.
>>>
>>> And *that* said, I think the change as it stands is still an
>>> improvement, so I’m leaning towards pushing it and filing an
>>> enhancement and following up on it separately. Let me know if you
>>> strongly feel that this should be looked into and addressed now and I
>>> may reconsider :)
>>>
>>> Cheers,
>>> Mikael
>>>
>>>>
>>>> /Robbin
>>>>
>>>>> ------------------------------------------------------------------------------
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

Mikael Vidstedt-3

I’ve been spending the last few days going down a rabbit hole of what turned out to be a totally unrelated performance issue. Long story short: startup time is affected, in some cases significantly, by the length of the path to the JDK in the file system. More on that in a separate thread/at another time.

After having looked at generated code, and having run benchmarks stressing class loading/startup time my conclusion is that this change is performance neutral. For example, the alignment check introduced in bytes_x86.hpp get_native/put_native collapses down to a single unconditional load unless, of course, it’s done in a loop in which case it gets unrolled+vectorized.

I also ran hs-tier2, which should more than cover the changes in question, and there were no failures.

With that in mind I would like to push the change in its current form[1] and handle a few things as follow-up work (roughly in order):

* Introduce typedefs in classFileParser for potentially unaligned pointer types
* Always using memcpy to do the read - need to investigate how the primitives are used wrt. tearing
* Unify the Bytes::* impl across platforms - need to investigate/verify the implications on performance

Reasonable?

Cheers,
Mikael

[1] http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01/hotspot/webrev/ <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01/hotspot/webrev/>

> On May 18, 2017, at 5:18 PM, David Holmes <[hidden email]> wrote:
>
> On 19/05/2017 9:19 AM, Mikael Vidstedt wrote:
>>
>>> On May 18, 2017, at 3:50 PM, David Holmes <[hidden email]
>>> <mailto:[hidden email]>> wrote:
>>>
>>> Hi Mikael,
>>>
>>> On 19/05/2017 8:15 AM, Mikael Vidstedt wrote:
>>>>
>>>>> On May 18, 2017, at 2:59 AM, Robbin Ehn <[hidden email]
>>>>> <mailto:[hidden email]>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 05/17/2017 03:46 AM, Kim Barrett wrote:
>>>>>>> On May 9, 2017, at 6:40 PM, Mikael Vidstedt
>>>>>>> <[hidden email] <mailto:[hidden email]>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Warning: It may be wise to stock up on coffee or tea before
>>>>>>> reading this.
>>>>>>>
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
>>>>>>> Webrev:
>>>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/
>>>>>>> <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>
>>>>>> Not a review, just a question.
>>>>>> ------------------------------------------------------------------------------
>>>>>> src/cpu/x86/vm/bytes_x86.hpp
>>>>>> 40   template <typename T>
>>>>>> 41   static inline T get_native(const void* p) {
>>>>>> 42     assert(p != NULL, "null pointer");
>>>>>> 43
>>>>>> 44     T x;
>>>>>> 45
>>>>>> 46     if (is_ptr_aligned(p, sizeof(T))) {
>>>>>> 47       x = *(T*)p;
>>>>>> 48     } else {
>>>>>> 49       memcpy(&x, p, sizeof(T));
>>>>>> 50     }
>>>>>> 51
>>>>>> 52     return x;
>>>>>> I'm looking at this and wondering if there's a good reason to not just
>>>>>> unconditionally use memcpy here.  gcc -O will generate a single move
>>>>>> instruction for that on x86_64.  I'm not sure what happens on 32bit
>>>>>> with an 8 byte value, but I suspect it will do something similarly
>>>>>> sensible, e.g. 2 4 byte memory to memory transfers.
>>>>>
>>>>> Unconditionally memcpy would be nice!
>>>>>
>>>>> Are going to look into that Mikael?
>>>>
>>>> It’s complicated…
>>>>
>>>> We may be able to switch, but there is (maybe) a subtle reason why
>>>> the alignment check is in there: to avoid word tearing..
>>>>
>>>> Think of two threads racing:
>>>>
>>>> * thread 1 is writing to the memory location X
>>>> * thread 2 is reading from the same memory location X
>>>>
>>>> Will thread 2 always see a consistent value (either the original
>>>> value or the fully updated value)?
>>>
>>> We're talking about internal VM load and stores rights? For those we
>>> need to use appropriate atomic routine if there are potential races.
>>> But we should never be mixing these kind of accesses with Java level
>>> field accesses - that would be very broken.
>>
>> That seems reasonable, but for my untrained eye it’s not trivially true
>> that relaxing the implementation is correct for all the uses of the
>> get/put primitives. I am therefore a bit reluctant to do so without
>> understanding the implications.
>
> If a Copy routine doesn't have Atomic in its name then I don't expect atomicity. Even then unaligned accesses are not atomic even in the Atomic routine!
>
> But I'm not clear exactly how all these routines get used.
>
>>> For classFileparser we should no concurrency issues.
>>
>> That seems reasonable. What degree of certainty does your “should” come
>> with? :)
>
> Pretty high. We're parsing a stream of bytes and writing values into local structures that will eventually be passed across to a klass instance, which in turn will eventually be published via the SD as a loaded class. The actual parsing phase is purely single-threaded.
>
> David
>
>> Cheers,
>> Mikael
>>
>>>
>>> David
>>>
>>>> In the unaligned/memcpy case I think we can agree that there’s
>>>> nothing preventing the compiler from doing individual loads/stores of
>>>> the bytes making up the data. Especially in something like slowdebug
>>>> that becomes more or less obvious - memcpy most likely isn’t
>>>> intrinsified and is quite likely just copying a byte at a time. Given
>>>> that the data is, in fact, unaligned, there is really no simple way
>>>> to prevent word tearing, so I’m pretty sure that we never depend on
>>>> it - if needed, we’re likely to already have some higher level
>>>> synchronization in place guarding the accesses. And the fact that the
>>>> other, non-x86 platforms already do individual byte loads/stores when
>>>> the pointer is unaligned indicates is a further indication that
>>>> that’s the case.
>>>>
>>>> However, the aligned case is where stuff gets more interesting. I
>>>> don’t think the C/C++ spec guarantees that accessing a memory
>>>> location using a pointer of type T will result in code which does a
>>>> single load/store of size >= sizeof(T), but for all the compilers we
>>>> *actually* use that’s likely to be the case. If it’s true that the
>>>> compilers don’t splits the memory accesses, that means we won’t have
>>>> word tearing when using the Bytes::get/put methods with *aligned*
>>>> pointers.
>>>>
>>>> If I switch to always using memcpy, there’s a risk that it introduces
>>>> tearing problems where earlier we had none. Two questions come to mind:
>>>>
>>>> * For the cases where the get/put methods get used *today*, is that a
>>>> problem?
>>>> * What happens if somebody in the *future* decides that put_Java_u4
>>>> seems like a great thing to use to write to a Java int field on the
>>>> Java heap, and a Java thread is racing to read that same data?
>>>>
>>>>
>>>> All that said though, I think this is worth exploring and it may well
>>>> turn out that word tearing really isn’t a problem. Also, I believe
>>>> there may be opportunities to further clean up this code and perhaps
>>>> unify it a bit across the various platforms.
>>>>
>>>> And *that* said, I think the change as it stands is still an
>>>> improvement, so I’m leaning towards pushing it and filing an
>>>> enhancement and following up on it separately. Let me know if you
>>>> strongly feel that this should be looked into and addressed now and I
>>>> may reconsider :)
>>>>
>>>> Cheers,
>>>> Mikael
>>>>
>>>>>
>>>>> /Robbin
>>>>>
>>>>>> ------------------------------------------------------------------------------
>>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

David Holmes
On 26/05/2017 8:00 AM, Mikael Vidstedt wrote:
> I’ve been spending the last few days going down a rabbit hole of what
> turned out to be a totally unrelated performance issue. Long story
> short: startup time is affected, in some cases significantly, by the
> length of the path to the JDK in the file system. More on that in a
> separate thread/at another time.

https://bugs.openjdk.java.net/browse/JDK-7196911 ?

> After having looked at generated code, and having run benchmarks
> stressing class loading/startup time my conclusion is that this change
> is performance neutral. For example, the alignment check introduced in
> bytes_x86.hpp get_native/put_native collapses down to a single
> unconditional load unless, of course, it’s done in a loop in which case
> it gets unrolled+vectorized.
>
> I also ran hs-tier2, which should more than cover the changes in
> question, and there were no failures.
>
> With that in mind I would like to push the change in its current form[1]
> and handle a few things as follow-up work (roughly in order):
>
> * Introduce typedefs in classFileParser for potentially unaligned
> pointer types
> * Always using memcpy to do the read - need to investigate how the
> primitives are used wrt. tearing
> * Unify the Bytes::* impl across platforms - need to investigate/verify
> the implications on performance
>
> Reasonable?

Reasonable.

Cheers,
David


> Cheers,
> Mikael
>
> [1]
> http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01/hotspot/webrev/
>
>> On May 18, 2017, at 5:18 PM, David Holmes <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>> On 19/05/2017 9:19 AM, Mikael Vidstedt wrote:
>>>
>>>> On May 18, 2017, at 3:50 PM, David Holmes <[hidden email]
>>>> <mailto:[hidden email]>
>>>> <mailto:[hidden email]>> wrote:
>>>>
>>>> Hi Mikael,
>>>>
>>>> On 19/05/2017 8:15 AM, Mikael Vidstedt wrote:
>>>>>
>>>>>> On May 18, 2017, at 2:59 AM, Robbin Ehn <[hidden email]
>>>>>> <mailto:[hidden email]>
>>>>>> <mailto:[hidden email]>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 05/17/2017 03:46 AM, Kim Barrett wrote:
>>>>>>>> On May 9, 2017, at 6:40 PM, Mikael Vidstedt
>>>>>>>> <[hidden email] <mailto:[hidden email]>
>>>>>>>> <mailto:[hidden email]>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Warning: It may be wise to stock up on coffee or tea before
>>>>>>>> reading this.
>>>>>>>>
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
>>>>>>>> Webrev:
>>>>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/
>>>>>>>> <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>
>>>>>>> Not a review, just a question.
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> src/cpu/x86/vm/bytes_x86.hpp
>>>>>>> 40   template <typename T>
>>>>>>> 41   static inline T get_native(const void* p) {
>>>>>>> 42     assert(p != NULL, "null pointer");
>>>>>>> 43
>>>>>>> 44     T x;
>>>>>>> 45
>>>>>>> 46     if (is_ptr_aligned(p, sizeof(T))) {
>>>>>>> 47       x = *(T*)p;
>>>>>>> 48     } else {
>>>>>>> 49       memcpy(&x, p, sizeof(T));
>>>>>>> 50     }
>>>>>>> 51
>>>>>>> 52     return x;
>>>>>>> I'm looking at this and wondering if there's a good reason to not
>>>>>>> just
>>>>>>> unconditionally use memcpy here.  gcc -O will generate a single move
>>>>>>> instruction for that on x86_64.  I'm not sure what happens on 32bit
>>>>>>> with an 8 byte value, but I suspect it will do something similarly
>>>>>>> sensible, e.g. 2 4 byte memory to memory transfers.
>>>>>>
>>>>>> Unconditionally memcpy would be nice!
>>>>>>
>>>>>> Are going to look into that Mikael?
>>>>>
>>>>> It’s complicated…
>>>>>
>>>>> We may be able to switch, but there is (maybe) a subtle reason why
>>>>> the alignment check is in there: to avoid word tearing..
>>>>>
>>>>> Think of two threads racing:
>>>>>
>>>>> * thread 1 is writing to the memory location X
>>>>> * thread 2 is reading from the same memory location X
>>>>>
>>>>> Will thread 2 always see a consistent value (either the original
>>>>> value or the fully updated value)?
>>>>
>>>> We're talking about internal VM load and stores rights? For those we
>>>> need to use appropriate atomic routine if there are potential races.
>>>> But we should never be mixing these kind of accesses with Java level
>>>> field accesses - that would be very broken.
>>>
>>> That seems reasonable, but for my untrained eye it’s not trivially true
>>> that relaxing the implementation is correct for all the uses of the
>>> get/put primitives. I am therefore a bit reluctant to do so without
>>> understanding the implications.
>>
>> If a Copy routine doesn't have Atomic in its name then I don't expect
>> atomicity. Even then unaligned accesses are not atomic even in the
>> Atomic routine!
>>
>> But I'm not clear exactly how all these routines get used.
>>
>>>> For classFileparser we should no concurrency issues.
>>>
>>> That seems reasonable. What degree of certainty does your “should” come
>>> with? :)
>>
>> Pretty high. We're parsing a stream of bytes and writing values into
>> local structures that will eventually be passed across to a klass
>> instance, which in turn will eventually be published via the SD as a
>> loaded class. The actual parsing phase is purely single-threaded.
>>
>> David
>>
>>> Cheers,
>>> Mikael
>>>
>>>>
>>>> David
>>>>
>>>>> In the unaligned/memcpy case I think we can agree that there’s
>>>>> nothing preventing the compiler from doing individual loads/stores of
>>>>> the bytes making up the data. Especially in something like slowdebug
>>>>> that becomes more or less obvious - memcpy most likely isn’t
>>>>> intrinsified and is quite likely just copying a byte at a time. Given
>>>>> that the data is, in fact, unaligned, there is really no simple way
>>>>> to prevent word tearing, so I’m pretty sure that we never depend on
>>>>> it - if needed, we’re likely to already have some higher level
>>>>> synchronization in place guarding the accesses. And the fact that the
>>>>> other, non-x86 platforms already do individual byte loads/stores when
>>>>> the pointer is unaligned indicates is a further indication that
>>>>> that’s the case.
>>>>>
>>>>> However, the aligned case is where stuff gets more interesting. I
>>>>> don’t think the C/C++ spec guarantees that accessing a memory
>>>>> location using a pointer of type T will result in code which does a
>>>>> single load/store of size >= sizeof(T), but for all the compilers we
>>>>> *actually* use that’s likely to be the case. If it’s true that the
>>>>> compilers don’t splits the memory accesses, that means we won’t have
>>>>> word tearing when using the Bytes::get/put methods with *aligned*
>>>>> pointers.
>>>>>
>>>>> If I switch to always using memcpy, there’s a risk that it introduces
>>>>> tearing problems where earlier we had none. Two questions come to mind:
>>>>>
>>>>> * For the cases where the get/put methods get used *today*, is that a
>>>>> problem?
>>>>> * What happens if somebody in the *future* decides that put_Java_u4
>>>>> seems like a great thing to use to write to a Java int field on the
>>>>> Java heap, and a Java thread is racing to read that same data?
>>>>>
>>>>>
>>>>> All that said though, I think this is worth exploring and it may well
>>>>> turn out that word tearing really isn’t a problem. Also, I believe
>>>>> there may be opportunities to further clean up this code and perhaps
>>>>> unify it a bit across the various platforms.
>>>>>
>>>>> And *that* said, I think the change as it stands is still an
>>>>> improvement, so I’m leaning towards pushing it and filing an
>>>>> enhancement and following up on it separately. Let me know if you
>>>>> strongly feel that this should be looked into and addressed now and I
>>>>> may reconsider :)
>>>>>
>>>>> Cheers,
>>>>> Mikael
>>>>>
>>>>>>
>>>>>> /Robbin
>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

John Rose-3
In reply to this post by Mikael Vidstedt-3
On May 18, 2017, at 3:15 PM, Mikael Vidstedt <[hidden email]> wrote:
>
> I don’t think the C/C++ spec guarantees that accessing a memory location using a pointer of type T will result in code which does a single load/store of size >= sizeof(T), but for all the compilers we *actually* use that’s likely to be the case. If it’s true that the compilers don’t splits the memory accesses, that means we won’t have word tearing when using the Bytes::get/put methods with *aligned* pointers.

This is true I think and is the main reason memcpy isn't a trustworthy replacement for any racy code. We used it for primitive arraycopy long ago but it bit us with hard to reproduce bugs. That's why we have Copy now, including the explicit atomic versions.

For explicitly nonatomic copies memcpy is ok but I'd want to wrap it in an API that makes the nonatomicity explicit.

– John
Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8180032: Unaligned pointer dereference in ClassFileParser

Robbin Ehn
In reply to this post by Mikael Vidstedt-3
Hi Mikael,

I see you have pushed this, good!
Sorry for the late response.

On 2017-05-26 00:00, Mikael Vidstedt wrote:
>
> Reasonable?

+1, thanks!

/Robbin

>
> Cheers,
> Mikael
>
> [1] http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.01/hotspot/webrev/
>
>> On May 18, 2017, at 5:18 PM, David Holmes <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>> On 19/05/2017 9:19 AM, Mikael Vidstedt wrote:
>>>
>>>> On May 18, 2017, at 3:50 PM, David Holmes <[hidden email]
>>>> <mailto:[hidden email]>
>>>> <mailto:[hidden email]>> wrote:
>>>>
>>>> Hi Mikael,
>>>>
>>>> On 19/05/2017 8:15 AM, Mikael Vidstedt wrote:
>>>>>
>>>>>> On May 18, 2017, at 2:59 AM, Robbin Ehn <[hidden email]
>>>>>> <mailto:[hidden email]>
>>>>>> <mailto:[hidden email]>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 05/17/2017 03:46 AM, Kim Barrett wrote:
>>>>>>>> On May 9, 2017, at 6:40 PM, Mikael Vidstedt
>>>>>>>> <[hidden email] <mailto:[hidden email]>
>>>>>>>> <mailto:[hidden email]>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Warning: It may be wise to stock up on coffee or tea before
>>>>>>>> reading this.
>>>>>>>>
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8180032
>>>>>>>> Webrev:
>>>>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/
>>>>>>>> <http://cr.openjdk.java.net/~mikael/webrevs/8180032/webrev.00/hotspot/webrev/>
>>>>>>> Not a review, just a question.
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> src/cpu/x86/vm/bytes_x86.hpp
>>>>>>> 40   template <typename T>
>>>>>>> 41   static inline T get_native(const void* p) {
>>>>>>> 42     assert(p != NULL, "null pointer");
>>>>>>> 43
>>>>>>> 44     T x;
>>>>>>> 45
>>>>>>> 46     if (is_ptr_aligned(p, sizeof(T))) {
>>>>>>> 47       x = *(T*)p;
>>>>>>> 48     } else {
>>>>>>> 49       memcpy(&x, p, sizeof(T));
>>>>>>> 50     }
>>>>>>> 51
>>>>>>> 52     return x;
>>>>>>> I'm looking at this and wondering if there's a good reason to not
>>>>>>> just
>>>>>>> unconditionally use memcpy here.  gcc -O will generate a single move
>>>>>>> instruction for that on x86_64.  I'm not sure what happens on 32bit
>>>>>>> with an 8 byte value, but I suspect it will do something similarly
>>>>>>> sensible, e.g. 2 4 byte memory to memory transfers.
>>>>>>
>>>>>> Unconditionally memcpy would be nice!
>>>>>>
>>>>>> Are going to look into that Mikael?
>>>>>
>>>>> It’s complicated…
>>>>>
>>>>> We may be able to switch, but there is (maybe) a subtle reason why
>>>>> the alignment check is in there: to avoid word tearing..
>>>>>
>>>>> Think of two threads racing:
>>>>>
>>>>> * thread 1 is writing to the memory location X
>>>>> * thread 2 is reading from the same memory location X
>>>>>
>>>>> Will thread 2 always see a consistent value (either the original
>>>>> value or the fully updated value)?
>>>>
>>>> We're talking about internal VM load and stores rights? For those we
>>>> need to use appropriate atomic routine if there are potential races.
>>>> But we should never be mixing these kind of accesses with Java level
>>>> field accesses - that would be very broken.
>>>
>>> That seems reasonable, but for my untrained eye it’s not trivially true
>>> that relaxing the implementation is correct for all the uses of the
>>> get/put primitives. I am therefore a bit reluctant to do so without
>>> understanding the implications.
>>
>> If a Copy routine doesn't have Atomic in its name then I don't expect
>> atomicity. Even then unaligned accesses are not atomic even in the
>> Atomic routine!
>>
>> But I'm not clear exactly how all these routines get used.
>>
>>>> For classFileparser we should no concurrency issues.
>>>
>>> That seems reasonable. What degree of certainty does your “should” come
>>> with? :)
>>
>> Pretty high. We're parsing a stream of bytes and writing values into
>> local structures that will eventually be passed across to a klass
>> instance, which in turn will eventually be published via the SD as a
>> loaded class. The actual parsing phase is purely single-threaded.
>>
>> David
>>
>>> Cheers,
>>> Mikael
>>>
>>>>
>>>> David
>>>>
>>>>> In the unaligned/memcpy case I think we can agree that there’s
>>>>> nothing preventing the compiler from doing individual loads/stores of
>>>>> the bytes making up the data. Especially in something like slowdebug
>>>>> that becomes more or less obvious - memcpy most likely isn’t
>>>>> intrinsified and is quite likely just copying a byte at a time. Given
>>>>> that the data is, in fact, unaligned, there is really no simple way
>>>>> to prevent word tearing, so I’m pretty sure that we never depend on
>>>>> it - if needed, we’re likely to already have some higher level
>>>>> synchronization in place guarding the accesses. And the fact that the
>>>>> other, non-x86 platforms already do individual byte loads/stores when
>>>>> the pointer is unaligned indicates is a further indication that
>>>>> that’s the case.
>>>>>
>>>>> However, the aligned case is where stuff gets more interesting. I
>>>>> don’t think the C/C++ spec guarantees that accessing a memory
>>>>> location using a pointer of type T will result in code which does a
>>>>> single load/store of size >= sizeof(T), but for all the compilers we
>>>>> *actually* use that’s likely to be the case. If it’s true that the
>>>>> compilers don’t splits the memory accesses, that means we won’t have
>>>>> word tearing when using the Bytes::get/put methods with *aligned*
>>>>> pointers.
>>>>>
>>>>> If I switch to always using memcpy, there’s a risk that it introduces
>>>>> tearing problems where earlier we had none. Two questions come to mind:
>>>>>
>>>>> * For the cases where the get/put methods get used *today*, is that a
>>>>> problem?
>>>>> * What happens if somebody in the *future* decides that put_Java_u4
>>>>> seems like a great thing to use to write to a Java int field on the
>>>>> Java heap, and a Java thread is racing to read that same data?
>>>>>
>>>>>
>>>>> All that said though, I think this is worth exploring and it may well
>>>>> turn out that word tearing really isn’t a problem. Also, I believe
>>>>> there may be opportunities to further clean up this code and perhaps
>>>>> unify it a bit across the various platforms.
>>>>>
>>>>> And *that* said, I think the change as it stands is still an
>>>>> improvement, so I’m leaning towards pushing it and filing an
>>>>> enhancement and following up on it separately. Let me know if you
>>>>> strongly feel that this should be looked into and addressed now and I
>>>>> may reconsider :)
>>>>>
>>>>> Cheers,
>>>>> Mikael
>>>>>
>>>>>>
>>>>>> /Robbin
>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>
>