Quantcast

Optimizing byte reverse code for int value

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Optimizing byte reverse code for int value

Michihiro Horie

Dear all,

Would you please review our change for JDK10 on ppc64?
Issue: https://bugs.openjdk.java.net/browse/JDK-8178294
Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.00/

This change adds two conversion rules of reversing contiguous 4 bytes for
int value.
The first conversion rule finds a pattern below and emits a lwz instruction
instead.

Original:
       lbz     r14,19(r12)
       lbz     r11,17(r12)
       lbz     r10,18(r12)
       lbz     r9,16(r12)
       extsb   r14,r14
       rlwinm  r10,r10,16,0,15
       rlwinm  r14,r14,24,0,7
       add     r14,r10,r14
       rlwinm  r11,r11,8,0,23
       add     r12,r11,r9
       add     r14,r12,r14

Optimization with first conversion rule:
       lwz     r14,16(r12)


The second conversion rule finds a pattern below and emits only lfs
instruction.

Original:
       lbz     r14,19(r12)
       lbz     r11,17(r12)
       lbz     r10,18(r12)
       lbz     r9,16(r12)
       extsb   r14,r14
       rlwinm  r10,r10,16,0,15
       rlwinm  r14,r14,24,0,7
       add     r14,r10,r14
       rlwinm  r11,r11,8,0,23
       add     r12,r11,r9
       add     r14,r12,r14
       stw     r14,156(r1)
       lfs     f12,156(r1)

Optimization with first conversion rule:
       lfs     f12,156(r1)


Our motivation comes from the fact that a performance bottleneck exists in
byte reversing code in Apache ORC on
Tez framework as shown below.
https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/impl/SerializationUtils.java
We believe this kind of procedures is typical in Java.

public float readFloat(InputStream in) throws IOException {
  readFully(in, readBuffer, 0, 4);
  int val = (((readBuffer[0] & 0xff) << 0)
          + ((readBuffer[1] & 0xff) << 8)
          + ((readBuffer[2] & 0xff) << 16)
          + ((readBuffer[3] & 0xff) << 24));
  return Float.intBitsToFloat(val);
}


By using our change, we could observe 5% performance improvement in a micro
benchmark.
(See attached file: ReadFloatTest.java)

Best regards,
--
Michihiro,
IBM Research - Tokyo
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Optimizing byte reverse code for int value

Doerr, Martin
Hi Michihiro,

thanks for providing the webrev. I appreciate improvements for this bottleneck.

After taking a first look over it, it looks good to me for ppc64le.
But I think it would break big endian platforms.

I suggest replacing the use of loadI by endianness specific code (which could possibly use lwbrx on big endian).

Best regards,
Martin


From: Michihiro Horie [mailto:[hidden email]]
Sent: Freitag, 7. April 2017 07:50
To: [hidden email]; [hidden email]
Cc: Doerr, Martin <[hidden email]>; Simonis, Volker <[hidden email]>; [hidden email]; Hiroshi H Horii <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; Gustavo Bueno Romero <[hidden email]>
Subject: Optimizing byte reverse code for int value


Dear all,

Would you please review our change for JDK10 on ppc64?
Issue: https://bugs.openjdk.java.net/browse/JDK-8178294
Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.00/

This change adds two conversion rules of reversing contiguous 4 bytes for int value.
The first conversion rule finds a pattern below and emits a lwz instruction instead.

Original:
lbz r14,19(r12)
lbz r11,17(r12)
lbz r10,18(r12)
lbz r9,16(r12)
extsb r14,r14
rlwinm r10,r10,16,0,15
rlwinm r14,r14,24,0,7
add r14,r10,r14
rlwinm r11,r11,8,0,23
add r12,r11,r9
add r14,r12,r14

Optimization with first conversion rule:
lwz r14,16(r12)


The second conversion rule finds a pattern below and emits only lfs instruction.

Original:
lbz r14,19(r12)
lbz r11,17(r12)
lbz r10,18(r12)
lbz r9,16(r12)
extsb r14,r14
rlwinm r10,r10,16,0,15
rlwinm r14,r14,24,0,7
add r14,r10,r14
rlwinm r11,r11,8,0,23
add r12,r11,r9
add r14,r12,r14
stw r14,156(r1)
lfs f12,156(r1)

Optimization with first conversion rule:
lfs f12,156(r1)


Our motivation comes from the fact that a performance bottleneck exists in byte reversing code in Apache ORC on
Tez framework as shown below.
https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/impl/SerializationUtils.java
We believe this kind of procedures is typical in Java.

public float readFloat(InputStream in) throws IOException {
readFully(in, readBuffer, 0, 4);
int val = (((readBuffer[0] & 0xff) << 0)
+ ((readBuffer[1] & 0xff) << 8)
+ ((readBuffer[2] & 0xff) << 16)
+ ((readBuffer[3] & 0xff) << 24));
return Float.intBitsToFloat(val);
}


By using our change, we could observe 5% performance improvement in a micro benchmark.
(See attached file: ReadFloatTest.java)

Best regards,
--
Michihiro,
IBM Research - Tokyo
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimizing byte reverse code for int value

Andrew Haley
In reply to this post by Michihiro Horie
On 07/04/17 06:49, Michihiro Horie wrote:
>
> Would you please review our change for JDK10 on ppc64?
> Issue: https://bugs.openjdk.java.net/browse/JDK-8178294
> Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.00/
>
> This change adds two conversion rules of reversing contiguous 4 bytes for
> int value.
> The first conversion rule finds a pattern below and emits a lwz instruction
> instead.

Surely the source code needs fixing.  It could be:

public float readFloat(InputStream in) throws IOException {
  readFully(in, aByteBuffer, 0, 4);
  int val = aByteBuffer.getInt(0);

  return Float.intBitsToFloat(val);
}

Then there would be no need for a special ppc64 pattern.

Andrew.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimizing byte reverse code for int value

Hiroshi H Horii
> I suggest replacing the use of loadI by endianness specific code
> (which could possibly use lwbrx on big endian).

I believe that lwbrx is necessary for BE.

> Surely the source code needs fixing.  It could be:
>
> public float readFloat(InputStream in) throws IOException {
>   readFully(in, aByteBuffer, 0, 4);
>   int val = aByteBuffer.getInt(0);
>
>   return Float.intBitsToFloat(val);
> }

In my understanding, ByteBuffer.getInt() does the similar thing. I guess
that application does not use ByteBuffer only for calling getInt().

Heap-X-Buffer.java.template
    public int getInt() {
        return Bits.getInt(this, ix(nextGetIndex(4)), bigEndian);
    }

Bits.java
    static int getInt(ByteBuffer bb, int bi, boolean bigEndian) {
        return bigEndian ? getIntB(bb, bi) : getIntL(bb, bi) ;
    }
    static int getIntL(ByteBuffer bb, int bi) {
        return makeInt(bb._get(bi + 3),
                       bb._get(bi + 2),
                       bb._get(bi + 1),
                       bb._get(bi    ));
    }
    static private int makeInt(byte b3, byte b2, byte b1, byte b0) {
        return (((b3       ) << 24) |
                ((b2 & 0xff) << 16) |
                ((b1 & 0xff) <<  8) |
                ((b0 & 0xff)      ));
    }

Heap-X-Buffer.java.template
    byte _get(int i) {                          // package-private
        return hb[i];
    }

Direct-X-Buffer.java.template
    byte _get(int i) {                          // package-private
        return unsafe.getByte(address + i);
    }

Regards,
Hiroshi

Andrew Haley <[hidden email]> wrote on 2017/04/07 17:36:13:

> From: Andrew Haley <[hidden email]>
> To: Michihiro Horie/Japan/IBM@IBMJP, ppc-aix-port-
> [hidden email], [hidden email]
> Cc: Hiroshi H Horii/Japan/IBM@IBMJP, [hidden email]
> Date: 2017/04/07 17:37
> Subject: Re: Optimizing byte reverse code for int value
>
> On 07/04/17 06:49, Michihiro Horie wrote:
> >
> > Would you please review our change for JDK10 on ppc64?
> > Issue: https://bugs.openjdk.java.net/browse/JDK-8178294
> > Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.00/
> >
> > This change adds two conversion rules of reversing contiguous 4 bytes
for
> > int value.
> > The first conversion rule finds a pattern below and emits a lwz
instruction

> > instead.
>
> Surely the source code needs fixing.  It could be:
>
> public float readFloat(InputStream in) throws IOException {
>   readFully(in, aByteBuffer, 0, 4);
>   int val = aByteBuffer.getInt(0);
>
>   return Float.intBitsToFloat(val);
> }
>
> Then there would be no need for a special ppc64 pattern.
>
> Andrew.
>


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimizing byte reverse code for int value

Andrew Haley
On 07/04/17 18:51, Hiroshi H Horii wrote:

>> I suggest replacing the use of loadI by endianness specific code
>> (which could possibly use lwbrx on big endian).
>
> I believe that lwbrx is necessary for BE.
>
>> Surely the source code needs fixing.  It could be:
>>
>> public float readFloat(InputStream in) throws IOException {
>>   readFully(in, aByteBuffer, 0, 4);
>>   int val = aByteBuffer.getInt(0);
>>
>>   return Float.intBitsToFloat(val);
>> }
>
> In my understanding, ByteBuffer.getInt() does the similar thing.

It doesn't.

> I guess that application does not use ByteBuffer only for calling getInt().
>
> Heap-X-Buffer.java.template
>     public int getInt() {
>         return Bits.getInt(this, ix(nextGetIndex(4)), bigEndian);
>     }

This is old code.  In JDK9 it looks like

    public int getInt() {
        return unsafe.getIntUnaligned(hb, byteOffset(nextGetIndex(4)), bigEndian);
    }

Unsafe.getIntUnaligned is a HotSpot intrinsic.

Bits.java is not used for this.

Andrew.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimizing byte reverse code for int value

Hiroshi H Horii
> > I guess that application does not use ByteBuffer only for calling
getInt().

> >
> > Heap-X-Buffer.java.template
> >     public int getInt() {
> >         return Bits.getInt(this, ix(nextGetIndex(4)), bigEndian);
> >     }
>
> This is old code.  In JDK9 it looks like
>
>     public int getInt() {
>         return unsafe.getIntUnaligned(hb,
> byteOffset(nextGetIndex(4)), bigEndian);
>     }
>
> Unsafe.getIntUnaligned is a HotSpot intrinsic.

Sorry. You are right. I checked jdk8u...

jdk8u doesn't have Unsafe.getIntUnaligned. This change will improve jdk8u.

Regards,
Hiroshi

Andrew Haley <[hidden email]> wrote on 2017/04/08 02:58:45:

> From: Andrew Haley <[hidden email]>
> To: Hiroshi H Horii/Japan/IBM@IBMJP
> Cc: [hidden email], Michihiro Horie/Japan/IBM@IBMJP,
> [hidden email], [hidden email]
> Date: 2017/04/08 02:59
> Subject: Re: Optimizing byte reverse code for int value
>
> On 07/04/17 18:51, Hiroshi H Horii wrote:
> >> I suggest replacing the use of loadI by endianness specific code
> >> (which could possibly use lwbrx on big endian).
> >
> > I believe that lwbrx is necessary for BE.
> >
> >> Surely the source code needs fixing.  It could be:
> >>
> >> public float readFloat(InputStream in) throws IOException {
> >>   readFully(in, aByteBuffer, 0, 4);
> >>   int val = aByteBuffer.getInt(0);
> >>
> >>   return Float.intBitsToFloat(val);
> >> }
> >
> > In my understanding, ByteBuffer.getInt() does the similar thing.
>
> It doesn't.
>
> > I guess that application does not use ByteBuffer only for calling
getInt().

> >
> > Heap-X-Buffer.java.template
> >     public int getInt() {
> >         return Bits.getInt(this, ix(nextGetIndex(4)), bigEndian);
> >     }
>
> This is old code.  In JDK9 it looks like
>
>     public int getInt() {
>         return unsafe.getIntUnaligned(hb,
> byteOffset(nextGetIndex(4)), bigEndian);
>     }
>
> Unsafe.getIntUnaligned is a HotSpot intrinsic.
>
> Bits.java is not used for this.
>
> Andrew.
>


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimizing byte reverse code for int value

Andrew Haley
In reply to this post by Hiroshi H Horii
On 11/04/17 11:36, Michihiro Horie wrote:
> Thank you very much for letting us know Unsafe.getIntUnaligned is available in
> JDK9. I do agree we should fix Java source code.
> We think our byte-reverse optimization would still work on jdk8u as Hiroshi
> mentioned. Would you agree on this point?

I do, but I do not agree that this patch should necessarily be done in
the PowerPC-specific back end.  Have you considered it as a generic
optimization for all processors?

Andrew.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Optimizing byte reverse code for int value

Doerr, Martin
Hi Andrew,

thank you for your helpful comments. I fully agree with you.

In addition, I noticed that we don't have match rules which exploit byte reverse load/store instructions on PPC64.
SPARC already has them:
match(Set dst (ReverseBytesI/L/US/S (LoadI src)));
match(Set dst (StoreI dst (ReverseBytesI/L/US/S src)));
I think we should add them for jdk10. They should be used when the platform endianness doesn't match the bigEndian parameter in Unsafe methods.

Best regards,
Martin


-----Original Message-----
From: hotspot-dev [mailto:[hidden email]] On Behalf Of Andrew Haley
Sent: Dienstag, 11. April 2017 13:02
To: Michihiro Horie <[hidden email]>
Cc: Simonis, Volker <[hidden email]>; [hidden email]; [hidden email]; Hiroshi H Horii <[hidden email]>
Subject: Re: Optimizing byte reverse code for int value

On 11/04/17 11:36, Michihiro Horie wrote:
> Thank you very much for letting us know Unsafe.getIntUnaligned is available in
> JDK9. I do agree we should fix Java source code.
> We think our byte-reverse optimization would still work on jdk8u as Hiroshi
> mentioned. Would you agree on this point?

I do, but I do not agree that this patch should necessarily be done in
the PowerPC-specific back end.  Have you considered it as a generic
optimization for all processors?

Andrew.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Optimizing byte reverse code for int value

Doerr, Martin
In reply to this post by Andrew Haley
Hi Michihiro,

thanks for the quick reply.

I think Andrew’s idea is to optimize in the shared code instead of the platform backends. I haven’t thought about where this could be done.
Or would it be possible to backport jdk (especially Unsafe) changes? If the required changes are small enough and we don’t have to touch any public interface, this might be an option, too.

We’ll appreciate if you take care of the new match rules for PPC64. Thanks a lot.

Best regards,
Martin


From: Michihiro Horie [mailto:[hidden email]]
Sent: Dienstag, 11. April 2017 16:55
To: Doerr, Martin <[hidden email]>
Cc: [hidden email]; Hiroshi H Horii <[hidden email]>; [hidden email]; [hidden email]; Simonis, Volker <[hidden email]>
Subject: RE: Optimizing byte reverse code for int value

Andrew, Martin,

Thanks a lot for your helpful feedback.

>Have you considered it as a generic optimization for all processors?
We would support all processors for our byte-reverse optimization to make it generic.
Currently, I just finished adding match rules for little endian and big endian on PPC64, and am testing it in AIX.

>In addition, I noticed that we don't have match rules which exploit byte reverse load/store instructions on PPC64.
We would like to handle adding match rules for byte reverse load/store instructions on PPC64 for JDK10 if you would not mind.
Would it be fine with you?

Best regards,
--
Michihiro,
IBM Research - Tokyo


----- Original message -----
From: "Doerr, Martin" <[hidden email]<mailto:[hidden email]>>
To: Andrew Haley <[hidden email]<mailto:[hidden email]>>, Michihiro Horie/Japan/IBM@IBMJP
Cc: "Simonis, Volker" <[hidden email]<mailto:[hidden email]>>, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, Hiroshi H Horii/Japan/IBM@IBMJP
Subject: RE: Optimizing byte reverse code for int value
Date: Tue, Apr 11, 2017 10:44 PM

Hi Andrew,

thank you for your helpful comments. I fully agree with you.

In addition, I noticed that we don't have match rules which exploit byte reverse load/store instructions on PPC64.
SPARC already has them:
match(Set dst (ReverseBytesI/L/US/S (LoadI src)));
match(Set dst (StoreI dst (ReverseBytesI/L/US/S src)));
I think we should add them for jdk10. They should be used when the platform endianness doesn't match the bigEndian parameter in Unsafe methods.

Best regards,
Martin


-----Original Message-----
From: hotspot-dev [mailto:[hidden email]] On Behalf Of Andrew Haley
Sent: Dienstag, 11. April 2017 13:02
To: Michihiro Horie <[hidden email]<mailto:[hidden email]>>
Cc: Simonis, Volker <[hidden email]<mailto:[hidden email]>>; [hidden email]<mailto:[hidden email]>; [hidden email]<mailto:[hidden email]>; Hiroshi H Horii <[hidden email]<mailto:[hidden email]>>
Subject: Re: Optimizing byte reverse code for int value

On 11/04/17 11:36, Michihiro Horie wrote:
> Thank you very much for letting us know Unsafe.getIntUnaligned is available in
> JDK9. I do agree we should fix Java source code.
> We think our byte-reverse optimization would still work on jdk8u as Hiroshi
> mentioned. Would you agree on this point?

I do, but I do not agree that this patch should necessarily be done in
the PowerPC-specific back end.  Have you considered it as a generic
optimization for all processors?

Andrew.


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Optimizing byte reverse code for int value

Michihiro Horie
In reply to this post by Andrew Haley


I noticed my silly mistake that previous change does not work in big
endian, although I ran jtreg..
This time I checked also with my micro benchmark ReadFloatTest.java.

Would you review the following newest change?
Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.02/

 (See attached file: ReadFloatTest.java)
Best regards,
--
Michihiro,
IBM Research - Tokyo


 ----- Original message -----
 From: Michihiro Horie/Japan/IBM
 To: [hidden email]
 Cc: [hidden email], Hiroshi H Horii/Japan/IBM@IBMJP,
 [hidden email], [hidden email],
 [hidden email], Gustavo Bueno Romero/Brazil/IBM@IBMBR
 Subject: RE: Optimizing byte reverse code for int value
 Date: Sat, Apr 22, 2017 1:18 AM

 Would you review following change for jdk8?

 Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.01/

 Our byte-reverse optimization now works in shared code. I tested it with
 jtreg on x86, ppc64, and ppc64le.
 Best regards,
 --
 Michihiro,
 IBM Research - Tokyo


 ----- Original message -----
 From: "Doerr, Martin" <[hidden email]>
 To: Michihiro Horie/Japan/IBM@IBMJP
 Cc: "[hidden email]" <[hidden email]>, Hiroshi H Horii/Japan/IBM@IBMJP,
 "[hidden email]" <[hidden email]>,
 "[hidden email]" <[hidden email]>,
 "Simonis, Volker" <[hidden email]>
 Subject: RE: Optimizing byte reverse code for int value
 Date: Wed, Apr 12, 2017 12:13 AM



 Hi Michihiro,





 thanks for the quick reply.





 I think Andrew’s idea is to optimize in the shared code instead of the
 platform backends. I haven’t thought about where this could be done.


 Or would it be possible to backport jdk (especially Unsafe) changes? If
 the required changes are small enough and we don’t have to touch any
 public interface, this might be an option, too.





 We’ll appreciate if you take care of the new match rules for PPC64. Thanks
 a lot.





 Best regards,


 Martin








 From: Michihiro Horie [mailto:[hidden email]]
 Sent: Dienstag, 11. April 2017 16:55
 To: Doerr, Martin <[hidden email]>
 Cc: [hidden email]; Hiroshi H Horii <[hidden email]>;
 [hidden email]; [hidden email]; Simonis,
 Volker <[hidden email]>
 Subject: RE: Optimizing byte reverse code for int value





 Andrew, Martin,





 Thanks a lot for your helpful feedback.





 >Have you considered it as a generic optimization for all processors?


 We would support all processors for our byte-reverse optimization to make
 it generic.


 Currently, I just finished adding match rules for little endian and big
 endian on PPC64, and am testing it in AIX.





 >In addition, I noticed that we don't have match rules which exploit byte
 reverse load/store instructions on PPC64.


 We would like to handle adding match rules for byte reverse load/store
 instructions on PPC64 for JDK10 if you would not mind.


 Would it be fine with you?





 Best regards,
 --
 Michihiro,
 IBM Research - Tokyo






  ----- Original message -----
  From: "Doerr, Martin" <[hidden email]>
  To: Andrew Haley <[hidden email]>, Michihiro Horie/Japan/IBM@IBMJP
  Cc: "Simonis, Volker" <[hidden email]>, "
  [hidden email]" <[hidden email]>, "
  [hidden email]" <[hidden email]>, Hiroshi H
  Horii/Japan/IBM@IBMJP
  Subject: RE: Optimizing byte reverse code for int value
  Date: Tue, Apr 11, 2017 10:44 PM



  Hi Andrew,

  thank you for your helpful comments. I fully agree with you.

  In addition, I noticed that we don't have match rules which exploit byte
  reverse load/store instructions on PPC64.
  SPARC already has them:
  match(Set dst (ReverseBytesI/L/US/S (LoadI src)));
  match(Set dst (StoreI dst (ReverseBytesI/L/US/S src)));
  I think we should add them for jdk10. They should be used when the
  platform endianness doesn't match the bigEndian parameter in Unsafe
  methods.

  Best regards,
  Martin


  -----Original Message-----
  From: hotspot-dev [mailto:[hidden email]] On Behalf
  Of Andrew Haley
  Sent: Dienstag, 11. April 2017 13:02
  To: Michihiro Horie <[hidden email]>
  Cc: Simonis, Volker <[hidden email]>;
  [hidden email]; [hidden email]; Hiroshi
  H Horii <[hidden email]>
  Subject: Re: Optimizing byte reverse code for int value

  On 11/04/17 11:36, Michihiro Horie wrote:
  > Thank you very much for letting us know Unsafe.getIntUnaligned is
  available in
  > JDK9. I do agree we should fix Java source code.
  > We think our byte-reverse optimization would still work on jdk8u as
  Hiroshi
  > mentioned. Would you agree on this point?

  I do, but I do not agree that this patch should necessarily be done in
  the PowerPC-specific back end.  Have you considered it as a generic
  optimization for all processors?

  Andrew.










Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Optimizing byte reverse code for int value

Doerr, Martin
In reply to this post by Andrew Haley
Hi Michihiro,

please note that I’m not a jdk8u reviewer.
However, I have taken a quick look and I have the following concerns:

1.       I think it’s incorrect for Big Endian.

2.       The pattern can also match for an unaligned 4 byte address which would break platforms like SPARC.

3.       I couldn’t see checks for shift amount and masks.

Best regards,
Martin


From: Michihiro Horie [mailto:[hidden email]]
Sent: Freitag, 21. April 2017 18:18
To: Doerr, Martin <[hidden email]>
Cc: [hidden email]; Hiroshi H Horii <[hidden email]>; [hidden email]; [hidden email]; Simonis, Volker <[hidden email]>; Gustavo Bueno Romero <[hidden email]>
Subject: RE: Optimizing byte reverse code for int value

Would you review following change for jdk8?

Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.01/

Our byte-reverse optimization now works in shared code. I tested it with jtreg on x86, ppc64, and ppc64le.
Best regards,
--
Michihiro,
IBM Research - Tokyo


----- Original message -----
From: "Doerr, Martin" <[hidden email]<mailto:[hidden email]>>
To: Michihiro Horie/Japan/IBM@IBMJP
Cc: "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, Hiroshi H Horii/Japan/IBM@IBMJP, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, "Simonis, Volker" <[hidden email]<mailto:[hidden email]>>
Subject: RE: Optimizing byte reverse code for int value
Date: Wed, Apr 12, 2017 12:13 AM


Hi Michihiro,



thanks for the quick reply.



I think Andrew’s idea is to optimize in the shared code instead of the platform backends. I haven’t thought about where this could be done.

Or would it be possible to backport jdk (especially Unsafe) changes? If the required changes are small enough and we don’t have to touch any public interface, this might be an option, too.



We’ll appreciate if you take care of the new match rules for PPC64. Thanks a lot.



Best regards,

Martin





From: Michihiro Horie [mailto:[hidden email]]
Sent: Dienstag, 11. April 2017 16:55
To: Doerr, Martin <[hidden email]<mailto:[hidden email]>>
Cc: [hidden email]<mailto:[hidden email]>; Hiroshi H Horii <[hidden email]<mailto:[hidden email]>>; [hidden email]<mailto:[hidden email]>; [hidden email]<mailto:[hidden email]>; Simonis, Volker <[hidden email]<mailto:[hidden email]>>
Subject: RE: Optimizing byte reverse code for int value



Andrew, Martin,



Thanks a lot for your helpful feedback.



>Have you considered it as a generic optimization for all processors?

We would support all processors for our byte-reverse optimization to make it generic.

Currently, I just finished adding match rules for little endian and big endian on PPC64, and am testing it in AIX.



>In addition, I noticed that we don't have match rules which exploit byte reverse load/store instructions on PPC64.

We would like to handle adding match rules for byte reverse load/store instructions on PPC64 for JDK10 if you would not mind.

Would it be fine with you?



Best regards,
--
Michihiro,
IBM Research - Tokyo





----- Original message -----
From: "Doerr, Martin" <[hidden email]<mailto:[hidden email]>>
To: Andrew Haley <[hidden email]<mailto:[hidden email]>>, Michihiro Horie/Japan/IBM@IBMJP
Cc: "Simonis, Volker" <[hidden email]<mailto:[hidden email]>>, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, Hiroshi H Horii/Japan/IBM@IBMJP
Subject: RE: Optimizing byte reverse code for int value
Date: Tue, Apr 11, 2017 10:44 PM


Hi Andrew,

thank you for your helpful comments. I fully agree with you.

In addition, I noticed that we don't have match rules which exploit byte reverse load/store instructions on PPC64.
SPARC already has them:
match(Set dst (ReverseBytesI/L/US/S (LoadI src)));
match(Set dst (StoreI dst (ReverseBytesI/L/US/S src)));
I think we should add them for jdk10. They should be used when the platform endianness doesn't match the bigEndian parameter in Unsafe methods.

Best regards,
Martin


-----Original Message-----
From: hotspot-dev [mailto:[hidden email]] On Behalf Of Andrew Haley
Sent: Dienstag, 11. April 2017 13:02
To: Michihiro Horie <[hidden email]<mailto:[hidden email]>>
Cc: Simonis, Volker <[hidden email]<mailto:[hidden email]>>; [hidden email]<mailto:[hidden email]>; [hidden email]<mailto:[hidden email]>; Hiroshi H Horii <[hidden email]<mailto:[hidden email]>>
Subject: Re: Optimizing byte reverse code for int value

On 11/04/17 11:36, Michihiro Horie wrote:
> Thank you very much for letting us know Unsafe.getIntUnaligned is available in
> JDK9. I do agree we should fix Java source code.
> We think our byte-reverse optimization would still work on jdk8u as Hiroshi
> mentioned. Would you agree on this point?

I do, but I do not agree that this patch should necessarily be done in
the PowerPC-specific back end.  Have you considered it as a generic
optimization for all processors?

Andrew.






Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimizing byte reverse code for int value

Andrew Haley
On 11/05/17 07:46, Michihiro Horie wrote:

> Thanks a lot for your helpful comments. I fixed my code.
> http://cr.openjdk.java.net/~horii/8178294/webrev.06/
>
>> @Andrew: Do you think this is the right way to do it and is there a chance
> to get it in jdk8u?
> Andrew, I would be grateful if you would approve this change for jdk8u.

The list of jdk8u reviewers is at
http://openjdk.java.net/census#jdk8u.  You'll want someone who is on
the HotSpot team.

I have mixed feelings about this patch.  It seems too specific to me:
if you had something that would work with any integer type it would be
more useful, I feel.  And - generally speaking - the rule is that
patches go into JDK 9 first, but JDK 9 is closed for enhancements.

So, I'm sorry for the bad news.  Your patch looks interesting and
useful but I do not know how to get it committed.

Andrew.




Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Optimizing byte reverse code for int value

White, Derek
Hi Michihiro,

Not a jdk8u reviewer OR C2 expert, but a possible simplification:

I think a tree like:

 //     AndI
 //      /\
// LoadB  ConI(255)

will get turned into a LoadUBNode, via AndINode::Ideal() and AndINode::Identity(). It certainly should, considering how often this code pattern is used!

If so, you should be able to simplify your pattern matching greatly.

 - Derek

-----Original Message-----
From: hotspot-dev [mailto:[hidden email]] On Behalf Of Andrew Haley
Sent: Thursday, May 11, 2017 5:02 AM
To: Michihiro Horie <[hidden email]>; Doerr, Martin <[hidden email]>
Cc: [hidden email]; [hidden email]; Hiroshi H Horii <[hidden email]>; Simonis, Volker <[hidden email]>
Subject: Re: Optimizing byte reverse code for int value

On 11/05/17 07:46, Michihiro Horie wrote:

> Thanks a lot for your helpful comments. I fixed my code.
> http://cr.openjdk.java.net/~horii/8178294/webrev.06/
>
>> @Andrew: Do you think this is the right way to do it and is there a
>> chance
> to get it in jdk8u?
> Andrew, I would be grateful if you would approve this change for jdk8u.

The list of jdk8u reviewers is at
http://openjdk.java.net/census#jdk8u.  You'll want someone who is on the HotSpot team.

I have mixed feelings about this patch.  It seems too specific to me:
if you had something that would work with any integer type it would be more useful, I feel.  And - generally speaking - the rule is that patches go into JDK 9 first, but JDK 9 is closed for enhancements.

So, I'm sorry for the bad news.  Your patch looks interesting and useful but I do not know how to get it committed.

Andrew.




Loading...