Quantcast

RFR: 8179004: Add an efficient implementation of the "count trailing zeros" operation

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RFR: 8179004: Add an efficient implementation of the "count trailing zeros" operation

Kim Barrett
Please review this addition of the count_trailing_zeros function.

Unfortunately, there isn't an obvious direct and portable way to write
such a function. But supported hardware architectures generally
provide an efficient implementation, e.g. a single instruction or a
short sequence.  Compilers often provide "built-in" or "intrinsic"
access to that hardware implementation, or one can use inline
assembler.

If a platform doesn't have such a built-in efficient implementation,
the function can be implemented using de Bruijn sequences as discussed
in the literature.  But all of the OpenJDK-supported platforms provide
an efficient implementation, so we aren't providing such a fallback.

As part of reviewing this change, please feel free to suggest
alternative ways to structure the code. I'm not completely happy with
the way I've done it, but alternatives I've tried seemed worse.

CR:
https://bugs.openjdk.java.net/browse/JDK-8179004

Webrev:
http://cr.openjdk.java.net/~kbarrett/8179004/hotspot.00/

Testing:
Unit test for new function.

Experimented with modifying BitMap::get_next_one_offset and the like
to replace the existing for-loop with a call to this new function, and
measured substantial speedups on all tested platforms for a test
program that counts the bits in a bitmap by iterating calls to that
search function. The speedup varies with the density of set bits, with
very sparse or nearly full being similar to the existing code, since
the bit counting is not the dominant factor in those situations. But
in between give speedups of factors of x1.5 to x5 or more, depending
on the density and the platform.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179004: Add an efficient implementation of the "count trailing zeros" operation

joe darcy
Hi Kim,

What relation does this work have to the
java.lang.Integer.numberOfTrailingZeros​() method which is marked as
@HotSpotIntrinsicCandidate?

Thanks,

-Joe


On 4/19/2017 11:17 PM, Kim Barrett wrote:

> Please review this addition of the count_trailing_zeros function.
>
> Unfortunately, there isn't an obvious direct and portable way to write
> such a function. But supported hardware architectures generally
> provide an efficient implementation, e.g. a single instruction or a
> short sequence.  Compilers often provide "built-in" or "intrinsic"
> access to that hardware implementation, or one can use inline
> assembler.
>
> If a platform doesn't have such a built-in efficient implementation,
> the function can be implemented using de Bruijn sequences as discussed
> in the literature.  But all of the OpenJDK-supported platforms provide
> an efficient implementation, so we aren't providing such a fallback.
>
> As part of reviewing this change, please feel free to suggest
> alternative ways to structure the code. I'm not completely happy with
> the way I've done it, but alternatives I've tried seemed worse.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8179004
>
> Webrev:
> http://cr.openjdk.java.net/~kbarrett/8179004/hotspot.00/
>
> Testing:
> Unit test for new function.
>
> Experimented with modifying BitMap::get_next_one_offset and the like
> to replace the existing for-loop with a call to this new function, and
> measured substantial speedups on all tested platforms for a test
> program that counts the bits in a bitmap by iterating calls to that
> search function. The speedup varies with the density of set bits, with
> very sparse or nearly full being similar to the existing code, since
> the bit counting is not the dominant factor in those situations. But
> in between give speedups of factors of x1.5 to x5 or more, depending
> on the density and the platform.
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179004: Add an efficient implementation of the "count trailing zeros" operation

Kim Barrett
> On Apr 20, 2017, at 2:53 AM, joe darcy <[hidden email]> wrote:
>
> Hi Kim,
>
> What relation does this work have to the java.lang.Integer.numberOfTrailingZeros​() method which is marked as @HotSpotIntrinsicCandidate?

I’m going to guess little or none.  This change is about providing a C++ callable routine for use in within the VM.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179004: Add an efficient implementation of the "count trailing zeros" operation

Chris Plummer
In reply to this post by Kim Barrett
On 4/19/17 11:17 PM, Kim Barrett wrote:

> Please review this addition of the count_trailing_zeros function.
>
> Unfortunately, there isn't an obvious direct and portable way to write
> such a function. But supported hardware architectures generally
> provide an efficient implementation, e.g. a single instruction or a
> short sequence.  Compilers often provide "built-in" or "intrinsic"
> access to that hardware implementation, or one can use inline
> assembler.
>
> If a platform doesn't have such a built-in efficient implementation,
> the function can be implemented using de Bruijn sequences as discussed
> in the literature.  But all of the OpenJDK-supported platforms provide
> an efficient implementation, so we aren't providing such a fallback.
>
> As part of reviewing this change, please feel free to suggest
> alternative ways to structure the code. I'm not completely happy with
> the way I've done it, but alternatives I've tried seemed worse.
Are you talking about the #if/#elseif structure and the //////
delimiters. I'd suggest replacing the ////// with a block comment that
stands out. Something like:

/**************
  * GCC Support
  **************/

Otherwise it looks fine to me, but I'm no expert in this area. The
testing looks satisfactory and the bitmap results you are getting are
reason enough to add this functionality.

Chris

>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8179004
>
> Webrev:
> http://cr.openjdk.java.net/~kbarrett/8179004/hotspot.00/
>
> Testing:
> Unit test for new function.
>
> Experimented with modifying BitMap::get_next_one_offset and the like
> to replace the existing for-loop with a call to this new function, and
> measured substantial speedups on all tested platforms for a test
> program that counts the bits in a bitmap by iterating calls to that
> search function. The speedup varies with the density of set bits, with
> very sparse or nearly full being similar to the existing code, since
> the bit counting is not the dominant factor in those situations. But
> in between give speedups of factors of x1.5 to x5 or more, depending
> on the density and the platform.
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179004: Add an efficient implementation of the "count trailing zeros" operation

Kim Barrett
> On Apr 21, 2017, at 1:54 AM, Chris Plummer <[hidden email]> wrote:
>
> On 4/19/17 11:17 PM, Kim Barrett wrote:
>> Please review this addition of the count_trailing_zeros function.
>>
>> […]
>> As part of reviewing this change, please feel free to suggest
>> alternative ways to structure the code. I'm not completely happy with
>> the way I've done it, but alternatives I've tried seemed worse.
> Are you talking about the #if/#elseif structure and the ////// delimiters. I'd suggest replacing the ////// with a block comment that stands out. Something like:
>
> /**************
> * GCC Support
> **************/

Sure, that makes things stand out a bit more.

Updated webrev (only comment changes in count_trailing_zeros.hpp)
http://cr.openjdk.java.net/~kbarrett/8179004/hotspot.01/

What I'm looking for is whether this TARGET_COMPILER dispatch with the
corresponding code is okay, or whether some other structure would be
preferred.

Some alternatives I considered were:

* Always #include OS_CPU_INCLUDE(count_trailing_zeros). But that
proliforates files which will be duplicates (possibly duplicates
by #include of some shared toolchain-related include).

* Dispatch to #include "utilities/count_trailing_zeros_<tool>.hpp"
similarly to what globalDefinitions.hpp does.  This adds 4 small
files, which seems a little excessive.

* Put these toolchain-specific definitions in the corresponding
globalDefintions_<tool>.hpp. But the #include of those files by
globalDefinitions.hpp happens very early in that file, before things
like the uintx type are defined. Also, presently debug.hpp can't
be #include'd by globalDefinitions.hpp, so the latter does similar
things "by hand". (I think that's something that ought to be fixed.)

> Otherwise it looks fine to me, but I'm no expert in this area. The testing looks satisfactory and the bitmap results you are getting are reason enough to add this functionality.

Thanks.  Do you want to be counted as a reviewer?

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179004: Add an efficient implementation of the "count trailing zeros" operation

Chris Plummer
On 4/21/17 10:41 AM, Kim Barrett wrote:

>> On Apr 21, 2017, at 1:54 AM, Chris Plummer <[hidden email]> wrote:
>>
>> On 4/19/17 11:17 PM, Kim Barrett wrote:
>>> Please review this addition of the count_trailing_zeros function.
>>>
>>> […]
>>> As part of reviewing this change, please feel free to suggest
>>> alternative ways to structure the code. I'm not completely happy with
>>> the way I've done it, but alternatives I've tried seemed worse.
>> Are you talking about the #if/#elseif structure and the ////// delimiters. I'd suggest replacing the ////// with a block comment that stands out. Something like:
>>
>> /**************
>> * GCC Support
>> **************/
> Sure, that makes things stand out a bit more.
>
> Updated webrev (only comment changes in count_trailing_zeros.hpp)
> http://cr.openjdk.java.net/~kbarrett/8179004/hotspot.01/
>
> What I'm looking for is whether this TARGET_COMPILER dispatch with the
> corresponding code is okay, or whether some other structure would be
> preferred.
>
> Some alternatives I considered were:
>
> * Always #include OS_CPU_INCLUDE(count_trailing_zeros). But that
> proliforates files which will be duplicates (possibly duplicates
> by #include of some shared toolchain-related include).
>
> * Dispatch to #include "utilities/count_trailing_zeros_<tool>.hpp"
> similarly to what globalDefinitions.hpp does.  This adds 4 small
> files, which seems a little excessive.
globalDefinitions.hpp seems to be the only other case of using <tool> in
the source. For globalDefinitions.hpp, there is a fair amount of code in
each globalDefinitions_<tool>.hpp header file, so I think in this case
it does help organize the source and adds to readability. In your case
we aren't talking about much code, and it may do more harm than good to
separate the various impls out this way.
>
> * Put these toolchain-specific definitions in the corresponding
> globalDefintions_<tool>.hpp. But the #include of those files by
> globalDefinitions.hpp happens very early in that file, before things
> like the uintx type are defined. Also, presently debug.hpp can't
> be #include'd by globalDefinitions.hpp, so the latter does similar
> things "by hand". (I think that's something that ought to be fixed.)
I also had this thought before reading your comment above. If your code
was all toolchain independent, would you still consider putting it in
globalDefintions.hpp? If not, I don't necessarily think that it makes
sense to put it there just to piggyback on globalDefinitions.hpp the
<tool> logic. I skimmed through globalDefintions.hpp and couldn't get
enough a good feel for whether or not your API belongs there.
>
>> Otherwise it looks fine to me, but I'm no expert in this area. The testing looks satisfactory and the bitmap results you are getting are reason enough to add this functionality.
> Thanks.  Do you want to be counted as a reviewer?
Sure. FYI I'm not a (R)eviewer.

Chris
>


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179004: Add an efficient implementation of the "count trailing zeros" operation

Kim Barrett
> On Apr 21, 2017, at 2:21 PM, Chris Plummer <[hidden email]> wrote:
>
> On 4/21/17 10:41 AM, Kim Barrett wrote:
>> * Put these toolchain-specific definitions in the corresponding
>> globalDefintions_<tool>.hpp. But the #include of those files by
>> globalDefinitions.hpp happens very early in that file, before things
>> like the uintx type are defined. Also, presently debug.hpp can't
>> be #include'd by globalDefinitions.hpp, so the latter does similar
>> things "by hand". (I think that's something that ought to be fixed.)
> I also had this thought before reading your comment above. If your code was all toolchain independent, would you still consider putting it in globalDefintions.hpp? If not, I don't necessarily think that it makes sense to put it there just to piggyback on globalDefinitions.hpp the <tool> logic. I skimmed through globalDefintions.hpp and couldn't get enough a good feel for whether or not your API belongs there.

globalDefinitions.hpp contains a lot of little utility functions, and
this one (and some related ones that I'm not (yet) proposing) fall
into that category.  So I think it fits.  OTOH, I pretty strongly
dislike files like globalDefinitions.hpp.  If it were up to me, that
file would get some significant refactoring, with several new files
being pulled out of it.

But the noted structural issues make this function not fit in there
very nicely at present.

>>> Otherwise it looks fine to me, but I'm no expert in this area. The testing looks satisfactory and the bitmap results you are getting are reason enough to add this functionality.
>> Thanks.  Do you want to be counted as a reviewer?
> Sure. FYI I'm not a (R)eviewer.

OK.  Thanks for the review.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179004: Add an efficient implementation of the "count trailing zeros" operation

Volker Simonis
In reply to this post by Kim Barrett
Hi Kim,

thanks for implementing the AIX part of the change!
I've compiled and tested it and it works fine.
Although we currently have no 32-bit port on AIX I think you could
leave the 32-bit variant in for reference (and to help the brave
developer who starts the 32-bit version of the AIX port :)

Thank you and best regards,
Volker


On Fri, Apr 21, 2017 at 7:41 PM, Kim Barrett <[hidden email]> wrote:

>> On Apr 21, 2017, at 1:54 AM, Chris Plummer <[hidden email]> wrote:
>>
>> On 4/19/17 11:17 PM, Kim Barrett wrote:
>>> Please review this addition of the count_trailing_zeros function.
>>>
>>> […]
>>> As part of reviewing this change, please feel free to suggest
>>> alternative ways to structure the code. I'm not completely happy with
>>> the way I've done it, but alternatives I've tried seemed worse.
>> Are you talking about the #if/#elseif structure and the ////// delimiters. I'd suggest replacing the ////// with a block comment that stands out. Something like:
>>
>> /**************
>> * GCC Support
>> **************/
>
> Sure, that makes things stand out a bit more.
>
> Updated webrev (only comment changes in count_trailing_zeros.hpp)
> http://cr.openjdk.java.net/~kbarrett/8179004/hotspot.01/
>
> What I'm looking for is whether this TARGET_COMPILER dispatch with the
> corresponding code is okay, or whether some other structure would be
> preferred.
>
> Some alternatives I considered were:
>
> * Always #include OS_CPU_INCLUDE(count_trailing_zeros). But that
> proliforates files which will be duplicates (possibly duplicates
> by #include of some shared toolchain-related include).
>
> * Dispatch to #include "utilities/count_trailing_zeros_<tool>.hpp"
> similarly to what globalDefinitions.hpp does.  This adds 4 small
> files, which seems a little excessive.
>
> * Put these toolchain-specific definitions in the corresponding
> globalDefintions_<tool>.hpp. But the #include of those files by
> globalDefinitions.hpp happens very early in that file, before things
> like the uintx type are defined. Also, presently debug.hpp can't
> be #include'd by globalDefinitions.hpp, so the latter does similar
> things "by hand". (I think that's something that ought to be fixed.)
>
>> Otherwise it looks fine to me, but I'm no expert in this area. The testing looks satisfactory and the bitmap results you are getting are reason enough to add this functionality.
>
> Thanks.  Do you want to be counted as a reviewer?
>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179004: Add an efficient implementation of the "count trailing zeros" operation

Kim Barrett
> On Apr 25, 2017, at 8:37 AM, Volker Simonis <[hidden email]> wrote:
>
> Hi Kim,
>
> thanks for implementing the AIX part of the change!
> I've compiled and tested it and it works fine.
> Although we currently have no 32-bit port on AIX I think you could
> leave the 32-bit variant in for reference (and to help the brave
> developer who starts the 32-bit version of the AIX port :)
>
> Thank you and best regards,
> Volker
>

Thanks.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179004: Add an efficient implementation of the "count trailing zeros" operation

Kim Barrett
In reply to this post by Kim Barrett
Still looking for a Reviewer.

> On Apr 20, 2017, at 2:17 AM, Kim Barrett <[hidden email]> wrote:
>
> Please review this addition of the count_trailing_zeros function.
>
> Unfortunately, there isn't an obvious direct and portable way to write
> such a function. But supported hardware architectures generally
> provide an efficient implementation, e.g. a single instruction or a
> short sequence.  Compilers often provide "built-in" or "intrinsic"
> access to that hardware implementation, or one can use inline
> assembler.
>
> If a platform doesn't have such a built-in efficient implementation,
> the function can be implemented using de Bruijn sequences as discussed
> in the literature.  But all of the OpenJDK-supported platforms provide
> an efficient implementation, so we aren't providing such a fallback.
>
> As part of reviewing this change, please feel free to suggest
> alternative ways to structure the code. I'm not completely happy with
> the way I've done it, but alternatives I've tried seemed worse.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8179004
>
> Webrev:
> http://cr.openjdk.java.net/~kbarrett/8179004/hotspot.00/
>
> Testing:
> Unit test for new function.
>
> Experimented with modifying BitMap::get_next_one_offset and the like
> to replace the existing for-loop with a call to this new function, and
> measured substantial speedups on all tested platforms for a test
> program that counts the bits in a bitmap by iterating calls to that
> search function. The speedup varies with the density of set bits, with
> very sparse or nearly full being similar to the existing code, since
> the bit counting is not the dominant factor in those situations. But
> in between give speedups of factors of x1.5 to x5 or more, depending
> on the density and the platform.


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179004: Add an efficient implementation of the "count trailing zeros" operation

David Holmes
On 5/05/2017 3:43 PM, Kim Barrett wrote:
> Still looking for a Reviewer.

Reviewed.

I don't see an obviously better way to deal with the structure.

Thanks,
David

>> On Apr 20, 2017, at 2:17 AM, Kim Barrett <[hidden email]> wrote:
>>
>> Please review this addition of the count_trailing_zeros function.
>>
>> Unfortunately, there isn't an obvious direct and portable way to write
>> such a function. But supported hardware architectures generally
>> provide an efficient implementation, e.g. a single instruction or a
>> short sequence.  Compilers often provide "built-in" or "intrinsic"
>> access to that hardware implementation, or one can use inline
>> assembler.
>>
>> If a platform doesn't have such a built-in efficient implementation,
>> the function can be implemented using de Bruijn sequences as discussed
>> in the literature.  But all of the OpenJDK-supported platforms provide
>> an efficient implementation, so we aren't providing such a fallback.
>>
>> As part of reviewing this change, please feel free to suggest
>> alternative ways to structure the code. I'm not completely happy with
>> the way I've done it, but alternatives I've tried seemed worse.
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8179004
>>
>> Webrev:
>> http://cr.openjdk.java.net/~kbarrett/8179004/hotspot.00/
>>
>> Testing:
>> Unit test for new function.
>>
>> Experimented with modifying BitMap::get_next_one_offset and the like
>> to replace the existing for-loop with a call to this new function, and
>> measured substantial speedups on all tested platforms for a test
>> program that counts the bits in a bitmap by iterating calls to that
>> search function. The speedup varies with the density of set bits, with
>> very sparse or nearly full being similar to the existing code, since
>> the bit counting is not the dominant factor in those situations. But
>> in between give speedups of factors of x1.5 to x5 or more, depending
>> on the density and the platform.
>
>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179004: Add an efficient implementation of the "count trailing zeros" operation

Kim Barrett
> On May 5, 2017, at 4:02 AM, David Holmes <[hidden email]> wrote:
>
> On 5/05/2017 3:43 PM, Kim Barrett wrote:
>> Still looking for a Reviewer.
>
> Reviewed.
>
> I don't see an obviously better way to deal with the structure.

Thanks.

Loading...