Faster Math ?

Laurent Bourgès
Hi,

The Marlin renderer (JEP 265) uses a few Math functions: sqrt, cbrt, acos...

Could you check whether the current JDK uses C2 intrinsics or fdlibm (native /
JNI overhead?) for them, and tell me whether these functions are already highly
optimized in JDK 9 or 10?

Some people have implemented their own fast math, such as the Apache Commons
Math or JaFaMa libraries, which are 10x faster for acos / cbrt.

I wonder if I should implement my own cbrt function (for cubics) in pure Java,
as I do not need the highest accuracy but SPEED.

Would it be possible to have a public JDK FastMath API (much faster but
less accurate)?

Do you know if recent CPUs (Intel?) have dedicated instructions for such
math operations?
Why not use them instead?
Maybe that is part of the new Vectorization API (Panama)?

Cheers,
Laurent Bourges
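
For readers wondering what a "less accurate but faster" pure-Java cbrt could look like, here is a minimal sketch. It is not the JDK's, JaFaMa's or Commons Math's algorithm: the class name FastCbrtSketch, the method fastCbrt and the choice of three Newton steps are all assumptions made for illustration. The idea is to divide the exponent by three with integer arithmetic on the bit pattern, then refine the guess with Newton's iteration; by a rough estimate this leaves a relative error on the order of 1e-10. Whether it is actually faster than Math.cbrt has to be measured on the target JVM.

```java
final class FastCbrtSketch {

    // Two thirds of the double exponent bias (1023 * 2 / 3 = 682), placed in the exponent field.
    private static final long EXP_BIAS_TWO_THIRDS = 682L << 52;

    // Hypothetical helper: approximate cube root, trading accuracy for simplicity.
    public static double fastCbrt(double x) {
        if (x == 0.0 || Double.isNaN(x) || Double.isInfinite(x)) {
            return x; // same special-case results as Math.cbrt
        }
        double a = Math.abs(x);
        double scale = 1.0;
        if (a < Double.MIN_NORMAL) {
            a *= 0x1p54;     // lift subnormals into the normal range (exact scaling)
            scale = 0x1p-18; // cbrt(2^54) = 2^18, undone at the end
        }
        // Initial guess: dividing the bit pattern by 3 roughly divides the exponent by 3.
        long bits = Double.doubleToRawLongBits(a);
        double y = Double.longBitsToDouble(bits / 3 + EXP_BIAS_TWO_THIRDS);
        // Newton's iteration for y^3 = a: y <- (2y + a / y^2) / 3.
        for (int i = 0; i < 3; i++) {
            y = (2.0 * y + a / (y * y)) / 3.0;
        }
        y *= scale;
        return x < 0.0 ? -y : y;
    }

    public static void main(String[] args) {
        double[] samples = {-1.0e12, -8.0, -0.1, 0.7, 27.0, 1.0e12};
        for (double v : samples) {
            double exact = Math.cbrt(v);
            double approx = fastCbrt(v);
            System.out.printf("x=%g  Math.cbrt=%.17g  fastCbrt=%.17g  relErr=%.3g%n",
                    v, exact, approx, Math.abs((approx - exact) / exact));
        }
    }
}
```

Dropping one Newton step makes the sketch cheaper and less accurate; adding one brings it close to full double precision.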

Re: Faster Math ?

Jonas Konrad
Hey,

Most functions in the Math class are intrinsic (
http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/tip/src/share/vm/classfile/vmSymbols.hpp#l664 
) and will use native instructions where available. You can also test
this yourself using jitwatch. There is no native call overhead.

The standard library does not currently include less accurate but faster
Math functions; maybe someone else can say whether that is something to be
considered.

- Jonas Konrad


Re: Faster Math ?

Laurent Bourgès
I checked in the latest jdk master and both cbrt / acos are NOT intrinsics.

However, cbrt(x) = pow(x, 1/3) so it may be optimized...

Could someone tell me how cbrt() is concretely implemented?

In the native fdlibm, there is no e_cbrt.c!

Thanks for your help,
Laurent


Re: Faster Math ?

Laurent Bourgès
Hi,

Here are very basic benchmark results from JaFaMa 2 (FastMathPerf), run
on my laptop (i7-6820HQ set @ 2 GHz + JDK 8):
--- testing asin(double) ---
Loop on     Math.asin(double) took 6.675 s
Loop on FastMath.asin(double) took 0.162 s

--- testing acos(double) ---
Loop on     Math.acos(double) took 6.332 s
Loop on FastMath.acos(double) took 0.16 s

--- testing atan(double) ---
Loop on     Math.atan(double) took 0.766 s
Loop on FastMath.atan(double) took 0.167

--- testing sqrt(double) ---
Loop on     Math.sqrt(double), args in [0.0,10.0], took 0.095 s
Loop on FastMath.sqrt(double), args in [0.0,10.0], took 0.097 s
Loop on     Math.sqrt(double), args in [0.0,1.0E12], took 0.109 s
Loop on FastMath.sqrt(double), args in [0.0,1.0E12], took 0.093 s
Loop on     Math.sqrt(double), args in all magnitudes (>=0), took 0.091 s
Loop on FastMath.sqrt(double), args in all magnitudes (>=0), took 0.092

--- testing cbrt(double) ---
Loop on     Math.cbrt(double), args in [-10.0,10.0], took 1.152 s
Loop on FastMath.cbrt(double), args in [-10.0,10.0], took 0.195 s
Loop on     Math.cbrt(double), args in [-1.0E12,1.0E12], took 1.153 s
Loop on FastMath.cbrt(double), args in [-1.0E12,1.0E12], took 0.193 s
Loop on     Math.cbrt(double), args in all magnitudes, took 1.154 s
Loop on FastMath.cbrt(double), args in all magnitudes, took 0.272

--- testing cbrt(double) = pow(double, 1/3) ---
Loop on     Math.pow(double, 1/3), args in [-10.0,10.0], took 0.739 s
Loop on FastMath.cbrt(double), args in [-10.0,10.0], took 0.166 s
Loop on     Math.pow(double, 1/3), args in [-0.7,0.7], took 0.746 s
Loop on FastMath.cbrt(double), args in [-0.7,0.7], took 0.166 s
Loop on     Math.pow(double, 1/3), args in [-0.1,0.1], took 0.742 s
Loop on FastMath.cbrt(double), args in [-0.1,0.1], took 0.165 s
Loop on     Math.pow(double, 1/3), args in all magnitudes, took 0.753 s
Loop on FastMath.cbrt(double), args in all magnitudes, took 0.244

Conclusion:
- acos / asin / atan functions are quite slow: it confirms these are not
optimized by hotspot intrinsics.

- cbrt() is slower than sqrt() : 1.1s vs 0.1 => 10x slower
- cbrt() is slower than pow(1/3) : 1.1s vs 0.7s => 50% slower

Any plan to enhance these specific math operations ?

Laurent
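
A loop-style measurement like the one above can be approximated with the sketch below. This is not the FastMathPerf code: the class MathTimingSketch, its methods and the round counts are hypothetical, and a plain System.nanoTime loop is only a rough guide (a harness such as JMH is generally more trustworthy). The checksum is returned and accumulated so that the JIT cannot discard the calls being timed.

```java
import java.util.function.DoubleUnaryOperator;

final class MathTimingSketch {

    // Applies f to every argument and returns the sum, so the calls cannot be optimized away.
    public static double checksum(DoubleUnaryOperator f, double[] args) {
        double sum = 0.0;
        for (double a : args) {
            sum += f.applyAsDouble(a);
        }
        return sum;
    }

    // Returns the average nanoseconds per call, measured after untimed warm-up rounds.
    public static double nanosPerCall(DoubleUnaryOperator f, double[] args, int rounds) {
        double sink = 0.0;
        for (int i = 0; i < rounds; i++) { // warm-up, not timed
            sink += checksum(f, args);
        }
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            sink += checksum(f, args);
        }
        long elapsed = System.nanoTime() - start;
        if (Double.isNaN(sink)) { // keeps 'sink' observably used
            System.out.println("NaN encountered");
        }
        return elapsed / ((double) rounds * args.length);
    }

    public static void main(String[] args) {
        double[] xs = new double[100_000];
        for (int i = 0; i < xs.length; i++) {
            xs[i] = -10.0 + 20.0 * i / (xs.length - 1); // args in [-10.0, 10.0]
        }
        System.out.printf("Math.cbrt(x):             %.2f ns/call%n",
                nanosPerCall(Math::cbrt, xs, 50));
        System.out.printf("Math.pow(|x|, 1.0 / 3.0): %.2f ns/call%n",
                nanosPerCall(x -> Math.pow(Math.abs(x), 1.0 / 3.0), xs, 50));
    }
}
```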



Re: Faster Math ?

Andrew Haley
In reply to this post by Laurent Bourgès
On 09/11/17 13:33, Laurent Bourgès wrote:
> I checked in the latest jdk master and both cbrt / acos are NOT intrinsics.
>
> However, cbrt(x) = pow(x, 1/3) so it may be optmized...
>
> Could someone tell me how cbrt() is concretely implemented ?

It's in FdLibm.java.  It's not great, but it's better than it used to be
now that it's not a native call.  I'm seeing that it's twice as fast as
the previous native implementation.

> In native libfdm, there is no e_cbrt.c !

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Re: Faster Math ?

Andrew Haley
In reply to this post by Laurent Bourgès
On 09/11/17 15:02, Laurent Bourgès wrote:

> --- testing cbrt(double) = pow(double, 1/3) ---
> Loop on     Math.pow(double, 1/3), args in [-10.0,10.0], took 0.739 s
> Loop on FastMath.cbrt(double), args in [-10.0,10.0], took 0.166 s
> Loop on     Math.pow(double, 1/3), args in [-0.7,0.7], took 0.746 s
> Loop on FastMath.cbrt(double), args in [-0.7,0.7], took 0.166 s
> Loop on     Math.pow(double, 1/3), args in [-0.1,0.1], took 0.742 s
> Loop on FastMath.cbrt(double), args in [-0.1,0.1], took 0.165 s
> Loop on     Math.pow(double, 1/3), args in all magnitudes, took 0.753 s
> Loop on FastMath.cbrt(double), args in all magnitudes, took 0.244
>
> Conclusion:
> - acos / asin / atan functions are quite slow: it confirms these are not
> optimized by hotspot intrinsics.
>
> - cbrt() is slower than sqrt() : 1.1s vs 0.1 => 10x slower
> - cbrt() is slower than pow(1/3) : 1.1s vs 0.7s => 50% slower

No.  cbrt() is faster than pow(1/3) : 0.24 vs 0.75

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Re: Faster Math ?

Paul Sandoz
In reply to this post by Laurent Bourgès
Hi Laurent,

A Java method is a candidate for intrinsification if it is annotated with @HotSpotIntrinsicCandidate. When running Java code you can also use the HotSpot flags -XX:+PrintCompilation -XX:+PrintInlining to show methods that are intrinsic (JITWatch, as mentioned, is also excellent in this regard).
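
The flag-based check can be tried with a small driver such as the sketch below. The class name IntrinsicProbe and the iteration count are made up for illustration. Note that -XX:+PrintInlining is a diagnostic option, so it normally has to be unlocked with -XX:+UnlockDiagnosticVMOptions. Intrinsified call sites are typically tagged "(intrinsic)" in the output, though the exact format differs between JDK builds.

```java
// Hypothetical probe class. Run it with, for example:
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining IntrinsicProbe
final class IntrinsicProbe {

    // Calls the three functions discussed in the thread in a hot loop so the JIT compiles them.
    public static double exercise(int iterations) {
        double sum = 0.0;
        for (int i = 1; i <= iterations; i++) {
            double x = i / (double) iterations; // in (0, 1], a valid argument for acos
            sum += Math.sqrt(x) + Math.cbrt(x) + Math.acos(x);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Enough iterations to make the loop hot; printing the sum keeps the work alive.
        System.out.println(exercise(5_000_000));
    }
}
```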

I recommend cloning OpenJDK and browsing the source.

Some of the math functions are intrinsic in the interpreter and all the runtime compilers to ensure consistent results across interpretation and compilation.

Work was done by Intel to improve many of the math functions. See:

  Update for x86 sin and cos in the math lib
  https://bugs.openjdk.java.net/browse/JDK-8143353

  Update for x86 pow in the math lib
  https://bugs.openjdk.java.net/browse/JDK-8145688
 
  (From these you can track related issues.)

Other Math functions, like cbrt (non-native) and acos (native), are not intrinsic. There is ongoing work to turn native implementations into Java implementations (I don’t know if there would be any follow-up on intrinsification).

  https://bugs.openjdk.java.net/browse/JDK-8134780
  https://bugs.openjdk.java.net/browse/JDK-8171407

Joe knows more.



As part of the Vector API effort we will likely need to investigate the support for less accurate but faster math functions. It’s too early to tell if something like a FastMath class will pop out of that, but FWIW I am sympathetic to that :-)

I liked this tweet:

https://twitter.com/FioraAeterna/status/926150700836405248

  life as a gpu compiler dev is basically just fielding repeated complaints that
  "fast math" isn't precise and "precise math" isn't fast

as an indication of what we could be getting into :-)

Paul.


Re: Faster Math ?

Laurent Bourgès
In reply to this post by Andrew Haley
Thanks, Andrew.

I searched the web and I understand now:
the native fdlibm library has been ported to Java code for JDK 9 (like the
JaFaMa library).

Cbrt changeset:
http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/7dc9726cfa82

I will compare JDK 9 with the latest JaFaMa 2.2 anyway, to have an up-to-date
comparison.

Does anybody else need a faster but less accurate math library?
JaFaMa has such alternative methods...

Preserving double precision may be very costly in terms of performance, and
sometimes such accuracy is not required.

Last question:
Is there a sin & cos function returning both values for the same angle?
It is very useful for computing exact Fourier transforms...
It is called sinAndCos(double wrappers) in JaFaMa.

Cheers,
Laurent


Re: Faster Math ?

Laurent Bourgès
In reply to this post by Paul Sandoz
Hi Paul,

Thank you so much for this complete summary!

I will still perform some benchmarks and could port the native acos code to
Java, as it is used by Marlin.

Anyway, I will backport the Cbrt Java code into Marlin @ GitHub for JDK 8
users (GPL v2).

Thanks,
Laurent


Re: Faster Math ?

joe darcy
In reply to this post by Paul Sandoz
Hello,

A few comments on this thread:

As Paul noted, a portion of fdlibm has been ported from C to Java. I do
intend to finish the port at some point. The port gives an
implementation speedup by avoiding Java -> C -> Java transition
overheads. However, the same algorithms are being used of course.

The fdlibm code was first written several decades ago and there has been
work in the interim on developing other algorithms for math libraries.
One significant effort has focused on correctly rounded libraries, that
is, libraries that have full floating-point accuracy. In particular
Jean-Michel Muller and his students and collaborators have worked in
this area and produce the crlibm package. If a specification for a
StrictMath-style class were newly written today, I would recommend it be
specified to be correctly rounded. Correct rounding is conceptually the
"best" answer and it does not require the exact implementation
algorithms to be specified to achieve reproducibility, unlike fdlibm.

However, the extra precise answer can come at the cost of extra time or
space for the computation in some cases.

The notion of a "FastMath" library has been considered before (as well
as the faster underlying numerics [1]). As also discussed earlier in the
thread, specifying what degree of inaccuracy is acceptable for what
speed is non-obvious. (And offhand I don't know the error bounds of the
other implementations being discussed.)

Working with Intel in OpenJDK, we are using optimized math library
implementations for x64 for many interesting methods. For most math
library methods, the trend has been to move to software-based
implementations rather than having specialized hardware instructions.
(Functionality like reciprocal square root is a counter-example, but we
don't have that method in the Java math library.)

Note that since 1/3 is a repeating fraction in binary and decimal,
pow(x, 1.0/3.0) is only approximately equivalent to cbrt(x).
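
A small sketch of why the two are not interchangeable (the class PowVersusCbrt and the helper cbrtViaPow are hypothetical names, used only for this illustration). Besides the rounding of 1.0/3.0, pow returns NaN for a negative base with a non-integer exponent, whereas cbrt is defined for negative arguments.

```java
import java.math.BigDecimal;

final class PowVersusCbrt {

    // Hypothetical helper: the "cube root via pow" formulation discussed in the thread.
    public static double cbrtViaPow(double x) {
        return Math.pow(x, 1.0 / 3.0);
    }

    public static void main(String[] args) {
        // Exact decimal expansion of the double nearest to 1/3; it is not exactly 1/3.
        System.out.println(new BigDecimal(1.0 / 3.0));

        // cbrt accepts negative arguments...
        System.out.println(Math.cbrt(-8.0));
        // ...whereas pow with a negative base and a non-integer exponent yields NaN.
        System.out.println(cbrtViaPow(-8.0));

        // Pitfall when writing the exponent as 1 / 3 in Java source:
        // with int literals this is integer division, so the exponent is 0.
        System.out.println(Math.pow(8.0, 1 / 3)); // prints 1.0
    }
}
```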

Knowing which particular methods would be of interest for fast-but-loose
math would be helpful. The sqrt method has long been intrinsified to the
corresponding hardware instruction on many platforms so I don't think
that would be a useful candidate in most circumstances.

In short, we might get a selection of looser but faster math methods at
some point, but not immediately and not without more investigation.

Cheers,

-Joe

[1] Forward looking statements during "Forward to the Past: The Case for
Uniformly Strict Floating Point Arithmetic on the JVM"
https://youtu.be/qTKeU_3rhk4?t=2513



Re: Faster Math ?

Andrew Haley
In reply to this post by Laurent Bourgès
On 09/11/17 09:00, Laurent Bourgès wrote:

> The Marlin renderer (JEP265) uses few Math functions: sqrt, cbrt, acos...
>
> Could you check if the current JDK uses C2 intrinsics or libfdm (native /
> JNI overhead?) and tell me if such functions are already highly optimized
> in jdk9 or 10 ?
>
> Some people have implemented their own fast Math like Apache Commons Math
> or JaFaMa libraries that are 10x faster for acos / cbrt.

I'm not seeing that with Apache Commons Math.  I'm seeing this:

Benchmark                   Mode  Cnt   Score   Error  Units
MathBenchmark.fastMathCbrt  avgt    5  33.199 ± 0.122  ns/op
MathBenchmark.mathCbrt      avgt    5  43.124 ± 0.162  ns/op
MathBenchmark.fastMathAcos  avgt    5  85.985 ± 4.586  ns/op
MathBenchmark.mathAcos      avgt    5  28.326 ± 0.044  ns/op

It's nice, but it certainly isn't 10x.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Re: Faster Math ?

Laurent Bourgès
In reply to this post by joe darcy
Hello,

Some context first:
- The Marlin renderer is now the default JDK & JFX renderer. Please consider
improving the performance of the following 2 Math functions: cbrt, acos.
- I work in public research in astrophysics, making software for
astronomy as Java desktop apps (javaws + scientific computations); see
http://jmmc.fr . It is hard to promote Java in science, as both the Python &
Julia languages are widespread.

Please consider any change making Java more competitive for science:
- faster math, more math functions (matrix & vector API), FFT, GPU
computing... the Panama project is very important to me
- struct (value type) & Friend interface (fast native lib reuse) are
promising features... Valhalla?

Now I answer below:

Thanks for your feedback



> As Paul noted, a portion of fdlibm has been ported from C to Java. I do
> intend to finish the port at some point. The port gives an implementation
> speedup by avoiding Java -> C -> Java transition overheads. However, the
> same algorithms are being used of course.
>
> The fdlibm code was first written several decades ago and there has been
> work in the interim on developing other algorithms for math libraries. One
> significant effort has focused on correctly rounded libraries, that is,
> libraries that have full floating-point accuracy. In particular Jean-Michel
> Muller and his students and collaborators have worked in this area and
> produce the crlibm package. If a specification for a StrictMath-style class
> were newly written today, I would recommend it be specified to be correctly
> rounded. Correct rounding is conceptually the "best" answer and it does not
> require the exact implementation algorithms to be specified to achieve
> reproducibility, unlike fdlibm.


Accuracy is important, but at what cost? Java has 2 Math
implementations, Math & StrictMath, plus the strictfp keyword.

So the Math class is the JDK's fast math... compared to StrictMath.
Maybe it could give less accurate results: off in the last 1 or 2 digits...
maybe by 10 or 100 ulps?
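
To make "how many ulps" concrete, the sketch below measures the distance between two results in units of ulp using Math.ulp. The class UlpErrorSketch and the method ulpDistance are hypothetical names, and using StrictMath.cbrt as the reference is an assumption: it is itself only specified to within 1 ulp of the exact result, so the printed figure is an indication and not a rigorous error bound.

```java
final class UlpErrorSketch {

    // Distance between a computed value and a reference, in units of ulp(reference).
    public static double ulpDistance(double computed, double reference) {
        return Math.abs(computed - reference) / Math.ulp(reference);
    }

    public static void main(String[] args) {
        double worst = 0.0;
        for (int i = 1; i <= 1_000_000; i++) {
            double x = i * 1.0e-3;
            double viaPow = Math.pow(x, 1.0 / 3.0);
            worst = Math.max(worst, ulpDistance(viaPow, StrictMath.cbrt(x)));
        }
        System.out.println("max observed distance, pow(x, 1.0/3.0) vs StrictMath.cbrt(x): "
                + worst + " ulps");
    }
}
```

The same helper can be pointed at any candidate fast implementation to see how far it strays from the reference.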


> However, the extra precise answer can come at the cost of extra time or
> space for the computation in some cases.
>
> The notion of a "FastMath" library has been considered before (as well as
> the faster underlying numerics [1]). As also discussed earlier in the
> thread, specifying what degrees of inaccuracy is acceptable for what speed
> is non-obvious. (And offhand I don't know the error bounds of the other
> implementations being discussed.)


Please look at JaFaMa @ GitHub, whose FastMath gives correct results at 1e-15
precision and is very fast.

I will give you my benchmark results on JDK 9...


> Working with Intel in OpenJDK, we are using optimized math library
> implementations for x64 for many interesting methods. For most math library
> methods, the trend has been to move to software-based implementations
> rather than having specialized hardware instructions. (Functionality like
> reciprocal square root is a counter-example, but we don't have that method
> in the Java math library.)


Please port all the math functions to Java first, delete the native fdlibm
code, and later make intrinsics for the most used methods (any math used
within the JDK or JFX...).
Who could help?


> Note that since 1/3 is a repeating fraction in binary and decimal, pow(x,
> 1.0/3.0) is only approximately equivalent to cbrt(x).
>
> Knowing which particular methods would be of interest for fast-but-loose
> math would be helpful. The sqrt method has long been intrinsified to the
> corresponding hardware instruction on many platforms so I don't think that
> would be a useful candidate in most circumstances.


Yes, but there is no CBRT intrinsic! It is important for our cubic curve solver.
ACOS / ASIN are still slow.
I could make the port... in Java.


> In short, we might get a selection of looser but faster math methods at
> some point, but not immediately and not without more investigation.


Of course.

Cheers,
Laurent


> [1] Forward looking statements during "Forward to the Past: The Case for
> Uniformly Strict Floating Point Arithmetic on the JVM"
> https://youtu.be/qTKeU_3rhk4?t=2513


