Hi,
The Marlin renderer (JEP 265) uses a few Math functions: sqrt, cbrt, acos...

Could you check whether the current JDK uses C2 intrinsics or fdlibm (native / JNI overhead?) for them, and tell me if such functions are already highly optimized in JDK 9 or 10?

Some people have implemented their own fast math, like the Apache Commons Math or JaFaMa libraries, which are 10x faster for acos / cbrt.

I wonder if I should implement my own cbrt function (cubics) in pure Java, as I do not need the highest accuracy but SPEED.

Would it be possible to have a public JDK FastMath API (much faster but less accurate)?

Do you know if recent CPUs (Intel?) have dedicated instructions for such math operations? Why not use them instead? Maybe that's part of the new vectorization API (Panama)?

Cheers,
Laurent Bourges
Hey,
Most functions in the Math class are intrinsic (http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/tip/src/share/vm/classfile/vmSymbols.hpp#l664) and will use native instructions where available. You can also test this yourself using JITWatch. There is no native call overhead.

The standard library does not currently include less accurate but faster Math functions; maybe someone else can answer whether that is something to be considered.

- Jonas Konrad
I checked in the latest jdk master and both cbrt / acos are NOT intrinsics.
However, cbrt(x) = pow(x, 1/3), so it may be optimized...

Could someone tell me how cbrt() is concretely implemented?

In the native fdlibm, there is no e_cbrt.c!

Thanks for your help,
Laurent
Hi,
Here are very basic benchmark results (from JaFaMa 2, FastMathPerf) made on my laptop (i7-6820HQ set @ 2 GHz + JDK 8):

--- testing asin(double) ---
Loop on Math.asin(double) took 6.675 s
Loop on FastMath.asin(double) took 0.162 s

--- testing acos(double) ---
Loop on Math.acos(double) took 6.332 s
Loop on FastMath.acos(double) took 0.16 s

--- testing atan(double) ---
Loop on Math.atan(double) took 0.766 s
Loop on FastMath.atan(double) took 0.167 s

--- testing sqrt(double) ---
Loop on Math.sqrt(double), args in [0.0,10.0], took 0.095 s
Loop on FastMath.sqrt(double), args in [0.0,10.0], took 0.097 s
Loop on Math.sqrt(double), args in [0.0,1.0E12], took 0.109 s
Loop on FastMath.sqrt(double), args in [0.0,1.0E12], took 0.093 s
Loop on Math.sqrt(double), args in all magnitudes (>=0), took 0.091 s
Loop on FastMath.sqrt(double), args in all magnitudes (>=0), took 0.092 s

--- testing cbrt(double) ---
Loop on Math.cbrt(double), args in [-10.0,10.0], took 1.152 s
Loop on FastMath.cbrt(double), args in [-10.0,10.0], took 0.195 s
Loop on Math.cbrt(double), args in [-1.0E12,1.0E12], took 1.153 s
Loop on FastMath.cbrt(double), args in [-1.0E12,1.0E12], took 0.193 s
Loop on Math.cbrt(double), args in all magnitudes, took 1.154 s
Loop on FastMath.cbrt(double), args in all magnitudes, took 0.272 s

--- testing cbrt(double) = pow(double, 1/3) ---
Loop on Math.pow(double, 1/3), args in [-10.0,10.0], took 0.739 s
Loop on FastMath.cbrt(double), args in [-10.0,10.0], took 0.166 s
Loop on Math.pow(double, 1/3), args in [-0.7,0.7], took 0.746 s
Loop on FastMath.cbrt(double), args in [-0.7,0.7], took 0.166 s
Loop on Math.pow(double, 1/3), args in [-0.1,0.1], took 0.742 s
Loop on FastMath.cbrt(double), args in [-0.1,0.1], took 0.165 s
Loop on Math.pow(double, 1/3), args in all magnitudes, took 0.753 s
Loop on FastMath.cbrt(double), args in all magnitudes, took 0.244 s

Conclusion:
- acos / asin / atan functions are quite slow: this confirms they are not optimized by HotSpot intrinsics.
- cbrt() is slower than sqrt(): 1.1 s vs 0.1 s => 10x slower
- cbrt() is slower than pow(1/3): 1.1 s vs 0.7 s => 50% slower

Any plan to enhance these specific math operations?

Laurent
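For readers who want to reproduce "Loop on ... took N s" style numbers, here is a minimal sketch of such a timing loop. It is not the FastMathPerf code from JaFaMa; the class name MathLoopTimer and its methods are hypothetical and only illustrate the approach. A hand-rolled loop like this is easy to get wrong (warm-up, dead-code elimination, shared call sites), so a harness such as JMH is the safer choice for numbers you intend to publish.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper, not part of JaFaMa or the JDK.
public class MathLoopTimer {
    /** Written by every timed loop so the JIT cannot discard the computation. */
    static volatile double sink;

    /** Returns n pseudo-random arguments drawn from the range min to max. */
    public static double[] randomArgs(int n, double min, double max, long seed) {
        Random rnd = new Random(seed);
        double[] args = new double[n];
        for (int i = 0; i < n; i++) {
            args[i] = min + (max - min) * rnd.nextDouble();
        }
        return args;
    }

    /** Applies f to every argument, 'rounds' times, and returns the elapsed nanoseconds. */
    public static long timeNanos(DoubleUnaryOperator f, double[] args, int rounds) {
        double acc = 0.0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (double a : args) {
                acc += f.applyAsDouble(a);
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = acc; // publish the result so the loop has an observable effect
        return elapsed;
    }

    public static void main(String[] args) {
        double[] xs = randomArgs(1_000_000, -10.0, 10.0, 42L);
        // First pass is a warm-up so that the second pass mostly runs compiled code.
        timeNanos(Math::cbrt, xs, 20);
        long nanos = timeNanos(Math::cbrt, xs, 20);
        System.out.printf("Loop on Math.cbrt(double), args in [-10.0,10.0], took %.3f s%n",
                nanos / 1e9);
    }
}
```

Note that passing several different functions through the same DoubleUnaryOperator call site can change how the JIT inlines them, which is one more reason to treat the output as indicative only.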
In reply to this post by Laurent Bourgès
On 09/11/17 13:33, Laurent Bourgès wrote:
> I checked in the latest jdk master and both cbrt / acos are NOT intrinsics.
>
> However, cbrt(x) = pow(x, 1/3) so it may be optimized...
>
> Could someone tell me how cbrt() is concretely implemented ?

It's in FdLibm.java. It's not great, but it's better than it used to be now that it's not a native call. I'm seeing that it's twice as fast as the previous native implementation.

> In native fdlibm, there is no e_cbrt.c !

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
In reply to this post by Laurent Bourgès
On 09/11/17 15:02, Laurent Bourgès wrote:
> --- testing cbrt(double) = pow(double, 1/3) ---
> Loop on Math.pow(double, 1/3), args in [-10.0,10.0], took 0.739 s
> Loop on FastMath.cbrt(double), args in [-10.0,10.0], took 0.166 s
> Loop on Math.pow(double, 1/3), args in [-0.7,0.7], took 0.746 s
> Loop on FastMath.cbrt(double), args in [-0.7,0.7], took 0.166 s
> Loop on Math.pow(double, 1/3), args in [-0.1,0.1], took 0.742 s
> Loop on FastMath.cbrt(double), args in [-0.1,0.1], took 0.165 s
> Loop on Math.pow(double, 1/3), args in all magnitudes, took 0.753 s
> Loop on FastMath.cbrt(double), args in all magnitudes, took 0.244
>
> Conclusion:
> - acos / asin / atan functions are quite slow: it confirms these are not
> optimized by hotspot intrinsics.
>
> - cbrt() is slower than sqrt() : 1.1s vs 0.1 => 10x slower
> - cbrt() is slower than pow(1/3) : 1.1s vs 0.7s => 50% slower

No. cbrt() is faster than pow(1/3) : 0.24 vs 0.75

--
Andrew Haley
In reply to this post by Laurent Bourgès
Hi Laurent,
A Java method is a candidate for intrinsification if it is annotated with @HotSpotIntrinsicCandidate. When running Java code you can also use the HotSpot flags -XX:+PrintCompilation -XX:+PrintInlining to show methods that are intrinsic (JITWatch, as mentioned, is also excellent in this regard).

I recommend cloning OpenJDK and browsing the source.

Some of the math functions are intrinsic in the interpreter and in all the runtime compilers to ensure consistent results across interpretation and compilation.

Work was done by Intel to improve many of the math functions. See:

Update for x86 sin and cos in the math lib
https://bugs.openjdk.java.net/browse/JDK-8143353

Update for x86 pow in the math lib
https://bugs.openjdk.java.net/browse/JDK-8145688

(From these you can track related issues.)

Other Math functions, like cbrt (non-native) and acos (native), are not intrinsic. There is ongoing work to turn native implementations into Java implementations (I don’t know if there would be any follow-up on intrinsification).

https://bugs.openjdk.java.net/browse/JDK-8134780
https://bugs.openjdk.java.net/browse/JDK-8171407

Joe knows more.

As part of the Vector API effort we will likely need to investigate the support for less accurate but faster math functions. It’s too early to tell if something like a FastMath class will pop out of that, but FWIW I am sympathetic to that :-)

I liked this tweet:

https://twitter.com/FioraAeterna/status/926150700836405248

  life as a gpu compiler dev is basically just fielding repeated complaints that "fast math" isn't precise and "precise math" isn't fast

as an indication of what we could be getting into :-)

Paul.
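To try the flag-based check described above, a small driver with a hot loop is enough. The following is a sketch only; the class name IntrinsicProbe and the iteration counts are assumptions picked so that the loop gets compiled. On the HotSpot builds I am aware of, PrintInlining is a diagnostic option and therefore also needs -XX:+UnlockDiagnosticVMOptions; intrinsified calls are then typically marked "(intrinsic)" in the inlining output, while other calls are not.

```java
// Hypothetical probe class. Run it with, for example:
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining IntrinsicProbe
// and look for the Math::sqrt and Math::cbrt entries in the inlining output.
public class IntrinsicProbe {
    /** Sums sqrt(i) + cbrt(i) for i in 1..n; hot enough to be JIT-compiled when n is large. */
    public static double sumOfRoots(int n) {
        double acc = 0.0;
        for (int i = 1; i <= n; i++) {
            acc += Math.sqrt(i) + Math.cbrt(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        double sink = 0.0;
        // Repeat the call so that both the loop and the method itself are compiled.
        for (int round = 0; round < 20; round++) {
            sink += sumOfRoots(1_000_000);
        }
        // Print the result so the computation cannot be optimized away.
        System.out.println(sink);
    }
}
```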
In reply to this post by Andrew Haley
Thanks, Andrew.

I searched on the web and I understand now: the fdlibm native library has been ported to Java code for JDK 9 (like the JaFaMa library).

Cbrt changeset: http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/7dc9726cfa82

I will compare JDK 9 with the latest JaFaMa 2.2 anyway, to have an up-to-date comparison.

Does somebody else need a faster but less accurate math library? JaFaMa has such alternative methods...

Preserving double precision may be very costly in terms of performance, and sometimes such accuracy is not required.

Last question: is there a sin & cos function returning both values for the same angle? It is very useful to compute exact Fourier transforms... It is called sinAndCos(double wrappers) in JaFaMa.

Cheers,
Laurent
In reply to this post by Paul Sandoz
Hi Paul,
Thank you so much for this complete summary!

I will still perform some benchmarks and could port the acos native code to Java code, as it is used by Marlin.

Anyway, I will backport the cbrt Java code into Marlin @ GitHub for JDK 8 users (GPL v2).

Thanks,
Laurent
In reply to this post by Paul Sandoz
Hello,
A few comments on this thread:

As Paul noted, a portion of fdlibm has been ported from C to Java. I do intend to finish the port at some point. The port gives an implementation speedup by avoiding Java -> C -> Java transition overheads. However, the same algorithms are being used of course.

The fdlibm code was first written several decades ago and there has been work in the interim on developing other algorithms for math libraries. One significant effort has focused on correctly rounded libraries, that is, libraries that have full floating-point accuracy. In particular Jean-Michel Muller and his students and collaborators have worked in this area and produced the crlibm package. If a specification for a StrictMath-style class were newly written today, I would recommend it be specified to be correctly rounded. Correct rounding is conceptually the "best" answer and it does not require the exact implementation algorithms to be specified to achieve reproducibility, unlike fdlibm.

However, the extra precise answer can come at the cost of extra time or space for the computation in some cases.

The notion of a "FastMath" library has been considered before (as well as the faster underlying numerics [1]). As also discussed earlier in the thread, specifying what degree of inaccuracy is acceptable for what speed is non-obvious. (And offhand I don't know the error bounds of the other implementations being discussed.)

Working with Intel in OpenJDK, we are using optimized math library implementations for x64 for many interesting methods. For most math library methods, the trend has been to move to software-based implementations rather than having specialized hardware instructions. (Functionality like reciprocal square root is a counter-example, but we don't have that method in the Java math library.)

Note that since 1/3 is a repeating fraction in binary and decimal, pow(x, 1.0/3.0) is only approximately equivalent to cbrt(x).

Knowing which particular methods would be of interest for fast-but-loose math would be helpful. The sqrt method has long been intrinsified to the corresponding hardware instruction on many platforms so I don't think that would be a useful candidate in most circumstances.

In short, we might get a selection of looser but faster math methods at some point, but not immediately and not without more investigation.

Cheers,

-Joe

[1] Forward looking statements during "Forward to the Past: The Case for Uniformly Strict Floating Point Arithmetic on the JVM"
https://youtu.be/qTKeU_3rhk4?t=2513
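The remark that pow(x, 1.0/3.0) is only approximately cbrt(x) can be checked directly. The sketch below uses a hypothetical class name, CbrtVersusPow, and also shows two further differences that follow from the documented behaviour of Math.pow: a negative base with a non-integer exponent yields NaN, and writing the exponent as the integer expression 1/3 silently evaluates to 0.

```java
// Hypothetical demo class comparing Math.cbrt with a pow-based cube root.
public class CbrtVersusPow {
    /** Cube root via pow: only approximately cbrt(x), and NaN for negative x. */
    public static double cbrtViaPow(double x) {
        // 1.0 / 3.0 is the nearest double to one third, not exactly one third.
        return Math.pow(x, 1.0 / 3.0);
    }

    public static void main(String[] args) {
        double[] samples = {27.0, 1000.0, 0.001, -8.0};
        for (double x : samples) {
            System.out.println("x=" + x
                    + "  cbrt=" + Math.cbrt(x)
                    + "  pow(x, 1.0/3.0)=" + cbrtViaPow(x));
        }
        // Pitfall: 1/3 is integer division, so the exponent is 0 and the result is 1.0.
        System.out.println("pow(27.0, 1/3)=" + Math.pow(27.0, 1 / 3));
    }
}
```

For a cubic solver, where negative arguments are routine, the NaN case alone is a reason to prefer a real cbrt over the pow form.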
In reply to this post by Laurent Bourgès
On 09/11/17 09:00, Laurent Bourgès wrote:
> The Marlin renderer (JEP265) uses few Math functions: sqrt, cbrt, acos...
>
> Could you check if the current JDK uses C2 intrinsics or fdlibm (native /
> JNI overhead?) and tell me if such functions are already highly optimized
> in jdk9 or 10 ?
>
> Some people have implemented their own fast Math like Apache Commons Math
> or JaFaMa libraries that are 10x faster for acos / cbrt.

I'm not seeing that with Apache Commons Math. I'm seeing this:

Benchmark                   Mode  Cnt   Score   Error  Units
MathBenchmark.fastMathCbrt  avgt    5  33.199 ± 0.122  ns/op
MathBenchmark.mathCbrt      avgt    5  43.124 ± 0.162  ns/op
MathBenchmark.fastMathAcos  avgt    5  85.985 ± 4.586  ns/op
MathBenchmark.mathAcos      avgt    5  28.326 ± 0.044  ns/op

It's nice, but it certainly isn't 10x.

--
Andrew Haley
In reply to this post by Joseph D. Darcy
Hello,
Some context first:
- The Marlin renderer is now the default JDK & JFX renderer. Please consider improving the performance of the following 2 Math functions: cbrt, acos.
- I work for public research in astrophysics, making software for astronomy as Java desktop apps (javaws + scientific computations); see http://jmmc.fr. It is hard to promote Java in science as both the Python & Julia languages are widespread.

Please consider any change making Java more competitive for science:
- faster math, more math functions (matrix & vector API), FFT, GPU computing... the Panama project is very important to me
- struct (value type) & Friend interface (fast native lib reuse) are promising features... Valhalla?

Now I answer below:

Thanks for your feedback.

> As Paul noted, a portion of fdlibm has been ported from C to Java. I do
> intend to finish the port at some point. The port gives an implementation
> speedup by avoiding Java -> C -> Java transition overheads. However, the
> same algorithms are being used of course.
>
> The fdlibm code was first written several decades ago and there has been
> work in the interim on developing other algorithms for math libraries. One
> significant effort has focused on correctly rounded libraries, that is,
> libraries that have full floating-point accuracy. In particular Jean-Michel
> Muller and his students and collaborators have worked in this area and
> produce the crlibm package. If a specification for a StrictMath-style class
> were newly written today, I would recommend it be specified to be correctly
> rounded. Correct rounding is conceptually the "best" answer and it does not
> require the exact implementation algorithms to be specified to achieve
> reproducibility, unlike fdlibm.

Accuracy is important, but what is the cost? Java has 2 math implementations, Math & StrictMath... but also the strictfp keyword. So the Math class is the JDK fast math... compared to StrictMath. Maybe it could give less accurate results: 1 or 2 last digits... maybe 10 or 100 ulps?

> However, the extra precise answer can come at the cost of extra time or
> space for the computation in some cases.
>
> The notion of a "FastMath" library has been considered before (as well as
> the faster underlying numerics [1]). As also discussed earlier in the
> thread, specifying what degrees of inaccuracy is acceptable for what speed
> is non-obvious. (And offhand I don't know the error bounds of the other
> implementations being discussed.)

Please look at JaFaMa @ GitHub, whose FastMath gives correct results at 1e-15 precision and is very fast. I will give you my benchmark results on JDK 9...

> Working with Intel in OpenJDK, we are using optimized math library
> implementations for x64 for many interesting methods. For most math library
> methods, the trend has been to move to software-based implementations
> rather than having specialized hardware instructions. (Functionality like
> reciprocal square root is a counter-example, but we don't have that method
> in the Java math library.)

Please port all the math to Java first, delete the fdlibm native code, and later make intrinsics for the most used methods (any math used within the JDK or JFX...). Who could help?

> Note that since 1/3 is a repeating fraction in binary and decimal,
> pow(x, 1.0/3.0) is only approximately equivalent to cbrt(x).
>
> Knowing which particular methods would be of interest for fast-but-loose
> math would be helpful. The sqrt method has long been intrinsified to the
> corresponding hardware instruction on many platforms so I don't think that
> would be a useful candidate in most circumstances.

Yes, but there are no cbrt intrinsics! It is important for our cubic curve solver. acos / asin are still slow. I could make the port... in Java.

> In short, we might get a selection of looser but faster math methods at
> some point, but not immediately and not without more investigation.

Of course.

Cheers,
Laurent

> [1] Forward looking statements during "Forward to the Past: The Case for
> Uniformly Strict Floating Point Arithmetic on the JVM"
> https://youtu.be/qTKeU_3rhk4?t=2513
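To make the "fast but loose cbrt in pure Java" idea from this thread concrete, here is a minimal sketch. It is not the JDK's FdLibm code and not JaFaMa's algorithm; the class name LooseCbrt, the range cut-offs and the iteration count are all assumptions chosen for illustration. It takes a rough first guess by dividing the exponent bits by three, then refines it with two Halley steps. By a back-of-envelope estimate that gives a relative error around 1e-11, which is not a verified bound, and no speed claim is made here; measure before relying on it.

```java
// Hypothetical sketch of an approximate cube root; not the JDK or JaFaMa implementation.
public final class LooseCbrt {
    // Two thirds of the raw bit pattern of 1.0; re-biases the exponent after dividing by 3.
    private static final long TWO_THIRDS_BIAS =
            0x3FF0000000000000L - 0x3FF0000000000000L / 3;

    private LooseCbrt() { }

    /** Approximate cube root; falls back to Math.cbrt outside a comfortable normal range. */
    public static double cbrt(double x) {
        double ax = Math.abs(x);
        // Zero, subnormals, huge values, infinities and NaN all take the exact path.
        if (!(ax >= 1e-300 && ax <= 1e300)) {
            return Math.cbrt(x);
        }
        // Dividing the bit pattern by 3 roughly divides the binary exponent by 3,
        // giving a first guess within a few percent of the true root.
        long bits = Double.doubleToRawLongBits(ax);
        double t = Double.longBitsToDouble(bits / 3 + TWO_THIRDS_BIAS);
        // Two Halley iterations for f(t) = t^3 - ax; each roughly cubes the relative error.
        for (int i = 0; i < 2; i++) {
            double t3 = t * t * t;
            t = t * (t3 + 2.0 * ax) / (2.0 * t3 + ax);
        }
        // The cube root is an odd function, so restore the sign of the input.
        return Math.copySign(t, x);
    }

    public static void main(String[] args) {
        for (double x : new double[] {-8.0, 0.5, 27.0, 12345.678}) {
            System.out.println("x=" + x + "  loose=" + cbrt(x) + "  Math.cbrt=" + Math.cbrt(x));
        }
    }
}
```

Dropping to a single Halley step trades accuracy for fewer operations; whether either variant actually beats Math.cbrt on a given JDK is exactly the kind of question the benchmarks above are meant to answer.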