Quantcast

counting down loop and unwanted test?

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

counting down loop and unwanted test?

Peter Veentjer
I'm looking at some assembly generated for a very simple for loop:

public class LoopCountDown {

    public static void main(String[] args) {
        int result = 0;
        for (int k = 0; k < 100_000; k++) {
            result += doMath(100);
        }
        System.out.println(result);
    }

    public static int doMath(int n) {
        int result = n;
        for (int k = n; k > 0; k--) {
            result += 2 * result;
        }
        return result;
    }
}

I'm not interested in the results of this program. I'm just curious about how the JIT optimized the actual loop-logic. 

When I output the assembler (full listing at the end), I get the following relevant part:

 0x000000011049868c: test   esi,esi
  0x000000011049868e: jle    0x00000001104986ad  ;*ifle
                                                ; - com.loop.LoopCountDown::doMath@5 (line 15)

  0x0000000110498690: mov    eax,esi            ;*iload_1
                                                ; - com.loop.LoopCountDown::doMath@8 (line 16)

  0x0000000110498692: mov    r10d,eax
  0x0000000110498695: shl    r10d,1
  0x0000000110498698: add    eax,r10d           ;*iadd
                                                ; - com.loop.LoopCountDown::doMath@12 (line 16)

  0x000000011049869b: dec    esi                ;*iinc
                                                ; - com.loop.LoopCountDown::doMath@14 (line 15)

  0x000000011049869d: test   esi,esi
  0x000000011049869f: jg     0x0000000110498692  ;*ifle
  

My question is about the test at 0x000000011049869d. Why is it needed? Perhaps it is too much of an edge case to optimize?

The dec instruction will set the NZ flag:

http://x86.renejeschke.de/html/file_module_x86_id_71.html

So if the jg at the end would be changed to a jnz, the jnz can make use of this NZ flag; no need for a test.

For more information see page 90 of http://www.agner.org/optimize/optimizing_assembly.pdf

My JVM arguments:

-XX:+UnlockDiagnosticVMOptions -XX:PrintAssemblyOptions=intel -XX:-TieredCompilation -XX:-Inline -XX:-BackgroundCompilation -XX:CompileCommand=print,*LoopCountDown.doMath -XX:LoopUnrollLimit=0

JVM version:

java version "1.8.0_91"

Java(TM) SE Runtime Environment (build 1.8.0_91-b14)

Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)



CompilerOracle: print *LoopCountDown.doMath
Compiled method (c2)     174    8             com.loop.LoopCountDown::doMath (22 bytes)
 total in heap  [0x0000000110498550,0x0000000110498790] = 576
 relocation     [0x0000000110498670,0x0000000110498678] = 8
 main code      [0x0000000110498680,0x00000001104986c0] = 64
 stub code      [0x00000001104986c0,0x00000001104986d8] = 24
 oops           [0x00000001104986d8,0x00000001104986e0] = 8
 metadata       [0x00000001104986e0,0x00000001104986e8] = 8
 scopes data    [0x00000001104986e8,0x0000000110498708] = 32
 scopes pcs     [0x0000000110498708,0x0000000110498788] = 128
 dependencies   [0x0000000110498788,0x0000000110498790] = 8
Loaded disassembler from /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/hsdis-amd64.dylib
Decoding compiled method 0x0000000110498550:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x00000001d89df478} 'doMath' '(I)I' in 'com/loop/LoopCountDown'
  # parm0:    rsi       = int
  #           [sp+0x20]  (sp of caller)
  0x0000000110498680: sub    rsp,0x18
  0x0000000110498687: mov    QWORD PTR [rsp+0x10],rbp  ;*synchronization entry
                                                ; - com.loop.LoopCountDown::doMath@-1 (line 14)

  0x000000011049868c: test   esi,esi
  0x000000011049868e: jle    0x00000001104986ad  ;*ifle
                                                ; - com.loop.LoopCountDown::doMath@5 (line 15)

  0x0000000110498690: mov    eax,esi            ;*iload_1
                                                ; - com.loop.LoopCountDown::doMath@8 (line 16)

  0x0000000110498692: mov    r10d,eax
  0x0000000110498695: shl    r10d,1
  0x0000000110498698: add    eax,r10d           ;*iadd
                                                ; - com.loop.LoopCountDown::doMath@12 (line 16)

  0x000000011049869b: dec    esi                ;*iinc
                                                ; - com.loop.LoopCountDown::doMath@14 (line 15)

  0x000000011049869d: test   esi,esi
  0x000000011049869f: jg     0x0000000110498692  ;*ifle
                                                ; - com.loop.LoopCountDown::doMath@5 (line 15)

  0x00000001104986a1: add    rsp,0x10
  0x00000001104986a5: pop    rbp
  0x00000001104986a6: test   DWORD PTR [rip+0xfffffffffff71954],eax        # 0x000000011040a000
                                                ;   {poll_return}
  0x00000001104986ac: ret    
  0x00000001104986ad: mov    eax,esi
  0x00000001104986af: jmp    0x00000001104986a1
  0x00000001104986b1: hlt    
  0x00000001104986b2: hlt    
  0x00000001104986b3: hlt    
  0x00000001104986b4: hlt    
  0x00000001104986b5: hlt    
  0x00000001104986b6: hlt    
  0x00000001104986b7: hlt    
  0x00000001104986b8: hlt    
  0x00000001104986b9: hlt    
  0x00000001104986ba: hlt    
  0x00000001104986bb: hlt    
  0x00000001104986bc: hlt    
  0x00000001104986bd: hlt    
  0x00000001104986be: hlt    
  0x00000001104986bf: hlt    
[Exception Handler]
[Stub Code]
  0x00000001104986c0: jmp    0x0000000110488f60  ;   {no_reloc}
[Deopt Handler Code]
  0x00000001104986c5: call   0x00000001104986ca
  0x00000001104986ca: sub    QWORD PTR [rsp],0x5
  0x00000001104986cf: jmp    0x0000000110463d00  ;   {runtime_call}
  0x00000001104986d4: hlt    
  0x00000001104986d5: hlt    
  0x00000001104986d6: hlt    
  0x00000001104986d7: hlt    
OopMapSet contains 0 OopMaps
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: counting down loop and unwanted test?

Peter Veentjer
Small correction.

It should be ZF and not NZ flag.

On Tue, Jan 31, 2017 at 11:30 AM, Peter Veentjer <[hidden email]> wrote:
I'm looking at some assembly generated for a very simple for loop:

public class LoopCountDown {

    public static void main(String[] args) {
        int result = 0;
        for (int k = 0; k < 100_000; k++) {
            result += doMath(100);
        }
        System.out.println(result);
    }

    public static int doMath(int n) {
        int result = n;
        for (int k = n; k > 0; k--) {
            result += 2 * result;
        }
        return result;
    }
}

I'm not interested in the results of this program. I'm just curious about how the JIT optimized the actual loop-logic. 

When I output the assembler (full listing at the end), I get the following relevant part:

 0x000000011049868c: test   esi,esi
  0x000000011049868e: jle    0x00000001104986ad  ;*ifle
                                                ; - com.loop.LoopCountDown::doMath@5 (line 15)

  0x0000000110498690: mov    eax,esi            ;*iload_1
                                                ; - com.loop.LoopCountDown::doMath@8 (line 16)

  0x0000000110498692: mov    r10d,eax
  0x0000000110498695: shl    r10d,1
  0x0000000110498698: add    eax,r10d           ;*iadd
                                                ; - com.loop.LoopCountDown::doMath@12 (line 16)

  0x000000011049869b: dec    esi                ;*iinc
                                                ; - com.loop.LoopCountDown::doMath@14 (line 15)

  0x000000011049869d: test   esi,esi
  0x000000011049869f: jg     0x0000000110498692  ;*ifle
  

My question is about the test at 0x000000011049869d. Why is it needed? Perhaps it is too much of an edge case to optimize?

The dec instruction will set the NZ flag:

http://x86.renejeschke.de/html/file_module_x86_id_71.html

So if the jg at the end would be changed to a jnz, the jnz can make use of this NZ flag; no need for a test.

For more information see page 90 of http://www.agner.org/optimize/optimizing_assembly.pdf

My JVM arguments:

-XX:+UnlockDiagnosticVMOptions -XX:PrintAssemblyOptions=intel -XX:-TieredCompilation -XX:-Inline -XX:-BackgroundCompilation -XX:CompileCommand=print,*LoopCountDown.doMath -XX:LoopUnrollLimit=0

JVM version:

java version "1.8.0_91"

Java(TM) SE Runtime Environment (build 1.8.0_91-b14)

Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)



CompilerOracle: print *LoopCountDown.doMath
Compiled method (c2)     174    8             com.loop.LoopCountDown::doMath (22 bytes)
 total in heap  [0x0000000110498550,0x0000000110498790] = 576
 relocation     [0x0000000110498670,0x0000000110498678] = 8
 main code      [0x0000000110498680,0x00000001104986c0] = 64
 stub code      [0x00000001104986c0,0x00000001104986d8] = 24
 oops           [0x00000001104986d8,0x00000001104986e0] = 8
 metadata       [0x00000001104986e0,0x00000001104986e8] = 8
 scopes data    [0x00000001104986e8,0x0000000110498708] = 32
 scopes pcs     [0x0000000110498708,0x0000000110498788] = 128
 dependencies   [0x0000000110498788,0x0000000110498790] = 8
Loaded disassembler from /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/hsdis-amd64.dylib
Decoding compiled method 0x0000000110498550:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x00000001d89df478} 'doMath' '(I)I' in 'com/loop/LoopCountDown'
  # parm0:    rsi       = int
  #           [sp+0x20]  (sp of caller)
  0x0000000110498680: sub    rsp,0x18
  0x0000000110498687: mov    QWORD PTR [rsp+0x10],rbp  ;*synchronization entry
                                                ; - com.loop.LoopCountDown::doMath@-1 (line 14)

  0x000000011049868c: test   esi,esi
  0x000000011049868e: jle    0x00000001104986ad  ;*ifle
                                                ; - com.loop.LoopCountDown::doMath@5 (line 15)

  0x0000000110498690: mov    eax,esi            ;*iload_1
                                                ; - com.loop.LoopCountDown::doMath@8 (line 16)

  0x0000000110498692: mov    r10d,eax
  0x0000000110498695: shl    r10d,1
  0x0000000110498698: add    eax,r10d           ;*iadd
                                                ; - com.loop.LoopCountDown::doMath@12 (line 16)

  0x000000011049869b: dec    esi                ;*iinc
                                                ; - com.loop.LoopCountDown::doMath@14 (line 15)

  0x000000011049869d: test   esi,esi
  0x000000011049869f: jg     0x0000000110498692  ;*ifle
                                                ; - com.loop.LoopCountDown::doMath@5 (line 15)

  0x00000001104986a1: add    rsp,0x10
  0x00000001104986a5: pop    rbp
  0x00000001104986a6: test   DWORD PTR [rip+0xfffffffffff71954],eax        # 0x000000011040a000
                                                ;   {poll_return}
  0x00000001104986ac: ret    
  0x00000001104986ad: mov    eax,esi
  0x00000001104986af: jmp    0x00000001104986a1
  0x00000001104986b1: hlt    
  0x00000001104986b2: hlt    
  0x00000001104986b3: hlt    
  0x00000001104986b4: hlt    
  0x00000001104986b5: hlt    
  0x00000001104986b6: hlt    
  0x00000001104986b7: hlt    
  0x00000001104986b8: hlt    
  0x00000001104986b9: hlt    
  0x00000001104986ba: hlt    
  0x00000001104986bb: hlt    
  0x00000001104986bc: hlt    
  0x00000001104986bd: hlt    
  0x00000001104986be: hlt    
  0x00000001104986bf: hlt    
[Exception Handler]
[Stub Code]
  0x00000001104986c0: jmp    0x0000000110488f60  ;   {no_reloc}
[Deopt Handler Code]
  0x00000001104986c5: call   0x00000001104986ca
  0x00000001104986ca: sub    QWORD PTR [rsp],0x5
  0x00000001104986cf: jmp    0x0000000110463d00  ;   {runtime_call}
  0x00000001104986d4: hlt    
  0x00000001104986d5: hlt    
  0x00000001104986d6: hlt    
  0x00000001104986d7: hlt    
OopMapSet contains 0 OopMaps

Loading...