SuperWordLoopUnrollAnalysis and loop unrolling


Andrew Haley
If I set SuperWordLoopUnrollAnalysis=true, then AArch64 C2 stops after
unrolling a simple loop 4 times.  If I set
SuperWordLoopUnrollAnalysis=false, it stops after unrolling 16 times.
Why is it that SuperWordLoopUnrollAnalysis limits unrolling in this
way?

The info says
"Map number of unrolls for main loop via Superword Level Parallelism
analysis" but that doesn't help me very much.  Only AArch64 and x86
set this option.

Thanks,

Andrew.

Re: SuperWordLoopUnrollAnalysis and loop unrolling

Vladimir Kozlov
Try running a debug VM with -XX:+TraceSuperWordLoopUnrollAnalysis.

It could be https://bugs.openjdk.java.net/browse/JDK-8175096

Vladimir

On 4/12/17 10:22 AM, Andrew Haley wrote:

> If I set SuperWordLoopUnrollAnalysis=true, then AArch64 C2 stops after
> unrolling a simple loop 4 times.  If I set
> SuperWordLoopUnrollAnalysis=false, it stops after unrolling 16 times.
> Why is it that SuperWordLoopUnrollAnalysis limits unrolling in this
> way?
>
> The info says
> "Map number of unrolls for main loop via Superword Level Parallelism
> analysis" but that doesn't help me very much.  Only AArch64 and x86
> set this option.
>
> Thanks,
>
> Andrew.
>

Re: SuperWordLoopUnrollAnalysis and loop unrolling

Yang Zhang
Hi Andrew,

I am also investigating the loop unrolling mechanism in AArch64 C2.

Taking vectAddInt as an example, my debug output shows the following.

Java example:
  public static void vectAddInt(int[] a, int[] b, int c) {
    for (int i = 0; i < LENGTH; i++) {
      a[i] = b[i] + c;
    }
  }

For AArch64 C2 with SuperWordLoopUnrollAnalysis=false:
Loop unrolling is controlled by comparing body_size with LoopUnrollLimit
("Unroll loop bodies with node count less than this") in the function
policy_unroll. Once the loop has been unrolled 8 times, body_size is
large enough that unrolling stops.

For AArch64 C2 with SuperWordLoopUnrollAnalysis=true:
Unrolling is first controlled by policy_unroll_slp_analysis. Once the
loop has been unrolled 4 times, vectorization happens.
After that, unrolling is again controlled by comparing body_size with
LoopUnrollLimit. Once the loop has been unrolled 32 times, body_size is
large enough that unrolling stops.
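
The size-based part of this behaviour can be pictured with a toy model. This is only a sketch, not HotSpot's policy_unroll: it assumes each unroll step doubles the node count of the loop body and that unrolling continues while the doubled body stays below the limit. The class name, method name and node counts are hypothetical.

```java
public class UnrollModel {
    // Toy model only: each unroll step is assumed to double the body size,
    // and unrolling continues while the doubled body stays below the limit.
    static int finalUnrollFactor(int bodySize, int unrollLimit) {
        if (bodySize <= 0) {
            throw new IllegalArgumentException("bodySize must be positive");
        }
        int factor = 1;
        while ((long) bodySize * 2 < unrollLimit) {
            bodySize *= 2;
            factor *= 2;
        }
        return factor;
    }

    public static void main(String[] args) {
        // Hypothetical node counts, chosen only for illustration.
        System.out.println(finalUnrollFactor(10, 60)); // prints 4
        System.out.println(finalUnrollFactor(3, 60));  // prints 16
    }
}
```

In this model a smaller starting body reaches a higher unroll factor before hitting the same limit, which is one way different loops can stop at different factors.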

My result is the opposite of what you describe. Could you provide your
test case?

Regards
Yang

On 13 April 2017 at 01:22, Andrew Haley <[hidden email]> wrote:

> If I set SuperWordLoopUnrollAnalysis=true, then AArch64 C2 stops after
> unrolling a simple loop 4 times.  If I set
> SuperWordLoopUnrollAnalysis=false, it stops after unrolling 16 times.
> Why is it that SuperWordLoopUnrollAnalysis limits unrolling in this
> way?
>
> The info says
> "Map number of unrolls for main loop via Superword Level Parallelism
> analysis" but that doesn't help me very much.  Only AArch64 and x86
> set this option.
>
> Thanks,
>
> Andrew.

Re: SuperWordLoopUnrollAnalysis and loop unrolling

Andrew Haley
On 14/04/17 10:21, Yang Zhang wrote:
> My test result is just opposite with your description. Could you
> provide your test case?

    // @Benchmark
    public int[] sameArrayClass(BenchmarkState state) {
        for (int i = 0; i < INITSIZE; i++) {
            state.b[0] = state.b[1];
            state.b[1] = state.b[2];
            state.b[2] = state.b[3];
            state.b[3] = state.b[0];

            state.b[0] = state.b[1];
            state.b[1] = state.b[2];
            state.b[2] = state.b[3];
            state.b[3] = state.b[0];

            state.b[0] = state.b[1];
            state.b[1] = state.b[2];
            state.b[2] = state.b[3];
            state.b[3] = state.b[0];

            state.b[0] = state.b[1];
            state.b[1] = state.b[2];
            state.b[2] = state.b[3];
            state.b[3] = state.b[0];
        }
        return state.b;
    }

This is not vectorizable, but SuperWordLoopUnrollAnalysis=true disables
the unrolling that would make it faster.
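
For anyone who wants to try this without JMH, a self-contained variant might look like the sketch below. It is an approximation, not the original benchmark: the State class stands in for the JMH state object, the INITSIZE value is assumed (the post does not give it), and hand-rolled timing is far less reliable than a JMH run.

```java
public class SameArrayBench {
    // Assumed iteration count; the original post does not give INITSIZE.
    static final int INITSIZE = 1_000_000;

    // Stand-in for the JMH state object in the original benchmark.
    static final class State {
        int[] b = {1, 2, 3, 4};
    }

    // Same body as the posted loop: four copies of a four-store rotation.
    static int[] run(State state, int iterations) {
        for (int i = 0; i < iterations; i++) {
            state.b[0] = state.b[1];
            state.b[1] = state.b[2];
            state.b[2] = state.b[3];
            state.b[3] = state.b[0];

            state.b[0] = state.b[1];
            state.b[1] = state.b[2];
            state.b[2] = state.b[3];
            state.b[3] = state.b[0];

            state.b[0] = state.b[1];
            state.b[1] = state.b[2];
            state.b[2] = state.b[3];
            state.b[3] = state.b[0];

            state.b[0] = state.b[1];
            state.b[1] = state.b[2];
            state.b[2] = state.b[3];
            state.b[3] = state.b[0];
        }
        return state.b;
    }

    public static void main(String[] args) {
        State state = new State();
        int sink = 0; // consumed below so the calls are not optimized away
        // Warm up so that C2 has a chance to compile run() before timing.
        for (int r = 0; r < 20; r++) {
            sink += run(state, INITSIZE)[0];
        }
        long start = System.nanoTime();
        for (int r = 0; r < 20; r++) {
            sink += run(state, INITSIZE)[0];
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("ns per call: " + (elapsed / 20) + " (sink " + sink + ")");
    }
}
```

Running it once with -XX:+SuperWordLoopUnrollAnalysis and once with -XX:-SuperWordLoopUnrollAnalysis gives a rough comparison of the two settings.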

Andrew.


Re: SuperWordLoopUnrollAnalysis and loop unrolling

Yang Zhang
Hi Andrew,

I have run this test case.

For AArch64 C2 with SuperWordLoopUnrollAnalysis=false:
Loop unrolling is still controlled by comparing body_size with
LoopUnrollLimit. Once the loop has been unrolled 16 times, body_size is
large enough that unrolling stops.

For AArch64 C2 with SuperWordLoopUnrollAnalysis=true:
Unrolling is first controlled by policy_unroll_slp_analysis. Once the
loop has been unrolled 4 times, vectorization is attempted.
If vectorization succeeds, set_major_progress triggers further
unrolling until body_size is large enough.
If vectorization fails, unrolling stops.
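
The two-phase behaviour described above can be written down as a toy model. This is a sketch under stated assumptions, not the HotSpot implementation: the class and method names, the parameters and the node counts are all hypothetical, and the size-based phase is assumed to double the body at each step while it stays below the limit.

```java
public class SlpUnrollModel {
    // Toy model of the behaviour described in this message.
    // slpFactor:  unroll factor chosen by the SLP analysis (4 in the thread).
    // vectorizes: whether vectorization succeeds at that factor.
    // bodySize:   assumed node count of the loop body after the SLP phase.
    static int finalUnrollFactor(int bodySize, int unrollLimit,
                                 int slpFactor, boolean vectorizes) {
        if (bodySize <= 0 || slpFactor <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        int factor = slpFactor;
        if (!vectorizes) {
            // Vectorization failed: no further unrolling is attempted.
            return factor;
        }
        // Vectorization succeeded: keep doubling while below the limit.
        long size = bodySize;
        while (size * 2 < unrollLimit) {
            size *= 2;
            factor *= 2;
        }
        return factor;
    }

    public static void main(String[] args) {
        // Hypothetical node counts, chosen only for illustration.
        System.out.println(finalUnrollFactor(5, 60, 4, false)); // prints 4
        System.out.println(finalUnrollFactor(5, 60, 4, true));  // prints 32
    }
}
```

The model makes the asymmetry visible: a loop that fails to vectorize stays at the SLP-chosen factor, even when the size limit alone would have allowed more unrolling.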

I haven't run the performance test, but modern CPUs execute
instructions out of order, so loop unrolling doesn't always improve
performance.

Regards
Yang

On 15 April 2017 at 00:41, Andrew Haley <[hidden email]> wrote:

> On 14/04/17 10:21, Yang Zhang wrote:
>> My test result is just opposite with your description. Could you
>> provide your test case?
>
>     // @Benchmark
>     public int[] sameArrayClass(BenchmarkState state) {
>         for (int i = 0; i < INITSIZE; i++) {
>             state.b[0] = state.b[1];
>             state.b[1] = state.b[2];
>             state.b[2] = state.b[3];
>             state.b[3] = state.b[0];
>
>             state.b[0] = state.b[1];
>             state.b[1] = state.b[2];
>             state.b[2] = state.b[3];
>             state.b[3] = state.b[0];
>
>             state.b[0] = state.b[1];
>             state.b[1] = state.b[2];
>             state.b[2] = state.b[3];
>             state.b[3] = state.b[0];
>
>             state.b[0] = state.b[1];
>             state.b[1] = state.b[2];
>             state.b[2] = state.b[3];
>             state.b[3] = state.b[0];
>         }
>         return state.b;
>     }
>
> This is not vectorizable, but SuperWordLoopUnrollAnalysis=true disables
> the unrolling which would make it faster.
>
> Andrew.
>

Re: SuperWordLoopUnrollAnalysis and loop unrolling

Andrew Haley
On 18/04/17 07:42, Yang Zhang wrote:

> I have run this test case.
>
> For aarch64 C2 with SuperWordLoopUnrollAnalysis=false:
> Loop unroll is still controlled by comparing body_size. When loop
> unroll is 16 times, body_size is big enough and it stops.
>
> For aarch64 C2 with SuperWordLoopUnrollAnalysis=true:
> First loop unroll is controlled by policy_unroll_slp_analysis. When
> loop unroll is 4 times, vectorization happens.
> If vectorization succeeds, loop unroll would be instigated more by
> set_major_progress until body_size is big enough.
> if vectorization fails, loop unroll stops.
>
> I haven't run the performance test.

I don't really understand what your point is.  How can you have run
the test case but not the performance test?  I think that you should
look at the performance in order to understand the issue.

Andrew.
