RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Gustavo Romero
Hi,

Could the following change be reviewed please?

bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/

It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
extracting and checking bits in the Transactional Level field of TEXASR
register.

It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM
status register supports two bits to inform two different types of memory
conflict between threads: non-transactional and transactional. According to how
the jtreg RTM tests are designed the memory conflict counter counts
non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock
is held on a static variable while another thread without any synchronization
(non-trasactional) tries to modify the same variable. Hence that small
adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on
Power. The memory conflict counter is not used in any other place besides by the
RTM precise statistics (no decision is made by the JVM based on that amount).

This change partially fixes some failures in
compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
nested and memory conflict abort counters. The remaining issue will be fixed by
aborting on calling JNI (next RFR).


Thank you and best regards,
Gustavo

Reply | Threaded
Open this post in threaded view
|

RE: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Doerr, Martin
Hi Gustavo,

thanks for addressing this issue.

I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp.
Somebody may want to use the definition for other purposes.

I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].

Best regards,
Martin


-----Original Message-----
From: Gustavo Romero [mailto:[hidden email]]
Sent: Montag, 25. Juni 2018 10:19
To: Lindenmaier, Goetz <[hidden email]>; Doerr, Martin <[hidden email]>; [hidden email]
Cc: [hidden email]
Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Hi,

Could the following change be reviewed please?

bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/

It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
extracting and checking bits in the Transactional Level field of TEXASR
register.

It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM
status register supports two bits to inform two different types of memory
conflict between threads: non-transactional and transactional. According to how
the jtreg RTM tests are designed the memory conflict counter counts
non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock
is held on a static variable while another thread without any synchronization
(non-trasactional) tries to modify the same variable. Hence that small
adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on
Power. The memory conflict counter is not used in any other place besides by the
RTM precise statistics (no decision is made by the JVM based on that amount).

This change partially fixes some failures in
compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
nested and memory conflict abort counters. The remaining issue will be fixed by
aborting on calling JNI (next RFR).


Thank you and best regards,
Gustavo

Reply | Threaded
Open this post in threaded view
|

Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Gustavo Romero
Hi Martin,

On 06/25/2018 01:31 PM, Doerr, Martin wrote:
> Hi Gustavo,
>
> thanks for addressing this issue.

Thanks lot for reviewing.


> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp.
> Somebody may want to use the definition for other purposes.

Sure.


> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].

Absolutely. Matching (Assembler::tm_trans_cf | Assembler::tm_non_trans_cf)
is the most correct way to address it on Power (even if jtreg tests don't
seem to stress the tm_trans_cf cases).

It's indeed unfortunate to generate an asm match rule for tm_failure_bit[4]
twice also. I'm fixing that right now:

The first idea that come up to me is to iterate over the state bits rather
than over the counters. Since it was first designed on Intel counters and
abort_status match 1-by-1 perfectly on x64, so iterating over counters is
just fine. But on Power we can have n (state bits)-to-1 (counter), like
(tm_trans_cf | Assembler::tm_non_trans_cf) -> 1 (abort type 2 counter).
Iterating over state bits also avoids the unfortunate tm_failure_bit[4]
double hit. I'm thinking to use a matrix like the following to track these
relations:


   // RTM counters                             confl.,  aborts, nests, footprint // bits in TEXASR
   bool tm_counter[][counter::NUM_COUNTER] = {{ true ,  false , false, false },  // bit conflict_tm
                                              { true ,  false , false, false },  // bit conflict_non_tm
                                              { false,  true  , false, false },  // bit abort
                                              { false,  false , false, true  },  // bit footprint
                                              { true ,  false , true , false }}; // bit nested


That way we can expand and adapt the relations 'state bits vs counters vs
state bit size (when there is more than 1 bit to match)' easily.


Best regards,
Gustavo
 

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero [mailto:[hidden email]]
> Sent: Montag, 25. Juni 2018 10:19
> To: Lindenmaier, Goetz <[hidden email]>; Doerr, Martin <[hidden email]>; [hidden email]
> Cc: [hidden email]
> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>
> Hi,
>
> Could the following change be reviewed please?
>
> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>
> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
> extracting and checking bits in the Transactional Level field of TEXASR
> register.
>
> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM
> status register supports two bits to inform two different types of memory
> conflict between threads: non-transactional and transactional. According to how
> the jtreg RTM tests are designed the memory conflict counter counts
> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock
> is held on a static variable while another thread without any synchronization
> (non-trasactional) tries to modify the same variable. Hence that small
> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on
> Power. The memory conflict counter is not used in any other place besides by the
> RTM precise statistics (no decision is made by the JVM based on that amount).
>
> This change partially fixes some failures in
> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
> nested and memory conflict abort counters. The remaining issue will be fixed by
> aborting on calling JNI (next RFR).
>
>
> Thank you and best regards,
> Gustavo
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Gustavo Romero
In reply to this post by Doerr, Martin
Dear Martin,

On 06/25/2018 01:31 PM, Doerr, Martin wrote:
> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp.
> Somebody may want to use the definition for other purposes.

Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted
using 'rldicr' in macroAssembler.


> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].

Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2
(conflict). Duplicated check code for tm_failure_bit[4] was removed and now
counter 4 (debug) is mapped to count traps or syscalls caught in TM events,
which seems a reasonable approximation to the original semantics of the debug
counter on Intel. Unfortunately I could not confirm on AIX how these two events
(trap and syscall in TM) will set the failure code, so the counter will never
track any information on AIX. But with the current proposed change that failure
code can be easily added in the future.

I also realized that I used previously a wrong ME operand value in:

+        // Extract 11 bits
+        rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);

It should be 10 to extract 11 bits actually, so all extractions must be correct
now.


I hope you don't find the array map of failure bits vs counters overkilling.

Finally, I replaced the comment:

tm_tabort, // Note: Seems like signal handler sets this, too.

by:

tm_tabort, // Signal handler will set this too.

because we just enable RTM support on Power if 'htm-nosc' is supported, so
treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR
afaik. Since debug counter now tracks trap/syscall in TM it's possible to check
that counter to verify the number of aborts cause by the kernel (if any).

new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/


Best regards,
Gustavo

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero [mailto:[hidden email]]
> Sent: Montag, 25. Juni 2018 10:19
> To: Lindenmaier, Goetz <[hidden email]>; Doerr, Martin <[hidden email]>; [hidden email]
> Cc: [hidden email]
> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>
> Hi,
>
> Could the following change be reviewed please?
>
> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>
> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
> extracting and checking bits in the Transactional Level field of TEXASR
> register.
>
> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM
> status register supports two bits to inform two different types of memory
> conflict between threads: non-transactional and transactional. According to how
> the jtreg RTM tests are designed the memory conflict counter counts
> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock
> is held on a static variable while another thread without any synchronization
> (non-trasactional) tries to modify the same variable. Hence that small
> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on
> Power. The memory conflict counter is not used in any other place besides by the
> RTM precise statistics (no decision is made by the JVM based on that amount).
>
> This change partially fixes some failures in
> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
> nested and memory conflict abort counters. The remaining issue will be fixed by
> aborting on calling JNI (next RFR).
>
>
> Thank you and best regards,
> Gustavo
>

Reply | Threaded
Open this post in threaded view
|

RE: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Doerr, Martin
Hi Gustavo,

I think your new code may increment a counter twice when 2 bits are specified for it. I think this should get fixed.
Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right?

Besides that, I have some improvement proposals:

- I think using 3 constant tables is not so good to read. For example, inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1 when using int instead of bool. Would this be better?

- I think you can get rid of rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value.

Best regards,
Martin


-----Original Message-----
From: Gustavo Romero [mailto:[hidden email]]
Sent: Dienstag, 10. Juli 2018 18:59
To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; [hidden email]
Cc: [hidden email]
Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Dear Martin,

On 06/25/2018 01:31 PM, Doerr, Martin wrote:
> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp.
> Somebody may want to use the definition for other purposes.

Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted
using 'rldicr' in macroAssembler.


> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].

Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2
(conflict). Duplicated check code for tm_failure_bit[4] was removed and now
counter 4 (debug) is mapped to count traps or syscalls caught in TM events,
which seems a reasonable approximation to the original semantics of the debug
counter on Intel. Unfortunately I could not confirm on AIX how these two events
(trap and syscall in TM) will set the failure code, so the counter will never
track any information on AIX. But with the current proposed change that failure
code can be easily added in the future.

I also realized that I used previously a wrong ME operand value in:

+        // Extract 11 bits
+        rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);

It should be 10 to extract 11 bits actually, so all extractions must be correct
now.


I hope you don't find the array map of failure bits vs counters overkilling.

Finally, I replaced the comment:

tm_tabort, // Note: Seems like signal handler sets this, too.

by:

tm_tabort, // Signal handler will set this too.

because we just enable RTM support on Power if 'htm-nosc' is supported, so
treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR
afaik. Since debug counter now tracks trap/syscall in TM it's possible to check
that counter to verify the number of aborts cause by the kernel (if any).

new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/


Best regards,
Gustavo

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero [mailto:[hidden email]]
> Sent: Montag, 25. Juni 2018 10:19
> To: Lindenmaier, Goetz <[hidden email]>; Doerr, Martin <[hidden email]>; [hidden email]
> Cc: [hidden email]
> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>
> Hi,
>
> Could the following change be reviewed please?
>
> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>
> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
> extracting and checking bits in the Transactional Level field of TEXASR
> register.
>
> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM
> status register supports two bits to inform two different types of memory
> conflict between threads: non-transactional and transactional. According to how
> the jtreg RTM tests are designed the memory conflict counter counts
> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock
> is held on a static variable while another thread without any synchronization
> (non-trasactional) tries to modify the same variable. Hence that small
> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on
> Power. The memory conflict counter is not used in any other place besides by the
> RTM precise statistics (no decision is made by the JVM based on that amount).
>
> This change partially fixes some failures in
> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
> nested and memory conflict abort counters. The remaining issue will be fixed by
> aborting on calling JNI (next RFR).
>
>
> Thank you and best regards,
> Gustavo
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Gustavo Romero
Hi Martin,

On 07/12/2018 12:30 PM, Doerr, Martin wrote:
> I think your new code may increment a counter twice when 2 bits are specified for it. I think this should get fixed.
> Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right?

Do you mean increment twice counter #2 (conflict counter) or another counter?
non_trans_cf and trans_cf are mutually exclusive.
You probably spotted a case I'm missing so I ask.


> Besides that, I have some improvement proposals:
>
> - I think using 3 constant tables is not so good to read. For example, inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1 when using int instead of bool. Would this be better?
>
> - I think you can get rid of rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value.

Cool :)

New (interim) webrev:
http://cr.openjdk.java.net/~gromero/8205582/v3/

Thanks.


Best regards,
Gustavo

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero [mailto:[hidden email]]
> Sent: Dienstag, 10. Juli 2018 18:59
> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; [hidden email]
> Cc: [hidden email]
> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>
> Dear Martin,
>
> On 06/25/2018 01:31 PM, Doerr, Martin wrote:
>> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp.
>> Somebody may want to use the definition for other purposes.
>
> Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted
> using 'rldicr' in macroAssembler.
>
>
>> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].
>
> Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2
> (conflict). Duplicated check code for tm_failure_bit[4] was removed and now
> counter 4 (debug) is mapped to count traps or syscalls caught in TM events,
> which seems a reasonable approximation to the original semantics of the debug
> counter on Intel. Unfortunately I could not confirm on AIX how these two events
> (trap and syscall in TM) will set the failure code, so the counter will never
> track any information on AIX. But with the current proposed change that failure
> code can be easily added in the future.
>
> I also realized that I used previously a wrong ME operand value in:
>
> +        // Extract 11 bits
> +        rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);
>
> It should be 10 to extract 11 bits actually, so all extractions must be correct
> now.
>
>
> I hope you don't find the array map of failure bits vs counters overkilling.
>
> Finally, I replaced the comment:
>
> tm_tabort, // Note: Seems like signal handler sets this, too.
>
> by:
>
> tm_tabort, // Signal handler will set this too.
>
> because we just enable RTM support on Power if 'htm-nosc' is supported, so
> treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR
> afaik. Since debug counter now tracks trap/syscall in TM it's possible to check
> that counter to verify the number of aborts cause by the kernel (if any).
>
> new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/
>
>
> Best regards,
> Gustavo
>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Gustavo Romero [mailto:[hidden email]]
>> Sent: Montag, 25. Juni 2018 10:19
>> To: Lindenmaier, Goetz <[hidden email]>; Doerr, Martin <[hidden email]>; [hidden email]
>> Cc: [hidden email]
>> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>
>> Hi,
>>
>> Could the following change be reviewed please?
>>
>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
>> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>>
>> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
>> extracting and checking bits in the Transactional Level field of TEXASR
>> register.
>>
>> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM
>> status register supports two bits to inform two different types of memory
>> conflict between threads: non-transactional and transactional. According to how
>> the jtreg RTM tests are designed the memory conflict counter counts
>> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock
>> is held on a static variable while another thread without any synchronization
>> (non-trasactional) tries to modify the same variable. Hence that small
>> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on
>> Power. The memory conflict counter is not used in any other place besides by the
>> RTM precise statistics (no decision is made by the JVM based on that amount).
>>
>> This change partially fixes some failures in
>> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
>> nested and memory conflict abort counters. The remaining issue will be fixed by
>> aborting on calling JNI (next RFR).
>>
>>
>> Thank you and best regards,
>> Gustavo
>>
>

Reply | Threaded
Open this post in threaded view
|

RE: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Doerr, Martin
Hi Gustavo,

thanks for the new webrev.

You're right, the two bits should be mutual exclusive, so the sum is equivalent to the logical or in this case.
However, the loop pretends to be generic, but it's not. The "or" only works for mutual exclusive bits.
If you want to keep the code with the loops, I think there should be a comment explaining this.

Please also fix indentation.

Alternatively, the loop could get replaced by some code for each bit. Given that the loop is not really generic, I think this wouldn't be worse.

Besides that, it looks good. Thanks for improving it.

Best regards,
Martin


-----Original Message-----
From: Gustavo Romero [mailto:[hidden email]]
Sent: Freitag, 13. Juli 2018 16:56
To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; [hidden email]
Cc: [hidden email]
Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Hi Martin,

On 07/12/2018 12:30 PM, Doerr, Martin wrote:
> I think your new code may increment a counter twice when 2 bits are specified for it. I think this should get fixed.
> Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right?

Do you mean increment twice counter #2 (conflict counter) or another counter?
non_trans_cf and trans_cf are mutually exclusive.
You probably spotted a case I'm missing so I ask.


> Besides that, I have some improvement proposals:
>
> - I think using 3 constant tables is not so good to read. For example, inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1 when using int instead of bool. Would this be better?
>
> - I think you can get rid of rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value.

Cool :)

New (interim) webrev:
http://cr.openjdk.java.net/~gromero/8205582/v3/

Thanks.


Best regards,
Gustavo

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero [mailto:[hidden email]]
> Sent: Dienstag, 10. Juli 2018 18:59
> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; [hidden email]
> Cc: [hidden email]
> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>
> Dear Martin,
>
> On 06/25/2018 01:31 PM, Doerr, Martin wrote:
>> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp.
>> Somebody may want to use the definition for other purposes.
>
> Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted
> using 'rldicr' in macroAssembler.
>
>
>> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].
>
> Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2
> (conflict). Duplicated check code for tm_failure_bit[4] was removed and now
> counter 4 (debug) is mapped to count traps or syscalls caught in TM events,
> which seems a reasonable approximation to the original semantics of the debug
> counter on Intel. Unfortunately I could not confirm on AIX how these two events
> (trap and syscall in TM) will set the failure code, so the counter will never
> track any information on AIX. But with the current proposed change that failure
> code can be easily added in the future.
>
> I also realized that I used previously a wrong ME operand value in:
>
> +        // Extract 11 bits
> +        rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);
>
> It should be 10 to extract 11 bits actually, so all extractions must be correct
> now.
>
>
> I hope you don't find the array map of failure bits vs counters overkilling.
>
> Finally, I replaced the comment:
>
> tm_tabort, // Note: Seems like signal handler sets this, too.
>
> by:
>
> tm_tabort, // Signal handler will set this too.
>
> because we just enable RTM support on Power if 'htm-nosc' is supported, so
> treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR
> afaik. Since debug counter now tracks trap/syscall in TM it's possible to check
> that counter to verify the number of aborts cause by the kernel (if any).
>
> new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/
>
>
> Best regards,
> Gustavo
>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Gustavo Romero [mailto:[hidden email]]
>> Sent: Montag, 25. Juni 2018 10:19
>> To: Lindenmaier, Goetz <[hidden email]>; Doerr, Martin <[hidden email]>; [hidden email]
>> Cc: [hidden email]
>> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>
>> Hi,
>>
>> Could the following change be reviewed please?
>>
>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
>> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>>
>> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
>> extracting and checking bits in the Transactional Level field of TEXASR
>> register.
>>
>> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM
>> status register supports two bits to inform two different types of memory
>> conflict between threads: non-transactional and transactional. According to how
>> the jtreg RTM tests are designed the memory conflict counter counts
>> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock
>> is held on a static variable while another thread without any synchronization
>> (non-trasactional) tries to modify the same variable. Hence that small
>> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on
>> Power. The memory conflict counter is not used in any other place besides by the
>> RTM precise statistics (no decision is made by the JVM based on that amount).
>>
>> This change partially fixes some failures in
>> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
>> nested and memory conflict abort counters. The remaining issue will be fixed by
>> aborting on calling JNI (next RFR).
>>
>>
>> Thank you and best regards,
>> Gustavo
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Gustavo Romero
Hi Martin,

On 07/16/2018 05:17 AM, Doerr, Martin wrote:
> thanks for the new webrev.

Thanks a lot for the thorough review.


> You're right, the two bits should be mutual exclusive, so the sum is equivalent to the logical or in this case.
> However, the loop pretends to be generic, but it's not. The "or" only works for mutual exclusive bits.
> If you want to keep the code with the loops, I think there should be a comment explaining this.

I see. I've experimented a couple of options following your previous suggestion
of using the counter loop as the outer loop. Most difficult point was to work
around the constraint of having just R0 available as scratch. Bit/bitfield
extracting instrs did not help much since most of them can't perform an OR with
its destination operand, which gets worse with the R0 constraint. On the other,
afaics 'rldimi' which does an OR with its destination operand can't be used to
extract the bit/bitfields in a generic way. That best alternative I found was to
use both CCR0 and CCR1 and their EQ bits. I think all cases are covered now,
i.e. all bits/conditions for a given counter are ORed. failure_code logic is not
inverted any more in the map, which seems more natural. I added also more
information to the comment before the bit/counter map.
I think now the loop is generic.
Webrev for the final result:

http://cr.openjdk.java.net/~gromero/8205582/v4_A/


> Please also fix indentation.

Done.


> Alternatively, the loop could get replaced by some code for each bit. Given that the loop is not really generic, I think this wouldn't be worse.
>
> Besides that, it looks good. Thanks for improving it.

Due to the tight schedule I also provide the corrections for the last reviewed
version:

http://cr.openjdk.java.net/~gromero/8205582/v4_B/

If v4_A also looks good I vouch for pushing it instead of v4_B.


Best regards,
Gustavo

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero [mailto:[hidden email]]
> Sent: Freitag, 13. Juli 2018 16:56
> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; [hidden email]
> Cc: [hidden email]
> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>
> Hi Martin,
>
> On 07/12/2018 12:30 PM, Doerr, Martin wrote:
>> I think your new code may increment a counter twice when 2 bits are specified for it. I think this should get fixed.
>> Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right?
>
> Do you mean increment twice counter #2 (conflict counter) or another counter?
> non_trans_cf and trans_cf are mutually exclusive.
> You probably spotted a case I'm missing so I ask.
>
>
>> Besides that, I have some improvement proposals:
>>
>> - I think using 3 constant tables is not so good to read. For example, inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1 when using int instead of bool. Would this be better?
>>
>> - I think you can get rid of rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value.
>
> Cool :)
>
> New (interim) webrev:
> http://cr.openjdk.java.net/~gromero/8205582/v3/
>
> Thanks.
>
>
> Best regards,
> Gustavo
>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Gustavo Romero [mailto:[hidden email]]
>> Sent: Dienstag, 10. Juli 2018 18:59
>> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; [hidden email]
>> Cc: [hidden email]
>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>
>> Dear Martin,
>>
>> On 06/25/2018 01:31 PM, Doerr, Martin wrote:
>>> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp.
>>> Somebody may want to use the definition for other purposes.
>>
>> Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted
>> using 'rldicr' in macroAssembler.
>>
>>
>>> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].
>>
>> Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2
>> (conflict). Duplicated check code for tm_failure_bit[4] was removed and now
>> counter 4 (debug) is mapped to count traps or syscalls caught in TM events,
>> which seems a reasonable approximation to the original semantics of the debug
>> counter on Intel. Unfortunately I could not confirm on AIX how these two events
>> (trap and syscall in TM) will set the failure code, so the counter will never
>> track any information on AIX. But with the current proposed change that failure
>> code can be easily added in the future.
>>
>> I also realized that I used previously a wrong ME operand value in:
>>
>> +        // Extract 11 bits
>> +        rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);
>>
>> It should be 10 to extract 11 bits actually, so all extractions must be correct
>> now.
>>
>>
>> I hope you don't find the array map of failure bits vs counters overkilling.
>>
>> Finally, I replaced the comment:
>>
>> tm_tabort, // Note: Seems like signal handler sets this, too.
>>
>> by:
>>
>> tm_tabort, // Signal handler will set this too.
>>
>> because we just enable RTM support on Power if 'htm-nosc' is supported, so
>> treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR
>> afaik. Since debug counter now tracks trap/syscall in TM it's possible to check
>> that counter to verify the number of aborts cause by the kernel (if any).
>>
>> new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/
>>
>>
>> Best regards,
>> Gustavo
>>
>>> Best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Gustavo Romero [mailto:[hidden email]]
>>> Sent: Montag, 25. Juni 2018 10:19
>>> To: Lindenmaier, Goetz <[hidden email]>; Doerr, Martin <[hidden email]>; [hidden email]
>>> Cc: [hidden email]
>>> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>>
>>> Hi,
>>>
>>> Could the following change be reviewed please?
>>>
>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
>>> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>>>
>>> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
>>> extracting and checking bits in the Transactional Level field of TEXASR
>>> register.
>>>
>>> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM
>>> status register supports two bits to inform two different types of memory
>>> conflict between threads: non-transactional and transactional. According to how
>>> the jtreg RTM tests are designed the memory conflict counter counts
>>> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock
>>> is held on a static variable while another thread without any synchronization
>>> (non-trasactional) tries to modify the same variable. Hence that small
>>> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on
>>> Power. The memory conflict counter is not used in any other place besides by the
>>> RTM precise statistics (no decision is made by the JVM based on that amount).
>>>
>>> This change partially fixes some failures in
>>> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
>>> nested and memory conflict abort counters. The remaining issue will be fixed by
>>> aborting on calling JNI (next RFR).
>>>
>>>
>>> Thank you and best regards,
>>> Gustavo
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|

RE: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Doerr, Martin
Hi Gustavo,

your webrev v4_B looks good. Reviewed.

I think v4_A wouldn't work appropriately when using +1 and -1 bits together.

A generic version could be something like (just to explain what I was thinking about):
for (int nbit = 0; nbit < num_failure_bits; nbit++) {
  Label do_increment, check_abort;

  int last_match = -1;
  for (ncounter = 0; ncounter < num_counters; ncounter++) {
    if (last_match >= 0) {
      rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
      int selection = bit_counter_map[last_match][ncounter];
      if (selection == 1) {
        bne(CCR1, do_increment);
      } else if (selection == -1) {
        beq(CCR1, do_increment);
      }
    }
    last_match = nbit;
  }

  assert(last_match >= 0, "should have at least one");
  rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
  int selection = bit_counter_map[last_match][ncounter];
  if (selection == 1) {
    beq(CCR1, check_abort);
  } else if (selection == -1) {
    bne(CCR1, check_abort);
  }
 
  bind(do_increment);
  ld(temp_Reg, abort_counter_offs, rtm_counters_Reg);
  addi(temp_Reg, temp_Reg, 1);
  std(temp_Reg, abort_counter_offs, rtm_counters_Reg);
  bind(check_abort);
}

But I'm fine with webrev v4_B with the comment you have added.

Thanks,
Martin


-----Original Message-----
From: Gustavo Romero [mailto:[hidden email]]
Sent: Dienstag, 17. Juli 2018 08:55
To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; [hidden email]
Cc: [hidden email]
Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Hi Martin,

On 07/16/2018 05:17 AM, Doerr, Martin wrote:
> thanks for the new webrev.

Thanks a lot for the thorough review.


> You're right, the two bits should be mutual exclusive, so the sum is equivalent to the logical or in this case.
> However, the loop pretends to be generic, but it's not. The "or" only works for mutual exclusive bits.
> If you want to keep the code with the loops, I think there should be a comment explaining this.

I see. I've experimented a couple of options following your previous suggestion
of using the counter loop as the outer loop. Most difficult point was to work
around the constraint of having just R0 available as scratch. Bit/bitfield
extracting instrs did not help much since most of them can't perform an OR with
its destination operand, which gets worse with the R0 constraint. On the other,
afaics 'rldimi' which does an OR with its destination operand can't be used to
extract the bit/bitfields in a generic way. That best alternative I found was to
use both CCR0 and CCR1 and their EQ bits. I think all cases are covered now,
i.e. all bits/conditions for a given counter are ORed. failure_code logic is not
inverted any more in the map, which seems more natural. I added also more
information to the comment before the bit/counter map.
I think now the loop is generic.
Webrev for the final result:

http://cr.openjdk.java.net/~gromero/8205582/v4_A/


> Please also fix indentation.

Done.


> Alternatively, the loop could get replaced by some code for each bit. Given that the loop is not really generic, I think this wouldn't be worse.
>
> Besides that, it looks good. Thanks for improving it.

Due to the tight schedule I also provide the corrections for the last reviewed
version:

http://cr.openjdk.java.net/~gromero/8205582/v4_B/

If v4_A also looks good I vouch for pushing it instead of v4_B.


Best regards,
Gustavo

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero [mailto:[hidden email]]
> Sent: Freitag, 13. Juli 2018 16:56
> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; [hidden email]
> Cc: [hidden email]
> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>
> Hi Martin,
>
> On 07/12/2018 12:30 PM, Doerr, Martin wrote:
>> I think your new code may increment a counter twice when 2 bits are specified for it. I think this should get fixed.
>> Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right?
>
> Do you mean increment twice counter #2 (conflict counter) or another counter?
> non_trans_cf and trans_cf are mutually exclusive.
> You probably spotted a case I'm missing so I ask.
>
>
>> Besides that, I have some improvement proposals:
>>
>> - I think using 3 constant tables is not so good to read. For example, inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1 when using int instead of bool. Would this be better?
>>
>> - I think you can get rid of rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value.
>
> Cool :)
>
> New (interim) webrev:
> http://cr.openjdk.java.net/~gromero/8205582/v3/
>
> Thanks.
>
>
> Best regards,
> Gustavo
>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Gustavo Romero [mailto:[hidden email]]
>> Sent: Dienstag, 10. Juli 2018 18:59
>> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; [hidden email]
>> Cc: [hidden email]
>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>
>> Dear Martin,
>>
>> On 06/25/2018 01:31 PM, Doerr, Martin wrote:
>>> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp.
>>> Somebody may want to use the definition for other purposes.
>>
>> Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted
>> using 'rldicr' in macroAssembler.
>>
>>
>>> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].
>>
>> Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2
>> (conflict). Duplicated check code for tm_failure_bit[4] was removed and now
>> counter 4 (debug) is mapped to count traps or syscalls caught in TM events,
>> which seems a reasonable approximation to the original semantics of the debug
>> counter on Intel. Unfortunately I could not confirm on AIX how these two events
>> (trap and syscall in TM) will set the failure code, so the counter will never
>> track any information on AIX. But with the current proposed change that failure
>> code can be easily added in the future.
>>
>> I also realized that I used previously a wrong ME operand value in:
>>
>> +        // Extract 11 bits
>> +        rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);
>>
>> It should be 10 to extract 11 bits actually, so all extractions must be correct
>> now.
>>
>>
>> I hope you don't find the array map of failure bits vs counters overkilling.
>>
>> Finally, I replaced the comment:
>>
>> tm_tabort, // Note: Seems like signal handler sets this, too.
>>
>> by:
>>
>> tm_tabort, // Signal handler will set this too.
>>
>> because we just enable RTM support on Power if 'htm-nosc' is supported, so
>> treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR
>> afaik. Since debug counter now tracks trap/syscall in TM it's possible to check
>> that counter to verify the number of aborts cause by the kernel (if any).
>>
>> new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/
>>
>>
>> Best regards,
>> Gustavo
>>
>>> Best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Gustavo Romero [mailto:[hidden email]]
>>> Sent: Montag, 25. Juni 2018 10:19
>>> To: Lindenmaier, Goetz <[hidden email]>; Doerr, Martin <[hidden email]>; [hidden email]
>>> Cc: [hidden email]
>>> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>>
>>> Hi,
>>>
>>> Could the following change be reviewed please?
>>>
>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
>>> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>>>
>>> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
>>> extracting and checking bits in the Transactional Level field of TEXASR
>>> register.
>>>
>>> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM
>>> status register supports two bits to inform two different types of memory
>>> conflict between threads: non-transactional and transactional. According to how
>>> the jtreg RTM tests are designed the memory conflict counter counts
>>> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock
>>> is held on a static variable while another thread without any synchronization
>>> (non-trasactional) tries to modify the same variable. Hence that small
>>> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on
>>> Power. The memory conflict counter is not used in any other place besides by the
>>> RTM precise statistics (no decision is made by the JVM based on that amount).
>>>
>>> This change partially fixes some failures in
>>> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
>>> nested and memory conflict abort counters. The remaining issue will be fixed by
>>> aborting on calling JNI (next RFR).
>>>
>>>
>>> Thank you and best regards,
>>> Gustavo
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Gustavo Romero
Hi Martin,

OK. Let's continue with v4_B [1] so.

Thanks for reviewing it!


Best regards,
Gustavo

[1] http://cr.openjdk.java.net/~gromero/8205582/v4_B

On 07/17/2018 06:02 AM, Doerr, Martin wrote:

> Hi Gustavo,
>
> your webrev v4_B looks good. Reviewed.
>
> I think v4_A wouldn't work appropriately when using +1 and -1 bits together.
>
> A generic version could be something like (just to explain what I was thinking about):
> for (int nbit = 0; nbit < num_failure_bits; nbit++) {
>    Label do_increment, check_abort;
>
>    int last_match = -1;
>    for (ncounter = 0; ncounter < num_counters; ncounter++) {
>      if (last_match >= 0) {
>        rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
>        int selection = bit_counter_map[last_match][ncounter];
>        if (selection == 1) {
>          bne(CCR1, do_increment);
>        } else if (selection == -1) {
>          beq(CCR1, do_increment);
>        }
>      }
>      last_match = nbit;
>    }
>
>    assert(last_match >= 0, "should have at least one");
>    rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
>    int selection = bit_counter_map[last_match][ncounter];
>    if (selection == 1) {
>      beq(CCR1, check_abort);
>    } else if (selection == -1) {
>      bne(CCR1, check_abort);
>    }
>  
>    bind(do_increment);
>    ld(temp_Reg, abort_counter_offs, rtm_counters_Reg);
>    addi(temp_Reg, temp_Reg, 1);
>    std(temp_Reg, abort_counter_offs, rtm_counters_Reg);
>    bind(check_abort);
> }
>
> But I'm fine with webrev v4_B with the comment you have added.
>
> Thanks,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero [mailto:[hidden email]]
> Sent: Dienstag, 17. Juli 2018 08:55
> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; [hidden email]
> Cc: [hidden email]
> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>
> Hi Martin,
>
> On 07/16/2018 05:17 AM, Doerr, Martin wrote:
>> thanks for the new webrev.
>
> Thanks a lot for the thorough review.
>
>
>> You're right, the two bits should be mutual exclusive, so the sum is equivalent to the logical or in this case.
>> However, the loop pretends to be generic, but it's not. The "or" only works for mutual exclusive bits.
>> If you want to keep the code with the loops, I think there should be a comment explaining this.
>
> I see. I've experimented a couple of options following your previous suggestion
> of using the counter loop as the outer loop. Most difficult point was to work
> around the constraint of having just R0 available as scratch. Bit/bitfield
> extracting instrs did not help much since most of them can't perform an OR with
> its destination operand, which gets worse with the R0 constraint. On the other,
> afaics 'rldimi' which does an OR with its destination operand can't be used to
> extract the bit/bitfields in a generic way. That best alternative I found was to
> use both CCR0 and CCR1 and their EQ bits. I think all cases are covered now,
> i.e. all bits/conditions for a given counter are ORed. failure_code logic is not
> inverted any more in the map, which seems more natural. I added also more
> information to the comment before the bit/counter map.
> I think now the loop is generic.
> Webrev for the final result:
>
> http://cr.openjdk.java.net/~gromero/8205582/v4_A/
>
>
>> Please also fix indentation.
>
> Done.
>
>
>> Alternatively, the loop could get replaced by some code for each bit. Given that the loop is not really generic, I think this wouldn't be worse.
>>
>> Besides that, it looks good. Thanks for improving it.
>
> Due to the tight schedule I also provide the corrections for the last reviewed
> version:
>
> http://cr.openjdk.java.net/~gromero/8205582/v4_B/
>
> If v4_A also looks good I vouch for pushing it instead of v4_B.
>
>
> Best regards,
> Gustavo
>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Gustavo Romero [mailto:[hidden email]]
>> Sent: Freitag, 13. Juli 2018 16:56
>> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; [hidden email]
>> Cc: [hidden email]
>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>
>> Hi Martin,
>>
>> On 07/12/2018 12:30 PM, Doerr, Martin wrote:
>>> I think your new code may increment a counter twice when 2 bits are specified for it. I think this should get fixed.
>>> Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right?
>>
>> Do you mean increment twice counter #2 (conflict counter) or another counter?
>> non_trans_cf and trans_cf are mutually exclusive.
>> You probably spotted a case I'm missing so I ask.
>>
>>
>>> Besides that, I have some improvement proposals:
>>>
>>> - I think using 3 constant tables is not so good to read. For example, inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1 when using int instead of bool. Would this be better?
>>>
>>> - I think you can get rid of rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value.
>>
>> Cool :)
>>
>> New (interim) webrev:
>> http://cr.openjdk.java.net/~gromero/8205582/v3/
>>
>> Thanks.
>>
>>
>> Best regards,
>> Gustavo
>>
>>> Best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Gustavo Romero [mailto:[hidden email]]
>>> Sent: Dienstag, 10. Juli 2018 18:59
>>> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; [hidden email]
>>> Cc: [hidden email]
>>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>>
>>> Dear Martin,
>>>
>>> On 06/25/2018 01:31 PM, Doerr, Martin wrote:
>>>> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp.
>>>> Somebody may want to use the definition for other purposes.
>>>
>>> Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted
>>> using 'rldicr' in macroAssembler.
>>>
>>>
>>>> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].
>>>
>>> Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2
>>> (conflict). Duplicated check code for tm_failure_bit[4] was removed and now
>>> counter 4 (debug) is mapped to count traps or syscalls caught in TM events,
>>> which seems a reasonable approximation to the original semantics of the debug
>>> counter on Intel. Unfortunately I could not confirm on AIX how these two events
>>> (trap and syscall in TM) will set the failure code, so the counter will never
>>> track any information on AIX. But with the current proposed change that failure
>>> code can be easily added in the future.
>>>
>>> I also realized that I used previously a wrong ME operand value in:
>>>
>>> +        // Extract 11 bits
>>> +        rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);
>>>
>>> It should be 10 to extract 11 bits actually, so all extractions must be correct
>>> now.
>>>
>>>
>>> I hope you don't find the array map of failure bits vs counters overkilling.
>>>
>>> Finally, I replaced the comment:
>>>
>>> tm_tabort, // Note: Seems like signal handler sets this, too.
>>>
>>> by:
>>>
>>> tm_tabort, // Signal handler will set this too.
>>>
>>> because we just enable RTM support on Power if 'htm-nosc' is supported, so
>>> treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR
>>> afaik. Since debug counter now tracks trap/syscall in TM it's possible to check
>>> that counter to verify the number of aborts cause by the kernel (if any).
>>>
>>> new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/
>>>
>>>
>>> Best regards,
>>> Gustavo
>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Gustavo Romero [mailto:[hidden email]]
>>>> Sent: Montag, 25. Juni 2018 10:19
>>>> To: Lindenmaier, Goetz <[hidden email]>; Doerr, Martin <[hidden email]>; [hidden email]
>>>> Cc: [hidden email]
>>>> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>>>
>>>> Hi,
>>>>
>>>> Could the following change be reviewed please?
>>>>
>>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
>>>> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>>>>
>>>> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
>>>> extracting and checking bits in the Transactional Level field of TEXASR
>>>> register.
>>>>
>>>> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM
>>>> status register supports two bits to inform two different types of memory
>>>> conflict between threads: non-transactional and transactional. According to how
>>>> the jtreg RTM tests are designed the memory conflict counter counts
>>>> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock
>>>> is held on a static variable while another thread without any synchronization
>>>> (non-trasactional) tries to modify the same variable. Hence that small
>>>> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on
>>>> Power. The memory conflict counter is not used in any other place besides by the
>>>> RTM precise statistics (no decision is made by the JVM based on that amount).
>>>>
>>>> This change partially fixes some failures in
>>>> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
>>>> nested and memory conflict abort counters. The remaining issue will be fixed by
>>>> aborting on calling JNI (next RFR).
>>>>
>>>>
>>>> Thank you and best regards,
>>>> Gustavo
>>>>
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Gustavo Romero
Hi,

Could I get a second review for that change please?

Best regards,
Gustavo

On 07/17/2018 08:18 AM, Gustavo Romero wrote:

> Hi Martin,
>
> OK. Let's continue with v4_B [1] so.
>
> Thanks for reviewing it!
>
>
> Best regards,
> Gustavo
>
> [1] http://cr.openjdk.java.net/~gromero/8205582/v4_B
>
> On 07/17/2018 06:02 AM, Doerr, Martin wrote:
>> Hi Gustavo,
>>
>> your webrev v4_B looks good. Reviewed.
>>
>> I think v4_A wouldn't work appropriately when using +1 and -1 bits together.
>>
>> A generic version could be something like (just to explain what I was thinking about):
>> for (int nbit = 0; nbit < num_failure_bits; nbit++) {
>>    Label do_increment, check_abort;
>>
>>    int last_match = -1;
>>    for (ncounter = 0; ncounter < num_counters; ncounter++) {
>>      if (last_match >= 0) {
>>        rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
>>        int selection = bit_counter_map[last_match][ncounter];
>>        if (selection == 1) {
>>          bne(CCR1, do_increment);
>>        } else if (selection == -1) {
>>          beq(CCR1, do_increment);
>>        }
>>      }
>>      last_match = nbit;
>>    }
>>
>>    assert(last_match >= 0, "should have at least one");
>>    rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
>>    int selection = bit_counter_map[last_match][ncounter];
>>    if (selection == 1) {
>>      beq(CCR1, check_abort);
>>    } else if (selection == -1) {
>>      bne(CCR1, check_abort);
>>    }
>>    bind(do_increment);
>>    ld(temp_Reg, abort_counter_offs, rtm_counters_Reg);
>>    addi(temp_Reg, temp_Reg, 1);
>>    std(temp_Reg, abort_counter_offs, rtm_counters_Reg);
>>    bind(check_abort);
>> }
>>
>> But I'm fine with webrev v4_B with the comment you have added.
>>
>> Thanks,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Gustavo Romero [mailto:[hidden email]]
>> Sent: Dienstag, 17. Juli 2018 08:55
>> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; [hidden email]
>> Cc: [hidden email]
>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>
>> Hi Martin,
>>
>> On 07/16/2018 05:17 AM, Doerr, Martin wrote:
>>> thanks for the new webrev.
>>
>> Thanks a lot for the thorough review.
>>
>>
>>> You're right, the two bits should be mutual exclusive, so the sum is equivalent to the logical or in this case.
>>> However, the loop pretends to be generic, but it's not. The "or" only works for mutual exclusive bits.
>>> If you want to keep the code with the loops, I think there should be a comment explaining this.
>>
>> I see. I've experimented a couple of options following your previous suggestion
>> of using the counter loop as the outer loop. Most difficult point was to work
>> around the constraint of having just R0 available as scratch. Bit/bitfield
>> extracting instrs did not help much since most of them can't perform an OR with
>> its destination operand, which gets worse with the R0 constraint. On the other,
>> afaics 'rldimi' which does an OR with its destination operand can't be used to
>> extract the bit/bitfields in a generic way. That best alternative I found was to
>> use both CCR0 and CCR1 and their EQ bits. I think all cases are covered now,
>> i.e. all bits/conditions for a given counter are ORed. failure_code logic is not
>> inverted any more in the map, which seems more natural. I added also more
>> information to the comment before the bit/counter map.
>> I think now the loop is generic.
>> Webrev for the final result:
>>
>> http://cr.openjdk.java.net/~gromero/8205582/v4_A/
>>
>>
>>> Please also fix indentation.
>>
>> Done.
>>
>>
>>> Alternatively, the loop could get replaced by some code for each bit. Given that the loop is not really generic, I think this wouldn't be worse.
>>>
>>> Besides that, it looks good. Thanks for improving it.
>>
>> Due to the tight schedule I also provide the corrections for the last reviewed
>> version:
>>
>> http://cr.openjdk.java.net/~gromero/8205582/v4_B/
>>
>> If v4_A also looks good I vouch for pushing it instead of v4_B.
>>
>>
>> Best regards,
>> Gustavo
>>
>>> Best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Gustavo Romero [mailto:[hidden email]]
>>> Sent: Freitag, 13. Juli 2018 16:56
>>> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; [hidden email]
>>> Cc: [hidden email]
>>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>>
>>> Hi Martin,
>>>
>>> On 07/12/2018 12:30 PM, Doerr, Martin wrote:
>>>> I think your new code may increment a counter twice when 2 bits are specified for it. I think this should get fixed.
>>>> Iterating over the RTMLockingCounters in the outer loop and performing several checks before incrementing should fix this, right?
>>>
>>> Do you mean increment twice counter #2 (conflict counter) or another counter?
>>> non_trans_cf and trans_cf are mutually exclusive.
>>> You probably spotted a case I'm missing so I ask.
>>>
>>>
>>>> Besides that, I have some improvement proposals:
>>>>
>>>> - I think using 3 constant tables is not so good to read. For example, inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1 when using int instead of bool. Would this be better?
>>>>
>>>> - I think you can get rid of rtm_counters_Reg increment and restoration by computing the abort_offs relative to the original value.
>>>
>>> Cool :)
>>>
>>> New (interim) webrev:
>>> http://cr.openjdk.java.net/~gromero/8205582/v3/
>>>
>>> Thanks.
>>>
>>>
>>> Best regards,
>>> Gustavo
>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Gustavo Romero [mailto:[hidden email]]
>>>> Sent: Dienstag, 10. Juli 2018 18:59
>>>> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz <[hidden email]>; [hidden email]
>>>> Cc: [hidden email]
>>>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>>>
>>>> Dear Martin,
>>>>
>>>> On 06/25/2018 01:31 PM, Doerr, Martin wrote:
>>>>> I think it would be better to ignore bit 63 in the macroAssembler code and use the definition from the spec in assembler_ppc.hpp.
>>>>> Somebody may want to use the definition for other purposes.
>>>>
>>>> Done. transactional_level bit is 52 now so bits 52:62 can easily be extracted
>>>> using 'rldicr' in macroAssembler.
>>>>
>>>>
>>>>> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf would be a better match for x86's description for tm_failure_bit[2]. It's also a little unfortunate to print the same bit twice as tm_failure_bit[4].
>>>>
>>>> Done. Now both tm_trans_cf and tm_non_trans_cf failures will increment counter 2
>>>> (conflict). Duplicated check code for tm_failure_bit[4] was removed and now
>>>> counter 4 (debug) is mapped to count traps or syscalls caught in TM events,
>>>> which seems a reasonable approximation to the original semantics of the debug
>>>> counter on Intel. Unfortunately I could not confirm on AIX how these two events
>>>> (trap and syscall in TM) will set the failure code, so the counter will never
>>>> track any information on AIX. But with the current proposed change that failure
>>>> code can be easily added in the future.
>>>>
>>>> I also realized that I used previously a wrong ME operand value in:
>>>>
>>>> +        // Extract 11 bits
>>>> +        rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);
>>>>
>>>> It should be 10 to extract 11 bits actually, so all extractions must be correct
>>>> now.
>>>>
>>>>
>>>> I hope you don't find the array map of failure bits vs counters overkilling.
>>>>
>>>> Finally, I replaced the comment:
>>>>
>>>> tm_tabort, // Note: Seems like signal handler sets this, too.
>>>>
>>>> by:
>>>>
>>>> tm_tabort, // Signal handler will set this too.
>>>>
>>>> because we just enable RTM support on Power if 'htm-nosc' is supported, so
>>>> treclaim. on aborting the syscall will indeed always set Abort bit in TEXASR
>>>> afaik. Since debug counter now tracks trap/syscall in TM it's possible to check
>>>> that counter to verify the number of aborts cause by the kernel (if any).
>>>>
>>>> new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/
>>>>
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>>> Best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Gustavo Romero [mailto:[hidden email]]
>>>>> Sent: Montag, 25. Juni 2018 10:19
>>>>> To: Lindenmaier, Goetz <[hidden email]>; Doerr, Martin <[hidden email]>; [hidden email]
>>>>> Cc: [hidden email]
>>>>> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions
>>>>>
>>>>> Hi,
>>>>>
>>>>> Could the following change be reviewed please?
>>>>>
>>>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
>>>>> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>>>>>
>>>>> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
>>>>> extracting and checking bits in the Transactional Level field of TEXASR
>>>>> register.
>>>>>
>>>>> It also fixes the memory conflict counter (rtm lock aborts type 2). Power TM
>>>>> status register supports two bits to inform two different types of memory
>>>>> conflict between threads: non-transactional and transactional. According to how
>>>>> the jtreg RTM tests are designed the memory conflict counter counts
>>>>> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a RTM lock
>>>>> is held on a static variable while another thread without any synchronization
>>>>> (non-trasactional) tries to modify the same variable. Hence that small
>>>>> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making it pass on
>>>>> Power. The memory conflict counter is not used in any other place besides by the
>>>>> RTM precise statistics (no decision is made by the JVM based on that amount).
>>>>>
>>>>> This change partially fixes some failures in
>>>>> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java regarding the
>>>>> nested and memory conflict abort counters. The remaining issue will be fixed by
>>>>> aborting on calling JNI (next RFR).
>>>>>
>>>>>
>>>>> Thank you and best regards,
>>>>> Gustavo
>>>>>
>>>>
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|

RE: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Lindenmaier, Goetz
Hi Gustavo,

I had a look at your change. Basically looks good.

Some smaller things:
2440      Otherwise, counter will be increment
2441   // more than once.
This should read:
Otherwise, the counter will be incremented more than once.

Can you please put declaration and assignments on one line?
2472     int abortX_offs;
2473     abortX_offs = RTMLockingCounters::abortX_count_offset();
This should read:
2472     int abortX_offs = RTMLockingCounters::abortX_count_offset();
Similar 2479+2481.

I don't need a new webrev for these fixes. Reviewed.

Best regards,
  Goetz.

> -----Original Message-----
> From: Gustavo Romero [mailto:[hidden email]]
> Sent: Dienstag, 17. Juli 2018 15:45
> To: Lindenmaier, Goetz <[hidden email]>
> Cc: Doerr, Martin <[hidden email]>; hotspot-compiler-
> [hidden email]; [hidden email]
> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested
> transactions
>
> Hi,
>
> Could I get a second review for that change please?
>
> Best regards,
> Gustavo
>
> On 07/17/2018 08:18 AM, Gustavo Romero wrote:
> > Hi Martin,
> >
> > OK. Let's continue with v4_B [1] so.
> >
> > Thanks for reviewing it!
> >
> >
> > Best regards,
> > Gustavo
> >
> > [1] http://cr.openjdk.java.net/~gromero/8205582/v4_B
> >
> > On 07/17/2018 06:02 AM, Doerr, Martin wrote:
> >> Hi Gustavo,
> >>
> >> your webrev v4_B looks good. Reviewed.
> >>
> >> I think v4_A wouldn't work appropriately when using +1 and -1 bits
> together.
> >>
> >> A generic version could be something like (just to explain what I was
> thinking about):
> >> for (int nbit = 0; nbit < num_failure_bits; nbit++) {
> >>    Label do_increment, check_abort;
> >>
> >>    int last_match = -1;
> >>    for (ncounter = 0; ncounter < num_counters; ncounter++) {
> >>      if (last_match >= 0) {
> >>        rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
> >>        int selection = bit_counter_map[last_match][ncounter];
> >>        if (selection == 1) {
> >>          bne(CCR1, do_increment);
> >>        } else if (selection == -1) {
> >>          beq(CCR1, do_increment);
> >>        }
> >>      }
> >>      last_match = nbit;
> >>    }
> >>
> >>    assert(last_match >= 0, "should have at least one");
> >>    rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
> >>    int selection = bit_counter_map[last_match][ncounter];
> >>    if (selection == 1) {
> >>      beq(CCR1, check_abort);
> >>    } else if (selection == -1) {
> >>      bne(CCR1, check_abort);
> >>    }
> >>    bind(do_increment);
> >>    ld(temp_Reg, abort_counter_offs, rtm_counters_Reg);
> >>    addi(temp_Reg, temp_Reg, 1);
> >>    std(temp_Reg, abort_counter_offs, rtm_counters_Reg);
> >>    bind(check_abort);
> >> }
> >>
> >> But I'm fine with webrev v4_B with the comment you have added.
> >>
> >> Thanks,
> >> Martin
> >>
> >>
> >> -----Original Message-----
> >> From: Gustavo Romero [mailto:[hidden email]]
> >> Sent: Dienstag, 17. Juli 2018 08:55
> >> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz
> <[hidden email]>; [hidden email]
> >> Cc: [hidden email]
> >> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on
> nested transactions
> >>
> >> Hi Martin,
> >>
> >> On 07/16/2018 05:17 AM, Doerr, Martin wrote:
> >>> thanks for the new webrev.
> >>
> >> Thanks a lot for the thorough review.
> >>
> >>
> >>> You're right, the two bits should be mutual exclusive, so the sum is
> equivalent to the logical or in this case.
> >>> However, the loop pretends to be generic, but it's not. The "or" only
> works for mutual exclusive bits.
> >>> If you want to keep the code with the loops, I think there should be a
> comment explaining this.
> >>
> >> I see. I've experimented a couple of options following your previous
> suggestion
> >> of using the counter loop as the outer loop. Most difficult point was to
> work
> >> around the constraint of having just R0 available as scratch. Bit/bitfield
> >> extracting instrs did not help much since most of them can't perform an
> OR with
> >> its destination operand, which gets worse with the R0 constraint. On the
> other,
> >> afaics 'rldimi' which does an OR with its destination operand can't be used
> to
> >> extract the bit/bitfields in a generic way. That best alternative I found was
> to
> >> use both CCR0 and CCR1 and their EQ bits. I think all cases are covered
> now,
> >> i.e. all bits/conditions for a given counter are ORed. failure_code logic is
> not
> >> inverted any more in the map, which seems more natural. I added also
> more
> >> information to the comment before the bit/counter map.
> >> I think now the loop is generic.
> >> Webrev for the final result:
> >>
> >> http://cr.openjdk.java.net/~gromero/8205582/v4_A/
> >>
> >>
> >>> Please also fix indentation.
> >>
> >> Done.
> >>
> >>
> >>> Alternatively, the loop could get replaced by some code for each bit.
> Given that the loop is not really generic, I think this wouldn't be worse.
> >>>
> >>> Besides that, it looks good. Thanks for improving it.
> >>
> >> Due to the tight schedule I also provide the corrections for the last
> reviewed
> >> version:
> >>
> >> http://cr.openjdk.java.net/~gromero/8205582/v4_B/
> >>
> >> If v4_A also looks good I vouch for pushing it instead of v4_B.
> >>
> >>
> >> Best regards,
> >> Gustavo
> >>
> >>> Best regards,
> >>> Martin
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Gustavo Romero [mailto:[hidden email]]
> >>> Sent: Freitag, 13. Juli 2018 16:56
> >>> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz
> <[hidden email]>; [hidden email]
> >>> Cc: [hidden email]
> >>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on
> nested transactions
> >>>
> >>> Hi Martin,
> >>>
> >>> On 07/12/2018 12:30 PM, Doerr, Martin wrote:
> >>>> I think your new code may increment a counter twice when 2 bits are
> specified for it. I think this should get fixed.
> >>>> Iterating over the RTMLockingCounters in the outer loop and
> performing several checks before incrementing should fix this, right?
> >>>
> >>> Do you mean increment twice counter #2 (conflict counter) or another
> counter?
> >>> non_trans_cf and trans_cf are mutually exclusive.
> >>> You probably spotted a case I'm missing so I ask.
> >>>
> >>>
> >>>> Besides that, I have some improvement proposals:
> >>>>
> >>>> - I think using 3 constant tables is not so good to read. For example,
> inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1
> when using int instead of bool. Would this be better?
> >>>>
> >>>> - I think you can get rid of rtm_counters_Reg increment and restoration
> by computing the abort_offs relative to the original value.
> >>>
> >>> Cool :)
> >>>
> >>> New (interim) webrev:
> >>> http://cr.openjdk.java.net/~gromero/8205582/v3/
> >>>
> >>> Thanks.
> >>>
> >>>
> >>> Best regards,
> >>> Gustavo
> >>>
> >>>> Best regards,
> >>>> Martin
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: Gustavo Romero [mailto:[hidden email]]
> >>>> Sent: Dienstag, 10. Juli 2018 18:59
> >>>> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz
> <[hidden email]>; [hidden email]
> >>>> Cc: [hidden email]
> >>>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on
> nested transactions
> >>>>
> >>>> Dear Martin,
> >>>>
> >>>> On 06/25/2018 01:31 PM, Doerr, Martin wrote:
> >>>>> I think it would be better to ignore bit 63 in the macroAssembler code
> and use the definition from the spec in assembler_ppc.hpp.
> >>>>> Somebody may want to use the definition for other purposes.
> >>>>
> >>>> Done. transactional_level bit is 52 now so bits 52:62 can easily be
> extracted
> >>>> using 'rldicr' in macroAssembler.
> >>>>
> >>>>
> >>>>> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf
> would be a better match for x86's description for tm_failure_bit[2]. It's also a
> little unfortunate to print the same bit twice as tm_failure_bit[4].
> >>>>
> >>>> Done. Now both tm_trans_cf and tm_non_trans_cf failures will
> increment counter 2
> >>>> (conflict). Duplicated check code for tm_failure_bit[4] was removed
> and now
> >>>> counter 4 (debug) is mapped to count traps or syscalls caught in TM
> events,
> >>>> which seems a reasonable approximation to the original semantics of
> the debug
> >>>> counter on Intel. Unfortunately I could not confirm on AIX how these
> two events
> >>>> (trap and syscall in TM) will set the failure code, so the counter will
> never
> >>>> track any information on AIX. But with the current proposed change
> that failure
> >>>> code can be easily added in the future.
> >>>>
> >>>> I also realized that I used previously a wrong ME operand value in:
> >>>>
> >>>> +        // Extract 11 bits
> >>>> +        rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);
> >>>>
> >>>> It should be 10 to extract 11 bits actually, so all extractions must be
> correct
> >>>> now.
> >>>>
> >>>>
> >>>> I hope you don't find the array map of failure bits vs counters
> overkilling.
> >>>>
> >>>> Finally, I replaced the comment:
> >>>>
> >>>> tm_tabort, // Note: Seems like signal handler sets this, too.
> >>>>
> >>>> by:
> >>>>
> >>>> tm_tabort, // Signal handler will set this too.
> >>>>
> >>>> because we just enable RTM support on Power if 'htm-nosc' is
> supported, so
> >>>> treclaim. on aborting the syscall will indeed always set Abort bit in
> TEXASR
> >>>> afaik. Since debug counter now tracks trap/syscall in TM it's possible to
> check
> >>>> that counter to verify the number of aborts cause by the kernel (if
> any).
> >>>>
> >>>> new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/
> >>>>
> >>>>
> >>>> Best regards,
> >>>> Gustavo
> >>>>
> >>>>> Best regards,
> >>>>> Martin
> >>>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Gustavo Romero [mailto:[hidden email]]
> >>>>> Sent: Montag, 25. Juni 2018 10:19
> >>>>> To: Lindenmaier, Goetz <[hidden email]>; Doerr, Martin
> <[hidden email]>; [hidden email]
> >>>>> Cc: [hidden email]
> >>>>> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on
> nested transactions
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> Could the following change be reviewed please?
> >>>>>
> >>>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
> >>>>> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
> >>>>>
> >>>>> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
> >>>>> extracting and checking bits in the Transactional Level field of TEXASR
> >>>>> register.
> >>>>>
> >>>>> It also fixes the memory conflict counter (rtm lock aborts type 2).
> Power TM
> >>>>> status register supports two bits to inform two different types of
> memory
> >>>>> conflict between threads: non-transactional and transactional.
> According to how
> >>>>> the jtreg RTM tests are designed the memory conflict counter counts
> >>>>> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a
> RTM lock
> >>>>> is held on a static variable while another thread without any
> synchronization
> >>>>> (non-trasactional) tries to modify the same variable. Hence that small
> >>>>> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making
> it pass on
> >>>>> Power. The memory conflict counter is not used in any other place
> besides by the
> >>>>> RTM precise statistics (no decision is made by the JVM based on that
> amount).
> >>>>>
> >>>>> This change partially fixes some failures in
> >>>>> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
> regarding the
> >>>>> nested and memory conflict abort counters. The remaining issue will
> be fixed by
> >>>>> aborting on calling JNI (next RFR).
> >>>>>
> >>>>>
> >>>>> Thank you and best regards,
> >>>>> Gustavo
> >>>>>
> >>>>
> >>>
> >>
> >

Reply | Threaded
Open this post in threaded view
|

Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested transactions

Gustavo Romero
On 07/18/2018 04:34 AM, Lindenmaier, Goetz wrote:

> Hi Gustavo,
>
> I had a look at your change. Basically looks good.
>
> Some smaller things:
> 2440      Otherwise, counter will be increment
> 2441   // more than once.
> This should read:
> Otherwise, the counter will be incremented more than once.
>
> Can you please put declaration and assignments on one line?
> 2472     int abortX_offs;
> 2473     abortX_offs = RTMLockingCounters::abortX_count_offset();
> This should read:
> 2472     int abortX_offs = RTMLockingCounters::abortX_count_offset();
> Similar 2479+2481.
>
> I don't need a new webrev for these fixes. Reviewed.

Thanks, Goetz. I'll fix them and push the change to jdk/jdk11 today.


Best regards,
Gustavo
 

> Best regards,
>    Goetz.
>
>> -----Original Message-----
>> From: Gustavo Romero [mailto:[hidden email]]
>> Sent: Dienstag, 17. Juli 2018 15:45
>> To: Lindenmaier, Goetz <[hidden email]>
>> Cc: Doerr, Martin <[hidden email]>; hotspot-compiler-
>> [hidden email]; [hidden email]
>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on nested
>> transactions
>>
>> Hi,
>>
>> Could I get a second review for that change please?
>>
>> Best regards,
>> Gustavo
>>
>> On 07/17/2018 08:18 AM, Gustavo Romero wrote:
>>> Hi Martin,
>>>
>>> OK. Let's continue with v4_B [1] so.
>>>
>>> Thanks for reviewing it!
>>>
>>>
>>> Best regards,
>>> Gustavo
>>>
>>> [1] http://cr.openjdk.java.net/~gromero/8205582/v4_B
>>>
>>> On 07/17/2018 06:02 AM, Doerr, Martin wrote:
>>>> Hi Gustavo,
>>>>
>>>> your webrev v4_B looks good. Reviewed.
>>>>
>>>> I think v4_A wouldn't work appropriately when using +1 and -1 bits
>> together.
>>>>
>>>> A generic version could be something like (just to explain what I was
>> thinking about):
>>>> for (int nbit = 0; nbit < num_failure_bits; nbit++) {
>>>>     Label do_increment, check_abort;
>>>>
>>>>     int last_match = -1;
>>>>     for (ncounter = 0; ncounter < num_counters; ncounter++) {
>>>>       if (last_match >= 0) {
>>>>         rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
>>>>         int selection = bit_counter_map[last_match][ncounter];
>>>>         if (selection == 1) {
>>>>           bne(CCR1, do_increment);
>>>>         } else if (selection == -1) {
>>>>           beq(CCR1, do_increment);
>>>>         }
>>>>       }
>>>>       last_match = nbit;
>>>>     }
>>>>
>>>>     assert(last_match >= 0, "should have at least one");
>>>>     rldicr_(temp_Reg, abort_status_R0, failure_bit[last_match], 0);
>>>>     int selection = bit_counter_map[last_match][ncounter];
>>>>     if (selection == 1) {
>>>>       beq(CCR1, check_abort);
>>>>     } else if (selection == -1) {
>>>>       bne(CCR1, check_abort);
>>>>     }
>>>>     bind(do_increment);
>>>>     ld(temp_Reg, abort_counter_offs, rtm_counters_Reg);
>>>>     addi(temp_Reg, temp_Reg, 1);
>>>>     std(temp_Reg, abort_counter_offs, rtm_counters_Reg);
>>>>     bind(check_abort);
>>>> }
>>>>
>>>> But I'm fine with webrev v4_B with the comment you have added.
>>>>
>>>> Thanks,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Gustavo Romero [mailto:[hidden email]]
>>>> Sent: Dienstag, 17. Juli 2018 08:55
>>>> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz
>> <[hidden email]>; [hidden email]
>>>> Cc: [hidden email]
>>>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on
>> nested transactions
>>>>
>>>> Hi Martin,
>>>>
>>>> On 07/16/2018 05:17 AM, Doerr, Martin wrote:
>>>>> thanks for the new webrev.
>>>>
>>>> Thanks a lot for the thorough review.
>>>>
>>>>
>>>>> You're right, the two bits should be mutual exclusive, so the sum is
>> equivalent to the logical or in this case.
>>>>> However, the loop pretends to be generic, but it's not. The "or" only
>> works for mutual exclusive bits.
>>>>> If you want to keep the code with the loops, I think there should be a
>> comment explaining this.
>>>>
>>>> I see. I've experimented a couple of options following your previous
>> suggestion
>>>> of using the counter loop as the outer loop. Most difficult point was to
>> work
>>>> around the constraint of having just R0 available as scratch. Bit/bitfield
>>>> extracting instrs did not help much since most of them can't perform an
>> OR with
>>>> its destination operand, which gets worse with the R0 constraint. On the
>> other,
>>>> afaics 'rldimi' which does an OR with its destination operand can't be used
>> to
>>>> extract the bit/bitfields in a generic way. That best alternative I found was
>> to
>>>> use both CCR0 and CCR1 and their EQ bits. I think all cases are covered
>> now,
>>>> i.e. all bits/conditions for a given counter are ORed. failure_code logic is
>> not
>>>> inverted any more in the map, which seems more natural. I added also
>> more
>>>> information to the comment before the bit/counter map.
>>>> I think now the loop is generic.
>>>> Webrev for the final result:
>>>>
>>>> http://cr.openjdk.java.net/~gromero/8205582/v4_A/
>>>>
>>>>
>>>>> Please also fix indentation.
>>>>
>>>> Done.
>>>>
>>>>
>>>>> Alternatively, the loop could get replaced by some code for each bit.
>> Given that the loop is not really generic, I think this wouldn't be worse.
>>>>>
>>>>> Besides that, it looks good. Thanks for improving it.
>>>>
>>>> Due to the tight schedule I also provide the corrections for the last
>> reviewed
>>>> version:
>>>>
>>>> http://cr.openjdk.java.net/~gromero/8205582/v4_B/
>>>>
>>>> If v4_A also looks good I vouch for pushing it instead of v4_B.
>>>>
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>>> Best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Gustavo Romero [mailto:[hidden email]]
>>>>> Sent: Freitag, 13. Juli 2018 16:56
>>>>> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz
>> <[hidden email]>; [hidden email]
>>>>> Cc: [hidden email]
>>>>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on
>> nested transactions
>>>>>
>>>>> Hi Martin,
>>>>>
>>>>> On 07/12/2018 12:30 PM, Doerr, Martin wrote:
>>>>>> I think your new code may increment a counter twice when 2 bits are
>> specified for it. I think this should get fixed.
>>>>>> Iterating over the RTMLockingCounters in the outer loop and
>> performing several checks before incrementing should fix this, right?
>>>>>
>>>>> Do you mean increment twice counter #2 (conflict counter) or another
>> counter?
>>>>> non_trans_cf and trans_cf are mutually exclusive.
>>>>> You probably spotted a case I'm missing so I ask.
>>>>>
>>>>>
>>>>>> Besides that, I have some improvement proposals:
>>>>>>
>>>>>> - I think using 3 constant tables is not so good to read. For example,
>> inverting could be encoded in your new 2 dimensional table by using -1, 0 , +1
>> when using int instead of bool. Would this be better?
>>>>>>
>>>>>> - I think you can get rid of rtm_counters_Reg increment and restoration
>> by computing the abort_offs relative to the original value.
>>>>>
>>>>> Cool :)
>>>>>
>>>>> New (interim) webrev:
>>>>> http://cr.openjdk.java.net/~gromero/8205582/v3/
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Gustavo
>>>>>
>>>>>> Best regards,
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Gustavo Romero [mailto:[hidden email]]
>>>>>> Sent: Dienstag, 10. Juli 2018 18:59
>>>>>> To: Doerr, Martin <[hidden email]>; Lindenmaier, Goetz
>> <[hidden email]>; [hidden email]
>>>>>> Cc: [hidden email]
>>>>>> Subject: Re: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on
>> nested transactions
>>>>>>
>>>>>> Dear Martin,
>>>>>>
>>>>>> On 06/25/2018 01:31 PM, Doerr, Martin wrote:
>>>>>>> I think it would be better to ignore bit 63 in the macroAssembler code
>> and use the definition from the spec in assembler_ppc.hpp.
>>>>>>> Somebody may want to use the definition for other purposes.
>>>>>>
>>>>>> Done. transactional_level bit is 52 now so bits 52:62 can easily be
>> extracted
>>>>>> using 'rldicr' in macroAssembler.
>>>>>>
>>>>>>
>>>>>>> I wonder if Assembler::tm_trans_cf | Assembler::tm_non_trans_cf
>> would be a better match for x86's description for tm_failure_bit[2]. It's also a
>> little unfortunate to print the same bit twice as tm_failure_bit[4].
>>>>>>
>>>>>> Done. Now both tm_trans_cf and tm_non_trans_cf failures will
>> increment counter 2
>>>>>> (conflict). Duplicated check code for tm_failure_bit[4] was removed
>> and now
>>>>>> counter 4 (debug) is mapped to count traps or syscalls caught in TM
>> events,
>>>>>> which seems a reasonable approximation to the original semantics of
>> the debug
>>>>>> counter on Intel. Unfortunately I could not confirm on AIX how these
>> two events
>>>>>> (trap and syscall in TM) will set the failure code, so the counter will
>> never
>>>>>> track any information on AIX. But with the current proposed change
>> that failure
>>>>>> code can be easily added in the future.
>>>>>>
>>>>>> I also realized that I used previously a wrong ME operand value in:
>>>>>>
>>>>>> +        // Extract 11 bits
>>>>>> +        rldicr_(temp_Reg, abort_status, tm_failure_bit[i], 11);
>>>>>>
>>>>>> It should be 10 to extract 11 bits actually, so all extractions must be
>> correct
>>>>>> now.
>>>>>>
>>>>>>
>>>>>> I hope you don't find the array map of failure bits vs counters
>> overkilling.
>>>>>>
>>>>>> Finally, I replaced the comment:
>>>>>>
>>>>>> tm_tabort, // Note: Seems like signal handler sets this, too.
>>>>>>
>>>>>> by:
>>>>>>
>>>>>> tm_tabort, // Signal handler will set this too.
>>>>>>
>>>>>> because we just enable RTM support on Power if 'htm-nosc' is
>> supported, so
>>>>>> treclaim. on aborting the syscall will indeed always set Abort bit in
>> TEXASR
>>>>>> afaik. Since debug counter now tracks trap/syscall in TM it's possible to
>> check
>>>>>> that counter to verify the number of aborts cause by the kernel (if
>> any).
>>>>>>
>>>>>> new webrev: http://cr.openjdk.java.net/~gromero/8205582/v2/
>>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Gustavo
>>>>>>
>>>>>>> Best regards,
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Gustavo Romero [mailto:[hidden email]]
>>>>>>> Sent: Montag, 25. Juni 2018 10:19
>>>>>>> To: Lindenmaier, Goetz <[hidden email]>; Doerr, Martin
>> <[hidden email]>; [hidden email]
>>>>>>> Cc: [hidden email]
>>>>>>> Subject: RFR(s): 8205582: PPC64: RTM: Fix counter for aborts on
>> nested transactions
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Could the following change be reviewed please?
>>>>>>>
>>>>>>> bug   : https://bugs.openjdk.java.net/browse/JDK-8205582
>>>>>>> webrev: http://cr.openjdk.java.net/~gromero/8205582/v1/
>>>>>>>
>>>>>>> It fixes the RTM counter for nested aborts (rtm lock aborts type 5) by
>>>>>>> extracting and checking bits in the Transactional Level field of TEXASR
>>>>>>> register.
>>>>>>>
>>>>>>> It also fixes the memory conflict counter (rtm lock aborts type 2).
>> Power TM
>>>>>>> status register supports two bits to inform two different types of
>> memory
>>>>>>> conflict between threads: non-transactional and transactional.
>> According to how
>>>>>>> the jtreg RTM tests are designed the memory conflict counter counts
>>>>>>> non-transactional conflicts: on TestPrintPreciseRTMLockingStatistics a
>> RTM lock
>>>>>>> is held on a static variable while another thread without any
>> synchronization
>>>>>>> (non-trasactional) tries to modify the same variable. Hence that small
>>>>>>> adjustment satisfies the TestPrintPreciseRTMLockingStatistics making
>> it pass on
>>>>>>> Power. The memory conflict counter is not used in any other place
>> besides by the
>>>>>>> RTM precise statistics (no decision is made by the JVM based on that
>> amount).
>>>>>>>
>>>>>>> This change partially fixes some failures in
>>>>>>> compiler/rtm/print/TestPrintPreciseRTMLockingStatistics.java
>> regarding the
>>>>>>> nested and memory conflict abort counters. The remaining issue will
>> be fixed by
>>>>>>> aborting on calling JNI (next RFR).
>>>>>>>
>>>>>>>
>>>>>>> Thank you and best regards,
>>>>>>> Gustavo
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>