RFR 6913047: SunPKCS11 memory leak

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR 6913047: SunPKCS11 memory leak

Martin Balao
Hi,

I'd like to propose a fix for bug JDK-6913047: "Long term memory leak when using PKCS11 and JCE exceeds 32 bit process address space" [1]. This fix does not contain changes in the GC and is SunPKCS11 internal only.

PROBLEM
........................................................

When using the SunPKCS11 crypto provider (for cipher, signature, mac, key generation or any other operation), multiple key objects may be created. I.e.: every time a TLS session is established, a unique master key (derived from the pre-master-key) has to be created and then used for encryption and decryption operations. This is a legitimate use case in which key caching does not make sense as each key is unique per session. These keys are of P11Key type and have a corresponding native key object created. In the case of NSS SunPKCS11 backend (PKCS11 software token), this native key object is temporarily stored in the process native heap. The interface is simple: a JNI call is done to create a native key object (C_CreateObject, C_CopyObject, C_DeriveKey, C_GenerateKeys, etc., according to the PKCS11 interface) and an integer handler is kept in the Java side (P11Key). When the P11Key object is destroyed, a finalizer code is executed to free the native key object (through C_DestroyObject JNI call). The problem is that finalizer code execution happens only if the JVM garbage-collector cleans up the P11Key object. That may be delayed or not done at all, depending on different GC algorithms, parameters and environment conditions. As a result, the native heap may be exhausted with not freed native key objects, and the JVM will then crash -this is particularly true for 32 bits VMs where the virtual address space can be exhausted-.


SCOPE
........................................................

The fix is proposed for SunPKCS11 with NSS backend only. Other PKCS11 backends are not currently under scope. It's likely that hardware PKCS11 backends store native key objects in their own memory, preventing a native heap exhaustion and a JVM crash. However, it might be possible to cause an exhaustion on their own memory blocking key objects creation at some point. In any case, this is speculative as no tests were done on our side with real hardware.


SOLUTION
........................................................

Assuming that native keys are extractable, the idea is to hold native key data in the Java heap while keys are not in use. When a P11Key is created, every CK_ATTRIBUTE (PKCS11) value for the native key is queried, data stored in an opaque Java byte[] (inside the P11Key object) and native key destroyed. Every time the P11Key is about to be used, the native key is created with the stored data. After usage, the native key is again destroyed. Thus, it's not necessary to wait for a finalizer execution to cleanup native resources: cleanup is done at deterministic and previously-known points. This comes with a resposibility for key users -which are all SunPKCS11 internal services like P11Signature, P11Cipher, P11KeyGenerator, etc.-: create and destroy native keys through a reference counting scheme exposed by P11Key class. There are two kind of usages:

 1) stateless: the native key is "atomically" created, used and destroyed. I.e.: MAC calculation, getEncodedInternal operation (on P11Key objects), signature operations, TLS key derivation, etc.

 2) statefull: the native key is created, one or multiple intermediate actions are performed by the key user, a final action is performed and finally the native key is destroyed. I.e.: cipher operations.

For keys that are extractable but sensitive (CKA_SENSITIVE attribute is true), as the case when operating in FIPS mode, wrapping/unwrapping is used as a workaround to extract session keys. Wrapper key is global and lives forever.

There are no interface changes for SunPKCS11 external users.

If keys are not extractable or the feature cannot be enabled for any other reason, the previous finalizer scheme is used as a fallback.


ADDITIONAL IMPLEMENTATION NOTES
........................................................

When a P11Key is created, a constructor parameter exists to indicate if the feature is enabled for that specific key. For this feature to be enabled, 2 additional conditions apply: 1) SunPKCS11 backend has to be NSS, and 2) key has to be extractable. If the feature is not enabled for a key, behavior is as previous to this patch (native key destruction relies on finalizer execution).

The only P11Key user that explicitly does not use this feature is P11KeyStore. This is because these keys (token keys) are managed by alias names and makes no sense to remove them from the key store (they need to be accessible by an alias at any time).

Because P11Key objects can be used by multiple threads at a time, there is a thread-safe reference counting scheme in order to decide when a native key object has to be created or destroyed. The SunPKCS11 internal API to use a P11Key is as follows: 1) increment the reference counter (which will eventually create the native key object if it doesn't exist), 2) use the key and 3) decrement the reference counter (which will eventually destroy the native key if there it's not being used by anyone else).

The reason why an opaque byte[] is used in P11Key objects to store native keys data (instead of a CK_ATTRIBUTE[] Java objects, queried by Java's C_GetAttributeValue function) is performance. My prototypes show a difference of 4x in speed. 2 functions were added to libj2pkcs11 library: getNativeKeyInfo (to extract the opaque byte[] from a native key object) and createNativeKey (to create a native key object from an opaque byte[]).


CHANGESET
........................................................

This changeset is JDK-10 (at jdk c8796a577885 rev) based:



TESTING
........................................................


 * Suite (Bug6913047.java)
  * Tests JVM memory exhaustion while using keys for different services: P11Cipher, P11Signature, P11KeyAgreement, P11Mac, P11Digest, P11KeyGenerator, P11KeyFactory, etc.
  * Tests functional regression.
   * Including Key Stores (P11KeyStore)

Parameters to run the reproducer (on JDK-10): 
 * javac: --add-modules jdk.crypto.cryptoki --add-exports java.base/sun.security.internal.spec=ALL-UNNAMED 
 * java: -XX:+UseParallelGC -Xmx3500m --add-modules jdk.crypto.cryptoki --add-opens java.base/javax.crypto=ALL-UNNAMED --add-opens jdk.crypto.cryptoki/sun.security.pkcs11=ALL-UNNAMED

You can also use jtreg.


PERFORMANCE
........................................................

For a quick reproducer previously developed (which looped 100000 times creating P11Cipher and P11Key objects to encrypt a plaintext), these are the figures I got:

 * real 1m11.328s (without fix)
 * real 1m12.795s (with fix)

Performance penalty seems to be low in current state.

OTHER
.......................................................

My employer has an OCA agreement with Oracle and this work has been done in that context.

Look forward to your comments.

Kind regards,
Martin.-

--
Reply | Threaded
Open this post in threaded view
|

Re: RFR 6913047: SunPKCS11 memory leak

Martin Balao
SCOPE clarification:

 * The proposed patch should be solving the issue in HSMs also. Real hardware testing would be needed to check that, but it should be working.

On Wed, Oct 11, 2017 at 10:31 AM, Martin Balao <[hidden email]> wrote:
Hi,

I'd like to propose a fix for bug JDK-6913047: "Long term memory leak when using PKCS11 and JCE exceeds 32 bit process address space" [1]. This fix does not contain changes in the GC and is SunPKCS11 internal only.

PROBLEM
........................................................

When using the SunPKCS11 crypto provider (for cipher, signature, mac, key generation or any other operation), multiple key objects may be created. I.e.: every time a TLS session is established, a unique master key (derived from the pre-master-key) has to be created and then used for encryption and decryption operations. This is a legitimate use case in which key caching does not make sense as each key is unique per session. These keys are of P11Key type and have a corresponding native key object created. In the case of NSS SunPKCS11 backend (PKCS11 software token), this native key object is temporarily stored in the process native heap. The interface is simple: a JNI call is done to create a native key object (C_CreateObject, C_CopyObject, C_DeriveKey, C_GenerateKeys, etc., according to the PKCS11 interface) and an integer handler is kept in the Java side (P11Key). When the P11Key object is destroyed, a finalizer code is executed to free the native key object (through C_DestroyObject JNI call). The problem is that finalizer code execution happens only if the JVM garbage-collector cleans up the P11Key object. That may be delayed or not done at all, depending on different GC algorithms, parameters and environment conditions. As a result, the native heap may be exhausted with not freed native key objects, and the JVM will then crash -this is particularly true for 32 bits VMs where the virtual address space can be exhausted-.


SCOPE
........................................................

The fix is proposed for SunPKCS11 with NSS backend only. Other PKCS11 backends are not currently under scope. It's likely that hardware PKCS11 backends store native key objects in their own memory, preventing a native heap exhaustion and a JVM crash. However, it might be possible to cause an exhaustion on their own memory blocking key objects creation at some point. In any case, this is speculative as no tests were done on our side with real hardware.


SOLUTION
........................................................

Assuming that native keys are extractable, the idea is to hold native key data in the Java heap while keys are not in use. When a P11Key is created, every CK_ATTRIBUTE (PKCS11) value for the native key is queried, data stored in an opaque Java byte[] (inside the P11Key object) and native key destroyed. Every time the P11Key is about to be used, the native key is created with the stored data. After usage, the native key is again destroyed. Thus, it's not necessary to wait for a finalizer execution to cleanup native resources: cleanup is done at deterministic and previously-known points. This comes with a resposibility for key users -which are all SunPKCS11 internal services like P11Signature, P11Cipher, P11KeyGenerator, etc.-: create and destroy native keys through a reference counting scheme exposed by P11Key class. There are two kind of usages:

 1) stateless: the native key is "atomically" created, used and destroyed. I.e.: MAC calculation, getEncodedInternal operation (on P11Key objects), signature operations, TLS key derivation, etc.

 2) statefull: the native key is created, one or multiple intermediate actions are performed by the key user, a final action is performed and finally the native key is destroyed. I.e.: cipher operations.

For keys that are extractable but sensitive (CKA_SENSITIVE attribute is true), as the case when operating in FIPS mode, wrapping/unwrapping is used as a workaround to extract session keys. Wrapper key is global and lives forever.

There are no interface changes for SunPKCS11 external users.

If keys are not extractable or the feature cannot be enabled for any other reason, the previous finalizer scheme is used as a fallback.


ADDITIONAL IMPLEMENTATION NOTES
........................................................

When a P11Key is created, a constructor parameter exists to indicate if the feature is enabled for that specific key. For this feature to be enabled, 2 additional conditions apply: 1) SunPKCS11 backend has to be NSS, and 2) key has to be extractable. If the feature is not enabled for a key, behavior is as previous to this patch (native key destruction relies on finalizer execution).

The only P11Key user that explicitly does not use this feature is P11KeyStore. This is because these keys (token keys) are managed by alias names and makes no sense to remove them from the key store (they need to be accessible by an alias at any time).

Because P11Key objects can be used by multiple threads at a time, there is a thread-safe reference counting scheme in order to decide when a native key object has to be created or destroyed. The SunPKCS11 internal API to use a P11Key is as follows: 1) increment the reference counter (which will eventually create the native key object if it doesn't exist), 2) use the key and 3) decrement the reference counter (which will eventually destroy the native key if there it's not being used by anyone else).

The reason why an opaque byte[] is used in P11Key objects to store native keys data (instead of a CK_ATTRIBUTE[] Java objects, queried by Java's C_GetAttributeValue function) is performance. My prototypes show a difference of 4x in speed. 2 functions were added to libj2pkcs11 library: getNativeKeyInfo (to extract the opaque byte[] from a native key object) and createNativeKey (to create a native key object from an opaque byte[]).


CHANGESET
........................................................

This changeset is JDK-10 (at jdk c8796a577885 rev) based:



TESTING
........................................................


 * Suite (Bug6913047.java)
  * Tests JVM memory exhaustion while using keys for different services: P11Cipher, P11Signature, P11KeyAgreement, P11Mac, P11Digest, P11KeyGenerator, P11KeyFactory, etc.
  * Tests functional regression.
   * Including Key Stores (P11KeyStore)

Parameters to run the reproducer (on JDK-10): 
 * javac: --add-modules jdk.crypto.cryptoki --add-exports java.base/sun.security.internal.spec=ALL-UNNAMED 
 * java: -XX:+UseParallelGC -Xmx3500m --add-modules jdk.crypto.cryptoki --add-opens java.base/javax.crypto=ALL-UNNAMED --add-opens jdk.crypto.cryptoki/sun.security.pkcs11=ALL-UNNAMED

You can also use jtreg.


PERFORMANCE
........................................................

For a quick reproducer previously developed (which looped 100000 times creating P11Cipher and P11Key objects to encrypt a plaintext), these are the figures I got:

 * real 1m11.328s (without fix)
 * real 1m12.795s (with fix)

Performance penalty seems to be low in current state.

OTHER
.......................................................

My employer has an OCA agreement with Oracle and this work has been done in that context.

Look forward to your comments.

Kind regards,
Martin.-

--

Reply | Threaded
Open this post in threaded view
|

Re: RFR 6913047: SunPKCS11 memory leak

Martin Balao
In reply to this post by Martin Balao

On Wed, Oct 11, 2017 at 10:31 AM, Martin Balao <[hidden email]> wrote:
Hi,

I'd like to propose a fix for bug JDK-6913047: "Long term memory leak when using PKCS11 and JCE exceeds 32 bit process address space" [1]. This fix does not contain changes in the GC and is SunPKCS11 internal only.

PROBLEM
........................................................

When using the SunPKCS11 crypto provider (for cipher, signature, mac, key generation or any other operation), multiple key objects may be created. I.e.: every time a TLS session is established, a unique master key (derived from the pre-master-key) has to be created and then used for encryption and decryption operations. This is a legitimate use case in which key caching does not make sense as each key is unique per session. These keys are of P11Key type and have a corresponding native key object created. In the case of NSS SunPKCS11 backend (PKCS11 software token), this native key object is temporarily stored in the process native heap. The interface is simple: a JNI call is done to create a native key object (C_CreateObject, C_CopyObject, C_DeriveKey, C_GenerateKeys, etc., according to the PKCS11 interface) and an integer handler is kept in the Java side (P11Key). When the P11Key object is destroyed, a finalizer code is executed to free the native key object (through C_DestroyObject JNI call). The problem is that finalizer code execution happens only if the JVM garbage-collector cleans up the P11Key object. That may be delayed or not done at all, depending on different GC algorithms, parameters and environment conditions. As a result, the native heap may be exhausted with not freed native key objects, and the JVM will then crash -this is particularly true for 32 bits VMs where the virtual address space can be exhausted-.


SCOPE
........................................................

The fix is proposed for SunPKCS11 with NSS backend only. Other PKCS11 backends are not currently under scope. It's likely that hardware PKCS11 backends store native key objects in their own memory, preventing a native heap exhaustion and a JVM crash. However, it might be possible to cause an exhaustion on their own memory blocking key objects creation at some point. In any case, this is speculative as no tests were done on our side with real hardware.


SOLUTION
........................................................

Assuming that native keys are extractable, the idea is to hold native key data in the Java heap while keys are not in use. When a P11Key is created, every CK_ATTRIBUTE (PKCS11) value for the native key is queried, data stored in an opaque Java byte[] (inside the P11Key object) and native key destroyed. Every time the P11Key is about to be used, the native key is created with the stored data. After usage, the native key is again destroyed. Thus, it's not necessary to wait for a finalizer execution to cleanup native resources: cleanup is done at deterministic and previously-known points. This comes with a resposibility for key users -which are all SunPKCS11 internal services like P11Signature, P11Cipher, P11KeyGenerator, etc.-: create and destroy native keys through a reference counting scheme exposed by P11Key class. There are two kind of usages:

 1) stateless: the native key is "atomically" created, used and destroyed. I.e.: MAC calculation, getEncodedInternal operation (on P11Key objects), signature operations, TLS key derivation, etc.

 2) statefull: the native key is created, one or multiple intermediate actions are performed by the key user, a final action is performed and finally the native key is destroyed. I.e.: cipher operations.

For keys that are extractable but sensitive (CKA_SENSITIVE attribute is true), as the case when operating in FIPS mode, wrapping/unwrapping is used as a workaround to extract session keys. Wrapper key is global and lives forever.

There are no interface changes for SunPKCS11 external users.

If keys are not extractable or the feature cannot be enabled for any other reason, the previous finalizer scheme is used as a fallback.


ADDITIONAL IMPLEMENTATION NOTES
........................................................

When a P11Key is created, a constructor parameter exists to indicate if the feature is enabled for that specific key. For this feature to be enabled, 2 additional conditions apply: 1) SunPKCS11 backend has to be NSS, and 2) key has to be extractable. If the feature is not enabled for a key, behavior is as previous to this patch (native key destruction relies on finalizer execution).

The only P11Key user that explicitly does not use this feature is P11KeyStore. This is because these keys (token keys) are managed by alias names and makes no sense to remove them from the key store (they need to be accessible by an alias at any time).

Because P11Key objects can be used by multiple threads at a time, there is a thread-safe reference counting scheme in order to decide when a native key object has to be created or destroyed. The SunPKCS11 internal API to use a P11Key is as follows: 1) increment the reference counter (which will eventually create the native key object if it doesn't exist), 2) use the key and 3) decrement the reference counter (which will eventually destroy the native key if there it's not being used by anyone else).

The reason why an opaque byte[] is used in P11Key objects to store native keys data (instead of a CK_ATTRIBUTE[] Java objects, queried by Java's C_GetAttributeValue function) is performance. My prototypes show a difference of 4x in speed. 2 functions were added to libj2pkcs11 library: getNativeKeyInfo (to extract the opaque byte[] from a native key object) and createNativeKey (to create a native key object from an opaque byte[]).


CHANGESET
........................................................

This changeset is JDK-10 (at jdk c8796a577885 rev) based:



TESTING
........................................................


 * Suite (Bug6913047.java)
  * Tests JVM memory exhaustion while using keys for different services: P11Cipher, P11Signature, P11KeyAgreement, P11Mac, P11Digest, P11KeyGenerator, P11KeyFactory, etc.
  * Tests functional regression.
   * Including Key Stores (P11KeyStore)

Parameters to run the reproducer (on JDK-10): 
 * javac: --add-modules jdk.crypto.cryptoki --add-exports java.base/sun.security.internal.spec=ALL-UNNAMED 
 * java: -XX:+UseParallelGC -Xmx3500m --add-modules jdk.crypto.cryptoki --add-opens java.base/javax.crypto=ALL-UNNAMED --add-opens jdk.crypto.cryptoki/sun.security.pkcs11=ALL-UNNAMED

You can also use jtreg.


PERFORMANCE
........................................................

For a quick reproducer previously developed (which looped 100000 times creating P11Cipher and P11Key objects to encrypt a plaintext), these are the figures I got:

 * real 1m11.328s (without fix)
 * real 1m12.795s (with fix)

Performance penalty seems to be low in current state.

OTHER
.......................................................

My employer has an OCA agreement with Oracle and this work has been done in that context.

Look forward to your comments.

Kind regards,
Martin.-

--