RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Jiangli Zhou
Hi,

Please help review the changes for JDK-8179302 <https://bugs.openjdk.java.net/browse/JDK-8179302> (Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive). Currently, the CDS archive can contain cached class metadata, interned java.lang.String objects. This RFE adds the constant pool ‘resolved_references’ arrays (hotspot specific) to the archive for startup/runtime performance enhancement. The ‘resolved_references' arrays are used to hold references of resolved constant pool entries including Strings, mirrors, etc. With the 'resolved_references’ being cached, string constants in shared classes can now be resolved to existing interned java.lang.Strings at CDS dump time. G1 and 64-bit platforms are required.

The GC changes in the RFE were discussed and guided by Thomas Schatzl and GC team. Part of the changes were contributed by Thomas himself.

RFE:        JDK-8179302 <https://bugs.openjdk.java.net/browse/JDK-8179302>
hotspot:   http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/ <http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/>
whitebox: http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/ <http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/>

Please see below for details of supporting cached ‘resolved_references’ and pre-resolving string constants.
Types of Pinned G1 Heap Regions
The pinned region type is a super type of all archive region types, which include the open archive type and the closed archive type.
00100 0 [ 8] Pinned Mask
01000 0 [16] Old Mask
10000 0 [32] Archive Mask
11100 0 [56] Open Archive:  ArchiveMask | PinnedMask | OldMask
11100 1 [57] Archive           : ArchiveMask | PinnedMask | OldMask + 1
               
Pinned Regions
Objects within the region are 'pinned', which means GC does not move any live objects. GC scans and marks objects in the pinned region as normal, but skips forwarding live objects. Pointers in live objects are updated. Dead objects (unreachable) can be collected and freed.
Archive Regions
The archive types are sub-types of 'pinned'. There are two types of archive region currently, open archive and closed archive. Both can support caching java heap objects via the CDS archive.
An archive region is also an old region by design.
Open Archive (GC-RW) Regions
Open archive region is GC writable. GC scans & marks objects within the region and adjusts (updates) pointers in live objects the same way as a pinned region. Live objects (reachable) are pinned and not forwarded by GC.
Open archive region does not have 'dead' objects. Unreachable objects are 'dormant' objects. Dormant objects are not collected and freed by GC.
Adjustable Outgoing Pointers
As GC can adjust pointers within the live objects in open archive heap region, objects can have outgoing pointers to another java heap region, including closed archive region, open archive region, pinned (or humongous) region, and normal generational region. When a referenced object is moved by GC, the pointer within the open archive region is updated accordingly.
Closed Archive (GC-RO) Regions
The closed archive region is GC read-only region. GC cannot write into the region. Objects are not scanned and marked by GC. Objects are pinned and not forwarded. Pointers are not updated by GC either. Hence, objects within the archive region cannot have any outgoing pointers to another java heap region. Objects however can still have pointers to other objects within the closed archive regions (we might allow pointers to open archive regions in the future). That restricts the type of java objects that can be supported by the archive region.
In JDK 9 we support archive Strings with the archive regions.
The GC-readonly archive region makes java heap memory sharable among different JVM processes. NOTE: synchronization on the objects within the archive heap region can still cause writes to the memory page.
Dormant Objects
Dormant objects are unreachable java objects within the open archive heap region.
A java object in the open archive heap region is a live object if it can be reached during scanning. Some of the java objects in the region may not be reachable during scanning. Those objects are considered as dormant, but not dead. For example, a constant pool 'resolved_references' array is reachable via the klass root if its container klass (shared) is already loaded at the time during GC scanning. If a shared klass is not yet loaded, the klass root is not scanned and it's constant pool 'resolved_reference' array (A) in the open archive region is not reachable. Then A is a dormant object.
Object State Transition
All java objects are initially dormant objects when open archive heap regions are mapped to the runtime java heap. A dormant object becomes live object when the associated shared class is loaded at runtime. Explicit call to G1SATBCardTableModRefBS::enqueue() needs to be made when a dormant object becomes live. That should be the case for cached objects with strong roots as well, since strong roots are only scanned at the start of GC marking (the initial marking) but not during Remarking/Final marking. If a cached object becomes live during concurrent marking phase, G1 may not find it and mark it live unless a call to G1SATBCardTableModRefBS::enqueue() is made for the object.
Currently, a live object in the open archive heap region cannot become dormant again. This restriction simplifies GC requirement and guarantees all outgoing pointers are updated by GC correctly. Only objects for shared classes from the builtin class loaders (boot, PlatformClassLoaders, and AppClassLoaders) are supported for caching.
Caching Java Objects at Archive Dump Time
The closed archive and open archive regions are allocated near the top of the dump time java heap. Archived java objects are copied into the designated archive heap regions. For example, String objects and the underlying 'value' arrays are copied into the closed archive regions. All references to the archived objects (from shared class metadata, string table, etc) are set to the new heap locations. A hash table is used to keep track of all archived java objects during the copying process to make sure java object is not archived more than once if reached from different roots. It also makes sure references to the same archived object are updated using the same new address location.
Caching Constant Pool resolved_reference Array
The 'resolved_references' is an array that holds references of resolved constant pool entries including Strings, mirrors and methodTypes, etc. Each loaded class has one 'resolved_references' array (in ConstantPoolCache). The 'resolved_references' arrays are copied into the open archive regions during dump process. Prior to copying the 'resolved_references' arrays, JVM iterates through constant pool entries and resolves all JVM_CONSTANT_String entries to existing interned Strings for all archived classes. When resolving, JVM only looks up the string table and finds existing interned Strings without inserting new ones. If a string entry cannot be resolved to an existing interned String, the constant pool entry remain as unresolved. That prevents memory waste if a constant pool string entry is never used at runtime.
All String objects referenced by the string table are copied first into the closed archive regions. The string table entry is updated with the new location when each String object is archived. The JVM updates the resolved constant pool string entries with the new object locations when copying the 'resolved_references' arrays to the open archive regions. References to the 'resolved_references' arrays in the ConstantPoolCache are also updated.
At runtime as part of ConstantPool::restore_unshareable_info() work, call G1SATBCardTableModRefBS::enqueue() to let GC know the 'resolved_references' is becoming live. A handle is created for the cached object and added to the loader_data's handles.
Runtime Java Heap With Cached Java Objects
 

The closed archive regions (the string regions) and open archive regions are mapped to the runtime java heap at the same offsets as the dump time offsets from the runtime java heap base.

Preliminary test execution and status:

JPRT: passed
Tier2-rt: passed
Tier2-gc: passed
Tier2-comp: passed
Tier3-rt: passed
Tier3-gc: passed
Tier3-comp: passed
Tier4-rt: passed
Tier4-gc: passed
Tier4-comp:6 jobs timed out, all other tests passed
Tier5-rt: one test failed but passed when running locally, all other tests passed
Tier5-gc: passed
Tier5-comp: running
hotspot_gc: two jobs timed out, all other tests passed
hotspot_gc in CDS mode: two jobs timed out, all other tests passed
vm.gc: passed
vm.gc in CDS mode: passed
Kichensink: passed
Kichensink in CDS mode: passed

Thanks,
Jiangli
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Jiangli Zhou
Sorry, the mail didn’t handle the rich text well. I fixed the format below.

Please help review the changes for JDK-8179302 (Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive). Currently, the CDS archive can contain cached class metadata, interned java.lang.String objects. This RFE adds the constant pool ‘resolved_references’ arrays (hotspot specific) to the archive for startup/runtime performance enhancement. The ‘resolved_references' arrays are used to hold references of resolved constant pool entries including Strings, mirrors, etc. With the 'resolved_references’ being cached, string constants in shared classes can now be resolved to existing interned java.lang.Strings at CDS dump time. G1 and 64-bit platforms are required.

The GC changes in the RFE were discussed and guided by Thomas Schatzl and GC team. Part of the changes were contributed by Thomas himself.
RFE:        https://bugs.openjdk.java.net/browse/JDK-8179302
hotspot:   http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/
whitebox: http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/

Please see below for details of supporting cached ‘resolved_references’ and pre-resolving string constants.

Types of Pinned G1 Heap Regions

The pinned region type is a super type of all archive region types, which include the open archive type and the closed archive type.

00100 0 [ 8] Pinned Mask
01000 0 [16] Old Mask
10000 0 [32] Archive Mask
11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
11100 1 [57] Closed Archive: ArchiveMask | PinnedMask | OldMask + 1


Pinned Regions

Objects within the region are 'pinned', which means GC does not move any live objects. GC scans and marks objects in the pinned region as normal, but skips forwarding live objects. Pointers in live objects are updated. Dead objects (unreachable) can be collected and freed.

Archive Regions

The archive types are sub-types of 'pinned'. There are two types of archive region currently, open archive and closed archive. Both can support caching java heap objects via the CDS archive.

An archive region is also an old region by design.

Open Archive (GC-RW) Regions

Open archive region is GC writable. GC scans & marks objects within the region and adjusts (updates) pointers in live objects the same way as a pinned region. Live objects (reachable) are pinned and not forwarded by GC.
Open archive region does not have 'dead' objects. Unreachable objects are 'dormant' objects. Dormant objects are not collected and freed by GC.

Adjustable Outgoing Pointers

As GC can adjust pointers within the live objects in open archive heap region, objects can have outgoing pointers to another java heap region, including closed archive region, open archive region, pinned (or humongous) region, and normal generational region. When a referenced object is moved by GC, the pointer within the open archive region is updated accordingly.

Closed Archive (GC-RO) Regions

The closed archive region is GC read-only region. GC cannot write into the region. Objects are not scanned and marked by GC. Objects are pinned and not forwarded. Pointers are not updated by GC either. Hence, objects within the archive region cannot have any outgoing pointers to another java heap region. Objects however can still have pointers to other objects within the closed archive regions (we might allow pointers to open archive regions in the future). That restricts the type of java objects that can be supported by the archive region.
In JDK 9 we support archive Strings with the archive regions.

The GC-readonly archive region makes java heap memory sharable among different JVM processes. NOTE: synchronization on the objects within the archive heap region can still cause writes to the memory page.

Dormant Objects

Dormant objects are unreachable java objects within the open archive heap region.
A java object in the open archive heap region is a live object if it can be reached during scanning. Some of the java objects in the region may not be reachable during scanning. Those objects are considered as dormant, but not dead. For example, a constant pool 'resolved_references' array is reachable via the klass root if its container klass (shared) is already loaded at the time during GC scanning. If a shared klass is not yet loaded, the klass root is not scanned and it's constant pool 'resolved_reference' array (A) in the open archive region is not reachable. Then A is a dormant object.

Object State Transition

All java objects are initially dormant objects when open archive heap regions are mapped to the runtime java heap. A dormant object becomes live object when the associated shared class is loaded at runtime. Explicit call to G1SATBCardTableModRefBS::enqueue() needs to be made when a dormant object becomes live. That should be the case for cached objects with strong roots as well, since strong roots are only scanned at the start of GC marking (the initial marking) but not during Remarking/Final marking. If a cached object becomes live during concurrent marking phase, G1 may not find it and mark it live unless a call to G1SATBCardTableModRefBS::enqueue() is made for the object.

Currently, a live object in the open archive heap region cannot become dormant again. This restriction simplifies GC requirement and guarantees all outgoing pointers are updated by GC correctly. Only objects for shared classes from the builtin class loaders (boot, PlatformClassLoaders, and AppClassLoaders) are supported for caching.

Caching Java Objects at Archive Dump Time

The closed archive and open archive regions are allocated near the top of the dump time java heap. Archived java objects are copied into the designated archive heap regions. For example, String objects and the underlying 'value' arrays are copied into the closed archive regions. All references to the archived objects (from shared class metadata, string table, etc) are set to the new heap locations. A hash table is used to keep track of all archived java objects during the copying process to make sure java object is not archived more than once if reached from different roots. It also makes sure references to the same archived object are updated using the same new address location.

Caching Constant Pool resolved_references Array

The 'resolved_references' is an array that holds references of resolved constant pool entries including Strings, mirrors and methodTypes, etc. Each loaded class has one 'resolved_references' array (in ConstantPoolCache). The 'resolved_references' arrays are copied into the open archive regions during dump process. Prior to copying the 'resolved_references' arrays, JVM iterates through constant pool entries and resolves all JVM_CONSTANT_String entries to existing interned Strings for all archived classes. When resolving, JVM only looks up the string table and finds existing interned Strings without inserting new ones. If a string entry cannot be resolved to an existing interned String, the constant pool entry remain as unresolved. That prevents memory waste if a constant pool string entry is never used at runtime.

All String objects referenced by the string table are copied first into the closed archive regions. The string table entry is updated with the new location when each String object is archived. The JVM updates the resolved constant pool string entries with the new object locations when copying the 'resolved_references' arrays to the open archive regions. References to the 'resolved_references' arrays in the ConstantPoolCache are also updated.
At runtime as part of ConstantPool::restore_unshareable_info() work, call G1SATBCardTableModRefBS::enqueue() to let GC know the 'resolved_references' is becoming live. A handle is created for the cached object and added to the loader_data's handles.

Runtime Java Heap With Cached Java Objects


The closed archive regions (the string regions) and open archive regions are mapped to the runtime java heap at the same offsets as the dump time offsets from the runtime java heap base.

Preliminary test execution and status:

JPRT: passed
Tier2-rt: passed
Tier2-gc: passed
Tier2-comp: passed
Tier3-rt: passed
Tier3-gc: passed
Tier3-comp: passed
Tier4-rt: passed
Tier4-gc: passed
Tier4-comp:6 jobs timed out, all other tests passed
Tier5-rt: one test failed but passed when running locally, all other tests passed
Tier5-gc: passed
Tier5-comp: running
hotspot_gc: two jobs timed out, all other tests passed
hotspot_gc in CDS mode: two jobs timed out, all other tests passed
vm.gc: passed
vm.gc in CDS mode: passed
Kichensink: passed
Kichensink in CDS mode: passed

Thanks,
Jiangli
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Ioi Lam
Hi Jiangli,

Here are my comments. I've not reviewed the GC code and I'll leave that
to the GC experts :-)

stringTable.cpp: StringTable::archive_string

     add assert for DumpSharedSpaces only

filemap.cpp

  525 void
FileMapInfo::write_archive_heap_regions(GrowableArray<MemRegion> *regions,
  526                                              int first_region, int
num_regions) {

When I first read this function, I found it hard to follow, especially
this part that coalesces the trailing regions:

  537         int len = regions->length();
  538         if (len > 1) {
  539           start = (char*)regions->at(1).start();
  540           size = (char*)regions->at(len - 1).end() - start;
  541         }
  542       }

The rest of filemap.cpp always perform identical operations on MemRegion
arrays, which are either 1 or 2 in size. However, this function doesn't
follow that pattern; it also has a very different notion of "region",
and the confusing part is regions->size() is not the same as num_regions.

How about we change the API to something like the following? Before
calling this API, the caller needs to coalesce the trailing G1 regions
into a single MemRegion.

      FileMapInfo::write_archive_heap_regions(MemRegion *regions, int
first, int num_regions) {
         if (first == MetaspaceShared::first_string) {
            assert(num_regons <=  MetaspaceShared::max_strings, "...");
         } else {
            assert(first ==
MetaspaceShared::first_open_archive_heap_region, "...");
            assert(num_regons <=
MetaspaceShared::max_open_archive_heap_region, "...");
}
         ....



  756   if (!string_data_mapped) {
  757     StringTable::ignore_shared_strings(true);
  758     assert(string_ranges == NULL && num_string_ranges == 0, "sanity");
  759   }
  760
  761   if (open_archive_heap_data_mapped) {
  762     MetaspaceShared::set_open_archive_heap_region_mapped();
  763   } else {
  764     assert(open_archive_heap_ranges == NULL &&
num_open_archive_heap_ranges == 0, "sanity");
  765   }

Maybe the two "if" statements should be more consistent? Instead of
StringTable::ignore_shared_strings, how about
StringTable::set_shared_strings_region_mapped()?

FileMapInfo::map_heap_data() --

  818     char* addr = (char*)regions[i].start();
  819     char* base = os::map_memory(_fd, _full_path, si->_file_offset,
  820                                 addr, regions[i].byte_size(),
si->_read_only,
  821                                 si->_allow_exec);

What happens when the first region succeeds to map but the second region
fails to map? Will both regions be unmapped? I don't see where you store
the return value (base) from os::map_memory(). Does it mean the code
assumes that (addr == base). If so, we need an assert here.

constantPool.cpp

      Handle refs_handle;
      ...
      refs_handle = Handle(THREAD, (oop)archived);

This will first create a NULL handle, then construct a temporary handle,
and then assign the temp handle back to the null handle. This means two
handles will be pushed onto THREAD->metadata_handles()

I think it's more efficient if you merge these into a single statement

      Handle refs_handle(THREAD, (oop)archived);

Is this experimental code? Maybe it should be removed?

  664     if (tag_at(index).is_unresolved_klass()) {
  665 #if 0
  666       CPSlot entry = cp->slot_at(index);
  667       Symbol* name = entry.get_symbol();
  668       Klass* k = SystemDictionary::find(name, NULL, NULL, THREAD);
  669       if (k != NULL) {
  670         klass_at_put(index, k);
  671       }
  672 #endif
  673     } else

cpCache.hpp:

      u8   _archived_references

shouldn't this be declared as an narrowOop to avoid the type casts when
it's used?

cpCache.cpp:

     add assert so that one of these is used only at dump time and the
other only at run time?

  610 oop ConstantPoolCache::archived_references() {
  611   return oopDesc::decode_heap_oop((narrowOop)_archived_references);
  612 }
  613
  614 void ConstantPoolCache::set_archived_references(oop o) {
  615   _archived_references = (u8)oopDesc::encode_heap_oop(o);
  616 }

Thanks!
- Ioi

On 7/27/17 1:37 PM, Jiangli Zhou wrote:

> Sorry, the mail didn’t handle the rich text well. I fixed the format below.
>
> Please help review the changes for JDK-8179302 (Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive). Currently, the CDS archive can contain cached class metadata, interned java.lang.String objects. This RFE adds the constant pool ‘resolved_references’ arrays (hotspot specific) to the archive for startup/runtime performance enhancement. The ‘resolved_references' arrays are used to hold references of resolved constant pool entries including Strings, mirrors, etc. With the 'resolved_references’ being cached, string constants in shared classes can now be resolved to existing interned java.lang.Strings at CDS dump time. G1 and 64-bit platforms are required.
>
> The GC changes in the RFE were discussed and guided by Thomas Schatzl and GC team. Part of the changes were contributed by Thomas himself.
> RFE:        https://bugs.openjdk.java.net/browse/JDK-8179302
> hotspot:   http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/
> whitebox: http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/
>
> Please see below for details of supporting cached ‘resolved_references’ and pre-resolving string constants.
>
> Types of Pinned G1 Heap Regions
>
> The pinned region type is a super type of all archive region types, which include the open archive type and the closed archive type.
>
> 00100 0 [ 8] Pinned Mask
> 01000 0 [16] Old Mask
> 10000 0 [32] Archive Mask
> 11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
> 11100 1 [57] Closed Archive: ArchiveMask | PinnedMask | OldMask + 1
>
>
> Pinned Regions
>
> Objects within the region are 'pinned', which means GC does not move any live objects. GC scans and marks objects in the pinned region as normal, but skips forwarding live objects. Pointers in live objects are updated. Dead objects (unreachable) can be collected and freed.
>
> Archive Regions
>
> The archive types are sub-types of 'pinned'. There are two types of archive region currently, open archive and closed archive. Both can support caching java heap objects via the CDS archive.
>
> An archive region is also an old region by design.
>
> Open Archive (GC-RW) Regions
>
> Open archive region is GC writable. GC scans & marks objects within the region and adjusts (updates) pointers in live objects the same way as a pinned region. Live objects (reachable) are pinned and not forwarded by GC.
> Open archive region does not have 'dead' objects. Unreachable objects are 'dormant' objects. Dormant objects are not collected and freed by GC.
>
> Adjustable Outgoing Pointers
>
> As GC can adjust pointers within the live objects in open archive heap region, objects can have outgoing pointers to another java heap region, including closed archive region, open archive region, pinned (or humongous) region, and normal generational region. When a referenced object is moved by GC, the pointer within the open archive region is updated accordingly.
>
> Closed Archive (GC-RO) Regions
>
> The closed archive region is GC read-only region. GC cannot write into the region. Objects are not scanned and marked by GC. Objects are pinned and not forwarded. Pointers are not updated by GC either. Hence, objects within the archive region cannot have any outgoing pointers to another java heap region. Objects however can still have pointers to other objects within the closed archive regions (we might allow pointers to open archive regions in the future). That restricts the type of java objects that can be supported by the archive region.
> In JDK 9 we support archive Strings with the archive regions.
>
> The GC-readonly archive region makes java heap memory sharable among different JVM processes. NOTE: synchronization on the objects within the archive heap region can still cause writes to the memory page.
>
> Dormant Objects
>
> Dormant objects are unreachable java objects within the open archive heap region.
> A java object in the open archive heap region is a live object if it can be reached during scanning. Some of the java objects in the region may not be reachable during scanning. Those objects are considered as dormant, but not dead. For example, a constant pool 'resolved_references' array is reachable via the klass root if its container klass (shared) is already loaded at the time during GC scanning. If a shared klass is not yet loaded, the klass root is not scanned and it's constant pool 'resolved_reference' array (A) in the open archive region is not reachable. Then A is a dormant object.
>
> Object State Transition
>
> All java objects are initially dormant objects when open archive heap regions are mapped to the runtime java heap. A dormant object becomes live object when the associated shared class is loaded at runtime. Explicit call to G1SATBCardTableModRefBS::enqueue() needs to be made when a dormant object becomes live. That should be the case for cached objects with strong roots as well, since strong roots are only scanned at the start of GC marking (the initial marking) but not during Remarking/Final marking. If a cached object becomes live during concurrent marking phase, G1 may not find it and mark it live unless a call to G1SATBCardTableModRefBS::enqueue() is made for the object.
>
> Currently, a live object in the open archive heap region cannot become dormant again. This restriction simplifies GC requirement and guarantees all outgoing pointers are updated by GC correctly. Only objects for shared classes from the builtin class loaders (boot, PlatformClassLoaders, and AppClassLoaders) are supported for caching.
>
> Caching Java Objects at Archive Dump Time
>
> The closed archive and open archive regions are allocated near the top of the dump time java heap. Archived java objects are copied into the designated archive heap regions. For example, String objects and the underlying 'value' arrays are copied into the closed archive regions. All references to the archived objects (from shared class metadata, string table, etc) are set to the new heap locations. A hash table is used to keep track of all archived java objects during the copying process to make sure java object is not archived more than once if reached from different roots. It also makes sure references to the same archived object are updated using the same new address location.
>
> Caching Constant Pool resolved_references Array
>
> The 'resolved_references' is an array that holds references of resolved constant pool entries including Strings, mirrors and methodTypes, etc. Each loaded class has one 'resolved_references' array (in ConstantPoolCache). The 'resolved_references' arrays are copied into the open archive regions during dump process. Prior to copying the 'resolved_references' arrays, JVM iterates through constant pool entries and resolves all JVM_CONSTANT_String entries to existing interned Strings for all archived classes. When resolving, JVM only looks up the string table and finds existing interned Strings without inserting new ones. If a string entry cannot be resolved to an existing interned String, the constant pool entry remain as unresolved. That prevents memory waste if a constant pool string entry is never used at runtime.
>
> All String objects referenced by the string table are copied first into the closed archive regions. The string table entry is updated with the new location when each String object is archived. The JVM updates the resolved constant pool string entries with the new object locations when copying the 'resolved_references' arrays to the open archive regions. References to the 'resolved_references' arrays in the ConstantPoolCache are also updated.
> At runtime as part of ConstantPool::restore_unshareable_info() work, call G1SATBCardTableModRefBS::enqueue() to let GC know the 'resolved_references' is becoming live. A handle is created for the cached object and added to the loader_data's handles.
>
> Runtime Java Heap With Cached Java Objects
>
>
> The closed archive regions (the string regions) and open archive regions are mapped to the runtime java heap at the same offsets as the dump time offsets from the runtime java heap base.
>
> Preliminary test execution and status:
>
> JPRT: passed
> Tier2-rt: passed
> Tier2-gc: passed
> Tier2-comp: passed
> Tier3-rt: passed
> Tier3-gc: passed
> Tier3-comp: passed
> Tier4-rt: passed
> Tier4-gc: passed
> Tier4-comp:6 jobs timed out, all other tests passed
> Tier5-rt: one test failed but passed when running locally, all other tests passed
> Tier5-gc: passed
> Tier5-comp: running
> hotspot_gc: two jobs timed out, all other tests passed
> hotspot_gc in CDS mode: two jobs timed out, all other tests passed
> vm.gc: passed
> vm.gc in CDS mode: passed
> Kichensink: passed
> Kichensink in CDS mode: passed
>
> Thanks,
> Jiangli

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Jiangli Zhou
Hi Ioi,

Thank you so much for reviewing this. I’ve addressed all your feedbacks. Please see details below. I’ll updated the webrev after addressing Coleen’s comments.

> On Jul 30, 2017, at 9:07 PM, Ioi Lam <[hidden email]> wrote:
>
> Hi Jiangli,
>
> Here are my comments. I've not reviewed the GC code and I'll leave that to the GC experts :-)
>
> stringTable.cpp: StringTable::archive_string
>
>    add assert for DumpSharedSpaces only

Ok.

>
> filemap.cpp
>
> 525 void FileMapInfo::write_archive_heap_regions(GrowableArray<MemRegion> *regions,
> 526                                              int first_region, int num_regions) {
>
> When I first read this function, I found it hard to follow, especially this part that coalesces the trailing regions:
>
> 537         int len = regions->length();
> 538         if (len > 1) {
> 539           start = (char*)regions->at(1).start();
> 540           size = (char*)regions->at(len - 1).end() - start;
> 541         }
> 542       }
>
> The rest of filemap.cpp always perform identical operations on MemRegion arrays, which are either 1 or 2 in size. However, this function doesn't follow that pattern; it also has a very different notion of "region", and the confusing part is regions->size() is not the same as num_regions.
>
> How about we change the API to something like the following? Before calling this API, the caller needs to coalesce the trailing G1 regions into a single MemRegion.
>
>     FileMapInfo::write_archive_heap_regions(MemRegion *regions, int first, int num_regions) {
>        if (first == MetaspaceShared::first_string) {
>           assert(num_regons <=  MetaspaceShared::max_strings, "...");
>        } else {
>           assert(first == MetaspaceShared::first_open_archive_heap_region, "...");
>           assert(num_regons <= MetaspaceShared::max_open_archive_heap_region, "...");
> }
>        ....
>
>

I’ve reworked the function and simplified the code.

>
> 756   if (!string_data_mapped) {
> 757     StringTable::ignore_shared_strings(true);
> 758     assert(string_ranges == NULL && num_string_ranges == 0, "sanity");
> 759   }
> 760
> 761   if (open_archive_heap_data_mapped) {
> 762     MetaspaceShared::set_open_archive_heap_region_mapped();
> 763   } else {
> 764     assert(open_archive_heap_ranges == NULL && num_open_archive_heap_ranges == 0, "sanity");
> 765   }
>
> Maybe the two "if" statements should be more consistent? Instead of StringTable::ignore_shared_strings, how about StringTable::set_shared_strings_region_mapped()?

Fixed.

>
> FileMapInfo::map_heap_data() --
>
> 818     char* addr = (char*)regions[i].start();
> 819     char* base = os::map_memory(_fd, _full_path, si->_file_offset,
> 820                                 addr, regions[i].byte_size(), si->_read_only,
> 821                                 si->_allow_exec);
>
> What happens when the first region succeeds to map but the second region fails to map? Will both regions be unmapped? I don't see where you store the return value (base) from os::map_memory(). Does it mean the code assumes that (addr == base). If so, we need an assert here.

If any of the region fails to map, we bail out and call dealloc_archive_heap_regions(), which handles the deallocation of any regions specified. If second region fails to map, all memory ranges specified by ‘regions’ array are deallocated. We don’t unmap the memory here since it is part of the java heap. Unmapping of heap memory are handled by GC code. The ‘if’ check below makes sure base == addr.

    if (base == NULL || base != addr) {
      // dealloc the regions from java heap
      dealloc_archive_heap_regions(regions, region_num);
      if (log_is_enabled(Info, cds)) {
        log_info(cds)("UseSharedSpaces: Unable to map at required address in java heap.");
      }
      return false;
    }


>
> constantPool.cpp
>
>     Handle refs_handle;
>     ...
>     refs_handle = Handle(THREAD, (oop)archived);
>
> This will first create a NULL handle, then construct a temporary handle, and then assign the temp handle back to the null handle. This means two handles will be pushed onto THREAD->metadata_handles()
>
> I think it's more efficient if you merge these into a single statement
>
>     Handle refs_handle(THREAD, (oop)archived);

Fixed.

>
> Is this experimental code? Maybe it should be removed?
>
> 664     if (tag_at(index).is_unresolved_klass()) {
> 665 #if 0
> 666       CPSlot entry = cp->slot_at(index);
> 667       Symbol* name = entry.get_symbol();
> 668       Klass* k = SystemDictionary::find(name, NULL, NULL, THREAD);
> 669       if (k != NULL) {
> 670         klass_at_put(index, k);
> 671       }
> 672 #endif
> 673     } else

Removed.

>
> cpCache.hpp:
>
>     u8   _archived_references
>
> shouldn't this be declared as an narrowOop to avoid the type casts when it's used?

Ok.

>
> cpCache.cpp:
>
>    add assert so that one of these is used only at dump time and the other only at run time?
>
> 610 oop ConstantPoolCache::archived_references() {
> 611   return oopDesc::decode_heap_oop((narrowOop)_archived_references);
> 612 }
> 613
> 614 void ConstantPoolCache::set_archived_references(oop o) {
> 615   _archived_references = (u8)oopDesc::encode_heap_oop(o);
> 616 }

Ok.

Thanks!

Jiangli

>
> Thanks!
> - Ioi
>
> On 7/27/17 1:37 PM, Jiangli Zhou wrote:
>> Sorry, the mail didn’t handle the rich text well. I fixed the format below.
>>
>> Please help review the changes for JDK-8179302 (Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive). Currently, the CDS archive can contain cached class metadata, interned java.lang.String objects. This RFE adds the constant pool ‘resolved_references’ arrays (hotspot specific) to the archive for startup/runtime performance enhancement. The ‘resolved_references' arrays are used to hold references of resolved constant pool entries including Strings, mirrors, etc. With the 'resolved_references’ being cached, string constants in shared classes can now be resolved to existing interned java.lang.Strings at CDS dump time. G1 and 64-bit platforms are required.
>>
>> The GC changes in the RFE were discussed and guided by Thomas Schatzl and GC team. Part of the changes were contributed by Thomas himself.
>> RFE:        https://bugs.openjdk.java.net/browse/JDK-8179302
>> hotspot:   http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/
>> whitebox: http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/
>>
>> Please see below for details of supporting cached ‘resolved_references’ and pre-resolving string constants.
>>
>> Types of Pinned G1 Heap Regions
>>
>> The pinned region type is a super type of all archive region types, which include the open archive type and the closed archive type.
>>
>> 00100 0 [ 8] Pinned Mask
>> 01000 0 [16] Old Mask
>> 10000 0 [32] Archive Mask
>> 11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
>> 11100 1 [57] Closed Archive: ArchiveMask | PinnedMask | OldMask + 1
>>
>>
>> Pinned Regions
>>
>> Objects within the region are 'pinned', which means GC does not move any live objects. GC scans and marks objects in the pinned region as normal, but skips forwarding live objects. Pointers in live objects are updated. Dead objects (unreachable) can be collected and freed.
>>
>> Archive Regions
>>
>> The archive types are sub-types of 'pinned'. There are two types of archive region currently, open archive and closed archive. Both can support caching java heap objects via the CDS archive.
>>
>> An archive region is also an old region by design.
>>
>> Open Archive (GC-RW) Regions
>>
>> Open archive region is GC writable. GC scans & marks objects within the region and adjusts (updates) pointers in live objects the same way as a pinned region. Live objects (reachable) are pinned and not forwarded by GC.
>> Open archive region does not have 'dead' objects. Unreachable objects are 'dormant' objects. Dormant objects are not collected and freed by GC.
>>
>> Adjustable Outgoing Pointers
>>
>> As GC can adjust pointers within the live objects in open archive heap region, objects can have outgoing pointers to another java heap region, including closed archive region, open archive region, pinned (or humongous) region, and normal generational region. When a referenced object is moved by GC, the pointer within the open archive region is updated accordingly.
>>
>> Closed Archive (GC-RO) Regions
>>
>> The closed archive region is GC read-only region. GC cannot write into the region. Objects are not scanned and marked by GC. Objects are pinned and not forwarded. Pointers are not updated by GC either. Hence, objects within the archive region cannot have any outgoing pointers to another java heap region. Objects however can still have pointers to other objects within the closed archive regions (we might allow pointers to open archive regions in the future). That restricts the type of java objects that can be supported by the archive region.
>> In JDK 9 we support archive Strings with the archive regions.
>>
>> The GC-readonly archive region makes java heap memory sharable among different JVM processes. NOTE: synchronization on the objects within the archive heap region can still cause writes to the memory page.
>>
>> Dormant Objects
>>
>> Dormant objects are unreachable java objects within the open archive heap region.
>> A java object in the open archive heap region is a live object if it can be reached during scanning. Some of the java objects in the region may not be reachable during scanning. Those objects are considered as dormant, but not dead. For example, a constant pool 'resolved_references' array is reachable via the klass root if its container klass (shared) is already loaded at the time during GC scanning. If a shared klass is not yet loaded, the klass root is not scanned and it's constant pool 'resolved_reference' array (A) in the open archive region is not reachable. Then A is a dormant object.
>>
>> Object State Transition
>>
>> All java objects are initially dormant objects when open archive heap regions are mapped to the runtime java heap. A dormant object becomes live object when the associated shared class is loaded at runtime. Explicit call to G1SATBCardTableModRefBS::enqueue() needs to be made when a dormant object becomes live. That should be the case for cached objects with strong roots as well, since strong roots are only scanned at the start of GC marking (the initial marking) but not during Remarking/Final marking. If a cached object becomes live during concurrent marking phase, G1 may not find it and mark it live unless a call to G1SATBCardTableModRefBS::enqueue() is made for the object.
>>
>> Currently, a live object in the open archive heap region cannot become dormant again. This restriction simplifies GC requirement and guarantees all outgoing pointers are updated by GC correctly. Only objects for shared classes from the builtin class loaders (boot, PlatformClassLoaders, and AppClassLoaders) are supported for caching.
>>
>> Caching Java Objects at Archive Dump Time
>>
>> The closed archive and open archive regions are allocated near the top of the dump time java heap. Archived java objects are copied into the designated archive heap regions. For example, String objects and the underlying 'value' arrays are copied into the closed archive regions. All references to the archived objects (from shared class metadata, string table, etc) are set to the new heap locations. A hash table is used to keep track of all archived java objects during the copying process to make sure java object is not archived more than once if reached from different roots. It also makes sure references to the same archived object are updated using the same new address location.
>>
>> Caching Constant Pool resolved_references Array
>>
>> The 'resolved_references' is an array that holds references of resolved constant pool entries including Strings, mirrors and methodTypes, etc. Each loaded class has one 'resolved_references' array (in ConstantPoolCache). The 'resolved_references' arrays are copied into the open archive regions during dump process. Prior to copying the 'resolved_references' arrays, JVM iterates through constant pool entries and resolves all JVM_CONSTANT_String entries to existing interned Strings for all archived classes. When resolving, JVM only looks up the string table and finds existing interned Strings without inserting new ones. If a string entry cannot be resolved to an existing interned String, the constant pool entry remain as unresolved. That prevents memory waste if a constant pool string entry is never used at runtime.
>>
>> All String objects referenced by the string table are copied first into the closed archive regions. The string table entry is updated with the new location when each String object is archived. The JVM updates the resolved constant pool string entries with the new object locations when copying the 'resolved_references' arrays to the open archive regions. References to the 'resolved_references' arrays in the ConstantPoolCache are also updated.
>> At runtime as part of ConstantPool::restore_unshareable_info() work, call G1SATBCardTableModRefBS::enqueue() to let GC know the 'resolved_references' is becoming live. A handle is created for the cached object and added to the loader_data's handles.
>>
>> Runtime Java Heap With Cached Java Objects
>>
>>
>> The closed archive regions (the string regions) and open archive regions are mapped to the runtime java heap at the same offsets as the dump time offsets from the runtime java heap base.
>>
>> Preliminary test execution and status:
>>
>> JPRT: passed
>> Tier2-rt: passed
>> Tier2-gc: passed
>> Tier2-comp: passed
>> Tier3-rt: passed
>> Tier3-gc: passed
>> Tier3-comp: passed
>> Tier4-rt: passed
>> Tier4-gc: passed
>> Tier4-comp:6 jobs timed out, all other tests passed
>> Tier5-rt: one test failed but passed when running locally, all other tests passed
>> Tier5-gc: passed
>> Tier5-comp: running
>> hotspot_gc: two jobs timed out, all other tests passed
>> hotspot_gc in CDS mode: two jobs timed out, all other tests passed
>> vm.gc: passed
>> vm.gc in CDS mode: passed
>> Kichensink: passed
>> Kichensink in CDS mode: passed
>>
>> Thanks,
>> Jiangli
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Jiangli Zhou
Here are the updated webrevs.

http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/
http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/

Changes in the updated webrevs include:
Merge with Ioi’s recent shared space auto-sizing change (8072061)
Addressed all feedbacks from Ioi and Coleen (Thanks for detailed review!)

Thanks,
Jiangli


> On Aug 1, 2017, at 5:29 PM, Jiangli Zhou <[hidden email]> wrote:
>
> Hi Ioi,
>
> Thank you so much for reviewing this. I’ve addressed all your feedbacks. Please see details below. I’ll updated the webrev after addressing Coleen’s comments.
>
>> On Jul 30, 2017, at 9:07 PM, Ioi Lam <[hidden email]> wrote:
>>
>> Hi Jiangli,
>>
>> Here are my comments. I've not reviewed the GC code and I'll leave that to the GC experts :-)
>>
>> stringTable.cpp: StringTable::archive_string
>>
>>    add assert for DumpSharedSpaces only
>
> Ok.
>
>>
>> filemap.cpp
>>
>> 525 void FileMapInfo::write_archive_heap_regions(GrowableArray<MemRegion> *regions,
>> 526                                              int first_region, int num_regions) {
>>
>> When I first read this function, I found it hard to follow, especially this part that coalesces the trailing regions:
>>
>> 537         int len = regions->length();
>> 538         if (len > 1) {
>> 539           start = (char*)regions->at(1).start();
>> 540           size = (char*)regions->at(len - 1).end() - start;
>> 541         }
>> 542       }
>>
>> The rest of filemap.cpp always perform identical operations on MemRegion arrays, which are either 1 or 2 in size. However, this function doesn't follow that pattern; it also has a very different notion of "region", and the confusing part is regions->size() is not the same as num_regions.
>>
>> How about we change the API to something like the following? Before calling this API, the caller needs to coalesce the trailing G1 regions into a single MemRegion.
>>
>>     FileMapInfo::write_archive_heap_regions(MemRegion *regions, int first, int num_regions) {
>>        if (first == MetaspaceShared::first_string) {
>>           assert(num_regons <=  MetaspaceShared::max_strings, "...");
>>        } else {
>>           assert(first == MetaspaceShared::first_open_archive_heap_region, "...");
>>           assert(num_regons <= MetaspaceShared::max_open_archive_heap_region, "...");
>> }
>>        ....
>>
>>
>
> I’ve reworked the function and simplified the code.
>
>>
>> 756   if (!string_data_mapped) {
>> 757     StringTable::ignore_shared_strings(true);
>> 758     assert(string_ranges == NULL && num_string_ranges == 0, "sanity");
>> 759   }
>> 760
>> 761   if (open_archive_heap_data_mapped) {
>> 762     MetaspaceShared::set_open_archive_heap_region_mapped();
>> 763   } else {
>> 764     assert(open_archive_heap_ranges == NULL && num_open_archive_heap_ranges == 0, "sanity");
>> 765   }
>>
>> Maybe the two "if" statements should be more consistent? Instead of StringTable::ignore_shared_strings, how about StringTable::set_shared_strings_region_mapped()?
>
> Fixed.
>
>>
>> FileMapInfo::map_heap_data() --
>>
>> 818     char* addr = (char*)regions[i].start();
>> 819     char* base = os::map_memory(_fd, _full_path, si->_file_offset,
>> 820                                 addr, regions[i].byte_size(), si->_read_only,
>> 821                                 si->_allow_exec);
>>
>> What happens when the first region succeeds to map but the second region fails to map? Will both regions be unmapped? I don't see where you store the return value (base) from os::map_memory(). Does it mean the code assumes that (addr == base). If so, we need an assert here.
>
> If any of the region fails to map, we bail out and call dealloc_archive_heap_regions(), which handles the deallocation of any regions specified. If second region fails to map, all memory ranges specified by ‘regions’ array are deallocated. We don’t unmap the memory here since it is part of the java heap. Unmapping of heap memory are handled by GC code. The ‘if’ check below makes sure base == addr.
>
>     if (base == NULL || base != addr) {
>       // dealloc the regions from java heap
>       dealloc_archive_heap_regions(regions, region_num);
>       if (log_is_enabled(Info, cds)) {
>         log_info(cds)("UseSharedSpaces: Unable to map at required address in java heap.");
>       }
>       return false;
>     }
>
>
>>
>> constantPool.cpp
>>
>>     Handle refs_handle;
>>     ...
>>     refs_handle = Handle(THREAD, (oop)archived);
>>
>> This will first create a NULL handle, then construct a temporary handle, and then assign the temp handle back to the null handle. This means two handles will be pushed onto THREAD->metadata_handles()
>>
>> I think it's more efficient if you merge these into a single statement
>>
>>     Handle refs_handle(THREAD, (oop)archived);
>
> Fixed.
>
>>
>> Is this experimental code? Maybe it should be removed?
>>
>> 664     if (tag_at(index).is_unresolved_klass()) {
>> 665 #if 0
>> 666       CPSlot entry = cp->slot_at(index);
>> 667       Symbol* name = entry.get_symbol();
>> 668       Klass* k = SystemDictionary::find(name, NULL, NULL, THREAD);
>> 669       if (k != NULL) {
>> 670         klass_at_put(index, k);
>> 671       }
>> 672 #endif
>> 673     } else
>
> Removed.
>
>>
>> cpCache.hpp:
>>
>>     u8   _archived_references
>>
>> shouldn't this be declared as an narrowOop to avoid the type casts when it's used?
>
> Ok.
>
>>
>> cpCache.cpp:
>>
>>    add assert so that one of these is used only at dump time and the other only at run time?
>>
>> 610 oop ConstantPoolCache::archived_references() {
>> 611   return oopDesc::decode_heap_oop((narrowOop)_archived_references);
>> 612 }
>> 613
>> 614 void ConstantPoolCache::set_archived_references(oop o) {
>> 615   _archived_references = (u8)oopDesc::encode_heap_oop(o);
>> 616 }
>
> Ok.
>
> Thanks!
>
> Jiangli
>
>>
>> Thanks!
>> - Ioi
>>
>> On 7/27/17 1:37 PM, Jiangli Zhou wrote:
>>> Sorry, the mail didn’t handle the rich text well. I fixed the format below.
>>>
>>> Please help review the changes for JDK-8179302 (Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive). Currently, the CDS archive can contain cached class metadata, interned java.lang.String objects. This RFE adds the constant pool ‘resolved_references’ arrays (hotspot specific) to the archive for startup/runtime performance enhancement. The ‘resolved_references' arrays are used to hold references of resolved constant pool entries including Strings, mirrors, etc. With the 'resolved_references’ being cached, string constants in shared classes can now be resolved to existing interned java.lang.Strings at CDS dump time. G1 and 64-bit platforms are required.
>>>
>>> The GC changes in the RFE were discussed and guided by Thomas Schatzl and GC team. Part of the changes were contributed by Thomas himself.
>>> RFE:        https://bugs.openjdk.java.net/browse/JDK-8179302
>>> hotspot:   http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/
>>> whitebox: http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/
>>>
>>> Please see below for details of supporting cached ‘resolved_references’ and pre-resolving string constants.
>>>
>>> Types of Pinned G1 Heap Regions
>>>
>>> The pinned region type is a super type of all archive region types, which include the open archive type and the closed archive type.
>>>
>>> 00100 0 [ 8] Pinned Mask
>>> 01000 0 [16] Old Mask
>>> 10000 0 [32] Archive Mask
>>> 11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
>>> 11100 1 [57] Closed Archive: ArchiveMask | PinnedMask | OldMask + 1
>>>
>>>
>>> Pinned Regions
>>>
>>> Objects within the region are 'pinned', which means GC does not move any live objects. GC scans and marks objects in the pinned region as normal, but skips forwarding live objects. Pointers in live objects are updated. Dead objects (unreachable) can be collected and freed.
>>>
>>> Archive Regions
>>>
>>> The archive types are sub-types of 'pinned'. There are two types of archive region currently, open archive and closed archive. Both can support caching java heap objects via the CDS archive.
>>>
>>> An archive region is also an old region by design.
>>>
>>> Open Archive (GC-RW) Regions
>>>
>>> Open archive region is GC writable. GC scans & marks objects within the region and adjusts (updates) pointers in live objects the same way as a pinned region. Live objects (reachable) are pinned and not forwarded by GC.
>>> Open archive region does not have 'dead' objects. Unreachable objects are 'dormant' objects. Dormant objects are not collected and freed by GC.
>>>
>>> Adjustable Outgoing Pointers
>>>
>>> As GC can adjust pointers within the live objects in open archive heap region, objects can have outgoing pointers to another java heap region, including closed archive region, open archive region, pinned (or humongous) region, and normal generational region. When a referenced object is moved by GC, the pointer within the open archive region is updated accordingly.
>>>
>>> Closed Archive (GC-RO) Regions
>>>
>>> The closed archive region is GC read-only region. GC cannot write into the region. Objects are not scanned and marked by GC. Objects are pinned and not forwarded. Pointers are not updated by GC either. Hence, objects within the archive region cannot have any outgoing pointers to another java heap region. Objects however can still have pointers to other objects within the closed archive regions (we might allow pointers to open archive regions in the future). That restricts the type of java objects that can be supported by the archive region.
>>> In JDK 9 we support archive Strings with the archive regions.
>>>
>>> The GC-readonly archive region makes java heap memory sharable among different JVM processes. NOTE: synchronization on the objects within the archive heap region can still cause writes to the memory page.
>>>
>>> Dormant Objects
>>>
>>> Dormant objects are unreachable java objects within the open archive heap region.
>>> A java object in the open archive heap region is a live object if it can be reached during scanning. Some of the java objects in the region may not be reachable during scanning. Those objects are considered as dormant, but not dead. For example, a constant pool 'resolved_references' array is reachable via the klass root if its container klass (shared) is already loaded at the time during GC scanning. If a shared klass is not yet loaded, the klass root is not scanned and it's constant pool 'resolved_reference' array (A) in the open archive region is not reachable. Then A is a dormant object.
>>>
>>> Object State Transition
>>>
>>> All java objects are initially dormant objects when open archive heap regions are mapped to the runtime java heap. A dormant object becomes live object when the associated shared class is loaded at runtime. Explicit call to G1SATBCardTableModRefBS::enqueue() needs to be made when a dormant object becomes live. That should be the case for cached objects with strong roots as well, since strong roots are only scanned at the start of GC marking (the initial marking) but not during Remarking/Final marking. If a cached object becomes live during concurrent marking phase, G1 may not find it and mark it live unless a call to G1SATBCardTableModRefBS::enqueue() is made for the object.
>>>
>>> Currently, a live object in the open archive heap region cannot become dormant again. This restriction simplifies GC requirement and guarantees all outgoing pointers are updated by GC correctly. Only objects for shared classes from the builtin class loaders (boot, PlatformClassLoaders, and AppClassLoaders) are supported for caching.
>>>
>>> Caching Java Objects at Archive Dump Time
>>>
>>> The closed archive and open archive regions are allocated near the top of the dump time java heap. Archived java objects are copied into the designated archive heap regions. For example, String objects and the underlying 'value' arrays are copied into the closed archive regions. All references to the archived objects (from shared class metadata, string table, etc) are set to the new heap locations. A hash table is used to keep track of all archived java objects during the copying process to make sure java object is not archived more than once if reached from different roots. It also makes sure references to the same archived object are updated using the same new address location.
>>>
>>> Caching Constant Pool resolved_references Array
>>>
>>> The 'resolved_references' is an array that holds references of resolved constant pool entries including Strings, mirrors and methodTypes, etc. Each loaded class has one 'resolved_references' array (in ConstantPoolCache). The 'resolved_references' arrays are copied into the open archive regions during dump process. Prior to copying the 'resolved_references' arrays, JVM iterates through constant pool entries and resolves all JVM_CONSTANT_String entries to existing interned Strings for all archived classes. When resolving, JVM only looks up the string table and finds existing interned Strings without inserting new ones. If a string entry cannot be resolved to an existing interned String, the constant pool entry remain as unresolved. That prevents memory waste if a constant pool string entry is never used at runtime.
>>>
>>> All String objects referenced by the string table are copied first into the closed archive regions. The string table entry is updated with the new location when each String object is archived. The JVM updates the resolved constant pool string entries with the new object locations when copying the 'resolved_references' arrays to the open archive regions. References to the 'resolved_references' arrays in the ConstantPoolCache are also updated.
>>> At runtime as part of ConstantPool::restore_unshareable_info() work, call G1SATBCardTableModRefBS::enqueue() to let GC know the 'resolved_references' is becoming live. A handle is created for the cached object and added to the loader_data's handles.
>>>
>>> Runtime Java Heap With Cached Java Objects
>>>
>>>
>>> The closed archive regions (the string regions) and open archive regions are mapped to the runtime java heap at the same offsets as the dump time offsets from the runtime java heap base.
>>>
>>> Preliminary test execution and status:
>>>
>>> JPRT: passed
>>> Tier2-rt: passed
>>> Tier2-gc: passed
>>> Tier2-comp: passed
>>> Tier3-rt: passed
>>> Tier3-gc: passed
>>> Tier3-comp: passed
>>> Tier4-rt: passed
>>> Tier4-gc: passed
>>> Tier4-comp:6 jobs timed out, all other tests passed
>>> Tier5-rt: one test failed but passed when running locally, all other tests passed
>>> Tier5-gc: passed
>>> Tier5-comp: running
>>> hotspot_gc: two jobs timed out, all other tests passed
>>> hotspot_gc in CDS mode: two jobs timed out, all other tests passed
>>> vm.gc: passed
>>> vm.gc in CDS mode: passed
>>> Kichensink: passed
>>> Kichensink in CDS mode: passed
>>>
>>> Thanks,
>>> Jiangli
>>
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

coleen.phillimore

Hi, I think this change looks good except in metaspaceShared.cpp:

1632 NoSafepointVerifier nsv;


Belongs above when you create the archive cache.

1600 MetaspaceShared::create_archive_object_cache();


The call:

1646 int hash = obj->identity_hash();


Can safepoint though.   Not sure why NSV didn't have an assert. But
maybe it won't if you've eagerly installed the hashcode in these
objects, which I think you have.   Otherwise you could use
NoGCVerifier.  Or GCLocker, which the GC group won't like.

Thanks,
Coleen

On 8/3/17 8:15 PM, Jiangli Zhou wrote:

> Here are the updated webrevs.
>
> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/
> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/
>
> Changes in the updated webrevs include:
> Merge with Ioi’s recent shared space auto-sizing change (8072061)
> Addressed all feedbacks from Ioi and Coleen (Thanks for detailed review!)
>
> Thanks,
> Jiangli
>
>
>> On Aug 1, 2017, at 5:29 PM, Jiangli Zhou <[hidden email]> wrote:
>>
>> Hi Ioi,
>>
>> Thank you so much for reviewing this. I’ve addressed all your feedbacks. Please see details below. I’ll updated the webrev after addressing Coleen’s comments.
>>
>>> On Jul 30, 2017, at 9:07 PM, Ioi Lam <[hidden email]> wrote:
>>>
>>> Hi Jiangli,
>>>
>>> Here are my comments. I've not reviewed the GC code and I'll leave that to the GC experts :-)
>>>
>>> stringTable.cpp: StringTable::archive_string
>>>
>>>     add assert for DumpSharedSpaces only
>> Ok.
>>
>>> filemap.cpp
>>>
>>> 525 void FileMapInfo::write_archive_heap_regions(GrowableArray<MemRegion> *regions,
>>> 526                                              int first_region, int num_regions) {
>>>
>>> When I first read this function, I found it hard to follow, especially this part that coalesces the trailing regions:
>>>
>>> 537         int len = regions->length();
>>> 538         if (len > 1) {
>>> 539           start = (char*)regions->at(1).start();
>>> 540           size = (char*)regions->at(len - 1).end() - start;
>>> 541         }
>>> 542       }
>>>
>>> The rest of filemap.cpp always perform identical operations on MemRegion arrays, which are either 1 or 2 in size. However, this function doesn't follow that pattern; it also has a very different notion of "region", and the confusing part is regions->size() is not the same as num_regions.
>>>
>>> How about we change the API to something like the following? Before calling this API, the caller needs to coalesce the trailing G1 regions into a single MemRegion.
>>>
>>>      FileMapInfo::write_archive_heap_regions(MemRegion *regions, int first, int num_regions) {
>>>         if (first == MetaspaceShared::first_string) {
>>>            assert(num_regons <=  MetaspaceShared::max_strings, "...");
>>>         } else {
>>>            assert(first == MetaspaceShared::first_open_archive_heap_region, "...");
>>>            assert(num_regons <= MetaspaceShared::max_open_archive_heap_region, "...");
>>> }
>>>         ....
>>>
>>>
>> I’ve reworked the function and simplified the code.
>>
>>> 756   if (!string_data_mapped) {
>>> 757     StringTable::ignore_shared_strings(true);
>>> 758     assert(string_ranges == NULL && num_string_ranges == 0, "sanity");
>>> 759   }
>>> 760
>>> 761   if (open_archive_heap_data_mapped) {
>>> 762     MetaspaceShared::set_open_archive_heap_region_mapped();
>>> 763   } else {
>>> 764     assert(open_archive_heap_ranges == NULL && num_open_archive_heap_ranges == 0, "sanity");
>>> 765   }
>>>
>>> Maybe the two "if" statements should be more consistent? Instead of StringTable::ignore_shared_strings, how about StringTable::set_shared_strings_region_mapped()?
>> Fixed.
>>
>>> FileMapInfo::map_heap_data() --
>>>
>>> 818     char* addr = (char*)regions[i].start();
>>> 819     char* base = os::map_memory(_fd, _full_path, si->_file_offset,
>>> 820                                 addr, regions[i].byte_size(), si->_read_only,
>>> 821                                 si->_allow_exec);
>>>
>>> What happens when the first region succeeds to map but the second region fails to map? Will both regions be unmapped? I don't see where you store the return value (base) from os::map_memory(). Does it mean the code assumes that (addr == base). If so, we need an assert here.
>> If any of the region fails to map, we bail out and call dealloc_archive_heap_regions(), which handles the deallocation of any regions specified. If second region fails to map, all memory ranges specified by ‘regions’ array are deallocated. We don’t unmap the memory here since it is part of the java heap. Unmapping of heap memory are handled by GC code. The ‘if’ check below makes sure base == addr.
>>
>>      if (base == NULL || base != addr) {
>>        // dealloc the regions from java heap
>>        dealloc_archive_heap_regions(regions, region_num);
>>        if (log_is_enabled(Info, cds)) {
>>          log_info(cds)("UseSharedSpaces: Unable to map at required address in java heap.");
>>        }
>>        return false;
>>      }
>>
>>
>>> constantPool.cpp
>>>
>>>      Handle refs_handle;
>>>      ...
>>>      refs_handle = Handle(THREAD, (oop)archived);
>>>
>>> This will first create a NULL handle, then construct a temporary handle, and then assign the temp handle back to the null handle. This means two handles will be pushed onto THREAD->metadata_handles()
>>>
>>> I think it's more efficient if you merge these into a single statement
>>>
>>>      Handle refs_handle(THREAD, (oop)archived);
>> Fixed.
>>
>>> Is this experimental code? Maybe it should be removed?
>>>
>>> 664     if (tag_at(index).is_unresolved_klass()) {
>>> 665 #if 0
>>> 666       CPSlot entry = cp->slot_at(index);
>>> 667       Symbol* name = entry.get_symbol();
>>> 668       Klass* k = SystemDictionary::find(name, NULL, NULL, THREAD);
>>> 669       if (k != NULL) {
>>> 670         klass_at_put(index, k);
>>> 671       }
>>> 672 #endif
>>> 673     } else
>> Removed.
>>
>>> cpCache.hpp:
>>>
>>>      u8   _archived_references
>>>
>>> shouldn't this be declared as an narrowOop to avoid the type casts when it's used?
>> Ok.
>>
>>> cpCache.cpp:
>>>
>>>     add assert so that one of these is used only at dump time and the other only at run time?
>>>
>>> 610 oop ConstantPoolCache::archived_references() {
>>> 611   return oopDesc::decode_heap_oop((narrowOop)_archived_references);
>>> 612 }
>>> 613
>>> 614 void ConstantPoolCache::set_archived_references(oop o) {
>>> 615   _archived_references = (u8)oopDesc::encode_heap_oop(o);
>>> 616 }
>> Ok.
>>
>> Thanks!
>>
>> Jiangli
>>
>>> Thanks!
>>> - Ioi
>>>
>>> On 7/27/17 1:37 PM, Jiangli Zhou wrote:
>>>> Sorry, the mail didn’t handle the rich text well. I fixed the format below.
>>>>
>>>> Please help review the changes for JDK-8179302 (Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive). Currently, the CDS archive can contain cached class metadata, interned java.lang.String objects. This RFE adds the constant pool ‘resolved_references’ arrays (hotspot specific) to the archive for startup/runtime performance enhancement. The ‘resolved_references' arrays are used to hold references of resolved constant pool entries including Strings, mirrors, etc. With the 'resolved_references’ being cached, string constants in shared classes can now be resolved to existing interned java.lang.Strings at CDS dump time. G1 and 64-bit platforms are required.
>>>>
>>>> The GC changes in the RFE were discussed and guided by Thomas Schatzl and GC team. Part of the changes were contributed by Thomas himself.
>>>> RFE:        https://bugs.openjdk.java.net/browse/JDK-8179302
>>>> hotspot:   http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/
>>>> whitebox: http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/
>>>>
>>>> Please see below for details of supporting cached ‘resolved_references’ and pre-resolving string constants.
>>>>
>>>> Types of Pinned G1 Heap Regions
>>>>
>>>> The pinned region type is a super type of all archive region types, which include the open archive type and the closed archive type.
>>>>
>>>> 00100 0 [ 8] Pinned Mask
>>>> 01000 0 [16] Old Mask
>>>> 10000 0 [32] Archive Mask
>>>> 11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
>>>> 11100 1 [57] Closed Archive: ArchiveMask | PinnedMask | OldMask + 1
>>>>
>>>>
>>>> Pinned Regions
>>>>
>>>> Objects within the region are 'pinned', which means GC does not move any live objects. GC scans and marks objects in the pinned region as normal, but skips forwarding live objects. Pointers in live objects are updated. Dead objects (unreachable) can be collected and freed.
>>>>
>>>> Archive Regions
>>>>
>>>> The archive types are sub-types of 'pinned'. There are two types of archive region currently, open archive and closed archive. Both can support caching java heap objects via the CDS archive.
>>>>
>>>> An archive region is also an old region by design.
>>>>
>>>> Open Archive (GC-RW) Regions
>>>>
>>>> Open archive region is GC writable. GC scans & marks objects within the region and adjusts (updates) pointers in live objects the same way as a pinned region. Live objects (reachable) are pinned and not forwarded by GC.
>>>> Open archive region does not have 'dead' objects. Unreachable objects are 'dormant' objects. Dormant objects are not collected and freed by GC.
>>>>
>>>> Adjustable Outgoing Pointers
>>>>
>>>> As GC can adjust pointers within the live objects in open archive heap region, objects can have outgoing pointers to another java heap region, including closed archive region, open archive region, pinned (or humongous) region, and normal generational region. When a referenced object is moved by GC, the pointer within the open archive region is updated accordingly.
>>>>
>>>> Closed Archive (GC-RO) Regions
>>>>
>>>> The closed archive region is GC read-only region. GC cannot write into the region. Objects are not scanned and marked by GC. Objects are pinned and not forwarded. Pointers are not updated by GC either. Hence, objects within the archive region cannot have any outgoing pointers to another java heap region. Objects however can still have pointers to other objects within the closed archive regions (we might allow pointers to open archive regions in the future). That restricts the type of java objects that can be supported by the archive region.
>>>> In JDK 9 we support archive Strings with the archive regions.
>>>>
>>>> The GC-readonly archive region makes java heap memory sharable among different JVM processes. NOTE: synchronization on the objects within the archive heap region can still cause writes to the memory page.
>>>>
>>>> Dormant Objects
>>>>
>>>> Dormant objects are unreachable java objects within the open archive heap region.
>>>> A java object in the open archive heap region is a live object if it can be reached during scanning. Some of the java objects in the region may not be reachable during scanning. Those objects are considered as dormant, but not dead. For example, a constant pool 'resolved_references' array is reachable via the klass root if its container klass (shared) is already loaded at the time during GC scanning. If a shared klass is not yet loaded, the klass root is not scanned and it's constant pool 'resolved_reference' array (A) in the open archive region is not reachable. Then A is a dormant object.
>>>>
>>>> Object State Transition
>>>>
>>>> All java objects are initially dormant objects when open archive heap regions are mapped to the runtime java heap. A dormant object becomes live object when the associated shared class is loaded at runtime. Explicit call to G1SATBCardTableModRefBS::enqueue() needs to be made when a dormant object becomes live. That should be the case for cached objects with strong roots as well, since strong roots are only scanned at the start of GC marking (the initial marking) but not during Remarking/Final marking. If a cached object becomes live during concurrent marking phase, G1 may not find it and mark it live unless a call to G1SATBCardTableModRefBS::enqueue() is made for the object.
>>>>
>>>> Currently, a live object in the open archive heap region cannot become dormant again. This restriction simplifies GC requirement and guarantees all outgoing pointers are updated by GC correctly. Only objects for shared classes from the builtin class loaders (boot, PlatformClassLoaders, and AppClassLoaders) are supported for caching.
>>>>
>>>> Caching Java Objects at Archive Dump Time
>>>>
>>>> The closed archive and open archive regions are allocated near the top of the dump time java heap. Archived java objects are copied into the designated archive heap regions. For example, String objects and the underlying 'value' arrays are copied into the closed archive regions. All references to the archived objects (from shared class metadata, string table, etc) are set to the new heap locations. A hash table is used to keep track of all archived java objects during the copying process to make sure java object is not archived more than once if reached from different roots. It also makes sure references to the same archived object are updated using the same new address location.
>>>>
>>>> Caching Constant Pool resolved_references Array
>>>>
>>>> The 'resolved_references' is an array that holds references of resolved constant pool entries including Strings, mirrors and methodTypes, etc. Each loaded class has one 'resolved_references' array (in ConstantPoolCache). The 'resolved_references' arrays are copied into the open archive regions during dump process. Prior to copying the 'resolved_references' arrays, JVM iterates through constant pool entries and resolves all JVM_CONSTANT_String entries to existing interned Strings for all archived classes. When resolving, JVM only looks up the string table and finds existing interned Strings without inserting new ones. If a string entry cannot be resolved to an existing interned String, the constant pool entry remain as unresolved. That prevents memory waste if a constant pool string entry is never used at runtime.
>>>>
>>>> All String objects referenced by the string table are copied first into the closed archive regions. The string table entry is updated with the new location when each String object is archived. The JVM updates the resolved constant pool string entries with the new object locations when copying the 'resolved_references' arrays to the open archive regions. References to the 'resolved_references' arrays in the ConstantPoolCache are also updated.
>>>> At runtime as part of ConstantPool::restore_unshareable_info() work, call G1SATBCardTableModRefBS::enqueue() to let GC know the 'resolved_references' is becoming live. A handle is created for the cached object and added to the loader_data's handles.
>>>>
>>>> Runtime Java Heap With Cached Java Objects
>>>>
>>>>
>>>> The closed archive regions (the string regions) and open archive regions are mapped to the runtime java heap at the same offsets as the dump time offsets from the runtime java heap base.
>>>>
>>>> Preliminary test execution and status:
>>>>
>>>> JPRT: passed
>>>> Tier2-rt: passed
>>>> Tier2-gc: passed
>>>> Tier2-comp: passed
>>>> Tier3-rt: passed
>>>> Tier3-gc: passed
>>>> Tier3-comp: passed
>>>> Tier4-rt: passed
>>>> Tier4-gc: passed
>>>> Tier4-comp:6 jobs timed out, all other tests passed
>>>> Tier5-rt: one test failed but passed when running locally, all other tests passed
>>>> Tier5-gc: passed
>>>> Tier5-comp: running
>>>> hotspot_gc: two jobs timed out, all other tests passed
>>>> hotspot_gc in CDS mode: two jobs timed out, all other tests passed
>>>> vm.gc: passed
>>>> vm.gc in CDS mode: passed
>>>> Kichensink: passed
>>>> Kichensink in CDS mode: passed
>>>>
>>>> Thanks,
>>>> Jiangli

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Ioi Lam
In reply to this post by Jiangli Zhou
Hi Jiangli,

The code looks good in general. I just have a few pet peeves for
readability:


(1) stringTable.cpp and metaspaceShared.cpp have the same asserts

  704   assert(UseG1GC, "Only support G1 GC");
  705   assert(UseCompressedOops && UseCompressedClassPointers,
  706          "Only support UseCompressedOops and
UseCompressedClassPointers enabled");

1615   assert(UseG1GC, "Only support G1 GC");
1616   assert(UseCompressedOops && UseCompressedClassPointers,
1617          "Only support UseCompressedOops and
UseCompressedClassPointers enabled");

Maybe it's better to combine them into a single function like
MetaspaceShared::assert_vm_flags() so they don't get out of sync?



(2) FileMapInfo::write_archive_heap_regions()

I still find this code very hard to read, especially due to the loop.

First, the comments are not consistent with the code:

     498   assert(arr_len <= max_num_regions, "number of memory regions
exceeds maximum");

but the comments says: "The rest are consecutive full GC regions" which
means there's a chance for max_num_regions to be more than 2 (which will
be the case with Calvin's java-loader dumping changes using very small
heap size).So the code is actually wrong.

The word "region" is used in these parameters, but they don't mean the
same thing.

               GrowableArray<MemRegion> *regions
               int first_region, int max_num_regions,


How about regions      -> g1_regions_list
           first_region -> first_region_in_archive


In the comments, I find the phrase 'the current archive heap region'
ambiguous. It could be (erroneously) interpreted as "a region from the
currently mapped archive"

To make it unambiguous, how about changing


  464 // Write the current archive heap region, which contains one or
multiple GC(G1) regions.


to

     // Write the given list of G1 memory regions into the archive,
starting at
     // first_region_in_archive.


Also, for the explanation of how the G1 regions are written into the
archive, how about:

    // The G1 regions in the list are sorted in ascending address order.
When there are more objects
    // than the capacity of a single G1 region, the bottom-most G1
region may be partially filled, and the
    // remaining G1 region(s) are consecutively allocated and fully filled.
    //
    // Thus, the bottom-most G1 region (if not empty) is written into
first_region_in_archive.
    // The remaining G1 regions (if exist) are coalesced and written as
a single block
    // into (first_region_in_archive + 1)

    // Here's the mapping from (g1 regions) -> (archive regions).


All this function needs to do is to decide the values for

      r0_start, r0_top
      r1_start, r1_top

I think it would be much better to not use the loop, and not use the
max_num_regions parameter (it's always 2 anyway).

      *r0_start = *r0_top = NULL;
      *r1_start = *r1_top = NULL;

      if (arr_len >= 1) {
          *r0_start = regions->at(0).start();
          *r0_end = *r0_start + regions->at(0).byte_size();
      }
      if (arr_len >= 2) {
          int last = arr_len - 1;
          *r1_start = regions->at(1).start();
          *r1_end = regions->at(last).start() +
regions->at(last).byte_size();
}

what do you think?



(3) metaspace.cpp

3350         // Map the archived heap regions after compressed pointers
3351         // because it relies on compressed class pointers setting
to work

do you mean this?

     // Archived heap regions depend on the parameters of compressed
class pointers, so
     // they must be mapped after such parameters have been decided in
the above call.


(4) I found this name not strictly grammatical. How about this:

      allow_archive_heap_object -> is_heap_object_archiving_allowed

(5) in most of your code, 'archive' is used as a noun, except in
StringTable::archive_string() where it's used as a verb.

archive_string could also be interpreted erroneously as "return a string
that's already in the archive". So to be consistent and unambiguous, I
think it's better to rename it to StringTable::create_archived_string()


Thanks
- Ioi


On 8/3/17 5:15 PM, Jiangli Zhou wrote:

> Here are the updated webrevs.
>
> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/ 
> <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.02/>
> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/
>
> Changes in the updated webrevs include:
>
>   * Merge with Ioi’s recent shared space auto-sizing change (8072061)
>   * Addressed all feedbacks from Ioi and Coleen (Thanks for detailed
>     review!)
>
>
> Thanks,
> Jiangli
>
>
>> On Aug 1, 2017, at 5:29 PM, Jiangli Zhou <[hidden email]> wrote:
>>
>> Hi Ioi,
>>
>> Thank you so much for reviewing this. I’ve addressed all your
>> feedbacks. Please see details below. I’ll updated the webrev
>> after addressing Coleen’s comments.
>>
>>> On Jul 30, 2017, at 9:07 PM, Ioi Lam <[hidden email]> wrote:
>>>
>>> Hi Jiangli,
>>>
>>> Here are my comments. I've not reviewed the GC code and I'll leave
>>> that to the GC experts :-)
>>>
>>> stringTable.cpp: StringTable::archive_string
>>>
>>>    add assert for DumpSharedSpaces only
>>
>> Ok.
>>
>>>
>>> filemap.cpp
>>>
>>> 525 void
>>> FileMapInfo::write_archive_heap_regions(GrowableArray<MemRegion>
>>> *regions,
>>> 526                                              int first_region,
>>> int num_regions) {
>>>
>>> When I first read this function, I found it hard to follow,
>>> especially this part that coalesces the trailing regions:
>>>
>>> 537         int len = regions->length();
>>> 538         if (len > 1) {
>>> 539           start = (char*)regions->at(1).start();
>>> 540           size = (char*)regions->at(len - 1).end() - start;
>>> 541         }
>>> 542       }
>>>
>>> The rest of filemap.cpp always perform identical operations on
>>> MemRegion arrays, which are either 1 or 2 in size. However,
>>> this function doesn't follow that pattern; it also has a very
>>> different notion of "region", and the confusing part is
>>> regions->size() is not the same as num_regions.
>>>
>>> How about we change the API to something like the following? Before
>>> calling this API, the caller needs to coalesce the trailing
>>> G1 regions into a single MemRegion.
>>>
>>>     FileMapInfo::write_archive_heap_regions(MemRegion *regions, int
>>> first, int num_regions) {
>>>        if (first == MetaspaceShared::first_string) {
>>>           assert(num_regons <=  MetaspaceShared::max_strings, "...");
>>>        } else {
>>>           assert(first ==
>>> MetaspaceShared::first_open_archive_heap_region, "...");
>>>           assert(num_regons <=
>>> MetaspaceShared::max_open_archive_heap_region, "...");
>>> }
>>>        ....
>>>
>>>
>>
>> I’ve reworked the function and simplified the code.
>>
>>>
>>> 756   if (!string_data_mapped) {
>>> 757     StringTable::ignore_shared_strings(true);
>>> 758     assert(string_ranges == NULL && num_string_ranges == 0,
>>> "sanity");
>>> 759   }
>>> 760
>>> 761   if (open_archive_heap_data_mapped) {
>>> 762 MetaspaceShared::set_open_archive_heap_region_mapped();
>>> 763   } else {
>>> 764     assert(open_archive_heap_ranges == NULL &&
>>> num_open_archive_heap_ranges == 0, "sanity");
>>> 765   }
>>>
>>> Maybe the two "if" statements should be more consistent? Instead of
>>> StringTable::ignore_shared_strings, how
>>> about StringTable::set_shared_strings_region_mapped()?
>>
>> Fixed.
>>
>>>
>>> FileMapInfo::map_heap_data() --
>>>
>>> 818     char* addr = (char*)regions[i].start();
>>> 819     char* base = os::map_memory(_fd, _full_path, si->_file_offset,
>>> 820                                 addr, regions[i].byte_size(),
>>> si->_read_only,
>>> 821                                 si->_allow_exec);
>>>
>>> What happens when the first region succeeds to map but the second
>>> region fails to map? Will both regions be unmapped? I don't see
>>> where you store the return value (base) from os::map_memory(). Does
>>> it mean the code assumes that (addr == base). If so, we need an
>>> assert here.
>>
>> If any of the region fails to map, we bail out and call
>> dealloc_archive_heap_regions(), which handles the deallocation of any
>> regions specified. If second region fails to map, all memory ranges
>> specified by ‘regions’ array are deallocated. We don’t unmap the
>> memory here since it is part of the java heap. Unmapping of heap
>> memory are handled by GC code. The ‘if’ check below makes sure base
>> == addr.
>>
>>     if (base == NULL || base != addr) {
>>       // dealloc the regions from java heap
>>       dealloc_archive_heap_regions(regions, region_num);
>>       if (log_is_enabled(Info, cds)) {
>>         log_info(cds)("UseSharedSpaces: Unable to map at required
>> address in java heap.");
>>       }
>>       return false;
>>     }
>>
>>
>>>
>>> constantPool.cpp
>>>
>>>     Handle refs_handle;
>>>     ...
>>>     refs_handle = Handle(THREAD, (oop)archived);
>>>
>>> This will first create a NULL handle, then construct a temporary
>>> handle, and then assign the temp handle back to the null
>>> handle. This means two handles will be pushed onto
>>> THREAD->metadata_handles()
>>>
>>> I think it's more efficient if you merge these into a single statement
>>>
>>>     Handle refs_handle(THREAD, (oop)archived);
>>
>> Fixed.
>>
>>>
>>> Is this experimental code? Maybe it should be removed?
>>>
>>> 664     if (tag_at(index).is_unresolved_klass()) {
>>> 665 #if 0
>>> 666       CPSlot entry = cp->slot_at(index);
>>> 667       Symbol* name = entry.get_symbol();
>>> 668       Klass* k = SystemDictionary::find(name, NULL, NULL, THREAD);
>>> 669       if (k != NULL) {
>>> 670         klass_at_put(index, k);
>>> 671       }
>>> 672 #endif
>>> 673     } else
>>
>> Removed.
>>
>>>
>>> cpCache.hpp:
>>>
>>>     u8   _archived_references
>>>
>>> shouldn't this be declared as an narrowOop to avoid the type casts
>>> when it's used?
>>
>> Ok.
>>
>>>
>>> cpCache.cpp:
>>>
>>>    add assert so that one of these is used only at dump time and the
>>> other only at run time?
>>>
>>> 610 oop ConstantPoolCache::archived_references() {
>>> 611   return oopDesc::decode_heap_oop((narrowOop)_archived_references);
>>> 612 }
>>> 613
>>> 614 void ConstantPoolCache::set_archived_references(oop o) {
>>> 615   _archived_references = (u8)oopDesc::encode_heap_oop(o);
>>> 616 }
>>
>> Ok.
>>
>> Thanks!
>>
>> Jiangli
>>
>>>
>>> Thanks!
>>> - Ioi
>>>
>>> On 7/27/17 1:37 PM, Jiangli Zhou wrote:
>>>> Sorry, the mail didn’t handle the rich text well. I fixed the
>>>> format below.
>>>>
>>>> Please help review the changes for JDK-8179302 (Pre-resolve
>>>> constant pool string entries and cache resolved_reference arrays
>>>> in CDS archive). Currently, the CDS archive can contain cached
>>>> class metadata, interned java.lang.String objects. This RFE adds
>>>> the constant pool ‘resolved_references’ arrays (hotspot specific)
>>>> to the archive for startup/runtime performance enhancement.
>>>> The ‘resolved_references' arrays are used to hold references of
>>>> resolved constant pool entries including Strings, mirrors, etc.
>>>> With the 'resolved_references’ being cached, string constants in
>>>> shared classes can now be resolved to existing interned
>>>> java.lang.Strings at CDS dump time. G1 and 64-bit platforms are
>>>> required.
>>>>
>>>> The GC changes in the RFE were discussed and guided by Thomas
>>>> Schatzl and GC team. Part of the changes were contributed by Thomas
>>>> himself.
>>>> RFE:  https://bugs.openjdk.java.net/browse/JDK-8179302
>>>> hotspot:
>>>>   http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/
>>>> whitebox: http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/
>>>>
>>>> Please see below for details of supporting cached
>>>> ‘resolved_references’ and pre-resolving string constants.
>>>>
>>>> Types of Pinned G1 Heap Regions
>>>>
>>>> The pinned region type is a super type of all archive region types,
>>>> which include the open archive type and the closed archive type.
>>>>
>>>> 00100 0 [ 8] Pinned Mask
>>>> 01000 0 [16] Old Mask
>>>> 10000 0 [32] Archive Mask
>>>> 11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
>>>> 11100 1 [57] Closed Archive: ArchiveMask | PinnedMask | OldMask + 1
>>>>
>>>>
>>>> Pinned Regions
>>>>
>>>> Objects within the region are 'pinned', which means GC does not
>>>> move any live objects. GC scans and marks objects in the
>>>> pinned region as normal, but skips forwarding live objects.
>>>> Pointers in live objects are updated. Dead objects (unreachable)
>>>> can be collected and freed.
>>>>
>>>> Archive Regions
>>>>
>>>> The archive types are sub-types of 'pinned'. There are two types of
>>>> archive region currently, open archive and closed archive. Both can
>>>> support caching java heap objects via the CDS archive.
>>>>
>>>> An archive region is also an old region by design.
>>>>
>>>> Open Archive (GC-RW) Regions
>>>>
>>>> Open archive region is GC writable. GC scans & marks objects within
>>>> the region and adjusts (updates) pointers in live objects the same
>>>> way as a pinned region. Live objects (reachable) are pinned and not
>>>> forwarded by GC.
>>>> Open archive region does not have 'dead' objects. Unreachable
>>>> objects are 'dormant' objects. Dormant objects are not
>>>> collected and freed by GC.
>>>>
>>>> Adjustable Outgoing Pointers
>>>>
>>>> As GC can adjust pointers within the live objects in open archive
>>>> heap region, objects can have outgoing pointers to another
>>>> java heap region, including closed archive region, open archive
>>>> region, pinned (or humongous) region, and normal generational
>>>> region. When a referenced object is moved by GC, the pointer within
>>>> the open archive region is updated accordingly.
>>>>
>>>> Closed Archive (GC-RO) Regions
>>>>
>>>> The closed archive region is GC read-only region. GC cannot write
>>>> into the region. Objects are not scanned and marked by GC. Objects
>>>> are pinned and not forwarded. Pointers are not updated by GC
>>>> either. Hence, objects within the archive region cannot have any
>>>> outgoing pointers to another java heap region. Objects however can
>>>> still have pointers to other objects within the closed archive
>>>> regions (we might allow pointers to open archive regions in the
>>>> future). That restricts the type of java objects that can
>>>> be supported by the archive region.
>>>> In JDK 9 we support archive Strings with the archive regions.
>>>>
>>>> The GC-readonly archive region makes java heap memory sharable
>>>> among different JVM processes. NOTE: synchronization on the objects
>>>> within the archive heap region can still cause writes to the memory
>>>> page.
>>>>
>>>> Dormant Objects
>>>>
>>>> Dormant objects are unreachable java objects within the open
>>>> archive heap region.
>>>> A java object in the open archive heap region is a live object if
>>>> it can be reached during scanning. Some of the java objects in
>>>> the region may not be reachable during scanning. Those objects are
>>>> considered as dormant, but not dead. For example, a constant pool
>>>> 'resolved_references' array is reachable via the klass root if its
>>>> container klass (shared) is already loaded at the time during GC
>>>> scanning. If a shared klass is not yet loaded, the klass root is
>>>> not scanned and it's constant pool 'resolved_reference' array
>>>> (A) in the open archive region is not reachable. Then A is a
>>>> dormant object.
>>>>
>>>> Object State Transition
>>>>
>>>> All java objects are initially dormant objects when open archive
>>>> heap regions are mapped to the runtime java heap. A dormant object
>>>> becomes live object when the associated shared class is loaded at
>>>> runtime. Explicit call to G1SATBCardTableModRefBS::enqueue() needs
>>>> to be made when a dormant object becomes live. That should be the
>>>> case for cached objects with strong roots as well, since strong
>>>> roots are only scanned at the start of GC marking (the initial
>>>> marking) but not during Remarking/Final marking. If a cached object
>>>> becomes live during concurrent marking phase, G1 may not find it
>>>> and mark it live unless a call to
>>>> G1SATBCardTableModRefBS::enqueue() is made for the object.
>>>>
>>>> Currently, a live object in the open archive heap region cannot
>>>> become dormant again. This restriction simplifies GC
>>>> requirement and guarantees all outgoing pointers are updated by GC
>>>> correctly. Only objects for shared classes from the builtin class
>>>> loaders (boot, PlatformClassLoaders, and AppClassLoaders) are
>>>> supported for caching.
>>>>
>>>> Caching Java Objects at Archive Dump Time
>>>>
>>>> The closed archive and open archive regions are allocated near the
>>>> top of the dump time java heap. Archived java objects are copied
>>>> into the designated archive heap regions. For example, String
>>>> objects and the underlying 'value' arrays are copied into
>>>> the closed archive regions. All references to the archived objects
>>>> (from shared class metadata, string table, etc) are set to the
>>>> new heap locations. A hash table is used to keep track of all
>>>> archived java objects during the copying process to make sure java
>>>> object is not archived more than once if reached from different
>>>> roots. It also makes sure references to the same archived object
>>>> are updated using the same new address location.
>>>>
>>>> Caching Constant Pool resolved_references Array
>>>>
>>>> The 'resolved_references' is an array that holds references of
>>>> resolved constant pool entries including Strings, mirrors
>>>> and methodTypes, etc. Each loaded class has one
>>>> 'resolved_references' array (in ConstantPoolCache). The
>>>> 'resolved_references' arrays are copied into the open archive
>>>> regions during dump process. Prior to copying the
>>>> 'resolved_references' arrays, JVM iterates through constant pool
>>>> entries and resolves all JVM_CONSTANT_String entries to existing
>>>> interned Strings for all archived classes. When resolving, JVM only
>>>> looks up the string table and finds existing interned Strings
>>>> without inserting new ones. If a string entry cannot be resolved to
>>>> an existing interned String, the constant pool entry remain as
>>>> unresolved. That prevents memory waste if a constant pool string
>>>> entry is never used at runtime.
>>>>
>>>> All String objects referenced by the string table are copied first
>>>> into the closed archive regions. The string table entry is
>>>> updated with the new location when each String object is archived.
>>>> The JVM updates the resolved constant pool string entries with the
>>>> new object locations when copying the 'resolved_references' arrays
>>>> to the open archive regions. References to
>>>> the 'resolved_references' arrays in the ConstantPoolCache are also
>>>> updated.
>>>> At runtime as part of ConstantPool::restore_unshareable_info()
>>>> work, call G1SATBCardTableModRefBS::enqueue() to let GC know the
>>>> 'resolved_references' is becoming live. A handle is created for the
>>>> cached object and added to the loader_data's handles.
>>>>
>>>> Runtime Java Heap With Cached Java Objects
>>>>
>>>>
>>>> The closed archive regions (the string regions) and open archive
>>>> regions are mapped to the runtime java heap at the same offsets as
>>>> the dump time offsets from the runtime java heap base.
>>>>
>>>> Preliminary test execution and status:
>>>>
>>>> JPRT: passed
>>>> Tier2-rt: passed
>>>> Tier2-gc: passed
>>>> Tier2-comp: passed
>>>> Tier3-rt: passed
>>>> Tier3-gc: passed
>>>> Tier3-comp: passed
>>>> Tier4-rt: passed
>>>> Tier4-gc: passed
>>>> Tier4-comp:6 jobs timed out, all other tests passed
>>>> Tier5-rt: one test failed but passed when running locally, all
>>>> other tests passed
>>>> Tier5-gc: passed
>>>> Tier5-comp: running
>>>> hotspot_gc: two jobs timed out, all other tests passed
>>>> hotspot_gc in CDS mode: two jobs timed out, all other tests passed
>>>> vm.gc: passed
>>>> vm.gc in CDS mode: passed
>>>> Kichensink: passed
>>>> Kichensink in CDS mode: passed
>>>>
>>>> Thanks,
>>>> Jiangli
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Jiangli Zhou
Hi Ioi,

Thanks for looking again.

> On Aug 4, 2017, at 2:22 PM, Ioi Lam <[hidden email]> wrote:
>
> Hi Jiangli,
>
> The code looks good in general. I just have a few pet peeves for readability:
>
>
> (1) stringTable.cpp and metaspaceShared.cpp have the same asserts
>
>  704   assert(UseG1GC, "Only support G1 GC");
>  705   assert(UseCompressedOops && UseCompressedClassPointers,
>  706          "Only support UseCompressedOops and UseCompressedClassPointers enabled");
>
> 1615   assert(UseG1GC, "Only support G1 GC");
> 1616   assert(UseCompressedOops && UseCompressedClassPointers,
> 1617          "Only support UseCompressedOops and UseCompressedClassPointers enabled");
>
> Maybe it's better to combine them into a single function like MetaspaceShared::assert_vm_flags() so they don't get out of sync?

There is a MetaspaceShared::allow_archive_heap_object(), which checks for UseG1GC, UseCompressedOops and UseCompressedClassPointers combined. It does not seem to worth add another separate API for asserting the required flags. I’ll use that in the assert.

>
>
>
> (2) FileMapInfo::write_archive_heap_regions()
>
> I still find this code very hard to read, especially due to the loop.
>
> First, the comments are not consistent with the code:
>
>     498   assert(arr_len <= max_num_regions, "number of memory regions exceeds maximum");
>
> but the comments says: "The rest are consecutive full GC regions" which means there's a chance for max_num_regions to be more than 2 (which will be the case with Calvin's java-loader dumping changes using very small heap size). So the code is actually wrong.

The max_num_regions is the maximum number of region for each archived heap space (the string space, or open archive space). We only run into the case where the MemRegion array size is larger than max_num_regions with Calvin’s pending change. As part of Calvin’s change, he will change the assert into a check and bail out if the number of MemRegions are larger than max_num_regions due to heap fragmentation.


>
> The word "region" is used in these parameters, but they don't mean the same thing.
>
>               GrowableArray<MemRegion> *regions
>               int first_region, int max_num_regions,
>
>
> How about regions      -> g1_regions_list
>           first_region -> first_region_in_archive

The GrowableArray above is the MemRegions that GC code gives back to us. The GC code combines multiple G1 regions. The comments probably are over-explaining the details, which are hidden in the GC code. Probably that’s the confusing source. I’ll make the comment more clear.

Using g1_regions_list would also be confusing, since write_archive_heap_regions does not handle G1 regions directly. It processes the MemRegion array that GC code returns. How about changing ‘regions’ to ‘mem_regions’ or ‘archive_regions'?

>
>
> In the comments, I find the phrase 'the current archive heap region' ambiguous. It could be (erroneously) interpreted as "a region from the currently mapped archive”

>
> To make it unambiguous, how about changing
>
>
>  464 // Write the current archive heap region, which contains one or multiple GC(G1) regions.
>
>
> to
>
>     // Write the given list of G1 memory regions into the archive, starting at
>     // first_region_in_archive.


Ok. How about the following:

// Write the given list of java heap memory regions into the archive, starting at
// first_region_in_archive.

>
>
> Also, for the explanation of how the G1 regions are written into the archive, how about:
>
>    // The G1 regions in the list are sorted in ascending address order. When there are more objects
>    // than the capacity of a single G1 region, the bottom-most G1 region may be partially filled, and the
>    // remaining G1 region(s) are consecutively allocated and fully filled.
>    //
>    // Thus, the bottom-most G1 region (if not empty) is written into first_region_in_archive.
>    // The remaining G1 regions (if exist) are coalesced and written as a single block
>    // into (first_region_in_archive + 1)
>
>    // Here's the mapping from (g1 regions) -> (archive regions).
>
>
> All this function needs to do is to decide the values for
>
>      r0_start, r0_top
>      r1_start, r1_top
>
> I think it would be much better to not use the loop, and not use the max_num_regions parameter (it's always 2 anyway).
>
>      *r0_start = *r0_top = NULL;
>      *r1_start = *r1_top = NULL;
>
>      if (arr_len >= 1) {
>          *r0_start = regions->at(0).start();
>          *r0_end = *r0_start + regions->at(0).byte_size();
>      }
>      if (arr_len >= 2) {
>          int last = arr_len - 1;
>          *r1_start = regions->at(1).start();
>          *r1_end = regions->at(last).start() + regions->at(last).byte_size();
>       }
>
> what do you think?

We need to write out all archive regions including the empty ones. The loop using max_num_regions is the easiest way. I’d like to remove the code that deals with r0_* and r1_ explicitly. Let me try that.

>
>
>
> (3) metaspace.cpp
>
> 3350         // Map the archived heap regions after compressed pointers
> 3351         // because it relies on compressed class pointers setting to work
>
> do you mean this?
>
>     // Archived heap regions depend on the parameters of compressed class pointers, so
>     // they must be mapped after such parameters have been decided in the above call.

Hmmm, maybe use ‘arguments’ instead of ‘parameters’?
 
>
>
> (4) I found this name not strictly grammatical. How about this:
>
>      allow_archive_heap_object -> is_heap_object_archiving_allowed

Ok.

>
> (5) in most of your code, 'archive' is used as a noun, except in StringTable::archive_string() where it's used as a verb.
>
> archive_string could also be interpreted erroneously as "return a string that's already in the archive". So to be consistent and unambiguous, I think it's better to rename it to StringTable::create_archived_string()

Ok.

Thanks,
Jiangli

>
>
> Thanks
> - Ioi
>
>
> On 8/3/17 5:15 PM, Jiangli Zhou wrote:
>> Here are the updated webrevs.
>>
>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/ <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.02/>
>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/ <http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/>
>>
>> Changes in the updated webrevs include:
>> Merge with Ioi’s recent shared space auto-sizing change (8072061)
>> Addressed all feedbacks from Ioi and Coleen (Thanks for detailed review!)
>>
>> Thanks,
>> Jiangli
>>
>>
>>> On Aug 1, 2017, at 5:29 PM, Jiangli Zhou <[hidden email]> <mailto:[hidden email]> wrote:
>>>
>>> Hi Ioi,
>>>
>>> Thank you so much for reviewing this. I’ve addressed all your feedbacks. Please see details below. I’ll updated the webrev after addressing Coleen’s comments.
>>>
>>>> On Jul 30, 2017, at 9:07 PM, Ioi Lam <[hidden email]> <mailto:[hidden email]> wrote:
>>>>
>>>> Hi Jiangli,
>>>>
>>>> Here are my comments. I've not reviewed the GC code and I'll leave that to the GC experts :-)
>>>>
>>>> stringTable.cpp: StringTable::archive_string
>>>>
>>>>    add assert for DumpSharedSpaces only
>>>
>>> Ok.
>>>
>>>>
>>>> filemap.cpp
>>>>
>>>> 525 void FileMapInfo::write_archive_heap_regions(GrowableArray<MemRegion> *regions,
>>>> 526                                              int first_region, int num_regions) {
>>>>
>>>> When I first read this function, I found it hard to follow, especially this part that coalesces the trailing regions:
>>>>
>>>> 537         int len = regions->length();
>>>> 538         if (len > 1) {
>>>> 539           start = (char*)regions->at(1).start();
>>>> 540           size = (char*)regions->at(len - 1).end() - start;
>>>> 541         }
>>>> 542       }
>>>>
>>>> The rest of filemap.cpp always perform identical operations on MemRegion arrays, which are either 1 or 2 in size. However, this function doesn't follow that pattern; it also has a very different notion of "region", and the confusing part is regions->size() is not the same as num_regions.
>>>>
>>>> How about we change the API to something like the following? Before calling this API, the caller needs to coalesce the trailing G1 regions into a single MemRegion.
>>>>
>>>>     FileMapInfo::write_archive_heap_regions(MemRegion *regions, int first, int num_regions) {
>>>>        if (first == MetaspaceShared::first_string) {
>>>>           assert(num_regons <=  MetaspaceShared::max_strings, "...");
>>>>        } else {
>>>>           assert(first == MetaspaceShared::first_open_archive_heap_region, "...");
>>>>           assert(num_regons <= MetaspaceShared::max_open_archive_heap_region, "...");
>>>> }
>>>>        ....
>>>>
>>>>
>>>
>>> I’ve reworked the function and simplified the code.
>>>
>>>>
>>>> 756   if (!string_data_mapped) {
>>>> 757     StringTable::ignore_shared_strings(true);
>>>> 758     assert(string_ranges == NULL && num_string_ranges == 0, "sanity");
>>>> 759   }
>>>> 760
>>>> 761   if (open_archive_heap_data_mapped) {
>>>> 762     MetaspaceShared::set_open_archive_heap_region_mapped();
>>>> 763   } else {
>>>> 764     assert(open_archive_heap_ranges == NULL && num_open_archive_heap_ranges == 0, "sanity");
>>>> 765   }
>>>>
>>>> Maybe the two "if" statements should be more consistent? Instead of StringTable::ignore_shared_strings, how             about StringTable::set_shared_strings_region_mapped()?
>>>
>>> Fixed.
>>>
>>>>
>>>> FileMapInfo::map_heap_data() --
>>>>
>>>> 818     char* addr = (char*)regions[i].start();
>>>> 819     char* base = os::map_memory(_fd, _full_path, si->_file_offset,
>>>> 820                                 addr, regions[i].byte_size(), si->_read_only,
>>>> 821                                 si->_allow_exec);
>>>>
>>>> What happens when the first region succeeds to map but the second region fails to map? Will both regions be unmapped? I don't see where you store the return value (base) from os::map_memory(). Does it mean the code assumes that (addr == base). If so, we need an assert here.
>>>
>>> If any of the region fails to map, we bail out and call dealloc_archive_heap_regions(), which handles the deallocation of any regions specified. If second region fails to map, all memory ranges specified by ‘regions’ array are deallocated. We don’t unmap the memory here since it is part of the java heap. Unmapping of heap memory are handled by GC code. The ‘if’ check below makes sure base == addr.
>>>
>>>     if (base == NULL || base != addr) {
>>>       // dealloc the regions from java heap
>>>       dealloc_archive_heap_regions(regions, region_num);
>>>       if (log_is_enabled(Info, cds)) {
>>>         log_info(cds)("UseSharedSpaces: Unable to map at required address in java heap.");
>>>       }
>>>       return false;
>>>     }
>>>
>>>
>>>>
>>>> constantPool.cpp
>>>>
>>>>     Handle refs_handle;
>>>>     ...
>>>>     refs_handle = Handle(THREAD, (oop)archived);
>>>>
>>>> This will first create a NULL handle, then construct a temporary handle, and then assign the temp handle back to the null handle. This means two handles will be pushed onto THREAD->metadata_handles()
>>>>
>>>> I think it's more efficient if you merge these into a single statement
>>>>
>>>>     Handle refs_handle(THREAD, (oop)archived);
>>>
>>> Fixed.
>>>
>>>>
>>>> Is this experimental code? Maybe it should be removed?
>>>>
>>>> 664     if (tag_at(index).is_unresolved_klass()) {
>>>> 665 #if 0
>>>> 666       CPSlot entry = cp->slot_at(index);
>>>> 667       Symbol* name = entry.get_symbol();
>>>> 668       Klass* k = SystemDictionary::find(name, NULL, NULL, THREAD);
>>>> 669       if (k != NULL) {
>>>> 670         klass_at_put(index, k);
>>>> 671       }
>>>> 672 #endif
>>>> 673     } else
>>>
>>> Removed.
>>>
>>>>
>>>> cpCache.hpp:
>>>>
>>>>     u8   _archived_references
>>>>
>>>> shouldn't this be declared as an narrowOop to avoid the type casts when it's used?
>>>
>>> Ok.
>>>
>>>>
>>>> cpCache.cpp:
>>>>
>>>>    add assert so that one of these is used only at dump time and the other only at run time?
>>>>
>>>> 610 oop ConstantPoolCache::archived_references() {
>>>> 611   return oopDesc::decode_heap_oop((narrowOop)_archived_references);
>>>> 612 }
>>>> 613
>>>> 614 void ConstantPoolCache::set_archived_references(oop o) {
>>>> 615   _archived_references = (u8)oopDesc::encode_heap_oop(o);
>>>> 616 }
>>>
>>> Ok.
>>>
>>> Thanks!
>>>
>>> Jiangli
>>>
>>>>
>>>> Thanks!
>>>> - Ioi
>>>>
>>>> On 7/27/17 1:37 PM, Jiangli Zhou wrote:
>>>>> Sorry, the mail didn’t handle the rich text well. I fixed the format below.
>>>>>
>>>>> Please help review the changes for JDK-8179302 (Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive). Currently, the CDS archive can contain cached class metadata, interned java.lang.String objects. This RFE adds the constant pool ‘resolved_references’ arrays (hotspot specific) to the archive for startup/runtime performance enhancement. The ‘resolved_references' arrays are used to hold references of resolved constant pool entries including Strings, mirrors, etc. With the 'resolved_references’ being cached, string constants in shared classes can now be resolved to existing interned java.lang.Strings at CDS dump time. G1 and 64-bit platforms are required.
>>>>>
>>>>> The GC changes in the RFE were discussed and guided by Thomas Schatzl and GC team. Part of the changes were contributed by Thomas himself.
>>>>> RFE:        https://bugs.openjdk.java.net/browse/JDK-8179302 <https://bugs.openjdk.java.net/browse/JDK-8179302>
>>>>> hotspot:   http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/ <http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/>
>>>>> whitebox: http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/ <http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/>
>>>>>
>>>>> Please see below for details of supporting cached ‘resolved_references’ and pre-resolving string constants.
>>>>>
>>>>> Types of Pinned G1 Heap Regions
>>>>>
>>>>> The pinned region type is a super type of all archive region types, which include the open archive type and the closed archive type.
>>>>>
>>>>> 00100 0 [ 8] Pinned Mask
>>>>> 01000 0 [16] Old Mask
>>>>> 10000 0 [32] Archive Mask
>>>>> 11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
>>>>> 11100 1 [57] Closed Archive: ArchiveMask | PinnedMask | OldMask + 1
>>>>>
>>>>>
>>>>> Pinned Regions
>>>>>
>>>>> Objects within the region are 'pinned', which means GC does not move any live objects. GC scans and marks objects in the pinned region as normal, but skips forwarding live objects. Pointers in live objects are updated. Dead objects (unreachable) can be collected and freed.
>>>>>
>>>>> Archive Regions
>>>>>
>>>>> The archive types are sub-types of 'pinned'. There are two types of archive region currently, open archive and closed archive. Both can support caching java heap objects via the CDS archive.
>>>>>
>>>>> An archive region is also an old region by design.
>>>>>
>>>>> Open Archive (GC-RW) Regions
>>>>>
>>>>> Open archive region is GC writable. GC scans & marks objects within the region and adjusts (updates) pointers in live objects the same way as a pinned region. Live objects (reachable) are pinned and not forwarded by GC.
>>>>> Open archive region does not have 'dead' objects. Unreachable objects are 'dormant' objects. Dormant objects are not collected and freed by GC.
>>>>>
>>>>> Adjustable Outgoing Pointers
>>>>>
>>>>> As GC can adjust pointers within the live objects in open archive heap region, objects can have outgoing pointers to another java heap region, including closed archive region, open archive region, pinned (or humongous) region, and normal generational region. When a referenced object is moved by GC, the pointer within the open archive region is updated accordingly.
>>>>>
>>>>> Closed Archive (GC-RO) Regions
>>>>>
>>>>> The closed archive region is GC read-only region. GC cannot write into the region. Objects are not scanned and marked by GC. Objects are pinned and not forwarded. Pointers are not updated by GC either. Hence, objects within the archive region cannot have any outgoing pointers to another java heap region. Objects however can still have pointers to other objects within the closed archive regions (we might allow pointers to open archive regions in the future). That restricts the type of java objects that can be supported by the archive region.
>>>>> In JDK 9 we support archive Strings with the archive regions.
>>>>>
>>>>> The GC-readonly archive region makes java heap memory sharable among different JVM processes. NOTE:               synchronization on the objects within the archive heap region can still cause writes to the memory page.
>>>>>
>>>>> Dormant Objects
>>>>>
>>>>> Dormant objects are unreachable java objects within the open archive heap region.
>>>>> A java object in the open archive heap region is a live object if it can be reached during scanning. Some of the java objects in the region may not be reachable during scanning. Those objects are considered as dormant, but not dead. For example, a constant pool 'resolved_references' array is reachable via the klass root if its container klass (shared) is already loaded at the time during GC scanning. If a shared klass is not yet loaded, the klass root is not scanned and it's constant pool 'resolved_reference' array (A) in the open archive region is not reachable. Then A is a dormant object.
>>>>>
>>>>> Object State Transition
>>>>>
>>>>> All java objects are initially dormant objects when open archive heap regions are mapped to the runtime java heap. A dormant object becomes live object when the associated shared class is loaded at runtime. Explicit call to G1SATBCardTableModRefBS::enqueue() needs to be made when a dormant object becomes live. That should be the case for cached objects with strong roots as well, since strong roots are only scanned at the start of GC marking (the initial marking) but not during Remarking/Final marking. If a cached object becomes live during concurrent marking phase, G1 may not find it and mark it live unless a call to G1SATBCardTableModRefBS::enqueue() is made for the object.
>>>>>
>>>>> Currently, a live object in the open archive heap region cannot become dormant again. This restriction simplifies GC requirement and guarantees all outgoing pointers are updated by GC correctly. Only objects for shared classes from the builtin class loaders (boot, PlatformClassLoaders, and AppClassLoaders) are supported for caching.
>>>>>
>>>>> Caching Java Objects at Archive Dump Time
>>>>>
>>>>> The closed archive and open archive regions are allocated near the top of the dump time java heap. Archived java objects are copied into the designated archive heap regions. For example, String objects and the underlying 'value' arrays are copied into the closed archive regions. All references to the archived objects (from shared class metadata, string table, etc) are set to the new heap locations. A hash table is used to keep track of all archived java objects during the copying process to make sure java object is not archived more than once if reached from different roots. It also makes sure references to the same archived object are updated using the same new address location.
>>>>>
>>>>> Caching Constant Pool resolved_references Array
>>>>>
>>>>> The 'resolved_references' is an array that holds references of resolved constant pool entries including Strings, mirrors and methodTypes, etc. Each loaded class has one 'resolved_references' array (in ConstantPoolCache). The 'resolved_references' arrays are copied into the open archive regions during dump process. Prior to copying the 'resolved_references' arrays, JVM iterates through constant pool entries and resolves all JVM_CONSTANT_String entries to existing interned Strings for all archived classes. When resolving, JVM only looks up the string table and finds existing interned Strings without inserting new ones. If a string entry cannot be resolved to an existing interned String, the constant pool entry remain as unresolved. That prevents memory waste if a constant pool string entry is never used at runtime.
>>>>>
>>>>> All String objects referenced by the string table are copied first into the closed archive regions. The string table entry is updated with the new location when each String object is archived. The JVM updates the resolved constant pool string entries with the new object locations when copying the 'resolved_references' arrays to the open archive regions. References to the 'resolved_references' arrays in the ConstantPoolCache are also updated.
>>>>> At runtime as part of ConstantPool::restore_unshareable_info() work, call G1SATBCardTableModRefBS::enqueue() to let GC know the 'resolved_references' is becoming live. A handle is created for the cached object and added to the loader_data's handles.
>>>>>
>>>>> Runtime Java Heap With Cached Java Objects
>>>>>
>>>>>
>>>>> The closed archive regions (the string regions) and open archive regions are mapped to the runtime java heap at the same offsets as the dump time offsets from the runtime java heap base.
>>>>>
>>>>> Preliminary test execution and status:
>>>>>
>>>>> JPRT: passed
>>>>> Tier2-rt: passed
>>>>> Tier2-gc: passed
>>>>> Tier2-comp: passed
>>>>> Tier3-rt: passed
>>>>> Tier3-gc: passed
>>>>> Tier3-comp: passed
>>>>> Tier4-rt: passed
>>>>> Tier4-gc: passed
>>>>> Tier4-comp:6 jobs timed out, all other tests passed
>>>>> Tier5-rt: one test failed but passed when running locally, all other tests passed
>>>>> Tier5-gc: passed
>>>>> Tier5-comp: running
>>>>> hotspot_gc: two jobs timed out, all other tests passed
>>>>> hotspot_gc in CDS mode: two jobs timed out, all other tests passed
>>>>> vm.gc: passed
>>>>> vm.gc in CDS mode: passed
>>>>> Kichensink: passed
>>>>> Kichensink in CDS mode: passed
>>>>>
>>>>> Thanks,
>>>>> Jiangli
>>>>
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Jiangli Zhou
In reply to this post by coleen.phillimore
After discussing with Coleen in separate emails (thanks Coleen!), we think the NoSafepointVerifier guard should be added in VM_PopulateDumpSharedSpace::dump_java_heap_objects(). I’m rerunning tests.

Thanks,
Jiangli

> On Aug 4, 2017, at 10:47 AM, [hidden email] wrote:
>
>
> Hi, I think this change looks good except in metaspaceShared.cpp:
>
> 1632 NoSafepointVerifier nsv;
>
>
> Belongs above when you create the archive cache.
>
> 1600 MetaspaceShared::create_archive_object_cache();
>
>
> The call:
>
> 1646 int hash = obj->identity_hash();
>
>
> Can safepoint though.   Not sure why NSV didn't have an assert. But maybe it won't if you've eagerly installed the hashcode in these objects, which I think you have.   Otherwise you could use NoGCVerifier.  Or GCLocker, which the GC group won't like.
>
> Thanks,
> Coleen
>
> On 8/3/17 8:15 PM, Jiangli Zhou wrote:
>> Here are the updated webrevs.
>>
>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/
>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/
>>
>> Changes in the updated webrevs include:
>> Merge with Ioi’s recent shared space auto-sizing change (8072061)
>> Addressed all feedbacks from Ioi and Coleen (Thanks for detailed review!)
>>
>> Thanks,
>> Jiangli
>>
>>
>>> On Aug 1, 2017, at 5:29 PM, Jiangli Zhou <[hidden email]> wrote:
>>>
>>> Hi Ioi,
>>>
>>> Thank you so much for reviewing this. I’ve addressed all your feedbacks. Please see details below. I’ll updated the webrev after addressing Coleen’s comments.
>>>
>>>> On Jul 30, 2017, at 9:07 PM, Ioi Lam <[hidden email]> wrote:
>>>>
>>>> Hi Jiangli,
>>>>
>>>> Here are my comments. I've not reviewed the GC code and I'll leave that to the GC experts :-)
>>>>
>>>> stringTable.cpp: StringTable::archive_string
>>>>
>>>>    add assert for DumpSharedSpaces only
>>> Ok.
>>>
>>>> filemap.cpp
>>>>
>>>> 525 void FileMapInfo::write_archive_heap_regions(GrowableArray<MemRegion> *regions,
>>>> 526                                              int first_region, int num_regions) {
>>>>
>>>> When I first read this function, I found it hard to follow, especially this part that coalesces the trailing regions:
>>>>
>>>> 537         int len = regions->length();
>>>> 538         if (len > 1) {
>>>> 539           start = (char*)regions->at(1).start();
>>>> 540           size = (char*)regions->at(len - 1).end() - start;
>>>> 541         }
>>>> 542       }
>>>>
>>>> The rest of filemap.cpp always perform identical operations on MemRegion arrays, which are either 1 or 2 in size. However, this function doesn't follow that pattern; it also has a very different notion of "region", and the confusing part is regions->size() is not the same as num_regions.
>>>>
>>>> How about we change the API to something like the following? Before calling this API, the caller needs to coalesce the trailing G1 regions into a single MemRegion.
>>>>
>>>>     FileMapInfo::write_archive_heap_regions(MemRegion *regions, int first, int num_regions) {
>>>>        if (first == MetaspaceShared::first_string) {
>>>>           assert(num_regons <=  MetaspaceShared::max_strings, "...");
>>>>        } else {
>>>>           assert(first == MetaspaceShared::first_open_archive_heap_region, "...");
>>>>           assert(num_regons <= MetaspaceShared::max_open_archive_heap_region, "...");
>>>> }
>>>>        ....
>>>>
>>>>
>>> I’ve reworked the function and simplified the code.
>>>
>>>> 756   if (!string_data_mapped) {
>>>> 757     StringTable::ignore_shared_strings(true);
>>>> 758     assert(string_ranges == NULL && num_string_ranges == 0, "sanity");
>>>> 759   }
>>>> 760
>>>> 761   if (open_archive_heap_data_mapped) {
>>>> 762     MetaspaceShared::set_open_archive_heap_region_mapped();
>>>> 763   } else {
>>>> 764     assert(open_archive_heap_ranges == NULL && num_open_archive_heap_ranges == 0, "sanity");
>>>> 765   }
>>>>
>>>> Maybe the two "if" statements should be more consistent? Instead of StringTable::ignore_shared_strings, how about StringTable::set_shared_strings_region_mapped()?
>>> Fixed.
>>>
>>>> FileMapInfo::map_heap_data() --
>>>>
>>>> 818     char* addr = (char*)regions[i].start();
>>>> 819     char* base = os::map_memory(_fd, _full_path, si->_file_offset,
>>>> 820                                 addr, regions[i].byte_size(), si->_read_only,
>>>> 821                                 si->_allow_exec);
>>>>
>>>> What happens when the first region succeeds to map but the second region fails to map? Will both regions be unmapped? I don't see where you store the return value (base) from os::map_memory(). Does it mean the code assumes that (addr == base). If so, we need an assert here.
>>> If any of the region fails to map, we bail out and call dealloc_archive_heap_regions(), which handles the deallocation of any regions specified. If second region fails to map, all memory ranges specified by ‘regions’ array are deallocated. We don’t unmap the memory here since it is part of the java heap. Unmapping of heap memory are handled by GC code. The ‘if’ check below makes sure base == addr.
>>>
>>>     if (base == NULL || base != addr) {
>>>       // dealloc the regions from java heap
>>>       dealloc_archive_heap_regions(regions, region_num);
>>>       if (log_is_enabled(Info, cds)) {
>>>         log_info(cds)("UseSharedSpaces: Unable to map at required address in java heap.");
>>>       }
>>>       return false;
>>>     }
>>>
>>>
>>>> constantPool.cpp
>>>>
>>>>     Handle refs_handle;
>>>>     ...
>>>>     refs_handle = Handle(THREAD, (oop)archived);
>>>>
>>>> This will first create a NULL handle, then construct a temporary handle, and then assign the temp handle back to the null handle. This means two handles will be pushed onto THREAD->metadata_handles()
>>>>
>>>> I think it's more efficient if you merge these into a single statement
>>>>
>>>>     Handle refs_handle(THREAD, (oop)archived);
>>> Fixed.
>>>
>>>> Is this experimental code? Maybe it should be removed?
>>>>
>>>> 664     if (tag_at(index).is_unresolved_klass()) {
>>>> 665 #if 0
>>>> 666       CPSlot entry = cp->slot_at(index);
>>>> 667       Symbol* name = entry.get_symbol();
>>>> 668       Klass* k = SystemDictionary::find(name, NULL, NULL, THREAD);
>>>> 669       if (k != NULL) {
>>>> 670         klass_at_put(index, k);
>>>> 671       }
>>>> 672 #endif
>>>> 673     } else
>>> Removed.
>>>
>>>> cpCache.hpp:
>>>>
>>>>     u8   _archived_references
>>>>
>>>> shouldn't this be declared as an narrowOop to avoid the type casts when it's used?
>>> Ok.
>>>
>>>> cpCache.cpp:
>>>>
>>>>    add assert so that one of these is used only at dump time and the other only at run time?
>>>>
>>>> 610 oop ConstantPoolCache::archived_references() {
>>>> 611   return oopDesc::decode_heap_oop((narrowOop)_archived_references);
>>>> 612 }
>>>> 613
>>>> 614 void ConstantPoolCache::set_archived_references(oop o) {
>>>> 615   _archived_references = (u8)oopDesc::encode_heap_oop(o);
>>>> 616 }
>>> Ok.
>>>
>>> Thanks!
>>>
>>> Jiangli
>>>
>>>> Thanks!
>>>> - Ioi
>>>>
>>>> On 7/27/17 1:37 PM, Jiangli Zhou wrote:
>>>>> Sorry, the mail didn’t handle the rich text well. I fixed the format below.
>>>>>
>>>>> Please help review the changes for JDK-8179302 (Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive). Currently, the CDS archive can contain cached class metadata, interned java.lang.String objects. This RFE adds the constant pool ‘resolved_references’ arrays (hotspot specific) to the archive for startup/runtime performance enhancement. The ‘resolved_references' arrays are used to hold references of resolved constant pool entries including Strings, mirrors, etc. With the 'resolved_references’ being cached, string constants in shared classes can now be resolved to existing interned java.lang.Strings at CDS dump time. G1 and 64-bit platforms are required.
>>>>>
>>>>> The GC changes in the RFE were discussed and guided by Thomas Schatzl and GC team. Part of the changes were contributed by Thomas himself.
>>>>> RFE:        https://bugs.openjdk.java.net/browse/JDK-8179302
>>>>> hotspot:   http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/
>>>>> whitebox: http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/
>>>>>
>>>>> Please see below for details of supporting cached ‘resolved_references’ and pre-resolving string constants.
>>>>>
>>>>> Types of Pinned G1 Heap Regions
>>>>>
>>>>> The pinned region type is a super type of all archive region types, which include the open archive type and the closed archive type.
>>>>>
>>>>> 00100 0 [ 8] Pinned Mask
>>>>> 01000 0 [16] Old Mask
>>>>> 10000 0 [32] Archive Mask
>>>>> 11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
>>>>> 11100 1 [57] Closed Archive: ArchiveMask | PinnedMask | OldMask + 1
>>>>>
>>>>>
>>>>> Pinned Regions
>>>>>
>>>>> Objects within the region are 'pinned', which means GC does not move any live objects. GC scans and marks objects in the pinned region as normal, but skips forwarding live objects. Pointers in live objects are updated. Dead objects (unreachable) can be collected and freed.
>>>>>
>>>>> Archive Regions
>>>>>
>>>>> The archive types are sub-types of 'pinned'. There are two types of archive region currently, open archive and closed archive. Both can support caching java heap objects via the CDS archive.
>>>>>
>>>>> An archive region is also an old region by design.
>>>>>
>>>>> Open Archive (GC-RW) Regions
>>>>>
>>>>> Open archive region is GC writable. GC scans & marks objects within the region and adjusts (updates) pointers in live objects the same way as a pinned region. Live objects (reachable) are pinned and not forwarded by GC.
>>>>> Open archive region does not have 'dead' objects. Unreachable objects are 'dormant' objects. Dormant objects are not collected and freed by GC.
>>>>>
>>>>> Adjustable Outgoing Pointers
>>>>>
>>>>> As GC can adjust pointers within the live objects in open archive heap region, objects can have outgoing pointers to another java heap region, including closed archive region, open archive region, pinned (or humongous) region, and normal generational region. When a referenced object is moved by GC, the pointer within the open archive region is updated accordingly.
>>>>>
>>>>> Closed Archive (GC-RO) Regions
>>>>>
>>>>> The closed archive region is GC read-only region. GC cannot write into the region. Objects are not scanned and marked by GC. Objects are pinned and not forwarded. Pointers are not updated by GC either. Hence, objects within the archive region cannot have any outgoing pointers to another java heap region. Objects however can still have pointers to other objects within the closed archive regions (we might allow pointers to open archive regions in the future). That restricts the type of java objects that can be supported by the archive region.
>>>>> In JDK 9 we support archive Strings with the archive regions.
>>>>>
>>>>> The GC-readonly archive region makes java heap memory sharable among different JVM processes. NOTE: synchronization on the objects within the archive heap region can still cause writes to the memory page.
>>>>>
>>>>> Dormant Objects
>>>>>
>>>>> Dormant objects are unreachable java objects within the open archive heap region.
>>>>> A java object in the open archive heap region is a live object if it can be reached during scanning. Some of the java objects in the region may not be reachable during scanning. Those objects are considered as dormant, but not dead. For example, a constant pool 'resolved_references' array is reachable via the klass root if its container klass (shared) is already loaded at the time during GC scanning. If a shared klass is not yet loaded, the klass root is not scanned and it's constant pool 'resolved_reference' array (A) in the open archive region is not reachable. Then A is a dormant object.
>>>>>
>>>>> Object State Transition
>>>>>
>>>>> All java objects are initially dormant objects when open archive heap regions are mapped to the runtime java heap. A dormant object becomes live object when the associated shared class is loaded at runtime. Explicit call to G1SATBCardTableModRefBS::enqueue() needs to be made when a dormant object becomes live. That should be the case for cached objects with strong roots as well, since strong roots are only scanned at the start of GC marking (the initial marking) but not during Remarking/Final marking. If a cached object becomes live during concurrent marking phase, G1 may not find it and mark it live unless a call to G1SATBCardTableModRefBS::enqueue() is made for the object.
>>>>>
>>>>> Currently, a live object in the open archive heap region cannot become dormant again. This restriction simplifies GC requirement and guarantees all outgoing pointers are updated by GC correctly. Only objects for shared classes from the builtin class loaders (boot, PlatformClassLoaders, and AppClassLoaders) are supported for caching.
>>>>>
>>>>> Caching Java Objects at Archive Dump Time
>>>>>
>>>>> The closed archive and open archive regions are allocated near the top of the dump time java heap. Archived java objects are copied into the designated archive heap regions. For example, String objects and the underlying 'value' arrays are copied into the closed archive regions. All references to the archived objects (from shared class metadata, string table, etc) are set to the new heap locations. A hash table is used to keep track of all archived java objects during the copying process to make sure java object is not archived more than once if reached from different roots. It also makes sure references to the same archived object are updated using the same new address location.
>>>>>
>>>>> Caching Constant Pool resolved_references Array
>>>>>
>>>>> The 'resolved_references' is an array that holds references of resolved constant pool entries including Strings, mirrors and methodTypes, etc. Each loaded class has one 'resolved_references' array (in ConstantPoolCache). The 'resolved_references' arrays are copied into the open archive regions during dump process. Prior to copying the 'resolved_references' arrays, JVM iterates through constant pool entries and resolves all JVM_CONSTANT_String entries to existing interned Strings for all archived classes. When resolving, JVM only looks up the string table and finds existing interned Strings without inserting new ones. If a string entry cannot be resolved to an existing interned String, the constant pool entry remain as unresolved. That prevents memory waste if a constant pool string entry is never used at runtime.
>>>>>
>>>>> All String objects referenced by the string table are copied first into the closed archive regions. The string table entry is updated with the new location when each String object is archived. The JVM updates the resolved constant pool string entries with the new object locations when copying the 'resolved_references' arrays to the open archive regions. References to the 'resolved_references' arrays in the ConstantPoolCache are also updated.
>>>>> At runtime as part of ConstantPool::restore_unshareable_info() work, call G1SATBCardTableModRefBS::enqueue() to let GC know the 'resolved_references' is becoming live. A handle is created for the cached object and added to the loader_data's handles.
>>>>>
>>>>> Runtime Java Heap With Cached Java Objects
>>>>>
>>>>>
>>>>> The closed archive regions (the string regions) and open archive regions are mapped to the runtime java heap at the same offsets as the dump time offsets from the runtime java heap base.
>>>>>
>>>>> Preliminary test execution and status:
>>>>>
>>>>> JPRT: passed
>>>>> Tier2-rt: passed
>>>>> Tier2-gc: passed
>>>>> Tier2-comp: passed
>>>>> Tier3-rt: passed
>>>>> Tier3-gc: passed
>>>>> Tier3-comp: passed
>>>>> Tier4-rt: passed
>>>>> Tier4-gc: passed
>>>>> Tier4-comp:6 jobs timed out, all other tests passed
>>>>> Tier5-rt: one test failed but passed when running locally, all other tests passed
>>>>> Tier5-gc: passed
>>>>> Tier5-comp: running
>>>>> hotspot_gc: two jobs timed out, all other tests passed
>>>>> hotspot_gc in CDS mode: two jobs timed out, all other tests passed
>>>>> vm.gc: passed
>>>>> vm.gc in CDS mode: passed
>>>>> Kichensink: passed
>>>>> Kichensink in CDS mode: passed
>>>>>
>>>>> Thanks,
>>>>> Jiangli
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Thomas Schatzl
In reply to this post by Jiangli Zhou
Hi,

On Thu, 2017-08-03 at 17:15 -0700, Jiangli Zhou wrote:
> Here are the updated webrevs.
>
> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/
> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/
>
> Changes in the updated webrevs include:
> Merge with Ioi’s recent shared space auto-sizing change (8072061)
> Addressed all feedbacks from Ioi and Coleen (Thanks for detailed
> review!)

- the comment in g1Allocator.hpp:326 needs to be updated. I would merge
the information from the _open member here. I.e. define what "open" and
"closed" archive are.

- g1Allocator.inline.hpp:66-72: formatting - parameters should be
aligned below each other

- g1CollectedHeap.cpp:665/6: formatting - parameters should be aligned
below each other

- g1CollectedHeap.cpp:750 + 756: maybe make a static method to avoid
repetition here.

- G1NoteEndOfConcMarkClosure::doHeapRegion(): would it be too much work
to make an extra CR out of this change? It is a change that fixes an
existing bug unrelated to this change after all (not doing remembered
set cleanup work for archive regions).

- g1HeapVerifier.cpp:63: I think this change is obsolete since
is_obj_dead_cond() will always return false. (I.e. some leftover of
some internal discussion)

- g1HeapVerifier.cpp: there is a verbose flag passed around. Not sure
if it should be kept, as it seems to be some code that has been used
for debugging this feature, but can't be activated anyway without code
changes.

- heapRegion.inline.hpp:120: please fix the comment of the assert.
Something like "Closed archive regions should not have references into
other regions"

- heapRegion.inline.hpp:125: I think the existing code of faking open
archive regions as all-live does not work as implemented. Consider the
case when a new object in there is made live, and references in there
set to refer to some object outside this region, and is the only
reference (and it has not been marked live yet): if there is a
remembered set entry to that, and it is about to be scanned.

The current implementation of HeapRegion::is_obj_dead() will consider
it dead, so we will enter the code path at line 125. Block_size_using
bitmap() will jump over that object, but the return values of
is_obj_dead_with_size() method will indicate the caller to not iterate
over this object anyway, potentially missing that reference.

HeapRegion::is_obj_dead() needs to return that the object is not dead
for open archive regions. I think for now the safest way is to add
!is_open_archive() to the condition calculated there. That will obviate
the need for that existing hack to the assert too.

It may have some perf impact though - actually recently there has been
some effort to remove that is_archive() check from that code (the one
that is now the is_closed_archive() assert). I do not see an easy way
to fix this. :( (i.e. there is likely no perf impact vs. jdk9 so it's
not that bad)

This suggestion also only works with the assumption laid out in the CR
that there is no way that a live object can not become dormant again,
and the objects in the open archive regions are always parsable (never
contain junk data).

- HeapRegionType::relabel_as_old(): the code in line 175/176 is
unreachable and should be removed.

I did not look through runtime code too closely.

Thanks,
  Thomas

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Jiangli Zhou
Hi Thomas,

Thanks a lot for the review!

> On Aug 7, 2017, at 7:52 AM, Thomas Schatzl <[hidden email]> wrote:
>
> Hi,
>
> On Thu, 2017-08-03 at 17:15 -0700, Jiangli Zhou wrote:
>> Here are the updated webrevs.
>>
>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/
>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/
>>
>> Changes in the updated webrevs include:
>> Merge with Ioi’s recent shared space auto-sizing change (8072061)
>> Addressed all feedbacks from Ioi and Coleen (Thanks for detailed
>> review!)
>
> - the comment in g1Allocator.hpp:326 needs to be updated. I would merge
> the information from the _open member here. I.e. define what "open" and
> "closed" archive are.

Good catch. I updated the comments as the following:

// G1ArchiveAllocator is used to allocate memory in archive
// regions. Such regions are not scavenged nor compacted by GC.
// There are two types of archive regions, which are
// differ in the kind of references allowed for the contained objects:
//
// - 'Closed' archive region contain no references outside of archive
//   regions. The region is immutable by GC. GC does not mark object
//   header in 'closed' archive region.
// - An 'open' archive region may contain references pointing to
//   non-archive heap region. GC can adjust pointers and mark object
//   header in 'open' archive region.

>
> - g1Allocator.inline.hpp:66-72: formatting - parameters should be
> aligned below each other

Fixed.

>
> - g1CollectedHeap.cpp:665/6: formatting - parameters should be aligned
> below each other

Fixed.

>
> - g1CollectedHeap.cpp:750 + 756: maybe make a static method to avoid
> repetition here.

I changed the code to be following. New static function is a little overkill since the usage is very limited. :-)

      HeapWord* top;
      HeapRegion* next_region;
      if (curr_region != last_region) {
        top = curr_region->end();
        next_region = _hrm.next_region_in_heap(curr_region);
      } else {
        top = last_address + 1;
        next_region = NULL;
      }
      curr_region->set_top(top);
      curr_region->set_first_dead(top);
      curr_region->set_end_of_live(top);
      curr_region = next_region;
    }
>
> - G1NoteEndOfConcMarkClosure::doHeapRegion(): would it be too much work
> to make an extra CR out of this change? It is a change that fixes an
> existing bug unrelated to this change after all (not doing remembered
> set cleanup work for archive regions).

Using separate CR to track this sounds good. I just created JDK-8185924 <https://bugs.openjdk.java.net/browse/JDK-8185924>. Since we have been testing the fix with other changes together, I'll integrate them together with both CRs.

>
> - g1HeapVerifier.cpp:63: I think this change is obsolete since
> is_obj_dead_cond() will always return false. (I.e. some leftover of
> some internal discussion)

You are right. I reverted the change.

>
> - g1HeapVerifier.cpp: there is a verbose flag passed around. Not sure
> if it should be kept, as it seems to be some code that has been used
> for debugging this feature, but can't be activated anyway without code
> changes.

Removed.

>
> - heapRegion.inline.hpp:120: please fix the comment of the assert.
> Something like "Closed archive regions should not have references into
> other regions”

Done.

>
> - heapRegion.inline.hpp:125: I think the existing code of faking open
> archive regions as all-live does not work as implemented. Consider the
> case when a new object in there is made live, and references in there
> set to refer to some object outside this region, and is the only
> reference (and it has not been marked live yet): if there is a
> remembered set entry to that, and it is about to be scanned.
>
> The current implementation of HeapRegion::is_obj_dead() will consider
> it dead, so we will enter the code path at line 125. Block_size_using
> bitmap() will jump over that object, but the return values of
> is_obj_dead_with_size() method will indicate the caller to not iterate
> over this object anyway, potentially missing that reference.
>
> HeapRegion::is_obj_dead() needs to return that the object is not dead
> for open archive regions. I think for now the safest way is to add
> !is_open_archive() to the condition calculated there. That will obviate
> the need for that existing hack to the assert too.
>
> It may have some perf impact though - actually recently there has been
> some effort to remove that is_archive() check from that code (the one
> that is now the is_closed_archive() assert). I do not see an easy way
> to fix this. :( (i.e. there is likely no perf impact vs. jdk9 so it's
> not that bad)
>
> This suggestion also only works with the assumption laid out in the CR
> that there is no way that a live object can not become dormant again,
> and the objects in the open archive regions are always parsable (never
> contain junk data).

Thank you for the analysis. I changed HeapRegion::is_obj_dead() with added !is_open_archive() condition as you suggested. I’m glad to get rid of the is_open_archive() change from is_obj_dead_with_size() assert.

Thinking more about the case you described above, when an object (A) in the open archive just becomes live, there would be no reference to any other non-archive region at the moment. The object A only contains references to the archive (open or closed) regions initially. Scanning has no issue at the moment. When a new object (B) is allocated in the java heap and the reference is set in A. B is considered live, scanning would update the A->B reference accordingly. Is that correct?

>
> - HeapRegionType::relabel_as_old(): the code in line 175/176 is
> unreachable and should be removed.

Agreed. Removed.

I’ll send out an updated webrev later. Thank you so much for all the help and contributions!!

Jiangli

>
> I did not look through runtime code too closely.
>
> Thanks,
>   Thomas
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Ioi Lam
In reply to this post by Jiangli Zhou
On 8/4/17 10:19 PM, Jiangli Zhou wrote:

> Hi Ioi,
>
> Thanks for looking again.
>
>> On Aug 4, 2017, at 2:22 PM, Ioi Lam <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>> Hi Jiangli,
>>
>> The code looks good in general. I just have a few pet peeves for
>> readability:
>>
>>
>> (1) stringTable.cpp and metaspaceShared.cpp have the same asserts
>>
>>  704   assert(UseG1GC, "Only support G1 GC");
>>  705   assert(UseCompressedOops && UseCompressedClassPointers,
>>  706          "Only support UseCompressedOops and
>> UseCompressedClassPointers enabled");
>>
>> 1615   assert(UseG1GC, "Only support G1 GC");
>> 1616   assert(UseCompressedOops && UseCompressedClassPointers,
>> 1617          "Only support UseCompressedOops and
>> UseCompressedClassPointers enabled");
>>
>> Maybe it's better to combine them into a single function like
>> MetaspaceShared::assert_vm_flags() so they don't get out of sync?
>
> There is a MetaspaceShared::allow_archive_heap_object(), which checks
> for UseG1GC, UseCompressedOops and UseCompressedClassPointers
> combined. It does not seem to worth add another separate API for
> asserting the required flags. I’ll use that in the assert.
>
>>
>>
>>
>> (2) FileMapInfo::write_archive_heap_regions()
>>
>> I still find this code very hard to read, especially due to the loop.
>>
>> First, the comments are not consistent with the code:
>>
>>     498   assert(arr_len <= max_num_regions, "number of memory
>> regions exceeds maximum");
>>
>> but the comments says: "The rest are consecutive full GC regions"
>> which means there's a chance for max_num_regions to be more than 2
>> (which will be the case with Calvin's java-loader dumping changes
>> using very small heap size).So the code is actually wrong.
>
> The max_num_regions is the maximum number of region for each archived
> heap space (the string space, or open archive space). We only run into
> the case where the MemRegion array size is larger than max_num_regions
> with Calvin’s pending change. As part of Calvin’s change, he will
> change the assert into a check and bail out if the number of
> MemRegions are larger than max_num_regions due to heap fragmentation.
>
>
Your latest patch assumes that arr_len <= 2, but the implementation of
G1CollectedHeap::heap()->begin_archive_alloc_range() /
G1CollectedHeap::heap()->end_archive_alloc_range() actually allows more
than 2 regions to returned. So simply putting an assert there seems
risky (unless you have analyzed all possible scenarios to prove that's
impossible).

Instead of trying to come up with a complicated proof, I think it's much
safer to disable the archived string region if the arr_len > 2. Also, if
the string region is disabled, you should also disable the
open_archive_heap_region

I think this is a general issue with the mapped heap regions, and it
just happens to be revealed by Calvin's patch. So we should fix it now
and not wait for Calvin's patch.


>>
>> The word "region" is used in these parameters, but they don't mean
>> the same thing.
>>
>> GrowableArray<MemRegion> *regions
>>               int first_region, int max_num_regions,
>>
>>
>> How about regions      -> g1_regions_list
>>           first_region -> first_region_in_archive
>
> The GrowableArray above is the MemRegions that GC code gives back to
> us. The GC code combines multiple G1 regions. The comments probably
> are over-explaining the details, which are hidden in the GC code.
> Probably that’s the confusing source. I’ll make the comment more clear.
>
> Using g1_regions_list would also be confusing, since
> write_archive_heap_regions does not handle G1 regions directly. It
> processes the MemRegion array that GC code returns. How about changing
> ‘regions’ to ‘mem_regions’ or ‘archive_regions'?
>
How about heap_regions? These are regions in the active Java heap, which
current has not mapped anything from the CDS archive.


>>
>>
>> In the comments, I find the phrase 'the current archive heap region'
>> ambiguous. It could be (erroneously) interpreted as "a region from
>> the currently mapped archive”
>>
>> To make it unambiguous, how about changing
>>
>>
>>  464 // Write the current archive heap region, which contains one or
>> multiple GC(G1) regions.
>>
>>
>> to
>>
>>     // Write the given list of G1 memory regions into the archive,
>> starting at
>>     // first_region_in_archive.
>
>
> Ok. How about the following:
>
> // Write the given list of java heap memory regions into the archive,
> starting at
> // first_region_in_archive.
>
Sounds good.

Thanks
- Ioi

>>
>>
>> Also, for the explanation of how the G1 regions are written into the
>> archive, how about:
>>
>>    // The G1 regions in the list are sorted in ascending address
>> order. When there are more objects
>>    // than the capacity of a single G1 region, the bottom-most G1
>> region may be partially filled, and the
>>    // remaining G1 region(s) are consecutively allocated and fully
>> filled.
>>    //
>>    // Thus, the bottom-most G1 region (if not empty) is written into
>> first_region_in_archive.
>>    // The remaining G1 regions (if exist) are coalesced and written
>> as a single block
>>    // into (first_region_in_archive + 1)
>>
>>    // Here's the mapping from (g1 regions) -> (archive regions).
>>
>>
>> All this function needs to do is to decide the values for
>>
>>      r0_start, r0_top
>>      r1_start, r1_top
>>
>> I think it would be much better to not use the loop, and not use the
>> max_num_regions parameter (it's always 2 anyway).
>>
>>      *r0_start = *r0_top = NULL;
>>      *r1_start = *r1_top = NULL;
>>
>>      if (arr_len >= 1) {
>>          *r0_start = regions->at(0).start();
>>          *r0_end = *r0_start + regions->at(0).byte_size();
>>      }
>>      if (arr_len >= 2) {
>>          int last = arr_len - 1;
>>          *r1_start = regions->at(1).start();
>>          *r1_end = regions->at(last).start() +
>> regions->at(last).byte_size();
>>       }
>>
>> what do you think?
>
> We need to write out all archive regions including the empty ones. The
> loop using max_num_regions is the easiest way. I’d like to remove the
> code that deals with r0_* and r1_ explicitly. Let me try that.
>
>>
>>
>>
>> (3) metaspace.cpp
>>
>> 3350         // Map the archived heap regions after compressed pointers
>> 3351         // because it relies on compressed class pointers
>> setting to work
>>
>> do you mean this?
>>
>>     // Archived heap regions depend on the parameters of compressed
>> class pointers, so
>>     // they must be mapped after such parameters have been decided in
>> the above call.
>
> Hmmm, maybe use ‘arguments’ instead of ‘parameters’?
>
>>
>>
>> (4) I found this name not strictly grammatical. How about this:
>>
>>      allow_archive_heap_object -> is_heap_object_archiving_allowed
>
> Ok.
>
>>
>> (5) in most of your code, 'archive' is used as a noun, except in
>> StringTable::archive_string() where it's used as a verb.
>>
>> archive_string could also be interpreted erroneously as "return a
>> string that's already in the archive". So to be consistent and
>> unambiguous, I think it's better to rename it to
>> StringTable::create_archived_string()
>
> Ok.
>
> Thanks,
> Jiangli
>
>>
>>
>> Thanks
>> - Ioi
>>
>>
>> On 8/3/17 5:15 PM, Jiangli Zhou wrote:
>>> Here are the updated webrevs.
>>>
>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/ 
>>> <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.02/>
>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/
>>>
>>> Changes in the updated webrevs include:
>>>
>>>   * Merge with Ioi’s recent shared space auto-sizing change (8072061)
>>>   * Addressed all feedbacks from Ioi and Coleen (Thanks for detailed
>>>     review!)
>>>
>>>
>>> Thanks,
>>> Jiangli
>>>
>>>
>>>> On Aug 1, 2017, at 5:29 PM, Jiangli Zhou <[hidden email]>
>>>> wrote:
>>>>
>>>> Hi Ioi,
>>>>
>>>> Thank you so much for reviewing this. I’ve addressed all your
>>>> feedbacks. Please see details below. I’ll updated the webrev
>>>> after addressing Coleen’s comments.
>>>>
>>>>> On Jul 30, 2017, at 9:07 PM, Ioi Lam <[hidden email]> wrote:
>>>>>
>>>>> Hi Jiangli,
>>>>>
>>>>> Here are my comments. I've not reviewed the GC code and I'll leave
>>>>> that to the GC experts :-)
>>>>>
>>>>> stringTable.cpp: StringTable::archive_string
>>>>>
>>>>>    add assert for DumpSharedSpaces only
>>>>
>>>> Ok.
>>>>
>>>>>
>>>>> filemap.cpp
>>>>>
>>>>> 525 void
>>>>> FileMapInfo::write_archive_heap_regions(GrowableArray<MemRegion>
>>>>> *regions,
>>>>> 526  int first_region, int num_regions) {
>>>>>
>>>>> When I first read this function, I found it hard to follow,
>>>>> especially this part that coalesces the trailing regions:
>>>>>
>>>>> 537         int len = regions->length();
>>>>> 538         if (len > 1) {
>>>>> 539           start = (char*)regions->at(1).start();
>>>>> 540           size = (char*)regions->at(len - 1).end() - start;
>>>>> 541         }
>>>>> 542       }
>>>>>
>>>>> The rest of filemap.cpp always perform identical operations on
>>>>> MemRegion arrays, which are either 1 or 2 in size. However,
>>>>> this function doesn't follow that pattern; it also has a very
>>>>> different notion of "region", and the confusing part is
>>>>> regions->size() is not the same as num_regions.
>>>>>
>>>>> How about we change the API to something like the following?
>>>>> Before calling this API, the caller needs to coalesce the trailing
>>>>> G1 regions into a single MemRegion.
>>>>>
>>>>> FileMapInfo::write_archive_heap_regions(MemRegion *regions, int
>>>>> first, int num_regions) {
>>>>>        if (first == MetaspaceShared::first_string) {
>>>>>           assert(num_regons <=  MetaspaceShared::max_strings, "...");
>>>>>        } else {
>>>>>           assert(first ==
>>>>> MetaspaceShared::first_open_archive_heap_region, "...");
>>>>>           assert(num_regons <=
>>>>> MetaspaceShared::max_open_archive_heap_region, "...");
>>>>> }
>>>>>        ....
>>>>>
>>>>>
>>>>
>>>> I’ve reworked the function and simplified the code.
>>>>
>>>>>
>>>>> 756   if (!string_data_mapped) {
>>>>> 757 StringTable::ignore_shared_strings(true);
>>>>> 758     assert(string_ranges == NULL && num_string_ranges == 0,
>>>>> "sanity");
>>>>> 759   }
>>>>> 760
>>>>> 761   if (open_archive_heap_data_mapped) {
>>>>> 762 MetaspaceShared::set_open_archive_heap_region_mapped();
>>>>> 763   } else {
>>>>> 764     assert(open_archive_heap_ranges == NULL &&
>>>>> num_open_archive_heap_ranges == 0, "sanity");
>>>>> 765   }
>>>>>
>>>>> Maybe the two "if" statements should be more consistent? Instead
>>>>> of StringTable::ignore_shared_strings, how
>>>>> about StringTable::set_shared_strings_region_mapped()?
>>>>
>>>> Fixed.
>>>>
>>>>>
>>>>> FileMapInfo::map_heap_data() --
>>>>>
>>>>> 818     char* addr = (char*)regions[i].start();
>>>>> 819     char* base = os::map_memory(_fd, _full_path, si->_file_offset,
>>>>> 820                                 addr, regions[i].byte_size(),
>>>>> si->_read_only,
>>>>> 821 si->_allow_exec);
>>>>>
>>>>> What happens when the first region succeeds to map but the second
>>>>> region fails to map? Will both regions be unmapped? I don't see
>>>>> where you store the return value (base) from os::map_memory().
>>>>> Does it mean the code assumes that (addr == base). If so, we need
>>>>> an assert here.
>>>>
>>>> If any of the region fails to map, we bail out and call
>>>> dealloc_archive_heap_regions(), which handles the deallocation of
>>>> any regions specified. If second region fails to map, all memory
>>>> ranges specified by ‘regions’ array are deallocated. We don’t unmap
>>>> the memory here since it is part of the java heap. Unmapping of
>>>> heap memory are handled by GC code. The ‘if’ check below makes sure
>>>> base == addr.
>>>>
>>>>     if (base == NULL || base != addr) {
>>>>       // dealloc the regions from java heap
>>>>       dealloc_archive_heap_regions(regions, region_num);
>>>>       if (log_is_enabled(Info, cds)) {
>>>>         log_info(cds)("UseSharedSpaces: Unable to map at required
>>>> address in java heap.");
>>>>       }
>>>>       return false;
>>>>     }
>>>>
>>>>
>>>>>
>>>>> constantPool.cpp
>>>>>
>>>>>     Handle refs_handle;
>>>>>     ...
>>>>>     refs_handle = Handle(THREAD, (oop)archived);
>>>>>
>>>>> This will first create a NULL handle, then construct a temporary
>>>>> handle, and then assign the temp handle back to the null
>>>>> handle. This means two handles will be pushed onto
>>>>> THREAD->metadata_handles()
>>>>>
>>>>> I think it's more efficient if you merge these into a single statement
>>>>>
>>>>>     Handle refs_handle(THREAD, (oop)archived);
>>>>
>>>> Fixed.
>>>>
>>>>>
>>>>> Is this experimental code? Maybe it should be removed?
>>>>>
>>>>> 664     if (tag_at(index).is_unresolved_klass()) {
>>>>> 665 #if 0
>>>>> 666       CPSlot entry = cp->slot_at(index);
>>>>> 667       Symbol* name = entry.get_symbol();
>>>>> 668       Klass* k = SystemDictionary::find(name, NULL, NULL, THREAD);
>>>>> 669       if (k != NULL) {
>>>>> 670         klass_at_put(index, k);
>>>>> 671       }
>>>>> 672 #endif
>>>>> 673     } else
>>>>
>>>> Removed.
>>>>
>>>>>
>>>>> cpCache.hpp:
>>>>>
>>>>>     u8   _archived_references
>>>>>
>>>>> shouldn't this be declared as an narrowOop to avoid the type casts
>>>>> when it's used?
>>>>
>>>> Ok.
>>>>
>>>>>
>>>>> cpCache.cpp:
>>>>>
>>>>>    add assert so that one of these is used only at dump time and
>>>>> the other only at run time?
>>>>>
>>>>> 610 oop ConstantPoolCache::archived_references() {
>>>>> 611   return
>>>>> oopDesc::decode_heap_oop((narrowOop)_archived_references);
>>>>> 612 }
>>>>> 613
>>>>> 614 void ConstantPoolCache::set_archived_references(oop o) {
>>>>> 615   _archived_references = (u8)oopDesc::encode_heap_oop(o);
>>>>> 616 }
>>>>
>>>> Ok.
>>>>
>>>> Thanks!
>>>>
>>>> Jiangli
>>>>
>>>>>
>>>>> Thanks!
>>>>> - Ioi
>>>>>
>>>>> On 7/27/17 1:37 PM, Jiangli Zhou wrote:
>>>>>> Sorry, the mail didn’t handle the rich text well. I fixed the
>>>>>> format below.
>>>>>>
>>>>>> Please help review the changes for JDK-8179302 (Pre-resolve
>>>>>> constant pool string entries and cache resolved_reference arrays
>>>>>> in CDS archive). Currently, the CDS archive can contain cached
>>>>>> class metadata, interned java.lang.String objects. This RFE adds
>>>>>> the constant pool ‘resolved_references’ arrays (hotspot specific)
>>>>>> to the archive for startup/runtime performance enhancement.
>>>>>> The ‘resolved_references' arrays are used to hold references of
>>>>>> resolved constant pool entries including Strings, mirrors, etc.
>>>>>> With the 'resolved_references’ being cached, string constants in
>>>>>> shared classes can now be resolved to existing interned
>>>>>> java.lang.Strings at CDS dump time. G1 and 64-bit platforms are
>>>>>> required.
>>>>>>
>>>>>> The GC changes in the RFE were discussed and guided by Thomas
>>>>>> Schatzl and GC team. Part of the changes were contributed by
>>>>>> Thomas himself.
>>>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8179302
>>>>>> hotspot:
>>>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/
>>>>>> whitebox:
>>>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/
>>>>>>
>>>>>> Please see below for details of supporting cached
>>>>>> ‘resolved_references’ and pre-resolving string constants.
>>>>>>
>>>>>> Types of Pinned G1 Heap Regions
>>>>>>
>>>>>> The pinned region type is a super type of all archive region
>>>>>> types, which include the open archive type and the closed archive
>>>>>> type.
>>>>>>
>>>>>> 00100 0 [ 8] Pinned Mask
>>>>>> 01000 0 [16] Old Mask
>>>>>> 10000 0 [32] Archive Mask
>>>>>> 11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
>>>>>> 11100 1 [57] Closed Archive: ArchiveMask | PinnedMask | OldMask + 1
>>>>>>
>>>>>>
>>>>>> Pinned Regions
>>>>>>
>>>>>> Objects within the region are 'pinned', which means GC does not
>>>>>> move any live objects. GC scans and marks objects in the
>>>>>> pinned region as normal, but skips forwarding live objects.
>>>>>> Pointers in live objects are updated. Dead objects (unreachable)
>>>>>> can be collected and freed.
>>>>>>
>>>>>> Archive Regions
>>>>>>
>>>>>> The archive types are sub-types of 'pinned'. There are two types
>>>>>> of archive region currently, open archive and closed archive.
>>>>>> Both can support caching java heap objects via the CDS archive.
>>>>>>
>>>>>> An archive region is also an old region by design.
>>>>>>
>>>>>> Open Archive (GC-RW) Regions
>>>>>>
>>>>>> Open archive region is GC writable. GC scans & marks objects
>>>>>> within the region and adjusts (updates) pointers in live objects
>>>>>> the same way as a pinned region. Live objects (reachable) are
>>>>>> pinned and not forwarded by GC.
>>>>>> Open archive region does not have 'dead' objects. Unreachable
>>>>>> objects are 'dormant' objects. Dormant objects are not
>>>>>> collected and freed by GC.
>>>>>>
>>>>>> Adjustable Outgoing Pointers
>>>>>>
>>>>>> As GC can adjust pointers within the live objects in open archive
>>>>>> heap region, objects can have outgoing pointers to another
>>>>>> java heap region, including closed archive region, open archive
>>>>>> region, pinned (or humongous) region, and normal generational
>>>>>> region. When a referenced object is moved by GC, the pointer
>>>>>> within the open archive region is updated accordingly.
>>>>>>
>>>>>> Closed Archive (GC-RO) Regions
>>>>>>
>>>>>> The closed archive region is GC read-only region. GC cannot write
>>>>>> into the region. Objects are not scanned and marked by
>>>>>> GC. Objects are pinned and not forwarded. Pointers are not
>>>>>> updated by GC either. Hence, objects within the archive region
>>>>>> cannot have any outgoing pointers to another java heap region.
>>>>>> Objects however can still have pointers to other objects within
>>>>>> the closed archive regions (we might allow pointers to open
>>>>>> archive regions in the future). That restricts the type of java
>>>>>> objects that can be supported by the archive region.
>>>>>> In JDK 9 we support archive Strings with the archive regions.
>>>>>>
>>>>>> The GC-readonly archive region makes java heap memory sharable
>>>>>> among different JVM processes. NOTE: synchronization on the
>>>>>> objects within the archive heap region can still cause writes to
>>>>>> the memory page.
>>>>>>
>>>>>> Dormant Objects
>>>>>>
>>>>>> Dormant objects are unreachable java objects within the open
>>>>>> archive heap region.
>>>>>> A java object in the open archive heap region is a live object if
>>>>>> it can be reached during scanning. Some of the java objects in
>>>>>> the region may not be reachable during scanning. Those objects
>>>>>> are considered as dormant, but not dead. For example, a
>>>>>> constant pool 'resolved_references' array is reachable via the
>>>>>> klass root if its container klass (shared) is already loaded at
>>>>>> the time during GC scanning. If a shared klass is not yet loaded,
>>>>>> the klass root is not scanned and it's constant pool
>>>>>> 'resolved_reference' array (A) in the open archive region is not
>>>>>> reachable. Then A is a dormant object.
>>>>>>
>>>>>> Object State Transition
>>>>>>
>>>>>> All java objects are initially dormant objects when open archive
>>>>>> heap regions are mapped to the runtime java heap. A
>>>>>> dormant object becomes live object when the associated shared
>>>>>> class is loaded at runtime. Explicit call
>>>>>> to G1SATBCardTableModRefBS::enqueue() needs to be made when a
>>>>>> dormant object becomes live. That should be the case for cached
>>>>>> objects with strong roots as well, since strong roots are only
>>>>>> scanned at the start of GC marking (the initial marking) but
>>>>>> not during Remarking/Final marking. If a cached object becomes
>>>>>> live during concurrent marking phase, G1 may not find it and mark
>>>>>> it live unless a call to G1SATBCardTableModRefBS::enqueue() is
>>>>>> made for the object.
>>>>>>
>>>>>> Currently, a live object in the open archive heap region cannot
>>>>>> become dormant again. This restriction simplifies GC
>>>>>> requirement and guarantees all outgoing pointers are updated by
>>>>>> GC correctly. Only objects for shared classes from the builtin
>>>>>> class loaders (boot, PlatformClassLoaders, and AppClassLoaders)
>>>>>> are supported for caching.
>>>>>>
>>>>>> Caching Java Objects at Archive Dump Time
>>>>>>
>>>>>> The closed archive and open archive regions are allocated near
>>>>>> the top of the dump time java heap. Archived java objects
>>>>>> are copied into the designated archive heap regions. For example,
>>>>>> String objects and the underlying 'value' arrays are copied into
>>>>>> the closed archive regions. All references to the archived
>>>>>> objects (from shared class metadata, string table, etc) are set
>>>>>> to the new heap locations. A hash table is used to keep track of
>>>>>> all archived java objects during the copying process to make sure
>>>>>> java object is not archived more than once if reached from
>>>>>> different roots. It also makes sure references to the same
>>>>>> archived object are updated using the same new address location.
>>>>>>
>>>>>> Caching Constant Pool resolved_references Array
>>>>>>
>>>>>> The 'resolved_references' is an array that holds references of
>>>>>> resolved constant pool entries including Strings, mirrors
>>>>>> and methodTypes, etc. Each loaded class has one
>>>>>> 'resolved_references' array (in ConstantPoolCache). The
>>>>>> 'resolved_references' arrays are copied into the open archive
>>>>>> regions during dump process. Prior to copying the
>>>>>> 'resolved_references' arrays, JVM iterates through constant pool
>>>>>> entries and resolves all JVM_CONSTANT_String entries to existing
>>>>>> interned Strings for all archived classes. When resolving, JVM
>>>>>> only looks up the string table and finds existing interned
>>>>>> Strings without inserting new ones. If a string entry cannot be
>>>>>> resolved to an existing interned String, the constant pool entry
>>>>>> remain as unresolved. That prevents memory waste if a constant
>>>>>> pool string entry is never used at runtime.
>>>>>>
>>>>>> All String objects referenced by the string table are copied
>>>>>> first into the closed archive regions. The string table entry is
>>>>>> updated with the new location when each String object is
>>>>>> archived. The JVM updates the resolved constant pool string
>>>>>> entries with the new object locations when copying the
>>>>>> 'resolved_references' arrays to the open archive regions.
>>>>>> References to the 'resolved_references' arrays in the
>>>>>> ConstantPoolCache are also updated.
>>>>>> At runtime as part of ConstantPool::restore_unshareable_info()
>>>>>> work, call G1SATBCardTableModRefBS::enqueue() to let GC know the
>>>>>> 'resolved_references' is becoming live. A handle is created for
>>>>>> the cached object and added to the loader_data's handles.
>>>>>>
>>>>>> Runtime Java Heap With Cached Java Objects
>>>>>>
>>>>>>
>>>>>> The closed archive regions (the string regions) and open archive
>>>>>> regions are mapped to the runtime java heap at the same
>>>>>> offsets as the dump time offsets from the runtime java heap base.
>>>>>>
>>>>>> Preliminary test execution and status:
>>>>>>
>>>>>> JPRT: passed
>>>>>> Tier2-rt: passed
>>>>>> Tier2-gc: passed
>>>>>> Tier2-comp: passed
>>>>>> Tier3-rt: passed
>>>>>> Tier3-gc: passed
>>>>>> Tier3-comp: passed
>>>>>> Tier4-rt: passed
>>>>>> Tier4-gc: passed
>>>>>> Tier4-comp:6 jobs timed out, all other tests passed
>>>>>> Tier5-rt: one test failed but passed when running locally, all
>>>>>> other tests passed
>>>>>> Tier5-gc: passed
>>>>>> Tier5-comp: running
>>>>>> hotspot_gc: two jobs timed out, all other tests passed
>>>>>> hotspot_gc in CDS mode: two jobs timed out, all other tests passed
>>>>>> vm.gc: passed
>>>>>> vm.gc in CDS mode: passed
>>>>>> Kichensink: passed
>>>>>> Kichensink in CDS mode: passed
>>>>>>
>>>>>> Thanks,
>>>>>> Jiangli
>>>>>
>>>>
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Jiangli Zhou
Hi Ioi,

Thanks for getting back to me.

> On Aug 7, 2017, at 5:45 PM, Ioi Lam <[hidden email]> wrote:
>
> On 8/4/17 10:19 PM, Jiangli Zhou wrote:
>> Hi Ioi,
>>
>> Thanks for looking again.
>>
>>> On Aug 4, 2017, at 2:22 PM, Ioi Lam <[hidden email] <mailto:[hidden email]>> wrote:
>>>
>>> Hi Jiangli,
>>>
>>> The code looks good in general. I just have a few pet peeves for readability:
>>>
>>>
>>> (1) stringTable.cpp and metaspaceShared.cpp have the same asserts
>>>
>>>  704   assert(UseG1GC, "Only support G1 GC");
>>>  705   assert(UseCompressedOops && UseCompressedClassPointers,
>>>  706          "Only support UseCompressedOops and UseCompressedClassPointers enabled");
>>>
>>> 1615   assert(UseG1GC, "Only support G1 GC");
>>> 1616   assert(UseCompressedOops && UseCompressedClassPointers,
>>> 1617          "Only support UseCompressedOops and UseCompressedClassPointers enabled");
>>>
>>> Maybe it's better to combine them into a single function like MetaspaceShared::assert_vm_flags() so they don't get out of sync?
>>
>> There is a MetaspaceShared::allow_archive_heap_object(), which checks for UseG1GC, UseCompressedOops and UseCompressedClassPointers combined. It does not seem to worth add another separate API for asserting the required flags. I’ll use that in the assert.
>>
>>>
>>>
>>>
>>> (2) FileMapInfo::write_archive_heap_regions()
>>>
>>> I still find this code very hard to read, especially due to the loop.
>>>
>>> First, the comments are not consistent with the code:
>>>
>>>     498   assert(arr_len <= max_num_regions, "number of memory regions exceeds maximum");
>>>
>>> but the comments says: "The rest are consecutive full GC regions" which means there's a chance for max_num_regions to be more than 2 (which will be the case with Calvin's java-loader dumping changes using very small heap size). So the code is actually wrong.
>>
>> The max_num_regions is the maximum number of region for each archived heap space (the string space, or open archive space). We only run into the case where the MemRegion array size is larger than max_num_regions with Calvin’s pending change. As part of Calvin’s change, he will change the assert into a check and bail out if the number of MemRegions are larger than max_num_regions due to heap fragmentation.
>>
>>
> Your latest patch assumes that arr_len <= 2, but the implementation of G1CollectedHeap::heap()->begin_archive_alloc_range() / G1CollectedHeap::heap()->end_archive_alloc_range() actually allows more than 2 regions to returned. So simply putting an assert there seems risky (unless you have analyzed all possible scenarios to prove that's impossible).
>
> Instead of trying to come up with a complicated proof, I think it's much safer to disable the archived string region if the arr_len > 2. Also, if the string region is disabled, you should also disable the open_archive_heap_region
>
> I think this is a general issue with the mapped heap regions, and it just happens to be revealed by Calvin's patch. So we should fix it now and not wait for Calvin's patch.

Ok. I’ll change the assert to be a check.

>
>
>>>
>>> The word "region" is used in these parameters, but they don't mean the same thing.
>>>
>>>               GrowableArray<MemRegion> *regions
>>>               int first_region, int max_num_regions,
>>>
>>>
>>> How about regions      -> g1_regions_list
>>>           first_region -> first_region_in_archive
>>
>> The GrowableArray above is the MemRegions that GC code gives back to us. The GC code combines multiple G1 regions. The comments probably are over-explaining the details, which are hidden in the GC code. Probably that’s the confusing source. I’ll make the comment more clear.
>>
>> Using g1_regions_list would also be confusing, since write_archive_heap_regions does not handle G1 regions directly. It processes the MemRegion array that GC code returns. How about changing ‘regions’ to ‘mem_regions’ or ‘archive_regions'?
>>
> How about heap_regions? These are regions in the active Java heap, which current has not mapped anything from the CDS archive.

Ok.

I’m updating my changes and will send out a consolidated webrev.

Thanks!
Jiangli

>
>
>>>
>>>
>>> In the comments, I find the phrase 'the current archive heap region' ambiguous. It could be (erroneously) interpreted as "a region from the currently mapped archive”
>>
>>>
>>> To make it unambiguous, how about changing
>>>
>>>
>>>  464 // Write the current archive heap region, which contains one or multiple GC(G1) regions.
>>>
>>>
>>> to
>>>
>>>     // Write the given list of G1 memory regions into the archive, starting at
>>>     // first_region_in_archive.
>>
>>
>> Ok. How about the following:
>>
>> // Write the given list of java heap memory regions into the archive, starting at
>> // first_region_in_archive.
>>
> Sounds good.
>
> Thanks
> - Ioi
>
>>>
>>>
>>> Also, for the explanation of how the G1 regions are written into the archive, how about:
>>>
>>>    // The G1 regions in the list are sorted in ascending address order. When there are more objects
>>>    // than the capacity of a single G1 region, the bottom-most G1 region may be partially filled, and the
>>>    // remaining G1 region(s) are consecutively allocated and fully filled.
>>>    //
>>>    // Thus, the bottom-most G1 region (if not empty) is written into first_region_in_archive.
>>>    // The remaining G1 regions (if exist) are coalesced and written as a single block
>>>    // into (first_region_in_archive + 1)
>>>
>>>    // Here's the mapping from (g1 regions) -> (archive regions).
>>>
>>>
>>> All this function needs to do is to decide the values for
>>>
>>>      r0_start, r0_top
>>>      r1_start, r1_top
>>>
>>> I think it would be much better to not use the loop, and not use the max_num_regions parameter (it's always 2 anyway).
>>>
>>>      *r0_start = *r0_top = NULL;
>>>      *r1_start = *r1_top = NULL;
>>>
>>>      if (arr_len >= 1) {
>>>          *r0_start = regions->at(0).start();
>>>          *r0_end = *r0_start + regions->at(0).byte_size();
>>>      }
>>>      if (arr_len >= 2) {
>>>          int last = arr_len - 1;
>>>          *r1_start = regions->at(1).start();
>>>          *r1_end = regions->at(last).start() + regions->at(last).byte_size();
>>>       }
>>>
>>> what do you think?
>>
>> We need to write out all archive regions including the empty ones. The loop using max_num_regions is the easiest way. I’d like to remove the code that deals with r0_* and r1_ explicitly. Let me try that.
>>
>>>
>>>
>>>
>>> (3) metaspace.cpp
>>>
>>> 3350         // Map the archived heap regions after compressed pointers
>>> 3351         // because it relies on compressed class pointers setting to work
>>>
>>> do you mean this?
>>>
>>>     // Archived heap regions depend on the parameters of compressed class pointers, so
>>>     // they must be mapped after such parameters have been decided in the above call.
>>
>> Hmmm, maybe use ‘arguments’ instead of ‘parameters’?
>>  
>>>
>>>
>>> (4) I found this name not strictly grammatical. How about this:
>>>
>>>      allow_archive_heap_object -> is_heap_object_archiving_allowed
>>
>> Ok.
>>
>>>
>>> (5) in most of your code, 'archive' is used as a noun, except in StringTable::archive_string() where it's used as a verb.
>>>
>>> archive_string could also be interpreted erroneously as "return a string that's already in the archive". So to be consistent and unambiguous, I think it's better to rename it to StringTable::create_archived_string()
>>
>> Ok.
>>
>> Thanks,
>> Jiangli
>>
>>>
>>>
>>> Thanks
>>> - Ioi
>>>
>>>
>>> On 8/3/17 5:15 PM, Jiangli Zhou wrote:
>>>> Here are the updated webrevs.
>>>>
>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/ <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.02/>
>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/ <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.whitebox.02/>
>>>>
>>>> Changes in the updated webrevs include:
>>>> Merge with Ioi’s recent shared space auto-sizing change (8072061)
>>>> Addressed all feedbacks from Ioi and Coleen (Thanks for detailed review!)
>>>>
>>>> Thanks,
>>>> Jiangli
>>>>
>>>>
>>>>> On Aug 1, 2017, at 5:29 PM, Jiangli Zhou <[hidden email]> <mailto:[hidden email]> wrote:
>>>>>
>>>>> Hi Ioi,
>>>>>
>>>>> Thank you so much for reviewing this. I’ve addressed all your feedbacks. Please see details below. I’ll updated the webrev after addressing Coleen’s comments.
>>>>>
>>>>>> On Jul 30, 2017, at 9:07 PM, Ioi Lam <[hidden email]> <mailto:[hidden email]> wrote:
>>>>>>
>>>>>> Hi Jiangli,
>>>>>>
>>>>>> Here are my comments. I've not reviewed the GC code and I'll leave that to the GC experts :-)
>>>>>>
>>>>>> stringTable.cpp: StringTable::archive_string
>>>>>>
>>>>>>    add assert for DumpSharedSpaces only
>>>>>
>>>>> Ok.
>>>>>
>>>>>>
>>>>>> filemap.cpp
>>>>>>
>>>>>> 525 void FileMapInfo::write_archive_heap_regions(GrowableArray<MemRegion> *regions,
>>>>>> 526                                              int first_region, int num_regions) {
>>>>>>
>>>>>> When I first read this function, I found it hard to follow, especially this part that coalesces the trailing regions:
>>>>>>
>>>>>> 537         int len = regions->length();
>>>>>> 538         if (len > 1) {
>>>>>> 539           start = (char*)regions->at(1).start();
>>>>>> 540           size = (char*)regions->at(len - 1).end() - start;
>>>>>> 541         }
>>>>>> 542       }
>>>>>>
>>>>>> The rest of filemap.cpp always perform identical operations on MemRegion arrays, which are either 1 or 2 in size. However, this function doesn't follow that pattern; it also has a very different notion of "region", and the confusing part is regions->size() is not the same as num_regions.
>>>>>>
>>>>>> How about we change the API to something like the following? Before calling this API, the caller needs to coalesce the trailing G1 regions into a single MemRegion.
>>>>>>
>>>>>>     FileMapInfo::write_archive_heap_regions(MemRegion *regions, int first, int num_regions) {
>>>>>>        if (first == MetaspaceShared::first_string) {
>>>>>>           assert(num_regons <=  MetaspaceShared::max_strings, "...");
>>>>>>        } else {
>>>>>>           assert(first == MetaspaceShared::first_open_archive_heap_region, "...");
>>>>>>           assert(num_regons <= MetaspaceShared::max_open_archive_heap_region, "...");
>>>>>> }
>>>>>>        ....
>>>>>>
>>>>>>
>>>>>
>>>>> I’ve reworked the function and simplified the code.
>>>>>
>>>>>>
>>>>>> 756   if (!string_data_mapped) {
>>>>>> 757     StringTable::ignore_shared_strings(true);
>>>>>> 758     assert(string_ranges == NULL && num_string_ranges == 0, "sanity");
>>>>>> 759   }
>>>>>> 760
>>>>>> 761   if (open_archive_heap_data_mapped) {
>>>>>> 762     MetaspaceShared::set_open_archive_heap_region_mapped();
>>>>>> 763   } else {
>>>>>> 764     assert(open_archive_heap_ranges == NULL && num_open_archive_heap_ranges == 0, "sanity");
>>>>>> 765   }
>>>>>>
>>>>>> Maybe the two "if" statements should be more consistent? Instead of StringTable::ignore_shared_strings, how about StringTable::set_shared_strings_region_mapped()?
>>>>>
>>>>> Fixed.
>>>>>
>>>>>>
>>>>>> FileMapInfo::map_heap_data() --
>>>>>>
>>>>>> 818     char* addr = (char*)regions[i].start();
>>>>>> 819     char* base = os::map_memory(_fd, _full_path, si->_file_offset,
>>>>>> 820                                 addr, regions[i].byte_size(), si->_read_only,
>>>>>> 821                                 si->_allow_exec);
>>>>>>
>>>>>> What happens when the first region succeeds to map but the second region fails to map? Will both regions be unmapped? I don't see where you store the return value (base) from os::map_memory(). Does it mean the code assumes that (addr == base). If so, we need an assert here.
>>>>>
>>>>> If any of the region fails to map, we bail out and call dealloc_archive_heap_regions(), which handles the deallocation of any regions specified. If second region fails to map, all memory ranges specified by ‘regions’ array are deallocated. We don’t unmap the memory here since it is part of the java heap. Unmapping of heap memory are handled by GC code. The ‘if’ check below makes sure base == addr.
>>>>>
>>>>>     if (base == NULL || base != addr) {
>>>>>       // dealloc the regions from java heap
>>>>>       dealloc_archive_heap_regions(regions, region_num);
>>>>>       if (log_is_enabled(Info, cds)) {
>>>>>         log_info(cds)("UseSharedSpaces: Unable to map at required address in java heap.");
>>>>>       }
>>>>>       return false;
>>>>>     }
>>>>>
>>>>>
>>>>>>
>>>>>> constantPool.cpp
>>>>>>
>>>>>>     Handle refs_handle;
>>>>>>     ...
>>>>>>     refs_handle = Handle(THREAD, (oop)archived);
>>>>>>
>>>>>> This will first create a NULL handle, then construct a temporary handle, and then assign the temp handle back to the null handle. This means two handles will be pushed onto THREAD->metadata_handles()
>>>>>>
>>>>>> I think it's more efficient if you merge these into a single statement
>>>>>>
>>>>>>     Handle refs_handle(THREAD, (oop)archived);
>>>>>
>>>>> Fixed.
>>>>>
>>>>>>
>>>>>> Is this experimental code? Maybe it should be removed?
>>>>>>
>>>>>> 664     if (tag_at(index).is_unresolved_klass()) {
>>>>>> 665 #if 0
>>>>>> 666       CPSlot entry = cp->slot_at(index);
>>>>>> 667       Symbol* name = entry.get_symbol();
>>>>>> 668       Klass* k = SystemDictionary::find(name, NULL, NULL, THREAD);
>>>>>> 669       if (k != NULL) {
>>>>>> 670         klass_at_put(index, k);
>>>>>> 671       }
>>>>>> 672 #endif
>>>>>> 673     } else
>>>>>
>>>>> Removed.
>>>>>
>>>>>>
>>>>>> cpCache.hpp:
>>>>>>
>>>>>>     u8   _archived_references
>>>>>>
>>>>>> shouldn't this be declared as an narrowOop to avoid the type casts when it's used?
>>>>>
>>>>> Ok.
>>>>>
>>>>>>
>>>>>> cpCache.cpp:
>>>>>>
>>>>>>    add assert so that one of these is used only at dump time and the other only at run time?
>>>>>>
>>>>>> 610 oop ConstantPoolCache::archived_references() {
>>>>>> 611   return oopDesc::decode_heap_oop((narrowOop)_archived_references);
>>>>>> 612 }
>>>>>> 613
>>>>>> 614 void ConstantPoolCache::set_archived_references(oop o) {
>>>>>> 615   _archived_references = (u8)oopDesc::encode_heap_oop(o);
>>>>>> 616 }
>>>>>
>>>>> Ok.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Jiangli
>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>> - Ioi
>>>>>>
>>>>>> On 7/27/17 1:37 PM, Jiangli Zhou wrote:
>>>>>>> Sorry, the mail didn’t handle the rich text well. I fixed the format below.
>>>>>>>
>>>>>>> Please help review the changes for JDK-8179302 (Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive). Currently, the CDS archive can contain cached class metadata, interned java.lang.String objects. This RFE adds the constant pool ‘resolved_references’ arrays (hotspot specific) to the archive for startup/runtime performance enhancement. The ‘resolved_references' arrays are used to hold references of resolved constant pool entries including Strings, mirrors, etc. With the 'resolved_references’ being cached, string constants in shared classes can now be resolved to existing interned java.lang.Strings at CDS dump time. G1 and 64-bit platforms are required.
>>>>>>>
>>>>>>> The GC changes in the RFE were discussed and guided by Thomas Schatzl and GC team. Part of the changes were contributed by Thomas himself.
>>>>>>> RFE:        https://bugs.openjdk.java.net/browse/JDK-8179302 <https://bugs.openjdk.java.net/browse/JDK-8179302>
>>>>>>> hotspot:   http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/ <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.01/>
>>>>>>> whitebox: http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/ <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.whitebox.01/>
>>>>>>>
>>>>>>> Please see below for details of supporting cached ‘resolved_references’ and pre-resolving string constants.
>>>>>>>
>>>>>>> Types of Pinned G1 Heap Regions
>>>>>>>
>>>>>>> The pinned region type is a super type of all archive region types, which include the open archive type and the closed archive type.
>>>>>>>
>>>>>>> 00100 0 [ 8] Pinned Mask
>>>>>>> 01000 0 [16] Old Mask
>>>>>>> 10000 0 [32] Archive Mask
>>>>>>> 11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
>>>>>>> 11100 1 [57] Closed Archive: ArchiveMask | PinnedMask | OldMask + 1
>>>>>>>
>>>>>>>
>>>>>>> Pinned Regions
>>>>>>>
>>>>>>> Objects within the region are 'pinned', which means GC does not move any live objects. GC scans and marks objects in the pinned region as normal, but skips forwarding live objects. Pointers in live objects are updated. Dead objects (unreachable) can be collected and freed.
>>>>>>>
>>>>>>> Archive Regions
>>>>>>>
>>>>>>> The archive types are sub-types of 'pinned'. There are two types of archive region currently, open archive and closed archive. Both can support caching java heap objects via the CDS archive.
>>>>>>>
>>>>>>> An archive region is also an old region by design.
>>>>>>>
>>>>>>> Open Archive (GC-RW) Regions
>>>>>>>
>>>>>>> Open archive region is GC writable. GC scans & marks objects within the region and adjusts (updates) pointers in live objects the same way as a pinned region. Live objects (reachable) are pinned and not forwarded by GC.
>>>>>>> Open archive region does not have 'dead' objects. Unreachable objects are 'dormant' objects. Dormant objects are not collected and freed by GC.
>>>>>>>
>>>>>>> Adjustable Outgoing Pointers
>>>>>>>
>>>>>>> As GC can adjust pointers within the live objects in open archive heap region, objects can have outgoing pointers to another java heap region, including closed archive region, open archive region, pinned (or humongous) region, and normal generational region. When a referenced object is moved by GC, the pointer within the open archive region is updated accordingly.
>>>>>>>
>>>>>>> Closed Archive (GC-RO) Regions
>>>>>>>
>>>>>>> The closed archive region is GC read-only region. GC cannot write into the region. Objects are not scanned and marked by GC. Objects are pinned and not forwarded. Pointers are not updated by GC either. Hence, objects within the archive region cannot have any outgoing pointers to another java heap region. Objects however can still have pointers to other objects within the closed archive regions (we might allow pointers to open archive regions in the future). That restricts the type of java objects that can be supported by the archive region.
>>>>>>> In JDK 9 we support archive Strings with the archive regions.
>>>>>>>
>>>>>>> The GC-readonly archive region makes java heap memory sharable among different JVM processes. NOTE: synchronization on the objects within the archive heap region can still cause writes to the memory page.
>>>>>>>
>>>>>>> Dormant Objects
>>>>>>>
>>>>>>> Dormant objects are unreachable java objects within the open archive heap region.
>>>>>>> A java object in the open archive heap region is a live object if it can be reached during scanning. Some of the java objects in the region may not be reachable during scanning. Those objects are considered as dormant, but not dead. For example, a constant pool 'resolved_references' array is reachable via the klass root if its container klass (shared) is already loaded at the time during GC scanning. If a shared klass is not yet loaded, the klass root is not scanned and it's constant pool 'resolved_reference' array (A) in the open archive region is not reachable. Then A is a dormant object.
>>>>>>>
>>>>>>> Object State Transition
>>>>>>>
>>>>>>> All java objects are initially dormant objects when open archive heap regions are mapped to the runtime java heap. A dormant object becomes live object when the associated shared class is loaded at runtime. Explicit call to G1SATBCardTableModRefBS::enqueue() needs to be made when a dormant object becomes live. That should be the case for cached objects with strong roots as well, since strong roots are only scanned at the start of GC marking (the initial marking) but not during Remarking/Final marking. If a cached object becomes live during concurrent marking phase, G1 may not find it and mark it live unless a call to G1SATBCardTableModRefBS::enqueue() is made for the object.
>>>>>>>
>>>>>>> Currently, a live object in the open archive heap region cannot become dormant again. This restriction simplifies GC requirement and guarantees all outgoing pointers are updated by GC correctly. Only objects for shared classes from the builtin class loaders (boot, PlatformClassLoaders, and AppClassLoaders) are supported for caching.
>>>>>>>
>>>>>>> Caching Java Objects at Archive Dump Time
>>>>>>>
>>>>>>> The closed archive and open archive regions are allocated near the top of the dump time java heap. Archived java objects are copied into the designated archive heap regions. For example, String objects and the underlying 'value' arrays are copied into the closed archive regions. All references to the archived objects (from shared class metadata, string table, etc) are set to the new heap locations. A hash table is used to keep track of all archived java objects during the copying process to make sure java object is not archived more than once if reached from different roots. It also makes sure references to the same archived object are updated using the same new address location.
>>>>>>>
>>>>>>> Caching Constant Pool resolved_references Array
>>>>>>>
>>>>>>> The 'resolved_references' is an array that holds references of resolved constant pool entries including Strings, mirrors and methodTypes, etc. Each loaded class has one 'resolved_references' array (in ConstantPoolCache). The 'resolved_references' arrays are copied into the open archive regions during dump process. Prior to copying the 'resolved_references' arrays, JVM iterates through constant pool entries and resolves all JVM_CONSTANT_String entries to existing interned Strings for all archived classes. When resolving, JVM only looks up the string table and finds existing interned Strings without inserting new ones. If a string entry cannot be resolved to an existing interned String, the constant pool entry remain as unresolved. That prevents memory waste if a constant pool string entry is never used at runtime.
>>>>>>>
>>>>>>> All String objects referenced by the string table are copied first into the closed archive regions. The string table entry is updated with the new location when each String object is archived. The JVM updates the resolved constant pool string entries with the new object locations when copying the 'resolved_references' arrays to the open archive regions. References to the 'resolved_references' arrays in the ConstantPoolCache are also updated.
>>>>>>> At runtime as part of ConstantPool::restore_unshareable_info() work, call G1SATBCardTableModRefBS::enqueue() to let GC know the 'resolved_references' is becoming live. A handle is created for the cached object and added to the loader_data's handles.
>>>>>>>
>>>>>>> Runtime Java Heap With Cached Java Objects
>>>>>>>
>>>>>>>
>>>>>>> The closed archive regions (the string regions) and open archive regions are mapped to the runtime java heap at the same offsets as the dump time offsets from the runtime java heap base.
>>>>>>>
>>>>>>> Preliminary test execution and status:
>>>>>>>
>>>>>>> JPRT: passed
>>>>>>> Tier2-rt: passed
>>>>>>> Tier2-gc: passed
>>>>>>> Tier2-comp: passed
>>>>>>> Tier3-rt: passed
>>>>>>> Tier3-gc: passed
>>>>>>> Tier3-comp: passed
>>>>>>> Tier4-rt: passed
>>>>>>> Tier4-gc: passed
>>>>>>> Tier4-comp:6 jobs timed out, all other tests passed
>>>>>>> Tier5-rt: one test failed but passed when running locally, all other tests passed
>>>>>>> Tier5-gc: passed
>>>>>>> Tier5-comp: running
>>>>>>> hotspot_gc: two jobs timed out, all other tests passed
>>>>>>> hotspot_gc in CDS mode: two jobs timed out, all other tests passed
>>>>>>> vm.gc: passed
>>>>>>> vm.gc in CDS mode: passed
>>>>>>> Kichensink: passed
>>>>>>> Kichensink in CDS mode: passed
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jiangli
>>>>>>
>>>>>
>>>>
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Jiangli Zhou
Here is the incremental webrev that has all the changes incorporated with suggestions from Coleen, Ioi and Thomas:

http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.03.inc/

Updated full webrev: http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.03/

Thanks again for Coleen's, Ioi's and Thomas’ review!
Jiangli

> On Aug 7, 2017, at 7:57 PM, Jiangli Zhou <[hidden email]> wrote:
>
> Hi Ioi,
>
> Thanks for getting back to me.
>
>> On Aug 7, 2017, at 5:45 PM, Ioi Lam <[hidden email] <mailto:[hidden email]>> wrote:
>>
>> On 8/4/17 10:19 PM, Jiangli Zhou wrote:
>>> Hi Ioi,
>>>
>>> Thanks for looking again.
>>>
>>>> On Aug 4, 2017, at 2:22 PM, Ioi Lam <[hidden email] <mailto:[hidden email]>> wrote:
>>>>
>>>> Hi Jiangli,
>>>>
>>>> The code looks good in general. I just have a few pet peeves for readability:
>>>>
>>>>
>>>> (1) stringTable.cpp and metaspaceShared.cpp have the same asserts
>>>>
>>>>  704   assert(UseG1GC, "Only support G1 GC");
>>>>  705   assert(UseCompressedOops && UseCompressedClassPointers,
>>>>  706          "Only support UseCompressedOops and UseCompressedClassPointers enabled");
>>>>
>>>> 1615   assert(UseG1GC, "Only support G1 GC");
>>>> 1616   assert(UseCompressedOops && UseCompressedClassPointers,
>>>> 1617          "Only support UseCompressedOops and UseCompressedClassPointers enabled");
>>>>
>>>> Maybe it's better to combine them into a single function like MetaspaceShared::assert_vm_flags() so they don't get out of sync?
>>>
>>> There is a MetaspaceShared::allow_archive_heap_object(), which checks for UseG1GC, UseCompressedOops and UseCompressedClassPointers combined. It does not seem to worth add another separate API for asserting the required flags. I’ll use that in the assert.
>>>
>>>>
>>>>
>>>>
>>>> (2) FileMapInfo::write_archive_heap_regions()
>>>>
>>>> I still find this code very hard to read, especially due to the loop.
>>>>
>>>> First, the comments are not consistent with the code:
>>>>
>>>>     498   assert(arr_len <= max_num_regions, "number of memory regions exceeds maximum");
>>>>
>>>> but the comments says: "The rest are consecutive full GC regions" which means there's a chance for max_num_regions to be more than 2 (which will be the case with Calvin's java-loader dumping changes using very small heap size). So the code is actually wrong.
>>>
>>> The max_num_regions is the maximum number of region for each archived heap space (the string space, or open archive space). We only run into the case where the MemRegion array size is larger than max_num_regions with Calvin’s pending change. As part of Calvin’s change, he will change the assert into a check and bail out if the number of MemRegions are larger than max_num_regions due to heap fragmentation.
>>>
>>>
>> Your latest patch assumes that arr_len <= 2, but the implementation of G1CollectedHeap::heap()->begin_archive_alloc_range() / G1CollectedHeap::heap()->end_archive_alloc_range() actually allows more than 2 regions to returned. So simply putting an assert there seems risky (unless you have analyzed all possible scenarios to prove that's impossible).
>>
>> Instead of trying to come up with a complicated proof, I think it's much safer to disable the archived string region if the arr_len > 2. Also, if the string region is disabled, you should also disable the open_archive_heap_region
>>
>> I think this is a general issue with the mapped heap regions, and it just happens to be revealed by Calvin's patch. So we should fix it now and not wait for Calvin's patch.
>
> Ok. I’ll change the assert to be a check.
>
>>
>>
>>>>
>>>> The word "region" is used in these parameters, but they don't mean the same thing.
>>>>
>>>>               GrowableArray<MemRegion> *regions
>>>>               int first_region, int max_num_regions,
>>>>
>>>>
>>>> How about regions      -> g1_regions_list
>>>>           first_region -> first_region_in_archive
>>>
>>> The GrowableArray above is the MemRegions that GC code gives back to us. The GC code combines multiple G1 regions. The comments probably are over-explaining the details, which are hidden in the GC code. Probably that’s the confusing source. I’ll make the comment more clear.
>>>
>>> Using g1_regions_list would also be confusing, since write_archive_heap_regions does not handle G1 regions directly. It processes the MemRegion array that GC code returns. How about changing ‘regions’ to ‘mem_regions’ or ‘archive_regions'?
>>>
>> How about heap_regions? These are regions in the active Java heap, which current has not mapped anything from the CDS archive.
>
> Ok.
>
> I’m updating my changes and will send out a consolidated webrev.
>
> Thanks!
> Jiangli
>
>>
>>
>>>>
>>>>
>>>> In the comments, I find the phrase 'the current archive heap region' ambiguous. It could be (erroneously) interpreted as "a region from the currently mapped archive”
>>>
>>>>
>>>> To make it unambiguous, how about changing
>>>>
>>>>
>>>>  464 // Write the current archive heap region, which contains one or multiple GC(G1) regions.
>>>>
>>>>
>>>> to
>>>>
>>>>     // Write the given list of G1 memory regions into the archive, starting at
>>>>     // first_region_in_archive.
>>>
>>>
>>> Ok. How about the following:
>>>
>>> // Write the given list of java heap memory regions into the archive, starting at
>>> // first_region_in_archive.
>>>
>> Sounds good.
>>
>> Thanks
>> - Ioi
>>
>>>>
>>>>
>>>> Also, for the explanation of how the G1 regions are written into the archive, how about:
>>>>
>>>>    // The G1 regions in the list are sorted in ascending address order. When there are more objects
>>>>    // than the capacity of a single G1 region, the bottom-most G1 region may be partially filled, and the
>>>>    // remaining G1 region(s) are consecutively allocated and fully filled.
>>>>    //
>>>>    // Thus, the bottom-most G1 region (if not empty) is written into first_region_in_archive.
>>>>    // The remaining G1 regions (if exist) are coalesced and written as a single block
>>>>    // into (first_region_in_archive + 1)
>>>>
>>>>    // Here's the mapping from (g1 regions) -> (archive regions).
>>>>
>>>>
>>>> All this function needs to do is to decide the values for
>>>>
>>>>      r0_start, r0_top
>>>>      r1_start, r1_top
>>>>
>>>> I think it would be much better to not use the loop, and not use the max_num_regions parameter (it's always 2 anyway).
>>>>
>>>>      *r0_start = *r0_top = NULL;
>>>>      *r1_start = *r1_top = NULL;
>>>>
>>>>      if (arr_len >= 1) {
>>>>          *r0_start = regions->at(0).start();
>>>>          *r0_end = *r0_start + regions->at(0).byte_size();
>>>>      }
>>>>      if (arr_len >= 2) {
>>>>          int last = arr_len - 1;
>>>>          *r1_start = regions->at(1).start();
>>>>          *r1_end = regions->at(last).start() + regions->at(last).byte_size();
>>>>       }
>>>>
>>>> what do you think?
>>>
>>> We need to write out all archive regions including the empty ones. The loop using max_num_regions is the easiest way. I’d like to remove the code that deals with r0_* and r1_ explicitly. Let me try that.
>>>
>>>>
>>>>
>>>>
>>>> (3) metaspace.cpp
>>>>
>>>> 3350         // Map the archived heap regions after compressed pointers
>>>> 3351         // because it relies on compressed class pointers setting to work
>>>>
>>>> do you mean this?
>>>>
>>>>     // Archived heap regions depend on the parameters of compressed class pointers, so
>>>>     // they must be mapped after such parameters have been decided in the above call.
>>>
>>> Hmmm, maybe use ‘arguments’ instead of ‘parameters’?
>>>  
>>>>
>>>>
>>>> (4) I found this name not strictly grammatical. How about this:
>>>>
>>>>      allow_archive_heap_object -> is_heap_object_archiving_allowed
>>>
>>> Ok.
>>>
>>>>
>>>> (5) in most of your code, 'archive' is used as a noun, except in StringTable::archive_string() where it's used as a verb.
>>>>
>>>> archive_string could also be interpreted erroneously as "return a string that's already in the archive". So to be consistent and unambiguous, I think it's better to rename it to StringTable::create_archived_string()
>>>
>>> Ok.
>>>
>>> Thanks,
>>> Jiangli
>>>
>>>>
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>
>>>> On 8/3/17 5:15 PM, Jiangli Zhou wrote:
>>>>> Here are the updated webrevs.
>>>>>
>>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/ <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.02/>
>>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/ <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.whitebox.02/>
>>>>>
>>>>> Changes in the updated webrevs include:
>>>>> Merge with Ioi’s recent shared space auto-sizing change (8072061)
>>>>> Addressed all feedbacks from Ioi and Coleen (Thanks for detailed review!)
>>>>>
>>>>> Thanks,
>>>>> Jiangli
>>>>>
>>>>>
>>>>>> On Aug 1, 2017, at 5:29 PM, Jiangli Zhou <[hidden email]> <mailto:[hidden email]> wrote:
>>>>>>
>>>>>> Hi Ioi,
>>>>>>
>>>>>> Thank you so much for reviewing this. I’ve addressed all your feedbacks. Please see details below. I’ll updated the webrev after addressing Coleen’s comments.
>>>>>>
>>>>>>> On Jul 30, 2017, at 9:07 PM, Ioi Lam <[hidden email]> <mailto:[hidden email]> wrote:
>>>>>>>
>>>>>>> Hi Jiangli,
>>>>>>>
>>>>>>> Here are my comments. I've not reviewed the GC code and I'll leave that to the GC experts :-)
>>>>>>>
>>>>>>> stringTable.cpp: StringTable::archive_string
>>>>>>>
>>>>>>>    add assert for DumpSharedSpaces only
>>>>>>
>>>>>> Ok.
>>>>>>
>>>>>>>
>>>>>>> filemap.cpp
>>>>>>>
>>>>>>> 525 void FileMapInfo::write_archive_heap_regions(GrowableArray<MemRegion> *regions,
>>>>>>> 526                                              int first_region, int num_regions) {
>>>>>>>
>>>>>>> When I first read this function, I found it hard to follow, especially this part that coalesces the trailing regions:
>>>>>>>
>>>>>>> 537         int len = regions->length();
>>>>>>> 538         if (len > 1) {
>>>>>>> 539           start = (char*)regions->at(1).start();
>>>>>>> 540           size = (char*)regions->at(len - 1).end() - start;
>>>>>>> 541         }
>>>>>>> 542       }
>>>>>>>
>>>>>>> The rest of filemap.cpp always perform identical operations on MemRegion arrays, which are either 1 or 2 in size. However, this function doesn't follow that pattern; it also has a very different notion of "region", and the confusing part is regions->size() is not the same as num_regions.
>>>>>>>
>>>>>>> How about we change the API to something like the following? Before calling this API, the caller needs to coalesce the trailing G1 regions into a single MemRegion.
>>>>>>>
>>>>>>>     FileMapInfo::write_archive_heap_regions(MemRegion *regions, int first, int num_regions) {
>>>>>>>        if (first == MetaspaceShared::first_string) {
>>>>>>>           assert(num_regons <=  MetaspaceShared::max_strings, "...");
>>>>>>>        } else {
>>>>>>>           assert(first == MetaspaceShared::first_open_archive_heap_region, "...");
>>>>>>>           assert(num_regons <= MetaspaceShared::max_open_archive_heap_region, "...");
>>>>>>> }
>>>>>>>        ....
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> I’ve reworked the function and simplified the code.
>>>>>>
>>>>>>>
>>>>>>> 756   if (!string_data_mapped) {
>>>>>>> 757     StringTable::ignore_shared_strings(true);
>>>>>>> 758     assert(string_ranges == NULL && num_string_ranges == 0, "sanity");
>>>>>>> 759   }
>>>>>>> 760
>>>>>>> 761   if (open_archive_heap_data_mapped) {
>>>>>>> 762     MetaspaceShared::set_open_archive_heap_region_mapped();
>>>>>>> 763   } else {
>>>>>>> 764     assert(open_archive_heap_ranges == NULL && num_open_archive_heap_ranges == 0, "sanity");
>>>>>>> 765   }
>>>>>>>
>>>>>>> Maybe the two "if" statements should be more consistent? Instead of StringTable::ignore_shared_strings, how about StringTable::set_shared_strings_region_mapped()?
>>>>>>
>>>>>> Fixed.
>>>>>>
>>>>>>>
>>>>>>> FileMapInfo::map_heap_data() --
>>>>>>>
>>>>>>> 818     char* addr = (char*)regions[i].start();
>>>>>>> 819     char* base = os::map_memory(_fd, _full_path, si->_file_offset,
>>>>>>> 820                                 addr, regions[i].byte_size(), si->_read_only,
>>>>>>> 821                                 si->_allow_exec);
>>>>>>>
>>>>>>> What happens when the first region succeeds to map but the second region fails to map? Will both regions be unmapped? I don't see where you store the return value (base) from os::map_memory(). Does it mean the code assumes that (addr == base). If so, we need an assert here.
>>>>>>
>>>>>> If any of the region fails to map, we bail out and call dealloc_archive_heap_regions(), which handles the deallocation of any regions specified. If second region fails to map, all memory ranges specified by ‘regions’ array are deallocated. We don’t unmap the memory here since it is part of the java heap. Unmapping of heap memory are handled by GC code. The ‘if’ check below makes sure base == addr.
>>>>>>
>>>>>>     if (base == NULL || base != addr) {
>>>>>>       // dealloc the regions from java heap
>>>>>>       dealloc_archive_heap_regions(regions, region_num);
>>>>>>       if (log_is_enabled(Info, cds)) {
>>>>>>         log_info(cds)("UseSharedSpaces: Unable to map at required address in java heap.");
>>>>>>       }
>>>>>>       return false;
>>>>>>     }
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> constantPool.cpp
>>>>>>>
>>>>>>>     Handle refs_handle;
>>>>>>>     ...
>>>>>>>     refs_handle = Handle(THREAD, (oop)archived);
>>>>>>>
>>>>>>> This will first create a NULL handle, then construct a temporary handle, and then assign the temp handle back to the null handle. This means two handles will be pushed onto THREAD->metadata_handles()
>>>>>>>
>>>>>>> I think it's more efficient if you merge these into a single statement
>>>>>>>
>>>>>>>     Handle refs_handle(THREAD, (oop)archived);
>>>>>>
>>>>>> Fixed.
>>>>>>
>>>>>>>
>>>>>>> Is this experimental code? Maybe it should be removed?
>>>>>>>
>>>>>>> 664     if (tag_at(index).is_unresolved_klass()) {
>>>>>>> 665 #if 0
>>>>>>> 666       CPSlot entry = cp->slot_at(index);
>>>>>>> 667       Symbol* name = entry.get_symbol();
>>>>>>> 668       Klass* k = SystemDictionary::find(name, NULL, NULL, THREAD);
>>>>>>> 669       if (k != NULL) {
>>>>>>> 670         klass_at_put(index, k);
>>>>>>> 671       }
>>>>>>> 672 #endif
>>>>>>> 673     } else
>>>>>>
>>>>>> Removed.
>>>>>>
>>>>>>>
>>>>>>> cpCache.hpp:
>>>>>>>
>>>>>>>     u8   _archived_references
>>>>>>>
>>>>>>> shouldn't this be declared as an narrowOop to avoid the type casts when it's used?
>>>>>>
>>>>>> Ok.
>>>>>>
>>>>>>>
>>>>>>> cpCache.cpp:
>>>>>>>
>>>>>>>    add assert so that one of these is used only at dump time and the other only at run time?
>>>>>>>
>>>>>>> 610 oop ConstantPoolCache::archived_references() {
>>>>>>> 611   return oopDesc::decode_heap_oop((narrowOop)_archived_references);
>>>>>>> 612 }
>>>>>>> 613
>>>>>>> 614 void ConstantPoolCache::set_archived_references(oop o) {
>>>>>>> 615   _archived_references = (u8)oopDesc::encode_heap_oop(o);
>>>>>>> 616 }
>>>>>>
>>>>>> Ok.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Jiangli
>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>> - Ioi
>>>>>>>
>>>>>>> On 7/27/17 1:37 PM, Jiangli Zhou wrote:
>>>>>>>> Sorry, the mail didn’t handle the rich text well. I fixed the format below.
>>>>>>>>
>>>>>>>> Please help review the changes for JDK-8179302 (Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive). Currently, the CDS archive can contain cached class metadata, interned java.lang.String objects. This RFE adds the constant pool ‘resolved_references’ arrays (hotspot specific) to the archive for startup/runtime performance enhancement. The ‘resolved_references' arrays are used to hold references of resolved constant pool entries including Strings, mirrors, etc. With the 'resolved_references’ being cached, string constants in shared classes can now be resolved to existing interned java.lang.Strings at CDS dump time. G1 and 64-bit platforms are required.
>>>>>>>>
>>>>>>>> The GC changes in the RFE were discussed and guided by Thomas Schatzl and GC team. Part of the changes were contributed by Thomas himself.
>>>>>>>> RFE:        https://bugs.openjdk.java.net/browse/JDK-8179302 <https://bugs.openjdk.java.net/browse/JDK-8179302>
>>>>>>>> hotspot:   http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/ <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.01/>
>>>>>>>> whitebox: http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/ <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.whitebox.01/>
>>>>>>>>
>>>>>>>> Please see below for details of supporting cached ‘resolved_references’ and pre-resolving string constants.
>>>>>>>>
>>>>>>>> Types of Pinned G1 Heap Regions
>>>>>>>>
>>>>>>>> The pinned region type is a super type of all archive region types, which include the open archive type and the closed archive type.
>>>>>>>>
>>>>>>>> 00100 0 [ 8] Pinned Mask
>>>>>>>> 01000 0 [16] Old Mask
>>>>>>>> 10000 0 [32] Archive Mask
>>>>>>>> 11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
>>>>>>>> 11100 1 [57] Closed Archive: ArchiveMask | PinnedMask | OldMask + 1
>>>>>>>>
>>>>>>>>
>>>>>>>> Pinned Regions
>>>>>>>>
>>>>>>>> Objects within the region are 'pinned', which means GC does not move any live objects. GC scans and marks objects in the pinned region as normal, but skips forwarding live objects. Pointers in live objects are updated. Dead objects (unreachable) can be collected and freed.
>>>>>>>>
>>>>>>>> Archive Regions
>>>>>>>>
>>>>>>>> The archive types are sub-types of 'pinned'. There are two types of archive region currently, open archive and closed archive. Both can support caching java heap objects via the CDS archive.
>>>>>>>>
>>>>>>>> An archive region is also an old region by design.
>>>>>>>>
>>>>>>>> Open Archive (GC-RW) Regions
>>>>>>>>
>>>>>>>> Open archive region is GC writable. GC scans & marks objects within the region and adjusts (updates) pointers in live objects the same way as a pinned region. Live objects (reachable) are pinned and not forwarded by GC.
>>>>>>>> Open archive region does not have 'dead' objects. Unreachable objects are 'dormant' objects. Dormant objects are not collected and freed by GC.
>>>>>>>>
>>>>>>>> Adjustable Outgoing Pointers
>>>>>>>>
>>>>>>>> As GC can adjust pointers within the live objects in open archive heap region, objects can have outgoing pointers to another java heap region, including closed archive region, open archive region, pinned (or humongous) region, and normal generational region. When a referenced object is moved by GC, the pointer within the open archive region is updated accordingly.
>>>>>>>>
>>>>>>>> Closed Archive (GC-RO) Regions
>>>>>>>>
>>>>>>>> The closed archive region is GC read-only region. GC cannot write into the region. Objects are not scanned and marked by GC. Objects are pinned and not forwarded. Pointers are not updated by GC either. Hence, objects within the archive region cannot have any outgoing pointers to another java heap region. Objects however can still have pointers to other objects within the closed archive regions (we might allow pointers to open archive regions in the future). That restricts the type of java objects that can be supported by the archive region.
>>>>>>>> In JDK 9 we support archive Strings with the archive regions.
>>>>>>>>
>>>>>>>> The GC-readonly archive region makes java heap memory sharable among different JVM processes. NOTE: synchronization on the objects within the archive heap region can still cause writes to the memory page.
>>>>>>>>
>>>>>>>> Dormant Objects
>>>>>>>>
>>>>>>>> Dormant objects are unreachable java objects within the open archive heap region.
>>>>>>>> A java object in the open archive heap region is a live object if it can be reached during scanning. Some of the java objects in the region may not be reachable during scanning. Those objects are considered as dormant, but not dead. For example, a constant pool 'resolved_references' array is reachable via the klass root if its container klass (shared) is already loaded at the time during GC scanning. If a shared klass is not yet loaded, the klass root is not scanned and it's constant pool 'resolved_reference' array (A) in the open archive region is not reachable. Then A is a dormant object.
>>>>>>>>
>>>>>>>> Object State Transition
>>>>>>>>
>>>>>>>> All java objects are initially dormant objects when open archive heap regions are mapped to the runtime java heap. A dormant object becomes live object when the associated shared class is loaded at runtime. Explicit call to G1SATBCardTableModRefBS::enqueue() needs to be made when a dormant object becomes live. That should be the case for cached objects with strong roots as well, since strong roots are only scanned at the start of GC marking (the initial marking) but not during Remarking/Final marking. If a cached object becomes live during concurrent marking phase, G1 may not find it and mark it live unless a call to G1SATBCardTableModRefBS::enqueue() is made for the object.
>>>>>>>>
>>>>>>>> Currently, a live object in the open archive heap region cannot become dormant again. This restriction simplifies GC requirement and guarantees all outgoing pointers are updated by GC correctly. Only objects for shared classes from the builtin class loaders (boot, PlatformClassLoaders, and AppClassLoaders) are supported for caching.
>>>>>>>>
>>>>>>>> Caching Java Objects at Archive Dump Time
>>>>>>>>
>>>>>>>> The closed archive and open archive regions are allocated near the top of the dump time java heap. Archived java objects are copied into the designated archive heap regions. For example, String objects and the underlying 'value' arrays are copied into the closed archive regions. All references to the archived objects (from shared class metadata, string table, etc) are set to the new heap locations. A hash table is used to keep track of all archived java objects during the copying process to make sure java object is not archived more than once if reached from different roots. It also makes sure references to the same archived object are updated using the same new address location.
>>>>>>>>
>>>>>>>> Caching Constant Pool resolved_references Array
>>>>>>>>
>>>>>>>> The 'resolved_references' is an array that holds references of resolved constant pool entries including Strings, mirrors and methodTypes, etc. Each loaded class has one 'resolved_references' array (in ConstantPoolCache). The 'resolved_references' arrays are copied into the open archive regions during dump process. Prior to copying the 'resolved_references' arrays, JVM iterates through constant pool entries and resolves all JVM_CONSTANT_String entries to existing interned Strings for all archived classes. When resolving, JVM only looks up the string table and finds existing interned Strings without inserting new ones. If a string entry cannot be resolved to an existing interned String, the constant pool entry remain as unresolved. That prevents memory waste if a constant pool string entry is never used at runtime.
>>>>>>>>
>>>>>>>> All String objects referenced by the string table are copied first into the closed archive regions. The string table entry is updated with the new location when each String object is archived. The JVM updates the resolved constant pool string entries with the new object locations when copying the 'resolved_references' arrays to the open archive regions. References to the 'resolved_references' arrays in the ConstantPoolCache are also updated.
>>>>>>>> At runtime as part of ConstantPool::restore_unshareable_info() work, call G1SATBCardTableModRefBS::enqueue() to let GC know the 'resolved_references' is becoming live. A handle is created for the cached object and added to the loader_data's handles.
>>>>>>>>
>>>>>>>> Runtime Java Heap With Cached Java Objects
>>>>>>>>
>>>>>>>>
>>>>>>>> The closed archive regions (the string regions) and open archive regions are mapped to the runtime java heap at the same offsets as the dump time offsets from the runtime java heap base.
>>>>>>>>
>>>>>>>> Preliminary test execution and status:
>>>>>>>>
>>>>>>>> JPRT: passed
>>>>>>>> Tier2-rt: passed
>>>>>>>> Tier2-gc: passed
>>>>>>>> Tier2-comp: passed
>>>>>>>> Tier3-rt: passed
>>>>>>>> Tier3-gc: passed
>>>>>>>> Tier3-comp: passed
>>>>>>>> Tier4-rt: passed
>>>>>>>> Tier4-gc: passed
>>>>>>>> Tier4-comp:6 jobs timed out, all other tests passed
>>>>>>>> Tier5-rt: one test failed but passed when running locally, all other tests passed
>>>>>>>> Tier5-gc: passed
>>>>>>>> Tier5-comp: running
>>>>>>>> hotspot_gc: two jobs timed out, all other tests passed
>>>>>>>> hotspot_gc in CDS mode: two jobs timed out, all other tests passed
>>>>>>>> vm.gc: passed
>>>>>>>> vm.gc in CDS mode: passed
>>>>>>>> Kichensink: passed
>>>>>>>> Kichensink in CDS mode: passed
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jiangli
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Thomas Schatzl
In reply to this post by Jiangli Zhou
Hi Jiangli,

  going to look at the new webrev in the other email again, answering
some questions here.

Thanks for considering all these suggestions.

On Mon, 2017-08-07 at 16:39 -0700, Jiangli Zhou wrote:

> Hi Thomas,
>
> Thanks a lot for the review!
>
> > On Aug 7, 2017, at 7:52 AM, Thomas Schatzl <[hidden email]
> > om> wrote:
> >
> > Hi,
> >
> > On Thu, 2017-08-03 at 17:15 -0700, Jiangli Zhou wrote:
> > > Here are the updated webrevs.
> > >
> > > http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/
> > > http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/
> > >
> > > Changes in the updated webrevs include:
> > > Merge with Ioi’s recent shared space auto-sizing change (8072061)
> > > Addressed all feedbacks from Ioi and Coleen (Thanks for detailed
> > > review!)
> > - the comment in g1Allocator.hpp:326 needs to be updated. I would
> > merge the information from the _open member here. I.e. define what
> > "open" and "closed" archive are.
> Good catch. I updated the comments as the following:
>
> // G1ArchiveAllocator is used to allocate memory in archive
> // regions. Such regions are not scavenged nor compacted by GC.
> // There are two types of archive regions, which are
> // differ in the kind of references allowed for the contained
> // objects:
> //
> // - 'Closed' archive region contain no references outside of archive

better: other "closed" archive

> //   regions. The region is immutable by GC. GC does not mark object
> //   header in 'closed' archive region. 
> // - An 'open' archive region may contain references pointing to
> //   non-archive heap region. GC can adjust pointers and mark object
> //   header in 'open' archive region.
>
[...]
> > - g1CollectedHeap.cpp:750 + 756: maybe make a static method to
> > avoid repetition here.
> I changed the code to be following. New static function is a little
> overkill since the usage is very limited. :-)

Okay :)
> > - G1NoteEndOfConcMarkClosure::doHeapRegion(): would it be too much
> > work to make an extra CR out of this change? It is a change that
> > fixes an existing bug unrelated to this change after all (not doing
> > remembered set cleanup work for archive regions).
> Using separate CR to track this sounds good. I just created JDK-
> 8185924. Since we have been testing the fix with other changes
> together, I'll integrate them together with both CRs.

Since you assigned this to me, do you want me to post the RFR?

> >
> > - g1HeapVerifier.cpp: there is a verbose flag passed around. Not
> > sure if it should be kept, as it seems to be some code that has
> > been used for debugging this feature, but can't be activated anyway
> > without code changes.
> Removed.

Thanks.

> > - heapRegion.inline.hpp:125: I think the existing code of faking
> > open archive regions as all-live does not work as implemented.
> > Consider the case when a new object in there is made live, and
> > references in there set to refer to some object outside this
> > region, and is the only reference (and it has not been marked live
> > yet): if there is a remembered set entry to that, and it is about
> > to be scanned.
> >
> > The current implementation of HeapRegion::is_obj_dead() will
> > consider it dead, so we will enter the code path at line 125.
> > Block_size_using bitmap() will jump over that object, but the
> > return values of is_obj_dead_with_size() method will indicate the
> > caller to not iterate over this object anyway, potentially missing
> > that reference.
> >
> > HeapRegion::is_obj_dead() needs to return that the object is not
> > dead for open archive regions. I think for now the safest way is to
> > add !is_open_archive() to the condition calculated there. That will
> > obviate the need for that existing hack to the assert too.
> >
> > It may have some perf impact though - actually recently there has
> > been some effort to remove that is_archive() check from that code
> > (the one that is now the is_closed_archive() assert). I do not see
> > an easy way to fix this. :( (i.e. there is likely no perf impact
> > vs. jdk9 so it's not that bad)
> >
> > This suggestion also only works with the assumption laid out in the
> > CR that there is no way that a live object can not become dormant
> > again, and the objects in the open archive regions are always
> > parsable (never contain junk data).
> Thank you for the analysis. I changed HeapRegion::is_obj_dead() with
> added !is_open_archive() condition as you suggested. I’m glad to get
> rid of the is_open_archive() change from is_obj_dead_with_size()
> assert.
>
> Thinking more about the case you described above, when an object (A)
> in the open archive just becomes live, there would be no reference to
> any other non-archive region at the moment. The object A only
> contains references to the archive (open or closed) regions
> initially. Scanning has no issue at the moment. When a new object (B)
> is allocated in the java heap and the reference is set in A. B is
> considered live, scanning would update the A->B reference
> accordingly. Is that correct?

  that is exactly the case this suggested change covers. Otherwise the
not-yet-marked object would be skipped, and the reference not updated.

As mentioned I will look again at the new webrev.

Thomas
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Thomas Schatzl
In reply to this post by Jiangli Zhou
Hi,

On Tue, 2017-08-08 at 17:33 -0700, Jiangli Zhou wrote:
> Here is the incremental webrev that has all the changes incorporated
> with suggestions from Coleen, Ioi and Thomas:
>
> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.03.inc/
>
> Updated full webrev: http://cr.openjdk.java.net/~jiangli/8179302/webr
> ev.hotspot.03/
>

 - (just repeating) g1Allocator.hpp:328:  "Closed" archive region
contain no references outside of archive ... -> outside of other closed
archive regions.
As for the description of open archive regions, it may be useful to
allow references to any other regions (afaik there is no restriction on
pointing to other open archive regions, and none to closed).

 - g1Allocator.inline.hpp:68 and 71: argument alignment.

 - filemap.cpp:473: missing space before the opening bracket

 - filemap.cpp:475: s/consqusective/consecutive

 - g1HeapVerifier.cpp:252: comment should say "Should be closed archive
region"

 - filemap.cpp:675: this is just a note about the big comment. It
mentions "archive string regions" and "string regions" (also seen
"metaspace string region") which may or may not be defined elsewhere.
Maybe it is useful to consolidate these terms.

The text also mentions "regions", which is a G1 specific term. Not sure
if there is a better way to define these areas in this context.

The comment in metaspaceShared.cpp seems to carefully avoid this by
using "shraed strings" and "shared archive heap space" (i.e. no mention
of "regions") most of the time. Sometimes "regions" is also used as
"area". 

Consolidating this may avoid confusion between "G1 regions" and region
as "area" and help a reader.

Or maybe I'm just reading too much into this :)

I have no particular opinion on whether you want to change anything
here.

 - FileMapInfo::map_heap_data(): there are quite a few occurrances of
this construct:

 761     if (log_is_enabled(Info, cds)) {
 762       log_info(cds)("UseSharedSpaces: Unable to allocate region, "

Afaik the log_info() call already only prints out data if
log_is_enabled(Info, cds), so it seems superfluous. The use of
log_is_enabled() call seems only useful if the block inside contains
some expensive computation, which does not seem the case here.

Sorry for bringing up these mostly cosmetic issues that late. I do not
need a re-review for the comment changes (and removal of the
log_is_enabled()).

Thanks,
  Thomas
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Jiangli Zhou
Hi Thomas,

Thank you so much for looking at the update in great detail! All points were taken and fixed as suggested.

> On Aug 9, 2017, at 8:12 AM, Thomas Schatzl <[hidden email]> wrote:
>
> Hi,
>
> On Tue, 2017-08-08 at 17:33 -0700, Jiangli Zhou wrote:
>> Here is the incremental webrev that has all the changes incorporated
>> with suggestions from Coleen, Ioi and Thomas:
>>
>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.03.inc/
>>
>> Updated full webrev: http://cr.openjdk.java.net/~jiangli/8179302/webr
>> ev.hotspot.03/
>>
>
>  - (just repeating) g1Allocator.hpp:328:  "Closed" archive region
> contain no references outside of archive ... -> outside of other closed
> archive regions.
> As for the description of open archive regions, it may be useful to
> allow references to any other regions (afaik there is no restriction on
> pointing to other open archive regions, and none to closed).

Fixed.

>
>  - g1Allocator.inline.hpp:68 and 71: argument alignment.

Fixed.

>
>  - filemap.cpp:473: missing space before the opening bracket

Fixed.

>
>  - filemap.cpp:475: s/consqusective/consecutive

Fixed.

>
>  - g1HeapVerifier.cpp:252: comment should say "Should be closed archive
> region”

Fixed.

>
>  - filemap.cpp:675: this is just a note about the big comment. It
> mentions "archive string regions" and "string regions" (also seen
> "metaspace string region") which may or may not be defined elsewhere.
> Maybe it is useful to consolidate these terms.
>
> The text also mentions "regions", which is a G1 specific term. Not sure
> if there is a better way to define these areas in this context.
>
> The comment in metaspaceShared.cpp seems to carefully avoid this by
> using "shraed strings" and "shared archive heap space" (i.e. no mention
> of "regions") most of the time. Sometimes "regions" is also used as
> "area".
>
> Consolidating this may avoid confusion between "G1 regions" and region
> as "area" and help a reader.
>
> Or maybe I'm just reading too much into this :)
>
> I have no particular opinion on whether you want to change anything
> here.

I changed the comments and replaced with shared string object and open archive heap objects (or open archive heap data).

>
>  - FileMapInfo::map_heap_data(): there are quite a few occurrances of
> this construct:
>
>  761     if (log_is_enabled(Info, cds)) {
>  762       log_info(cds)("UseSharedSpaces: Unable to allocate region, "
>
> Afaik the log_info() call already only prints out data if
> log_is_enabled(Info, cds), so it seems superfluous. The use of
> log_is_enabled() call seems only useful if the block inside contains
> some expensive computation, which does not seem the case here.

Removed those unnecessary log_is_enabled(Info, cds) checks.

>
> Sorry for bringing up these mostly cosmetic issues that late. I do not
> need a re-review for the comment changes (and removal of the
> log_is_enabled()).

Thanks!

Jiangli
>
> Thanks,
>   Thomas

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Ioi Lam
In reply to this post by Jiangli Zhou
Hi Jiangli,


The changes look good to me. Thanks for considering my suggestions.


- Ioi


On 8/8/17 5:33 PM, Jiangli Zhou wrote:

> Here is the incremental webrev that has all the changes incorporated
> with suggestions from Coleen, Ioi and Thomas:
>
> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.03.inc/ 
> <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.03.inc/>
>
> Updated full webrev:
> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.03/ 
> <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.03/>
>
> Thanks again for Coleen's, Ioi's and Thomas’ review!
> Jiangli
>
>> On Aug 7, 2017, at 7:57 PM, Jiangli Zhou <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>> Hi Ioi,
>>
>> Thanks for getting back to me.
>>
>>> On Aug 7, 2017, at 5:45 PM, Ioi Lam <[hidden email]
>>> <mailto:[hidden email]>> wrote:
>>>
>>> On 8/4/17 10:19 PM, Jiangli Zhou wrote:
>>>
>>>> Hi Ioi,
>>>>
>>>> Thanks for looking again.
>>>>
>>>>> On Aug 4, 2017, at 2:22 PM, Ioi Lam <[hidden email]
>>>>> <mailto:[hidden email]>> wrote:
>>>>>
>>>>> Hi Jiangli,
>>>>>
>>>>> The code looks good in general. I just have a few pet peeves for
>>>>> readability:
>>>>>
>>>>>
>>>>> (1) stringTable.cpp and metaspaceShared.cpp have the same asserts
>>>>>
>>>>>  704 assert(UseG1GC, "Only support G1 GC");
>>>>>  705 assert(UseCompressedOops && UseCompressedClassPointers,
>>>>>  706 "Only support UseCompressedOops and
>>>>> UseCompressedClassPointers enabled");
>>>>>
>>>>> 1615 assert(UseG1GC, "Only support G1 GC");
>>>>> 1616 assert(UseCompressedOops && UseCompressedClassPointers,
>>>>> 1617 "Only support UseCompressedOops and
>>>>> UseCompressedClassPointers enabled");
>>>>>
>>>>> Maybe it's better to combine them into a single function like
>>>>> MetaspaceShared::assert_vm_flags() so they don't get out of sync?
>>>>
>>>> There is a MetaspaceShared::allow_archive_heap_object(), which
>>>> checks for UseG1GC, UseCompressedOops and
>>>> UseCompressedClassPointers combined. It does not seem to worth add
>>>> another separate API for asserting the required flags. I’ll use
>>>> that in the assert.
>>>>
>>>>>
>>>>>
>>>>>
>>>>> (2) FileMapInfo::write_archive_heap_regions()
>>>>>
>>>>> I still find this code very hard to read, especially due to the loop.
>>>>>
>>>>> First, the comments are not consistent with the code:
>>>>>
>>>>>     498 assert(arr_len <= max_num_regions, "number of memory
>>>>> regions exceeds maximum");
>>>>>
>>>>> but the comments says: "The rest are consecutive full GC regions"
>>>>> which means there's a chance for max_num_regions to be more than 2
>>>>> (which will be the case with Calvin's java-loader dumping changes
>>>>> using very small heap size).So the code is actually wrong.
>>>>
>>>> The max_num_regions is the maximum number of region for each
>>>> archived heap space (the string space, or open archive space). We
>>>> only run into the case where the MemRegion array size is larger
>>>> than max_num_regions with Calvin’s pending change. As part of
>>>> Calvin’s change, he will change the assert into a check and bail
>>>> out if the number of MemRegions are larger than max_num_regions due
>>>> to heap fragmentation.
>>>>
>>>>
>>> Your latest patch assumes that arr_len <= 2, but the implementation
>>> of G1CollectedHeap::heap()->begin_archive_alloc_range() /
>>> G1CollectedHeap::heap()->end_archive_alloc_range() actually allows
>>> more than 2 regions to returned. So simply putting an assert there
>>> seems risky (unless you have analyzed all possible scenarios to
>>> prove that's impossible).
>>>
>>> Instead of trying to come up with a complicated proof, I think it's
>>> much safer to disable the archived string region if the arr_len > 2.
>>> Also, if the string region is disabled, you should also disable the
>>> open_archive_heap_region
>>>
>>> I think this is a general issue with the mapped heap regions, and it
>>> just happens to be revealed by Calvin's patch. So we should fix it
>>> now and not wait for Calvin's patch.
>>
>> Ok. I’ll change the assert to be a check.
>>
>>>
>>>
>>>>>
>>>>> The word "region" is used in these parameters, but they don't mean
>>>>> the same thing.
>>>>>
>>>>> GrowableArray<MemRegion> *regions
>>>>> int first_region, int max_num_regions,
>>>>>
>>>>>
>>>>> How about regions      -> g1_regions_list
>>>>>           first_region -> first_region_in_archive
>>>>
>>>> The GrowableArray above is the MemRegions that GC code gives back
>>>> to us. The GC code combines multiple G1 regions. The comments
>>>> probably are over-explaining the details, which are hidden in the
>>>> GC code. Probably that’s the confusing source. I’ll make the
>>>> comment more clear.
>>>>
>>>> Using g1_regions_list would also be confusing, since
>>>> write_archive_heap_regions does not handle G1 regions directly. It
>>>> processes the MemRegion array that GC code returns. How about
>>>> changing ‘regions’ to ‘mem_regions’ or ‘archive_regions'?
>>>>
>>> How about heap_regions? These are regions in the active Java heap,
>>> which current has not mapped anything from the CDS archive.
>>
>> Ok.
>>
>> I’m updating my changes and will send out a consolidated webrev.
>>
>> Thanks!
>> Jiangli
>>
>>>
>>>
>>>>>
>>>>>
>>>>> In the comments, I find the phrase 'the current archive heap
>>>>> region' ambiguous. It could be (erroneously) interpreted as "a
>>>>> region from the currently mapped archive”
>>>>>
>>>>> To make it unambiguous, how about changing
>>>>>
>>>>>
>>>>>  464 // Write the current archive heap region, which contains one
>>>>> or multiple GC(G1) regions.
>>>>>
>>>>>
>>>>> to
>>>>>
>>>>>     // Write the given list of G1 memory regions into the archive,
>>>>> starting at
>>>>>     // first_region_in_archive.
>>>>
>>>>
>>>> Ok. How about the following:
>>>>
>>>> // Write the given list of java heap memory regions into the
>>>> archive, starting at
>>>> // first_region_in_archive.
>>>>
>>> Sounds good.
>>>
>>> Thanks
>>> - Ioi
>>>
>>>>>
>>>>>
>>>>> Also, for the explanation of how the G1 regions are written into
>>>>> the archive, how about:
>>>>>
>>>>>    // The G1 regions in the list are sorted in ascending address
>>>>> order. When there are more objects
>>>>>    // than the capacity of a single G1 region, the bottom-most G1
>>>>> region may be partially filled, and the
>>>>>    // remaining G1 region(s) are consecutively allocated and fully
>>>>> filled.
>>>>>    //
>>>>>    // Thus, the bottom-most G1 region (if not empty) is written
>>>>> into first_region_in_archive.
>>>>>    // The remaining G1 regions (if exist) are coalesced and
>>>>> written as a single block
>>>>>    // into (first_region_in_archive + 1)
>>>>>
>>>>>    // Here's the mapping from (g1 regions) -> (archive regions).
>>>>>
>>>>>
>>>>> All this function needs to do is to decide the values for
>>>>>
>>>>>      r0_start, r0_top
>>>>>      r1_start, r1_top
>>>>>
>>>>> I think it would be much better to not use the loop, and not use
>>>>> the max_num_regions parameter (it's always 2 anyway).
>>>>>
>>>>>      *r0_start = *r0_top = NULL;
>>>>>      *r1_start = *r1_top = NULL;
>>>>>
>>>>>      if (arr_len >= 1) {
>>>>> *r0_start = regions->at(0).start();
>>>>> *r0_end = *r0_start + regions->at(0).byte_size();
>>>>>      }
>>>>>      if (arr_len >= 2) {
>>>>>          int last = arr_len - 1;
>>>>> *r1_start = regions->at(1).start();
>>>>> *r1_end = regions->at(last).start() + regions->at(last).byte_size();
>>>>>       }
>>>>>
>>>>> what do you think?
>>>>
>>>> We need to write out all archive regions including the empty ones.
>>>> The loop using max_num_regions is the easiest way. I’d like to
>>>> remove the code that deals with r0_* and r1_ explicitly. Let me try
>>>> that.
>>>>
>>>>>
>>>>>
>>>>>
>>>>> (3) metaspace.cpp
>>>>>
>>>>> 3350         // Map the archived heap regions after compressed
>>>>> pointers
>>>>> 3351         // because it relies on compressed class pointers
>>>>> setting to work
>>>>>
>>>>> do you mean this?
>>>>>
>>>>>     // Archived heap regions depend on the parameters of
>>>>> compressed class pointers, so
>>>>>     // they must be mapped after such parameters have been decided
>>>>> in the above call.
>>>>
>>>> Hmmm, maybe use ‘arguments’ instead of ‘parameters’?
>>>>
>>>>>
>>>>>
>>>>> (4) I found this name not strictly grammatical. How about this:
>>>>>
>>>>>      allow_archive_heap_object -> is_heap_object_archiving_allowed
>>>>
>>>> Ok.
>>>>
>>>>>
>>>>> (5) in most of your code, 'archive' is used as a noun, except in
>>>>> StringTable::archive_string() where it's used as a verb.
>>>>>
>>>>> archive_string could also be interpreted erroneously as "return a
>>>>> string that's already in the archive". So to be consistent and
>>>>> unambiguous, I think it's better to rename it to
>>>>> StringTable::create_archived_string()
>>>>
>>>> Ok.
>>>>
>>>> Thanks,
>>>> Jiangli
>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>
>>>>> On 8/3/17 5:15 PM, Jiangli Zhou wrote:
>>>>>> Here are the updated webrevs.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/ 
>>>>>> <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.02/>
>>>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/
>>>>>>
>>>>>> Changes in the updated webrevs include:
>>>>>>
>>>>>>   * Merge with Ioi’s recent shared space auto-sizing change (8072061)
>>>>>>   * Addressed all feedbacks from Ioi and Coleen (Thanks for
>>>>>>     detailed review!)
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Jiangli
>>>>>>
>>>>>>
>>>>>>> On Aug 1, 2017, at 5:29 PM, Jiangli Zhou
>>>>>>> <[hidden email]> wrote:
>>>>>>>
>>>>>>> Hi Ioi,
>>>>>>>
>>>>>>> Thank you so much for reviewing this. I’ve addressed all your
>>>>>>> feedbacks. Please see details below. I’ll updated the webrev
>>>>>>> after addressing Coleen’s comments.
>>>>>>>
>>>>>>>> On Jul 30, 2017, at 9:07 PM, Ioi Lam <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> Hi Jiangli,
>>>>>>>>
>>>>>>>> Here are my comments. I've not reviewed the GC code and I'll
>>>>>>>> leave that to the GC experts :-)
>>>>>>>>
>>>>>>>> stringTable.cpp: StringTable::archive_string
>>>>>>>>
>>>>>>>>    add assert for DumpSharedSpaces only
>>>>>>>
>>>>>>> Ok.
>>>>>>>
>>>>>>>>
>>>>>>>> filemap.cpp
>>>>>>>>
>>>>>>>> 525 void
>>>>>>>> FileMapInfo::write_archive_heap_regions(GrowableArray<MemRegion>
>>>>>>>> *regions,
>>>>>>>> 526  int first_region, int num_regions) {
>>>>>>>>
>>>>>>>> When I first read this function, I found it hard to follow,
>>>>>>>> especially this part that coalesces the trailing regions:
>>>>>>>>
>>>>>>>> 537         int len = regions->length();
>>>>>>>> 538         if (len > 1) {
>>>>>>>> 539           start = (char*)regions->at(1).start();
>>>>>>>> 540           size = (char*)regions->at(len - 1).end() - start;
>>>>>>>> 541         }
>>>>>>>> 542       }
>>>>>>>>
>>>>>>>> The rest of filemap.cpp always perform identical operations on
>>>>>>>> MemRegion arrays, which are either 1 or 2 in size. However,
>>>>>>>> this function doesn't follow that pattern; it also has a very
>>>>>>>> different notion of "region", and the confusing part is
>>>>>>>> regions->size() is not the same as num_regions.
>>>>>>>>
>>>>>>>> How about we change the API to something like the following?
>>>>>>>> Before calling this API, the caller needs to coalesce the
>>>>>>>> trailing G1 regions into a single MemRegion.
>>>>>>>>
>>>>>>>> FileMapInfo::write_archive_heap_regions(MemRegion *regions, int
>>>>>>>> first, int num_regions) {
>>>>>>>>        if (first == MetaspaceShared::first_string) {
>>>>>>>> assert(num_regons <=  MetaspaceShared::max_strings, "...");
>>>>>>>>        } else {
>>>>>>>>           assert(first ==
>>>>>>>> MetaspaceShared::first_open_archive_heap_region, "...");
>>>>>>>> assert(num_regons <=
>>>>>>>> MetaspaceShared::max_open_archive_heap_region, "...");
>>>>>>>> }
>>>>>>>>        ....
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> I’ve reworked the function and simplified the code.
>>>>>>>
>>>>>>>>
>>>>>>>> 756   if (!string_data_mapped) {
>>>>>>>> 757 StringTable::ignore_shared_strings(true);
>>>>>>>> 758 assert(string_ranges == NULL && num_string_ranges == 0,
>>>>>>>> "sanity");
>>>>>>>> 759   }
>>>>>>>> 760
>>>>>>>> 761   if (open_archive_heap_data_mapped) {
>>>>>>>> 762 MetaspaceShared::set_open_archive_heap_region_mapped();
>>>>>>>> 763   } else {
>>>>>>>> 764 assert(open_archive_heap_ranges == NULL &&
>>>>>>>> num_open_archive_heap_ranges == 0, "sanity");
>>>>>>>> 765   }
>>>>>>>>
>>>>>>>> Maybe the two "if" statements should be more consistent?
>>>>>>>> Instead of StringTable::ignore_shared_strings, how
>>>>>>>> about StringTable::set_shared_strings_region_mapped()?
>>>>>>>
>>>>>>> Fixed.
>>>>>>>
>>>>>>>>
>>>>>>>> FileMapInfo::map_heap_data() --
>>>>>>>>
>>>>>>>> 818     char* addr = (char*)regions[i].start();
>>>>>>>> 819     char* base = os::map_memory(_fd, _full_path,
>>>>>>>> si->_file_offset,
>>>>>>>> 820             addr, regions[i].byte_size(), si->_read_only,
>>>>>>>> 821 si->_allow_exec);
>>>>>>>>
>>>>>>>> What happens when the first region succeeds to map but the
>>>>>>>> second region fails to map? Will both regions be unmapped? I
>>>>>>>> don't see where you store the return value (base) from
>>>>>>>> os::map_memory(). Does it mean the code assumes that (addr ==
>>>>>>>> base). If so, we need an assert here.
>>>>>>>
>>>>>>> If any of the region fails to map, we bail out and call
>>>>>>> dealloc_archive_heap_regions(), which handles the deallocation
>>>>>>> of any regions specified. If second region fails to map, all
>>>>>>> memory ranges specified by ‘regions’ array are deallocated. We
>>>>>>> don’t unmap the memory here since it is part of the java heap.
>>>>>>> Unmapping of heap memory are handled by GC code. The ‘if’ check
>>>>>>> below makes sure base == addr.
>>>>>>>
>>>>>>>     if (base == NULL || base != addr) {
>>>>>>>       // dealloc the regions from java heap
>>>>>>> dealloc_archive_heap_regions(regions, region_num);
>>>>>>>   if (log_is_enabled(Info, cds)) {
>>>>>>> log_info(cds)("UseSharedSpaces: Unable to map at required
>>>>>>> address in java heap.");
>>>>>>>       }
>>>>>>>       return false;
>>>>>>>     }
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> constantPool.cpp
>>>>>>>>
>>>>>>>>     Handle refs_handle;
>>>>>>>>     ...
>>>>>>>>     refs_handle = Handle(THREAD, (oop)archived);
>>>>>>>>
>>>>>>>> This will first create a NULL handle, then construct a
>>>>>>>> temporary handle, and then assign the temp handle back to the
>>>>>>>> null handle. This means two handles will be pushed onto
>>>>>>>> THREAD->metadata_handles()
>>>>>>>>
>>>>>>>> I think it's more efficient if you merge these into a single
>>>>>>>> statement
>>>>>>>>
>>>>>>>>     Handle refs_handle(THREAD, (oop)archived);
>>>>>>>
>>>>>>> Fixed.
>>>>>>>
>>>>>>>>
>>>>>>>> Is this experimental code? Maybe it should be removed?
>>>>>>>>
>>>>>>>> 664     if (tag_at(index).is_unresolved_klass()) {
>>>>>>>> 665 #if 0
>>>>>>>> 666       CPSlot entry = cp->slot_at(index);
>>>>>>>> 667       Symbol* name = entry.get_symbol();
>>>>>>>> 668       Klass* k = SystemDictionary::find(name, NULL, NULL,
>>>>>>>> THREAD);
>>>>>>>> 669       if (k != NULL) {
>>>>>>>> 670 klass_at_put(index, k);
>>>>>>>> 671       }
>>>>>>>> 672 #endif
>>>>>>>> 673     } else
>>>>>>>
>>>>>>> Removed.
>>>>>>>
>>>>>>>>
>>>>>>>> cpCache.hpp:
>>>>>>>>
>>>>>>>>     u8 _archived_references
>>>>>>>>
>>>>>>>> shouldn't this be declared as an narrowOop to avoid the type
>>>>>>>> casts when it's used?
>>>>>>>
>>>>>>> Ok.
>>>>>>>
>>>>>>>>
>>>>>>>> cpCache.cpp:
>>>>>>>>
>>>>>>>>    add assert so that one of these is used only at dump time
>>>>>>>> and the other only at run time?
>>>>>>>>
>>>>>>>> 610 oop ConstantPoolCache::archived_references() {
>>>>>>>> 611   return
>>>>>>>> oopDesc::decode_heap_oop((narrowOop)_archived_references);
>>>>>>>> 612 }
>>>>>>>> 613
>>>>>>>> 614 void ConstantPoolCache::set_archived_references(oop o) {
>>>>>>>> 615 _archived_references = (u8)oopDesc::encode_heap_oop(o);
>>>>>>>> 616 }
>>>>>>>
>>>>>>> Ok.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Jiangli
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> - Ioi
>>>>>>>>
>>>>>>>> On 7/27/17 1:37 PM, Jiangli Zhou wrote:
>>>>>>>>> Sorry, the mail didn’t handle the rich text well. I fixed the
>>>>>>>>> format below.
>>>>>>>>>
>>>>>>>>> Please help review the changes for JDK-8179302 (Pre-resolve
>>>>>>>>> constant pool string entries and cache resolved_reference
>>>>>>>>> arrays in CDS archive). Currently, the CDS archive can contain
>>>>>>>>> cached class metadata, interned java.lang.String objects. This
>>>>>>>>> RFE adds the constant pool ‘resolved_references’ arrays
>>>>>>>>> (hotspot specific) to the archive for startup/runtime
>>>>>>>>> performance enhancement. The ‘resolved_references' arrays are
>>>>>>>>> used to hold references of resolved constant pool entries
>>>>>>>>> including Strings, mirrors, etc. With
>>>>>>>>> the 'resolved_references’ being cached, string constants in
>>>>>>>>> shared classes can now be resolved to existing interned
>>>>>>>>> java.lang.Strings at CDS dump time. G1 and 64-bit platforms
>>>>>>>>> are required.
>>>>>>>>>
>>>>>>>>> The GC changes in the RFE were discussed and guided by Thomas
>>>>>>>>> Schatzl and GC team. Part of the changes were contributed by
>>>>>>>>> Thomas himself.
>>>>>>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8179302
>>>>>>>>> hotspot:
>>>>>>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/
>>>>>>>>> whitebox:
>>>>>>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/
>>>>>>>>>
>>>>>>>>> Please see below for details of supporting cached
>>>>>>>>> ‘resolved_references’ and pre-resolving string constants.
>>>>>>>>>
>>>>>>>>> Types of Pinned G1 Heap Regions
>>>>>>>>>
>>>>>>>>> The pinned region type is a super type of all archive region
>>>>>>>>> types, which include the open archive type and the closed
>>>>>>>>> archive type.
>>>>>>>>>
>>>>>>>>> 00100 0 [ 8] Pinned Mask
>>>>>>>>> 01000 0 [16] Old Mask
>>>>>>>>> 10000 0 [32] Archive Mask
>>>>>>>>> 11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
>>>>>>>>> 11100 1 [57] Closed Archive: ArchiveMask | PinnedMask |
>>>>>>>>> OldMask + 1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Pinned Regions
>>>>>>>>>
>>>>>>>>> Objects within the region are 'pinned', which means GC does
>>>>>>>>> not move any live objects. GC scans and marks objects in the
>>>>>>>>> pinned region as normal, but skips forwarding live objects.
>>>>>>>>> Pointers in live objects are updated. Dead objects
>>>>>>>>> (unreachable) can be collected and freed.
>>>>>>>>>
>>>>>>>>> Archive Regions
>>>>>>>>>
>>>>>>>>> The archive types are sub-types of 'pinned'. There are two
>>>>>>>>> types of archive region currently, open archive and closed
>>>>>>>>> archive. Both can support caching java heap objects via the
>>>>>>>>> CDS archive.
>>>>>>>>>
>>>>>>>>> An archive region is also an old region by design.
>>>>>>>>>
>>>>>>>>> Open Archive (GC-RW) Regions
>>>>>>>>>
>>>>>>>>> Open archive region is GC writable. GC scans & marks objects
>>>>>>>>> within the region and adjusts (updates) pointers in live
>>>>>>>>> objects the same way as a pinned region. Live objects
>>>>>>>>> (reachable) are pinned and not forwarded by GC.
>>>>>>>>> Open archive region does not have 'dead' objects. Unreachable
>>>>>>>>> objects are 'dormant' objects. Dormant objects are not
>>>>>>>>> collected and freed by GC.
>>>>>>>>>
>>>>>>>>> Adjustable Outgoing Pointers
>>>>>>>>>
>>>>>>>>> As GC can adjust pointers within the live objects in open
>>>>>>>>> archive heap region, objects can have outgoing pointers to
>>>>>>>>> another java heap region, including closed archive region,
>>>>>>>>> open archive region, pinned (or humongous) region, and normal
>>>>>>>>> generational region. When a referenced object is moved by GC,
>>>>>>>>> the pointer within the open archive region is updated accordingly.
>>>>>>>>>
>>>>>>>>> Closed Archive (GC-RO) Regions
>>>>>>>>>
>>>>>>>>> The closed archive region is GC read-only region. GC cannot
>>>>>>>>> write into the region. Objects are not scanned and marked by
>>>>>>>>> GC. Objects are pinned and not forwarded. Pointers are not
>>>>>>>>> updated by GC either. Hence, objects within the archive region
>>>>>>>>> cannot have any outgoing pointers to another java heap region.
>>>>>>>>> Objects however can still have pointers to other objects
>>>>>>>>> within the closed archive regions (we might allow pointers to
>>>>>>>>> open archive regions in the future). That restricts the type
>>>>>>>>> of java objects that can be supported by the archive region.
>>>>>>>>> In JDK 9 we support archive Strings with the archive regions.
>>>>>>>>>
>>>>>>>>> The GC-readonly archive region makes java heap memory sharable
>>>>>>>>> among different JVM processes. NOTE: synchronization on the
>>>>>>>>> objects within the archive heap region can still cause writes
>>>>>>>>> to the memory page.
>>>>>>>>>
>>>>>>>>> Dormant Objects
>>>>>>>>>
>>>>>>>>> Dormant objects are unreachable java objects within the open
>>>>>>>>> archive heap region.
>>>>>>>>> A java object in the open archive heap region is a live object
>>>>>>>>> if it can be reached during scanning. Some of the java objects
>>>>>>>>> in the region may not be reachable during scanning. Those
>>>>>>>>> objects are considered as dormant, but not dead. For example,
>>>>>>>>> a constant pool 'resolved_references' array is reachable via
>>>>>>>>> the klass root if its container klass (shared) is already
>>>>>>>>> loaded at the time during GC scanning. If a shared klass is
>>>>>>>>> not yet loaded, the klass root is not scanned and it's
>>>>>>>>> constant pool 'resolved_reference' array (A) in the open
>>>>>>>>> archive region is not reachable. Then A is a dormant object.
>>>>>>>>>
>>>>>>>>> Object State Transition
>>>>>>>>>
>>>>>>>>> All java objects are initially dormant objects when open
>>>>>>>>> archive heap regions are mapped to the runtime java heap. A
>>>>>>>>> dormant object becomes live object when the associated shared
>>>>>>>>> class is loaded at runtime. Explicit call
>>>>>>>>> to G1SATBCardTableModRefBS::enqueue() needs to be made when a
>>>>>>>>> dormant object becomes live. That should be the case
>>>>>>>>> for cached objects with strong roots as well, since strong
>>>>>>>>> roots are only scanned at the start of GC marking (the initial
>>>>>>>>> marking) but not during Remarking/Final marking. If a cached
>>>>>>>>> object becomes live during concurrent marking phase, G1 may
>>>>>>>>> not find it and mark it live unless a call to
>>>>>>>>> G1SATBCardTableModRefBS::enqueue() is made for the object.
>>>>>>>>>
>>>>>>>>> Currently, a live object in the open archive heap region
>>>>>>>>> cannot become dormant again. This restriction simplifies GC
>>>>>>>>> requirement and guarantees all outgoing pointers are updated
>>>>>>>>> by GC correctly. Only objects for shared classes from the
>>>>>>>>> builtin class loaders (boot, PlatformClassLoaders, and
>>>>>>>>> AppClassLoaders) are supported for caching.
>>>>>>>>>
>>>>>>>>> Caching Java Objects at Archive Dump Time
>>>>>>>>>
>>>>>>>>> The closed archive and open archive regions are allocated near
>>>>>>>>> the top of the dump time java heap. Archived java objects
>>>>>>>>> are copied into the designated archive heap regions. For
>>>>>>>>> example, String objects and the underlying 'value' arrays are
>>>>>>>>> copied into the closed archive regions. All references to the
>>>>>>>>> archived objects (from shared class metadata, string table,
>>>>>>>>> etc) are set to the new heap locations. A hash table is used
>>>>>>>>> to keep track of all archived java objects during the copying
>>>>>>>>> process to make sure java object is not archived more than
>>>>>>>>> once if reached from different roots. It also makes sure
>>>>>>>>> references to the same archived object are updated using the
>>>>>>>>> same new address location.
>>>>>>>>>
>>>>>>>>> Caching Constant Pool resolved_references Array
>>>>>>>>>
>>>>>>>>> The 'resolved_references' is an array that holds references of
>>>>>>>>> resolved constant pool entries including Strings, mirrors
>>>>>>>>> and methodTypes, etc. Each loaded class has one
>>>>>>>>> 'resolved_references' array (in ConstantPoolCache). The
>>>>>>>>> 'resolved_references' arrays are copied into the open archive
>>>>>>>>> regions during dump process. Prior to copying the
>>>>>>>>> 'resolved_references' arrays, JVM iterates through constant
>>>>>>>>> pool entries and resolves all JVM_CONSTANT_String entries to
>>>>>>>>> existing interned Strings for all archived classes. When
>>>>>>>>> resolving, JVM only looks up the string table and finds
>>>>>>>>> existing interned Strings without inserting new ones. If
>>>>>>>>> a string entry cannot be resolved to an existing interned
>>>>>>>>> String, the constant pool entry remain as unresolved. That
>>>>>>>>> prevents memory waste if a constant pool string entry is never
>>>>>>>>> used at runtime.
>>>>>>>>>
>>>>>>>>> All String objects referenced by the string table are copied
>>>>>>>>> first into the closed archive regions. The string table entry
>>>>>>>>> is updated with the new location when each String object is
>>>>>>>>> archived. The JVM updates the resolved constant pool string
>>>>>>>>> entries with the new object locations when copying the
>>>>>>>>> 'resolved_references' arrays to the open archive regions.
>>>>>>>>> References to the 'resolved_references' arrays in the
>>>>>>>>> ConstantPoolCache are also updated.
>>>>>>>>> At runtime as part of ConstantPool::restore_unshareable_info()
>>>>>>>>> work, call G1SATBCardTableModRefBS::enqueue() to let GC
>>>>>>>>> know the 'resolved_references' is becoming live. A handle is
>>>>>>>>> created for the cached object and added to the loader_data's
>>>>>>>>> handles.
>>>>>>>>>
>>>>>>>>> Runtime Java Heap With Cached Java Objects
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The closed archive regions (the string regions) and open
>>>>>>>>> archive regions are mapped to the runtime java heap at the
>>>>>>>>> same offsets as the dump time offsets from the runtime java
>>>>>>>>> heap base.
>>>>>>>>>
>>>>>>>>> Preliminary test execution and status:
>>>>>>>>>
>>>>>>>>> JPRT: passed
>>>>>>>>> Tier2-rt: passed
>>>>>>>>> Tier2-gc: passed
>>>>>>>>> Tier2-comp: passed
>>>>>>>>> Tier3-rt: passed
>>>>>>>>> Tier3-gc: passed
>>>>>>>>> Tier3-comp: passed
>>>>>>>>> Tier4-rt: passed
>>>>>>>>> Tier4-gc: passed
>>>>>>>>> Tier4-comp:6 jobs timed out, all other tests passed
>>>>>>>>> Tier5-rt: one test failed but passed when running locally, all
>>>>>>>>> other tests passed
>>>>>>>>> Tier5-gc: passed
>>>>>>>>> Tier5-comp: running
>>>>>>>>> hotspot_gc: two jobs timed out, all other tests passed
>>>>>>>>> hotspot_gc in CDS mode: two jobs timed out, all other tests passed
>>>>>>>>> vm.gc: passed
>>>>>>>>> vm.gc in CDS mode: passed
>>>>>>>>> Kichensink: passed
>>>>>>>>> Kichensink in CDS mode: passed
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Jiangli
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Jiangli Zhou
Thanks, Ioi!

Jiangli

> On Aug 10, 2017, at 2:15 PM, Ioi Lam <[hidden email]> wrote:
>
> Hi Jiangli,
>
>
> The changes look good to me. Thanks for considering my suggestions.
>
>
> - Ioi
>
> On 8/8/17 5:33 PM, Jiangli Zhou wrote:
>> Here is the incremental webrev that has all the changes incorporated with suggestions from Coleen, Ioi and Thomas:
>>
>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.03.inc/ <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.03.inc/>
>>
>> Updated full webrev: http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.03/ <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.03/>
>>
>> Thanks again for Coleen's, Ioi's and Thomas’ review!
>> Jiangli
>>
>>> On Aug 7, 2017, at 7:57 PM, Jiangli Zhou <[hidden email] <mailto:[hidden email]>> wrote:
>>>
>>> Hi Ioi,
>>>
>>> Thanks for getting back to me.
>>>
>>>> On Aug 7, 2017, at 5:45 PM, Ioi Lam <[hidden email] <mailto:[hidden email]>> wrote:
>>>>
>>>> On 8/4/17 10:19 PM, Jiangli Zhou wrote:
>>>>> Hi Ioi,
>>>>>
>>>>> Thanks for looking again.
>>>>>
>>>>>> On Aug 4, 2017, at 2:22 PM, Ioi Lam <[hidden email] <mailto:[hidden email]>> wrote:
>>>>>>
>>>>>> Hi Jiangli,
>>>>>>
>>>>>> The code looks good in general. I just have a few pet peeves for readability:
>>>>>>
>>>>>>
>>>>>> (1) stringTable.cpp and metaspaceShared.cpp have the same asserts
>>>>>>
>>>>>>  704   assert(UseG1GC, "Only support G1 GC");
>>>>>>  705   assert(UseCompressedOops && UseCompressedClassPointers,
>>>>>>  706          "Only support UseCompressedOops and UseCompressedClassPointers enabled");
>>>>>>
>>>>>> 1615   assert(UseG1GC, "Only support G1 GC");
>>>>>> 1616   assert(UseCompressedOops && UseCompressedClassPointers,
>>>>>> 1617          "Only support UseCompressedOops and UseCompressedClassPointers enabled");
>>>>>>
>>>>>> Maybe it's better to combine them into a single function like MetaspaceShared::assert_vm_flags() so they don't get out of sync?
>>>>>
>>>>> There is a MetaspaceShared::allow_archive_heap_object(), which checks for UseG1GC, UseCompressedOops and UseCompressedClassPointers combined. It does not seem to worth add another separate API for asserting the required flags. I’ll use that in the assert.
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> (2) FileMapInfo::write_archive_heap_regions()
>>>>>>
>>>>>> I still find this code very hard to read, especially due to the loop.
>>>>>>
>>>>>> First, the comments are not consistent with the code:
>>>>>>
>>>>>>     498   assert(arr_len <= max_num_regions, "number of memory regions exceeds                                           maximum");
>>>>>>
>>>>>> but the comments says: "The rest are consecutive full GC regions" which means there's a chance for max_num_regions to be more than 2 (which will be the case                                           with Calvin's java-loader dumping changes using very small heap size). So the code is actually wrong.
>>>>>
>>>>> The max_num_regions is the maximum number of region for each archived heap space (the string space, or open archive space). We only run into the case where the MemRegion array size is larger than max_num_regions with Calvin’s pending change. As part of Calvin’s change, he will change the assert into a check and bail out if the number of MemRegions are larger than max_num_regions due to heap fragmentation.
>>>>>
>>>>>
>>>> Your latest patch assumes that arr_len <= 2, but the implementation of G1CollectedHeap::heap()->begin_archive_alloc_range() / G1CollectedHeap::heap()->end_archive_alloc_range() actually allows more than 2 regions to returned. So simply putting an assert there seems risky (unless you have analyzed all possible scenarios to prove that's impossible).
>>>>
>>>> Instead of trying to come up with a complicated proof, I think it's much safer to disable the archived string region if the arr_len > 2. Also, if the string region is disabled, you should also disable the open_archive_heap_region
>>>>
>>>> I think this is a general issue with the mapped heap regions, and it just happens to be revealed by Calvin's patch. So we should fix it now and not wait for Calvin's patch.
>>>
>>> Ok. I’ll change the assert to be a check.
>>>
>>>>
>>>>
>>>>>>
>>>>>> The word "region" is used in these parameters, but they don't mean the same thing.
>>>>>>
>>>>>>               GrowableArray<MemRegion> *regions
>>>>>>               int first_region, int max_num_regions,
>>>>>>
>>>>>>
>>>>>> How about regions      -> g1_regions_list
>>>>>>           first_region -> first_region_in_archive
>>>>>
>>>>> The GrowableArray above is the MemRegions that GC code gives back to us. The GC code combines multiple G1 regions. The comments probably are over-explaining the details, which are hidden in the GC code. Probably that’s the confusing source. I’ll make the comment more clear.
>>>>>
>>>>> Using g1_regions_list would also be confusing, since write_archive_heap_regions does not handle G1 regions directly. It processes the MemRegion array that GC code returns. How about changing ‘regions’ to ‘mem_regions’ or ‘archive_regions'?
>>>>>
>>>> How about heap_regions? These are regions in the active Java heap, which current has not mapped anything from the CDS archive.
>>>
>>> Ok.
>>>
>>> I’m updating my changes and will send out a consolidated webrev.
>>>
>>> Thanks!
>>> Jiangli
>>>
>>>>
>>>>
>>>>>>
>>>>>>
>>>>>> In the comments, I find the phrase 'the current archive heap region' ambiguous. It could be (erroneously) interpreted as "a region from the currently mapped archive”
>>>>>
>>>>>>
>>>>>> To make it unambiguous, how about changing
>>>>>>
>>>>>>
>>>>>>  464 // Write the current archive heap region, which contains one or multiple GC(G1) regions.
>>>>>>
>>>>>>
>>>>>> to
>>>>>>
>>>>>>     // Write the given list of G1 memory regions into the archive, starting at
>>>>>>     // first_region_in_archive.
>>>>>
>>>>>
>>>>> Ok. How about the following:
>>>>>
>>>>> // Write the given list of java heap memory regions into the archive, starting at
>>>>> // first_region_in_archive.
>>>>>
>>>> Sounds good.
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>>>
>>>>>>
>>>>>> Also, for the explanation of how the G1 regions are written into the archive, how about:
>>>>>>
>>>>>>    // The G1 regions in the list are sorted in ascending address order. When there are more objects
>>>>>>    // than the capacity of a single G1 region, the bottom-most G1 region may be partially filled, and the
>>>>>>    // remaining G1 region(s) are consecutively allocated and fully filled.
>>>>>>    //
>>>>>>    // Thus, the bottom-most G1 region (if not empty) is written into first_region_in_archive.
>>>>>>    // The remaining G1 regions (if exist) are coalesced and written as a single block
>>>>>>    // into (first_region_in_archive + 1)
>>>>>>
>>>>>>    // Here's the mapping from (g1 regions) -> (archive regions).
>>>>>>
>>>>>>
>>>>>> All this function needs to do is to decide the values for
>>>>>>
>>>>>>      r0_start, r0_top
>>>>>>      r1_start, r1_top
>>>>>>
>>>>>> I think it would be much better to not use the loop, and not use the max_num_regions parameter (it's always 2 anyway).
>>>>>>
>>>>>>      *r0_start = *r0_top = NULL;
>>>>>>      *r1_start = *r1_top = NULL;
>>>>>>
>>>>>>      if (arr_len >= 1) {
>>>>>>          *r0_start = regions->at(0).start();
>>>>>>          *r0_end = *r0_start + regions->at(0).byte_size();
>>>>>>      }
>>>>>>      if (arr_len >= 2) {
>>>>>>          int last = arr_len - 1;
>>>>>>          *r1_start = regions->at(1).start();
>>>>>>          *r1_end = regions->at(last).start() + regions->at(last).byte_size();
>>>>>>       }
>>>>>>
>>>>>> what do you think?
>>>>>
>>>>> We need to write out all archive regions including the empty ones. The loop using max_num_regions is the easiest way. I’d like to remove the code that deals with r0_* and r1_ explicitly. Let me try that.
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> (3) metaspace.cpp
>>>>>>
>>>>>> 3350         // Map the archived heap regions after compressed pointers
>>>>>> 3351         // because it relies on compressed class pointers setting to work
>>>>>>
>>>>>> do you mean this?
>>>>>>
>>>>>>     // Archived heap regions depend on the parameters of compressed class pointers, so
>>>>>>     // they must be mapped after such parameters have been decided in the above call.
>>>>>
>>>>> Hmmm, maybe use ‘arguments’ instead of ‘parameters’?
>>>>>  
>>>>>>
>>>>>>
>>>>>> (4) I found this name not strictly grammatical. How about this:
>>>>>>
>>>>>>      allow_archive_heap_object -> is_heap_object_archiving_allowed
>>>>>
>>>>> Ok.
>>>>>
>>>>>>
>>>>>> (5) in most of your code, 'archive' is used as a noun, except in StringTable::archive_string() where it's used as a verb.
>>>>>>
>>>>>> archive_string could also be interpreted erroneously as "return a string that's already in the archive". So to be consistent and unambiguous, I think it's better to rename it to StringTable::create_archived_string()
>>>>>
>>>>> Ok.
>>>>>
>>>>> Thanks,
>>>>> Jiangli
>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>>
>>>>>>
>>>>>> On 8/3/17 5:15 PM, Jiangli Zhou wrote:
>>>>>>> Here are the updated webrevs.
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/ <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.02/>
>>>>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/ <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.whitebox.02/>
>>>>>>>
>>>>>>> Changes in the updated webrevs include:
>>>>>>> Merge with Ioi’s recent shared space auto-sizing change (8072061)
>>>>>>> Addressed all feedbacks from Ioi and Coleen (Thanks for detailed review!)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jiangli
>>>>>>>
>>>>>>>
>>>>>>>> On Aug 1, 2017, at 5:29 PM, Jiangli Zhou <[hidden email]> <mailto:[hidden email]> wrote:
>>>>>>>>
>>>>>>>> Hi Ioi,
>>>>>>>>
>>>>>>>> Thank you so much for reviewing this. I’ve addressed all your feedbacks. Please see details below. I’ll updated the webrev after addressing Coleen’s comments.
>>>>>>>>
>>>>>>>>> On Jul 30, 2017, at 9:07 PM, Ioi Lam <[hidden email]> <mailto:[hidden email]> wrote:
>>>>>>>>>
>>>>>>>>> Hi Jiangli,
>>>>>>>>>
>>>>>>>>> Here are my comments. I've not reviewed the GC code and I'll leave that to the GC experts :-)
>>>>>>>>>
>>>>>>>>> stringTable.cpp: StringTable::archive_string
>>>>>>>>>
>>>>>>>>>    add assert for DumpSharedSpaces only
>>>>>>>>
>>>>>>>> Ok.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> filemap.cpp
>>>>>>>>>
>>>>>>>>> 525 void FileMapInfo::write_archive_heap_regions(GrowableArray<MemRegion> *regions,
>>>>>>>>> 526                                              int first_region, int num_regions) {
>>>>>>>>>
>>>>>>>>> When I first read this function, I found it hard to follow, especially this part that coalesces the trailing regions:
>>>>>>>>>
>>>>>>>>> 537         int len = regions->length();
>>>>>>>>> 538         if (len > 1) {
>>>>>>>>> 539           start = (char*)regions->at(1).start();
>>>>>>>>> 540           size = (char*)regions->at(len - 1).end() - start;
>>>>>>>>> 541         }
>>>>>>>>> 542       }
>>>>>>>>>
>>>>>>>>> The rest of filemap.cpp always perform identical operations on MemRegion arrays, which are either 1 or 2 in size. However, this function doesn't follow that pattern; it also has a very different notion of "region", and the confusing part is regions->size() is not the same as num_regions.
>>>>>>>>>
>>>>>>>>> How about we change the API to something like the following? Before calling this API, the caller needs to coalesce the trailing G1 regions into a single MemRegion.
>>>>>>>>>
>>>>>>>>>     FileMapInfo::write_archive_heap_regions(MemRegion *regions, int first, int num_regions) {
>>>>>>>>>        if (first == MetaspaceShared::first_string) {
>>>>>>>>>           assert(num_regons <=  MetaspaceShared::max_strings, "...");
>>>>>>>>>        } else {
>>>>>>>>>           assert(first == MetaspaceShared::first_open_archive_heap_region, "...");
>>>>>>>>>           assert(num_regons <= MetaspaceShared::max_open_archive_heap_region, "...");
>>>>>>>>> }
>>>>>>>>>        ....
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> I’ve reworked the function and simplified the code.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 756   if (!string_data_mapped) {
>>>>>>>>> 757     StringTable::ignore_shared_strings(true);
>>>>>>>>> 758     assert(string_ranges == NULL && num_string_ranges == 0, "sanity");
>>>>>>>>> 759   }
>>>>>>>>> 760
>>>>>>>>> 761   if (open_archive_heap_data_mapped) {
>>>>>>>>> 762     MetaspaceShared::set_open_archive_heap_region_mapped();
>>>>>>>>> 763   } else {
>>>>>>>>> 764     assert(open_archive_heap_ranges == NULL && num_open_archive_heap_ranges == 0, "sanity");
>>>>>>>>> 765   }
>>>>>>>>>
>>>>>>>>> Maybe the two "if" statements should be more consistent? Instead of StringTable::ignore_shared_strings, how about StringTable::set_shared_strings_region_mapped()?
>>>>>>>>
>>>>>>>> Fixed.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> FileMapInfo::map_heap_data() --
>>>>>>>>>
>>>>>>>>> 818     char* addr = (char*)regions[i].start();
>>>>>>>>> 819     char* base = os::map_memory(_fd, _full_path, si->_file_offset,
>>>>>>>>> 820                                 addr, regions[i].byte_size(), si->_read_only,
>>>>>>>>> 821                                 si->_allow_exec);
>>>>>>>>>
>>>>>>>>> What happens when the first region succeeds to map but the second region fails to map? Will both regions be unmapped? I don't see where you store the return value (base) from os::map_memory(). Does it mean the code assumes that (addr == base). If so, we need an assert here.
>>>>>>>>
>>>>>>>> If any of the region fails to map, we bail out and call dealloc_archive_heap_regions(), which handles the deallocation of any regions specified. If second region fails to map, all memory ranges specified by ‘regions’ array are deallocated. We don’t unmap the memory here since it is part of the java heap. Unmapping of heap memory are handled by GC code. The ‘if’ check below makes sure base == addr.
>>>>>>>>
>>>>>>>>     if (base == NULL || base != addr) {
>>>>>>>>       // dealloc the regions from java heap
>>>>>>>>       dealloc_archive_heap_regions(regions, region_num);
>>>>>>>>       if (log_is_enabled(Info, cds)) {
>>>>>>>>         log_info(cds)("UseSharedSpaces: Unable to map at required address in java heap.");
>>>>>>>>       }
>>>>>>>>       return false;
>>>>>>>>     }
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> constantPool.cpp
>>>>>>>>>
>>>>>>>>>     Handle refs_handle;
>>>>>>>>>     ...
>>>>>>>>>     refs_handle = Handle(THREAD, (oop)archived);
>>>>>>>>>
>>>>>>>>> This will first create a NULL handle, then construct a temporary handle, and then assign the temp handle back to the null handle. This means two handles will be pushed onto THREAD->metadata_handles()
>>>>>>>>>
>>>>>>>>> I think it's more efficient if you merge these into a single statement
>>>>>>>>>
>>>>>>>>>     Handle refs_handle(THREAD, (oop)archived);
>>>>>>>>
>>>>>>>> Fixed.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Is this experimental code? Maybe it should be removed?
>>>>>>>>>
>>>>>>>>> 664     if (tag_at(index).is_unresolved_klass()) {
>>>>>>>>> 665 #if 0
>>>>>>>>> 666       CPSlot entry = cp->slot_at(index);
>>>>>>>>> 667       Symbol* name = entry.get_symbol();
>>>>>>>>> 668       Klass* k = SystemDictionary::find(name, NULL, NULL, THREAD);
>>>>>>>>> 669       if (k != NULL) {
>>>>>>>>> 670         klass_at_put(index, k);
>>>>>>>>> 671       }
>>>>>>>>> 672 #endif
>>>>>>>>> 673     } else
>>>>>>>>
>>>>>>>> Removed.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> cpCache.hpp:
>>>>>>>>>
>>>>>>>>>     u8   _archived_references
>>>>>>>>>
>>>>>>>>> shouldn't this be declared as an narrowOop to avoid the type casts when it's used?
>>>>>>>>
>>>>>>>> Ok.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> cpCache.cpp:
>>>>>>>>>
>>>>>>>>>    add assert so that one of these is used only at dump time and the other only at run time?
>>>>>>>>>
>>>>>>>>> 610 oop ConstantPoolCache::archived_references() {
>>>>>>>>> 611   return oopDesc::decode_heap_oop((narrowOop)_archived_references);
>>>>>>>>> 612 }
>>>>>>>>> 613
>>>>>>>>> 614 void ConstantPoolCache::set_archived_references(oop o) {
>>>>>>>>> 615   _archived_references = (u8)oopDesc::encode_heap_oop(o);
>>>>>>>>> 616 }
>>>>>>>>
>>>>>>>> Ok.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Jiangli
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> - Ioi
>>>>>>>>>
>>>>>>>>> On 7/27/17 1:37 PM, Jiangli Zhou wrote:
>>>>>>>>>> Sorry, the mail didn’t handle the rich text well. I fixed the format below.
>>>>>>>>>>
>>>>>>>>>> Please help review the changes for JDK-8179302 (Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive). Currently, the CDS archive can contain cached class metadata, interned java.lang.String objects. This RFE adds the constant pool ‘resolved_references’ arrays (hotspot specific) to the archive for startup/runtime performance enhancement. The ‘resolved_references' arrays are used to hold references of resolved constant pool entries including Strings, mirrors, etc. With the 'resolved_references’ being cached, string constants in shared classes can now be resolved to existing interned java.lang.Strings at CDS dump time. G1 and 64-bit platforms are required.
>>>>>>>>>>
>>>>>>>>>> The GC changes in the RFE were discussed and guided by Thomas Schatzl and GC team. Part of the changes were contributed by Thomas himself.
>>>>>>>>>> RFE:        https://bugs.openjdk.java.net/browse/JDK-8179302 <https://bugs.openjdk.java.net/browse/JDK-8179302>
>>>>>>>>>> hotspot:   http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/ <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.01/>
>>>>>>>>>> whitebox: http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/ <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.whitebox.01/>
>>>>>>>>>>
>>>>>>>>>> Please see below for details of supporting cached ‘resolved_references’ and pre-resolving string constants.
>>>>>>>>>>
>>>>>>>>>> Types of Pinned G1 Heap Regions
>>>>>>>>>>
>>>>>>>>>> The pinned region type is a super type of all archive region types, which include the open archive type and the closed archive type.
>>>>>>>>>>
>>>>>>>>>> 00100 0 [ 8] Pinned Mask
>>>>>>>>>> 01000 0 [16] Old Mask
>>>>>>>>>> 10000 0 [32] Archive Mask
>>>>>>>>>> 11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
>>>>>>>>>> 11100 1 [57] Closed Archive: ArchiveMask | PinnedMask | OldMask + 1
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Pinned Regions
>>>>>>>>>>
>>>>>>>>>> Objects within the region are 'pinned', which means GC does not move any live objects. GC scans and marks objects in the pinned region as normal, but skips forwarding live objects. Pointers in live objects are updated. Dead objects (unreachable) can be collected and freed.
>>>>>>>>>>
>>>>>>>>>> Archive Regions
>>>>>>>>>>
>>>>>>>>>> The archive types are sub-types of 'pinned'. There are two types of archive region currently, open archive and closed archive. Both can support caching java heap objects via the CDS archive.
>>>>>>>>>>
>>>>>>>>>> An archive region is also an old region by design.
>>>>>>>>>>
>>>>>>>>>> Open Archive (GC-RW) Regions
>>>>>>>>>>
>>>>>>>>>> Open archive region is GC writable. GC scans & marks objects within the region and adjusts (updates) pointers in live objects the same way as a pinned region. Live objects (reachable) are pinned and not forwarded by GC.
>>>>>>>>>> Open archive region does not have 'dead' objects. Unreachable objects are 'dormant' objects. Dormant objects are not collected and freed by GC.
>>>>>>>>>>
>>>>>>>>>> Adjustable Outgoing Pointers
>>>>>>>>>>
>>>>>>>>>> As GC can adjust pointers within the live objects in open archive heap region, objects can have outgoing pointers to another java heap region, including closed archive region, open archive region, pinned (or humongous) region, and normal generational region. When a referenced object is moved by GC, the pointer within the open archive region is updated accordingly.
>>>>>>>>>>
>>>>>>>>>> Closed Archive (GC-RO) Regions
>>>>>>>>>>
>>>>>>>>>> The closed archive region is GC read-only region. GC cannot write into the region. Objects are not scanned and marked by GC. Objects are pinned and not forwarded. Pointers are not updated by GC either. Hence, objects within the archive region cannot have any outgoing pointers to another java heap region. Objects however can still have pointers to other objects within the closed archive regions                                                   (we might allow pointers to open archive regions in the future). That restricts the type of java objects that can be supported by the archive region.
>>>>>>>>>> In JDK 9 we support archive Strings with the archive regions.
>>>>>>>>>>
>>>>>>>>>> The GC-readonly archive region makes java heap memory sharable among different JVM processes. NOTE: synchronization on the objects within the archive heap region can still cause writes to the memory page.
>>>>>>>>>>
>>>>>>>>>> Dormant Objects
>>>>>>>>>>
>>>>>>>>>> Dormant objects are unreachable java objects within the open archive heap region.
>>>>>>>>>> A java object in the open archive heap region is a live object if it can be reached during scanning. Some of the java objects in the region may not be reachable during scanning. Those objects are considered as dormant, but not dead. For example, a constant pool 'resolved_references' array is reachable via the klass root if its container klass (shared) is already loaded at the time during GC scanning. If a shared klass is not yet loaded, the klass root is not scanned and it's constant pool 'resolved_reference' array (A) in the open archive region is not reachable. Then A is a dormant object.
>>>>>>>>>>
>>>>>>>>>> Object State Transition
>>>>>>>>>>
>>>>>>>>>> All java objects are initially dormant objects when open archive heap regions are mapped to the runtime java heap. A dormant object becomes live object when the associated shared class is loaded at runtime. Explicit call to G1SATBCardTableModRefBS::enqueue() needs to be made when a dormant object becomes live. That should be the case for cached objects with strong roots as well, since strong roots are only scanned at the start of GC marking (the initial marking) but not during Remarking/Final marking. If a cached object becomes live during concurrent marking phase, G1 may not find it and mark it live unless a call to G1SATBCardTableModRefBS::enqueue() is made for the object.
>>>>>>>>>>
>>>>>>>>>> Currently, a live object in the open archive heap region cannot become dormant again. This restriction simplifies GC requirement and guarantees all outgoing pointers are updated by GC correctly. Only objects for shared classes from the builtin class loaders (boot, PlatformClassLoaders, and AppClassLoaders) are supported for caching.
>>>>>>>>>>
>>>>>>>>>> Caching Java Objects at Archive Dump Time
>>>>>>>>>>
>>>>>>>>>> The closed archive and open archive regions are allocated near the top of the dump time java heap. Archived java objects are copied into the designated archive heap regions. For example, String objects and the underlying 'value' arrays are copied into the closed archive regions. All references to the archived objects (from shared class metadata, string table, etc) are set to the new heap locations. A hash table is used to keep track of all archived java objects during the copying process to make sure java object is not archived more than once if reached from different roots. It also makes sure references to the same archived object are updated using the same new address location.
>>>>>>>>>>
>>>>>>>>>> Caching Constant Pool resolved_references Array
>>>>>>>>>>
>>>>>>>>>> The 'resolved_references' is an array that holds references of resolved constant pool entries including Strings, mirrors and methodTypes, etc. Each loaded class has one 'resolved_references' array (in ConstantPoolCache). The 'resolved_references' arrays are copied into the open archive regions during dump process. Prior to copying the 'resolved_references' arrays, JVM iterates through constant pool entries and resolves all JVM_CONSTANT_String entries to existing interned Strings for all archived classes. When resolving, JVM only looks up the string table and finds existing interned Strings without inserting new ones. If a string entry cannot be resolved to an existing interned String, the constant pool entry remain as unresolved. That prevents memory waste if a constant pool string entry is never used at runtime.
>>>>>>>>>>
>>>>>>>>>> All String objects referenced by the string table are copied first into the closed archive regions. The string table entry is updated with the new location when each String object is archived. The JVM updates the resolved constant pool string entries with the new object locations when copying the 'resolved_references' arrays to the open archive regions. References to the 'resolved_references' arrays in the ConstantPoolCache are also updated.
>>>>>>>>>> At runtime as part of ConstantPool::restore_unshareable_info() work, call G1SATBCardTableModRefBS::enqueue() to let GC know the 'resolved_references' is becoming live. A                                                   handle is created for the cached object and added to the loader_data's handles.
>>>>>>>>>>
>>>>>>>>>> Runtime Java Heap With Cached Java Objects
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The closed archive regions (the string regions) and open archive regions are mapped to the runtime java heap at the same offsets as the dump time offsets from the runtime java heap base.
>>>>>>>>>>
>>>>>>>>>> Preliminary test execution and status:
>>>>>>>>>>
>>>>>>>>>> JPRT: passed
>>>>>>>>>> Tier2-rt: passed
>>>>>>>>>> Tier2-gc: passed
>>>>>>>>>> Tier2-comp: passed
>>>>>>>>>> Tier3-rt: passed
>>>>>>>>>> Tier3-gc: passed
>>>>>>>>>> Tier3-comp: passed
>>>>>>>>>> Tier4-rt: passed
>>>>>>>>>> Tier4-gc: passed
>>>>>>>>>> Tier4-comp:6 jobs timed out, all other tests passed
>>>>>>>>>> Tier5-rt: one test failed but passed when running locally, all other tests passed
>>>>>>>>>> Tier5-gc: passed
>>>>>>>>>> Tier5-comp: running
>>>>>>>>>> hotspot_gc: two jobs timed out, all other tests passed
>>>>>>>>>> hotspot_gc in CDS mode: two jobs timed out, all other tests passed
>>>>>>>>>> vm.gc: passed
>>>>>>>>>> vm.gc in CDS mode: passed
>>>>>>>>>> Kichensink: passed
>>>>>>>>>> Kichensink in CDS mode: passed
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Jiangli
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

coleen.phillimore

These incremental changes look good to me.
Thanks,
Coleen

On 8/10/17 7:51 PM, Jiangli Zhou wrote:

> Thanks, Ioi!
>
> Jiangli
>
>> On Aug 10, 2017, at 2:15 PM, Ioi Lam <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>> Hi Jiangli,
>>
>>
>> The changes look good to me. Thanks for considering my suggestions.
>>
>>
>> - Ioi
>>
>>
>> On 8/8/17 5:33 PM, Jiangli Zhou wrote:
>>> Here is the incremental webrev that has all the changes incorporated
>>> with suggestions from Coleen, Ioi and Thomas:
>>>
>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.03.inc/ 
>>> <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.03.inc/>
>>>
>>> Updated full webrev:
>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.03/ 
>>> <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.03/>
>>>
>>> Thanks again for Coleen's, Ioi's and Thomas’ review!
>>> Jiangli
>>>
>>>> On Aug 7, 2017, at 7:57 PM, Jiangli Zhou <[hidden email]
>>>> <mailto:[hidden email]>> wrote:
>>>>
>>>> Hi Ioi,
>>>>
>>>> Thanks for getting back to me.
>>>>
>>>>> On Aug 7, 2017, at 5:45 PM, Ioi Lam <[hidden email]
>>>>> <mailto:[hidden email]>> wrote:
>>>>>
>>>>> On 8/4/17 10:19 PM, Jiangli Zhou wrote:
>>>>>
>>>>>> Hi Ioi,
>>>>>>
>>>>>> Thanks for looking again.
>>>>>>
>>>>>>> On Aug 4, 2017, at 2:22 PM, Ioi Lam <[hidden email]
>>>>>>> <mailto:[hidden email]>> wrote:
>>>>>>>
>>>>>>> Hi Jiangli,
>>>>>>>
>>>>>>> The code looks good in general. I just have a few pet peeves for
>>>>>>> readability:
>>>>>>>
>>>>>>>
>>>>>>> (1) stringTable.cpp and metaspaceShared.cpp have the same asserts
>>>>>>>
>>>>>>>  704 assert(UseG1GC, "Only support G1 GC");
>>>>>>>  705 assert(UseCompressedOops && UseCompressedClassPointers,
>>>>>>>  706 "Only support UseCompressedOops and
>>>>>>> UseCompressedClassPointers enabled");
>>>>>>>
>>>>>>> 1615 assert(UseG1GC, "Only support G1 GC");
>>>>>>> 1616 assert(UseCompressedOops && UseCompressedClassPointers,
>>>>>>> 1617 "Only support UseCompressedOops and
>>>>>>> UseCompressedClassPointers enabled");
>>>>>>>
>>>>>>> Maybe it's better to combine them into a single function like
>>>>>>> MetaspaceShared::assert_vm_flags() so they don't get out of sync?
>>>>>>
>>>>>> There is a MetaspaceShared::allow_archive_heap_object(), which
>>>>>> checks for UseG1GC, UseCompressedOops and
>>>>>> UseCompressedClassPointers combined. It does not seem to worth
>>>>>> add another separate API for asserting the required flags. I’ll
>>>>>> use that in the assert.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> (2) FileMapInfo::write_archive_heap_regions()
>>>>>>>
>>>>>>> I still find this code very hard to read, especially due to the
>>>>>>> loop.
>>>>>>>
>>>>>>> First, the comments are not consistent with the code:
>>>>>>>
>>>>>>> 498 assert(arr_len <= max_num_regions, "number of memory regions
>>>>>>> exceeds maximum");
>>>>>>>
>>>>>>> but the comments says: "The rest are consecutive full GC
>>>>>>> regions" which means there's a chance for max_num_regions to be
>>>>>>> more than 2 (which will be the case with Calvin's java-loader
>>>>>>> dumping changes using very small heap size).So the code is
>>>>>>> actually wrong.
>>>>>>
>>>>>> The max_num_regions is the maximum number of region for each
>>>>>> archived heap space (the string space, or open archive space). We
>>>>>> only run into the case where the MemRegion array size is larger
>>>>>> than max_num_regions with Calvin’s pending change. As part of
>>>>>> Calvin’s change, he will change the assert into a check and bail
>>>>>> out if the number of MemRegions are larger than max_num_regions
>>>>>> due to heap fragmentation.
>>>>>>
>>>>>>
>>>>> Your latest patch assumes that arr_len <= 2, but the
>>>>> implementation of
>>>>> G1CollectedHeap::heap()->begin_archive_alloc_range() /
>>>>> G1CollectedHeap::heap()->end_archive_alloc_range() actually allows
>>>>> more than 2 regions to returned. So simply putting an assert there
>>>>> seems risky (unless you have analyzed all possible scenarios to
>>>>> prove that's impossible).
>>>>>
>>>>> Instead of trying to come up with a complicated proof, I think
>>>>> it's much safer to disable the archived string region if the
>>>>> arr_len > 2. Also, if the string region is disabled, you should
>>>>> also disable the open_archive_heap_region
>>>>>
>>>>> I think this is a general issue with the mapped heap regions, and
>>>>> it just happens to be revealed by Calvin's patch. So we should fix
>>>>> it now and not wait for Calvin's patch.
>>>>
>>>> Ok. I’ll change the assert to be a check.
>>>>
>>>>>
>>>>>
>>>>>>>
>>>>>>> The word "region" is used in these parameters, but they don't
>>>>>>> mean the same thing.
>>>>>>>
>>>>>>> GrowableArray<MemRegion> *regions
>>>>>>> int first_region, int max_num_regions,
>>>>>>>
>>>>>>>
>>>>>>> How about regions      -> g1_regions_list
>>>>>>> first_region -> first_region_in_archive
>>>>>>
>>>>>> The GrowableArray above is the MemRegions that GC code gives back
>>>>>> to us. The GC code combines multiple G1 regions. The comments
>>>>>> probably are over-explaining the details, which are hidden in the
>>>>>> GC code. Probably that’s the confusing source. I’ll make the
>>>>>> comment more clear.
>>>>>>
>>>>>> Using g1_regions_list would also be confusing, since
>>>>>> write_archive_heap_regions does not handle G1 regions directly.
>>>>>> It processes the MemRegion array that GC code returns. How about
>>>>>> changing ‘regions’ to ‘mem_regions’ or ‘archive_regions'?
>>>>>>
>>>>> How about heap_regions? These are regions in the active Java heap,
>>>>> which current has not mapped anything from the CDS archive.
>>>>
>>>> Ok.
>>>>
>>>> I’m updating my changes and will send out a consolidated webrev.
>>>>
>>>> Thanks!
>>>> Jiangli
>>>>
>>>>>
>>>>>
>>>>>>>
>>>>>>>
>>>>>>> In the comments, I find the phrase 'the current archive heap
>>>>>>> region' ambiguous. It could be (erroneously) interpreted as "a
>>>>>>> region from the currently mapped archive”
>>>>>>>
>>>>>>> To make it unambiguous, how about changing
>>>>>>>
>>>>>>>
>>>>>>>  464 // Write the current archive heap region, which contains
>>>>>>> one or multiple GC(G1) regions.
>>>>>>>
>>>>>>>
>>>>>>> to
>>>>>>>
>>>>>>>     // Write the given list of G1 memory regions into the
>>>>>>> archive, starting at
>>>>>>>     // first_region_in_archive.
>>>>>>
>>>>>>
>>>>>> Ok. How about the following:
>>>>>>
>>>>>> // Write the given list of java heap memory regions into the
>>>>>> archive, starting at
>>>>>> // first_region_in_archive.
>>>>>>
>>>>> Sounds good.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Also, for the explanation of how the G1 regions are written into
>>>>>>> the archive, how about:
>>>>>>>
>>>>>>>    // The G1 regions in the list are sorted in ascending address
>>>>>>> order. When there are more objects
>>>>>>>    // than the capacity of a single G1 region, the bottom-most
>>>>>>> G1 region may be partially filled, and the
>>>>>>>    // remaining G1 region(s) are consecutively allocated and
>>>>>>> fully filled.
>>>>>>>    //
>>>>>>>    // Thus, the bottom-most G1 region (if not empty) is written
>>>>>>> into first_region_in_archive.
>>>>>>>    // The remaining G1 regions (if exist) are coalesced and
>>>>>>> written as a single block
>>>>>>>    // into (first_region_in_archive + 1)
>>>>>>>
>>>>>>>    // Here's the mapping from (g1 regions) -> (archive regions).
>>>>>>>
>>>>>>>
>>>>>>> All this function needs to do is to decide the values for
>>>>>>>
>>>>>>> r0_start, r0_top
>>>>>>> r1_start, r1_top
>>>>>>>
>>>>>>> I think it would be much better to not use the loop, and not use
>>>>>>> the max_num_regions parameter (it's always 2 anyway).
>>>>>>>
>>>>>>>      *r0_start = *r0_top = NULL;
>>>>>>>      *r1_start = *r1_top = NULL;
>>>>>>>
>>>>>>> if (arr_len >= 1) {
>>>>>>>       *r0_start = regions->at(0).start();
>>>>>>>       *r0_end = *r0_start + regions->at(0).byte_size();
>>>>>>> }
>>>>>>> if (arr_len >= 2) {
>>>>>>>     int last = arr_len - 1;
>>>>>>>          *r1_start = regions->at(1).start();
>>>>>>>          *r1_end = regions->at(last).start() +
>>>>>>> regions->at(last).byte_size();
>>>>>>> }
>>>>>>>
>>>>>>> what do you think?
>>>>>>
>>>>>> We need to write out all archive regions including the empty
>>>>>> ones. The loop using max_num_regions is the easiest way. I’d like
>>>>>> to remove the code that deals with r0_* and r1_ explicitly. Let
>>>>>> me try that.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> (3) metaspace.cpp
>>>>>>>
>>>>>>> 3350         // Map the archived heap regions after compressed
>>>>>>> pointers
>>>>>>> 3351         // because it relies on compressed class pointers
>>>>>>> setting to work
>>>>>>>
>>>>>>> do you mean this?
>>>>>>>
>>>>>>>     // Archived heap regions depend on the parameters of
>>>>>>> compressed class pointers, so
>>>>>>>     // they must be mapped after such parameters have been
>>>>>>> decided in the above call.
>>>>>>
>>>>>> Hmmm, maybe use ‘arguments’ instead of ‘parameters’?
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> (4) I found this name not strictly grammatical. How about this:
>>>>>>>
>>>>>>> allow_archive_heap_object -> is_heap_object_archiving_allowed
>>>>>>
>>>>>> Ok.
>>>>>>
>>>>>>>
>>>>>>> (5) in most of your code, 'archive' is used as a noun, except in
>>>>>>> StringTable::archive_string() where it's used as a verb.
>>>>>>>
>>>>>>> archive_string could also be interpreted erroneously as "return
>>>>>>> a string that's already in the archive". So to be consistent and
>>>>>>> unambiguous, I think it's better to rename it to
>>>>>>> StringTable::create_archived_string()
>>>>>>
>>>>>> Ok.
>>>>>>
>>>>>> Thanks,
>>>>>> Jiangli
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>>>
>>>>>>>
>>>>>>> On 8/3/17 5:15 PM, Jiangli Zhou wrote:
>>>>>>>> Here are the updated webrevs.
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/ 
>>>>>>>> <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.02/>
>>>>>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/
>>>>>>>>
>>>>>>>> Changes in the updated webrevs include:
>>>>>>>>
>>>>>>>>   * Merge with Ioi’s recent shared space auto-sizing change
>>>>>>>>     (8072061)
>>>>>>>>   * Addressed all feedbacks from Ioi and Coleen (Thanks for
>>>>>>>>     detailed review!)
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jiangli
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Aug 1, 2017, at 5:29 PM, Jiangli Zhou
>>>>>>>>> <[hidden email]> wrote:
>>>>>>>>>
>>>>>>>>> Hi Ioi,
>>>>>>>>>
>>>>>>>>> Thank you so much for reviewing this. I’ve addressed all your
>>>>>>>>> feedbacks. Please see details below. I’ll updated the webrev
>>>>>>>>> after addressing Coleen’s comments.
>>>>>>>>>
>>>>>>>>>> On Jul 30, 2017, at 9:07 PM, Ioi Lam <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Jiangli,
>>>>>>>>>>
>>>>>>>>>> Here are my comments. I've not reviewed the GC code and I'll
>>>>>>>>>> leave that to the GC experts :-)
>>>>>>>>>>
>>>>>>>>>> stringTable.cpp: StringTable::archive_string
>>>>>>>>>>
>>>>>>>>>>    add assert for DumpSharedSpaces only
>>>>>>>>>
>>>>>>>>> Ok.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> filemap.cpp
>>>>>>>>>>
>>>>>>>>>> 525 void
>>>>>>>>>> FileMapInfo::write_archive_heap_regions(GrowableArray<MemRegion>
>>>>>>>>>> *regions,
>>>>>>>>>> 526        int first_region, int num_regions) {
>>>>>>>>>>
>>>>>>>>>> When I first read this function, I found it hard to follow,
>>>>>>>>>> especially this part that coalesces the trailing regions:
>>>>>>>>>>
>>>>>>>>>> 537 int len = regions->length();
>>>>>>>>>> 538         if (len > 1) {
>>>>>>>>>> 539 start = (char*)regions->at(1).start();
>>>>>>>>>> 540 size = (char*)regions->at(len - 1).end() - start;
>>>>>>>>>> 541         }
>>>>>>>>>> 542       }
>>>>>>>>>>
>>>>>>>>>> The rest of filemap.cpp always perform identical operations
>>>>>>>>>> on MemRegion arrays, which are either 1 or 2 in size.
>>>>>>>>>> However, this function doesn't follow that pattern; it also
>>>>>>>>>> has a very different notion of "region", and the confusing
>>>>>>>>>> part is regions->size() is not the same as num_regions.
>>>>>>>>>>
>>>>>>>>>> How about we change the API to something like the following?
>>>>>>>>>> Before calling this API, the caller needs to coalesce the
>>>>>>>>>> trailing G1 regions into a single MemRegion.
>>>>>>>>>>
>>>>>>>>>> FileMapInfo::write_archive_heap_regions(MemRegion *regions,
>>>>>>>>>> int first, int num_regions) {
>>>>>>>>>>        if (first == MetaspaceShared::first_string) {
>>>>>>>>>> assert(num_regons <=  MetaspaceShared::max_strings, "...");
>>>>>>>>>>        } else {
>>>>>>>>>> assert(first ==
>>>>>>>>>> MetaspaceShared::first_open_archive_heap_region, "...");
>>>>>>>>>> assert(num_regons <=
>>>>>>>>>> MetaspaceShared::max_open_archive_heap_region, "...");
>>>>>>>>>> }
>>>>>>>>>>        ....
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I’ve reworked the function and simplified the code.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 756   if (!string_data_mapped) {
>>>>>>>>>> 757 StringTable::ignore_shared_strings(true);
>>>>>>>>>> 758 assert(string_ranges == NULL && num_string_ranges == 0,
>>>>>>>>>> "sanity");
>>>>>>>>>> 759   }
>>>>>>>>>> 760
>>>>>>>>>> 761   if (open_archive_heap_data_mapped) {
>>>>>>>>>> 762 MetaspaceShared::set_open_archive_heap_region_mapped();
>>>>>>>>>> 763   } else {
>>>>>>>>>> 764 assert(open_archive_heap_ranges == NULL &&
>>>>>>>>>> num_open_archive_heap_ranges == 0, "sanity");
>>>>>>>>>> 765   }
>>>>>>>>>>
>>>>>>>>>> Maybe the two "if" statements should be more consistent?
>>>>>>>>>> Instead of StringTable::ignore_shared_strings, how
>>>>>>>>>> about StringTable::set_shared_strings_region_mapped()?
>>>>>>>>>
>>>>>>>>> Fixed.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> FileMapInfo::map_heap_data() --
>>>>>>>>>>
>>>>>>>>>> 818     char* addr = (char*)regions[i].start();
>>>>>>>>>> 819     char* base = os::map_memory(_fd, _full_path,
>>>>>>>>>> si->_file_offset,
>>>>>>>>>> 820         addr, regions[i].byte_size(), si->_read_only,
>>>>>>>>>> 821 si->_allow_exec);
>>>>>>>>>>
>>>>>>>>>> What happens when the first region succeeds to map but the
>>>>>>>>>> second region fails to map? Will both regions be unmapped? I
>>>>>>>>>> don't see where you store the return value (base) from
>>>>>>>>>> os::map_memory(). Does it mean the code assumes that (addr ==
>>>>>>>>>> base). If so, we need an assert here.
>>>>>>>>>
>>>>>>>>> If any of the region fails to map, we bail out and call
>>>>>>>>> dealloc_archive_heap_regions(), which handles the deallocation
>>>>>>>>> of any regions specified. If second region fails to map, all
>>>>>>>>> memory ranges specified by ‘regions’ array are deallocated. We
>>>>>>>>> don’t unmap the memory here since it is part of the java heap.
>>>>>>>>> Unmapping of heap memory are handled by GC code. The ‘if’
>>>>>>>>> check below makes sure base == addr.
>>>>>>>>>
>>>>>>>>>     if (base == NULL || base != addr) {
>>>>>>>>>       // dealloc the regions from java heap
>>>>>>>>> dealloc_archive_heap_regions(regions, region_num);
>>>>>>>>>   if (log_is_enabled(Info, cds)) {
>>>>>>>>> log_info(cds)("UseSharedSpaces: Unable to map at required
>>>>>>>>> address in java heap.");
>>>>>>>>>       }
>>>>>>>>>   return false;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> constantPool.cpp
>>>>>>>>>>
>>>>>>>>>>     Handle refs_handle;
>>>>>>>>>>     ...
>>>>>>>>>> refs_handle = Handle(THREAD, (oop)archived);
>>>>>>>>>>
>>>>>>>>>> This will first create a NULL handle, then construct a
>>>>>>>>>> temporary handle, and then assign the temp handle back to the
>>>>>>>>>> null handle. This means two handles will be pushed onto
>>>>>>>>>> THREAD->metadata_handles()
>>>>>>>>>>
>>>>>>>>>> I think it's more efficient if you merge these into a single
>>>>>>>>>> statement
>>>>>>>>>>
>>>>>>>>>>     Handle refs_handle(THREAD, (oop)archived);
>>>>>>>>>
>>>>>>>>> Fixed.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Is this experimental code? Maybe it should be removed?
>>>>>>>>>>
>>>>>>>>>> 664     if (tag_at(index).is_unresolved_klass()) {
>>>>>>>>>> 665 #if 0
>>>>>>>>>> 666 CPSlot entry = cp->slot_at(index);
>>>>>>>>>> 667 Symbol* name = entry.get_symbol();
>>>>>>>>>> 668 Klass* k = SystemDictionary::find(name, NULL, NULL, THREAD);
>>>>>>>>>> 669       if (k != NULL) {
>>>>>>>>>> 670 klass_at_put(index, k);
>>>>>>>>>> 671       }
>>>>>>>>>> 672 #endif
>>>>>>>>>> 673     } else
>>>>>>>>>
>>>>>>>>> Removed.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> cpCache.hpp:
>>>>>>>>>>
>>>>>>>>>>     u8 _archived_references
>>>>>>>>>>
>>>>>>>>>> shouldn't this be declared as an narrowOop to avoid the type
>>>>>>>>>> casts when it's used?
>>>>>>>>>
>>>>>>>>> Ok.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> cpCache.cpp:
>>>>>>>>>>
>>>>>>>>>>    add assert so that one of these is used only at dump time
>>>>>>>>>> and the other only at run time?
>>>>>>>>>>
>>>>>>>>>> 610 oop ConstantPoolCache::archived_references() {
>>>>>>>>>> 611   return
>>>>>>>>>> oopDesc::decode_heap_oop((narrowOop)_archived_references);
>>>>>>>>>> 612 }
>>>>>>>>>> 613
>>>>>>>>>> 614 void ConstantPoolCache::set_archived_references(oop o) {
>>>>>>>>>> 615 _archived_references = (u8)oopDesc::encode_heap_oop(o);
>>>>>>>>>> 616 }
>>>>>>>>>
>>>>>>>>> Ok.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Jiangli
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> - Ioi
>>>>>>>>>>
>>>>>>>>>> On 7/27/17 1:37 PM, Jiangli Zhou wrote:
>>>>>>>>>>> Sorry, the mail didn’t handle the rich text well. I fixed
>>>>>>>>>>> the format below.
>>>>>>>>>>>
>>>>>>>>>>> Please help review the changes for JDK-8179302 (Pre-resolve
>>>>>>>>>>> constant pool string entries and cache resolved_reference
>>>>>>>>>>> arrays in CDS archive). Currently, the CDS archive can
>>>>>>>>>>> contain cached class metadata, interned java.lang.String
>>>>>>>>>>> objects. This RFE adds the constant pool
>>>>>>>>>>> ‘resolved_references’ arrays (hotspot specific) to the
>>>>>>>>>>> archive for startup/runtime performance enhancement.
>>>>>>>>>>> The ‘resolved_references' arrays are used to hold references
>>>>>>>>>>> of resolved constant pool entries including Strings,
>>>>>>>>>>> mirrors, etc. With the 'resolved_references’ being cached,
>>>>>>>>>>> string constants in shared classes can now be resolved to
>>>>>>>>>>> existing interned java.lang.Strings at CDS dump time. G1 and
>>>>>>>>>>> 64-bit platforms are required.
>>>>>>>>>>>
>>>>>>>>>>> The GC changes in the RFE were discussed and guided by
>>>>>>>>>>> Thomas Schatzl and GC team. Part of the changes were
>>>>>>>>>>> contributed by Thomas himself.
>>>>>>>>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8179302
>>>>>>>>>>> hotspot:
>>>>>>>>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/
>>>>>>>>>>> whitebox:
>>>>>>>>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/
>>>>>>>>>>>
>>>>>>>>>>> Please see below for details of supporting cached
>>>>>>>>>>> ‘resolved_references’ and pre-resolving string constants.
>>>>>>>>>>>
>>>>>>>>>>> Types of Pinned G1 Heap Regions
>>>>>>>>>>>
>>>>>>>>>>> The pinned region type is a super type of all archive region
>>>>>>>>>>> types, which include the open archive type and the closed
>>>>>>>>>>> archive type.
>>>>>>>>>>>
>>>>>>>>>>> 00100 0 [ 8] Pinned Mask
>>>>>>>>>>> 01000 0 [16] Old Mask
>>>>>>>>>>> 10000 0 [32] Archive Mask
>>>>>>>>>>> 11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
>>>>>>>>>>> 11100 1 [57] Closed Archive: ArchiveMask | PinnedMask |
>>>>>>>>>>> OldMask + 1
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Pinned Regions
>>>>>>>>>>>
>>>>>>>>>>> Objects within the region are 'pinned', which means GC does
>>>>>>>>>>> not move any live objects. GC scans and marks objects in the
>>>>>>>>>>> pinned region as normal, but skips forwarding live objects.
>>>>>>>>>>> Pointers in live objects are updated. Dead objects
>>>>>>>>>>> (unreachable) can be collected and freed.
>>>>>>>>>>>
>>>>>>>>>>> Archive Regions
>>>>>>>>>>>
>>>>>>>>>>> The archive types are sub-types of 'pinned'. There are two
>>>>>>>>>>> types of archive region currently, open archive and closed
>>>>>>>>>>> archive. Both can support caching java heap objects via the
>>>>>>>>>>> CDS archive.
>>>>>>>>>>>
>>>>>>>>>>> An archive region is also an old region by design.
>>>>>>>>>>>
>>>>>>>>>>> Open Archive (GC-RW) Regions
>>>>>>>>>>>
>>>>>>>>>>> Open archive region is GC writable. GC scans & marks objects
>>>>>>>>>>> within the region and adjusts (updates) pointers in live
>>>>>>>>>>> objects the same way as a pinned region. Live objects
>>>>>>>>>>> (reachable) are pinned and not forwarded by GC.
>>>>>>>>>>> Open archive region does not have 'dead' objects.
>>>>>>>>>>> Unreachable objects are 'dormant' objects. Dormant objects
>>>>>>>>>>> are not collected and freed by GC.
>>>>>>>>>>>
>>>>>>>>>>> Adjustable Outgoing Pointers
>>>>>>>>>>>
>>>>>>>>>>> As GC can adjust pointers within the live objects in open
>>>>>>>>>>> archive heap region, objects can have outgoing pointers to
>>>>>>>>>>> another java heap region, including closed archive region,
>>>>>>>>>>> open archive region, pinned (or humongous) region, and
>>>>>>>>>>> normal generational region. When a referenced object is
>>>>>>>>>>> moved by GC, the pointer within the open archive region is
>>>>>>>>>>> updated accordingly.
>>>>>>>>>>>
>>>>>>>>>>> Closed Archive (GC-RO) Regions
>>>>>>>>>>>
>>>>>>>>>>> The closed archive region is GC read-only region. GC cannot
>>>>>>>>>>> write into the region. Objects are not scanned and marked by
>>>>>>>>>>> GC. Objects are pinned and not forwarded. Pointers are not
>>>>>>>>>>> updated by GC either. Hence, objects within the archive
>>>>>>>>>>> region cannot have any outgoing pointers to another java
>>>>>>>>>>> heap region. Objects however can still have pointers to
>>>>>>>>>>> other objects within the closed archive regions (we might
>>>>>>>>>>> allow pointers to open archive regions in the future). That
>>>>>>>>>>> restricts the type of java objects that can be supported by
>>>>>>>>>>> the archive region.
>>>>>>>>>>> In JDK 9 we support archive Strings with the archive regions.
>>>>>>>>>>>
>>>>>>>>>>> The GC-readonly archive region makes java heap memory
>>>>>>>>>>> sharable among different JVM processes. NOTE:
>>>>>>>>>>> synchronization on the objects within the archive heap
>>>>>>>>>>> region can still cause writes to the memory page.
>>>>>>>>>>>
>>>>>>>>>>> Dormant Objects
>>>>>>>>>>>
>>>>>>>>>>> Dormant objects are unreachable java objects within the open
>>>>>>>>>>> archive heap region.
>>>>>>>>>>> A java object in the open archive heap region is a live
>>>>>>>>>>> object if it can be reached during scanning. Some of the
>>>>>>>>>>> java objects in the region may not be reachable during
>>>>>>>>>>> scanning. Those objects are considered as dormant, but not
>>>>>>>>>>> dead. For example, a constant pool 'resolved_references'
>>>>>>>>>>> array is reachable via the klass root if its container klass
>>>>>>>>>>> (shared) is already loaded at the time during GC scanning.
>>>>>>>>>>> If a shared klass is not yet loaded, the klass root is not
>>>>>>>>>>> scanned and it's constant pool 'resolved_reference' array
>>>>>>>>>>> (A) in the open archive region is not reachable. Then A is a
>>>>>>>>>>> dormant object.
>>>>>>>>>>>
>>>>>>>>>>> Object State Transition
>>>>>>>>>>>
>>>>>>>>>>> All java objects are initially dormant objects when open
>>>>>>>>>>> archive heap regions are mapped to the runtime java heap. A
>>>>>>>>>>> dormant object becomes live object when the associated
>>>>>>>>>>> shared class is loaded at runtime. Explicit call
>>>>>>>>>>> to G1SATBCardTableModRefBS::enqueue() needs to be made when
>>>>>>>>>>> a dormant object becomes live. That should be the case
>>>>>>>>>>> for cached objects with strong roots as well, since strong
>>>>>>>>>>> roots are only scanned at the start of GC marking (the
>>>>>>>>>>> initial marking) but not during Remarking/Final marking. If
>>>>>>>>>>> a cached object becomes live during concurrent marking
>>>>>>>>>>> phase, G1 may not find it and mark it live unless a call to
>>>>>>>>>>> G1SATBCardTableModRefBS::enqueue() is made for the object.
>>>>>>>>>>>
>>>>>>>>>>> Currently, a live object in the open archive heap region
>>>>>>>>>>> cannot become dormant again. This restriction simplifies GC
>>>>>>>>>>> requirement and guarantees all outgoing pointers are updated
>>>>>>>>>>> by GC correctly. Only objects for shared classes from the
>>>>>>>>>>> builtin class loaders (boot, PlatformClassLoaders, and
>>>>>>>>>>> AppClassLoaders) are supported for caching.
>>>>>>>>>>>
>>>>>>>>>>> Caching Java Objects at Archive Dump Time
>>>>>>>>>>>
>>>>>>>>>>> The closed archive and open archive regions are allocated
>>>>>>>>>>> near the top of the dump time java heap. Archived java
>>>>>>>>>>> objects are copied into the designated archive heap regions.
>>>>>>>>>>> For example, String objects and the underlying 'value'
>>>>>>>>>>> arrays are copied into the closed archive regions. All
>>>>>>>>>>> references to the archived objects (from shared class
>>>>>>>>>>> metadata, string table, etc) are set to the new heap
>>>>>>>>>>> locations. A hash table is used to keep track of all
>>>>>>>>>>> archived java objects during the copying process to make
>>>>>>>>>>> sure java object is not archived more than once if reached
>>>>>>>>>>> from different roots. It also makes sure references to the
>>>>>>>>>>> same archived object are updated using the same new address
>>>>>>>>>>> location.
>>>>>>>>>>>
>>>>>>>>>>> Caching Constant Pool resolved_references Array
>>>>>>>>>>>
>>>>>>>>>>> The 'resolved_references' is an array that holds references
>>>>>>>>>>> of resolved constant pool entries including Strings, mirrors
>>>>>>>>>>> and methodTypes, etc. Each loaded class has one
>>>>>>>>>>> 'resolved_references' array (in ConstantPoolCache). The
>>>>>>>>>>> 'resolved_references' arrays are copied into the open
>>>>>>>>>>> archive regions during dump process. Prior to copying the
>>>>>>>>>>> 'resolved_references' arrays, JVM iterates through constant
>>>>>>>>>>> pool entries and resolves all JVM_CONSTANT_String entries to
>>>>>>>>>>> existing interned Strings for all archived classes. When
>>>>>>>>>>> resolving, JVM only looks up the string table and finds
>>>>>>>>>>> existing interned Strings without inserting new ones. If
>>>>>>>>>>> a string entry cannot be resolved to an existing interned
>>>>>>>>>>> String, the constant pool entry remain as unresolved. That
>>>>>>>>>>> prevents memory waste if a constant pool string entry is
>>>>>>>>>>> never used at runtime.
>>>>>>>>>>>
>>>>>>>>>>> All String objects referenced by the string table are copied
>>>>>>>>>>> first into the closed archive regions. The string table
>>>>>>>>>>> entry is updated with the new location when each String
>>>>>>>>>>> object is archived. The JVM updates the resolved constant
>>>>>>>>>>> pool string entries with the new object locations when
>>>>>>>>>>> copying the 'resolved_references' arrays to the open archive
>>>>>>>>>>> regions. References to the 'resolved_references' arrays in
>>>>>>>>>>> the ConstantPoolCache are also updated.
>>>>>>>>>>> At runtime as part of
>>>>>>>>>>> ConstantPool::restore_unshareable_info() work, call
>>>>>>>>>>> G1SATBCardTableModRefBS::enqueue() to let GC know the
>>>>>>>>>>> 'resolved_references' is becoming live. A handle is created
>>>>>>>>>>> for the cached object and added to the loader_data's handles.
>>>>>>>>>>>
>>>>>>>>>>> Runtime Java Heap With Cached Java Objects
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The closed archive regions (the string regions) and open
>>>>>>>>>>> archive regions are mapped to the runtime java heap at the
>>>>>>>>>>> same offsets as the dump time offsets from the runtime java
>>>>>>>>>>> heap base.
>>>>>>>>>>>
>>>>>>>>>>> Preliminary test execution and status:
>>>>>>>>>>>
>>>>>>>>>>> JPRT: passed
>>>>>>>>>>> Tier2-rt: passed
>>>>>>>>>>> Tier2-gc: passed
>>>>>>>>>>> Tier2-comp: passed
>>>>>>>>>>> Tier3-rt: passed
>>>>>>>>>>> Tier3-gc: passed
>>>>>>>>>>> Tier3-comp: passed
>>>>>>>>>>> Tier4-rt: passed
>>>>>>>>>>> Tier4-gc: passed
>>>>>>>>>>> Tier4-comp:6 jobs timed out, all other tests passed
>>>>>>>>>>> Tier5-rt: one test failed but passed when running locally,
>>>>>>>>>>> all other tests passed
>>>>>>>>>>> Tier5-gc: passed
>>>>>>>>>>> Tier5-comp: running
>>>>>>>>>>> hotspot_gc: two jobs timed out, all other tests passed
>>>>>>>>>>> hotspot_gc in CDS mode: two jobs timed out, all other tests
>>>>>>>>>>> passed
>>>>>>>>>>> vm.gc: passed
>>>>>>>>>>> vm.gc in CDS mode: passed
>>>>>>>>>>> Kichensink: passed
>>>>>>>>>>> Kichensink in CDS mode: passed
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jiangli
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

12
Loading...