RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields

classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields

Lin Zang-2
4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields

-------------

Commit messages:
 - remove trailing spaces
 - 4890732: support optional GZIP fields in GZIPOutputStream

Changes: https://git.openjdk.java.net/jdk/pull/3072/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3072&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-4890732
  Stats: 184 lines in 1 file changed: 184 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3072.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3072/head:pull/3072

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields

Lin Zang-2
On Thu, 18 Mar 2021 10:46:04 GMT, Lin Zang <[hidden email]> wrote:

> 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields

Dear All,
This PR introduce new constructor of GZIPOutputStream to support adding extra field info in gzip file header, as decribed in RFC 1952 gzip spec (https://tools.ietf.org/html/rfc1952).
BTW, does a CSR required for this PR?  

Thanks!
Lin

-------------

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields

Lance Andersen-3
In reply to this post by Lin Zang-2
Hi Lin,

Thank you for your contribution.


I have no looked at this in any detail.  A few general comments:


  *   This will require a CSR
  *   Please update your PR to include test coverage demonstrating that the data can be written and then read back


Best,
Lance

On Mar 18, 2021, at 6:57 AM, Lin Zang <[hidden email]<mailto:[hidden email]>> wrote:

4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields

-------------

Commit messages:
- remove trailing spaces
- 4890732: support optional GZIP fields in GZIPOutputStream

Changes: https://git.openjdk.java.net/jdk/pull/3072/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3072&range=00
 Issue: https://bugs.openjdk.java.net/browse/JDK-4890732
 Stats: 184 lines in 1 file changed: 184 ins; 0 del; 0 mod
 Patch: https://git.openjdk.java.net/jdk/pull/3072.diff
 Fetch: git fetch https://git.openjdk.java.net/jdk pull/3072/head:pull/3072

PR: https://git.openjdk.java.net/jdk/pull/3072

[cid:E1C4E2F0-ECD0-4C9D-ADB4-B16CA7BCB7FC@home]



Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
[hidden email]<mailto:[hidden email]>



Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields

Lin Zang-2
In reply to this post by Lin Zang-2
On Thu, 18 Mar 2021 10:50:05 GMT, Lin Zang <[hidden email]> wrote:

>> 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields
>
> Dear All,
> This PR introduce new constructor of GZIPOutputStream to support adding extra field info in gzip file header, as decribed in RFC 1952 gzip spec (https://tools.ietf.org/html/rfc1952).
> BTW, does a CSR required for this PR?  
>
> Thanks!
> Lin

The CSR has been submitted at https://bugs.openjdk.java.net/browse/JDK-8263793

-------------

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields [v2]

Lin Zang-2
In reply to this post by Lin Zang-2
> 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields

Lin Zang has updated the pull request incrementally with one additional commit since the last revision:

  add test case

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3072/files
  - new: https://git.openjdk.java.net/jdk/pull/3072/files/8cdc1fa6..533199ef

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3072&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3072&range=00-01

  Stats: 62 lines in 1 file changed: 58 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3072.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3072/head:pull/3072

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields [v3]

Lin Zang-2
In reply to this post by Lin Zang-2
> 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields

Lin Zang has updated the pull request incrementally with two additional commits since the last revision:

 - update copyright
 - reuse arguments constructor for non-argument one.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3072/files
  - new: https://git.openjdk.java.net/jdk/pull/3072/files/533199ef..9b7dabfe

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3072&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3072&range=01-02

  Stats: 14 lines in 1 file changed: 0 ins; 11 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3072.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3072/head:pull/3072

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields [v3]

Lance Andersen
On Fri, 19 Mar 2021 08:19:58 GMT, Lin Zang <[hidden email]> wrote:

>> 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields
>
> Lin Zang has updated the pull request incrementally with two additional commits since the last revision:
>
>  - update copyright
>  - reuse arguments constructor for non-argument one.

Hi Lin,
 
Again, thank you for taking on this 19+ year feature request.

I have not done a deep dive on the CSR, but wanted to get a few comments back to you on a quick scan of the last PR update.

Personally, I would like to see more testing for a change such as this given the age of the code:

- Please do not modify the existing test, you can either create  a new test(s) or add tests to the existing test class
- We should capture Gzip files with these headers set from other tools and store the Gzip in an array within the test which can then be written to disk so the tests can validate interoperability.  Please see some of the other Zip tests for an example
- We should have tests that include some , but not all of the additional fields as I believe they are all optional according to the RFC.
- Please include some negative tests


I have also include some additional comments within the code

src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 193:

> 191:      * @param generateHeaderCRC
> 192:      *        if {@code true} the header will include the CRC16 of the header.
> 193:      * @param extraFieldBytes the byte array of extra filed,

filed -> field

src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 195:

> 193:      * @param extraFieldBytes the byte array of extra filed,
> 194:      *                        the generated header would calculate the byte[] size
> 195:      *                        and fill it before the byte[] in header.

This is not clear what you are trying to articulate here

src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 269:

> 267:      * @param generateHeaderCRC
> 268:      *        if {@code true} the header will include the CRC16 of the header.
> 269:      * @param extraFieldBytes the byte array of extra filed,

filed -> field

src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 320:

> 318:              *     +---+---+=================================+
> 319:              */
> 320:             int xlen = extraFieldBytes.length;

On a quick look at the RFC,  I noticed the following:

------------------------
2.3.1.1. Extra field

         If the FLG.FEXTRA bit is set, an "extra field" is present in
         the header, with total length XLEN bytes.  It consists of a
         series of subfields, each of the form:

            +---+---+---+---+==================================+
            |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
            +---+---+---+---+==================================+

         SI1 and SI2 provide a subfield ID, typically two ASCII letters
         with some mnemonic value.  Jean-Loup Gailly
         <[hidden email]> is maintaining a registry of subfield
         IDs; please send him any subfield ID you wish to use.  Subfield
         IDs with SI2 = 0 are reserved for future use.  The following
         IDs are currently defined:


---------------

This does not seem to be accounted for or is it?

src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 323:

> 321:             if (xlen > 0xffff) {
> 322:                 throw new ZipException("extra field size out of range");
> 323:             }

Where is the ZipException documented that is being thrown?

src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 342:

> 340:              *    +=========================================+
> 341:              */
> 342:             out.write(filename);

The RFC indicates:

             If FNAME is set, an original file name is present,
            terminated by a zero byte.  The name must consist of ISO
            8859-1 (LATIN-1) characters; on operating systems using
            EBCDIC or any other character set for file names, the name
            must be translated to the ISO LATIN-1 character set.  This
            is the original name of the file being compressed, with any
            directory components removed, and, if the file being
            compressed is on a file system with case insensitive names,
            forced to lower case. There is no original file name if the
            data was compressed from a source other than a named file;
            for example, if the source was stdin on a Unix system, there
            is no file name.


It looks like there is no validation being done so I am not sure what the expectation is.

src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 357:

> 355:              */
> 356:             out.write(fileComment);
> 357:             out.write(0);

The RFC states:

            If FCOMMENT is set, a zero-terminated file comment is
            present.  This comment is not interpreted; it is only
            intended for human consumption.  The comment must consist of
            ISO 8859-1 (LATIN-1) characters.  Line breaks should be
            denoted by a single line feed character (10 decimal).


So should the characters be validates as well as line breaks?

test/jdk/java/util/zip/GZIP/GZIPOutputStreamHeaderTest.java line 3:

> 1: /*
> 2:  * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
> 3:  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.

The copyright would be 2020, 2021,

-------------

Changes requested by lancea (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields [v3]

Lin Zang-2
On Mon, 22 Mar 2021 21:37:27 GMT, Lance Andersen <[hidden email]> wrote:

>> Lin Zang has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - update copyright
>>  - reuse arguments constructor for non-argument one.
>
> Hi Lin,
>  
> Again, thank you for taking on this 19+ year feature request.
>
> I have not done a deep dive on the CSR, but wanted to get a few comments back to you on a quick scan of the last PR update.
>
> Personally, I would like to see more testing for a change such as this given the age of the code:
>
> - Please do not modify the existing test, you can either create  a new test(s) or add tests to the existing test class
> - We should capture Gzip files with these headers set from other tools and store the Gzip in an array within the test which can then be written to disk so the tests can validate interoperability.  Please see some of the other Zip tests for an example
> - We should have tests that include some , but not all of the additional fields as I believe they are all optional according to the RFC.
> - Please include some negative tests
>
>
> I have also include some additional comments within the code

Hi Lance,
Thanks a lot for your review. I will update the PR ASAP.
May I ask your help to also review the CSR? Thanks!

BRs,
Lin

-------------

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields [v3]

Lance Andersen
On Wed, 24 Mar 2021 06:51:16 GMT, Lin Zang <[hidden email]> wrote:

>> Hi Lin,
>>  
>> Again, thank you for taking on this 19+ year feature request.
>>
>> I have not done a deep dive on the CSR, but wanted to get a few comments back to you on a quick scan of the last PR update.
>>
>> Personally, I would like to see more testing for a change such as this given the age of the code:
>>
>> - Please do not modify the existing test, you can either create  a new test(s) or add tests to the existing test class
>> - We should capture Gzip files with these headers set from other tools and store the Gzip in an array within the test which can then be written to disk so the tests can validate interoperability.  Please see some of the other Zip tests for an example
>> - We should have tests that include some , but not all of the additional fields as I believe they are all optional according to the RFC.
>> - Please include some negative tests
>>
>>
>> I have also include some additional comments within the code
>
> Hi Lance,
> Thanks a lot for your review. I will update the PR ASAP.
> May I ask your help to also review the CSR? Thanks!
>
> BRs,
> Lin

Hi Lin,

On Mar 24, 2021, at 2:51 AM, Lin Zang ***@***.******@***.***>> wrote:



Hi Lance,
Thanks a lot for your review. I will update the PR ASAP.
May I ask your help to also review the CSR?

I believe we need to flush out some of the issues I raised that were not test related as they will result in updates to the javadoc which will require an update to the CSR.



Thanks!

BRs,
Lin


You are receiving this because you commented.
Reply to this email directly, view it on GitHub<https://github.com/openjdk/jdk/pull/3072#issuecomment-805549600>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABQPFO5R4INTAUFZ3QS6WL3TFGDXHANCNFSM4ZMLU6SA>.

***@***.***



Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
***@***.******@***.***>

-------------

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields [v3]

Lin Zang-2
On Wed, 24 Mar 2021 10:25:44 GMT, Lance Andersen <[hidden email]> wrote:

>> Hi Lance,
>> Thanks a lot for your review. I will update the PR ASAP.
>> May I ask your help to also review the CSR? Thanks!
>>
>> BRs,
>> Lin
>
> Hi Lin,
>
> On Mar 24, 2021, at 2:51 AM, Lin Zang ***@***.******@***.***>> wrote:
>
>
>
> Hi Lance,
> Thanks a lot for your review. I will update the PR ASAP.
> May I ask your help to also review the CSR?
>
> I believe we need to flush out some of the issues I raised that were not test related as they will result in updates to the javadoc which will require an update to the CSR.
>
>
>
> Thanks!
>
> BRs,
> Lin
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub<https://github.com/openjdk/jdk/pull/3072#issuecomment-805549600>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABQPFO5R4INTAUFZ3QS6WL3TFGDXHANCNFSM4ZMLU6SA>.
>
> ***@***.***
>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> ***@***.******@***.***>

Dear Lance,

Sorry that I reply so late, I got stuck at some work recently.

> * We should have tests that include some , but not all of the additional fields as I believe they are all optional according to the RFC.

I see Joe's comments in the CSR about the encode of the byte array of String-alike field such file name, I checked that the RFC has description it is "ISO8859-1". So do you think it is ok to change the argument type to String and add the encoding decription in the documented comments?

And Joe also suggested to make all optional header field as a class (I propose to name it "gzipHeaderMetaRecord" ). And then the constructor could accept it and process related flag and fields.  Moreover it seems this class cloud also be used in GzipInputStream,Do you think it is ok to also change it here?  Thanks!


BRs,
Lin

-------------

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields [v3]

Lin Zang-2
In reply to this post by Lance Andersen
On Mon, 22 Mar 2021 20:06:33 GMT, Lance Andersen <[hidden email]> wrote:

>> Lin Zang has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - update copyright
>>  - reuse arguments constructor for non-argument one.
>
> src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 320:
>
>> 318:              *     +---+---+=================================+
>> 319:              */
>> 320:             int xlen = extraFieldBytes.length;
>
> On a quick look at the RFC,  I noticed the following:
>
> ------------------------
> 2.3.1.1. Extra field
>
>          If the FLG.FEXTRA bit is set, an "extra field" is present in
>          the header, with total length XLEN bytes.  It consists of a
>          series of subfields, each of the form:
>
>             +---+---+---+---+==================================+
>             |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
>             +---+---+---+---+==================================+
>
>          SI1 and SI2 provide a subfield ID, typically two ASCII letters
>          with some mnemonic value.  Jean-Loup Gailly
>          <[hidden email]> is maintaining a registry of subfield
>          IDs; please send him any subfield ID you wish to use.  Subfield
>          IDs with SI2 = 0 are reserved for future use.  The following
>          IDs are currently defined:
>
>
> ---------------
>
> This does not seem to be accounted for or is it?

Yes, This should be considered, I will add logic in the code.
However, I did some search and found that there is only one subfield defined, which is mentioned in RFC:
SI1         SI2         Data
----------  ----------  ----
0x41 ('A')  0x70 ('P')  Apollo file type information

or SI2 = 0  is reserved.

It seems this subfield definition is no longe maintained. (FYI, https://stackoverflow.com/questions/65188890/what-gzip-extra-field-subfields-exist)

So is it ok if make it only accept these two cases for FLG.FEXTRA and reject others?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields [v3]

Lin Zang-2
In reply to this post by Lance Andersen
On Mon, 22 Mar 2021 20:15:38 GMT, Lance Andersen <[hidden email]> wrote:

> on operating systems using
>         EBCDIC or any other character set for file names, the name
>         must be translated to the ISO LATIN-1 character set.  This
>         is the original name of the file being compressed, with any
>         directory components removed, and, if the file being
>         compressed is on a file system with case insensitive names,
>         forced to lower case. There is no original file name if the
>         data was compressed from a source other than a named file;
>         for example, if the source was stdin on a Unix system, there
>         is no file name.

IMHO, this looks much like the rule that user should take care of,  I think here we only need to accept a String and decode it with "ISO-8859-1" to byte array then write to the header.  Is this reasonable?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields [v3]

Lin Zang-2
In reply to this post by Lance Andersen
On Mon, 22 Mar 2021 20:18:37 GMT, Lance Andersen <[hidden email]> wrote:

>> Lin Zang has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - update copyright
>>  - reuse arguments constructor for non-argument one.
>
> src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 357:
>
>> 355:              */
>> 356:             out.write(fileComment);
>> 357:             out.write(0);
>
> The RFC states:
>
>             If FCOMMENT is set, a zero-terminated file comment is
>             present.  This comment is not interpreted; it is only
>             intended for human consumption.  The comment must consist of
>             ISO 8859-1 (LATIN-1) characters.  Line breaks should be
>             denoted by a single line feed character (10 decimal).
>
>
> So should the characters be validates as well as line breaks?

Same as file name , I think it is the users responsibility to pass the right encoded String as an argument. if constructor is designed to accept a String here, I think it can decode to byte array with ISO 8859-1, and then write to the header.

Moreover,I can't figure out a way to testing the encoding information of the String, do you have any clue about validate it?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields [v3]

Alan Bateman-2
In reply to this post by Lin Zang-2
On Fri, 26 Mar 2021 02:14:54 GMT, Lin Zang <[hidden email]> wrote:

>> Hi Lin,
>>
>> On Mar 24, 2021, at 2:51 AM, Lin Zang ***@***.******@***.***>> wrote:
>>
>>
>>
>> Hi Lance,
>> Thanks a lot for your review. I will update the PR ASAP.
>> May I ask your help to also review the CSR?
>>
>> I believe we need to flush out some of the issues I raised that were not test related as they will result in updates to the javadoc which will require an update to the CSR.
>>
>>
>>
>> Thanks!
>>
>> BRs,
>> Lin
>>
>> —
>> You are receiving this because you commented.
>> Reply to this email directly, view it on GitHub<https://github.com/openjdk/jdk/pull/3072#issuecomment-805549600>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABQPFO5R4INTAUFZ3QS6WL3TFGDXHANCNFSM4ZMLU6SA>.
>>
>> ***@***.***
>>
>>
>>
>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>> Oracle Java Engineering
>> 1 Network Drive
>> Burlington, MA 01803
>> ***@***.******@***.***>
>
> Dear Lance,
>
> Sorry that I reply so late, I got stuck at some work recently.
>
>> * We should have tests that include some , but not all of the additional fields as I believe they are all optional according to the RFC.
>
> I see Joe's comments in the CSR about the encode of the byte array of String-alike field such file name, I checked that the RFC has description it is "ISO8859-1". So do you think it is ok to change the argument type to String and add the encoding decription in the documented comments?
>
> And Joe also suggested to make all optional header field as a class (I propose to name it "gzipHeaderMetaRecord" ). And then the constructor could accept it and process related flag and fields.  Moreover it seems this class cloud also be used in GzipInputStream,Do you think it is ok to also change it here?  Thanks!
>
>
> BRs,
> Lin

GZIPInputStream/GZIPOutputStream are JDK 1.1 era classes that aren't well specified and I think we'll have to do some improvements as part of extending the support. For example, the GZIPOutputStream constructors (existing and proposed) do not specify that they write the member header.

As regards the new constructors then I think new parameters will need to be clearly specified and/or linked to their description in the RFC. The types of some of the parameters may need to be re-examined, maybe the file name and comment should be provided as Strings (that will help with the discussion about the encoding to 8859_1). I think the javadoc will need to make it clear on what flags/values are allowed and the behavior when other values are used. It might be that the class will need enums and other classes to provide a better API.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields [v3]

Lance Andersen-3


On Mar 27, 2021, at 12:44 PM, Alan Bateman <[hidden email]<mailto:[hidden email]>> wrote:


Dear Lance,

Sorry that I reply so late, I got stuck at some work recently.

* We should have tests that include some , but not all of the additional fields as I believe they are all optional according to the RFC.

I see Joe's comments in the CSR about the encode of the byte array of String-alike field such file name, I checked that the RFC has description it is "ISO8859-1". So do you think it is ok to change the argument type to String and add the encoding decription in the documented comments?

And Joe also suggested to make all optional header field as a class (I propose to name it "gzipHeaderMetaRecord" ). And then the constructor could accept it and process related flag and fields.  Moreover it seems this class cloud also be used in GzipInputStream,Do you think it is ok to also change it here?  Thanks!


BRs,
Lin

GZIPInputStream/GZIPOutputStream are JDK 1.1 era classes that aren't well specified and I think we'll have to do some improvements as part of extending the support. For example, the GZIPOutputStream constructors (existing and proposed) do not specify that they write the member header.

That is a reasonable suggestion to indicate what is and is not done.

As regards the new constructors then I think new parameters will need to be clearly specified and/or linked to their description in the RFC. The types of some of the parameters may need to be re-examined, maybe the file name and comment should be provided as Strings (that will help with the discussion about the encoding to 8859_1).

Agree and that will also align it with ZipEntry(String), Zipentry::setComment

I think the javadoc will need to make it clear on what flags/values are allowed and the behavior when other values are used.

The Gzip spec is kind of vague in this area for example with the SI1 and SI2 for the Extra Field only one value set is specified but I can imagine additional values being somewhere.

It might be that the class will need enums and other classes to provide a better API.

I am not sure how much if at all these extra header fields are used.  Certainly if the API now sets them, we should probably provide a way to access them as well.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3072

[cid:E1C4E2F0-ECD0-4C9D-ADB4-B16CA7BCB7FC@home]



Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
[hidden email]<mailto:[hidden email]>



Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support optional GZIP fields [v4]

Lin Zang-2
In reply to this post by Lin Zang-2
> 4890732: GZIPOutputStream doesn't support optional GZIP fields

Lin Zang has updated the pull request incrementally with one additional commit since the last revision:

  add class GZIPHeaderData, refine testcases

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3072/files
  - new: https://git.openjdk.java.net/jdk/pull/3072/files/9b7dabfe..03b3e966

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3072&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3072&range=02-03

  Stats: 633 lines in 3 files changed: 421 ins; 183 del; 29 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3072.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3072/head:pull/3072

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support optional GZIP fields [v3]

Lin Zang-2
In reply to this post by Lance Andersen
On Wed, 24 Mar 2021 10:25:44 GMT, Lance Andersen <[hidden email]> wrote:

>> Hi Lance,
>> Thanks a lot for your review. I will update the PR ASAP.
>> May I ask your help to also review the CSR? Thanks!
>>
>> BRs,
>> Lin
>
> Hi Lin,
>
> On Mar 24, 2021, at 2:51 AM, Lin Zang ***@***.******@***.***>> wrote:
>
>
>
> Hi Lance,
> Thanks a lot for your review. I will update the PR ASAP.
> May I ask your help to also review the CSR?
>
> I believe we need to flush out some of the issues I raised that were not test related as they will result in updates to the javadoc which will require an update to the CSR.
>
>
>
> Thanks!
>
> BRs,
> Lin
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub<https://github.com/openjdk/jdk/pull/3072#issuecomment-805549600>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABQPFO5R4INTAUFZ3QS6WL3TFGDXHANCNFSM4ZMLU6SA>.
>
> ***@***.***
>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> ***@***.******@***.***>

Dear @LanceAndersen,
Here I added a class `GZIPHeaderData` which is defined to hold all gzip header members, and make it an argument of the GZIPOutputStream constructor.  Would you like to help review and check whether this change looks reasonable?

BTW, this pr still lack of the negative test cases, I plan to add it when the constructor API is fixed.

Thanks!
Lin

-------------

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support optional GZIP fields [v3]

Lin Zang-2
On Fri, 2 Apr 2021 08:53:20 GMT, Lin Zang <[hidden email]> wrote:

>> Hi Lin,
>>
>> On Mar 24, 2021, at 2:51 AM, Lin Zang ***@***.******@***.***>> wrote:
>>
>>
>>
>> Hi Lance,
>> Thanks a lot for your review. I will update the PR ASAP.
>> May I ask your help to also review the CSR?
>>
>> I believe we need to flush out some of the issues I raised that were not test related as they will result in updates to the javadoc which will require an update to the CSR.
>>
>>
>>
>> Thanks!
>>
>> BRs,
>> Lin
>>
>> —
>> You are receiving this because you commented.
>> Reply to this email directly, view it on GitHub<https://github.com/openjdk/jdk/pull/3072#issuecomment-805549600>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABQPFO5R4INTAUFZ3QS6WL3TFGDXHANCNFSM4ZMLU6SA>.
>>
>> ***@***.***
>>
>>
>>
>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>> Oracle Java Engineering
>> 1 Network Drive
>> Burlington, MA 01803
>> ***@***.******@***.***>
>
> Dear @LanceAndersen,
> Here I added a class `GZIPHeaderData` which is defined to hold all gzip header members, and make it an argument of the GZIPOutputStream constructor.  Would you like to help review and check whether this change looks reasonable?
>
> BTW, this pr still lack of the negative test cases, I plan to add it when the constructor API is fixed.
>
> Thanks!
> Lin

Dear All,
May I ask your help to review this change? Thanks!

BRs,
Lin

-------------

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support optional GZIP fields [v3]

Alan Bateman-2
On Thu, 8 Apr 2021 08:29:36 GMT, Lin Zang <[hidden email]> wrote:

> Dear All,
> May I ask your help to review this change? Thanks!

@LanceAndersen Do you have cycles to help Lin? This proposal will require discussion, they may be case for the header to be a record for example. My personal view is that the PR should be set aside until there is at least at least some agreement on the API.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3072
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 4890732: GZIPOutputStream doesn't support optional GZIP fields [v3]

Lance Andersen-3
In reply to this post by Lin Zang-2
Hi Lin,

First thank you for your efforts on this feature.  As Alan suggested on March 26, we should take a step back and flush out the overall changes to the API before moving forward with additional PR reviews.  

We should look to see if it makes sense to use some of the more recent java features such as Record.

If we are adding a specific class to hold the header fields,  perhaps using the builder pattern should be considered vs  constructors.

If we are writing out these additional fields, what should we do with them when reading  a gzip file back?

Have you looked at other gzip api implementations to see what they provide in this area?    

Is there anyone who relies on this additional header info?  As I mentioned in an earlier comment, we should validate the changes against another implementation to ensure that we can read the data back correctly and that the additional header data that we write can be read by other tools.

Once we reach a consensus on the overall API changes and/or additions, then we can look to move forward with the next PR review.

Alan,  Yes I am happy to help Lin on moving this forward.

> On Apr 8, 2021, at 4:32 AM, Lin Zang <[hidden email]> wrote:
>
> On Fri, 2 Apr 2021 08:53:20 GMT, Lin Zang <[hidden email] <mailto:[hidden email]>> wrote:
>
>>> Hi Lin,
>>>
>>> On Mar 24, 2021, at 2:51 AM, Lin Zang ***@***.******@***.***>> wrote:
>>>
>>>
>>>
>>> Hi Lance,
>>> Thanks a lot for your review. I will update the PR ASAP.
>>> May I ask your help to also review the CSR?
>>>
>>> I believe we need to flush out some of the issues I raised that were not test related as they will result in updates to the javadoc which will require an update to the CSR.
>>>
>>>
>>>
>>> Thanks!
>>>
>>> BRs,
>>> Lin
>>>
>>> —
>>> You are receiving this because you commented.
>>> Reply to this email directly, view it on GitHub<https://github.com/openjdk/jdk/pull/3072#issuecomment-805549600>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABQPFO5R4INTAUFZ3QS6WL3TFGDXHANCNFSM4ZMLU6SA>.
>>>
>>> ***@***.***
>>>
>>>
>>>
>>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>>> Oracle Java Engineering
>>> 1 Network Drive
>>> Burlington, MA 01803
>>> ***@***.******@***.***>
>>
>> Dear @LanceAndersen,
>> Here I added a class `GZIPHeaderData` which is defined to hold all gzip header members, and make it an argument of the GZIPOutputStream constructor.  Would you like to help review and check whether this change looks reasonable?
>>
>> BTW, this pr still lack of the negative test cases, I plan to add it when the constructor API is fixed.
>>
>> Thanks!
>> Lin
>
> Dear All,
> May I ask your help to review this change? Thanks!
>
> BRs,
> Lin
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/3072 <https://git.openjdk.java.net/jdk/pull/3072>