Getting a live view of environment variables (Gradle and JDK 9)

classic Classic list List threaded Threaded
54 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Getting a live view of environment variables (Gradle and JDK 9)

Cédric Champeau
Hi all,

I'm writing this on behalf of the Gradle team. This email is closely
related to the other thread just posted today, but just a timeline
coincidence (just like the email exchange I had today about this with Alan
Bateman ;)) and not exactly the same issue.

We are in the process of making sure Gradle runs properly on JDK 9, but
there's an issue which is unresolved so far, and probably requires a new
API. It's described at [1], and I have discussed this at Devoxx France with
Rémi Forax who suggested to post something here.

In short, Gradle is a build tool which supports building a variety of
different things, from Java to C++. The JVM happens to be its runtime
environment, and Gradle has what we call the Gradle Daemon [2] which is a
long running process, benefiting from the JIT, aimed at effectively running
builds. When the build starts, a client connects to the daemon and sends a
"build request". A daemon will run a single build at a time, and there are
several cases which would trigger a new daemon to spawn (the daemon JVM
arguments are one) but the environment variables are *not*. Since the
daemon is a long running process, it is possible that the environment
variables are mutated between the moment the daemon was spawned (in a
previous build) and the moment the build is executed.

What we do, now, is to send the environment variables of the client to the
daemon, which *mutates* the existing environment variables map provided by
System.getenv. This is exactly what is described in [1] as being sneaky (it
is) and broken in JDK 9 (since the underlying map doesn't exist anymore).
However, there are valid use cases for this:

   - in practice, environment variables are not immutable. It is especially
   true for long running process.
   - native programs can mutate the environment variables. Even if it's not
   recommended, it is possible and legal.
   - Gradle runs in a "blackbox": we don't know what plugins are doing.
   Even if we provide an API which gives access to "environment variables",
   and that those environment variables are not the ones returned by
   System.getenv, plugin authors would have to use this new API to get
   correct information. However, they may use libraries which access
   System.getenv directly, or use native APIs which would get out-of-sync
   information.
   - we need to propagate the environment to forked process (typically,
   forked compilers and worker daemons)

This means that today, we use JNI to effectively mutate the environment
variables of running process (that’s one of the purposes of the
native-platform project). Then, we mutate the backing map of the JDK to
reflect those changes, otherwise the mutation is not visible from Java code.

What can we do now?

   - Have the JDK honor the fact that environment variables *can* be
   mutated, because it just happens. In short, don't create an immutable copy
   of environment variables at startup, but provide a live view of the
   environment variables (using the existing APIs, System.getenv, would be the
   best thing because it would be immediately visible to all consumers,
   including 3rd party code run in plugins). In addition (but not mandatory),
   you could provide us with an API to set environment variables directly from
   Java. This would avoid JNI calls to do this. However, it’s not mandatory,
   because the live view of environment variables would just work in this case.
   - Last, but we would really, really avoid to do this, spawn a new daemon
   if we detect that the environment variables have changed (diff between what
   the client has and the daemon sees). The major drawback of this approach is
   that it kills performance, since a new daemon would have to be spawned. And
   it is likely to do so each time something (through native code, for
   example), mutates environment variables. A very simple example is
the PWD environment
   variables on Linux which contains the working directory. Basically changing
   the directory would be enough to spawn a new daemon. Another example is the
   TERM_SESSION_ID one, which means that 2 different terminals would force
   us to spawn 2 different Gradle daemons. We could, of course, have a list of
   “blessed” environments variables that we don’t trust, but it’s very easily
   broken, and no good design. That’s why, even if it’s possible, we don’t
   consider this a solution.

Thanks for considering our request, which is currently a blocker for us
(understand, have Gradle running properly under JDK 9).

[1] https://github.com/adammurdoch/native-platform/issues/16
[2] https://docs.gradle.org/current/userguide/gradle_daemon.html
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

Martin Buchholz-3
It seems to be the day for environment variable mutation.

System.getenv was always designed as an immutable snapshot, and more
recently there's emerging consensus that one must never ever mutate
environment variables in a multi-threaded Unix program.  One tends to
acquire a biased view on this after chasing down production programs
mysteriously crashing in getenv.

https://bugs.openjdk.java.net/browse/JDK-8173654
https://bugs.openjdk.java.net/browse/JDK-4953367

On Wed, May 10, 2017 at 2:37 PM, Cédric Champeau <[hidden email]>
wrote:

> Hi all,
>
> I'm writing this on behalf of the Gradle team. This email is closely
> related to the other thread just posted today, but just a timeline
> coincidence (just like the email exchange I had today about this with Alan
> Bateman ;)) and not exactly the same issue.
>
> We are in the process of making sure Gradle runs properly on JDK 9, but
> there's an issue which is unresolved so far, and probably requires a new
> API. It's described at [1], and I have discussed this at Devoxx France with
> Rémi Forax who suggested to post something here.
>
> In short, Gradle is a build tool which supports building a variety of
> different things, from Java to C++. The JVM happens to be its runtime
> environment, and Gradle has what we call the Gradle Daemon [2] which is a
> long running process, benefiting from the JIT, aimed at effectively running
> builds. When the build starts, a client connects to the daemon and sends a
> "build request". A daemon will run a single build at a time, and there are
> several cases which would trigger a new daemon to spawn (the daemon JVM
> arguments are one) but the environment variables are *not*. Since the
> daemon is a long running process, it is possible that the environment
> variables are mutated between the moment the daemon was spawned (in a
> previous build) and the moment the build is executed.
>
> What we do, now, is to send the environment variables of the client to the
> daemon, which *mutates* the existing environment variables map provided by
> System.getenv. This is exactly what is described in [1] as being sneaky (it
> is) and broken in JDK 9 (since the underlying map doesn't exist anymore).
> However, there are valid use cases for this:
>
>    - in practice, environment variables are not immutable. It is especially
>    true for long running process.
>    - native programs can mutate the environment variables. Even if it's not
>    recommended, it is possible and legal.
>    - Gradle runs in a "blackbox": we don't know what plugins are doing.
>    Even if we provide an API which gives access to "environment variables",
>    and that those environment variables are not the ones returned by
>    System.getenv, plugin authors would have to use this new API to get
>    correct information. However, they may use libraries which access
>    System.getenv directly, or use native APIs which would get out-of-sync
>    information.
>    - we need to propagate the environment to forked process (typically,
>    forked compilers and worker daemons)
>
> This means that today, we use JNI to effectively mutate the environment
> variables of running process (that’s one of the purposes of the
> native-platform project). Then, we mutate the backing map of the JDK to
> reflect those changes, otherwise the mutation is not visible from Java
> code.
>
> What can we do now?
>
>    - Have the JDK honor the fact that environment variables *can* be
>    mutated, because it just happens. In short, don't create an immutable
> copy
>    of environment variables at startup, but provide a live view of the
>    environment variables (using the existing APIs, System.getenv, would be
> the
>    best thing because it would be immediately visible to all consumers,
>    including 3rd party code run in plugins). In addition (but not
> mandatory),
>    you could provide us with an API to set environment variables directly
> from
>    Java. This would avoid JNI calls to do this. However, it’s not
> mandatory,
>    because the live view of environment variables would just work in this
> case.
>    - Last, but we would really, really avoid to do this, spawn a new daemon
>    if we detect that the environment variables have changed (diff between
> what
>    the client has and the daemon sees). The major drawback of this
> approach is
>    that it kills performance, since a new daemon would have to be spawned.
> And
>    it is likely to do so each time something (through native code, for
>    example), mutates environment variables. A very simple example is
> the PWD environment
>    variables on Linux which contains the working directory. Basically
> changing
>    the directory would be enough to spawn a new daemon. Another example is
> the
>    TERM_SESSION_ID one, which means that 2 different terminals would force
>    us to spawn 2 different Gradle daemons. We could, of course, have a
> list of
>    “blessed” environments variables that we don’t trust, but it’s very
> easily
>    broken, and no good design. That’s why, even if it’s possible, we don’t
>    consider this a solution.
>
> Thanks for considering our request, which is currently a blocker for us
> (understand, have Gradle running properly under JDK 9).
>
> [1] https://github.com/adammurdoch/native-platform/issues/16
> [2] https://docs.gradle.org/current/userguide/gradle_daemon.html
>
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

Martin Buchholz-3
You're already acquiring superpowers and voiding any warranty by calling
putenv from JNI.  Why not go further and do your reflective shenanigans via
JNI as well?  In practice, if hotspot has started up and only one java
thread is doing anything, putenv is unlikely to cause a crash (and probably
none of your users have reported any such crash).
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

David Holmes
In reply to this post by Cédric Champeau
Hi Cedric,

Like Martin I am somewhat surprised by the recent surge in interest in
the JVM environment :)

 From a pragmatic perspective it is far too late to do anything about
this in 9.

I am curious what kind of environment variables are being set by the
client, how they are shared with the daemon, and what code actually
reads their values?

As Martin suggests your only recourse may be to do everything via JNI.

David

On 11/05/2017 7:37 AM, Cédric Champeau wrote:

> Hi all,
>
> I'm writing this on behalf of the Gradle team. This email is closely
> related to the other thread just posted today, but just a timeline
> coincidence (just like the email exchange I had today about this with Alan
> Bateman ;)) and not exactly the same issue.
>
> We are in the process of making sure Gradle runs properly on JDK 9, but
> there's an issue which is unresolved so far, and probably requires a new
> API. It's described at [1], and I have discussed this at Devoxx France with
> Rémi Forax who suggested to post something here.
>
> In short, Gradle is a build tool which supports building a variety of
> different things, from Java to C++. The JVM happens to be its runtime
> environment, and Gradle has what we call the Gradle Daemon [2] which is a
> long running process, benefiting from the JIT, aimed at effectively running
> builds. When the build starts, a client connects to the daemon and sends a
> "build request". A daemon will run a single build at a time, and there are
> several cases which would trigger a new daemon to spawn (the daemon JVM
> arguments are one) but the environment variables are *not*. Since the
> daemon is a long running process, it is possible that the environment
> variables are mutated between the moment the daemon was spawned (in a
> previous build) and the moment the build is executed.
>
> What we do, now, is to send the environment variables of the client to the
> daemon, which *mutates* the existing environment variables map provided by
> System.getenv. This is exactly what is described in [1] as being sneaky (it
> is) and broken in JDK 9 (since the underlying map doesn't exist anymore).
> However, there are valid use cases for this:
>
>    - in practice, environment variables are not immutable. It is especially
>    true for long running process.
>    - native programs can mutate the environment variables. Even if it's not
>    recommended, it is possible and legal.
>    - Gradle runs in a "blackbox": we don't know what plugins are doing.
>    Even if we provide an API which gives access to "environment variables",
>    and that those environment variables are not the ones returned by
>    System.getenv, plugin authors would have to use this new API to get
>    correct information. However, they may use libraries which access
>    System.getenv directly, or use native APIs which would get out-of-sync
>    information.
>    - we need to propagate the environment to forked process (typically,
>    forked compilers and worker daemons)
>
> This means that today, we use JNI to effectively mutate the environment
> variables of running process (that’s one of the purposes of the
> native-platform project). Then, we mutate the backing map of the JDK to
> reflect those changes, otherwise the mutation is not visible from Java code.
>
> What can we do now?
>
>    - Have the JDK honor the fact that environment variables *can* be
>    mutated, because it just happens. In short, don't create an immutable copy
>    of environment variables at startup, but provide a live view of the
>    environment variables (using the existing APIs, System.getenv, would be the
>    best thing because it would be immediately visible to all consumers,
>    including 3rd party code run in plugins). In addition (but not mandatory),
>    you could provide us with an API to set environment variables directly from
>    Java. This would avoid JNI calls to do this. However, it’s not mandatory,
>    because the live view of environment variables would just work in this case.
>    - Last, but we would really, really avoid to do this, spawn a new daemon
>    if we detect that the environment variables have changed (diff between what
>    the client has and the daemon sees). The major drawback of this approach is
>    that it kills performance, since a new daemon would have to be spawned. And
>    it is likely to do so each time something (through native code, for
>    example), mutates environment variables. A very simple example is
> the PWD environment
>    variables on Linux which contains the working directory. Basically changing
>    the directory would be enough to spawn a new daemon. Another example is the
>    TERM_SESSION_ID one, which means that 2 different terminals would force
>    us to spawn 2 different Gradle daemons. We could, of course, have a list of
>    “blessed” environments variables that we don’t trust, but it’s very easily
>    broken, and no good design. That’s why, even if it’s possible, we don’t
>    consider this a solution.
>
> Thanks for considering our request, which is currently a blocker for us
> (understand, have Gradle running properly under JDK 9).
>
> [1] https://github.com/adammurdoch/native-platform/issues/16
> [2] https://docs.gradle.org/current/userguide/gradle_daemon.html
>
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

Martin Buchholz-3
Since you're OK with doing brain surgery on the JDK and you control the
entire process, consider controlling your daemon with a bytecode rewriting
agent (e.g. byteman) that changes the definition of System.getenv etc.

(Whose side are you on Martin?! ...  I confess I also wish for a faster
gradle ...)
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

Cédric Champeau
Thanks for the answers, folks, but I think they are kind of missing the
point. The fact is that environment variables *are* mutable. Java happens
to return an immutable view of the environment variables when the VM was
started, which is not the reality. We cannot trust what `System.geteenv`
returns. The Javadocs say "current system environment" but it's just not
true. The method should be called `getEnvWhenTheVMWasStarted`.

We also have the problem that the environment of the client and the daemon
differ over time, as I explained in the previous email. And it is pretty
easy to understand that _when the build runs_, we need to get the
environment variables of the _client_, not the daemon. Imagine something as
simple as this:

String toolPath = System.getenv('SOMETOOL_HOME')

and imagine that this code runs in the daemon. When the daemon is started,
there's absolutely no guarantee that this variable is going to exist.
However, we know that when we're going to execute the build, this variable
*has* to be defined. Obviously, it's going to be done from the CLI:

$ export SOMETOOL_HOME = ...
$ ./gradlew run ---> connects to the daemon, passes environment variables,
mutates those of the daemon, which then executes the build

Similarly, from a *single* CLI/terminal, it's very common to change the
values of environment variables (because, hey, they are mutable!):

$ export SOMETOOL_HOME = another path I want to try out
$ ./gradlew run --> ... must update env vars again

So, to work around the problem that Java doesn't provide a way to mutate
the current environment variables, we perform a JNI call to do it. *Then*
we need to mutate the "immutable view" that Java provides, or all Java code
is going to get wrong information, and we would never succeed. Note that
having to resort on JNI to mutate the environment is not the problem. We
can live with that. Once can think it's a bad idea to allow mutation, in
practice, it is necessary, but I reckon it could be a bad idea to have an
API for this. The *real* issue is that `System.getenv` does *not* return
the real values, because it's an immutable view of what existed at the VM
startup.

Note that it's not recommended to mutate environment variables in a
multi-threaded environment. But in practice, you can assimilate what we do
to the VM startup: we get environment variables, mutate them, _then_ the
build runs (and only at that moment we would effectively be
multi-threaded). We can live with potential issues of mutating the
environment: the benefits outperforms the drawbacks by orders of magnitude.
When the daemon is activated, we're not talking about 10% faster builds.
They can effectively be 3, 5 or 10x faster!

Now, back to the problem with JDK 9:

- first, the implementation has changed. But we could adapt our code.
- second, calling `setAccessible` is not allowed anymore. And this is a
MAJOR pita. The problem is that we're trying to access the java base
module, and reflection on that module is not allowed anymore. There are no
good solutions for this: we could write a JVM agent, like you suggested,
that rewrites bytecode. But since we're trying to rewrite a core class, it
would have to be found on the invocation of `java` command itself, and
cannot be dynamically loaded. In addition, we would have to add a flag to
open core classes to rewrite. There are multiple problems with this
approach: first, we don't know on which environment we run before we're
started. Gradle can run on JDK 7, 8, 9, ... and we cannot have a startup
script which tries to infer the version from whatever it finds, this is not
realistic and would add unacceptable latency on the client side (plus, a
lot of headaches to support the various environments we support: Linux,
Mac, Windows, ...). Second, we may not have a hand on the CLI at all. For
example, if the build runs in Travis CI, Amazon, or whatever cloudish thing
that spawns its own containers, we cannot tweak the VM arguments. We just
have to use whatever is there. Last, rewriting bytecode has a cost, that
you pay at startup.

Again, what we need is that Java just returns the live view of the
environment variables. Allowing *mutation* is not necessary, technically
speaking, because we can rely on JNI. However, I reckon that returning an
immutable view is done for performance reasons. That's why adding a way to
mutate "the view" would be an acceptable thing I think. No reflection, no
bytecode rewriting, just give a way to mutate the map that
`ProcessEnvironment` retains. The advantage of this is that you would not
effectively mutate the environment variables, so you, on the JVM side,
would be safe. What you would mutate is the view you are retaining.
Alternatively, provide us with an API to "refresh" the view. Like,
`System.getenv(true)`. The benefit would be that you would only have to get
the new view of environment variables if the user asks for it. And,
subsequent callers would get the refreshed view, so people using `gettenv`
in their code would be up-to-date.

I'm really concerned that this problem is not taken seriously. It's a major
performance problem for the most widely used build tool out there
(considering all users, from native to Java and Android). Just saying
"you're doing something nasty" is not a valid answer: we have to do it, and
do it for good reasons.


2017-05-11 4:50 GMT+02:00 Martin Buchholz <[hidden email]>:

> Since you're OK with doing brain surgery on the JDK and you control the
> entire process, consider controlling your daemon with a bytecode rewriting
> agent (e.g. byteman) that changes the definition of System.getenv etc.
>
> (Whose side are you on Martin?! ...  I confess I also wish for a faster
> gradle ...)
>
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

Alan Bateman
On 11/05/2017 08:02, Cédric Champeau wrote:

> Thanks for the answers, folks, but I think they are kind of missing the
> point. The fact is that environment variables *are* mutable. Java happens
> to return an immutable view of the environment variables when the VM was
> started, which is not the reality. We cannot trust what `System.geteenv`
> returns. The Javadocs say "current system environment" but it's just not
> true. The method should be called `getEnvWhenTheVMWasStarted`.
>
> We also have the problem that the environment of the client and the daemon
> differ over time, as I explained in the previous email. And it is pretty
> easy to understand that _when the build runs_, we need to get the
> environment variables of the _client_, not the daemon. Imagine something as
> simple as this:
>
> String toolPath = System.getenv('SOMETOOL_HOME')
>
> and imagine that this code runs in the daemon. When the daemon is started,
> there's absolutely no guarantee that this variable is going to exist.
> However, we know that when we're going to execute the build, this variable
> *has* to be defined. Obviously, it's going to be done from the CLI:
>
> $ export SOMETOOL_HOME = ...
> $ ./gradlew run ---> connects to the daemon, passes environment variables,
> mutates those of the daemon, which then executes the build
>
> Similarly, from a *single* CLI/terminal, it's very common to change the
> values of environment variables (because, hey, they are mutable!):
>
> $ export SOMETOOL_HOME = another path I want to try out
> $ ./gradlew run --> ... must update env vars again
>
One thing that isn't clear to me is what is reading the env variables in
the build.

If native tools are reading the env variables then the daemon must be
executing them in a child process, in which case there is no need for
the env variables to be in the daemon's environment.

On the other hand, maybe you mean tools written in Java that are invoked
in the same VM as the daemon (you mentioned warm up in your first mail).
Do these tools have any other way to specify the configuration? Can you
give specific examples? I'm just wondering if these in-process tools can
pick up configuration via system properties or their main method.

BTW: The discussion reminds me a bit of jtreg test harness that we use
for testing the JDK. It maintains a pool of agents VM and will reuse if
possible. If tests need special options then it may have to startup a
new agent VM to do the test that needs the special options.

-Alan
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

David Holmes
In reply to this post by Cédric Champeau
Hi Cedric,

On 11/05/2017 5:02 PM, Cédric Champeau wrote:
> Thanks for the answers, folks, but I think they are kind of missing the
> point. The fact is that environment variables *are* mutable. Java

Yes they are, provided you do it carefully and don't have to deal with
the fact that its basically just a malloc'd char* array with no
requirements for any kind of thread-safety. And given the JVM is
multi-threaded almost as soon as it is launched we're not off to a great
start in terms of safe and reliable access to a potentially concurrently
mutable environment.

> happens to return an immutable view of the environment variables when
> the VM was started, which is not the reality. We cannot trust what
> `System.geteenv` returns. The Javadocs say "current system environment"
> but it's just not true. The method should be called
> `getEnvWhenTheVMWasStarted`.

It takes the snapshot on first call, but yes the docs are wrong.

> We also have the problem that the environment of the client and the
> daemon differ over time, as I explained in the previous email. And it is
> pretty easy to understand that _when the build runs_, we need to get the
> environment variables of the _client_, not the daemon. Imagine something
> as simple as this:
>
> String toolPath = System.getenv('SOMETOOL_HOME')
>
> and imagine that this code runs in the daemon. When the daemon is
> started, there's absolutely no guarantee that this variable is going to
> exist. However, we know that when we're going to execute the build, this
> variable *has* to be defined. Obviously, it's going to be done from the CLI:
>
> $ export SOMETOOL_HOME = ...
> $ ./gradlew run ---> connects to the daemon, passes environment
> variables, mutates those of the daemon, which then executes the build

Thanks for clarifying the scenario. So IIUC there is a set of env vars
defined that provide the communication between the client and the
daemon. The value of those vars in the client are passed to the daemon
(via some normal IPC mechanism) and then JNI code on the daemon is used
to update the daemon's process env so that if the Java getenv looked it
would see the updated value.

> Similarly, from a *single* CLI/terminal, it's very common to change the
> values of environment variables (because, hey, they are mutable!):
> $ export SOMETOOL_HOME = another path I want to try out
> $ ./gradlew run --> ... must update env vars again
>
> So, to work around the problem that Java doesn't provide a way to mutate
> the current environment variables, we perform a JNI call to do it.
> *Then* we need to mutate the "immutable view" that Java provides, or all
> Java code is going to get wrong information, and we would never succeed.
> Note that having to resort on JNI to mutate the environment is not the
> problem. We can live with that. Once can think it's a bad idea to allow
> mutation, in practice, it is necessary, but I reckon it could be a bad
> idea to have an API for this. The *real* issue is that `System.getenv`
> does *not* return the real values, because it's an immutable view of
> what existed at the VM startup.

Understood.

> Note that it's not recommended to mutate environment variables in a
> multi-threaded environment. But in practice, you can assimilate what we
> do to the VM startup: we get environment variables, mutate them, _then_
> the build runs (and only at that moment we would effectively be
> multi-threaded). We can live with potential issues of mutating the
> environment: the benefits outperforms the drawbacks by orders of
> magnitude. When the daemon is activated, we're not talking about 10%
> faster builds. They can effectively be 3, 5 or 10x faster!
>
> Now, back to the problem with JDK 9:
>
> - first, the implementation has changed. But we could adapt our code.
> - second, calling `setAccessible` is not allowed anymore. And this is a
> MAJOR pita. The problem is that we're trying to access the java base
> module, and reflection on that module is not allowed anymore. There are

I was going to suggest --permit-illegal-access but from what you write
below it isn't feasible to manage this through VM startup options. I
don't know whether there is a programmatic equivalent to the "big kill
switch" - probably not given it will only be there for 9.

> no good solutions for this: we could write a JVM agent, like you
> suggested, that rewrites bytecode. But since we're trying to rewrite a
> core class, it would have to be found on the invocation of `java`
> command itself, and cannot be dynamically loaded. In addition, we would
> have to add a flag to open core classes to rewrite. There are multiple
> problems with this approach: first, we don't know on which environment
> we run before we're started. Gradle can run on JDK 7, 8, 9, ... and we
> cannot have a startup script which tries to infer the version from
> whatever it finds, this is not realistic and would add unacceptable
> latency on the client side (plus, a lot of headaches to support the
> various environments we support: Linux, Mac, Windows, ...). Second, we
> may not have a hand on the CLI at all. For example, if the build runs in
> Travis CI, Amazon, or whatever cloudish thing that spawns its own
> containers, we cannot tweak the VM arguments. We just have to use
> whatever is there. Last, rewriting bytecode has a cost, that you pay at
> startup.
>
> Again, what we need is that Java just returns the live view of the
> environment variables. Allowing *mutation* is not necessary, technically
> speaking, because we can rely on JNI. However, I reckon that returning
> an immutable view is done for performance reasons. That's why adding a
> way to mutate "the view" would be an acceptable thing I think. No
> reflection, no bytecode rewriting, just give a way to mutate the map
> that `ProcessEnvironment` retains. The advantage of this is that you
> would not effectively mutate the environment variables, so you, on the
> JVM side, would be safe. What you would mutate is the view you are
> retaining. Alternatively, provide us with an API to "refresh" the view.
> Like, `System.getenv(true)`. The benefit would be that you would only
> have to get the new view of environment variables if the user asks for
> it. And, subsequent callers would get the refreshed view, so people
> using `gettenv` in their code would be up-to-date.

That may be feasible. I like the idea of getenv(true) to request a
refresh of the underlying data structure.

> I'm really concerned that this problem is not taken seriously. It's a
> major performance problem for the most widely used build tool out there
> (considering all users, from native to Java and Android). Just saying
> "you're doing something nasty" is not a valid answer: we have to do it,
> and do it for good reasons.

It isn't a matter of not taking the problem seriously. You've raised an
issue for discussion and it is being discussed, but it is very late in
the JDK 9 end game to try and add new APIs. This issue has been known
for a long time and a Gradle issue was filed last September [1]. There
was some discussion on jigsaw-dev [2] last October, and Alan even
suggested back then to raise the "setenv" issue on core-libs-dev.

Cheers,
David

[1] https://issues.gradle.org/browse/GRADLE-3565
[2]
http://mail.openjdk.java.net/pipermail/jigsaw-dev/2016-October/009551.html

>
>
> 2017-05-11 4:50 GMT+02:00 Martin Buchholz <[hidden email]
> <mailto:[hidden email]>>:
>
>     Since you're OK with doing brain surgery on the JDK and you control
>     the entire process, consider controlling your daemon with a bytecode
>     rewriting agent (e.g. byteman) that changes the definition of
>     System.getenv etc.
>
>     (Whose side are you on Martin?! ...  I confess I also wish for a
>     faster gradle ...)
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

dalibor topic-2
In reply to this post by Cédric Champeau


On 11.05.2017 09:02, Cédric Champeau wrote:
> Thanks for the answers, folks, but I think they are kind of missing the
> point. The fact is that environment variables *are* mutable.

Unfortunately, they are not safely mutable in multi-threaded programs on
many operating system/libc combinations.

Consider a situation in which one thread calls a not multi-thread safe
setenv() with a long enough piece of data to stuff into an environment
variable that it takes more then an instant, while another thread
simultaneously calls a not multi-thread safe getenv() and reads the
unfinished, corrupted data, potentially reading beyond boundaries and
crashing the application.

In such an environment, the safest thing to do is to not update
environment variables at all. Such environments are very common,
unfortunately (Linux/glibc is a rather popular one).

Unfortunately, you might not really be able to stop JNI code from
calling setenv() or an equivalent (or just messing with char ** environ
directly for fun and profit) at any given point in time so you might not
be sure that the environment is safe to read at any given point in time,
either, except at startup time before anyone has had a chance to mess
with the environment in the first place.

So the one safe thing one can do is what Java already does.

Going beyond in a safe and portable fashion in the JDK itself might be
quite tricky and require a bit of thought, because environment variables
are quite a bit of mess to deal with  - see
http://www.club.cc.cmu.edu/~cmccabe/blog_the_setenv_fiasco.html for a
nice read.

And that's without even starting to think about the security surface of
such functionality. For a start, consider the various ways environment
variables can go wrong elaborated in
https://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO.html#ENVIRONMENT-VARIABLES 
.

So while it may seem that we're missing the point, and you may very well
be right, I would naively suspect that providing functionality to
read/modify/update environment variables in a way that doesn't cause
problems down the road might not be quite as trivial as it might seem.

cheers,
dalibor topic
--
<http://www.oracle.com> Dalibor Topic | Principal Product Manager
Phone: +494089091214 <tel:+494089091214> | Mobile: +491737185961
<tel:+491737185961>

ORACLE Deutschland B.V. & Co. KG | Kühnehöfe 5 | 22761 Hamburg

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstr. 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Alexander van der Ven, Jan Schultheiss, Val Maher

<http://www.oracle.com/commitment> Oracle is committed to developing
practices and products that help protect the environment
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

Cédric Champeau
In reply to this post by David Holmes
2017-05-11 12:50 GMT+02:00 David Holmes <[hidden email]>:

> Hi Cedric,
>
> On 11/05/2017 5:02 PM, Cédric Champeau wrote:
>
>> Thanks for the answers, folks, but I think they are kind of missing the
>> point. The fact is that environment variables *are* mutable. Java
>>
>
> Yes they are, provided you do it carefully and don't have to deal with the
> fact that its basically just a malloc'd char* array with no requirements
> for any kind of thread-safety. And given the JVM is multi-threaded almost
> as soon as it is launched we're not off to a great start in terms of safe
> and reliable access to a potentially concurrently mutable environment.


Yes, we are aware of that, and it has never been a problem so far. As I
said the gain overwhelms the risks.

>
>
>>
> Thanks for clarifying the scenario. So IIUC there is a set of env vars
> defined that provide the communication between the client and the daemon.
> The value of those vars in the client are passed to the daemon (via some
> normal IPC mechanism) and then JNI code on the daemon is used to update the
> daemon's process env so that if the Java getenv looked it would see the
> updated value.


Yes, and to be very clear, the problem is code running in the daemon
itself, which can be build scripts, plugins, or code in 3rd party jars used
by the scripts or plugins. When we fork a process we have no such issue.

>
>>
> That may be feasible. I like the idea of getenv(true) to request a refresh
> of the underlying data structure.
>
> Do you think it's something that could be added in JDK 9? It seems to be a
rather limited change, but I understand what "feature freeze" means :) One
could argue that the current behavior of `getenv` is a bug :)


> It isn't a matter of not taking the problem seriously. You've raised an
> issue for discussion and it is being discussed, but it is very late in the
> JDK 9 end game to try and add new APIs. This issue has been known for a
> long time and a Gradle issue was filed last September [1]. There was some
> discussion on jigsaw-dev [2] last October, and Alan even suggested back
> then to raise the "setenv" issue on core-libs-dev.
>
>
Yes and I apologize for that: I've been talking to several people offline
about this problem, but without a proper email to this list, it was just
side discussions I'm afraid. I had other priorities... Anyway, if it is
impossible to get a fix for JDK 9, I'm afraid the only solution for us is
an agent that would rewrite the `System` class...


> Cheers,
> David
>
> [1] https://issues.gradle.org/browse/GRADLE-3565
> [2] http://mail.openjdk.java.net/pipermail/jigsaw-dev/2016-Octob
> er/009551.html
>
>>
>>
>> 2017-05-11 4:50 GMT+02:00 Martin Buchholz <[hidden email]
>> <mailto:[hidden email]>>:
>>
>>     Since you're OK with doing brain surgery on the JDK and you control
>>     the entire process, consider controlling your daemon with a bytecode
>>     rewriting agent (e.g. byteman) that changes the definition of
>>     System.getenv etc.
>>
>>     (Whose side are you on Martin?! ...  I confess I also wish for a
>>     faster gradle ...)
>>
>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

Cédric Champeau
In reply to this post by dalibor topic-2
>
>
>>
> Unfortunately, they are not safely mutable in multi-threaded programs on
> many operating system/libc combinations.
>
> But the problem is less about mutating, that it is about reading: the VM
returns wrong values at some point, because it _assumes_ that the
environment variables are not mutated.
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

dalibor topic-2
On 11.05.2017 18:29, Cédric Champeau wrote:
>
>
>     Unfortunately, they are not safely mutable in multi-threaded
>     programs on many operating system/libc combinations.
>
> But the problem is less about mutating, that it is about reading: the VM
> returns wrong values at some point, because it _assumes_ that the
> environment variables are not mutated.

Right. Assuming that another thread could be simultaneously writing to
the same data structure holding environment variables (char **), reading
itself becomes problematic at such points in time, as you might read a
temporarily corrupted data structure.

I guess the question underneath is if there is a safe point in time when
reading the data could be preformed and no concurrent write from JNI
code corrupting the data when it's partially read is possible.

cheers,
dalibor topic
--
<http://www.oracle.com> Dalibor Topic | Principal Product Manager
Phone: +494089091214 <tel:+494089091214> | Mobile: +491737185961
<tel:+491737185961>

ORACLE Deutschland B.V. & Co. KG | Kühnehöfe 5 | 22761 Hamburg

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstr. 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Alexander van der Ven, Jan Schultheiss, Val Maher

<http://www.oracle.com/commitment> Oracle is committed to developing
practices and products that help protect the environment
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

David Holmes
Re-sending to core-libs-dev

On 13/05/2017 5:56 PM, David Holmes wrote:

> Hi Dalibor,
>
> On 12/05/2017 11:28 PM, dalibor topic wrote:
>> On 11.05.2017 18:29, Cédric Champeau wrote:
>>>
>>>
>>>     Unfortunately, they are not safely mutable in multi-threaded
>>>     programs on many operating system/libc combinations.
>>>
>>> But the problem is less about mutating, that it is about reading: the VM
>>> returns wrong values at some point, because it _assumes_ that the
>>> environment variables are not mutated.
>>
>> Right. Assuming that another thread could be simultaneously writing to
>> the same data structure holding environment variables (char **), reading
>> itself becomes problematic at such points in time, as you might read a
>> temporarily corrupted data structure.
>>
>> I guess the question underneath is if there is a safe point in time when
>> reading the data could be preformed and no concurrent write from JNI
>> code corrupting the data when it's partially read is possible.
>
> I'm afraid no such safe point guarantee exists at all - even for the
> initial reading of the process environment on the first call to System
> getenv(). There could always potentially be some JNI, or other native
> in-process code, mutating the environ char** at the same time as we
> first read it in the JVM.
>
> But we're not trying to protect against random concurrent updates in the
> current scenario, things are more structured:
> - request comes in with data that says to update certain env vars
> - JNI code updates the env vars
> - the daemon's java code (currently) causes the System.getenv map to be
> updated
> - the "client" code is executed and reads the env var and sees the right
> value
>
> There would have to be a caveat on System.getenv(true) if we went that
> path, that it is up to the user to ensure it is called in as safe a
> manner as possible having regard to any concurrent updates in their
> application code and how the environment is managed on a given platform.
>
> Cheers,
> David
> -----
>
>> cheers,
>> dalibor topic
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

dalibor topic-2
Thanks for the explanation, David. That doesn't sound much more risky than what we already do today in getenv.

Cheers,
Dalibor

--
<http://www.oracle.com> Dalibor Topic | Principal Product Manager
Phone: +494089091214<tel:+494089091214> | Mobile: +491737185961
<tel:+491737185961>

ORACLE Deutschland B.V. & Co. KG | Kühnehöfe 5 | 22761 Hamburg

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstr. 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Alexander van der Ven, Jan Schultheiss, Val Maher

<http://www.oracle.com/commitment> Oracle is committed to developing
practices and products that help protect the environment

> On 15. May 2017, at 07:14, David Holmes <[hidden email]> wrote:
>
> Re-sending to core-libs-dev
>
>> On 13/05/2017 5:56 PM, David Holmes wrote:
>> Hi Dalibor,
>>
>>> On 12/05/2017 11:28 PM, dalibor topic wrote:
>>>> On 11.05.2017 18:29, Cédric Champeau wrote:
>>>>
>>>>
>>>>    Unfortunately, they are not safely mutable in multi-threaded
>>>>    programs on many operating system/libc combinations.
>>>>
>>>> But the problem is less about mutating, that it is about reading: the VM
>>>> returns wrong values at some point, because it _assumes_ that the
>>>> environment variables are not mutated.
>>>
>>> Right. Assuming that another thread could be simultaneously writing to
>>> the same data structure holding environment variables (char **), reading
>>> itself becomes problematic at such points in time, as you might read a
>>> temporarily corrupted data structure.
>>>
>>> I guess the question underneath is if there is a safe point in time when
>>> reading the data could be preformed and no concurrent write from JNI
>>> code corrupting the data when it's partially read is possible.
>>
>> I'm afraid no such safe point guarantee exists at all - even for the
>> initial reading of the process environment on the first call to System
>> getenv(). There could always potentially be some JNI, or other native
>> in-process code, mutating the environ char** at the same time as we
>> first read it in the JVM.
>>
>> But we're not trying to protect against random concurrent updates in the
>> current scenario, things are more structured:
>> - request comes in with data that says to update certain env vars
>> - JNI code updates the env vars
>> - the daemon's java code (currently) causes the System.getenv map to be
>> updated
>> - the "client" code is executed and reads the env var and sees the right
>> value
>>
>> There would have to be a caveat on System.getenv(true) if we went that
>> path, that it is up to the user to ensure it is called in as safe a
>> manner as possible having regard to any concurrent updates in their
>> application code and how the environment is managed on a given platform.
>>
>> Cheers,
>> David
>> -----
>>
>>> cheers,
>>> dalibor topic
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

Cédric Champeau
In reply to this post by Cédric Champeau
Thanks Peter for your thoughts, but I don't think it's as simple as that.
As I explained in my original email, there are multiple ways the
environment variables can be mutated, and it can be done _externally_.
Typically, during a task execution, a JNI call performed by a random native
tool could mutate the environment. That's something we, as a build tool,
have to consider as a use case. It's very unlikely but it can happen. This
means that for the same classloader, the environment can change. And for
performance reasons, we cache classloaders between builds, and reuse them
as much as possible (because classloading is far from being cheap).

2017-05-14 19:51 GMT+02:00 Peter Levart <[hidden email]>:

> Hi Cedric,
>
> Chiming in with my thoughts...
>
> It's unfortunate that Gradle plugins or libraries used by plugins use
> environment variables at all. I wonder who was the first? Did Gradle
> introduce the feature of passing client environment to the daemon and
> establishing the view of System.getenv for plugins 1st or did libraries
> used by plugins happen to use environment variables using System.getenv and
> Gradle was just kind enough to provide them a dynamic view controlled by
> client?
>
> In the end it doesn't matter. The fact is that System.getenv is part of
> standard Java API and anyone using it should be aware that by doing so,
> they are limiting their software to be (re)configured only by spawning new
> process(es). UNIX environment was not designed to be mutated during the
> course of a long-running process. Maybe just at process startup/setup when
> conditions are under control (i.e. a single running thread) but later, the
> environment is meant to be read only.
>
> Maybe there is a solution for Gradle and othert though. But that solution,
> I think, is not in exposing a "live" view of process environment through
> System.getenv or new methods to "refresh" the view as you are proposing,
> since that would only encourage people to mutate the process environment
> which, as was established on this thread, is not safe in a long-running
> multi-threaded process, which Java processes usually are. Perhaps the
> solution is in extending the System.getenv API with ways to provide
> "custom" views of variables for code that runs in a "container".
>
> Gradle is, among other things, a container for plugins. Is Gradle running
> plugins in a separate ClassLoader? Does it use a separate ClassLoader for
> each "build" which might bring with it a new set of environment variables
> from the client? In such a setup, one could provide a separate set of
> environment variables for each ClassLoader, simply by passing them to the
> ClassLoader constructor. System.getenv would need to be a @CallerSensitive
> method which would return caller's ClassLoader view of variables, if any
> such view was established, or simply the view inherited from the parent
> ClassLoader.
>
> Such would be a "functional" approach, which would keep environment
> variables immutable, but allow child "contexts" to have different views of
> them. Such approach would also play well with idioms like:
>
> class AbcPlugin {
>     static final String SOME_SETTING = System.getenv("SOME_SETTING");
>
> ...and would also help other containers (such as app servers) support
> existing libraries that use environment variables to be configured
> differently in different apps deployed in the same VM process.
>
> Just a thought.
>
> Regards, Peter
>
>
>
> On 05/11/2017 09:02 AM, Cédric Champeau wrote:
>
> Thanks for the answers, folks, but I think they are kind of missing the
> point. The fact is that environment variables *are* mutable. Java happens
> to return an immutable view of the environment variables when the VM was
> started, which is not the reality. We cannot trust what `System.geteenv`
> returns. The Javadocs say "current system environment" but it's just not
> true. The method should be called `getEnvWhenTheVMWasStarted`.
>
> We also have the problem that the environment of the client and the daemon
> differ over time, as I explained in the previous email. And it is pretty
> easy to understand that _when the build runs_, we need to get the
> environment variables of the _client_, not the daemon. Imagine something as
> simple as this:
>
> String toolPath = System.getenv('SOMETOOL_HOME')
>
> and imagine that this code runs in the daemon. When the daemon is started,
> there's absolutely no guarantee that this variable is going to exist.
> However, we know that when we're going to execute the build, this variable
> *has* to be defined. Obviously, it's going to be done from the CLI:
>
> $ export SOMETOOL_HOME = ...
> $ ./gradlew run ---> connects to the daemon, passes environment variables,
> mutates those of the daemon, which then executes the build
>
> Similarly, from a *single* CLI/terminal, it's very common to change the
> values of environment variables (because, hey, they are mutable!):
>
> $ export SOMETOOL_HOME = another path I want to try out
> $ ./gradlew run --> ... must update env vars again
>
> So, to work around the problem that Java doesn't provide a way to mutate
> the current environment variables, we perform a JNI call to do it. *Then*
> we need to mutate the "immutable view" that Java provides, or all Java code
> is going to get wrong information, and we would never succeed. Note that
> having to resort on JNI to mutate the environment is not the problem. We
> can live with that. Once can think it's a bad idea to allow mutation, in
> practice, it is necessary, but I reckon it could be a bad idea to have an
> API for this. The *real* issue is that `System.getenv` does *not* return
> the real values, because it's an immutable view of what existed at the VM
> startup.
>
> Note that it's not recommended to mutate environment variables in a
> multi-threaded environment. But in practice, you can assimilate what we do
> to the VM startup: we get environment variables, mutate them, _then_ the
> build runs (and only at that moment we would effectively be
> multi-threaded). We can live with potential issues of mutating the
> environment: the benefits outperforms the drawbacks by orders of magnitude.
> When the daemon is activated, we're not talking about 10% faster builds.
> They can effectively be 3, 5 or 10x faster!
>
> Now, back to the problem with JDK 9:
>
> - first, the implementation has changed. But we could adapt our code.
> - second, calling `setAccessible` is not allowed anymore. And this is a
> MAJOR pita. The problem is that we're trying to access the java base
> module, and reflection on that module is not allowed anymore. There are no
> good solutions for this: we could write a JVM agent, like you suggested,
> that rewrites bytecode. But since we're trying to rewrite a core class, it
> would have to be found on the invocation of `java` command itself, and
> cannot be dynamically loaded. In addition, we would have to add a flag to
> open core classes to rewrite. There are multiple problems with this
> approach: first, we don't know on which environment we run before we're
> started. Gradle can run on JDK 7, 8, 9, ... and we cannot have a startup
> script which tries to infer the version from whatever it finds, this is not
> realistic and would add unacceptable latency on the client side (plus, a
> lot of headaches to support the various environments we support: Linux,
> Mac, Windows, ...). Second, we may not have a hand on the CLI at all. For
> example, if the build runs in Travis CI, Amazon, or whatever cloudish thing
> that spawns its own containers, we cannot tweak the VM arguments. We just
> have to use whatever is there. Last, rewriting bytecode has a cost, that
> you pay at startup.
>
> Again, what we need is that Java just returns the live view of the
> environment variables. Allowing *mutation* is not necessary, technically
> speaking, because we can rely on JNI. However, I reckon that returning an
> immutable view is done for performance reasons. That's why adding a way to
> mutate "the view" would be an acceptable thing I think. No reflection, no
> bytecode rewriting, just give a way to mutate the map that
> `ProcessEnvironment` retains. The advantage of this is that you would not
> effectively mutate the environment variables, so you, on the JVM side,
> would be safe. What you would mutate is the view you are retaining.
> Alternatively, provide us with an API to "refresh" the view. Like,
> `System.getenv(true)`. The benefit would be that you would only have to get
> the new view of environment variables if the user asks for it. And,
> subsequent callers would get the refreshed view, so people using `gettenv`
> in their code would be up-to-date.
>
> I'm really concerned that this problem is not taken seriously. It's a major
> performance problem for the most widely used build tool out there
> (considering all users, from native to Java and Android). Just saying
> "you're doing something nasty" is not a valid answer: we have to do it, and
> do it for good reasons.
>
>
> 2017-05-11 4:50 GMT+02:00 Martin Buchholz <[hidden email]> <[hidden email]>:
>
>
> Since you're OK with doing brain surgery on the JDK and you control the
> entire process, consider controlling your daemon with a bytecode rewriting
> agent (e.g. byteman) that changes the definition of System.getenv etc.
>
> (Whose side are you on Martin?! ...  I confess I also wish for a faster
> gradle ...)
>
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

Mario Torre-5
In reply to this post by David Holmes
2017-05-15 7:14 GMT+02:00 David Holmes <[hidden email]>:
>> There would have to be a caveat on System.getenv(true) if we went that
>> path, that it is up to the user to ensure it is called in as safe a
>> manner as possible having regard to any concurrent updates in their
>> application code and how the environment is managed on a given platform.

If we get a System.setEnv in Java we can synchronise that method
access, either internally or by asking the user to explicitly
synchronise.

That would not protect from direct system calls, but it with a proper
Java API that would be the recommended way to set environment
variables, thus lowering the risk.

The question is if this use case is really that compelling that
justify an official wrapper against set/putEnv.

Cheers,
Mario
--
pgp key: http://subkeys.pgp.net/ PGP Key ID: 80F240CF
Fingerprint: BA39 9666 94EC 8B73 27FA  FC7C 4086 63E3 80F2 40CF

Java Champion - Blog: http://neugens.wordpress.com - Twitter: @neugens
Proud GNU Classpath developer: http://www.classpath.org/
OpenJDK: http://openjdk.java.net/projects/caciocavallo/

Please, support open standards:
http://endsoftpatents.org/
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

Sanne Grinovero
In reply to this post by Cédric Champeau
Hi Cédric,

we use Gradle a lot in our team (Hibernate) and all your efforts to
make it faster are highly appreciated.

I can assure you that during a normal day of work while we might
rebuild a project a few hundred times, I'll rarely have to change
environment variables.

Obviously I don't represent all users, but as you said yourself the
need to change environment variables is both something you have to
support and something which is extremely rare.
So can't you simply keep the daemon around, as long as environment
variables don't change?

For the rare case in which environment variables have been mutated,
you could trigger a full restart.

As a second step, I should mention that on a typical workstation I
might have multiple terminals open, each configured with a different
environment and possibly working on a different project or different
branch - or just testing on a different environment.
It would be even nicer if one could have a different Gradle daemon
instances running for each environment; incidentally this might have
other benefits, like we often need to switch JDK build or even vendor.
I'd be happy to be able to give them some name as identifier.

I think this solution would improve Gradle and allow the JDK to move
on with this quite nice cleanup.

HTH

Thanks,
Sanne


On Tue, May 16, 2017 at 11:05 AM, Cédric Champeau
<[hidden email]> wrote:

> Thanks Peter for your thoughts, but I don't think it's as simple as that.
> As I explained in my original email, there are multiple ways the
> environment variables can be mutated, and it can be done _externally_.
> Typically, during a task execution, a JNI call performed by a random native
> tool could mutate the environment. That's something we, as a build tool,
> have to consider as a use case. It's very unlikely but it can happen. This
> means that for the same classloader, the environment can change. And for
> performance reasons, we cache classloaders between builds, and reuse them
> as much as possible (because classloading is far from being cheap).
>
> 2017-05-14 19:51 GMT+02:00 Peter Levart <[hidden email]>:
>
>> Hi Cedric,
>>
>> Chiming in with my thoughts...
>>
>> It's unfortunate that Gradle plugins or libraries used by plugins use
>> environment variables at all. I wonder who was the first? Did Gradle
>> introduce the feature of passing client environment to the daemon and
>> establishing the view of System.getenv for plugins 1st or did libraries
>> used by plugins happen to use environment variables using System.getenv and
>> Gradle was just kind enough to provide them a dynamic view controlled by
>> client?
>>
>> In the end it doesn't matter. The fact is that System.getenv is part of
>> standard Java API and anyone using it should be aware that by doing so,
>> they are limiting their software to be (re)configured only by spawning new
>> process(es). UNIX environment was not designed to be mutated during the
>> course of a long-running process. Maybe just at process startup/setup when
>> conditions are under control (i.e. a single running thread) but later, the
>> environment is meant to be read only.
>>
>> Maybe there is a solution for Gradle and othert though. But that solution,
>> I think, is not in exposing a "live" view of process environment through
>> System.getenv or new methods to "refresh" the view as you are proposing,
>> since that would only encourage people to mutate the process environment
>> which, as was established on this thread, is not safe in a long-running
>> multi-threaded process, which Java processes usually are. Perhaps the
>> solution is in extending the System.getenv API with ways to provide
>> "custom" views of variables for code that runs in a "container".
>>
>> Gradle is, among other things, a container for plugins. Is Gradle running
>> plugins in a separate ClassLoader? Does it use a separate ClassLoader for
>> each "build" which might bring with it a new set of environment variables
>> from the client? In such a setup, one could provide a separate set of
>> environment variables for each ClassLoader, simply by passing them to the
>> ClassLoader constructor. System.getenv would need to be a @CallerSensitive
>> method which would return caller's ClassLoader view of variables, if any
>> such view was established, or simply the view inherited from the parent
>> ClassLoader.
>>
>> Such would be a "functional" approach, which would keep environment
>> variables immutable, but allow child "contexts" to have different views of
>> them. Such approach would also play well with idioms like:
>>
>> class AbcPlugin {
>>     static final String SOME_SETTING = System.getenv("SOME_SETTING");
>>
>> ...and would also help other containers (such as app servers) support
>> existing libraries that use environment variables to be configured
>> differently in different apps deployed in the same VM process.
>>
>> Just a thought.
>>
>> Regards, Peter
>>
>>
>>
>> On 05/11/2017 09:02 AM, Cédric Champeau wrote:
>>
>> Thanks for the answers, folks, but I think they are kind of missing the
>> point. The fact is that environment variables *are* mutable. Java happens
>> to return an immutable view of the environment variables when the VM was
>> started, which is not the reality. We cannot trust what `System.geteenv`
>> returns. The Javadocs say "current system environment" but it's just not
>> true. The method should be called `getEnvWhenTheVMWasStarted`.
>>
>> We also have the problem that the environment of the client and the daemon
>> differ over time, as I explained in the previous email. And it is pretty
>> easy to understand that _when the build runs_, we need to get the
>> environment variables of the _client_, not the daemon. Imagine something as
>> simple as this:
>>
>> String toolPath = System.getenv('SOMETOOL_HOME')
>>
>> and imagine that this code runs in the daemon. When the daemon is started,
>> there's absolutely no guarantee that this variable is going to exist.
>> However, we know that when we're going to execute the build, this variable
>> *has* to be defined. Obviously, it's going to be done from the CLI:
>>
>> $ export SOMETOOL_HOME = ...
>> $ ./gradlew run ---> connects to the daemon, passes environment variables,
>> mutates those of the daemon, which then executes the build
>>
>> Similarly, from a *single* CLI/terminal, it's very common to change the
>> values of environment variables (because, hey, they are mutable!):
>>
>> $ export SOMETOOL_HOME = another path I want to try out
>> $ ./gradlew run --> ... must update env vars again
>>
>> So, to work around the problem that Java doesn't provide a way to mutate
>> the current environment variables, we perform a JNI call to do it. *Then*
>> we need to mutate the "immutable view" that Java provides, or all Java code
>> is going to get wrong information, and we would never succeed. Note that
>> having to resort on JNI to mutate the environment is not the problem. We
>> can live with that. Once can think it's a bad idea to allow mutation, in
>> practice, it is necessary, but I reckon it could be a bad idea to have an
>> API for this. The *real* issue is that `System.getenv` does *not* return
>> the real values, because it's an immutable view of what existed at the VM
>> startup.
>>
>> Note that it's not recommended to mutate environment variables in a
>> multi-threaded environment. But in practice, you can assimilate what we do
>> to the VM startup: we get environment variables, mutate them, _then_ the
>> build runs (and only at that moment we would effectively be
>> multi-threaded). We can live with potential issues of mutating the
>> environment: the benefits outperforms the drawbacks by orders of magnitude.
>> When the daemon is activated, we're not talking about 10% faster builds.
>> They can effectively be 3, 5 or 10x faster!
>>
>> Now, back to the problem with JDK 9:
>>
>> - first, the implementation has changed. But we could adapt our code.
>> - second, calling `setAccessible` is not allowed anymore. And this is a
>> MAJOR pita. The problem is that we're trying to access the java base
>> module, and reflection on that module is not allowed anymore. There are no
>> good solutions for this: we could write a JVM agent, like you suggested,
>> that rewrites bytecode. But since we're trying to rewrite a core class, it
>> would have to be found on the invocation of `java` command itself, and
>> cannot be dynamically loaded. In addition, we would have to add a flag to
>> open core classes to rewrite. There are multiple problems with this
>> approach: first, we don't know on which environment we run before we're
>> started. Gradle can run on JDK 7, 8, 9, ... and we cannot have a startup
>> script which tries to infer the version from whatever it finds, this is not
>> realistic and would add unacceptable latency on the client side (plus, a
>> lot of headaches to support the various environments we support: Linux,
>> Mac, Windows, ...). Second, we may not have a hand on the CLI at all. For
>> example, if the build runs in Travis CI, Amazon, or whatever cloudish thing
>> that spawns its own containers, we cannot tweak the VM arguments. We just
>> have to use whatever is there. Last, rewriting bytecode has a cost, that
>> you pay at startup.
>>
>> Again, what we need is that Java just returns the live view of the
>> environment variables. Allowing *mutation* is not necessary, technically
>> speaking, because we can rely on JNI. However, I reckon that returning an
>> immutable view is done for performance reasons. That's why adding a way to
>> mutate "the view" would be an acceptable thing I think. No reflection, no
>> bytecode rewriting, just give a way to mutate the map that
>> `ProcessEnvironment` retains. The advantage of this is that you would not
>> effectively mutate the environment variables, so you, on the JVM side,
>> would be safe. What you would mutate is the view you are retaining.
>> Alternatively, provide us with an API to "refresh" the view. Like,
>> `System.getenv(true)`. The benefit would be that you would only have to get
>> the new view of environment variables if the user asks for it. And,
>> subsequent callers would get the refreshed view, so people using `gettenv`
>> in their code would be up-to-date.
>>
>> I'm really concerned that this problem is not taken seriously. It's a major
>> performance problem for the most widely used build tool out there
>> (considering all users, from native to Java and Android). Just saying
>> "you're doing something nasty" is not a valid answer: we have to do it, and
>> do it for good reasons.
>>
>>
>> 2017-05-11 4:50 GMT+02:00 Martin Buchholz <[hidden email]> <[hidden email]>:
>>
>>
>> Since you're OK with doing brain surgery on the JDK and you control the
>> entire process, consider controlling your daemon with a bytecode rewriting
>> agent (e.g. byteman) that changes the definition of System.getenv etc.
>>
>> (Whose side are you on Martin?! ...  I confess I also wish for a faster
>> gradle ...)
>>
>>
>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

Cédric Champeau
2017-05-16 18:57 GMT+02:00 Sanne Grinovero <[hidden email]>:

> Hi Cédric,
>
> we use Gradle a lot in our team (Hibernate) and all your efforts to
> make it faster are highly appreciated.
>
> I can assure you that during a normal day of work while we might
> rebuild a project a few hundred times, I'll rarely have to change
> environment variables.
>
> Obviously I don't represent all users, but as you said yourself the
> need to change environment variables is both something you have to
> support and something which is extremely rare.

So can't you simply keep the daemon around, as long as environment
> variables don't change?
>

That's one of the suggestions in the original email, but as you can see,
it's not uncommon to have environment variables mutated "behind you back",
even in a simple command line. Sometimes, it's as easy as having 2
different terminals open (which I think is far from being a rare case):
then you cannot share the same daemon instance, which is either very bad
for performance or memory usage. Why? Because the open terminal has some
specific variables (WINDOWID, ...).


>
> For the rare case in which environment variables have been mutated,
> you could trigger a full restart.
>
> As a second step, I should mention that on a typical workstation I
> might have multiple terminals open, each configured with a different
> environment and possibly working on a different project or different
> branch - or just testing on a different environment.
> It would be even nicer if one could have a different Gradle daemon
> instances running for each environment; incidentally this might have
> other benefits, like we often need to switch JDK build or even vendor.
> I'd be happy to be able to give them some name as identifier.
>
>
That's something we consider, but it's not the same use case at all (my
typical setup is pretty much the opposite: a lot of terminals open for the
same environment/build), so it has to be supported differently. I don't
think we should mix this in the game.


> I think this solution would improve Gradle and allow the JDK to move
> on with this quite nice cleanup.
>
> HTH
>
> Thanks,
> Sanne
>
>
> On Tue, May 16, 2017 at 11:05 AM, Cédric Champeau
> <[hidden email]> wrote:
> > Thanks Peter for your thoughts, but I don't think it's as simple as that.
> > As I explained in my original email, there are multiple ways the
> > environment variables can be mutated, and it can be done _externally_.
> > Typically, during a task execution, a JNI call performed by a random
> native
> > tool could mutate the environment. That's something we, as a build tool,
> > have to consider as a use case. It's very unlikely but it can happen.
> This
> > means that for the same classloader, the environment can change. And for
> > performance reasons, we cache classloaders between builds, and reuse them
> > as much as possible (because classloading is far from being cheap).
> >
> > 2017-05-14 19:51 GMT+02:00 Peter Levart <[hidden email]>:
> >
> >> Hi Cedric,
> >>
> >> Chiming in with my thoughts...
> >>
> >> It's unfortunate that Gradle plugins or libraries used by plugins use
> >> environment variables at all. I wonder who was the first? Did Gradle
> >> introduce the feature of passing client environment to the daemon and
> >> establishing the view of System.getenv for plugins 1st or did libraries
> >> used by plugins happen to use environment variables using System.getenv
> and
> >> Gradle was just kind enough to provide them a dynamic view controlled by
> >> client?
> >>
> >> In the end it doesn't matter. The fact is that System.getenv is part of
> >> standard Java API and anyone using it should be aware that by doing so,
> >> they are limiting their software to be (re)configured only by spawning
> new
> >> process(es). UNIX environment was not designed to be mutated during the
> >> course of a long-running process. Maybe just at process startup/setup
> when
> >> conditions are under control (i.e. a single running thread) but later,
> the
> >> environment is meant to be read only.
> >>
> >> Maybe there is a solution for Gradle and othert though. But that
> solution,
> >> I think, is not in exposing a "live" view of process environment through
> >> System.getenv or new methods to "refresh" the view as you are proposing,
> >> since that would only encourage people to mutate the process environment
> >> which, as was established on this thread, is not safe in a long-running
> >> multi-threaded process, which Java processes usually are. Perhaps the
> >> solution is in extending the System.getenv API with ways to provide
> >> "custom" views of variables for code that runs in a "container".
> >>
> >> Gradle is, among other things, a container for plugins. Is Gradle
> running
> >> plugins in a separate ClassLoader? Does it use a separate ClassLoader
> for
> >> each "build" which might bring with it a new set of environment
> variables
> >> from the client? In such a setup, one could provide a separate set of
> >> environment variables for each ClassLoader, simply by passing them to
> the
> >> ClassLoader constructor. System.getenv would need to be a
> @CallerSensitive
> >> method which would return caller's ClassLoader view of variables, if any
> >> such view was established, or simply the view inherited from the parent
> >> ClassLoader.
> >>
> >> Such would be a "functional" approach, which would keep environment
> >> variables immutable, but allow child "contexts" to have different views
> of
> >> them. Such approach would also play well with idioms like:
> >>
> >> class AbcPlugin {
> >>     static final String SOME_SETTING = System.getenv("SOME_SETTING");
> >>
> >> ...and would also help other containers (such as app servers) support
> >> existing libraries that use environment variables to be configured
> >> differently in different apps deployed in the same VM process.
> >>
> >> Just a thought.
> >>
> >> Regards, Peter
> >>
> >>
> >>
> >> On 05/11/2017 09:02 AM, Cédric Champeau wrote:
> >>
> >> Thanks for the answers, folks, but I think they are kind of missing the
> >> point. The fact is that environment variables *are* mutable. Java
> happens
> >> to return an immutable view of the environment variables when the VM was
> >> started, which is not the reality. We cannot trust what `System.geteenv`
> >> returns. The Javadocs say "current system environment" but it's just not
> >> true. The method should be called `getEnvWhenTheVMWasStarted`.
> >>
> >> We also have the problem that the environment of the client and the
> daemon
> >> differ over time, as I explained in the previous email. And it is pretty
> >> easy to understand that _when the build runs_, we need to get the
> >> environment variables of the _client_, not the daemon. Imagine
> something as
> >> simple as this:
> >>
> >> String toolPath = System.getenv('SOMETOOL_HOME')
> >>
> >> and imagine that this code runs in the daemon. When the daemon is
> started,
> >> there's absolutely no guarantee that this variable is going to exist.
> >> However, we know that when we're going to execute the build, this
> variable
> >> *has* to be defined. Obviously, it's going to be done from the CLI:
> >>
> >> $ export SOMETOOL_HOME = ...
> >> $ ./gradlew run ---> connects to the daemon, passes environment
> variables,
> >> mutates those of the daemon, which then executes the build
> >>
> >> Similarly, from a *single* CLI/terminal, it's very common to change the
> >> values of environment variables (because, hey, they are mutable!):
> >>
> >> $ export SOMETOOL_HOME = another path I want to try out
> >> $ ./gradlew run --> ... must update env vars again
> >>
> >> So, to work around the problem that Java doesn't provide a way to mutate
> >> the current environment variables, we perform a JNI call to do it.
> *Then*
> >> we need to mutate the "immutable view" that Java provides, or all Java
> code
> >> is going to get wrong information, and we would never succeed. Note that
> >> having to resort on JNI to mutate the environment is not the problem. We
> >> can live with that. Once can think it's a bad idea to allow mutation, in
> >> practice, it is necessary, but I reckon it could be a bad idea to have
> an
> >> API for this. The *real* issue is that `System.getenv` does *not* return
> >> the real values, because it's an immutable view of what existed at the
> VM
> >> startup.
> >>
> >> Note that it's not recommended to mutate environment variables in a
> >> multi-threaded environment. But in practice, you can assimilate what we
> do
> >> to the VM startup: we get environment variables, mutate them, _then_ the
> >> build runs (and only at that moment we would effectively be
> >> multi-threaded). We can live with potential issues of mutating the
> >> environment: the benefits outperforms the drawbacks by orders of
> magnitude.
> >> When the daemon is activated, we're not talking about 10% faster builds.
> >> They can effectively be 3, 5 or 10x faster!
> >>
> >> Now, back to the problem with JDK 9:
> >>
> >> - first, the implementation has changed. But we could adapt our code.
> >> - second, calling `setAccessible` is not allowed anymore. And this is a
> >> MAJOR pita. The problem is that we're trying to access the java base
> >> module, and reflection on that module is not allowed anymore. There are
> no
> >> good solutions for this: we could write a JVM agent, like you suggested,
> >> that rewrites bytecode. But since we're trying to rewrite a core class,
> it
> >> would have to be found on the invocation of `java` command itself, and
> >> cannot be dynamically loaded. In addition, we would have to add a flag
> to
> >> open core classes to rewrite. There are multiple problems with this
> >> approach: first, we don't know on which environment we run before we're
> >> started. Gradle can run on JDK 7, 8, 9, ... and we cannot have a startup
> >> script which tries to infer the version from whatever it finds, this is
> not
> >> realistic and would add unacceptable latency on the client side (plus, a
> >> lot of headaches to support the various environments we support: Linux,
> >> Mac, Windows, ...). Second, we may not have a hand on the CLI at all.
> For
> >> example, if the build runs in Travis CI, Amazon, or whatever cloudish
> thing
> >> that spawns its own containers, we cannot tweak the VM arguments. We
> just
> >> have to use whatever is there. Last, rewriting bytecode has a cost, that
> >> you pay at startup.
> >>
> >> Again, what we need is that Java just returns the live view of the
> >> environment variables. Allowing *mutation* is not necessary, technically
> >> speaking, because we can rely on JNI. However, I reckon that returning
> an
> >> immutable view is done for performance reasons. That's why adding a way
> to
> >> mutate "the view" would be an acceptable thing I think. No reflection,
> no
> >> bytecode rewriting, just give a way to mutate the map that
> >> `ProcessEnvironment` retains. The advantage of this is that you would
> not
> >> effectively mutate the environment variables, so you, on the JVM side,
> >> would be safe. What you would mutate is the view you are retaining.
> >> Alternatively, provide us with an API to "refresh" the view. Like,
> >> `System.getenv(true)`. The benefit would be that you would only have to
> get
> >> the new view of environment variables if the user asks for it. And,
> >> subsequent callers would get the refreshed view, so people using
> `gettenv`
> >> in their code would be up-to-date.
> >>
> >> I'm really concerned that this problem is not taken seriously. It's a
> major
> >> performance problem for the most widely used build tool out there
> >> (considering all users, from native to Java and Android). Just saying
> >> "you're doing something nasty" is not a valid answer: we have to do it,
> and
> >> do it for good reasons.
> >>
> >>
> >> 2017-05-11 4:50 GMT+02:00 Martin Buchholz <[hidden email]> <
> [hidden email]>:
> >>
> >>
> >> Since you're OK with doing brain surgery on the JDK and you control the
> >> entire process, consider controlling your daemon with a bytecode
> rewriting
> >> agent (e.g. byteman) that changes the definition of System.getenv etc.
> >>
> >> (Whose side are you on Martin?! ...  I confess I also wish for a faster
> >> gradle ...)
> >>
> >>
> >>
> >>
>
Reply | Threaded
Open this post in threaded view
|

RE: Getting a live view of environment variables (Gradle and JDK 9)

Uwe Schindler-4
In reply to this post by Cédric Champeau
Hi,

I still don't understand why you need to change the environment variables of the actual process. I was talking with Rémi about that last week, and it's not obvious to me why you need that. Sure, it is easier to change the environment of the actual process and then spawn a child process for some non-java build tool like GCC or even another standalone java program with option fork=true. But: Why not use the ProcessBuilder API when spawning those childs? So you just add an "official" build API inside Gradle (an official one, documented that allows Gradle plugins to modify the environment variables for the current build running). If you missed to add such an API from the beginning (IMHO its essential for a build system like Gradle), then you now have to tell your users: "Sorry we did something wrong and all our (bad) hacks to allow you to change environment variables are no longer working in the future, so please change your code. We are so sorry!"

If some plugin is not using that new API, then it's not your problem. You document that you *have* to use the Environment API, because plugins cannot rely on the fact that another plugin or maybe another build running at a later time is changing the Gradle process environment.

At some point you have to break backwards compatibility and tell users: Please update your code, otherwise plugin won't work anymore with Gradle version x.y (the one that's compatible to Java 9).

Uwe

-----
Uwe Schindler
[hidden email]
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/

> -----Original Message-----
> From: core-libs-dev [mailto:[hidden email]] On
> Behalf Of Cédric Champeau
> Sent: Thursday, May 11, 2017 9:02 AM
> To: Martin Buchholz <[hidden email]>
> Cc: core-libs-dev <[hidden email]>
> Subject: Re: Getting a live view of environment variables (Gradle and JDK 9)
>
> Thanks for the answers, folks, but I think they are kind of missing the
> point. The fact is that environment variables *are* mutable. Java happens
> to return an immutable view of the environment variables when the VM was
> started, which is not the reality. We cannot trust what `System.geteenv`
> returns. The Javadocs say "current system environment" but it's just not
> true. The method should be called `getEnvWhenTheVMWasStarted`.
>
> We also have the problem that the environment of the client and the
> daemon
> differ over time, as I explained in the previous email. And it is pretty
> easy to understand that _when the build runs_, we need to get the
> environment variables of the _client_, not the daemon. Imagine something
> as
> simple as this:
>
> String toolPath = System.getenv('SOMETOOL_HOME')
>
> and imagine that this code runs in the daemon. When the daemon is started,
> there's absolutely no guarantee that this variable is going to exist.
> However, we know that when we're going to execute the build, this variable
> *has* to be defined. Obviously, it's going to be done from the CLI:
>
> $ export SOMETOOL_HOME = ...
> $ ./gradlew run ---> connects to the daemon, passes environment variables,
> mutates those of the daemon, which then executes the build
>
> Similarly, from a *single* CLI/terminal, it's very common to change the
> values of environment variables (because, hey, they are mutable!):
>
> $ export SOMETOOL_HOME = another path I want to try out
> $ ./gradlew run --> ... must update env vars again
>
> So, to work around the problem that Java doesn't provide a way to mutate
> the current environment variables, we perform a JNI call to do it. *Then*
> we need to mutate the "immutable view" that Java provides, or all Java code
> is going to get wrong information, and we would never succeed. Note that
> having to resort on JNI to mutate the environment is not the problem. We
> can live with that. Once can think it's a bad idea to allow mutation, in
> practice, it is necessary, but I reckon it could be a bad idea to have an
> API for this. The *real* issue is that `System.getenv` does *not* return
> the real values, because it's an immutable view of what existed at the VM
> startup.
>
> Note that it's not recommended to mutate environment variables in a
> multi-threaded environment. But in practice, you can assimilate what we do
> to the VM startup: we get environment variables, mutate them, _then_ the
> build runs (and only at that moment we would effectively be
> multi-threaded). We can live with potential issues of mutating the
> environment: the benefits outperforms the drawbacks by orders of
> magnitude.
> When the daemon is activated, we're not talking about 10% faster builds.
> They can effectively be 3, 5 or 10x faster!
>
> Now, back to the problem with JDK 9:
>
> - first, the implementation has changed. But we could adapt our code.
> - second, calling `setAccessible` is not allowed anymore. And this is a
> MAJOR pita. The problem is that we're trying to access the java base
> module, and reflection on that module is not allowed anymore. There are no
> good solutions for this: we could write a JVM agent, like you suggested,
> that rewrites bytecode. But since we're trying to rewrite a core class, it
> would have to be found on the invocation of `java` command itself, and
> cannot be dynamically loaded. In addition, we would have to add a flag to
> open core classes to rewrite. There are multiple problems with this
> approach: first, we don't know on which environment we run before we're
> started. Gradle can run on JDK 7, 8, 9, ... and we cannot have a startup
> script which tries to infer the version from whatever it finds, this is not
> realistic and would add unacceptable latency on the client side (plus, a
> lot of headaches to support the various environments we support: Linux,
> Mac, Windows, ...). Second, we may not have a hand on the CLI at all. For
> example, if the build runs in Travis CI, Amazon, or whatever cloudish thing
> that spawns its own containers, we cannot tweak the VM arguments. We just
> have to use whatever is there. Last, rewriting bytecode has a cost, that
> you pay at startup.
>
> Again, what we need is that Java just returns the live view of the
> environment variables. Allowing *mutation* is not necessary, technically
> speaking, because we can rely on JNI. However, I reckon that returning an
> immutable view is done for performance reasons. That's why adding a way
> to
> mutate "the view" would be an acceptable thing I think. No reflection, no
> bytecode rewriting, just give a way to mutate the map that
> `ProcessEnvironment` retains. The advantage of this is that you would not
> effectively mutate the environment variables, so you, on the JVM side,
> would be safe. What you would mutate is the view you are retaining.
> Alternatively, provide us with an API to "refresh" the view. Like,
> `System.getenv(true)`. The benefit would be that you would only have to get
> the new view of environment variables if the user asks for it. And,
> subsequent callers would get the refreshed view, so people using `gettenv`
> in their code would be up-to-date.
>
> I'm really concerned that this problem is not taken seriously. It's a major
> performance problem for the most widely used build tool out there
> (considering all users, from native to Java and Android). Just saying
> "you're doing something nasty" is not a valid answer: we have to do it, and
> do it for good reasons.
>
>
> 2017-05-11 4:50 GMT+02:00 Martin Buchholz <[hidden email]>:
>
> > Since you're OK with doing brain surgery on the JDK and you control the
> > entire process, consider controlling your daemon with a bytecode rewriting
> > agent (e.g. byteman) that changes the definition of System.getenv etc.
> >
> > (Whose side are you on Martin?! ...  I confess I also wish for a faster
> > gradle ...)
> >

Reply | Threaded
Open this post in threaded view
|

Re: Getting a live view of environment variables (Gradle and JDK 9)

Cédric Champeau
Hi Uwe,

I already explained multiple times why we need this. Short answer: because
we must. Slightly longer answer: because the build environment, the daemon,
has to reflect the environment from the CLI (or the IDE, in case we call
using the tooling API), at the moment the build is executed. Why? Because a
user wouldn't understand that a script which does:

println System.getenv('MY_VAR')

doesn't print "foo" after doing:

MY_VAR=foo gradle printVar

(that's a silly example, but the simplest we can come with). Not everything
runs in a separate process: there are things that execute in the daemon
itself. A lot of things (starting from build scripts). And yes, we can
provide a different API to get environment variables, but:

1. it's a breaking change
2. there are lots of plugins which use libraries, which do NOT depend on
the Gradle API, so that API wouldn't be available

I'm honestly starting to get tired of explaining again and again WHY we
need this. If it was easy, we would have done it differently for years. We
worked around a bug in the JDK, which doesn't return the true environment
variables but the ones snapshotted when the process started. Now in JDK 9
we cannot workaround anymore, which either just kills Gradle performance or
forces us to write even nastier code with bytecode manipulating agents to
replace what `System.getenv` does.

2017-05-16 19:46 GMT+02:00 Uwe Schindler <[hidden email]>:

> Hi,
>
> I still don't understand why you need to change the environment variables
> of the actual process. I was talking with Rémi about that last week, and
> it's not obvious to me why you need that. Sure, it is easier to change the
> environment of the actual process and then spawn a child process for some
> non-java build tool like GCC or even another standalone java program with
> option fork=true. But: Why not use the ProcessBuilder API when spawning
> those childs? So you just add an "official" build API inside Gradle (an
> official one, documented that allows Gradle plugins to modify the
> environment variables for the current build running). If you missed to add
> such an API from the beginning (IMHO its essential for a build system like
> Gradle), then you now have to tell your users: "Sorry we did something
> wrong and all our (bad) hacks to allow you to change environment variables
> are no longer working in the future, so please change your code. We are so
> sorry!"
>
> If some plugin is not using that new API, then it's not your problem. You
> document that you *have* to use the Environment API, because plugins cannot
> rely on the fact that another plugin or maybe another build running at a
> later time is changing the Gradle process environment.
>
> At some point you have to break backwards compatibility and tell users:
> Please update your code, otherwise plugin won't work anymore with Gradle
> version x.y (the one that's compatible to Java 9).
>
> Uwe
>
> -----
> Uwe Schindler
> [hidden email]
> ASF Member, Apache Lucene PMC / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
> > -----Original Message-----
> > From: core-libs-dev [mailto:[hidden email]] On
> > Behalf Of Cédric Champeau
> > Sent: Thursday, May 11, 2017 9:02 AM
> > To: Martin Buchholz <[hidden email]>
> > Cc: core-libs-dev <[hidden email]>
> > Subject: Re: Getting a live view of environment variables (Gradle and
> JDK 9)
> >
> > Thanks for the answers, folks, but I think they are kind of missing the
> > point. The fact is that environment variables *are* mutable. Java happens
> > to return an immutable view of the environment variables when the VM was
> > started, which is not the reality. We cannot trust what `System.geteenv`
> > returns. The Javadocs say "current system environment" but it's just not
> > true. The method should be called `getEnvWhenTheVMWasStarted`.
> >
> > We also have the problem that the environment of the client and the
> > daemon
> > differ over time, as I explained in the previous email. And it is pretty
> > easy to understand that _when the build runs_, we need to get the
> > environment variables of the _client_, not the daemon. Imagine something
> > as
> > simple as this:
> >
> > String toolPath = System.getenv('SOMETOOL_HOME')
> >
> > and imagine that this code runs in the daemon. When the daemon is
> started,
> > there's absolutely no guarantee that this variable is going to exist.
> > However, we know that when we're going to execute the build, this
> variable
> > *has* to be defined. Obviously, it's going to be done from the CLI:
> >
> > $ export SOMETOOL_HOME = ...
> > $ ./gradlew run ---> connects to the daemon, passes environment
> variables,
> > mutates those of the daemon, which then executes the build
> >
> > Similarly, from a *single* CLI/terminal, it's very common to change the
> > values of environment variables (because, hey, they are mutable!):
> >
> > $ export SOMETOOL_HOME = another path I want to try out
> > $ ./gradlew run --> ... must update env vars again
> >
> > So, to work around the problem that Java doesn't provide a way to mutate
> > the current environment variables, we perform a JNI call to do it. *Then*
> > we need to mutate the "immutable view" that Java provides, or all Java
> code
> > is going to get wrong information, and we would never succeed. Note that
> > having to resort on JNI to mutate the environment is not the problem. We
> > can live with that. Once can think it's a bad idea to allow mutation, in
> > practice, it is necessary, but I reckon it could be a bad idea to have an
> > API for this. The *real* issue is that `System.getenv` does *not* return
> > the real values, because it's an immutable view of what existed at the VM
> > startup.
> >
> > Note that it's not recommended to mutate environment variables in a
> > multi-threaded environment. But in practice, you can assimilate what we
> do
> > to the VM startup: we get environment variables, mutate them, _then_ the
> > build runs (and only at that moment we would effectively be
> > multi-threaded). We can live with potential issues of mutating the
> > environment: the benefits outperforms the drawbacks by orders of
> > magnitude.
> > When the daemon is activated, we're not talking about 10% faster builds.
> > They can effectively be 3, 5 or 10x faster!
> >
> > Now, back to the problem with JDK 9:
> >
> > - first, the implementation has changed. But we could adapt our code.
> > - second, calling `setAccessible` is not allowed anymore. And this is a
> > MAJOR pita. The problem is that we're trying to access the java base
> > module, and reflection on that module is not allowed anymore. There are
> no
> > good solutions for this: we could write a JVM agent, like you suggested,
> > that rewrites bytecode. But since we're trying to rewrite a core class,
> it
> > would have to be found on the invocation of `java` command itself, and
> > cannot be dynamically loaded. In addition, we would have to add a flag to
> > open core classes to rewrite. There are multiple problems with this
> > approach: first, we don't know on which environment we run before we're
> > started. Gradle can run on JDK 7, 8, 9, ... and we cannot have a startup
> > script which tries to infer the version from whatever it finds, this is
> not
> > realistic and would add unacceptable latency on the client side (plus, a
> > lot of headaches to support the various environments we support: Linux,
> > Mac, Windows, ...). Second, we may not have a hand on the CLI at all. For
> > example, if the build runs in Travis CI, Amazon, or whatever cloudish
> thing
> > that spawns its own containers, we cannot tweak the VM arguments. We just
> > have to use whatever is there. Last, rewriting bytecode has a cost, that
> > you pay at startup.
> >
> > Again, what we need is that Java just returns the live view of the
> > environment variables. Allowing *mutation* is not necessary, technically
> > speaking, because we can rely on JNI. However, I reckon that returning an
> > immutable view is done for performance reasons. That's why adding a way
> > to
> > mutate "the view" would be an acceptable thing I think. No reflection, no
> > bytecode rewriting, just give a way to mutate the map that
> > `ProcessEnvironment` retains. The advantage of this is that you would not
> > effectively mutate the environment variables, so you, on the JVM side,
> > would be safe. What you would mutate is the view you are retaining.
> > Alternatively, provide us with an API to "refresh" the view. Like,
> > `System.getenv(true)`. The benefit would be that you would only have to
> get
> > the new view of environment variables if the user asks for it. And,
> > subsequent callers would get the refreshed view, so people using
> `gettenv`
> > in their code would be up-to-date.
> >
> > I'm really concerned that this problem is not taken seriously. It's a
> major
> > performance problem for the most widely used build tool out there
> > (considering all users, from native to Java and Android). Just saying
> > "you're doing something nasty" is not a valid answer: we have to do it,
> and
> > do it for good reasons.
> >
> >
> > 2017-05-11 4:50 GMT+02:00 Martin Buchholz <[hidden email]>:
> >
> > > Since you're OK with doing brain surgery on the JDK and you control the
> > > entire process, consider controlling your daemon with a bytecode
> rewriting
> > > agent (e.g. byteman) that changes the definition of System.getenv etc.
> > >
> > > (Whose side are you on Martin?! ...  I confess I also wish for a faster
> > > gradle ...)
> > >
>
>
123