Type inference across method chains

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Type inference across method chains

Luke Hutchison
I have been trying to figure out why type inference doesn't work across method chaining (see the code example below). I finally came across these old messages on lambda-dev:

On Tue Jul 16 15:08:05 PDT 2013, Dan Smith wrote:
> Without ruling out the possibility of enhancements that address situations like this, [...] the status quo is that when you type a dot, the compiler has to completely type-check the stuff to the left before it can proceed.  Inference can do a lot of cool tricks with context, but will not be influenced by "context" that occurs after the dot.

On Wed Jul 24 12:41:03 PDT 2013, John Rose wrote:
> The reason I'm wildly waving this flag is that (I think) I have seen this phenomenon active during the design of streams APIs, ongoing now.

Method chaining really is getting much more common in Java, with the advent of APIs like Java 8 streaming, Apache Spark and Apache Flink.

Type inference across chained calls in Flink seems to work sometimes, but not always, which is frustrating. You end up having to add type declarations in lots of places that shouldn't need them, or breaking method chains into separate declarations so that the compiler can understand the types.

Based on Dan Smith's comments in the above thread, it seems that it was decided that the numerous complex steps involved in receiver method resolution had to all be completed before parameter type inference. However, it seems that if the entire graph of all possible method signatures and all type constraints were built before any inference were performed, then all of these steps could be performed simultaneously and iteratively until convergence: in other words, all the types and methods in a method call chain could be solved as a single constraint satisfaction problem. If in the process of solving this constraint satisfaction problem, it was discovered that there is no single satisfying assignment of types to type variables, and/or if this did not result in a single method definition being selected per method name, then a type error would be reported.

Based on the discussion in the thread above, it sounds like the reason this hasn't been done yet is due to the complexity of the current compiler design, not because it's not actually possible to do this?

Some example code: the method test1() below really should typecheck OK, but gives an error on (l + 1) in Java 8. Providing types for the lambda params in test2() allows the types to propagate one method call further along the chain, but it is frustrating to always have to do that. Breaking up the chaining into two declarations in test3() allows the typechecking to work for both operations, but this defeats the purpose.


// Compile with these Maven dependencies:

package com.rentlogic.buildingscores.flink;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class TestMethodChainInference {
    private static ExecutionEnvironment env = 
            ExecutionEnvironment.getExecutionEnvironment();

    private static DataSet<String> strs = env.fromElements("a", "xy", "pqr");

    // Type of l resolved as Object, so (l + 1) gives a type error
    public static void test1(String[] args) {
        DataSet<Integer> strLenPlusOne = strs  
            .flatMap((s, out) -> out.collect(s.length()))
            .flatMap((l, out) -> out.collect(l + 1));     // ERROR
    }

    // No error
    public static void test2(String[] args) {
        DataSet<Integer> strLenPlusOne = strs
            .flatMap((String s, Collector<Integer> out)-> out.collect(s.length()))
            .flatMap((l, out) -> out.collect(l + 1));
    }

    // No error
    public static void test3(String[] args) {
        DataSet<Integer> strLen = strs
            .flatMap((s, out) -> out.collect(s.length()));
        DataSet<Integer> strLenPlusOne = strLen
            .flatMap((l, out) -> out.collect(l + 1));
    }
}





Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Type inference across method chains

Brian Goetz-2
Its a pretty big overstatement to say "type inference doesn't work across method chaining."  As is inevitable with type inference, there will be limits, and people will always complain about the marginal example, and say "but this example could be accommodated too, with a more complex inference algorithm."  And that's usually correct, but then there will be new cases at the margin, about which people will complain.  So let's not fool ourselves that this is "the last case" -- it's just "the next case." 

There's nothing magic about type inference; it's just constraint solving.  The choices one has in designing a type inference algorithm amount to: what constraints to use, and over what scope to solve. 

There are also a number of hidden trade-offs besides complexity.  For example, the larger the scope over which constraints are solved, the more likely error messages will be something like "there was an error somewhere in Foo.java -- which users find unhelpful.

So, with that, let's look at your example.  The cases which make people complain "type inference doesn't work under chaining" invariably reduce to the case where there is a generic method used as a receiver for a chained invocation, and whose generic type parameters cannot be solved directly from its arguments, but instead would need type information that would have to be back-propagated from later links in the chain (which generally further require information from the target type.) 

As you correctly point out, there are simple workarounds, such as using explicit lambdas or exact method refs (test2), breaking up the chain (test3), or providing explicit type witnesses (not shown.) 

Type inference across chained calls in Flink seems to work sometimes, but not always, which is frustrating.

"sometimes but not always" is a fact of life with type inference; let's not pretend that "always" is a place we could get to.  It's just a question of where to draw the line, and accept that there will always be something just over the line. 

Method chaining really is getting much more common in Java, with the advent of APIs like Java 8 streaming, Apache Spark and Apache Flink.


You say this like you think we're not aware of this, or indeed that enabling broader use of chaining in APIs wasn't, like, one of the major goals of the work we did in Java 8 on type inference!  The reality is: there will always be limitations, and they will always be annoying when you encounter them. 


Based on the discussion in the thread above, it sounds like the reason this hasn't been done yet is due to the complexity of the current compiler design, not because it's not actually possible to do this?


It is indeed not impossible.  But, as you say, it adds additional complexity to a type inference algorithm that has already grown quite complex (and which now interacts heavily with another complex mechanism, overload selection).  Could we take one more step, and move the corner cases further out to the margins, at the cost of greater complexity (and perhaps decreased usability)?  Yes, we could.  In Java 8, we chose to leave it here, which I think was a sensible choice. 

We might someday come back to look at this -- but right now, it seems like this is not the highest-return use of limited and highly specialized resources. 


On 3/16/2017 10:55 PM, Luke Hutchison wrote:
I have been trying to figure out why type inference doesn't work across method chaining (see the code example below). I finally came across these old messages on lambda-dev:

On Tue Jul 16 15:08:05 PDT 2013, Dan Smith wrote:
> Without ruling out the possibility of enhancements that address situations like this, [...] the status quo is that when you type a dot, the compiler has to completely type-check the stuff to the left before it can proceed.  Inference can do a lot of cool tricks with context, but will not be influenced by "context" that occurs after the dot.

On Wed Jul 24 12:41:03 PDT 2013, John Rose wrote:
> The reason I'm wildly waving this flag is that (I think) I have seen this phenomenon active during the design of streams APIs, ongoing now.

Method chaining really is getting much more common in Java, with the advent of APIs like Java 8 streaming, Apache Spark and Apache Flink.

Type inference across chained calls in Flink seems to work sometimes, but not always, which is frustrating. You end up having to add type declarations in lots of places that shouldn't need them, or breaking method chains into separate declarations so that the compiler can understand the types.

Based on Dan Smith's comments in the above thread, it seems that it was decided that the numerous complex steps involved in receiver method resolution had to all be completed before parameter type inference. However, it seems that if the entire graph of all possible method signatures and all type constraints were built before any inference were performed, then all of these steps could be performed simultaneously and iteratively until convergence: in other words, all the types and methods in a method call chain could be solved as a single constraint satisfaction problem. If in the process of solving this constraint satisfaction problem, it was discovered that there is no single satisfying assignment of types to type variables, and/or if this did not result in a single method definition being selected per method name, then a type error would be reported.

Based on the discussion in the thread above, it sounds like the reason this hasn't been done yet is due to the complexity of the current compiler design, not because it's not actually possible to do this?

Some example code: the method test1() below really should typecheck OK, but gives an error on (l + 1) in Java 8. Providing types for the lambda params in test2() allows the types to propagate one method call further along the chain, but it is frustrating to always have to do that. Breaking up the chaining into two declarations in test3() allows the typechecking to work for both operations, but this defeats the purpose.


// Compile with these Maven dependencies:

package com.rentlogic.buildingscores.flink;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class TestMethodChainInference {
    private static ExecutionEnvironment env = 
            ExecutionEnvironment.getExecutionEnvironment();

    private static DataSet<String> strs = env.fromElements("a", "xy", "pqr");

    // Type of l resolved as Object, so (l + 1) gives a type error
    public static void test1(String[] args) {
        DataSet<Integer> strLenPlusOne = strs  
            .flatMap((s, out) -> out.collect(s.length()))
            .flatMap((l, out) -> out.collect(l + 1));     // ERROR
    }

    // No error
    public static void test2(String[] args) {
        DataSet<Integer> strLenPlusOne = strs
            .flatMap((String s, Collector<Integer> out)-> out.collect(s.length()))
            .flatMap((l, out) -> out.collect(l + 1));
    }

    // No error
    public static void test3(String[] args) {
        DataSet<Integer> strLen = strs
            .flatMap((s, out) -> out.collect(s.length()));
        DataSet<Integer> strLenPlusOne = strLen
            .flatMap((l, out) -> out.collect(l + 1));
    }
}






Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Type inference across method chains

Luke Hutchison
On Fri, Mar 17, 2017 at 1:19 PM, Brian Goetz <[hidden email]> wrote:
So let's not fool ourselves that this is "the last case" -- it's just "the next case." 
[...]
So, with that, let's look at your example.  The cases which make people complain "type inference doesn't work under chaining" invariably reduce to the case where there is a generic method used as a receiver for a chained invocation, and whose generic type parameters cannot be solved directly from its arguments, but instead would need type information that would have to be back-propagated from later links in the chain (which generally further require information from the target type.) 

Right, nonetheless, if this single "next case"  were solved via bidirectional propagation of type information, rather than the current situation where method overloading before the dot is solved before type inference after the dot, I would argue that this would solve a large majority of remaining real-world cases where an expression should quite obviously typecheck (according to human judgment) but does not. It would certainly cause streaming APIs to work significantly better with generics. This seems to be the most obvious weakness in the current type inference algorithm.

It is indeed not impossible.  But, as you say, it adds additional complexity to a type inference algorithm that has already grown quite complex (and which now interacts heavily with another complex mechanism, overload selection).  Could we take one more step, and move the corner cases further out to the margins, at the cost of greater complexity (and perhaps decreased usability)?  Yes, we could.  In Java 8, we chose to leave it here, which I think was a sensible choice.

We might someday come back to look at this -- but right now, it seems like this is not the highest-return use of limited and highly specialized resources.

Can you please address in concrete terms, for this specific request only (i.e. for back-propagation of type information before the final decision about method overloading is made, as opposed to "for all cases" or "for the last case") -- just how difficult solving this would be, and what the technical limitations would be given the current type inference algorithm?
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Type inference across method chains

Brian Goetz-2

On 3/17/2017 4:41 PM, Luke Hutchison wrote:
Right, nonetheless, if this single "next case"  were solved via bidirectional propagation of type information, rather than the current situation where method overloading before the dot is solved before type inference after the dot, I would argue that this would solve a large majority of remaining real-world cases

You may be right, but bear in mind that it *always* looks this way.  You forgot the implicit "...that I currently know of" that needs to trail this sentence.  The definition of "real world" is implicitly bounded by "the stuff that works now", and always moves.  (Ten years ago, chaining was not considered a real world case at all.) 

Can you please address in concrete terms, for this specific request only (i.e. for back-propagation of type information before the final decision about method overloading is made, as opposed to "for all cases" or "for the last case") -- just how difficult solving this would be, and what the technical limitations would be given the current type inference algorithm?

I know you probably don't think you're asking this, but you're basically asking "can you solve the problem (or try until you give up), and then explain what you did."  So, that's not really practical, as much as I'd love to have a simple answer I can pull out that fits on a page and is accessible to non-type-system-experts.  Suffice to say it's doable, but it's not trivial, and the risks are not trivial. 

Also, bear in mind that "how difficult" is only one (and in reality, probably the least important) dimension of the problem.  "How difficult" corresponds to "how much will it cost Oracle", which is certainly a reasonable input into the decision process, but far from the only one.  How about what incremental perceived complexity for the users would it create?  What damage would it do to scrutability of diagnostics?  What are the compatibility risks?  (Adding new constraints into type inference algorithms can often cause something that was previously solvable to become overconstrained, breaking existing code.)  What is the consequence for the specification?  For implementors like Eclipse that have to implement from the specification?  In what ways might it interact with other features, present or future? 


Loading...