[POC] Syntactic redirection of identifiers as a way to move libraries forward #2102

Closed
wants to merge 6 commits into from

@alainfrisch alainfrisch commented Oct 11, 2018

This is a proof-of-concept of a proposal to allow breaking changes in the stdlib without actually breaking other libraries. With this PR, one can change from:

(* Stdlib version 4.07 *)
module Int32: sig
 ...
 val of_string: string -> int32 (* raise Failure in case of error *)
 ...
end

to:

(* Stdlib version 4.08 *)
module Int32: sig
 ...
 val of_string: string -> int32 option   [@@ocaml.redirect (stdlib < "4.08") of_string_exn] 
 val of_string_exn: string -> int32      (** @since 4.08 *)
 ...
end 

and most codebases written "against" the stdlib before 4.08 will still compile fine with 4.08, without any change to the code. The compiler, when resolving any call to Int32.of_string, will automatically rewrite this to Int32.of_string_exn and continue type-checking as if the user had actually written that identifier.
To enable this, one only needs to tell the compiler that the current code fed to it was written against, say, stdlib = "4.07". This is a best-effort approach and is not bullet-proof; for instance, if the code passes Int32 to a functor which expects a val of_string: string -> int32, this will break. But the simple approach should be enough to handle the vast majority of cases. It would also work for renaming inside modules returned by functors (e.g. Hashtbl.S.find => Hashtbl.S.find_exn).
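
To make the effect of the redirection concrete, here is a minimal sketch in plain OCaml of what the type-checker would effectively see. Since the redirect attribute only exists in this proof-of-concept, the sketch models the 4.08-style interface with a local module, Int32_new (a hypothetical name used for illustration only), and performs the rewriting by hand. It also shows the functor case mentioned above, where a manual adapter is still needed.

```ocaml
(* Hypothetical 4.08-style interface, modelled as a local module so that
   this sketch compiles without the proof-of-concept compiler. *)
module Int32_new : sig
  val of_string : string -> int32 option
  val of_string_exn : string -> int32 (* raises Failure on error *)
end = struct
  let of_string = Int32.of_string_opt
  let of_string_exn = Int32.of_string
end

(* Legacy code, as written against 4.07:
     let legacy_succ s = Int32.succ (Int32.of_string s)
   With the redirection active, the type-checker would continue as if the
   author had written the following instead. *)
let legacy_succ s = Int32.succ (Int32_new.of_string_exn s)

(* The functor limitation: the redirection applies to identifiers,
   not to signature matching. *)
module type OLD_PARSER = sig
  val of_string : string -> int32
end

module Parse (M : OLD_PARSER) = struct
  let parse = M.of_string
end

(* [Parse (Int32_new)] is rejected, because [of_string] now returns an
   option. A manual adapter is needed: *)
module P = Parse (struct
  let of_string = Int32_new.of_string_exn
end)
```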

The intention is not to encourage the authors of such codebases to never upgrade their code, but rather to make it possible to introduce breaking changes in the stdlib without breaking the entire ecosystem at once.

Also, all the information is available to allow automated rewriting of code against the latest stdlib; a tool could easily parse the .cmt file and rewrite the source .ml accordingly. Library authors should definitely do that when they are ready to drop support for old versions of OCaml, and library consumers can always do that locally so as not to depend on the new rewriting hack.

I want to stress that this should not be seen as a change to the language; developers would normally not specify the expected version of the stdlib when developing the library. The information could for instance be added by OPAM package maintainers to unbreak existing libraries.

People on caml-devel suggested that this method could be implemented by an external tool, but in essence, the tool would be a compiler driver with an instrumented type-checker extended with hooks. This is because the "redirection" needs to implement the same binding logic as the type-checker, but cannot be run as a post-processing pass (because without the redirection, type-checking fails). So in practice, I find it more convenient to implement this in the compiler directly, and this will simplify life for users. Let's consider that we are adding a "live" compatibility tool to the compiler, without actually changing the official language.

The mechanism is actually not tied to the stdlib and could be used by other libraries facing the same challenges.

This PR is in a very early stage.

  • The language of "constraints" should be discussed and specified. Is it actually useful to support anything more complex than id <= "VV.VV.VV"?
  • The actual version of id must be passed to the compiler through an environment variable with that name (uppercased, e.g. "STDLIB=4.08 ocamlc -c ..."); if the variable is not defined, the condition evaluates to false (i.e. no redirection happens). This means that by default (if no variable is defined), the compiler behaves as before. Of course, this UI would need to be changed to something better (at least a command-line flag, and OCAMLPARAMS).
  • Multiple redirection attributes can be specified on a single declaration, with a first match policy (this allows supporting multiple renamings across versions).
  • Only value declarations are supported, but it would be easy to support other kinds of components.
  • Redirection targets can be local identifiers or qualified longidents (interpreted as "absolute paths"). In the example above, one could write [@@ocaml.redirect (stdlib < "4.08") Stdlib.Int32.of_string_exn].
    It might be useful to allow "relative paths" ("../../M.t"), or not.
  • Many other details must be fleshed out.
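
The condition evaluation and first-match policy described in the list above could be sketched roughly as follows. This is not the code from the PR: the cond type, the function names, and the version comparison (dot-separated integers compared component by component) are assumptions made for illustration, since the constraint language is still to be specified.

```ocaml
(* A condition from a redirect attribute: [id < "v"] or [id <= "v"]. *)
type cond =
  | Lt of string * string
  | Le of string * string

(* Assumption: versions are dot-separated integers, compared component
   by component. The PR leaves the exact version syntax open. *)
let parse_version v = List.map int_of_string (String.split_on_char '.' v)

(* Default lookup, as described above: an environment variable named
   after the identifier, uppercased (e.g. STDLIB for stdlib). *)
let env_lookup id = Sys.getenv_opt (String.uppercase_ascii id)

(* An undefined identifier makes the condition false: no redirection. *)
let eval ?(lookup = env_lookup) cond =
  let test id v ok =
    match lookup id with
    | None -> false
    | Some actual -> ok (compare (parse_version actual) (parse_version v))
  in
  match cond with
  | Lt (id, v) -> test id v (fun c -> c < 0)
  | Le (id, v) -> test id v (fun c -> c <= 0)

(* First-match policy over the redirections attached to one declaration. *)
let resolve ?lookup redirects =
  List.find_map
    (fun (cond, target) -> if eval ?lookup cond then Some target else None)
    redirects
```

The lookup function is a parameter only so that the sketch can be exercised without touching the process environment; a command-line flag or OCAMLPARAMS could be plugged in the same way.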

@alainfrisch
Contributor Author

I've added a commit (not to be merged with this PR!) to illustrate the concept. The stdlib is adapted for changes in Hashtbl.S.{find, find_opt, find_exn}, but not the compiler code base itself (which indeed makes use of Hashtbl.S.find). It can still be built by compiling it with STDLIB=4.07 through a simple addition to the build system (and one gets warnings on the console to show rewritings that happen under the hood).

| _ -> failwith "Invalid condition"
end
| Ident id ->
begin match Sys.getenv_opt (String.uppercase_ascii id) with
Contributor

@dbuenzli dbuenzli Oct 11, 2018

I would prefix id with OCAML_REDIRECT_ (or OCAML_REDIRECT_VAR_ or something else)

Contributor Author

Yes, or rely only on OCAMLPARAMS (that's what I meant by "this UI would need to be changed to something better"). Remember, this is only a proof-of-concept. Currently, the reactions from maintainers on the overall approach (here and on caml-devel) look rather negative, so I don't intend to put more time into it.

@avsm
Member

avsm commented Oct 15, 2018

I do prefer this solution to the alternative of providing code-transformation tools that will mechanically upgrade a codebase. The reason is that it puts less pressure on the need for carefully versioned releases in opam that have all the constraints on OCaml versions exactly correct. With this solution, a single codebase can compile with multiple versions of OCaml, which reduces the pressure on our publishing infrastructure (the constraint solver, and so on).

A second benefit to this approach is that it plays very well with Dune monorepos that embed lots of code into a single tree. With that approach to vendoring OCaml libraries, we do not need opam in the build path of development (reserving it for the publishing part). However, a downside to Dune vendoring is that we have to select a specific version of the OCaml compiler to vendor code against, since we no longer have an opam install step to run the constraint solver. The vast majority of incompatibilities are due to minor interface bumps across versions of the standard library, and the proposed PR would fix that quite effectively I think.

@alainfrisch
Contributor Author

Thanks @avsm for your feedback.

> The vast majority of incompatibilities are due to minor interface bumps across versions of the standard library, and the proposed PR would fix that quite effectively I think.

Do you have specific past examples in mind? It seems to me that we have historically been very conservative in renaming components of the stdlib. "Subtle" breakages, such as code relying on module type of, might not be fixed by the current proposal.

Also: do you think the mechanism could be useful for other libraries as well?

@Drup
Contributor

Drup commented Oct 15, 2018

@alainfrisch WRT other libraries: I'm quite sure that this feature would have made Lwt's recent breaking changes quite a lot smoother. @aantron would probably be a better judge.

@aantron
Contributor

aantron commented Oct 15, 2018

I wouldn't use this over existing deprecation in Lwt. Keeping in mind that some users of any library are other libraries that need to compile against multiple versions:

  • Users becoming forward-compatible requires user effort anyway.
  • It looks like this is about reusing the identifier, but we still don't gain backwards compatibility.

This may help slightly in some corner cases, but I can't think of a problem this solves that can't be solved about as well using existing features. Meanwhile, the cost of this feature is:

  • More "magic" in the language.
  • Complications to builds, since people need to add flags.
  • More diverse and confusing instructions for users. Right now, we always ask only for slight adjustments to code. If we use this feature, we will sometimes ask people to mess with their builds. Everyone is working with OCaml code, but not everyone is working with Dune.

Dragging build systems into this process seems like a big conflation.

@alainfrisch
Contributor Author

> but I can't think of a problem this solves that can't be solved about as well using existing features

How would you currently address the desire to rename, say, Lwt.foo into Lwt.bar, and possibly reuse Lwt.foo for something else? Imagine this is done while switching from Lwt version 1 to Lwt version 2. Now, a user depends on Library X, developed against Lwt version 1, and on Library Y, developed against Lwt version 2. I suppose that both versions cannot be linked together and/or that their abstract types are not compatible. Authors of Library X could be asked to update their library against Lwt version 2, but (i) as a user, you have no control over if/when this happens, and (ii) these authors will need to drop support for users of Lwt version 1 going forward, or maintain two releases, or play conditional compilation tricks. Also, as a contributor to Lwt, there is no simple way to assess the breakages of Lwt version 2 unrelated to this renaming.

(One possible answer is that such renamings are only done in well-identified major versions, which come with other changes that wouldn't be dealt with by the current proposal anyway. This might be the main difference with the stdlib, for which we don't really allow ourselves any major breaking version.)

@aantron
Contributor

aantron commented Oct 15, 2018

See ocsigen/lwt#293, "Semantic versioning; safely breaking Lwt."

The short version is:

  • We alias Lwt.foo as Lwt.Versioned.foo_1. We deprecate Lwt.foo and tell users to switch to the versioned one or upgrade their code.
  • We create the future Lwt.foo as Lwt.Versioned.foo_2.
  • We release Lwt, and it remains in this state for three months.

> One possible answer is that such renamings are only done in well-identified major versions

Then, we do a major version release (breakage is the only thing we do major versions for).

  • The implementation of Lwt.foo becomes that of Lwt.Versioned.foo_2.

So, basically, we try to push all the users (libraries and applications) forward gently during a period of time, and give one upgrade path that is extremely simple: blindly replace Lwt.foo by Lwt.Versioned.foo_1. There is a small intersection across the major versions, in which users relying on either major version can resolve all the constraints. This lets libraries upgrade gradually while remaining mutually compatible. We also notify maintainers to get their attention. Because of all the announcements, users tend to help out if one of their dependencies is slow to upgrade.
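
The transition state described above might look roughly like the following during the deprecation period. This is only a sketch: Lib stands in for Lwt, and the bodies of foo_1 and foo_2 are made up for illustration; only the Versioned.foo_1 / Versioned.foo_2 naming scheme comes from the description above.

```ocaml
module Lib : sig
  val foo : int -> int
  [@@ocaml.deprecated
    "Use Lib.Versioned.foo_1, or migrate to Lib.Versioned.foo_2."]

  module Versioned : sig
    val foo_1 : int -> int (* current behaviour of foo *)
    val foo_2 : int -> int option (* future behaviour of foo *)
  end
end = struct
  module Versioned = struct
    (* Illustrative bodies only: the old version raises on 0,
       the new one returns an option. *)
    let foo_1 x = 100 / x
    let foo_2 x = if x = 0 then None else Some (100 / x)
  end

  (* During the transition, foo is an alias of foo_1. After the major
     release, its implementation would become that of foo_2. *)
  let foo = Versioned.foo_1
end

(* The mechanical upgrade path for a user: blindly replace
   Lib.foo by Lib.Versioned.foo_1. *)
let user_code x = Lib.Versioned.foo_1 x + 1
```

Code that keeps calling Lib.foo still compiles during the transition but gets a deprecation alert, which is the nudge to pick one of the versioned names.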

Of course, it's still never possible to link libraries written against APIs that changed between Lwt 2.x and 4.x in the same project (and opam constraints prevent this). We are not looking to support this in Lwt, however, so I wouldn't want this language feature for use in Lwt. Stdlib is in a much more complicated situation, but I would like it better if stdlib maintenance could become more similar to that of third-party libraries, than if a unique language feature was introduced for dealing with breaking it. I don't have a proposal, though. I'm not suggesting that the Lwt approach would work well for stdlib.

@aantron
Contributor

aantron commented Oct 15, 2018

> Also, as a contributor to Lwt, there is no simple way to assess the breakages of Lwt version 2 unrelated to this renaming.

I don't understand this statement.

@alainfrisch
Contributor Author

Thanks for the detailed explanation.

> Also, as a contributor to Lwt, there is no simple way to assess the breakages of Lwt version 2 unrelated to this renaming.

I meant: as soon as the next version in preparation comes with a "small" breakage (e.g. a renaming), it becomes hard to evaluate how much other breakages impact code in the wild, since all code in the wild needs to be adapted first to the "small" breakage. In the context of the stdlib, my concern is that any breaking change would just immediately "destroy" enough of the ecosystem that OPAM becomes basically useless as a tool to assess the impact of other changes (in any other part of the core distribution), unless there is a good automated migration story which can be easily deployed at large scale.

@alainfrisch
Contributor Author

No positive opinion on this proposal from core maintainers, so no chance it's going to be accepted. Let's close it!
