Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port of clojure.pprint to Basilisp #895

Closed
wants to merge 4 commits into from

Conversation

ikappaki
Copy link
Contributor

@ikappaki ikappaki commented Mar 18, 2024

Hi,

can you please review patch to port clojure.pprint to Basilisp. It addresses #513

The approach taken aimed at minimizing modifications to the original code. There are two commits in this PR, the first is the original code taken from clojure.pprint, the second has all the necessary changes I've made to make it work in Basilisp.

The code spans across seven files loaded into the pprint namespace. I've organized them under basilisp.contrib.pprint to denote that this is a contributed package, rather than something we've implemented from scratch. However, I don't have a specific preference for the location of these files.

Since these files are load'ed at runtime rather than being required or imported, they inevitably cause noticeable delays when the namespace is initially imported, in line with the constraints outlined by #878. Two potential solutions come to mind to necessitate these files at compile time:

  1. Transform them into regular namespaces that are required by the primary pprint namespace, as suggested. However, I would then need to figure out a way to integrate these variables into the top-level pprint namespace. I'm not sure there is a super use like form that will not only bring them into the basilisp.pprint namespace but also export them as public variables from there?

  2. Merge the files into a single extensive pprint.lpy file.

What do you think?

The pprint tests follow the same approach and also need to be reorganized as above.

The list of changes over the original implementations are

pprint.lpy

utilities.lpy

column-writer.lpy

  • Changes are mostly around updating Clojure types to basilisp (e.g. String to py/str).

pprint_base

  • Introduce new *print-sort-unordered-colls* var, to order keys in
    map, useful for the testing phase.

cl_format.lpy

dispatch.lpy

  • implement support for the new *print-sort-unordered-colls* var, so that prinout map output is predictable for testing
  • Commented out agent print out support, Agents #413
  • Commented out clojure.lang.IPending print out support (not sure what that is)
  • Commented out .isArray support (not sure if this is relevant/supported?)
  • Commented out basilisp.lang.interfaces/PersistentQueue dispatch method, since it is not available

print_table.lpy

tests

`test_helper.lpy

  • introduce platform-newlines

test_cl_format.lpy

  • ~C use pr to print out characters and thus are output as strings rather than characters.
  • Commented out Decimal128 tests since they are not support in Python

Thanks

@ikappaki ikappaki marked this pull request as draft March 18, 2024 22:21
@ikappaki
Copy link
Contributor Author

I put temporary hack in the pprint tests (test_pprint.lpy, contrib/pprint/test_pretty.lpy and contrib/test_cl_format.lpy) because they may be called individually with coverage run (and as such break), instead of loaded by test_pprint.lpy as a whole. It should be resolved once we decide what to do with the `load'ed files.

@ikappaki ikappaki marked this pull request as ready for review April 7, 2024 13:35
@ikappaki
Copy link
Contributor Author

ikappaki commented Apr 7, 2024

Hi, I've taken the liberty to concatenate the loaded files in contrib.pprint under pprint.clj and test_pprint.clj. This significantly boost loading performance, and shouldn't affect maintainability because no significant changes are expected, the clj code is quite stable and battle tested for many years now.

It is prime for review.

Thanks

@ikappaki
Copy link
Contributor Author

Hi @chrisrink10,

any chance you could review the clojure.pprint port to Basilisp, or perhaps suggest otherwise? I've just rebased to the latest main, and added a changelog entry.

The commits are build upon each other in case you are interested in reviewing how this involved

  1. clojure.pprint files checked in verbatim to Basilisp, f15e9ef
  2. clojure.pprint and tests ported as basilisp.pprint with the least changes possible 67e51c3
  3. pprint files stiched together into one to surpass the load limitation a013724

Thanks

@chrisrink10
Copy link
Member

Hi @chrisrink10,

any chance you could review the clojure.pprint port to Basilisp, or perhaps suggest otherwise? I've just rebased to the latest main, and added a changelog entry.

Thank you for the PR. I appreciate your continued contributions to the project, but I have some fairly serious misgivings about this changeset which is why I have not moved forward with reviewing it. I apologize for the radio silence for so long, but I have struggled with putting my thoughts to paper.

My concerns fall into two broad categories: philosophical and practical.

As to the philosophical concerns, I am broadly against importing more code from other repositories due to licensing and copyright claims. This code comes from Clojure which should be license-compatible, but otherwise carries Rich Hickey's copyright claims. We have already taken the nbb bencode implementation (without copyright attribution which is another potential issue) and I have been uneasy about that ever since it was merged. Adding this new code introduces even more complexity in an area I'm not particularly knowledgeable in or interested in addressing.

It may not be an issue because certainly many portions of basilisp.core and the interface hierarchy are directly attributable to Rich as well since there is no Clojure spec and implementation specifics must often be derived from Clojure's source, but nevertheless it has always been my intent (although unstated) to implement things in my own way (with my own copyright) as long as it was possible to do so.

I also have 2 practical concerns. First is that this relies on proxies, which are as yet unimplemented. I know you made an attempt to do so in Basilisp (as #818), but it is my preference to actually do this work in the compiler using a special form since I don't prefer to spread the arity-mapping logic across core and the compiler (to the extent that is possible). The second concern is somewhat more selfish, but I'm concerned about importing a large Clojure library (and associated test suite) written in a style that is somewhat messier and more old-school than what I prefer. When changes to core or other libraries affect e.g. the nREPL server tests I often find myself having to dig deep to understand why the error has occurred. I'm concerned this issue will be amplified with a considerably more complex library and test suite.

In the absence of a clear path forward here, it may be worthwhile considering publishing this as a separate library which you (and others) can use until I can resolve what to do here. I'm sorry if this is not the answer you were looking for, but I just haven't been able to shake these concerns in the intervening months since you submitted this PR.

@ikappaki
Copy link
Contributor Author

Hi,

I completely understand and share your concerns, but unfortunately, I am not able to fully address them.

My original intention was to place any ported code under the contrib/ directory to signify that this code is not part of the core packages but rather contributed from external sources. This would also suggest that the code may have different coding styles and is compatible for use, though it may still be copyrighted.

The critical issue here seems to be the concept of "claim"—specifically, the possibility of someone asserting ownership of the code and making adverse demands regarding its usage, which is indeed concerning. In the case of the nrepl-server, we somewhat addressed this by involving the author of nbb in the review process and asking if any further attribution was necessary. He assured us that there was nothing more required, which was somewhat reassuring.

I also reviewed the EPL-1.0 license, particularly the sections on derived works, but the legal implications are not obvious from the language, and I didn’t gain much clarity from it. Unfortunately, I'm not any more knowledgeable on this topic after my brief review.

My attempt to mitigate the copyright concerns for pprint was to place the code under contrib/. However, this led to issues with the load function always compiling the code at runtime as discussed, so I optimized it into a single file in the main namespace.

Given the concerns raised, I think it would be best to publish the library as a separate package. I now feel confident in doing so after having published a couple of packages already.

The same approach could be taken with the nrepl-server if necessary, though we would lose the integration with the Basilisp CLI tool.

I will leave the proxy implementation over to you as well, as it will be more efficiently done by the compiler, as suggested. In the meantime, I can work around it by using the python/type directly in the code.

Thanks

@NoahTheDuke
Copy link
Contributor

Something that may or may not be appropriate to note here: a lot of the code for pprint comes from the implementation of cl-format, Common Lisp's string format function. Clojure's pprint's basic goal is to pretty print code structures, and I wonder if it would be doable to just write your own version of that functionality first in a more naive way, and port cl-format later (or never cuz it's slow and imo outdated).

@ikappaki
Copy link
Contributor Author

ikappaki commented Aug 18, 2024

Hi @NoahTheDuke

Something that may or may not be appropriate to note here: a lot of the code for pprint comes from the implementation of cl-format, Common Lisp's string format function. Clojure's pprint's basic goal is to pretty print code structures, and I wonder if it would be doable to just write your own version of that functionality first in a more naive way, and port cl-format later (or never cuz it's slow and imo outdated).

I agree that cl-format isn't a critical component of the typical interface that most people use or are familiar with. It seems to have been primarily used as a scaffolding tool for implementing the primary clojure.pprint interface, and I doubt many Clojure modules rely on it. Therefore, porting potential Clojure modules to Basilisp that use clojure.pprint isn't heavily impacted by cl-format.

I did explore the more self-contained cljs.pprint as a potential source for porting, but I encountered some JavaScript-specific nuances that I wanted to avoid, and it is also relying on .cljc conditionals.

The porting of clojure.pprint was valuable in identifying and fixing several bugs, inconsistencies, and gaps in Basilisp, so it has served an important purpose, even thought it is not committed to the codebase.

Feel free to start implementing basilisp.pprint from scratch. I don't think cl-format is a requirement for this, and we can comfortably move forward without it. In the meantime, I plan to release the package as mentioned earlier.

The primary reason I originally wanted basilisp.pprint was to enable pretty-printing collections in the nREPL server. However, I acknowledge that releasing it as a separate package won't address that specific need.

Thanks

@ikappaki ikappaki mentioned this pull request Aug 18, 2024
@ikappaki ikappaki closed this Aug 18, 2024
@chrisrink10
Copy link
Member

@NoahTheDuke @ikappaki what do you consider the fundamental API for clojure.pprint?

As I said above, I think the entire API is quite messy and confusing. I've never needed to use it though aside from occasionally calling (pprint ...) on some large data structure at the REPL, so just wanting to understand if there's a reason that there's so much complexity (bindings and dispatch functions and so on).

@NoahTheDuke
Copy link
Contributor

In my opinion and experience, the fundamental API of clojure.pprint is pprint and pp, with occasional usage of print-table. Some of the dynamic variables are used (*print-pretty*, *print-right-margin*), but mostly it's those two functions.

Looking at the code in clojure/src/clj/clojure/pprint, it's roughly half and half between cl-format support and pprint support, and there's a lot of messy old-school clojure code and java interop here that wouldn't need to be ported. Doing some mental math, it's like 1500-2000 lines just to support pprint: 400 in base, 550 in dispatch, 500 in pretty_writer, 100 in utilities (which is used in both pprint and cl-format), not to mention that multiple formatting functions are defined with cl-format's formatter-out function (which necessarily brings in all 2k lines of cl-format's functionality).

Compare with fipp, which is only 1010 lines of code and covers all of the same functionality. Supporting the dynamic variables of pprint with a simpler implementation shouldn't be too hard, just somewhat tedious to actually write (famous last words lol).

The bindings are important because 1) compatibility with all other clojure code, and 2) the API wasn't designed for a context map to be passed along. If pprint's signature was '([obj] [ctx obj]) then the bindings wouldn't be needed, but sadly, they didn't go that way.

The dispatch functions are necessary because the calls are mutually recursive: (pprint '(1 2 {:a :b #{:c}}) has to print a set inside of a map inside of a list. This is the core value proposition of dynamic dispatch. There are two dispatch functions because simple-dispatch is for basic formatting, and code-dispatch is for when you want to do some "code aware" formatting, for example using special formatting for defn forms.

Given that it was written for 1.2, pprint uses multimethods with class as the dispatch fn (simple-dispatch, code-dispatch). We now have protocols which are faster (and more idiomatic), but compatibility with existing Clojure code might make you want to implement them with multimethods too.

Comparison of future cljc code:

(defmethod clojure.pprint/simple-dispatch #?(:clj jvmObj :lpy pythonObj) custom-simple-pprint-function)
(defmethod clojure.pprint/code-dispatch #?(:clj jvmObj :lpy pythonObj) custom-code-pprint-function)

;; or

#?(:clj (do (defmethod clojure.pprint/simple-dispatch jvmObj custom-simple-pprint-function)
            (defmethod clojure.pprint/code-dispatch jvmObj custom-code-pprint-function))
   :lpy (extend-protocol PrettyPrint
          pythonObj
          (-simple-print [obj] (custom-simple-pprint-function obj))
          (-code-print [obj] (custom-code-pprint-function obj))))

@chrisrink10
Copy link
Member

That makes sense. This is good context. I'm wondering if it's possible to both support the dynamic variables and use a context object and protocol (by capturing the values of the bindings at invocation time if no context object is provided)? I haven't inspect the code myself though, so that may not be possible.

@NoahTheDuke
Copy link
Contributor

You'd probably end up with something like, which imo isn't too bad but isn't portable.

(defn- default-opts []
  {:print-length *print-length*
   ...})

(defn pprint
  ([object] (pprint nil object *out*)
  ([opt-or-object object-or-writer]
   (if (instance? java.io.PrintWriter object-or-writer)
     (pprint nil opt-or-object object-or-writer)
     (pprint opt-or-object object-or-writer *out*)))
  ([opts object writer]
   (let [opts (merge (default-opts) opts)]
    ...)))

@chrisrink10
Copy link
Member

chrisrink10 commented Aug 20, 2024

What do you mean by "portable"? If we still respect the values of the dynamic vars if no context object is provided, then isn't that the same as JVM Clojure (again, making some assumptions w.r.t. where those values are captured and used)?

@NoahTheDuke
Copy link
Contributor

By portable, I mean "Does this work without using reader conditionals?" (seq [1 2 3]) is portable, (java.io.StringWriter.) is not. clojure.pprint has two arities, [object] and [object writer]. If you want to add an opts argument to allow not using dynamic vars, you'll need to add a third arity which means that such calls will only work in Basilisp: #?(:clj (binding [...] (pp/pprint foo)) :lpy (pp/pprint {...} foo)).

@chrisrink10
Copy link
Member

By portable, I mean "Does this work without using reader conditionals?" (seq [1 2 3]) is portable, (java.io.StringWriter.) is not. clojure.pprint has two arities, [object] and [object writer]. If you want to add an opts argument to allow not using dynamic vars, you'll need to add a third arity which means that such calls will only work in Basilisp: #?(:clj (binding [...] (pp/pprint foo)) :lpy (pp/pprint {...} foo)).

I am just imagining adding the non-compatible args either as the final argument in a new arity or just creating a new function that pprint delegates to but which is also public and Basilisp only.

I'll take a look at some point and see if this is possible.

@ikappaki
Copy link
Contributor Author

Hi, I don't have much to add to @NoahTheDuke's excellent analysis and suggestions above. Personally, I'm a moderate user of pprint and print-table, both of which I find essential to the REPL experience.

I'd also like to highlight that there are some core variables that influence clojure.pprint, as document in https://clojuredocs.org/clojure.pprint/write, particularly *print-length*, *print-level*, and *print-right-margin*. I’d recommend including these as part of the standard behavior, especially the first two, to prevent the output from getting overwhelmed by large data structures, which is often the case when working from the REPL.

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants