feat: human-readable URLs based on query strings #389

Merged
merged 7 commits into from
Jul 16, 2024

Conversation

gwhitney
Collaborator

By submitting this PR, I am indicating to the Numberscope maintainers that I have read and understood the contributing guidelines and that this PR follows those guidelines to the best of my knowledge. I have also read the pull request checklist and followed the instructions therein.


To implement the planned OEIS search bar, it will become necessary
to generate specimen encodings from sequence parameters without
actually constructing the Sequence objects that have those parameters.
That was not possible with the opaque base64 encoding.

Hence, this PR switches to a human-readable encoding based on URL query
strings as processed by the JavaScript standard URLSearchParams. It provides
some functions to construct such encodings directly (rather than only
mediated by existing Sequence and Visualizer objects). This PR includes
a backwards compatibility facility so that saved base64 encodings can still
be interpreted. It also suppresses the Vue error trapping feature when
in Workbench mode, which was and should continue to be very helpful when
debugging.

Note I still need to update the Featured Gallery specimens to the new URL format, but this PR can be reviewed without that -- everything should still work because of the backwards-compatibility mode. I will push another commit with those featured URL updates as soon as I can.

[I am hoping we can merge this into ui2 soonish, and then I will rebase my cleaning up of Delft #7 onto the result, and then I can implement the OEIS search bar as discussed elsewhere.]

gwhitney added 3 commits July 14, 2024 03:59
  To implement the planned URL search bar, it will become necessary
  to generate specimen encodings from sequence parameters without
  actually constructing the Sequence objects that have those parameters.
  That was not possible with the opaque base64 encoding.

  Hence, this PR switches to a human-readable encoding based on URL query
  strings as processed by the JavaScript standard URLSearchParams. It provides
  some functions to construct such encodings directly (rather than only
  mediated by existing Sequence and Visualizer objects). This PR includes
  a backwards compatibility facility so that saved base64 encodings can still
  be interpreted. It also suppresses the Vue error trapping feature when
  in Workbench mode, which was and should continue to be very helpful when
  debugging.
@gwhitney
Collaborator Author

OK, this is now in the intended state for proposed merge to ui2, ready for a full review. It would be good to do both a thorough code and behavior review, as the URL scheme is fairly pervasive through frontscope at this point. Thanks!

@gwhitney
Collaborator Author

Oops, the simplistic way I was getting the query portion of a URL from the path supplied by the Vue router was incorrect. All should hopefully be good now.

Member

@katestange katestange left a comment

As far as behaviour, I haven't found any bugs.

src/components/SwitcherModal.vue (resolved)
@@ -27,6 +28,28 @@ function getCurrentDate(): string {
return new Intl.DateTimeFormat('en-US', options).format(currentDate)
}

// QUERY ENCODING OF SPECIMENS
Member

Is it possible to use fewer fancy characters so the URL is even more human readable? For example, here's a URL.

http://localhost:5173/?name=SumTwoChaosB&viz=Chaos&seq=Formula&vizQ=corners%3D4%26frac%3D0.5%26walkers%3D2%26colorStyle%3D3%26gradientLength%3D%26highlightWalker%3D%26first%3D0%26last%3D%26dummyDotControl%3Dtrue%26circSize%3D1%26alpha%3D0.9%26pixelsPerFrame%3D400%26showLabels%3D%26darkMode%3D&seqQ=formula%3D%2528n%25252%2529*pickRandom%2528%255B0%252C0%252C1%252C2%252C3%255D%2529%252B%2528%2528n%252B1%2529%25252%2529*pickRandom%2528%255B0%252C1%252C1%252C2%252C2%252C3%252C3%255D%2529

Every parameter value has an equals sign that becomes %3D, e.g. : vizQ=corners%3D4%26frac%3D0.5
By contrast, if I go to Amazon.com, there are = signs in the URL bar without problem.
I'm not at all sure there are easy alternatives/solutions, so maybe the answer is no. But it just seems a pity, the url really isn't very human readable.

Collaborator Author

Excellent questions. The current scheme of this PR is that there are just five query parameters in a Numberscope URL: name, seq (the class of Sequence), viz (the class of Visualizer), seqQ (the Sequence parameters), and vizQ (the Visualizer parameters). That means the URL must look like ...?name=NAME&viz=VISUALIZER&seq=SEQUENCE&vizQ=STUFF1&seqQ=STUFF2 where STUFF1 and STUFF2 cannot have any equals signs or ampersands in them, since those are the delimiters for query parameters. Hence any equals signs one would use to specify sequence and visualizer parameters have to be encoded in a way that does not use the equals symbol, and currently this PR is using the "standard" percent-encoding of special characters (that is more or less enforced by the JavaScript standard URLSearchParams API the PR is using to generate and interpret URL query parameters).
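To make the source of the %3D and %25 runs concrete, here is a minimal sketch of what nesting one URLSearchParams serialization inside another produces. The helper name toQuery is hypothetical and is not taken from this PR; only the standard URLSearchParams API is assumed.

```typescript
// Hypothetical helper: serialize a flat record of parameters to a query string.
function toQuery(params: Record<string, string>): string {
    return new URLSearchParams(params).toString()
}

// Inner level: the visualizer's own parameters, serialized on their own.
const vizQ = toQuery({corners: '4', frac: '0.5'})
// vizQ is 'corners=4&frac=0.5'

// Outer level: the inner string becomes the value of a single parameter,
// so its '=' and '&' are escaped to keep the outer structure parseable.
const nested = toQuery({viz: 'Chaos', vizQ})
// nested is 'viz=Chaos&vizQ=corners%3D4%26frac%3D0.5'

// A value that was already percent-encoded has its '%' escaped a second time.
const seqQ = toQuery({formula: '(n%2)'}) // 'formula=%28n%252%29'
const doubled = toQuery({seqQ}) // 'seqQ=formula%3D%2528n%25252%2529'
```

The last line reproduces the start of the seqQ value in the URL quoted above, which is where the long %2528 and %25252 sequences come from.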

The basic underlying difficulty is that URL query parameters are, by the HTML standards, explicitly not hierarchical, but we have a hierarchical situation here: the Specimen has three top-level parameters (name, visualizer, and sequence); and then the visualizer has its whole own set of parameters, and the sequence has a whole other set of parameters. And you can't just hoist all the parameters to the top level: maybe your visualizer and sequence each have a parameter called 'iterations', say, which would then clash.

All that said, this was just the path of least resistance, using standard tools in the swiftest way to get something significantly better than base64 strings running. There are certainly methods to encode our parameters into the URL that will use fewer %-encodings. Here are some options; let me know whether any of the following sound like they would be worth it to you, and if so, what sounds the most appealing.

(A) Handle the hierarchy by using parameter prefixes rather than by re-encoding all of the sub-parameters into a single top-level parameter. For example, except for the top-level parameters of name, seq, and viz, we could just say that all sequence parameter names get an 'S' inserted at the beginning, and all visualizer parameter names would get a 'V' inserted at the beginning. Using your URL above as the example, under this scheme the URL for that specimen would be

http://localhost:5173/?name=SumTwoChaosB&viz=Chaos&seq=Formula&Vcorners=4&Vfrac=0.5&Vwalkers=2&VcolorStyle=3&VgradientLength=&VhighlightWalker=&Vfirst=0&Vlast=&VdummyDotControl=true&VcircSize=1&Valpha=0.9&VpixelsPerFrame=400&VshowLabels=&VdarkMode=&Sformula=%28n%252%29*pickRandom%28%5B0%2C0%2C1%2C2%2C3%5D%29%2B%28%28n%2B1%29%252%29*pickRandom%28%5B0%2C1%2C1%2C2%2C2%2C3%2C3%5D%29

[As an aside, if we do go down any of these paths I think it would also be worth it not to encode any empty parameters or parameters equal to the defaults, which in this case would lead to the shortening:

http://localhost:5173/?name=SumTwoChaosB&viz=Chaos&seq=Formula&Vwalkers=2&VcolorStyle=3&VdummyDotControl=true&Sformula=%28n%252%29*pickRandom%28%5B0%2C0%2C1%2C2%2C3%5D%29%2B%28%28n%2B1%29%252%29*pickRandom%28%5B0%2C1%2C1%2C2%2C2%2C3%2C3%5D%29

So for brevity in all the further examples I am just going to capture this shortened list of parameter values.]

And then interpreting such a URL would involve extracting top-level name, seq, viz, and then extracting all parameters with prefix V, stripping the prefix, and reassembling them into a structure for initializing the Visualizer, and doing the same with S parameters for the Sequence.
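A rough sketch of scheme (A), assuming all parameters are flat string values. The names encodePrefixed and decodePrefixed are hypothetical and do not come from this PR.

```typescript
type Params = Record<string, string>

// Hypothetical encoder for option (A): visualizer parameter names get a 'V'
// prefix and sequence parameter names get an 'S' prefix.
function encodePrefixed(
    top: {name: string; viz: string; seq: string},
    vizParams: Params,
    seqParams: Params
): string {
    const query = new URLSearchParams()
    query.append('name', top.name)
    query.append('viz', top.viz)
    query.append('seq', top.seq)
    for (const key of Object.keys(vizParams)) {
        query.append('V' + key, vizParams[key])
    }
    for (const key of Object.keys(seqParams)) {
        query.append('S' + key, seqParams[key])
    }
    return query.toString()
}

// Hypothetical decoder: pull out the three top-level names, then strip the
// prefix from everything else and sort it into the matching group.
function decodePrefixed(queryString: string) {
    const top: Params = {}
    const vizParams: Params = {}
    const seqParams: Params = {}
    new URLSearchParams(queryString).forEach((value, key) => {
        if (key === 'name' || key === 'viz' || key === 'seq') {
            top[key] = value
        } else if (key.charAt(0) === 'V') {
            vizParams[key.slice(1)] = value
        } else if (key.charAt(0) === 'S') {
            seqParams[key.slice(1)] = value
        }
    })
    return {top, vizParams, seqParams}
}

const prefixed = encodePrefixed(
    {name: 'SumTwoChaosB', viz: 'Chaos', seq: 'Formula'},
    {walkers: '2', colorStyle: '3'},
    {formula: '(n%2)'}
)
// 'name=SumTwoChaosB&viz=Chaos&seq=Formula&Vwalkers=2&VcolorStyle=3&Sformula=%28n%252%29'
```

Because the prefixes keep the two groups apart, a visualizer and a sequence could both have an 'iterations' parameter without clashing.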

(B) Instead, take advantage of the fact that a URL is a string, and hence the parameters are intrinsically ordered. In this plan, we would just take care that the URL query parameters were listed in a specific order, say name first, then viz, then all of the visualizer parameters, then seq, and then all of the sequence parameters. In other words, the seq=Formula [or whatever kind of sequence] query parameter would become the delimiter between the visualizer and sequence parameters. We would simply have to make the stringently enforced convention that no visualizer could have a parameter named 'seq'. In this scheme, the previous URL would become:

http://localhost:5173/?name=SumTwoChaosB&viz=Chaos&walkers=2&colorStyle=3&dummyDotControl=true&seq=Formula&formula=%28n%252%29*pickRandom%28%5B0%2C0%2C1%2C2%2C3%5D%29%2B%28%28n%2B1%29%252%29*pickRandom%28%5B0%2C1%2C1%2C2%2C2%2C3%2C3%5D%29

Interpreting this format would simply require combing through each parameter sequentially, rather than just throwing them all into an object as is happening now.
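A minimal sketch of how scheme (B) might be encoded and decoded, assuming flat string-valued parameters. The names DecodedSpecimen, encodeOrdered, and decodeOrdered are hypothetical and are not the functions actually added in this PR.

```typescript
type Params = Record<string, string>

interface DecodedSpecimen {
    name: string
    viz: string
    seq: string
    vizParams: Params
    seqParams: Params
}

// Hypothetical encoder for option (B): the order of the parameters carries
// the hierarchy, with 'seq' acting as the delimiter between the two groups.
function encodeOrdered(s: DecodedSpecimen): string {
    const query = new URLSearchParams()
    query.append('name', s.name)
    query.append('viz', s.viz)
    for (const key of Object.keys(s.vizParams)) {
        query.append(key, s.vizParams[key])
    }
    query.append('seq', s.seq)
    for (const key of Object.keys(s.seqParams)) {
        query.append(key, s.seqParams[key])
    }
    return query.toString()
}

// Hypothetical decoder: comb through the parameters sequentially. Everything
// between 'viz' and 'seq' belongs to the visualizer; everything after the
// first 'seq' belongs to the sequence.
function decodeOrdered(queryString: string): DecodedSpecimen {
    const result: DecodedSpecimen = {
        name: '',
        viz: '',
        seq: '',
        vizParams: {},
        seqParams: {},
    }
    let section: 'top' | 'viz' | 'seq' = 'top'
    new URLSearchParams(queryString).forEach((value, key) => {
        if (key === 'seq' && section !== 'seq') {
            result.seq = value
            section = 'seq'
        } else if (section === 'top' && key === 'name') {
            result.name = value
        } else if (section === 'top' && key === 'viz') {
            result.viz = value
            section = 'viz'
        } else if (section === 'viz') {
            result.vizParams[key] = value
        } else if (section === 'seq') {
            result.seqParams[key] = value
        }
    })
    return result
}

// The same parameter name on both sides of 'seq' no longer clashes.
const clash = decodeOrdered('name=X&viz=Chaos&iterations=5&seq=Formula&iterations=9')
// clash.vizParams.iterations is '5' and clash.seqParams.iterations is '9'
```

The design relies on URLSearchParams preserving the order in which parameters appear, both when appending and when iterating.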

(C) Leave the scheme of just five query parameters, but choose a different representation of the visualizer and sequence parameters that uses many fewer characters that need url encoding. For example, within the visualizer parameters use '+' to separate parameter names from values, and '~' to separate parameters. In this scheme, the above URL would become

http://localhost:5173/?name=SumTwoChaosB&viz=Chaos&seq=Formula&vizQ=walkers+2~colorStyle+3~dummyDotControl+true&seqQ=formula+%28n%252%29*pickRandom%28%5B0%2C0%2C1%2C2%2C3%5D%29%2B%28%28n%2B1%29%252%29*pickRandom%28%5B0%2C1%2C1%2C2%2C2%2C3%2C3%5D%29

In this scheme, we'd have to write custom encoder/decoders for Paramables to strings to supply and interpret those vizQ and seqQ URL query parameters.
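As an illustration only, a custom encoder/decoder pair for scheme (C) could look roughly like the following. The names encodeCompact and decodeCompact are hypothetical, parameter names are assumed to be plain alphanumerics, and values are escaped with the standard encodeURIComponent, which leaves parentheses unescaped and so would not reproduce the example URL character for character. Note also that the resulting string would have to be spliced into the URL and read back from the raw query by hand: URLSearchParams escapes '+' and '~' when serializing and turns '+' into a space when parsing.

```typescript
type Params = Record<string, string>

// Hypothetical encoder for option (C): 'name+value' pairs joined by '~'.
// encodeURIComponent escapes '+' but leaves '~' alone, so a literal '~'
// inside a value is escaped by hand to keep it from acting as a separator.
function encodeCompact(params: Params): string {
    return Object.keys(params)
        .map(
            key =>
                key + '+' + encodeURIComponent(params[key]).replace(/~/g, '%7E')
        )
        .join('~')
}

// Hypothetical decoder: split on '~', then on the first '+' of each pair.
function decodeCompact(encoded: string): Params {
    const params: Params = {}
    if (encoded === '') return params
    for (const pair of encoded.split('~')) {
        const plus = pair.indexOf('+')
        if (plus < 0) continue // skip malformed pairs with no separator
        params[pair.slice(0, plus)] = decodeURIComponent(pair.slice(plus + 1))
    }
    return params
}

const compactViz = encodeCompact({
    walkers: '2',
    colorStyle: '3',
    dummyDotControl: 'true',
})
// 'walkers+2~colorStyle+3~dummyDotControl+true'
```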

There are probably other options, but those are the ones that come to mind.


Note that in all of the above options, there are still a very large number of %-encoded characters in the value of the formula parameter for the sequence. That's because formulas in their natural format use lots of special characters that are not allowed in URL query strings. In the particular case under consideration, the actual formula is (n%2)*pickRandom([0,0,1,2,3])+((n+1)%2)*pickRandom([0,1,1,2,2,3,3]). Of these, the characters that are being uri-encoded are ( % ) [ , ] + (the * passes through unencoded, as the URLs above show). There are ways we could try to reduce this, some more or less radical.

(i) The character ',' doesn't strictly need to be uri-encoded; it is merely recommended, and performed by the standard routines. We could just do our own uri encoding and skip it.

(ii) We could use alphanumeric common names for the operators; in other words, before uri encoding, first transform to (n mod 2) times pickRandom([0,0,1,2,3]) plus ((n plus 1) mod 2) times pickRandom([0,1,1,2,2,3,3])

Now what remains are just parentheses and brackets. The radical way to eliminate parentheses is to switch to (Reverse) Polish Notation; but that might sacrifice readability every bit as much as the percent-encoding itself (something like n 2 mod [0,0,1,2,3] pickRandom times n 1 plus 2 mod [0,1,1,2,2,3,3] pickRandom times plus if we go RPN as opposed to PN).
Another scheme for totally eliminating parentheses is to use subexpressions and then because of that omit parentheses in function calls as well, e.g. e1=n mod 2&e2=pickRandom [0,0,1,2,3]&e3=e1 times e2&e4=n plus 1&e5=e4 mod 2&e6=pickRandom [0,1,1,2,2,3,3]&e7=e5 times e6&formula=e3 plus e7
If we did all of those things (including either RPN or naming each subexpression) then all that would remain would be brackets. The only thing I have been able to think of for these would be to use some strings of the few special characters that are allowed in query strings, like ~. for [ and .~ for ] so that e2 above would become pickRandom ~. 0,0,1,2,3 .~. But this looks pretty silly and it's not clear that it is any more readable than the usual uri encoding pickRandom %5B0,0,1,2,3%5D. So I think in the end we will likely want to go ahead and use at least a handful of %-encodings, at least in formulas.
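To get a feel for how much idea (ii) would help, here is a small sketch that counts the %-escapes before and after spelling out the operators. The helper names are hypothetical, and the replacement is purely textual; a real version would presumably work on the parsed formula rather than on raw characters.

```typescript
// Encode one value the way URLSearchParams would inside a query string.
function encodeValue(value: string): string {
    // Serialize as 'f=<encoded>' and drop the leading 'f='.
    return new URLSearchParams({f: value}).toString().slice(2)
}

// Hypothetical, purely textual operator renaming as in (ii).
function spellOutOperators(formula: string): string {
    return formula
        .replace(/%/g, ' mod ')
        .replace(/\*/g, ' times ')
        .replace(/\+/g, ' plus ')
}

// Count the %XX escape sequences in an encoded string.
function countEscapes(encoded: string): number {
    return (encoded.match(/%[0-9A-F]{2}/g) || []).length
}

const formula =
    '(n%2)*pickRandom([0,0,1,2,3])+((n+1)%2)*pickRandom([0,1,1,2,2,3,3])'
const plain = encodeValue(formula)
const spelled = encodeValue(spellOutOperators(formula))

console.log(countEscapes(plain), plain)
console.log(countEscapes(spelled), spelled)
```

The spelled-out version trades the %25 and %2B escapes for '+' characters, since URLSearchParams writes each space as '+'. The escapes for parentheses, brackets, and commas remain.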

OK, those are all of my thoughts. Looking forward to your feedback on what seems worthwhile to pursue. The two main thrusts of the above discussion, dealing with hierarchy and dealing with formulas, are pretty much independent. Happy to implement whatever you'd like to try out, and right now is the easiest time to experiment, before lots of URLs are stored places and we have to support them backwards-compatibly.

Collaborator Author

After sleeping on it, option (B) of my first three in the last comment seems to me clearly the best, and simply a win over the current query strings. So in the absence of any feedback to the contrary, a bit later today I will implement (B) and push a commit to that effect for folks to try. After that we can decide whether or not we want to do anything about making formulas more readable in URLs.

Collaborator Author

IMPORTANT: I have now pushed a commit that switches to the scheme outlined in (B). When you pull it, note that you must npm install (oddly I had to upgrade TypeScript to use some RegExp features); note that you will now get warnings about TypeScript versions in lint (we are stuck on an older version of lint because of prettier-eslint, hopefully we can address that at some point), and finally note that your saved specimens have likely lost all of their parameters, sorry. I will post a separate comment about possible further reductions in special characters.

Collaborator Author

Possible further reductions in %-encodings in our URLs:

Having eliminated the double-uri-encoding that was the previous way to handle the nesting, the special characters/%-encoding situation is much improved. However, they do still show up for a number of reasons, each described with something we could consider doing to reduce that.

  1. Colors. We are currently representing them by strings that consist of a '#' followed by six hex digits. That's a pretty common convention, but in fact, we could simply record them as six hex digits.

  2. Commas. Outside of formulas, we use commas as the separators in lists and for the two components of P5 coordinates (like the start position of Turtle). There are two ways we could go here. One, as mentioned above, is to just not uri-encode commas, as it's not strictly required to do so by the standard. But it is recommended, to the point that the standard JavaScript API URLSearchParams does uri-encode them. On reflection, I am reluctant to use something besides the standard encoder -- seems like asking for trouble. So the other way to go is just use spaces to separate items in lists and coordinates. This option further bifurcates: We could use that representation in the UI itself, so that in the Visualizer parameters tab for Turtle the angles list would look like 30 45 60 90 120 instead of 30, 45, 60, 90, 120 and the starting point would be like 0 0. Alternatively, we could leave the commas as they are in the UI, and just for the sake of putting them into URLs take out the commas and then restore them when we read the URLs.

  3. Formulas. I don't have much in the way of new thoughts beyond the comments on formulas above, but we could still do some, all, or none of those things. Note that formulas also use commas, but inside formulas we can't just turn them into spaces -- mathjs already has a pretty extensive syntax in which commas and spaces need to be distinct. So if we keep using the standard URLSearchParams, and we want to prevent %-encodings coming from commas, we would need some other technique. I suppose if we aren't using ~ for anything else, we could just substitute that for commas, so that the URL might contain 0~0~1~2~3 where the formula actually has 0,0,1,2,3.
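For items 1 and 2, a minimal sketch of the conversions and their effect on the encoded URL might look like this. The function names are hypothetical and not taken from this PR, and colors are assumed to always be '#' followed by six hex digits.

```typescript
// Encode one value the way URLSearchParams would inside a query string.
function encodeValue(value: string): string {
    // Serialize as 'v=<encoded>' and drop the leading 'v='.
    return new URLSearchParams({v: value}).toString().slice(2)
}

// Hypothetical converters between the UI form and the URL form of a color.
function colorToQuery(color: string): string {
    return color.replace(/^#/, '')
}
function colorFromQuery(value: string): string {
    return '#' + value
}

// Hypothetical list normalizer: split on commas and/or whitespace, then
// rejoin with single spaces so that no commas reach the URL.
function listToQuery(list: string): string {
    return list
        .split(/[\s,]+/)
        .filter(item => item !== '')
        .join(' ')
}

encodeValue('#ff0000') // '%23ff0000'
encodeValue(colorToQuery('#ff0000')) // 'ff0000'
encodeValue('30, 45, 60, 90, 120') // '30%2C+45%2C+60%2C+90%2C+120'
encodeValue(listToQuery('30, 45, 60, 90, 120')) // '30+45+60+90+120'
```

With both changes the values contain no %-escapes at all, since URLSearchParams writes each space as '+'.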

Collaborator Author

I'm happy to do the Colors (item 1 in the previous list) and I could go either way on taking the commas out of our lists. I don't really know what direction to go on formulas -- they're really very much not conducive to being put in URLs in terms of the mathjs syntax as designed (or really any typical formula syntax, almost all of which use parentheses and characters that require URI encoding for the operations). I don't think there's going to be any format legal for inclusion in URL query strings that will be straightforward for people to just read and intuitively understand what algebraic expression is meant. On the other hand, it would be easy to make an RPN dump of the formula parse tree without using any special characters, if you'd just prefer a notation that doesn't use any %-encodings, even if it's pretty hard for a human reader to "see" the intended formula just from the URL. So I am fine with either just letting formulas encode naturally with lots of %-encodings, or doing an alphanumeric-ish RPN dump -- happy to have your guidance on this.

Member

Ok, just catching up on all this! The new commit is great. I would be happy to take the # out of colors; that seems like a small thing. And I would be happy to take the commas out of the lists both in the interface and the URL bar -- they don't serve much purpose. So I'm happy with both of those if you want to. For formulas, after reading through all that, it seems like a big kettle of fish, and I would be ok leaving it as is.

Collaborator Author

OK, both of those changes are made (note the commas will still be in lists in saved specimens but will be absent in the featured gallery and when you switch visualizer/sequence). If we are just leaving formulas for now then feel free to merge when you are satisfied.

Member

Ok, tried it and it looks good, and I can read the URLs! Going to merge.

@katestange katestange merged commit 06ab30b into ui2 Jul 16, 2024
2 checks passed
@gwhitney
Collaborator Author

great, thanks for merging. Will get back to cleaning and fixing implementation of OEIS search bar per https://github.com/orgs/numberscope/discussions/7

gwhitney added a commit that referenced this pull request Jan 20, 2025
* feat: human-readable URLs based on query strings

  To implement the planned URL search bar, it will become necessary
  to generate specimen encodings from sequence parameters without
  actually constructing the Sequence objects that have those parameters.
  That was not possible with the opaque base64 encoding.

  Hence, this PR switches to a human-readable encoding based on URL query
  strings as processed by the JavaScript standard URLSearchParams. It provides
  some functions to construct such encodings directly (rather than only
  mediated by existing Sequence and Visualizer objects). This PR includes
  a backwards compatibility facility so that saved base64 encodings can still
  be interpreted. It also suppresses the Vue error trapping feature when
  in Workbench mode, which was and should continue to be very helpful when
  debugging.

* chore: remove unused hasStringFields function

* chore: update featured specimens to new URL scheme

* fix: parse query from path properly

* fix: Fewer url encodings -> more readable URLs

* fix: Suppress leading # in color query params

* feat: Lists separated by whitespace by default