Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple text formatters #2293

Closed
jensscherbl opened this issue Dec 7, 2014 · 51 comments
Closed

Multiple text formatters #2293

jensscherbl opened this issue Dec 7, 2014 · 51 comments

Comments

@jensscherbl
Copy link
Member

Created a proof of concept for allowing multiple text formatters to be added and chained to a textarea field.

This is also a best practice example (at least from my point of view) for solving the issues around encoding, CDATA and inconsistent field behavior and could solve the disagreements regarding whitespace trimming further down the road.

Maybe the proposed changes could be considered for future core integration.

@michael-e
Copy link
Member

From your README:

Text formatters are a presentational thing and should ideally be part of the templating layer.

I am not sure if agree with that. But apart from "the perfect concept", adding textformatters upon saving improves performance. PHP Markdown is rather slow, and applying it on every formatted content during page load is slowing down the page output.

@jensscherbl
Copy link
Member Author

adding textformatters upon saving improves performance. PHP Markdown is rather slow, and applying it on every formatted content during page load is slowing down the page output.

Agreed. But the formatter result is not "your content" from a conceptual point of view, so it shouldn't be stored as field content and handled by the caching mechanism instead. Really just a minor difference in implementation. And it has a huge advantage. Formatters can be added, changed or removed without resaving entries.

I am not sure if agree with that.

I'm always open to arguments and examples... ;)

@nitriques
Copy link
Member

The thing is: you are both right.

@jensscherbl
Copy link
Member Author

The thing is: you are both right.

Explain... :)

Despite the disagreement above, what do you think of

  • multiple, more modular text formatters in general
  • the UI change for this (taglist instead of select box)
  • behavior regarding encoding, CDATA and unaltered formatter output

Thing is, these things could still work without the more controversial changes.

The idea is to provide Markdown, Markdown Extra and HTML Purifier as standalone text formatters.

@nilshoerrmann
Copy link
Contributor

If a future Symphony version offered proper caching, moving the formatting to the templating layer might be an interesting idea – especially if you need to apply a different formatter later on. But it depends on your project setup, if you consider the formatted or the unformatted value your actual content.

  • multiple, more modular text formatters in general
  • the UI change for this (taglist instead of select box)
  • behavior regarding encoding, CDATA and unaltered formatter output

+1

@nilshoerrmann
Copy link
Contributor

A note regarding your readme:

Input field has validation and doesn't need formatters …

If you'd like to apply typographic features to entry titles, this is not true. It's one reason why we usually use single line Textbox fields instead of default input fields because it enables us to apply Smartypants and the like to titles.

@jensscherbl
Copy link
Member Author

If you'd like to apply typographic features to entry titles

it enables us to apply Smartypants and the like to titles

That's what I consider a task for a modifier. The difference to a formatter would be that formatters are expected to produce XML and don't alter the original content, while a modifier would persistently replace characters or modify the content in other ways and the output would be treated as regular CDATA content, not code.

@nitriques
Copy link
Member

Explain... :)

Both strategies are good: Cached data into the DB is valid (performance) and transforming the data on demand (flexibility) is valid too. As I always say: cache invalidation is the hardest thing in computer science.

But since we do not have any good core caching layer, the later has been implemented.

That's what I consider a task for a modifier

I really like this idea, but again, if you want to do it on the fly, we need a better caching layer. And it should be opt-outable, since I still have to run Symphony on poor shared hosts.

@nilshoerrmann
Copy link
Contributor

That's what I consider a task for a modifier.

Ah, okay, I understand the concept now. The terminology is a bit confusing to me because text formatting usually includes typographic refinement like phone number grouping and similar. The latter would be a modification in your concept because you'd add thin spaces.

Have you thought about cases where you'd like to add a converted value to the XML (as an attribute for instance)? Like adding the md5 hash to an e-mail adress that can be used for Gravatar URL.

@jensscherbl
Copy link
Member Author

The terminology is a bit confusing to me because text formatting usually includes typographic refinement like phone number grouping and similar.

True...

The latter would be a modification in your concept because you'd add thin spaces.

Yep, that was the idea.

Have you thought about cases where you'd like to add a converted value to the XML (as an attribute for instance)? Like adding the md5 hash to an e-mail adress that can be used for Gravatar URL.

Nope. That would indeed be a good usecase for an input field with a text formatter. The "formatted" value could look something like this...

<email mode="formatted">
    <value hash="f8b28f035add93b3164ad85d36ba1a6d">
        <![CDATA[[email protected]]]>
    </value>
</email>

Maybe the differentiation between formatters and modifiers isn't such a good idea after all. I mean, formatters could still just provide a normal, CDATA wrapped string where appropriate. Would be perfectly valid XML as well.

@jensscherbl
Copy link
Member Author

On the other hand, this would also be a good example for the modifiers concept...

Don't mess up the email field with XML, keep it simple. Instead, create another field for the hash with a custom modifier that grabs the value from the email field (reflection style) and simply stores the hashed value.

Output would look like this...

<email>
    <![CDATA[[email protected]]]>
</email>
<hash>
    <![CDATA[f8b28f035add93b3164ad85d36ba1a6d]]>
</hash>

Or just allow modifiers to add attributes directly...

<email hash="f8b28f035add93b3164ad85d36ba1a6d">
    <![CDATA[[email protected]]]>
</email>

@nilshoerrmann
Copy link
Contributor

Maybe the differentiation between formatters and modifiers isn't such a good idea after all. I mean, formatters could still just provide a normal, CDATA wrapped string where appropriate. Would be perfectly valid XML as well.

I think the differentiation is good to define what "formatters" and "modifiers" do, which input is given to them and which output is expected. But while this is important from an engineering perspective, I personally think it makes things a lot more complicated for Symphony developers:

  • If formatters and modifiers were separate things, how could they be combined given the different kind of behaviour and the different types of output?
  • If formatters and modifieres could not be chained easily, what's the benefit of allowing chained formatters at all while sourcing out actions that might only make sense if applies consecutive (thinking of microtypography)?

From my own user perspective, I'd like to have one thing: chainable text formatters. These text formatters need to handle different things and this is where your concept really helps: there are formatters that convert to XML (structural changes) and others that alter the content (textual changes) – but this is something under the hood, something the developer of the formatter has to think about and take care of. This is nothing that I as a user want to think about. So maybe Symphony needs to provide different programming interfaces but only a single user interface.

@nilshoerrmann
Copy link
Contributor

By the way, have you thought about completely moving the formatters away from the field settings? What if formatters were never defined on the field but per data source? This would abtract the concept of a formatter and would allow me to alter the content of any field:

Think of Gravatars again: an e-mail address could be stored in an input field, could be referenced by an association field or could be provided by the author field. If there was a general hash formatter, I could use the field that best suits my data structure and I would still be able to get the hashed value. This would simplify all fields feature requirements because content conversion or modification would be handled on output and not on input.

@jensscherbl
Copy link
Member Author

By the way, have you thought about completely moving the formatters away from the field settings? What if formatters were never defined on the field but per data source?

Nope, never crossed my mind, but makes a lot of sense now.

Will have to think about it.

@jonmifsud
Copy link
Member

Adding it into the datasource is very interesting - it would I guess more or less be a replacement/alternate version of the :formatted option that exists at the moment.

@nitriques
Copy link
Member

completely moving the formatters away from the field settings?

I'd love that. Really, really love that idea... It could also make the the $mode parameter kind of useless and could be the change needed to implement the kind of behavior that you hacked in Subsection manager (multiple values in the DS included element merged together)

@brendo
Copy link
Member

brendo commented Dec 10, 2014

Some thoughts:

  • Multiple formatters are a great idea
  • Taglist is a little clumsy, but makes use of existing UI and is pretty quick to pick up
  • Doing formatting at presentation time is quite interesting and would solve the problem of changing a formatter on a field.
  • I think putting the formatted result in a cache is a great idea and may mitigate performance loss. I don't think it would totally though, as at the moment a datasource takes multiple queries to build the data, and then it'd require additional queries to find the formatted values for this data.
  • I wonder if there are any extensions that rely on 'formatted' values from other functions (not just appendFormattedElement)
  • Not totally convinced on modifiers yet, I think it add complication. I can see the value, but for most scenarios a formatter would suffice
  • I'm not opposed to moving formatters to the Datasource... but it needs a little more though. For example, how would you know to display the Markdown UI in the publish area?

@nitriques
Copy link
Member

I wonder if there are any extensions that rely on 'formatted' values from other functions

Yes, I do!

For example, how would you know to display the Markdown UI in the publish area?

I think it is a separate issue. Right now, the markdown UIs are not really tied up with the DS result. (Example editor_for_symphony, which uses a javascript markdown parser, gives to the user a different result than markdown smarty pants). So I would keep the existing field param only for UI

@brendo
Copy link
Member

brendo commented Dec 10, 2014

I think it is a separate issue. Right now, the markdown UIs are not really tied up with the DS result. (Example editor_for_symphony, which uses a javascript markdown parser, gives to the user a different result than markdown smarty pants). So I would keep the existing field param only for UI

Hang on a tick. Let's say you have a Markdown Formatter set via the Section Editor (current behaviour). At the moment, this will allow a markdown class to be added to the field's wrapper so other extensions can inject a Markdown GUI. The Markdown will be then be saved as value, and the Formatter will run as well saving it's result in value_formatted.

Now if you allow a second Formatter to be added at the Datasource level, what exactly can it do? It can't apply 'Textile Formatter', because the raw value is Markdown. It could do the little tasks described in this thread like whitespace trimming, generating Gravatar hashes etc., but I wonder in practice how much could be changed with a Datasource level formatter.

@jensscherbl
Copy link
Member Author

For example, how would you know to display the Markdown UI in the publish area?

UIs could be added to textareas separately from text formatters in the section editor.

Similar to what Nils did with Association Field.

See github.com/jensscherbl/textareanew#ui

@jensscherbl
Copy link
Member Author

Now if you allow a second Formatter to be added at the Datasource level

All formatters would run at the datasource level.

It could do the little tasks described in this thread like whitespace trimming, generating Gravatar hashes etc., but I wonder in practice how much could be changed with a Datasource level formatter.

It's mostly a win from a developer perspective. Would allow for more modular extensions. If you want to do certain things before or after transforming Markdown, you can still use an existing Markdown formatter and only create a new formatter for your custom task. Right now, if I want to use Smarty Pants or HTML Purifier with CommonMark or Textile instead of the default Markdown formatter, I have to include and maintain these libraries in each formatter extension separately.

@brendo
Copy link
Member

brendo commented Dec 10, 2014

I'm not saying multiple formatters is not a good idea. I agree it's a good
idea. What I'm still pondering is the usefulness if they are only applied
at the datasource level.

If the textarea implemented an interface like Associations, that still
doesn't answer how a formatter could deal with another formatter's work. If
your publish level formatter was Markdown and your DS level formatter was
Textile, what do you think would happen? The original value would be
written in Markdown, so the Textile formatter wouldn't be able to do
anything right?

Do you then have incompatible formatters and not allow them to be selected?

@nitriques
Copy link
Member

If your publish level formatter was Markdown

The publish level must not be a formatter IMO. It should only be related to the backend UI. Like I've said, it's already not compatible since the editor can allow things the formatter does not understand (or vice versa). Value in the DB must always be what the user typed, nothing more, nothing less.

Right now, if I want to use Smarty Pants or HTML Purifier with CommonMark or Textile instead of the default Markdown formatter, I have to include and maintain these libraries in each formatter extension separately.

And it's a real pain. We could parse to markdown first and then run another formatter on top and it should work (since markdown supports html content in it)

Do you then have incompatible formatters and not allow them to be selected?

I do not think so. I would be the developer's responsibility to chain them in a way that works.

@jonmifsud
Copy link
Member

We could parse to markdown first and then run another formatter on top and it should work (since markdown supports html content in it)

Correct and not correct - I've used Templated Text Formatters pretty much the concept of chaining we have here. If you want to do particular transformations, they would have to be done before markdown. I can't remember exactly but I think I had some xml formatted elements which would be removed from the output (could be bad design). But the extension basically allows one to define the chain/order. I guess it would be up to who is doing the chaining to make sure that they place them in the right order.

@nitriques
Copy link
Member

they would have to be done before markdown

Well, with what we are talking about, it would be possible...

@jonmifsud
Copy link
Member

Ah my bad re-read your comment :)

I guess what you mean is you change how Smarty Pants or HTML Purifier work by having them as separate modules to be plugged in after Markdown or any other alternative.

Also reserving a thought for those who like WYSIWYG editors, though not sure how they work, would I be thinking properly if it would mean that they would save the HTML version in the standard field value? Since they generate the code in the user's browser.

@nitriques
Copy link
Member

thinking properly if it would mean that they would save the HTML version in the standard field value?

Yes it would mean that. But then you could apply a formatter (like HTML tidy) to clean it up in your Data Source.

@jensscherbl
Copy link
Member Author

If your publish level formatter was Markdown and your DS level formatter was
Textile, what do you think would happen?

There wouldn't be any publish level formatters, only UIs.

The original value would be written in Markdown, so the Textile formatter wouldn't be able to do
anything right?

The situation would be exactly the same as today. What if you enter Markdown formattet text, but the field has a Textile formatter. Just wouldn't work as expected. Doesn't matter if you enter Markdown or HTML directly to a textarea or with a UI.

The publish level must not be a formatter IMO. It should only be related to the backend UI. Like I've said, it's already not compatible since the editor can allow things the formatter does not understand (or vice versa). Value in the DB must always be what the user typed, nothing more, nothing less.

Exactly.

If you want to do particular transformations, they would have to be done before markdown.

My concept implementation also allows for chaining in any order you like.

I guess what you mean is you change how Smarty Pants or HTML Purifier work by having them as separate modules to be plugged in after Markdown or any other alternative.

I guess it would be up to who is doing the chaining to make sure that they place them in the right order.

would be the developer's responsibility to chain them in a way that works.

Yep.

Also reserving a thought for those who like WYSIWYG editors, though not sure how they work, would I be thinking properly if it would mean that they would save the HTML version in the standard field value? Since they generate the code in the user's browser.

As far as I know, all WYSIWYG extensions for Symphony already work that way, since formatters are only allowed to modify the formatted value. A WYSIWYG editor generates HTML clientside, so it would have to convert the HTML to unformatted plaintext and store the result as unformatted value. That's exactly the opposite of what a Markdown or Textile formatter does and isn't possible at the moment.

@nitriques
Copy link
Member

@jensscherbl Looks like we totally agree 👍

@jensscherbl
Copy link
Member Author

Mentioned before, but to revive the discussion...

What if text formatters become part of the presentation layer (templating) instead of the data layer (field, data source)?

Ideally and from a purely conceptual point of view, shouldn't formatting and transformation be as task for XSLT (utilities)?

Now I certainly don't want to make the effort to implement a markdown parser in XSLT. But what about having formatters available as XSLT functions in your templates?

Something like this...

<article>

    <!-- example 1 -->

    <h1>
        <xsl:value-of select="sym:smartypants(article/entry/title)"/>
    </h1>

    <!-- example 2 -->

    <img alt="Art Vandelay" src="//www.gravatar.com/avatar/{sym:hash(article/entry/author/email)}"/>      

    <!-- example 3 -->

    <xsl:copy-of select="sym:commonmark(article/entry/text)/*"/>

    <!-- example 4 -->

    <xsl:for-each select="sym:commonmark(article/entry/text)/*">
        <xsl:copy-of select="p|ul|ol|blockquote"/>
    </xsl:for-each>

</article>

Good or bad idea?

Regarding implementation, I found EXSL Function Manager to be good start.

@michael-e
Copy link
Member

I like the idea, it seems really logical to me. But as I already mentioned, I am bit afraid of the performance drop. Unfortunately, I don't have any valid figures; I just noticed once that PHP Markdown, for example, is rather slow.

@jensscherbl
Copy link
Member Author

I am bit afraid of the performance drop

Obviously, text formatters would implement some sort of internal caching.

@nitriques
Copy link
Member

performance drop.

Obviously, text formatters would implement some sort of internal caching.

No the cache must be created at save time, that's the only way to not pay a hard price on first load...

@jensscherbl
Copy link
Member Author

No the cache must be created at save time, that's the only way to not pay a hard price on first load...

I don't agree. Most caches work on demand. Image caches, output caches, data source caches, bytecode caches...

@nitriques
Copy link
Member

They all have a first time cost. The idea is where you put it.

I would rather create something different (and keep current formatters)... They could be called output formatters. I totally agree that current formatters lack some pretty useful features, but sacrifying their perf is not a good idea.

@jensscherbl
Copy link
Member Author

@nitriques Let's see what other people prefer then. I'd either keep it or change it. Introducing yet another additional way to do things wouldn't be a good idea.

@nitriques
Copy link
Member

additional way to do things wouldn't be a good idea.

Why ? Because of maintenance ? (I am curious, I am not against it :) )

@michael-e
Copy link
Member

It might be a wise decision to:

  • first enable the core to handle output formatters (and caching?)
  • then let people create some output formatters
  • then deprecate current formatters and corresponding core code (if any)
  • and finally retire the current formatters as extensions and remove corresponding core code (if applicable)

Regarding caching I agree with @jensscherbl that "first load" caching might be fine. We should, however, make some profound tests to see how fast or slow those formatters actually are and if they have any significant impact on the first load of a page.

@nitriques
Copy link
Member

I would be sad to loose "permanent" formatters...

@michael-e
Copy link
Member

Why?

@nitriques
Copy link
Member

Because of perf. Both in the frontend and in my crazy ways I can poke with Symphony ;) But mainly because it's predictable and makes my end result deterministic, i.e., the formatted output is constant.

@michael-e
Copy link
Member

I get your point. If formatters are "in the frontend" (i.e. they work when the content is actually used), an update of the formatter could change the result. But the same thing can happen if you resave your entries for certain reasons while using a "classic" formatter.

@nitriques
Copy link
Member

But the same thing can happen if you resave your entries for certain reasons while using a "classic" formatter.

You're right. But when this happens, I get a phone call from my client. If it happens for a visitor of the website, I won't automatically know it.

@michael-e
Copy link
Member

I meant using the Resave Entries extension, as a developer. I sometimes need this, and in these situations I don't even think of HTML changes that could happen because of a Markdown extension update which has happened some weeks before. :-)

@nitriques
Copy link
Member

Ah, you're right. But I guess that you tested it after ;)

@jensscherbl
Copy link
Member Author

Apparently, Symphony can use custom text formatters from a ../workspace/text-formatters/ folder. I didn't know that. It's very similar to custom events or data sources...

<?php

class FormatterMarkdown extends TextFormatter
{
    private $commonMark;
    private $htmlPurifier;

    public function __construct()
    {
        $this->commonMark   = TextformatterManager::create('commonmark');
        $this->htmlPurifier = TextformatterManager::create('htmlpurifier');
    }

    public function about()
    {
        return array('name' => 'Markdown');
    }

    public function run($string)
    {
        $string = strip_tags($string);
        $string = $this->commonMark->run($string);
        $string = $this->htmlPurifier->run($string);
        $string = str_replace(array("\r", "\n"), '', $string);

        return $string;
    }
}

This is pretty much all I actually needed. Closing this.

@nitriques
Copy link
Member

😄 It's fun to stumble upon those things. Thanks for sharing.

@jensscherbl
Copy link
Member Author

It's fun to stumble upon those things.

Would be more fun to stumble upon those things in some kind of documentation instead of by chance while digging through the core... ;)

@nitriques
Copy link
Member

some kind of documentation i

Totally. Wanna create a wiki about it ? 😄

@jensscherbl
Copy link
Member Author

Wanna create

Nope. Just as lazy as everyone else when it comes to documentation... ;P

a wiki

Also, I would prefer a GitBook.

@nitriques
Copy link
Member

Yeah git book is great, but if we are all too lazy to write it, GitBook does not solve anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants