Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preprocessor to embed markdown images #1067

Merged
merged 13 commits into from
Feb 17, 2018

Conversation

juhasch
Copy link
Member

@juhasch juhasch commented Aug 25, 2017

Images in markdown ![](myimage.png) will be added as attachments to the cell.

TODO

  • Test
  • Documentation

@soamaven
Copy link
Contributor

Just to be explicit, looks like this only embeds MD referenced images. HTML embedding still needs to be done. Glad to see the progress though :D

@soamaven
Copy link
Contributor

For some reason, I cant get attachments working with my jupyter nb (v5.0.0), so I can't test some things.

Is it possible to refer to an attachment in the html? E.g.:
'<img src="...' width="50%">
becomes
'<img src=":myimage.png.' width="50%">

if so that makes embedded images actually very nice to deal with in MD cells, and gives full html control over display.

@juhasch
Copy link
Member Author

juhasch commented Aug 26, 2017

Attachments only work with markdown images. If you use base64 images in html img tags, they will be sanitized by the notebook.

So it will need to be a 2 step solution:

  1. a preprocessor to embed markdown images as attachments. This allows
    jupyter nbconvert -to notebook mynotebook.ipynb --EmbedImagesPreprocessor.embed_images=True to create a copy of the notebook with embedded images.
  2. an html exporter, that will embed all images.

@soamaven
Copy link
Contributor

soamaven commented Aug 26, 2017

If you use base64 images in html img tags, they will be sanitized by the notebook.

Thanks, this is an important detail. When does this happen? Upon render and kernel startup?

This PR #1067 helps to embed MD images -- the htmlexporter.py didn't do this.

What I am looking to achieve is:

  1. embed html AND/OR markdown images before export
  2. export to slides. I want to have one presentation file I can send to collaborators that works in their browser. Very similar to how a powerpoint presentation is one file.

It seems like step one is not possible for html base64 because of sanitization, which means step 2 will fail. Is the only option to re-implement the --to slides exporter to also embedhtml images?

@juhasch
Copy link
Member Author

juhasch commented Aug 26, 2017

I believe you need a new exporter to embed images and convert to slides.

Maybe @mpacer can share his thoughts what a good approach would be ?

@gabyx
Copy link
Contributor

gabyx commented Aug 30, 2017

So would the goal for the export_embedded extension be that for
"Download Embedded HTML"

  • call embedhtml.py exporter which uses this preprocessor first (is that possible to hook up), and afterwards replaces all remaining image tags and outputs .html

"Download Embedded Notebook"

  • call exporter notebook with your preprocessor (which should also handle inside Markdown and store them as attachements)

So the path would be -> first transform the notebook to a new notebook that all images are embedded -> export to HTML

So then for the HTML export we can use the default normal HTML exporter

@gabyx
Copy link
Contributor

gabyx commented Aug 30, 2017

@juhasch Could you please have a look at my branch (where I tried to use your preprocessor inside embed_html) it is called but somehow does not work, key error in self.attachements, do you know why?
Link to Commit

@juhasch
Copy link
Member Author

juhasch commented Aug 31, 2017

Actually, all you need to do is copy the html_embed exporter and replace the base class:

class EmbedSlidesExporter(SlidesExporter):

And instantiate your exporter in the from_notebook_node function accordingly

    def from_notebook_node(self, nb, resources=None, **kw):
        output, resources = super(
            EmbedSlidesExporter, self).from_notebook_node(nb, resources)
...

Now, what I don't like is that we now have the code for embedding twice, so we should find a solution for that.

You don't need the preprocessor. It is useful if you want to stay in higher-level formats, i.e. notebook to notebook conversion.

@gabyx
Copy link
Contributor

gabyx commented Sep 26, 2017

Bugfix:
bugfix.patch.zip

match.string is not doing what it should do, match.group(0) is the whole match, whereas
match.string is the whole line strangely....

Embedding images in notebooks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: EmbedPreprocessor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be EmbedImagesPreprocessor, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would say so too, its more explicit :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its more explicit

but more importantly, at the moment I don't think EmbedPreprocessor is an extant class, so I don't think it'll produce anything at all...

@jcb91
Copy link
Member

jcb91 commented Oct 16, 2017

So where is this sat, at the moment? @gabyx are you saying there's a bug in the current implementation? Can you give a little more explanation as to what that is, and maybe an example of how to trigger it?

else:
return match.string
elif url.startswith('attachment'):
return match.string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[asd](attachement:asd.jpg) # this comment...
will result here in
[asd](attachement:asd.jpg) # this comment... # this comment...
it should be match.group(0)

Copy link
Contributor

@gabyx gabyx Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match.string is not what should be returned if we do not want to substitute... (no?)
match.string is the string passed to match/search... https://docs.python.org/3/library/re.html#re.match.string

if self.embed_remote_images:
data = urlopen(url).read()
else:
return match.string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dito as below

@mpacer
Copy link

mpacer commented Oct 17, 2017

  1. If this is getting a full EmbedPreprocessor working that is compatible with the existing HTMLExporter and SlidesExporter (and ideally the MarkdownExporter); I'd be interested to restart the conversation on how that would fit into nbconvert core.
  2. as far as needing a separate exporter, if the only difference is enabling the EmbedPreprocessor, you might want to wait for [WIP] Download converted documents with uploadable configuration  jupyter/notebook#2413 to land, since it creates a new endpoint that you can use to include a config file (json) and nbconvert will attach that config file to the exporter, meaning you could set this by just having prespecified config files associated with the extension

@juhasch juhasch merged commit 86050e8 into ipython-contrib:master Feb 17, 2018
@juhasch juhasch deleted the feature/pre_embedimages branch February 17, 2018 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants