This repository has been archived by the owner on Jan 3, 2018. It is now read-only.

Discuss: improving our build system #349

Closed
gvwilson opened this issue Feb 28, 2014 · 10 comments

@gvwilson
Contributor

Discussion of #298 (build takes too long), #329 (how to distinguish input from output), #342 (use Pandoc instead of Jekyll) and #347 (use Jekyll with translated files) should take place here. See also http://zonca.github.io/2014/02/build-software-carpentry-with-pelican.html.

Our lesson materials, and the template for bootcamp web sites, are in https://github.com/swcarpentry/bc. We keep the two together because instructors sometimes want to modify lessons for particular bootcamps.

We used to store all lessons as pure HTML. Some are now IPython Notebooks instead, and the rest are Markdown because people believe it's easier to write and manage than HTML.

GitHub uses Jekyll to translate Markdown and/or HTML into a web site for the repo (which is visible at github.io). It does not run other tools, or even Jekyll extensions, for security reasons.

This means that our IPython Notebooks don't show up in bootcamp web sites as HTML pages (which is probably OK). It also means that if people want to do local builds, they have to install IPython, which is probably also OK, except the number of tools we need to install is going to grow as we use other input formats for lessons on other subjects. We'd like a general solution.

Meanwhile, the Markdown supported by Jekyll (and hence by GitHub) has some significant limitations. In particular, we cannot add attributes to fenced code blocks: while Pandoc and some other tools support this:

~~~ {.in}
~~~

to create:

<pre class="in">
</pre>

Jekyll doesn't. This means that we can't (easily) distinguish input and output cells in our shell and Git lessons (which are stored as Markdown). Learners don't like this.
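To make the gap concrete, here is a rough sketch of what "attribute support" amounts to: a small Python filter that rewrites `~~~ {.in}` fences as `<pre class="in">`. The function name `fences_to_pre` and the regular expression are illustrative assumptions, not part of Jekyll or Pandoc, and Pandoc itself handles many cases this ignores.

```python
import html
import re

# Matches a fence opened by "~~~ {.classname}" and closed by a bare "~~~".
# Illustrative pattern only: it ignores backtick fences, nested fences,
# and attribute lists with more than one entry.
FENCE = re.compile(
    r"^~~~[ \t]*\{\.([\w-]+)\}[ \t]*\n(.*?)^~~~[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def fences_to_pre(markdown_text):
    """Rewrite attributed fenced blocks as <pre class="..."> elements."""
    def replace(match):
        css_class, body = match.group(1), match.group(2)
        # Escape &, < and > so the body survives as literal text in HTML.
        escaped = html.escape(body, quote=False)
        return '<pre class="{}">\n{}</pre>'.format(css_class, escaped)

    return FENCE.sub(replace, markdown_text)


print(fences_to_pre("~~~ {.in}\ncd here\n~~~\n"))
```

Text outside the matched fences passes through untouched, so a filter like this could in principle run before Jekyll sees the file, at the cost of being one more thing to maintain.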

We can use explicit <pre class="in"> tags in our Markdown source files, but if we do, we have to replace special characters like < with their ampersand equivalents. Authors don't like this.
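As an illustration of that burden, this short sketch uses Python's standard `html.escape` to show what an ordinary shell line turns into once it has to live inside an explicit `<pre>` tag. The sample command is made up for the example.

```python
import html

# A shell line as an author would naturally type it (made-up example).
command = "wc -l < notes.txt > count.txt && cat count.txt"

# quote=False leaves quotation marks alone; only &, < and > are replaced.
escaped = html.escape(command, quote=False)

print('<pre class="in">{}</pre>'.format(escaped))
```

Every redirect and every `&&` in a shell lesson would need this treatment by hand, which is why authors object.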

We want:

  1. To use existing tools (that we don't have to write/maintain) as much as possible.
  2. To make writing/updating lessons as painless as possible.
  3. To make lessons clear to read (which means distinguishing input from output).
  4. To allow people to use new lesson file formats without completely rethinking our build system.

#342 (posted earlier this week) attempted to fix some of this by getting rid of Jekyll and:

  • Using IPython's nbconvert to turn notebooks into Markdown files.
  • Using Pandoc to turn Markdown into HTML.

This lets us add class attributes to fenced code blocks, but:

  1. GitHub won't run Pandoc for us, so bootcamp websites won't auto-build lessons.
  2. Some versions of Pandoc are very slow.
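For reference, the two-step pipeline in #342 could be driven by something like the sketch below. It is not the code from that PR: the helper names are hypothetical, `ipython nbconvert` is the spelling current when this thread was written (newer installs use `jupyter nbconvert`), and both tools are assumed to be on the PATH.

```python
import subprocess
from pathlib import Path


def build_commands(notebook):
    """Return the two command lines that turn one notebook into HTML.

    File names are relative to the notebook's own directory, which is
    where build() runs them.
    """
    notebook = Path(notebook)
    markdown = notebook.with_suffix(".md").name
    page = notebook.with_suffix(".html").name
    return [
        # Step 1: notebook -> Markdown.
        ["ipython", "nbconvert", "--to", "markdown", notebook.name],
        # Step 2: Markdown -> HTML with Pandoc.
        ["pandoc", "--from", "markdown", "--to", "html",
         "--output", page, markdown],
    ]


def build(notebook):
    """Run both steps inside the notebook's directory."""
    workdir = str(Path(notebook).parent)
    for command in build_commands(notebook):
        subprocess.check_call(command, cwd=workdir)


for command in build_commands("novice/python/01-numpy.ipynb"):
    print(" ".join(command))
```

Neither step is something GitHub will run on our behalf, which is the first objection in the list above.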

Andrea Zonca then built http://zonca.github.io/2014/02/build-software-carpentry-with-pelican.html, which uses a different website compiler called Pelican to do the conversion. It also uses a continuous integration (CI) tool called Travis to automatically rebuild sites when changes are committed, rather than relying on GitHub's built-in tool. This:

  1. Increases the number of moving parts in the build system that we have to maintain.
  2. Requires instructors to configure Travis for their bootcamps.

#347 (posted today) is a third run at this. It expects people to commit Markdown versions of non-Markdown content to the repository, then uses Jekyll as before. This:

  1. Doesn't solve the styles-for-code-blocks problem.

@ahmadia
Contributor

ahmadia commented Feb 28, 2014

You've pointed out some of the advantages of pandoc; I just want to make sure you understand some of its limitations as well.

I can't address all of these issues, but I think you should be very careful about using pandoc as a default Markdown parser for IPython Notebooks. Due to its inability to pass raw LaTeX from Markdown -> HTML, pandoc doesn't support embedding mathematics as smoothly as Marked. The IPython team now uses Marked + node (which they use on the front-end IPython Notebook for Markdown processing) to generate static views of their notebooks.

@ahmadia
Contributor

ahmadia commented Feb 28, 2014

You can't do the following:

~~~ {.in}
cd here
~~~

But you can do:

<div class="input">

```
cd here
```

</div>

Unfortunately, those blank lines between the div statements are syntactically significant.
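Since the blank lines are easy to forget, the wrapper could be generated instead of typed. A minimal sketch, assuming a hypothetical helper called `wrap_block` (nothing like it exists in the repo as far as this thread shows):

```python
def wrap_block(code, css_class="input"):
    """Wrap code in a classed <div> around a fenced block.

    The empty strings in the list below become the blank lines after
    <div> and before </div>, which the comment above notes are
    syntactically significant.
    """
    fence = "`" * 3
    lines = [
        '<div class="{}">'.format(css_class),
        "",
        fence,
        code.rstrip("\n"),
        fence,
        "",
        "</div>",
    ]
    return "\n".join(lines) + "\n"


print(wrap_block("cd here"))
```

Calling `wrap_block("cd here")` reproduces the snippet above, and passing a different `css_class` would cover output blocks the same way.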

I don't think people were opposed to using HTML when needed, I just don't think they wanted to write in HTML.

@rgaiacs

rgaiacs commented Feb 28, 2014

  1. Some versions of Pandoc are very slow.

This shouldn't be a big problem any more: it was fixed this week.

@zonca
Contributor

zonca commented Mar 1, 2014

The annoying thing about setting up Travis-CI is encrypting the GitHub token that allows pushing the website online, because it requires a Ruby environment to install the travis gem (quite easy on Ubuntu/Debian with sudo gem install travis, but not on other systems).
So I've created a JavaScript application that does this in the browser by calling the Travis-CI APIs:
http://travis-encrypt.github.io/

Now instructors can just go there, enter their organization/repository and then GH_TOKEN=longgithubtoken, and have it encrypted in their browser.

Once the instructors have the encrypted token, they can just paste it into .travis.yml and set Travis up with a couple of clicks from http://travis-ci.org.

With Pelican there is no software to maintain; we just have to configure Pelican itself.
On the other hand this gives a lot of flexibility.

For example, it already builds the IPython notebooks to HTML, but in the future I would recommend keeping only the notebooks without output under version control, then having Travis run all of them and store both the .ipynb with outputs and the HTML in the website.
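Keeping only output-free notebooks in version control could be done with a small filter like the sketch below. It assumes the nbformat 4 JSON layout (a top-level `cells` list with `outputs` and `execution_count` on code cells); older notebook formats are laid out differently and would need adjusting. The function name is hypothetical.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with all code-cell outputs removed."""
    stripped = copy.deepcopy(notebook)
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# A tiny hand-written notebook used only to exercise the function.
example = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["1 + 1"],
            "execution_count": 3,
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 3,
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                }
            ],
        },
    ],
}

print(json.dumps(strip_outputs(example)["cells"][1], sort_keys=True))
```

The original dict is left untouched, so the same loaded notebook can still be executed and published with its outputs.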

Also if we need to run R, we can run it on Travis.

@jkitzes

jkitzes commented Mar 1, 2014

If we do move in the direction of requiring local builds of the bootcamp repos, I'm +1 for Pelican over Jekyll (I don't have as much experience with pandoc, so I can't comment on that), mostly because almost all of the instructors use Python, which suggests that Pelican would be easier for the community to maintain and extend in the long run.

@gvwilson
Contributor Author

gvwilson commented Mar 1, 2014

@jkitzes If we adopt #347, then instructors will only have to build files locally if:

  1. the source for that file is not Markdown or HTML (e.g., it's an IPython Notebook), and
  2. they have modified it in some way (i.e., it's different from what's in this repo), and
  3. they want learners to see a nicely-rendered HTML version of the file.
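The three conditions combine with a plain "and", which a few lines of Python make explicit. This is only a sketch of the rule as stated; the function name and the suffix set are assumptions for illustration.

```python
# Formats Jekyll on GitHub renders with no help, per the first condition.
JEKYLL_FORMATS = {".md", ".html"}


def needs_local_build(suffix, modified, wants_html):
    """True only when all three conditions in the list above hold."""
    not_native = suffix.lower() not in JEKYLL_FORMATS
    return bool(not_native and modified and wants_html)


cases = [
    (".md", True, True),       # Markdown: Jekyll handles it
    (".ipynb", False, True),   # unmodified notebook: cached Markdown is reused
    (".ipynb", True, False),   # modified, but a download link is enough
    (".ipynb", True, True),    # the only case that needs a local build
]
for case in cases:
    print(case, needs_local_build(*case))
```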

Putting it another way:

  1. If the file is HTML or Markdown, it will be rendered on the bootcamp website automatically by Jekyll.
  2. If it's in some other format, and they haven't modified it, the Markdown or HTML that Jekyll needs will be in their repo (pulled in from this one).
  3. If they don't care whether their learners see a nice web page (e.g., if they just want an IPython Notebook for people to download) then this entire discussion is moot.

The second of these isn't quite true yet: the cached versions of Markdown-ized IPython Notebooks are currently stored in a cached directory, so they'll show up in the "wrong" place in an automatic Jekyll build on GitHub, but I know how to fix that, and will do so if this looks like a good plan.

The major disadvantage of #347 is the whole "classes for pre blocks" thing, but the only solution to that is to use something other than Jekyll to build the pages. That gets us back to either:

  1. a continuous integration tool (like Travis, which Andrea is recommending) that rebuilds everything for each bootcamp repo when something is pushed, or
  2. building everything ourselves (e.g., with Pandoc), committing the generated files to the repo, and having GitHub render those without any Jekyllization.

I don't like either of these:

  1. The first adds more moving parts that require knowledge most of our instructors don't have. Yes, they can "just" follow an algorithm to hook things up, but we've seen that fail in other contexts.
  2. The second puts all the build burden on us, instead of just the build burden for filetypes that Jekyll doesn't handle.

@jkitzes

jkitzes commented Mar 2, 2014

@gvwilson, that's a good summary, and I both see what you're saying and I'm convinced by your conclusions about extra moving parts and local builds. At this point, I personally like #347 and think it's the closest we've gotten so far - I think it nicely meets the needs of almost all instructors (while providing a relatively easy way for those who want that last piece, HTML customized IPython notebooks, to get that too). I also don't think it's a big deal to have to use the raw HTML for the fenced code blocks.

A separate question on notebook rendering: have we considered just using nbviewer.ipython.org to do the rendering? For example, we create an introductory Markdown page that lists all of the notebooks, and then provide a link to both (1) the raw notebook file, and (2) nbviewer.ipython.org/notebook-path (the latter is the alternative to the cached/notebook-name link that would reference our in-repo file). This wouldn't use the SWC style sheet and would only work online, but it would avoid local builds and would provide the pre/post indicators as they appear in the notebook.
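Such an index page could be generated rather than written by hand. The sketch below is a guess at what that might look like: the raw-file URL follows the raw.githubusercontent.com pattern used elsewhere in this thread, while the nbviewer URL pattern (`/github/<user>/<repo>/blob/<branch>/<path>`) and the notebook path in the example are assumptions that should be checked against nbviewer before use.

```python
def notebook_index(user, repo, branch, notebook_paths):
    """Build a Markdown list linking each notebook's raw file and nbviewer page."""
    lines = []
    for path in notebook_paths:
        raw = "https://raw.githubusercontent.com/{}/{}/{}/{}".format(
            user, repo, branch, path)
        # Assumed nbviewer URL layout; verify before relying on it.
        view = "http://nbviewer.ipython.org/github/{}/{}/blob/{}/{}".format(
            user, repo, branch, path)
        lines.append(
            "* {}: [notebook file]({}) | [rendered on nbviewer]({})".format(
                path, raw, view))
    return "\n".join(lines) + "\n"


# Illustrative path only.
print(notebook_index("swcarpentry", "bc", "master",
                     ["novice/python/01-numpy.ipynb"]))
```

Because the output is ordinary Markdown, Jekyll on GitHub would render the index without any local build step.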

@gvwilson
Contributor Author

gvwilson commented Mar 3, 2014

@jkitzes (and everyone else): I have updated #347 to use a combination of div and ~~~ for input and output blocks in Markdown files. It's clunky, but:

  1. It works.
  2. If we ever get a version of Jekyll that'll take class attributes on fences, it'll be trivial to convert.

More importantly, the PR now also puts the .md files generated from .ipynb files in the same directories as those .ipynb files, so that we don't have to move things around from a cached directory to the source directory during or after the build. I'm a bit uncomfortable putting build products in a source directory, but Jekyll...
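With the generated file living beside its source, finding it (and noticing when it is out of date) becomes a path computation. A minimal sketch, with hypothetical helper names that are not taken from the PR:

```python
from pathlib import Path


def markdown_for(notebook):
    """Path of the generated Markdown file that sits beside a notebook."""
    return Path(notebook).with_suffix(".md")


def is_stale(notebook):
    """True if the generated Markdown is missing or older than the notebook."""
    source = Path(notebook)
    generated = markdown_for(notebook)
    if not generated.exists():
        return True
    return generated.stat().st_mtime < source.stat().st_mtime


print(markdown_for("novice/python/01-numpy.ipynb").name)
```

A check like `is_stale` is one way to limit the discomfort of committing build products: regenerate only when the notebook has changed since the Markdown was written.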

@gvwilson
Contributor Author

Closed by #347.

@cboettig

cboettig commented Aug 1, 2014

Hi folks,

Sorry to jump in here over my head, was hoping someone could clarify the rendering workflow ultimately settled on here for me, as I can't seem to reconstruct it from the referenced pull requests.

@jdblischak pointed me to this thread in regard to our discussion of how knitr should be rendering code blocks in the markdown. He currently uses the approach described here: #524 (comment), which puts a lot of HTML tags into the markdown, resulting in a syntax that displays incorrectly in GitHub's markdown rendering (in particular, swallowing the first line of input code). I'm a bit concerned that the heavy mixing of HTML and markdown results in a chimerical format whose rendering is hard to predict. If this is just an intermediary format that no one should look at, maybe pure HTML would be best? If anyone is intended to look at these .md files, it seems it should be possible to write them in a cleaner kramdown-flavored syntax? (e.g., ~~~r in place of the <code> tags that don't declare a language).

He does this for consistency with the python markdown format, but I'm afraid I don't understand what markdown flavour is being rendered for the python examples or how it is being generated. Ultimately I understand that you are using Jekyll with kramdown to render markdown syntax, but the example python md files I see look primarily like HTML: https://raw.githubusercontent.com/swcarpentry/bc/master/novice/python/01-numpy.md ? Seems like both that example and the knitr markdown outputs would be happier as pure html or as pure kramdown (possibly with a div for the in and out class)?
