Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Start task for publishing progit2 books from git #917

Merged
merged 3 commits into from
Jan 18, 2017

Conversation

jnavila
Copy link
Contributor

@jnavila jnavila commented Jan 3, 2017

Right now, it seems there is a local problem with utf-8 encoding in URLs.

This task is already language aware.

@peff
Copy link
Member

peff commented Jan 4, 2017

Thanks so much for starting on this! I don't really have enough expertise to offer intelligent comments on the implementation itself. I think the interesting thing would be to import a version we already have (from the existing publishing procedure) and then diff the results. That may be a bit complicated because it exists only in the database, but perhaps we can dump it into a filesystem representation (or get fancy with a Ruby diff gem).

@jnavila
Copy link
Contributor Author

jnavila commented Jan 4, 2017

The html generated by Atlas is quite different from the one generated by asciidoctor. So a simple diff on the generated html can't be performed.

As for dumping the values, I can do it on my dev db and post the two versions of some chapters for human comparison. Would there be a way to test-run this import? Maybe @schacon could help here and tell if this PR is close to what he was thinking.

Fiddling with the code, I realized that I haven't seen any code for polling and detecting changes in Atlas. I guess the task that imports books is triggered by an external hook from atlas, in which case the import from git won't be triggered for now.

@peff
Copy link
Member

peff commented Jan 4, 2017

Heroku runs rake genbook2 every 10 minutes. Though I think there is also an HTTP endpoint you can hit to add new books.

So as long as the rake task polls the progit2 repo for updates, I think we should be fine. That can run periodically, or it can be kicked off manually.

@jnavila
Copy link
Contributor Author

jnavila commented Jan 5, 2017

OK, so I don't really understand how the update of the books used to work. But I know for sure that I have to modify my task to enable the polling of the git repos.

@peff
Copy link
Member

peff commented Jan 5, 2017

I don't know how terrible it would be to simply have a file in the repository containing the sha1 of the progit2 commit we want to build, and update it manually. Sort of a poor-man's submodule.

There's a slight hassle because it requires an update here to pull in the new content. But it also means that we would not be surprised if the book makes changes that require us to tweak our formatting, etc.

I dunno. To be honest, the whole rails-app thing seems like huge overkill for what is essentially a static site. I kind of wonder if we could move to a system where the whole thing could be built locally, driven by make or something similar, and pushed up to a static host like GitHub Pages (or even the whole thing could be Jekyll-ified, and let GitHub do the build, though I think we are pushing the limits there between asciidoctoring the manpages and assembling the progit2 material).

@jnavila
Copy link
Contributor Author

jnavila commented Jan 6, 2017

The code already uses the ORM to manage the pages. The SHA1 of the HEADs can be stored in DB for later comparison.

As for the pseudo-module system, the path to book generation was copied on the way the git manual pages are generated: use octokit to access the github repos and retrieve the necessary files. The point here is that the book is sliced into sections and subsections which are served as standalone pages from DB, with the need to maintain the cross references between those pages (using a special service using a cross-ref table). Jekyll-ifying it would require a similar process, just that the resulting chunks would be stored to the filesystem instead of the DB, with the added burden to statically generate the correct cross-ref URLs. From what I infer, writing to the filesystem was avoided, I guess for security or due to Heroku limitations.

@peff
Copy link
Member

peff commented Jan 6, 2017

From what I infer, writing to the filesystem was avoided, I guess for security or due to Heroku limitations.

Yeah, my secret goal would actually be to take it off of Heroku. The site is largely static except for occasionally pulling in content from a few other sources. We pay Heroku $230/mo for hosting it (three 2x dynos, plus $80 in addons), which seems silly. It gets a lot of hits, I'm sure, but a CDN in front of a static site could handle that easily.

Anyway, that does not need to be resolved for this PR. I mostly just wanted to throw it out there as a possible future direction.

@jnavila
Copy link
Contributor Author

jnavila commented Jan 6, 2017

OK, Here is how it's supposed to work with Atlas:

  • the atlas build has a hook and it pushes to the site, via an http request for update processed in app/controllers/books_controller.rb which updates in DB the status of the book and the download URLs.
  • Then the crontab task generates the books according to the processed flag of each book in DB.

I plan to replace this by a polling loop of the repos. The thing is that I don't want to force an update of the schema of the database. So, I would reuse the existing ebook_html field to store the SHA1 of the HEAD of each repo.

@peff
Copy link
Member

peff commented Jan 6, 2017

That sounds like a good plan. The reuse of ebook_html is a little funny, but I think it conceptually makes sense, and it should be easy to distinguish the two cases by content if need be. I don't know how much pain there is in a schema update, though.

@jnavila jnavila force-pushed the local_progit2 branch 2 times, most recently from 79f0974 to 355396d Compare January 8, 2017 14:46
@jnavila
Copy link
Contributor Author

jnavila commented Jan 8, 2017

So I went ahead and made all the necessary changes to make this workflow workable. The task works with all the languages.

I plan to allow other languages without requiring the repos to be part of progit. For instance, the german teams has been asking to be incorporated for quite a long time now, and it would be quite easy to pull from the repo they designed from now on.

@peff
Copy link
Member

peff commented Jan 10, 2017

I plan to allow other languages without requiring the repos to be part of progit. For instance, the german teams has been asking to be incorporated for quite a long time now, and it would be quite easy to pull from the repo they designed from now on.

Yeah, that sounds like a good plan.

So what's the status of this PR? I was able to pull and build the book locally by setting GENLANG manually. So it seems to work OK, but wouldn't auto-update as the progit2 repo is updated, even if I added a schedule rake remote_genbook2 task.

I'm also not sure what will happen to the non-HTML versions of the book. They'll gradually go out of date with respect to what's on the site, right?

@jnavila
Copy link
Contributor Author

jnavila commented Jan 10, 2017

The status of this PR: that's good that you were able to reproduce my build. I have to check again all the cross links (they are all managed by the script) because it's not sure at all that they were all caught for all languages.

Now, the task polls the remote repo and rebuilds the pages in case of update. So it can be added to the schedule. The genbook2 must be disabled at the same time.

The non-web versions are not rebuilt at all. Asciidoctor would allow such a thing, but a lot of additional packages are missing, mostly for PDF.

I hope to let you know soon that the PR is ok for me. If you happen to know somebody versed in Ruby for a code review, that would be great.

This task is already language aware.
The new workflow is creating a single file, so the references to
images are all at one single place.
The URL contain translated texts which are UTF-8, so they need to be
escaped properly.
@jnavila
Copy link
Contributor Author

jnavila commented Jan 11, 2017

I had a bit more of work than expected:

  • paragraphs at the start of chapters were missing
  • some internal links were not collected
  • the xref and pre-link rewriters were buggy

For me, it's good now, I've checked with French that all the links work.

@jnavila
Copy link
Contributor Author

jnavila commented Jan 18, 2017

ping @peff . Seems OK to me.

@peff
Copy link
Member

peff commented Jan 18, 2017

Yeah, sorry, I've just been busy. I'll merge/deploy later today, and switch the cronjob over to your new rake task.

@peff peff merged commit 6325d8e into git:master Jan 18, 2017
@peff
Copy link
Member

peff commented Jan 18, 2017

OK, merged and deployed, and the book content rebuilt. I updated the cron job to run this rather than genbook2.

It was previously running every 10 minutes, which seemed like quite a lot. I've bumped it to daily, which is how often we built new manpages. If the really is getting updates that frequently, we can perhaps switch it to hourly.

@jnavila
Copy link
Contributor Author

jnavila commented Jan 18, 2017

OK, thanks. I'll monitor the issues for new issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants