Update for expanded manubot package #48

Merged
merged 6 commits into from
Aug 9, 2017
Changes from 5 commits
13 changes: 6 additions & 7 deletions .gitignore
@@ -1,12 +1,11 @@
# Generated manuscript output files
output/index.html
output/manuscript.pdf
output/manuscript.docx
output/*.ots
output/*
!output/README.md

# Generated reference files
references/generated/*
!references/generated/README.md
webpage/*.ots

# Manubot cache directory
ci/cache

# Python
__pycache__/
2 changes: 1 addition & 1 deletion .travis.yml
@@ -20,7 +20,7 @@ script:
- sh build/build.sh
cache:
directories:
- references/generated
- ci/cache
after_success:
- test $TRAVIS_BRANCH = "master" &&
test $TRAVIS_PULL_REQUEST = "false" &&
146 changes: 125 additions & 21 deletions CONTRIBUTING.md
@@ -1,30 +1,134 @@
# Manuscript contribution guidelines
# Manubot usage & contribution guidelines

## Markdown
This repository uses the [Manubot](https://github.com/greenelab/manubot) to automatically produce a manuscript from its source.

The paper will be written using markdown. Markdown files use the `.md` extension.
Check out the [CommonMark Help](http://commonmark.org/help/) page for an introduction to formatting options provided by markdown.
In addition, to standard markdown features, we support [markdown tables](https://help.github.com/articles/organizing-information-with-tables/ "GitHub Help: Organizing information with tables") and a custom citation syntax.
Check out [Tables Generator](http://www.tablesgenerator.com/markdown_tables) for creating markdown tables.
## Manubot markdown

The custom citation guidelines are as follows:
Manuscript text should be written in markdown files, which should be located in [`content`](content) with a digit prefix for ordering (e.g. `01.`, `02.`, etc.) and a `.md` extension.

1. Always use a DOI for the version of record if available.
Cite DOIs like `[@doi:10.15363/thinklab.4]`.
2. If the version of record doesn't have a DOI but does have a PubMed ID, cite like `[@pmid:26158728]`.
3. If the article is an _arXiv_ preprint, cite like `[@arxiv:1508.06576]`.
4. If and only if the article has none of the above, cite with the URL like `[@url:http://openreview.net/pdf?id=Sk-oDY9ge]`.
For basic formatting, see the [CommonMark Help](http://commonmark.org/help/) page, which introduces the formatting options provided by standard markdown.
In addition, Manubot supports an extended version of markdown, tailored for scholarly writing.

You cite multiple items at once like `[@doi:10.15363/thinklab.4 @pmid:26158728 @arxiv:1508.06576]`.
### Tables

The system also supports tags, which may be helpful when a single reference is cited many times.
For example, you can add a reference to the tag `study_x` using the following syntax: `[@tag:study_x]`.
If you add references that use tags, make sure to add those tags and their corresponding citations to [`references/tags.tsv`](references/tags.tsv).
Manubot supports [markdown tables](https://help.github.com/articles/organizing-information-with-tables/ "GitHub Help: Organizing information with tables").

If the automatically extracted reference information contains errors, it can be [manually overridden](references/README.md#reference-overrides).
```md
| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| value_a | 1 | 47 |
| value_b | 2 | 56 |

## Authorship information
Table: Caption for this example table. {#tbl:example-id}
```

Authorship information and order is extracted from [`authors.tsv`](../content/authors.tsv).
To add an author, insert a row into this table.
We recommend authors add themselves to `authors.tsv` via pull request (when requested by a maintainer), thereby signaling that they've read and approved the manuscript.
Support for table numbering and citation is provided by [`pandoc-tablenos`](https://github.com/tomduck/pandoc-tablenos).
Above, `{#tbl:example-id}` sets the table ID, which creates an HTML anchor and allows citing the table like `@tbl:example-id`.
For easy creation of markdown tables, check out the [Tables Generator](http://www.tablesgenerator.com/markdown_tables) webapp.

### Figures

Figures can be included with the following markdown:

```md
![Caption for the example figure.](url_or_path_to_figure){#fig:example-id}
```

Support for figure numbering and citation is provided by [`pandoc-fignos`](https://github.com/tomduck/pandoc-fignos).
This figure can be cited in the text using `@fig:example-id`.
In context, a figure citation may look like: `Figure {@fig:example-id}B shows …`.

For images created by the manuscript authors that are hosted elsewhere on GitHub, we recommend using a [versioned](https://help.github.com/articles/getting-permanent-links-to-files/) GitHub URL to embed figures, thereby preserving exact image provenance.
When embedding SVG images hosted on GitHub, passing the URL through [RawGit](https://rawgit.com/) is necessary.
An example of a URL that has been passed through RawGit is:

```
https://cdn.rawgit.com/greenelab/scihub/572d6947cb958e797d1a07fdb273157ad9154273/figure/citescore.svg
```

Figures placed in the [`content/images`](content/images) directory can be embedded using their relative path.
For example, we embed an [ORCID](https://orcid.org/) icon inline using:

```md
![ORCID icon](images/orcid.svg){height="13px"}
```

The bracketed text following the image declaration is interpreted by Pandoc's [`link_attributes`](http://pandoc.org/MANUAL.html#extension-link_attributes) extension.
For example, the following will override the figure number to be "S1" and set the image width to 5 inches:

```md
{#fig:supplement tag="S1" width="5in"}
```

### Citations

Manubot supports Pandoc [citations](http://pandoc.org/MANUAL.html#citations) via `pandoc-citeproc`.
However, Manubot performs automated citation processing and metadata retrieval on raw citations.
Therefore, citations must be of the following form: `@source:identifier`, where `source` is one of the options described below.
When choosing which source to use for a citation, we recommend the following order:

1. DOI (Digital Object Identifier), cite like `@doi:10.15363/thinklab.4`.
2. PubMed ID, cite like `@pmid:26158728`.
3. _arXiv_ ID, cite like `@arxiv:1508.06576v2`.
4. URL / webpage, cite like `@url:http://openreview.net/pdf?id=Sk-oDY9ge`.

Cite multiple items at once like:

```md
Here is a sentence with several citations [@doi:10.15363/thinklab.4; @pmid:26158728; @arxiv:1508.06576].
```

Note that multiple citations must be semicolon separated.
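
As a rough illustration of the `@source:identifier` form described above, the following sketch pulls citations out of a sentence with a regular expression. The pattern and the `extract_citations` helper are assumptions made for this example, not Manubot's actual parsing code, and the sketch does not handle the forbidden-character cases.

```python
import re

# Assumed pattern for illustration only (not Manubot's own regex):
# '@', one of the documented sources, ':', then an identifier that runs
# until whitespace, ';' or ']'.
CITATION_PATTERN = re.compile(r"@(doi|pmid|arxiv|url|tag):([^\s;\]]+)")


def extract_citations(text):
    """Return (source, identifier) pairs in order of appearance."""
    return CITATION_PATTERN.findall(text)


sentence = (
    "Here is a sentence with several citations "
    "[@doi:10.15363/thinklab.4; @pmid:26158728; @arxiv:1508.06576]."
)
citations = extract_citations(sentence)
for source, identifier in citations:
    print(source, identifier)
```

Because the identifier stops at `;` and `]`, the semicolon-separated form shown above splits cleanly into three citations.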

#### Citation tags

The system also supports citation tags, which are recommended for the following applications:

1. A citation's identifier contains forbidden characters, such as `;`, or ends with a non-alphanumeric character other than `/`.
In these instances, you must use a tag.
2. A single reference is cited many times.
Therefore, it might make sense to define a tag, so if the citation updates (e.g. a newer version becomes available), only a single change is required.

Tags should be defined in [`content/citation-tags.tsv`](content/citation-tags.tsv).
If `citation-tags.tsv` defines the tag `study-x`, then this study can be cited like `@tag:study-x`.
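
One way to picture tag resolution is a lookup table built from the TSV file. In the sketch below, the column names `tag` and `citation`, the sample row, and the `load_tags` and `resolve` helpers are all assumptions for illustration; the actual layout of `citation-tags.tsv` should be checked in the repository.

```python
import csv
import io

# Hypothetical contents of content/citation-tags.tsv. The column names
# 'tag' and 'citation' are assumptions made for this sketch.
TAGS_TSV = "tag\tcitation\nstudy-x\tdoi:10.15363/thinklab.4\n"


def load_tags(tsv_text):
    """Build a mapping from tag name to the citation it stands for."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["tag"]: row["citation"] for row in reader}


def resolve(citation, tags):
    """Map 'tag:name' to its underlying citation; pass other citations through."""
    source, _, identifier = citation.partition(":")
    if source == "tag":
        return tags[identifier]
    return citation


tags = load_tags(TAGS_TSV)
resolved = resolve("tag:study-x", tags)
print(resolved)
```

With this arrangement, updating the citation behind `study-x` only requires changing one row of the table.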

## Reference metadata

The Manubot workflow requires the bibliographic details for references (the set of all cited works) as CSL (Citation Style Language) Items (also known as [citeproc JSON](http://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html#csl-json-items)).
The Manubot attempts to automatically retrieve metadata and generate valid citeproc JSON for references, which is exported to `output/references.json`.
However, in some cases the Manubot fails to retrieve metadata or generates incorrect or incomplete citeproc metadata.
Errors are most common for `url` references.
For these references, you can manually specify the correct citeproc in [`content/manual-references.json`](content/manual-references.json), which will override the automatically generated reference data.
To do so, create a new citeproc record that contains the field `"standard_citation"` with the appropriate reference identifier as its value.
The identifier can be obtained from the `standard_citation` column of `citations.tsv`, which is located in the `output` branch or in the `output` subdirectory of local builds.
As an example, `manual-references.json` contains:

```json
"standard_citation": "url:https://github.com/greenelab/manubot-rootstock"
```

For guidance on what CSL JSON should look like for different document types, refer to [these examples](https://github.com/aurimasv/zotero-import-export-formats/blob/a51c342e66bebd97b73a7230047b801c8f7bb690/CSL%20JSON.json).
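
The override behavior described in this section can be sketched as a dictionary merge keyed on `standard_citation`. Everything below is a hedged illustration: the record contents, the assumption that `manual-references.json` holds a JSON list, and the `apply_overrides` helper are hypothetical and do not reproduce Manubot's implementation.

```python
import json

# Hypothetical auto-generated records, keyed by standard citation.
generated = {
    "url:https://github.com/greenelab/manubot-rootstock": {
        "type": "webpage",
        "title": "",  # imagine metadata retrieval left this empty
    },
}

# Illustrative manual record; assumes the file holds a JSON list of records.
manual_json = """
[
  {
    "standard_citation": "url:https://github.com/greenelab/manubot-rootstock",
    "type": "webpage",
    "title": "greenelab/manubot-rootstock",
    "URL": "https://github.com/greenelab/manubot-rootstock"
  }
]
"""


def apply_overrides(generated_records, manual_records):
    """Return a copy of generated_records with manual records taking precedence."""
    merged = dict(generated_records)
    for record in manual_records:
        record = dict(record)  # avoid mutating the caller's data
        key = record.pop("standard_citation")
        merged[key] = record
    return merged


references = apply_overrides(generated, json.loads(manual_json))
print(json.dumps(references, indent=2))
```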

## Manuscript metadata

[`content/metadata.yaml`](content/metadata.yaml) contains manuscript metadata that gets passed through to Pandoc, via a [`yaml_metadata_block`](http://pandoc.org/MANUAL.html#extension-yaml_metadata_block).
Review comment (Member):
We could state here what author information is required and what is optional.

`metadata.yaml` should contain the manuscript `title`, `authors` list, and `keywords`.
Additional metadata, such as `date`, will automatically be created by the Manubot.

We recommend authors add themselves to `metadata.yaml` via pull request (when requested by a maintainer), thereby signaling that they've read and approved the manuscript.
The following YAML shows the supported key–value pairs for an author:

```yaml
github: dhimmel # strongly suggested
name: Daniel S. Himmelstein # mandatory
initials: DSH # strongly suggested
orcid: 0000-0002-3012-7446 # mandatory
twitter: dhimmel # optional
email: [email protected] # suggested
affiliations: Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania # strongly suggested
funders: GBMF4552 # optional
```
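
A small check like the following could flag author entries that lack the keys marked mandatory in the example above. It is a sketch under assumptions: the author is represented as a plain Python dictionary (as it might look after the YAML is loaded), and `MANDATORY_KEYS` and `missing_author_keys` are hypothetical names, not part of Manubot.

```python
# Keys marked "mandatory" in the example author entry above.
MANDATORY_KEYS = {"name", "orcid"}


def missing_author_keys(author):
    """Return the mandatory keys absent from an author mapping, sorted."""
    return sorted(MANDATORY_KEYS - set(author))


# An author entry as it might look once the YAML has been loaded.
author = {
    "github": "dhimmel",
    "name": "Daniel S. Himmelstein",
    "initials": "DSH",
    "orcid": "0000-0002-3012-7446",
}

print(missing_author_keys(author))
print(missing_author_keys({"name": "Example Author"}))
```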

## Manubot feedback

If you experience any issues with the Manubot or would like to contribute to its source code, please visit [`greenelab/manubot`](https://github.com/greenelab/manubot) or [`greenelab/manubot-rootstock`](https://github.com/greenelab/manubot-rootstock).
37 changes: 32 additions & 5 deletions README.md
@@ -15,10 +15,36 @@ To see what's incoming, check the [open pull requests](https://github.com/greene
Instructions for using Manubot Rootstock for your own manuscript are still evolving.
The recommended approach is to clone this repository, as detailed [here](https://github.com/greenelab/manubot-rootstock/issues/6#issuecomment-314541837).

## Source
## Repository directories & files

The manuscript source is located in [`content`](content).
Text should be written in markdown files, with a digit prefix for ordering (e.g. `01.`, `02.`, etc.) and a `.md` suffix.
See [`CONTRIBUTING.md`](CONTRIBUTING.md) for documentation on how to create or contribute to a manuscript.
Review comment (Member):
We could split CONTRIBUTING.md so that CONTRIBUTING.md contains only the Manubot feedback section and a new USAGE.md (or similar) contains all of the documentation.

Reply (Member Author):
I think the fewer files the better.

USAGE.md is more accurate than CONTRIBUTING.md. Note that GitHub has integration with CONTRIBUTING files. Whenever a user opens a new issue or PR, they will see the link to the contributing docs.

Is this good or bad? Perhaps it's a bit misleading to label usage guidelines as contributing guidelines?

Reply (Member):
USAGE.md seems to fit the contents better, but the GitHub integration is nice. I was imagining that a cloned manuscript may want one guide describing what type of contributions are welcome, what authorship criteria are being used, etc. and a separate document with instructions. In deep review we ended up putting that information in the readme instead of CONTRIBUTING.md though.

I don't have a strong opinion so go with whatever you feel is best.


The directories are as follows:

+ [`content`](content) contains the manuscript source, which includes markdown files as well as inputs for citations and references.
+ [`output`](output) contains the outputs (generated files) from the Manubot, including the resulting manuscripts.
You should not edit these files manually, because they will get overwritten.
+ [`webpage`](webpage) is a directory meant to be rendered as a static webpage for viewing the HTML manuscript.
+ [`build`](build) contains commands and tools for building the manuscript.
+ [`ci`](ci) contains files necessary for deployment via continuous integration.
For the CI configuration, see [`.travis.yml`](.travis.yml).

## Local execution

To run the Manubot locally, install the [conda](https://conda.io) environment as described in [`build`](build).
Then, you can build the manuscript on POSIX systems by running the following commands.

```sh
# Activate the manubot conda environment
# [Review comment by Member] Should this link or refer to the build directory to explain what the manubot environment is?
source activate manubot

# Build the manuscript
sh build/build.sh

# View the manuscript locally at http://localhost:8000/
cd webpage
python -m http.server
```

## Continuous Integration

@@ -28,7 +54,7 @@ When you make a pull request, Travis CI will test whether your changes break the
The build process aims to detect common errors, such as invalid references.
If your build fails, see the Travis CI logs for the cause of failure and revise your pull request accordingly.

When a pull request is merged, Travis CI performs the build and writes the results to the [`gh-pages`](https://github.com/greenelab/manubot-rootstock/tree/gh-pages) and [`references`](https://github.com/greenelab/manubot-rootstock/tree/references) branches.
When a pull request is merged, Travis CI performs the build and writes the results to the [`gh-pages`](https://github.com/greenelab/manubot-rootstock/tree/gh-pages) and [`output`](https://github.com/greenelab/manubot-rootstock/tree/output) branches.
The `gh-pages` branch hosts the following URLs:

+ **HTML manuscript** at https://greenelab.github.io/manubot-rootstock/
@@ -49,7 +75,7 @@ All files matched by the following glob patterns are dual licensed under CC BY 4

+ `*.sh`
+ `*.py`
+ `*.yml`
+ `*.yml` / `*.yaml`
+ `*.json`
+ `*.bib`
+ `*.tsv`
@@ -60,5 +86,6 @@ All other files are only available under CC BY 4.0, including:
+ `*.md`
+ `*.html`
+ `*.pdf`
+ `*.docx`

Please open [an issue](https://github.com/greenelab/manubot-rootstock/issues) for any question related to licensing.
File renamed without changes.
File renamed without changes.
20 changes: 13 additions & 7 deletions build/build.sh
@@ -5,13 +5,17 @@ export LC_ALL=en_US.UTF-8

# Generate reference information
echo "Retrieving and processing reference metadata"
(cd build && python references.py)
manubot \
--content-directory=content \
--output-directory=output \
--cache-directory=ci/cache \
--log-level=INFO

# pandoc settings
CSL_PATH=references/style.csl
DOCX_PATH=references/pandoc-reference.docx
BIBLIOGRAPHY_PATH=references/generated/bibliography.json
INPUT_PATH=references/generated/all-sections.md
CSL_PATH=build/assets/style.csl
DOCX_PATH=build/assets/pandoc-reference.docx
BIBLIOGRAPHY_PATH=output/references.json
INPUT_PATH=output/manuscript.md

# Make output directory
mkdir -p output
@@ -32,7 +36,7 @@ pandoc --verbose \
--css=github-pandoc.css \
--include-in-header=build/assets/analytics.js \
--include-after-body=build/assets/anchors.js \
--output=output/index.html \
--output=output/manuscript.html \
$INPUT_PATH

# Create PDF output
@@ -44,13 +48,14 @@ wkhtmltopdf \
--margin-bottom 17 \
--margin-left 0 \
--margin-right 0 \
output/index.html \
webpage/index.html \
output/manuscript.pdf

# Create DOCX output when user specifies to do so
if [ "$BUILD_DOCX" = "true" ];
then
echo "Exporting Word Docx manuscript"
ln --symbolic content/images images
pandoc --verbose \
--smart \
--from=markdown \
@@ -63,4 +68,5 @@ then
--reference-docx=$DOCX_PATH \
--output=output/manuscript.docx \
$INPUT_PATH
rm --recursive images
fi