Create sphinx documentation for Read the Docs #329

Merged 49 commits on Nov 7, 2019
49 commits
d0f407a
Setup sphinx
pietermarsman Oct 27, 2019
26cf7a4
Specify master doc in conf.py because default is contents.rst
pietermarsman Oct 27, 2019
298ac32
Initial version of documentation
pietermarsman Oct 27, 2019
d93a73f
Updated descriptions of pdf2txt.py arguments and use these descriptio…
pietermarsman Oct 27, 2019
9417341
Fix sphinx error by including requirements for sphinx
pietermarsman Oct 27, 2019
4ad06a1
Fix relative import by using module specification that works if PYTHO…
pietermarsman Oct 27, 2019
ed732e1
Add root folder of project to path
pietermarsman Oct 27, 2019
5a1d151
Try relative to file
pietermarsman Oct 27, 2019
ba3fafa
Add css file for better table layout
pietermarsman Oct 27, 2019
a188c9b
Try with docutil.conf
pietermarsman Oct 27, 2019
3630d6e
move docutils
pietermarsman Oct 27, 2019
cbe85dc
move docutils
pietermarsman Oct 27, 2019
f387652
Try option limit 0
pietermarsman Oct 27, 2019
3be5668
move docutil
pietermarsman Oct 27, 2019
a653c09
Remove docutil.conf
pietermarsman Oct 27, 2019
bb18618
Added dumppdf.py documentation
pietermarsman Oct 27, 2019
f16ca56
Updated pdf2txt.py documentation, put long argument name first
pietermarsman Oct 27, 2019
1889469
Fix typo
pietermarsman Oct 27, 2019
751b1e0
Adding doc on high_level
pietermarsman Oct 29, 2019
c43dd0f
Add link do docs in README
igormp Nov 1, 2019
a02978d
Merge pull request #326 from igormp/develop
pietermarsman Nov 2, 2019
7a108ca
Remove todo from readme
pietermarsman Nov 2, 2019
5215a64
Add link to new documentation
pietermarsman Nov 2, 2019
f3a796e
Restructure
pietermarsman Nov 2, 2019
4e2e50d
Fix headers
pietermarsman Nov 2, 2019
b357c24
Add tests for documentation
pietermarsman Nov 2, 2019
d7aac7f
Add doctest to travis
pietermarsman Nov 2, 2019
e6ad4e5
Separate docs testing
pietermarsman Nov 2, 2019
9c5a626
Specify python for docs
pietermarsman Nov 2, 2019
ceff49a
Add docstrings to layout.py
pietermarsman Nov 2, 2019
fd93cb7
WIP: workign on topic guide on layout analysis
pietermarsman Nov 3, 2019
7993019
Updated README.md
pietermarsman Nov 3, 2019
15f0cfd
Make name lowercase
pietermarsman Nov 3, 2019
f5ec01a
Extend description of layout analysis
pietermarsman Nov 4, 2019
50b7348
Merge branch 'develop' into documentation
pietermarsman Nov 4, 2019
7854a88
Remove old docs
pietermarsman Nov 5, 2019
8b3e55d
Merge branch 'documentation' of github.com:pdfminer/pdfminer.six into…
pietermarsman Nov 5, 2019
8eafaa0
Added to CHANGELOG.md
pietermarsman Nov 5, 2019
1b7fb5d
Merge branch 'develop' into documentation
pietermarsman Nov 7, 2019
bd13148
Cleanup imports after merge
pietermarsman Nov 7, 2019
7db8c5a
(Attempt to) run both docs and py37 tox env on travis
pietermarsman Nov 7, 2019
2834965
Put all checks into 1 tox env
pietermarsman Nov 7, 2019
8ef6d4a
Add docs to extra_requires setup.py section
pietermarsman Nov 7, 2019
37df1b0
Fix error in tox.ini
pietermarsman Nov 7, 2019
3800ead
Recreate tox env on every travis run
pietermarsman Nov 7, 2019
181f80f
Make doctest work on py2
pietermarsman Nov 7, 2019
2d65c0f
Fix unicode to str error in python2 (#333)
igormp Nov 7, 2019
b629346
Add documentation for extract_text
pietermarsman Nov 7, 2019
b594278
Merge branch 'documentation' of github.com:pdfminer/pdfminer.six into…
pietermarsman Nov 7, 2019
2 changes: 1 addition & 1 deletion .travis.yml
@@ -9,4 +9,4 @@ python:
 install:
 - pip install tox-travis
 script:
-- tox
+- tox -r
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 ### Added
 - Simple wrapper to easily extract text from a PDF file [#330](https://github.com/pdfminer/pdfminer.six/pull/330)
 - Support for extracting JBIG2 encoded images ([#311](https://github.com/pdfminer/pdfminer.six/pull/311) and [#46](https://github.com/pdfminer/pdfminer.six/pull/46))
+- Sphinx documentation that is published on
+[Read the Docs](https://pdfminersix.readthedocs.io/)
+([#329](https://github.com/pdfminer/pdfminer.six/pull/329))
 
 ### Fixed
 - Unhandled AssertionError when dumping pdf containing reference to object id 0
68 changes: 18 additions & 50 deletions README.md
@@ -1,21 +1,22 @@
PDFMiner.six
pdfminer.six
============

PDFMiner.six is a fork of PDFMiner using six for Python 2+3 compatibility
[![Build Status](https://travis-ci.org/pdfminer/pdfminer.six.svg?branch=master)](https://travis-ci.org/pdfminer/pdfminer.six)
[![PyPI version](https://img.shields.io/pypi/v/pdfminer.six.svg)](https://pypi.python.org/pypi/pdfminer.six/)
[![gitter](https://badges.gitter.im/pdfminer-six/Lobby.svg)](https://gitter.im/pdfminer-six/Lobby?utm_source=badge&utm_medium)

[![Build Status](https://travis-ci.org/pdfminer/pdfminer.six.svg?branch=master)](https://travis-ci.org/pdfminer/pdfminer.six) [![PyPI version](https://img.shields.io/pypi/v/pdfminer.six.svg)](https://pypi.python.org/pypi/pdfminer.six/)

PDFMiner is a tool for extracting information from PDF documents.
Pdfminer.six is a community-maintained fork of the original PDFMiner. It is a
tool for extracting information from PDF documents.
Unlike other PDF-related tools, it focuses entirely on getting
and analyzing text data. PDFMiner allows one to obtain
and analyzing text data. Pdfminer.six allows one to obtain
the exact location of text in a page, as well as
other information such as fonts or lines.
It includes a PDF converter that can transform PDF files
into other text formats (such as HTML). It has an extensible
PDF parser that can be used for other purposes than text analysis.

* Webpage: https://github.com/pdfminer/
* Download (PyPI): https://pypi.python.org/pypi/pdfminer.six/
Check out the full documentation on
[Read the Docs](https://pdfminersix.readthedocs.io).


Features
@@ -33,53 +34,20 @@ Features
* Automatic layout analysis.


How to Install
--------------
How to use
----------

* Install Python 2.7 or newer.
* Install
* Install Python 2.7 or newer. Note that Python 2 support will be dropped in
January 2020.

`pip install pdfminer.six`

* Run the following test:

`pdf2txt.py samples/simple1.pdf`


Command Line Tools
------------------

PDFMiner comes with two handy tools:
pdf2txt.py and dumppdf.py.

**pdf2txt.py**

pdf2txt.py extracts text contents from a PDF file.
It extracts all the text that are to be rendered programmatically,
i.e. text represented as ASCII or Unicode strings.
It cannot recognize text drawn as images that would require optical character recognition.
It also extracts the corresponding locations, font names, font sizes, writing
direction (horizontal or vertical) for each text portion.
You need to provide a password for protected PDF documents when its access is restricted.
You cannot extract any text from a PDF document which does not have extraction permission.

(For details, refer to /docs/index.html.)

**dumppdf.py**

dumppdf.py dumps the internal contents of a PDF file in pseudo-XML format.
This program is primarily for debugging purposes,
but it's also possible to extract some meaningful contents (e.g. images).

(For details, refer to /docs/index.html.)


TODO
----
* Use command-line interface to extract text from pdf:

* PEP-8 and PEP-257 conformance.
* Better documentation.
* Performance improvements.
`python pdf2txt.py samples/simple1.pdf`

* Check out more examples and documentation on
[Read the Docs](https://pdfminersix.readthedocs.io).


Contributing
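Review note: the README now shows only the command-line route, while the CHANGELOG lists a simple text-extraction wrapper added in #330. The following is a minimal sketch of the Python route, assuming that wrapper is importable as `pdfminer.high_level.extract_text` (suggested by the commits "Adding doc on high_level" and "Add documentation for extract_text") and that `samples/simple1.pdf` exists relative to the working directory. The helper name `extract_pdf_text` is hypothetical and not part of pdfminer.six.

```python
import os


def extract_pdf_text(path):
    """Return the text of the PDF at `path`, or None if it cannot be read here.

    Hypothetical helper for illustration; not part of pdfminer.six.
    """
    if not os.path.isfile(path):
        return None  # no such file, nothing to extract
    try:
        # Assumed import location of the wrapper introduced in #330.
        from pdfminer.high_level import extract_text
    except ImportError:
        return None  # pdfminer.six is not installed in this environment
    return extract_text(path)


if __name__ == "__main__":
    text = extract_pdf_text("samples/simple1.pdf")
    if text is None:
        print("pdfminer.six or the sample file is missing")
    else:
        print(text)
```

Returning `None` instead of raising keeps the sketch usable in environments without pdfminer.six; real code would more likely let the exception propagate.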
1 change: 1 addition & 0 deletions docs/.gitignore
@@ -0,0 +1 @@
build/
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
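Review note: the catch-all target in this Makefile forwards any target name to `sphinx-build -M`, so `make html` run from `docs/` expands to roughly `sphinx-build -M html source build`. The sketch below builds that argument list in Python using the Makefile's default `SOURCEDIR` and `BUILDDIR` values. The function name `sphinx_make_command` and its parameters are hypothetical, chosen only for illustration.

```python
def sphinx_make_command(target, sourcedir="source", builddir="build", sphinxopts=()):
    """Return the argument list that `make <target>` would hand to sphinx-build.

    Mirrors the Makefile recipe: sphinx-build -M <target> SOURCEDIR BUILDDIR SPHINXOPTS.
    Hypothetical helper; defaults are copied from the Makefile variables.
    """
    return ["sphinx-build", "-M", target, sourcedir, builddir, *sphinxopts]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(sphinx_make_command("html")))
```

The resulting list can be passed to `subprocess.run` from the `docs/` directory when Sphinx is installed.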
225 changes: 0 additions & 225 deletions docs/cid.obj

This file was deleted.

Binary file removed docs/cid.png
Binary file not shown.