Tabula helps you liberate data tables trapped inside PDF files.
© 2012-2013 Manuel Aristarán. Available under the MIT License. See
If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful this is: you can’t easily copy and paste rows of data out of PDF files. Tabula lets you extract that data in CSV format through a simple web interface:
{TODO: screenshot / screencast here}
Caveat: Tabula only works on text-based PDFs, not scanned documents.
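Before uploading a PDF, you can check whether it has a text layer at all. The following is a minimal sketch that assumes Poppler's `pdftotext` tool is installed (it is not a Tabula dependency). The function `has_text_layer` is a hypothetical helper: it reads the extracted text on stdin and succeeds only if that text contains something other than whitespace. A scanned document usually produces nothing.

```shell
# Hypothetical helper: succeed if the text on stdin contains at least one
# non-whitespace character, i.e. the PDF it was extracted from has a text layer.
# pdftotext (from Poppler, assumed installed) writes to stdout when given "-":
#   pdftotext report.pdf - | has_text_layer
has_text_layer() {
  # Strip every whitespace character (spaces, newlines, page-break form feeds)
  # and test whether anything is left.
  [ -n "$(tr -d '[:space:]')" ]
}
```

For example, `pdftotext report.pdf - | has_text_layer || echo "probably scanned"` warns you before you try Tabula on an image-only file.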
An Amazon EC2 AMI is provided so you can boot up a quick test server: ami-e895f081
You can find a simple how-to in docs/
Check the EC2 instance types and EC2 pricing before launching an instance. We’re not responsible for any costs this may incur.
Also note that this is a development demo image and may not be secure. Using this AMI for mission-critical or sensitive documents is not currently recommended.
Note: A comprehensive, mostly copy-and-paste set of instructions is available for
OS X users who don't normally do Ruby development but are interested in bootstrapping
Tabula on their own computer: docs/
Install Ruby and JRuby. Tabula has been tested with Ruby 1.9.3 and JRuby 1.7.3. A Ruby version manager is recommended; both rbenv and RVM are fine choices. (JRuby is required to interface with
, but the standard Ruby interpreter must also be used since ruby-opencv is a natively compiled extension.)

If using rbenv:
    rbenv install 1.9.3-p392
    rbenv install jruby-1.7.3
If using rvm:
    rvm install 1.9.3-p392
    rvm install jruby-1.7.3
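If you use rbenv, you can confirm that both interpreters were installed before moving on. This is a sketch, assuming rbenv's `rbenv versions --bare` output (one installed version per line). `missing_versions` and `REQUIRED` are hypothetical names used only for illustration.

```shell
# Versions taken from the install commands above; adjust if you use others.
REQUIRED="1.9.3-p392 jruby-1.7.3"

# Hypothetical helper: read installed versions from stdin (one per line) and
# print each required version that is absent. Prints nothing if all are present.
missing_versions() {
  installed=$(cat)
  for v in $REQUIRED; do
    # -x: match the whole line; -F: treat the version as a fixed string
    printf '%s\n' "$installed" | grep -qxF "$v" || printf '%s\n' "$v"
  done
}
```

Run `rbenv versions --bare | missing_versions`. Empty output means both versions are available.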
(Mac OS X only) Download and install XQuartz:
Install the rest of the dependencies: (TODO: instructions for non-OSX platforms.)
    # Install Python, setuptools, and pip. You can skip this
    # if you already have them.
    brew install python
    curl | python
    curl | python
    # Install numpy (feel free to put it in a virtualenv); opencv dependency
    pip install numpy
    # Add the "science" tap to Homebrew so it can find OpenCV (if you haven't already)
    brew tap homebrew/science
    brew install opencv --with-tbb --with-opencl --with-qt
    brew install mupdf redis
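To spot a skipped step, you can check which of the expected command-line tools are now on your `PATH`. This is a minimal sketch. `missing_commands` is a hypothetical helper. MuPDF is left out because its executable names have changed between releases, and numpy is a Python module rather than a command.

```shell
# Hypothetical helper: print each command name given as an argument
# that cannot be found on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}
```

For example, `missing_commands python pip redis-server` should print nothing once the steps above have succeeded. `python -c 'import numpy'` is a separate check for numpy.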
Download Tabula and install the Ruby dependencies. (Note: ensure that
is configured for the standard Ruby interpreter, not JRuby.)

    git clone git://
    cd tabula
    gem install bundler
    bundle install
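Running `bundle install` by accident under JRuby is easy, so a guard can help. This is a sketch. `require_mri` is a hypothetical helper that takes the engine name as an argument. Ruby 1.9 and later expose it as `RUBY_ENGINE`, which is `"ruby"` for the standard interpreter and `"jruby"` for JRuby.

```shell
# Hypothetical guard: succeed only for the standard (MRI) interpreter.
# Pass it the engine name reported by whatever `ruby` currently resolves to.
require_mri() {
  case "$1" in
    ruby)  return 0 ;;
    jruby) echo "ruby currently points at JRuby; switch to 1.9.3-p392 first" >&2
           return 1 ;;
    *)     echo "unrecognised Ruby engine: '$1'" >&2
           return 1 ;;
  esac
}
```

Usage: `require_mri "$(ruby -e 'print RUBY_ENGINE')" && bundle install`. With rbenv, running `rbenv local 1.9.3-p392` inside the `tabula` directory pins the standard interpreter for that checkout.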
Configure Tabula: Copy
. Edit local_settings.rb to set the path to the jruby executable. If you are using rbenv, you can find that path by running:

    RBENV_VERSION='jruby-1.7.3' rbenv which jruby
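The following sketch wraps the rbenv command above and falls back to a plain `PATH` lookup for setups without rbenv, such as after `rvm use jruby-1.7.3`. `find_jruby` is a hypothetical helper. Paste whatever path it prints into `local_settings.rb` yourself.

```shell
# Hypothetical helper: print the path to the jruby executable, or fail.
find_jruby() {
  # With rbenv, ask for JRuby 1.7.3's binary without switching the active version.
  if command -v rbenv >/dev/null 2>&1 &&
     RBENV_VERSION='jruby-1.7.3' rbenv which jruby 2>/dev/null; then
    return 0
  fi
  # Otherwise, use whichever jruby is first on PATH.
  command -v jruby
}
```

A non-zero exit status means JRuby could not be found through either route.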
Start redis-server in a separate terminal tab:

    redis-server /usr/local/etc/redis.conf
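Resque needs Redis to be running, so you may want to confirm the server answers before starting the workers. This is a sketch. It relies on `redis-cli ping`, which prints `PONG` when the server is up (`redis-cli` ships with the Homebrew `redis` package). `wait_for_redis` is a hypothetical helper.

```shell
# Hypothetical helper: poll the local Redis server until it answers PING.
# Tries up to $1 times (default 10), one second apart; non-zero if it never answers.
wait_for_redis() {
  tries=${1:-10}
  while [ "$tries" -gt 0 ]; do
    if [ "$(redis-cli ping 2>/dev/null)" = "PONG" ]; then
      return 0
    fi
    tries=$((tries - 1))
    # Only sleep if there is another attempt left.
    [ "$tries" -gt 0 ] && sleep 1
  done
  return 1
}
```

For example, `wait_for_redis && echo "Redis is up"` in the other tab before launching Foreman.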
Next, start resque and the web server itself. You can run both of them with Foreman:

    bundle exec foreman start
The site instance should now be viewable at
Interested in helping out? See
for ideas.