Tabula
Tabula is a tool for liberating data tables trapped inside PDF files
Is Tabula an active project?
Tabula is, and always has been, a volunteer-run project. We've occasionally had funding for specific features, but it's never been a commercial undertaking. At the moment, none of the original authors have the time to actively work on the project. The end-user application, hosted on this repo, is unlikely to see updates from us in the near future. tabula-java still receives occasional updates and bug-fix releases.
--
Repo Note: The master branch is an in-development version of Tabula. It may differ substantially from the latest releases of Tabula.
- Download from the official site
- Read more about Tabula on OpenNews Source
- Interested in using Tabula on the command-line? Check out tabula-java, a Java library and command-line interface for Tabula. (This is the extraction library that powers Tabula.)
© 2012-2020 Manuel Aristarán. Available under MIT License. See AUTHORS.md and LICENSE.md.
- Why Tabula?
- Using Tabula
- Known issues
- Incorporating Tabula into your own project
- Running Tabula from source (for developers)
- Contributing
Why Tabula?
If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful this is — you can’t easily copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data in CSV format, through a simple web interface.
Caveat: Tabula only works on text-based PDFs, not scanned documents. If you can click-and-drag to select text in your table in a PDF viewer (even if the output is disorganized trash), then your PDF is text-based and Tabula should work.
Security Concerns?: Tabula is designed with security in mind. Your PDF and the extracted data never touch the net -- when you use Tabula on your local machine, as long as your browser's URL bar says "localhost" or "127.0.0.1", all processing takes place on your local machine. Other than to retrieve a few badges and other static assets, there are two calls that are made from your browser to external machines; one fetches the list of latest Tabula versions from GitHub to alert you if Tabula has been updated, the other makes a call to a stats counter that helps us determine how often various versions of Tabula are being used. If this is a problem, the version check can be disabled by adding -Dtabula.disable_version_check=1 to the command line at startup, and the stats counter call can be disabled by adding -Dtabula.disable_notifications=1. Please note: If you are providing Tabula as a service using a reverse SSL proxy, users may notice a security warning due to our stats counter endpoint being hosted at a non-secure URL, so you may wish to disable the notifications in this scenario.
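As a rough sketch, both opt-outs can be combined with the launch command used for the JAR distribution (see "Other platforms" under Using Tabula). The TABULA_OPTS variable name is purely illustrative and is not something Tabula reads; only the two -D flags come from the paragraph above.

```shell
# Hypothetical helper variable (not read by Tabula itself): collects the JVM flags in one place.
TABULA_OPTS="-Dfile.encoding=utf-8 -Xms256M -Xmx1024M"

# Turn off the GitHub version check and the stats counter call.
TABULA_OPTS="$TABULA_OPTS -Dtabula.disable_version_check=1 -Dtabula.disable_notifications=1"

# Print the resulting command for review; drop the leading "echo" to actually start Tabula
# from the directory that contains tabula.jar.
echo java $TABULA_OPTS -jar tabula.jar
```

Printing the command first makes it easy to confirm that both flags appear before -jar, which is where the JVM expects system properties.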
Using Tabula
First, make sure you have a recent copy of Java installed. You can download Java here. Tabula requires a Java Runtime Environment compatible with Java 7 (i.e. Java 7, 8 or higher). If you have a problem, check Known Issues first, then report an issue.
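If you are unsure whether Java is installed, a quick check from a terminal might look like the sketch below. It assumes a POSIX-style shell (Linux or macOS); check_java is a made-up helper name, not part of Tabula.

```shell
# Hypothetical helper: report whether a java executable is on the PATH and which version it is.
check_java() {
  if command -v java >/dev/null 2>&1; then
    # java -version writes its report to stderr, so redirect it to stdout.
    java -version 2>&1
  else
    echo "java not found on PATH; install a Java runtime first"
    return 1
  fi
}

# Run the check without aborting the shell session if Java is missing.
check_java || true
```

Any reported version of 7 or higher satisfies the requirement stated above.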
- Windows

  Download tabula-win.zip from the download site. Unzip the whole thing and open the tabula.exe file inside. A browser should automatically open to http://127.0.0.1:8080/ . If not, open your web browser of choice and visit that link.

  To close Tabula, just go back to the console window and press "Control-C" (as if to copy).

- Mac OS X

  Download tabula-mac.zip from the download site. Unzip and open the Tabula app inside. A browser should automatically open to http://127.0.0.1:8080/ . If not, open your web browser of choice and visit that link.

  To close Tabula, find the Tabula icon in your dock, right-click (or control-click) on it, and press "Quit".

  Note: If you’re running Mac OS X 10.8 or later, you might get an error like "Tabula is damaged and can't be opened." We're working on fixing this; in the meantime, see the entry with that name under Known issues for a workaround.

- Linux snap

  Tabula is packaged as a snap package. If you have snap on your system, you can install it with:

      sudo snap install tabula

- Other platforms (e.g. Linux)

  Download tabula-jar.zip from the download site and unzip it to the directory of your choice. Open a terminal window, and cd to inside the tabula directory you just unzipped. Then run:

      java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -jar tabula.jar

  Then manually navigate your browser to http://127.0.0.1:8080/ (New in Tabula 1.1. To go back to the old behavior that automatically launches your web browser, use the -Dtabula.openBrowser=true option.)

  Tabula binds to port 8080 by default. You can change it with the warbler.port option; for example, to use port 9999:

      java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -Dwarbler.port=9999 -jar tabula.jar

- Docker Compose quick start using the Amazon Corretto image

  Make a new directory, e.g. tabulapdf, and enter it:

      mkdir -p /opt/docker/tabulapdf
      cd /opt/docker/tabulapdf

  Download the tabula-jar package, for example version 1.2.1:

      wget https://github.com/tabulapdf/tabula/releases/download/v1.2.1/tabula-jar-1.2.1.zip

  Verify the checksum (compare the output with the release page):

      sha256sum tabula-jar-1.2.1.zip

  Then unzip it:

      unzip tabula-jar-1.2.1.zip

  Place or create a docker-compose.yml file and adjust it as needed:

      ### tabulapdf docker-compose.yml example ###
      services:
        tabulapdf:
          image: amazoncorretto:17
          container_name: tabulapdf-app
          command: >
            java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M
            -Dwarbler.port=8080 -Dtabula.openBrowser=false
            -jar /app/tabula.jar
          volumes:
            - ./tabula:/app
          ports:
            - "8080:8080"

  Run the app with:

      docker compose up -d

  The app will be exposed on port 8080 and can easily be paired with a reverse proxy, e.g. traefik.
If the program fails to run, double-check that you have Java installed and then try again.
<a name="knownissues">Known issues</a>
There are some bugs that we're aware of that we haven't managed to fix yet. If there's not a solution here or you need more help, please go ahead and report an issue.
- <a name='legacy'>Legacy Java Environment (SE 6) Is Required:</a> (Mac): The Mac operating system recently changed how it packages the Java Runtime Environment. If you get this error, download Tabula's "large experimental" package. This package includes its own Java Runtime Environment and should work without this issue.

- <a name='gatekeeper'>"Tabula is damaged and can't be opened" (Mac)</a>: If you’re running Mac OS X 10.8 or later, GateKeeper may prevent you from opening the Tabula app. Please see the GateKeeper page for more information.

  - Right-click on Tabula.app and select Open from the context menu.
  - The system will tell you that the application is "from an unidentified developer" and ask you whether you want to open it. Click Open to allow the application to run. The system remembers this choice and won't prompt you again.

  (If you continue to have issues, double-check the OS X GateKeeper documentation for more information.)

- <a name='encoding'>org.jruby.exceptions.RaiseException: (Encoding::CompatibilityError) incompatible character encodings:</a> (Windows): Your Windows computer expects a type of encoding other than Unicode or Windows's English encoding. You can fix this by entering a few simple commands in the Command Prompt. (The commands won't affect anything besides Tabula.)

  - Open a Command Prompt.
  - Type cd and then the path to the directory that contains tabula.exe, e.g.

        cd C:\Users\Username\Downloads

  - Change that terminal's codepage to Unicode by typing:

        chcp 65001

  - Run Tabula by typing:

        tabula.exe

- <a name='portproblems'>A browser tab opens, but something other than Tabula loads there. Or Tabula doesn't start.</a> It's possible another program is using port 8080, which Tabula binds to by default. You can try closing the other program, or change the port Tabula uses by running Tabula from the terminal with the warbler.port property:

      java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -Dwarbler.port=9999 -jar tabula.jar
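To find out whether another program really is holding port 8080, something like the following sketch may help on Linux or macOS. It assumes the lsof utility is installed; port_report is a made-up helper name, and without administrator rights lsof may only show processes owned by your own user.

```shell
# Hypothetical helper: report what, if anything, is listening on a local TCP port.
port_report() {
  port="${1:-8080}"   # default to the port Tabula binds to
  if command -v lsof >/dev/null 2>&1; then
    # -iTCP:PORT selects the port, -sTCP:LISTEN keeps only listening sockets,
    # -n and -P skip host and port name lookups.
    lsof -nP -iTCP:"$port" -sTCP:LISTEN || echo "nothing is listening on port $port"
  else
    echo "lsof not found; try a similar tool such as ss or netstat"
  fi
}

port_report 8080
```

If the report lists a process other than Tabula, either stop that program or start Tabula on a different port with the warbler.port property described above.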
Incorporating Tabula into your own project
Tabula is open-source, so we'd love for you to incorporate pieces of Tabula into your own projects. The "guts" of Tabula -- that is, the logic and heuristics that reconstruct tables from PDFs -- is contained in the tabula-java repo. There's a JAR file that you can easily incorporate into JVM languages l