Table Recognition Dataset Tools

Command-line tools for comparing result files to the ground truth

Updated 2018-02-06: Command-line tools for comparing result files to the ground truth are now available. They are written in Java and can be downloaded either as a JAR or as a ZIP archive containing the source code. The GitHub page is here; please report any issues or bugs there. Thanks!

GUI tool for table region comparison, offset adjustment and automatic evaluation

We have now made a GUI tool available that enables visual comparison of the PDF with the ground-truth and result table regions. It can also adjust coordinates automatically by applying offsets and scale factors (which can also be negative). We hope that this tool will make it easier for participants to convert their results into our format for submission.
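To illustrate what offset and scale settings do to a table region, here is a minimal Java sketch that adjusts one bounding box. It assumes each coordinate is transformed as x' = x * scale + offset (scale first, then offset); the class and method names are hypothetical, and the GUI tool's actual order of operations may differ. A negative scale factor is useful for flipping an axis: PDF user space has its origin at the bottom-left of the page, so coordinates measured from the top-left can be converted with a y scale of -1 and a y offset equal to the page height.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of the offset/scale adjustment applied to a table region.
 * Assumption: each coordinate becomes x' = x * scale + offset (scale first, then offset).
 */
final class CoordinateAdjuster {

    /** Axis-aligned bounding box with corners (x1, y1) and (x2, y2). */
    static final class Box {
        final double x1, y1, x2, y2;

        Box(double x1, double y1, double x2, double y2) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }

        @Override
        public String toString() {
            // Locale.ROOT keeps the decimal separator a dot regardless of system locale.
            return String.format(Locale.ROOT, "[%.1f, %.1f, %.1f, %.1f]", x1, y1, x2, y2);
        }
    }

    private CoordinateAdjuster() {}

    /** Scales and offsets both corners, then re-sorts them so that x1 <= x2 and y1 <= y2. */
    static Box adjust(Box b, double scaleX, double scaleY, double offsetX, double offsetY) {
        double ax1 = b.x1 * scaleX + offsetX;
        double ax2 = b.x2 * scaleX + offsetX;
        double ay1 = b.y1 * scaleY + offsetY;
        double ay2 = b.y2 * scaleY + offsetY;
        // A negative scale factor mirrors the axis and would otherwise swap the corners.
        return new Box(Math.min(ax1, ax2), Math.min(ay1, ay2),
                       Math.max(ax1, ax2), Math.max(ay1, ay2));
    }

    public static void main(String[] args) {
        double pageHeight = 792.0; // height of a US Letter page in PDF points
        Box fromTop = new Box(72.0, 100.0, 540.0, 300.0); // y measured downwards from the top
        // Flip the y axis: y' = -1 * y + pageHeight
        Box fromBottom = adjust(fromTop, 1.0, -1.0, 0.0, pageHeight);
        System.out.println(fromBottom); // prints [72.0, 492.0, 540.0, 692.0]
    }
}
```

Re-sorting the corners after the transform keeps the box well-formed, so the same method works for positive and negative scale factors alike.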

The tool is written in Java and has been built for Linux and Windows, in both 32-bit and 64-bit versions. It can be downloaded here. No installation is required: simply unzip the archive and run the pdfAnnotator executable.

Quick user guide

You can open either a single PDF file (File | Open PDF) or a ZIP file (File | Open ZIP). In either case, the tool will automatically load the corresponding ground truth and result files, as long as they conform to the following naming conventions:

  • PDF: filename.pdf
  • GT: filename-reg.xml
  • Result: filename-reg-result.xml
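The naming convention above can be checked before loading a batch of files. The following Java sketch derives the expected GT and result filenames from a PDF name and reports which of them are missing from the PDF's directory. The class and method names are hypothetical and not part of the tool; the sketch also assumes that all three files sit in the same directory and that the ".pdf" extension is lowercase.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper for the naming convention:
 *   filename.pdf -> filename-reg.xml (GT) and filename-reg-result.xml (result).
 */
final class CompanionFiles {
    private CompanionFiles() {}

    /** Strips the ".pdf" extension; assumes it is written in lowercase. */
    static String baseName(String pdfName) {
        if (!pdfName.endsWith(".pdf")) {
            throw new IllegalArgumentException("Not a PDF filename: " + pdfName);
        }
        return pdfName.substring(0, pdfName.length() - ".pdf".length());
    }

    static String groundTruthName(String pdfName) {
        return baseName(pdfName) + "-reg.xml";
    }

    static String resultName(String pdfName) {
        return baseName(pdfName) + "-reg-result.xml";
    }

    /** Lists the expected GT/result files that do not exist next to the given PDF. */
    static List<String> missingCompanions(Path pdf) {
        String name = pdf.getFileName().toString();
        Path dir = pdf.toAbsolutePath().getParent();
        List<String> missing = new ArrayList<>();
        for (String companion : List.of(groundTruthName(name), resultName(name))) {
            if (!Files.exists(dir.resolve(companion))) {
                missing.add(companion);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(groundTruthName("example.pdf")); // prints example-reg.xml
        System.out.println(resultName("example.pdf"));      // prints example-reg-result.xml
        // Optionally pass PDF paths on the command line to check them.
        for (String arg : args) {
            System.out.println(arg + " -> missing: " + missingCompanions(Paths.get(arg)));
        }
    }
}
```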

The “Ground truth” and “Result” tabs show the GT and result data and allow them to be edited and saved. If either the GT or the result is missing, it can be loaded by clicking the “+” button.

You can open several documents at once and generate results for the whole set; the open documents are listed in the top-left frame. To close a document, right-click on it and select “Close”. If you are working on a complete dataset, we recommend putting all files into a single ZIP file, which the tool can load in one step.
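For the ZIP-based workflow, the archive can be built with any ZIP utility. As an illustration, here is a minimal Java sketch using the standard java.util.zip package that packs every PDF in a directory, together with any GT and result XML files that follow the naming convention, into one archive. The class name is hypothetical, and storing all files at the top level of the archive (rather than in subfolders) is an assumption about what the tool expects.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: packs PDF, GT and result files from one directory into a flat ZIP. */
final class DatasetZipper {
    private DatasetZipper() {}

    /** True for files that follow the PDF / GT / result naming convention. */
    static boolean isDatasetFile(String name) {
        return name.endsWith(".pdf") || name.endsWith("-reg.xml") || name.endsWith("-reg-result.xml");
    }

    /** Collects matching files directly inside dir (no recursion), in sorted order. */
    static List<Path> collect(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                        .filter(p -> isDatasetFile(p.getFileName().toString()))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    /** Writes all collected files into zipFile as top-level entries. */
    static void zip(Path dir, Path zipFile) throws IOException {
        List<Path> files = collect(dir);
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Path file : files) {
                zos.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zos); // stream the file's bytes into the current entry
                zos.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java DatasetZipper <dataset-dir> <output.zip>");
            return;
        }
        zip(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Filtering on the naming convention keeps unrelated files (notes, logs) out of the archive, so the tool only sees documents it can pair with their GT and result files.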

The document window on the right-hand side shows a preview of the PDF (the tabs at the bottom switch between the character element view and the rendition). In the character element view, you can show and hide the GT and result boxes and adjust the offsets and scale factors so that the coordinate systems match. The completeness and purity scores for the current page are also displayed; to generate them for the complete document and for all other open documents, click “Tools | Create Report”.
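This page does not spell out how the completeness and purity scores are computed. A common definition in table-detection evaluation treats characters as the sub-objects of a region: a detected region is complete if it contains every character of the corresponding ground-truth region, and pure if it contains no character outside it. The Java sketch below counts complete and pure regions on one page under these assumptions, plus two more: each character is reduced to its centre point, and each GT region is matched to the result region that shares the most characters with it. All names are hypothetical, and the tool's own formulas and report aggregation may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of region-level completeness and purity for one page. */
final class RegionScores {
    private RegionScores() {}

    /** Axis-aligned table region. */
    static final class Rect {
        final double x1, y1, x2, y2;

        Rect(double x1, double y1, double x2, double y2) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }

        boolean contains(double x, double y) {
            return x >= x1 && x <= x2 && y >= y1 && y <= y2;
        }
    }

    /** Indices of the characters (given as {cx, cy} centre points) that fall inside r. */
    static Set<Integer> charsIn(Rect r, List<double[]> centres) {
        Set<Integer> inside = new HashSet<>();
        for (int i = 0; i < centres.size(); i++) {
            double[] c = centres.get(i);
            if (r.contains(c[0], c[1])) {
                inside.add(i);
            }
        }
        return inside;
    }

    /** Returns {complete, pure}: how many GT regions have a complete / pure best-matching result. */
    static int[] score(List<Rect> gt, List<Rect> results, List<double[]> centres) {
        int complete = 0;
        int pure = 0;
        for (Rect g : gt) {
            Set<Integer> gtChars = charsIn(g, centres);
            Set<Integer> best = null;
            int bestOverlap = 0;
            for (Rect r : results) {
                Set<Integer> resChars = charsIn(r, centres);
                Set<Integer> overlap = new HashSet<>(resChars);
                overlap.retainAll(gtChars);
                if (overlap.size() > bestOverlap) {
                    bestOverlap = overlap.size();
                    best = resChars;
                }
            }
            if (best == null) {
                continue; // no result region shares any character with this GT region
            }
            if (best.containsAll(gtChars)) complete++; // nothing from the GT region is missing
            if (gtChars.containsAll(best)) pure++;     // nothing outside the GT region was included
        }
        return new int[] {complete, pure};
    }

    public static void main(String[] args) {
        List<double[]> centres = Arrays.asList(
            new double[] {15, 15}, new double[] {25, 15}, new double[] {15, 25}, // inside GT table A
            new double[] {115, 15}, new double[] {125, 15},                       // inside GT table B
            new double[] {50, 15});                                               // body text
        List<Rect> gt = Arrays.asList(new Rect(10, 10, 30, 30), new Rect(110, 10, 130, 30));
        List<Rect> results = Arrays.asList(
            new Rect(10, 10, 55, 30),    // covers table A plus one body-text character
            new Rect(110, 10, 120, 30)); // covers only part of table B
        int[] s = score(gt, results, centres);
        System.out.println("complete: " + s[0] + "/" + gt.size() + ", pure: " + s[1] + "/" + gt.size());
        // prints complete: 1/2, pure: 1/2
    }
}
```

In the example, the first result region is complete but not pure (it swallows a body-text character), while the second is pure but not complete (it misses part of the table). Summing the per-page counts over all pages and documents would give dataset-level totals.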

If you have any further questions, please feel free to get in touch.