Installation

This section outlines the steps required to install ImageDataExtractor.

We strongly recommend installing ImageDataExtractor inside a virtual environment.
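Here is a minimal sketch of creating and activating a virtual environment. It assumes a POSIX shell and Python 3.3 or later, which includes the standard-library venv module. The directory name .venv is an arbitrary choice.

```shell
# Create a virtual environment in ./.venv (the directory name is arbitrary)
python3 -m venv .venv

# Activate it in the current shell ("." is the POSIX equivalent of bash's "source");
# on Windows, run .venv\Scripts\activate instead
. .venv/bin/activate
```

Once activated, `python` and `pip` resolve to the copies inside .venv, so the packages installed in the steps below stay isolated from your system Python.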

Step 1: Install Tesseract 4

ImageDataExtractor currently uses Tesseract 4 for text recognition. You can check your existing version by running:

tesseract -v
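This check can also be scripted. The following POSIX shell sketch reads the major version from the first line of Tesseract's version banner. It assumes that line has the form `tesseract 4.1.1`, and it also accepts a `v` before the number because some builds print one. The helper name tesseract_major is hypothetical.

```shell
# Hypothetical helper: print the major version from a line such as "tesseract 4.1.1"
# (prints nothing if the line does not match)
tesseract_major() {
    printf '%s\n' "$1" | sed -n 's/^tesseract v\{0,1\}\([0-9][0-9]*\)\..*/\1/p'
}

if command -v tesseract >/dev/null 2>&1; then
    # Some builds write the banner to stderr, so merge both streams before taking line 1
    first_line=$(tesseract --version 2>&1 | head -n 1)
    if [ "$(tesseract_major "$first_line")" = "4" ]; then
        echo "Tesseract 4 found: $first_line"
    else
        echo "Found '$first_line', but ImageDataExtractor expects Tesseract 4" >&2
    fi
else
    echo "tesseract is not on your PATH" >&2
fi
```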

If required, you can download the Tesseract 4 source code from the Tesseract project, which also provides instructions for compiling it on your machine.

Step 2: Install ImageDataExtractor

Option A: Installation with pip (recommended)

Installing with pip is the simplest option. Run:

pip install imagedataextractor
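To confirm that the pip installation worked, check that Python can import the package from the active environment. This sketch assumes the importable module is named imagedataextractor, the same as the package name. The helper can_import is hypothetical.

```shell
# Hypothetical helper: succeeds if python3 can import the named module
can_import() {
    python3 -c "import $1" >/dev/null 2>&1
}

if can_import imagedataextractor; then
    echo "imagedataextractor imported successfully"
else
    echo "imagedataextractor could not be imported; is the right environment active?" >&2
fi
```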

Option B: Installation from source

Clone the repo and move into the directory:

git clone https://github.com/by256/imagedataextractor.git
cd imagedataextractor

Activate your virtual environment and install:

python setup.py install
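The install command above installs the package into whichever Python environment is active. Before running it, you may want to confirm that a virtual environment really is active. This sketch compares `sys.prefix` with `sys.base_prefix`, which differ inside an environment created with venv. It assumes `python3` resolves to the same interpreter as `python` once the environment is activated. The helper in_venv is hypothetical.

```shell
# Hypothetical helper: exit status 0 only when python3 runs inside a virtual environment
in_venv() {
    python3 -c 'import sys; sys.exit(0 if sys.prefix != sys.base_prefix else 1)'
}

if in_venv; then
    echo "Virtual environment active at: $(python3 -c 'import sys; print(sys.prefix)')"
else
    echo "No virtual environment detected; activate one before installing" >&2
fi
```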

Step 3: Download Data Files Necessary for Document Extraction

Finally, download the data files required for ChemDataExtractor-based document extraction:

cde data download

and you're ready to go!