This version of ImageDataExtractor is deprecated. Find the latest version here.


This section outlines a few advanced options for using ImageDataExtractor.

Large-scale Extraction

ImageDataExtractor can be used for high-throughput data extraction using two methods:

>>> ide.extract_images('<path/to/img/dir>')
>>> ide.extract_documents('<path/to/docs/dir>')  

These run the extract_image and extract_document methods sequentially on every file in the target directory.
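As a rough illustration of what sequential processing over a directory amounts to, the sketch below applies a single-file function to each file in a folder, in sorted order. The helper run_on_directory and its extract_one callback are hypothetical names used only for illustration; they are not part of ImageDataExtractor, and the library's own implementation may differ.

```python
from pathlib import Path
from typing import Callable, Dict


def run_on_directory(input_dir: str, extract_one: Callable[[str], object]) -> Dict[str, object]:
    """Apply extract_one to every regular file in input_dir, one at a time.

    extract_one is a stand-in for a single-file call (hypothetical callback).
    A failure on one file is recorded rather than aborting the whole batch.
    """
    results: Dict[str, object] = {}
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        try:
            results[path.name] = extract_one(str(path))
        except Exception as exc:  # keep going if a single file fails
            results[path.name] = exc
    return results
```

Assuming ide is imported as in the examples above, a call such as run_on_directory('<path/to/img/dir>', ide.extract_image) would collect one result (or one exception) per file.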

ImageDataExtractor also supports .zip, .tar and .tar.gz inputs.
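If it is useful to inspect an archive's contents before running an extraction, the sketch below unpacks a .zip, .tar or .tar.gz file into a temporary directory using only the Python standard library. The function unpack_if_archive is a hypothetical helper, not part of ImageDataExtractor, and it does not describe how the library handles archives internally.

```python
import shutil
import tempfile
from pathlib import Path

# Archive types named in the text above.
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz")


def unpack_if_archive(input_path: str) -> str:
    """Return a directory to process.

    If input_path is a supported archive file, unpack it into a fresh
    temporary directory and return that directory; otherwise return
    input_path unchanged.
    """
    if input_path.endswith(ARCHIVE_SUFFIXES) and Path(input_path).is_file():
        target = tempfile.mkdtemp(prefix="ide_unpacked_")
        # shutil infers the archive format from the file extension.
        shutil.unpack_archive(input_path, target)
        return target
    return input_path
```

The returned directory can then be listed or passed on to a directory-level call. Only unpack archives from sources you trust.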

Output locations

In addition to the output graphs, images are saved to the following locations for transparency:

CSV files containing metadata are also written to:

To specify an output directory, pass it as the second argument to any of these functions. For example:

>>> ide.extract_image('<path/to/image/file>', '<path/to/output>')

will save all outputs to <path/to/output>.