Advanced

This section outlines a few advanced options for using ImageDataExtractor.

Large-scale Extraction

ImageDataExtractor can be used for high-throughput data extraction using two methods:

>>> ide.extract_images('<path/to/img/dir>')
>>> ide.extract_documents('<path/to/docs/dir>')

These run the extract_image and extract_document methods sequentially on every file in the target directory.
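The sequential behaviour described above can be pictured with a small sketch. This is not the library's implementation: `run_on_directory` and its `extract_one` parameter are hypothetical names used only for illustration, and the sorted, files-only iteration order is an assumption.

```python
from pathlib import Path
from typing import Callable, List


def run_on_directory(extract_one: Callable[[str], object], input_dir: str) -> List[object]:
    """Call extract_one on each regular file in input_dir, one after another.

    Hypothetical helper: it only illustrates the idea of sequential,
    per-file processing and makes no claim about ImageDataExtractor internals.
    """
    results = []
    # Sort for a reproducible processing order; subdirectories are skipped.
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file():
            results.append(extract_one(str(path)))
    return results
```

Passing `ide.extract_image` as `extract_one` should behave roughly like `ide.extract_images` on the same directory, although the library may order or filter files differently.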

ImageDataExtractor also supports .zip, .tar and .tar.gz inputs.
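Because archives are accepted directly, unpacking them by hand is not required. If you want to inspect an archive's contents before extraction, the following sketch shows one way to do so using only the Python standard library. The helper name `unpack_to_temp` and the `ide_input_` prefix are hypothetical, and this says nothing about how ImageDataExtractor handles archives internally.

```python
import shutil
import tempfile


def unpack_to_temp(archive_path: str) -> str:
    """Unpack a .zip, .tar or .tar.gz archive into a new temporary directory.

    Hypothetical helper for inspecting inputs; returns the directory path.
    """
    target_dir = tempfile.mkdtemp(prefix="ide_input_")
    # shutil infers the archive format from the file extension.
    shutil.unpack_archive(archive_path, target_dir)
    return target_dir
```

The returned directory can then be browsed, or passed to `ide.extract_images` like any other input directory.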

In addition to the output graphs, the images from each processing stage are saved in the following folders for transparency:

raw_images : Contains the images that were downloaded from the article
split_photo_images : Individual images after the first splitting algorithm (whitespace)
split_grid_images : Individual images after the second splitting algorithm (integer fractions of the grid)
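A quick way to check what each stage produced is to count the files in every subfolder of the output directory. The sketch below is a generic standard-library helper, not part of ImageDataExtractor; the name `count_files_per_subdir` is hypothetical, and it assumes the folders listed above sit directly inside the output directory.

```python
from pathlib import Path
from typing import Dict


def count_files_per_subdir(output_dir: str) -> Dict[str, int]:
    """Return {subfolder name: number of regular files directly inside it}."""
    counts = {}
    for sub in sorted(Path(output_dir).iterdir()):
        if sub.is_dir():
            # Count only files at the top level of each subfolder.
            counts[sub.name] = sum(1 for p in sub.iterdir() if p.is_file())
    return counts
```

Comparing the counts across folders gives a rough indication of how many images each splitting step produced.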

CSV files containing metadata are also written to:

To specify an output directory, pass it as the second argument to any of these functions. For example:

>>> ide.extract_image('<path/to/image/file>', '<path/to/output>')

will save all outputs to <path/to/output>.
