Welcome to the documentation for IQDM-PDF!

IQDM-PDF

What does it do?

Scans a directory for IMRT QA reports and parses data into a CSV.

Other information

This library is part of the IMRT QA Data Mining (IQDM) project for the AAPM IMRT Working Group (WGIMRT).

Dependencies

Install

Latest PyPI release:

$ pip install iqdmpdf

Install from source:

$ python setup.py install

If you do not have a C++ compiler installed, you might have issues with installing the latest version of pdfminer.six. The following might resolve your issue:

$ pip install pdfminer.six==20200726

Usage

To scan a directory for IMRT QA report files and write a results .csv file to your current directory:

$ iqdmpdf [init_directory]

As of v0.2.2, parallel processing is supported. For example, you can run 4 parallel processes with the following:

$ iqdmpdf [init_directory] -n 4

usage: iqdmpdf [-h] [-ie] [-od OUTPUT_DIR] [-of OUTPUT_FILE] [-ver] [-nr]
               [-re] [-n PROCESSES]
               [init_directory]

Command line interface for IQDM-PDF

positional arguments:
  init_directory        Initiate scan here

optional arguments:
  -h, --help            show this help message and exit
  -ie, --ignore-extension
                        Script will check all files, not just ones with .pdf
                        extensions
  -od OUTPUT_DIR, --output-dir OUTPUT_DIR
                        Output stored in local directory by default, specify
                        otherwise here
  -of OUTPUT_FILE, --output-file OUTPUT_FILE
                        Output will be saved as <report_type>_results_<time-
                        stamp>.csv by default. Define this tag to customize
                        file name after <report_type>_
  -ver, --version       Print the IQDM version
  -nr, --no-recursive-search
                        Include this flag to skip sub-directories
  -re, --raise-errors   Allow failed file parsing to halt the program
  -n PROCESSES, --processes PROCESSES
                        Enable multiprocessing, set number of parallel
                        processes

Vendor Compatibility

We plan to support many vendors. If a vendor's report layout is very consistent, a new JSON file in the report_templates directory is essentially all that is needed. Additional documentation for custom templates can be found here.

Credits

Development Lead

  • Dan Cutright

Contributors

  • Marc Chamberland

  • Aditya Panchal

Test Data

Example IMRT QA reports used for unit testing and design are available here.

  • Dan Cutright, University of Chicago Hospital
    • delta4/UChicago

    • sncpatient/UChicago

  • Marc Chamberland, University of Vermont Health Network
    • sncpatient/UVermontHealthNetwork

  • Serpil Kucuker Dogan, Northwestern Memorial Hospital
    • sncpatient/Northwestern_Memorial

    • sncpatient2020/Northwestern_Memorial

  • Aditya Panchal, AMITA Health
    • verisoft/AMITA_Health

  • Michael Snyder, Beaumont Health
    • sncpatient/Beaumont

How It Works

IQDM-PDF uses pdfminer.six to extract text and coordinates from IMRT QA PDF files.

Step 1: Match Report Parser

Each report parser has an identifiers property which contains words and phrases used to uniquely pair a PDF to a report parser. If all of the identifiers are found in the PDF text, that report parser will be selected.
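
The matching rule above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the function names, the parser names, and the identifier strings are all hypothetical.

```python
def matches_all_identifiers(text, identifiers):
    """Return True only if every identifier occurs somewhere in text."""
    return all(identifier in text for identifier in identifiers)


def select_parser(text, parsers):
    """Return the name of the first parser whose identifiers all match.

    parsers maps a (hypothetical) parser name to its list of identifiers.
    Returns None if no parser matches.
    """
    for name, identifiers in parsers.items():
        if matches_all_identifiers(text, identifiers):
            return name
    return None


# Hypothetical identifiers, for illustration only
parsers = {
    "vendor_a": ["Vendor A", "QA Summary"],
    "vendor_b": ["Vendor B", "Gamma"],
}
text = "Vendor B report\nGamma pass rate: 98.2"
selected = select_parser(text, parsers)
```

Because every identifier must be present, identifiers should be chosen so that no other vendor's report contains the full set.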

Step 2: Parse Data by Text Box Coordinates

The selected report parser collects the text data, which is stored by page and bounding box coordinates. Report parsers can then look up a text value by page and coordinate.

Step 3: Apply Template

Unless customized logic is needed, the GenericReport class can be used, which reads in a JSON file containing three keys: report_type, identifiers, and data. Required keys of each data element are column, page, and pos. For further customization, see the get_block_data function documentation in CustomPDFReader. All keys of a data element (except column) are passed to get_block_data.
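
As a rough sketch of the template structure, the following builds a template as a Python dict, checks for the required keys, and serializes it to JSON. The check_template helper and the template values (report type, identifiers) are hypothetical and only illustrate the documented keys.

```python
import json

REQUIRED_TOP_LEVEL = ("report_type", "identifiers", "data")
REQUIRED_DATA_KEYS = ("column", "page", "pos")


def check_template(template):
    """Return a list of problems found in a template dict (empty if none)."""
    problems = [
        f"missing key: {key}" for key in REQUIRED_TOP_LEVEL if key not in template
    ]
    for i, element in enumerate(template.get("data", [])):
        for key in REQUIRED_DATA_KEYS:
            if key not in element:
                problems.append(f"data[{i}] missing key: {key}")
    return problems


# Hypothetical template contents, for illustration only
template = {
    "report_type": "example_vendor",
    "identifiers": ["Example Vendor", "QA Report"],
    "data": [
        {"column": "Distance (mm)", "page": 0, "pos": [79.2, 432.94]},
        {"column": "Threshold (%)", "page": 0, "pos": [79.2, 420.7]},
    ],
}
problems = check_template(template)
template_json = json.dumps(template, indent=4)
```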

Check out the report templates on GitHub for examples.

In the simplest case, a report parser class looks something like the following:

from os.path import join

from IQDMPDF.parsers.generic import GenericReport
from IQDMPDF.paths import DIRECTORIES


class SNCPatientReport2020(GenericReport):
    """SNCPatientReport parser for the new format released in 2020"""

    def __init__(self):
        """Initialization of a SNCPatientReport class"""
        template = join(DIRECTORIES["REPORT_TEMPLATES"], "sncpatient2020.json")
        GenericReport.__init__(self, template)

Then update the REPORT_CLASSES list in parser.py to include the new report parser class.

Step 4: Iterate

From the command line, you can iterate over all files in a provided directory and save the results into one CSV file per vendor/template:

$ iqdmpdf your/initial/dir

Or from a Python console:

>>> from IQDMPDF.file_processor import process_files
>>> process_files("your/initial/dir")

Non-Template Based Parsing

If the data in the reports have varying coordinates, the code needs more customization. See the Delta4 report parser for examples/inspiration.

Generally speaking, the LAParams for pdfminer.six are customized (e.g., char_margin, line_margin) so that sections of the IMRT QA report text are collected into one block. Key words are then used to connect data to variable names. Another trick is to look up the positions of boxes containing key words, then use the y-position to search for another block of text laterally (used frequently in the PTW VeriSoft parser).

These methods are needed if reports have variable templates, fonts, or font sizes. So far, all of IQDM-PDF’s parsers are non-template based, with the exception of the new SNC Patient format introduced in 2020.
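
The keyword-based lateral lookup described in this section can be sketched as follows. This is not the code used by any IQDM-PDF parser: the block representation (a list of dicts with bbox and text), the function name, and the y_tol default are assumptions for illustration. The coordinates are taken from the sample output shown under Building a New Template.

```python
def find_lateral_text(blocks, keyword, y_tol=2.0):
    """Return text of blocks on the same row as, and right of, the keyword block.

    blocks is a list of dicts with "bbox" ([x0, y0, x1, y1]) and "text".
    """
    # Locate the first block containing the keyword
    anchor = next((b for b in blocks if keyword in b["text"]), None)
    if anchor is None:
        return []
    x0, y0 = anchor["bbox"][0], anchor["bbox"][1]
    # Keep blocks whose bottom edge is close to the anchor's and that start to its right
    row = [
        b
        for b in blocks
        if b is not anchor and abs(b["bbox"][1] - y0) < y_tol and b["bbox"][0] > x0
    ]
    return [b["text"] for b in sorted(row, key=lambda b: b["bbox"][0])]


blocks = [
    {"bbox": [6.24, 432.94, 51.47, 439.9], "text": "Distance (mm)"},
    {"bbox": [79.2, 432.94, 88.84, 439.9], "text": " : 2"},
    {"bbox": [6.24, 420.7, 49.8, 427.66], "text": "Threshold (%)"},
    {"bbox": [79.2, 420.7, 98.37, 427.66], "text": " : 10.0"},
]
values = find_lateral_text(blocks, "Threshold")
```

Because the lookup is relative to the keyword's own position, it keeps working when the whole row shifts up or down between reports.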

Building a New Template

Currently, building a new JSON template requires some Python scripting to determine coordinates. The output from the following code shows all text bounding box coordinates and contents.

>>> from IQDMPDF.pdf_reader import CustomPDFReader
>>> data = CustomPDFReader("path/to/report.pdf")
>>> print(data)

Below is a sample of the output from: example_reports/sncpatient/UChicago/DCAM_example_1.pdf

page_index: 0, data_index: 21
bbox: [6.24, 445.18, 140.33, 463.88]
Absolute Dose Comparison
Difference (%)

page_index: 0, data_index: 22
bbox: [79.2, 445.18, 88.84, 452.14]
 : 2

page_index: 0, data_index: 23
bbox: [6.24, 432.94, 51.47, 439.9]
Distance (mm)

page_index: 0, data_index: 24
bbox: [79.2, 432.94, 88.84, 439.9]
 : 2

page_index: 0, data_index: 25
bbox: [6.24, 420.7, 49.8, 427.66]
Threshold (%)

page_index: 0, data_index: 26
bbox: [79.2, 420.7, 98.37, 427.66]
 : 10.0

The data object in the resulting JSON file for this data would look like:

[
    {"column": "Difference (%)", "page": 0, "pos": [79.2, 441.02]},
    {"column": "Distance (mm)", "page": 0, "pos": [79.2, 432.94]},
    {"column": "Threshold (%)", "page": 0, "pos": [79.2, 420.7]}
]

Note that the value for column doesn’t need to match any text in the PDF.

The pos element is assumed to be the bottom-left corner of the bounding box by default. If the PDF layout has centered or right-aligned elements, you can specify mode to be any combination of bottom/center/top and left/center/right (e.g., top-right or center-left); center is equivalent to center-center.

For example, if an element is more consistently found at the center of a bounding box, the data element could look like:

{
  "column": "Threshold (%)",
  "page": 0,
  "pos": [88.79, 424.18],
  "mode": "center"
}
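
To make the mode option concrete, here is a minimal sketch of how a bounding box could be converted to a position for each mode. It is an illustration under the assumption that bboxes are [x0, y0, x1, y1] with y increasing upward (as in pdfminer.six); the function name bbox_to_position is hypothetical and this is not the implementation of IQDMPDF.utilities.bbox_to_pos.

```python
def bbox_to_position(bbox, mode="bottom-left"):
    """Convert [x0, y0, x1, y1] to an (x, y) position for the given mode."""
    x0, y0, x1, y1 = bbox
    if mode == "center":
        mode = "center-center"
    vertical, horizontal = mode.split("-")
    x = {"left": x0, "center": (x0 + x1) / 2, "right": x1}[horizontal]
    y = {"bottom": y0, "center": (y0 + y1) / 2, "top": y1}[vertical]
    return x, y


# Bounding box of the " : 10.0" block from the sample output above
bbox = [79.2, 420.7, 98.37, 427.66]
bottom_left = bbox_to_position(bbox)
center = bbox_to_position(bbox, "center")
top_right = bbox_to_position(bbox, "top-right")
```

For this box the center works out to roughly (88.79, 424.18), which is the pos used in the centered example.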

IQDM PDF

PDF Reader Module

Read PDF files into python objects

class IQDMPDF.pdf_reader.CustomPDFReader(file_path, laparams_kwargs=None)

Bases: object

Custom PDF Parsing module

Initialize a CustomPDFReader object

Parameters

file_path (str) – Absolute file path to the PDF to be read

convert_pdf_to_text()

Extract text and coordinates from a PDF

get_bbox_of_data(text, return_all=False, include_text=False)

Get the bounding box for a given string

Parameters
  • text (str) – Check all parsed data for this string. Return the first bounding box that contains this text. Meant to search for a unique str

  • return_all (bool) – If true, then return a list containing all matches, in the order pdfminer.six found them

  • include_text (bool) – If true, also return the text data

Returns

“page”->int and “bbox”->[x0, y0, x1, y1]. If include_text is true, “text”->str will contain the text data. If return_all is true, return a list of these dict objects.

Return type

dict, list

get_block_data(page, pos, tol=10, text_cleaner=None, numeric=None, ignored=None, mode='bottom-left')

Use PDFPageParser.get_block_data for the provided page

Parameters
  • page (int) – The index of the PDF page

  • pos (tuple of int, float) – The (x,y) coordinates of the text block to be retrieved

  • tol (int, float, tuple) – Maximum distance a block’s x or y-coordinate may be from pos. If a tuple is provided, first value is the x_tolerance, 2nd is y_tolerance

  • text_cleaner (callable, optional) – A function called on each text element (e.g., remove leading ‘:’)

  • numeric (bool, optional) – If true, only return value if it is numeric. If false, only return value if it is not numeric. Leave as None to ignore this feature.

  • ignored (list, optional) – Optionally provide a list of strings that should be ignored. If the value of the block data is in this list, the value will become an empty string instead

  • mode (str, optional) – Options are combinations of top/center/bottom and right/center/left, e.g., ‘top-right’, ‘center-right’. ‘center’ is assumed to be ‘center-center’. Default is ‘bottom-left’.

Returns

All text data that meet the input constraints

Return type

list of str

class IQDMPDF.pdf_reader.PDFPageParser(lt_objs, page_data, page_index=0)

Bases: object

Custom PDF Page Parsing module

Initialization of PDFPageParser

Parameters
  • lt_objs (list) – A layout object from PDFPageAggregator.get_result()._objs

  • page_data (dict) – A dictionary of lists, with keys ‘x’, ‘y’, ‘text’

  • page_index (int, optional) – The index of the page

get_block_data(pos, tol, text_cleaner=None, numeric=None, ignored=None, mode='bottom-left')

Get the text block data by x,y coordinates

Parameters
  • pos (list of int, float) – The (x,y) coordinates of the text block to be retrieved

  • tol (int, float, tuple) – Maximum distance a block’s x or y-coordinate may be from pos. If a tuple is provided, first value is the x_tolerance, 2nd is y_tolerance

  • text_cleaner (callable, optional) – A function called on each text element (e.g., remove leading ‘:’)

  • numeric (bool, optional) – If true, only return value if it is numeric. If false, only return value if it is not numeric. Leave as None to ignore this feature.

  • ignored (list, optional) – Optionally provide a list of strings that should be ignored. If the value of the block data is in this list, the value will become an empty string instead

  • mode (str, optional) – Options are combinations of top/center/bottom and right/center/left, e.g., ‘top-right’, ‘center-right’. ‘center’ is assumed to be ‘center-center’. Default is ‘bottom-left’.

Returns

All text data that meet the input constraints

Return type

list of str
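
The parameters documented above suggest a filtering pipeline along the following lines. This is a hedged sketch, not the library's code: the block representation (a list of (x, y, text) tuples), the function names, and the order in which the cleaner, numeric check, and ignored list are applied are all assumptions.

```python
def is_number(val):
    """Return True if float(val) succeeds."""
    try:
        float(val)
    except (TypeError, ValueError):
        return False
    return True


def filter_blocks(blocks, pos, tol=10, text_cleaner=None, numeric=None, ignored=None):
    """Return text of blocks within tol of pos, applying the optional filters.

    blocks is a list of (x, y, text) tuples.
    """
    # A single tolerance applies to both axes; a pair is (x_tolerance, y_tolerance)
    x_tol, y_tol = tol if isinstance(tol, (tuple, list)) else (tol, tol)
    results = []
    for x, y, text in blocks:
        # Exclusive comparison: a block exactly tol away is rejected
        if abs(x - pos[0]) >= x_tol or abs(y - pos[1]) >= y_tol:
            continue
        if text_cleaner is not None:
            text = text_cleaner(text)
        # numeric=True keeps only numeric text, numeric=False only non-numeric text
        if numeric is not None and is_number(text) != numeric:
            continue
        # Ignored values are replaced by an empty string rather than dropped
        if ignored and text in ignored:
            text = ""
        results.append(text)
    return results


blocks = [
    (6.24, 420.7, "Threshold (%)"),
    (79.2, 420.7, " : 10.0"),
    (79.2, 432.94, " : 2"),
]
cleaned = filter_blocks(
    blocks, (79.2, 420.7), tol=(5, 5), text_cleaner=lambda t: t.lstrip(" :"), numeric=True
)
label = filter_blocks(blocks, (6.24, 420.7), ignored=["Threshold (%)"])
```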

parse_obj(lt_objs)

Extract x, y, and text data from layout objects

Parameters

lt_objs (list) – A layout object from PDFPageAggregator.get_result()._objs

sort_all_data(sort_key, reverse=False)

Sort all parsed data by sort_key

Parameters
  • sort_key (str) – Either ‘x’ or ‘y’

  • reverse (bool) – Passes into standard library sorted() function

sort_all_data_by_y()

Sort parsed data by y coordinate

sub_sort_all_data_by_x()

Sort each row of data by x-coordinate, keeping y order

IQDMPDF.pdf_reader.convert_pdf_to_txt(path)

Extract text from a PDF

Parameters

path (str) – Absolute file path to the PDF to be read

Returns

The text content of the PDF

Return type

str

File Processor

Process IMRT QA file(s) into CSV file(s)

IQDMPDF.file_processor.print_callback(msg)

Simple print callback for process_files

Parameters

msg (dict) – The message sent from process_files

IQDMPDF.file_processor.process_file(file_path, output_file, output_dir=None)

Process a pdf file into a parser class, write data to csv

Parameters
  • file_path (str) – PDF file to be processed

  • output_file (str) – Report type in file name will be prepended to this value

  • output_dir (str, optional) – Save results to this directory, default is local directory

IQDMPDF.file_processor.process_file_worker(file_path)

Multiprocessing worker function

Parameters

file_path (str) – PDF file to be passed to ReportParser

Returns

{“data”: ReportParser.csv_data, “report_type”: ReportParser.report_type, “columns”: ReportParser.columns}

Return type

dict

IQDMPDF.file_processor.process_files(init_directory, ignore_extension=False, output_file=None, output_dir=None, no_recursive_search=False, callback=None, raise_errors=False, processes=1)

Process all pdf files into parser classes, write data to csv

Parameters
  • init_directory (str) – initial scanning directory

  • ignore_extension (bool, optional) – Set to True to catch pdf files that are missing .pdf extension

  • output_file (str, optional) – Report type in file name will be prepended to this value

  • output_dir (str, optional) – Save results to this directory, default is local directory

  • no_recursive_search (bool, optional) – Ignore sub-directories if True

  • callback (callable) – Pointer to a function to be called before each process_file call. The parameter will be a dict with keys of “label” and “gauge”.

  • raise_errors (bool) – Set to True to allow errors to be raised (useful for debugging)

  • processes (int) – Number of parallel processes allowed
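
As an illustration of the callback parameter, the sketch below defines a callback that records each message it receives. The factory name and the example message are hypothetical; only the “label” and “gauge” keys come from the description above, and their exact contents are treated as opaque here.

```python
def make_logging_callback(log):
    """Return a callback that appends a formatted progress line to log."""

    def callback(msg):
        # msg is expected to be a dict with "label" and "gauge" keys
        log.append("{}: {}".format(msg.get("label"), msg.get("gauge")))

    return callback


log = []
callback = make_logging_callback(log)

# Simulated message for demonstration; real values are supplied by process_files
callback({"label": "Processing (1 of 4)", "gauge": 0.25})
```

Such a function would then be passed as the callback argument of process_files.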

IQDMPDF.file_processor.validate_kwargs(kwargs, add_print_callback=True)

Process kwargs from main for process_files

Parameters
  • kwargs (dict) – Keyword arguments for main. See main.create_arg_parser for valid arguments

  • add_print_callback (bool) – If true, add simple print function at the start of each process_file call

Returns

Returns a dict containing only keywords applicable to process_files, or an empty dict if “init_directory” is missing or “print_version” is True and “init_directory” is missing

Return type

dict

IQDMPDF.file_processor.write_csv(file_path, rows, mode='w', newline='')

Create csv.writer, call writerows(rows)

Parameters
  • file_path (str) – Path to file

  • rows (list, iterable) – Items to be written to file_pointer (input for csv.writer.writerows)

  • mode (str) – Optional string that specifies the mode in which the file is opened

  • newline (str) – Controls how universal newlines mode works. It can be None, ‘’, ‘\n’, ‘\r’, and ‘\r\n’
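
A function with this signature can be sketched with the standard library alone. The following is a minimal illustration rather than the library's source; the function name write_rows and the example rows are hypothetical.

```python
import csv
import os
import tempfile


def write_rows(file_path, rows, mode="w", newline=""):
    """Open file_path, create a csv.writer, and write all rows."""
    with open(file_path, mode, newline=newline) as f:
        csv.writer(f).writerows(rows)


# Write a header and one row, then append a second row
path = os.path.join(tempfile.mkdtemp(), "example_results.csv")
write_rows(path, [["Patient ID", "Pass Rate"], ["123", "98.2"]])
write_rows(path, [["456", "95.0"]], mode="a")

with open(path, newline="") as f:
    rows = list(csv.reader(f))
```

Passing newline='' lets the csv module control line endings itself, which avoids blank lines between rows on Windows.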

Unified Report Parser

Unified IMRT QA report parser

class IQDMPDF.parsers.parser.ReportParser(file_path)

Bases: object

Determines which Report class to use, then processes the data.

Initialization class for ReportParser

Parameters

file_path (str) – File path pointing to an IMRT QA report

property columns

Get column headers for csv

Returns

Report columns + “report_file_creation” + “report_file_path”

Return type

list

property csv_data

Get a csv string from the selected ReportParser

Returns

Report columns + “report_file_creation” + “report_file_path”

Return type

str

get_report()

Determine the report_class, then return class with data processed

Returns

Searches for a Report Class with matching identifiers, processes the file and returns the Report Class

Return type

ParserBase inherited class

property report_type

Get report type of the selected ReportParser

Returns

Get ReportParser.report_type

Return type

str

Generic Report Parser

Generic IMRT QA report parser

class IQDMPDF.parsers.generic.GenericReport(json_file_path, text_cleaner=None)

Bases: IQDMPDF.parsers.generic.ParserBase

Generic IMRT QA PDF report parser based on page, x, y values

Initialization of a GenericReport class

Parameters
  • json_file_path (str) – File path to a JSON file describing the PDF report. It should contain these keys (type): report_type (str), identifiers (list of str), and data (list). The format of each data element should be {‘column’: [str], ‘page’: [int], ‘pos’: [float, float]}. Optionally, you can also supply ‘tol’, which is either an integer or a list of integers (i.e., [x_tol, y_tol]). Also, specifying ‘numeric’ with a boolean value will ensure the value is or is not numeric (and return an empty string if not met). The JSON object can also have “alternates” which contains an array of data like items that will be checked until a value for a column is found. “ignored” is another option, if a value is returned that is in this array, an empty string will be returned instead. The value of “column” is automatically added to the “ignored” array.

  • text_cleaner (callable, optional) – A function called on each text element (e.g., remove leading ‘:’)
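
The “alternates” behavior described above can be pictured as a simple fallback loop. This sketch is an assumption about the control flow, not the GenericReport implementation: resolve_value and lookup are hypothetical names, and lookup stands in for a call such as get_block_data.

```python
def resolve_value(element, lookup):
    """Try the element's own page/pos, then each alternate, until a value is found.

    lookup(page, pos) is a hypothetical stand-in that returns a str
    (empty if nothing was found at that location).
    """
    candidates = [element] + list(element.get("alternates", []))
    for candidate in candidates:
        value = lookup(candidate["page"], candidate["pos"])
        if value:
            return value
    return ""


# Simulated page contents: the value only exists on page 1
text_by_position = {(1, (79.2, 420.7)): "10.0"}


def lookup(page, pos):
    return text_by_position.get((page, tuple(pos)), "")


element = {
    "column": "Threshold (%)",
    "page": 0,
    "pos": [79.2, 420.7],
    "alternates": [{"page": 1, "pos": [79.2, 420.7]}],
}
value = resolve_value(element, lookup)
```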

property summary_data

A summary of data from the QA report

Returns

Keys will match “column” elements from the JSON file. Values are of type str

Return type

dict

class IQDMPDF.parsers.generic.ParserBase

Bases: object

Base class for all Report Parser classes, not to be used alone

Initialize columns and identifiers

property csv_data

Get the CSV data from summary_data for all columns, for use with csv.writer

Returns

summary data as a list in order of columns. File path automatically appended to data

Return type

list

is_text_data_valid(text)

Check that all identifiers are in text

Parameters

text (str) – Output from pdf_reader.convert_pdf_to_txt

Returns

True if and only if all identifiers are found in text

Return type

bool

ScandiDos Delta4 Report Parser

Delta4 QA report parser

class IQDMPDF.parsers.delta4.Delta4Report

Bases: IQDMPDF.parsers.generic.ParserBase

Custom Delta4 report parser

Initialize Delta4Report class

property accepted_date

Get the QA accepted date

Returns

QA Accepted date from DICOM

Return type

str

property beam_count

Get the number of delivered beams in the report

Returns

The number of beams

Return type

int

property composite_tx_summary_data

Get the composite analysis data

Returns

‘norm_dose’, ‘dev’, ‘dta’, ‘gamma_index’, and ‘dose_dev’

Return type

dict

property daily_corr

Get the daily correction factor

Returns

The daily correction factor

Return type

str

property energy

Beam energy

Returns

Energy of the first reported beam

Return type

str

property gamma_distance

Get the gamma distance criteria

Returns

Gamma analysis distance criteria

Return type

str

property gamma_dose

Get the Gamma Analysis dose criteria

Returns

Gamma dose criteria

Return type

str

property gamma_pass_criteria

Get the gamma analysis pass-rate criteria

Returns

Gamma pass-rate criteria

Return type

str

property measured_date

Get the measured date

Returns

Date of QA measurement

Return type

str

property patient_id

Get the patient ID

Returns

Patient ID

Return type

str

property patient_name

Get the patient name

Returns

Patient name

Return type

str

property plan_date

Get the plan date

Returns

Plan date from DICOM

Return type

str

property plan_name

Get the plan name

Returns

Plan name from DICOM

Return type

str

property radiation_dev

Get the radiation device

Returns

Radiation device per DICOM-RT Plan

Return type

str

property summary_data

A summary of data from the QA report

Returns

Keys will match “column” elements. Values are of type str

Return type

dict

property threshold

Get the minimum dose (%) included in analysis

Returns

Minimum dose threshold

Return type

str

SNC Patient Report Parser

SNC Patient report parser

class IQDMPDF.parsers.sncpatient.SNCPatientCustom

Bases: IQDMPDF.parsers.generic.ParserBase

Custom SNCPatient report parser

Initialize SNCPatientCustom class

property angle

Angle in QA File Parameter table

Returns

Angle

Return type

str

property depth

Depth in QA File Parameter table

Returns

Depth

Return type

str

property dist_param

Distance criteria

Returns

Distance criteria for analysis

Return type

str

property dose_comparison_type

Dose comparison type based on table title

Returns

Dose comparison type (e.g., Absolute)

Return type

str

property dose_diff_param

Dose difference criteria

Returns

Dose difference criteria for analysis

Return type

str

property dose_diff_threshold

Dose Diff Threshold

Returns

Dose Difference Threshold for analysis

Return type

str

property energy

Energy in QA File Parameter table

Returns

Energy

Return type

str

property failed_points

Number of points failing analysis

Returns

Number of points/detectors not meeting analysis criteria

Return type

str

property meas_uncertainty

Measurement Uncertainty

Returns

Whether or not measurement uncertainty is turned on

Return type

str

property notes

Custom note entered by report author

Returns

Text from the Notes block

Return type

str

property pass_rate

Passing rate of points

Returns

Percentage of points/detectors meeting analysis criteria

Return type

str

property passed_points

Number of points passing analysis

Returns

Number of points/detectors meeting analysis criteria

Return type

str

property patient_id

Patient ID in QA File Parameter table

Returns

Patient ID

Return type

str

property patient_name

Patient name in QA File Parameter table

Returns

Patient name

Return type

str

property plan_date

Plan date in QA File Parameter table

Returns

Plan date

Return type

str

property qa_date

Date in top-left of the report

Returns

QA report date

Return type

str

property rotation_angle

Rotation angle

Returns

Rotation angle applied to data for analysis

Return type

str

property sdd

SDD in QA File Parameter table

Returns

SDD

Return type

str

property ssd

SSD in QA File Parameter table

Returns

SSD

Return type

str

property summary_data

A summary of data from the QA report

Returns

Keys will match “column” elements. Values are of type str

Return type

dict

property summary_type

Title of the dose comparison table

Returns

Dose comparison type (e.g., Absolute)

Return type

str

property threshold_param

Dose threshold criteria

Returns

Minimum dose threshold for analysis

Return type

str

property total_points

Total Points

Returns

Total number of points/detectors used for analysis

Return type

str

property use_global

Use Global %

Returns

Whether or not Use Global % is turned on

Return type

str

property use_van_dyk

Use VanDyk

Returns

Whether or not Van Dyk criteria is turned on

Return type

str

class IQDMPDF.parsers.sncpatient.SNCPatientReport2020

Bases: IQDMPDF.parsers.generic.GenericReport

SNCPatientReport parser for the new format released in 2020

Initialization of a SNCPatientReport class

PTW VeriSoft Report Parser

PTW VeriSoft report parser

class IQDMPDF.parsers.verisoft.VeriSoftReport

Bases: IQDMPDF.parsers.generic.ParserBase

PTW VeriSoft IMRT QA report parser

Initialize VeriSoftReport class

property abs_diff

Get all of the Absolute Difference values

Returns

‘mean’, ‘min’, ‘max’, ‘median’ Absolute Difference values, and ‘mean_units’, etc

Return type

dict

property abs_diff_max_pos

Get the max absolute dose diff position

Returns

‘x’ and ‘y’ positions of the maximum absolute dose diff value

Return type

dict

property abs_diff_min_pos

Get the min absolute dose diff position

Returns

‘x’ and ‘y’ positions of the min absolute dose diff value

Return type

dict

property calibrate_air_density

Get the Calibrate Air Density value

Returns

Calibrate Air Density from Manipulations table

Return type

str

property comment

Get the comment

Returns

Comment from Administrative Data table

Return type

str

property data_set_a

Get Data Set A file path

Returns

Data Set A file path

Return type

str

property data_set_b

Get Data Set B file path(s)

Returns

Strings after _data_set_b_index joined by

Return type

str

property date

Date printed in footer of report

Returns

Report date

Return type

str

property eval_dose_points

Evaluated Dose Points from Statistics table

Returns

Evaluated Dose Points

Return type

str

property eval_dose_points_percent

Evaluated Dose Points (%) from Statistics table

Returns

Evaluated Dose Points (%)

Return type

str

property failed_points

Failed Dose Points from Statistics table

Returns

Failed Dose Points

Return type

str

property failed_points_percent

Failed Dose Points (%) from Statistics table

Returns

Failed Dose Points (%)

Return type

str

property gamma_diff

Get all of the Gamma 2D values

Returns

Mean, min, max, median Gamma values from Gamma 2D

Return type

dict

property gamma_dist

Get the Gamma Distance to Agreement setting

Returns

DTA from Gamma 2D - Parameters

Return type

str

property gamma_dose

Get the Gamma Dose difference value

Returns

Gamma Dose Difference value from Gamma 2D - Parameters

Return type

str

property gamma_dose_info

Get the Gamma Dose difference info

Returns

Gamma Dose Difference normalization from Gamma 2D - Parameters

Return type

str

property gamma_max_pos

Get the max gamma position

Returns

‘x’ and ‘y’ positions of the maximum gamma value

Return type

dict

property gamma_min_pos

Get the min gamma position

Returns

‘x’ and ‘y’ positions of the minimum gamma value

Return type

dict

property institution

Get the institution

Returns

Institution from Administrative Data table

Return type

str

property num_dose_points

Number of Dose Points from Statistics table

Returns

Number of Dose Points

Return type

str

property pass_rate

Result from Statistics table

Returns

Dose point pass rate

Return type

str

property pass_result_color

Result color from Statistics table

Returns

Result color

Return type

str

property passed_points

Passed Dose Points from Statistics table

Returns

Passed Dose Points

Return type

str

property passed_points_percent

Passed Dose Points (%) from Statistics table

Returns

Passed Dose Points (%)

Return type

str

property passing_criteria

Passing Criteria from the Settings table

Returns

Passing criteria

Return type

str

property passing_green

Green threshold from the Settings table

Returns

Minimum pass rate for green status

Return type

str

property passing_red

Red threshold from the Settings table

Returns

Minimum pass rate for red status

Return type

str

property passing_yellow

Yellow threshold from the Settings table

Returns

Minimum pass rate for yellow status

Return type

str

property patient_id

Get the patient ID

Returns

Patient ID from Administrative Data table

Return type

str

property patient_name

Get the patient name

Returns

Patient name from Administrative Data table

Return type

str

property physicist

Get the physicist

Returns

Physicist from Administrative Data table

Return type

str

property set_zero

Get the Set Zero data

Returns

Get the Set Zero data from Manipulations table

Return type

dict

property summary_data

A summary of data from the QA report

Returns

Keys will match “column” elements. Values are of type str

Return type

dict

property threshold

Get the Gamma Dose threshold value

Returns

Gamma Dose threshold value from Gamma 2D - Parameters

Return type

str

property threshold_info

Get the Gamma Dose threshold info

Returns

Gamma Dose threshold info from Gamma 2D - Parameters

Return type

str

property version

VeriSoft version printed in footer of report

Returns

Software version

Return type

str

Utilities

Common functions for IQDM-PDF

IQDMPDF.utilities.append_files(files, dir_name, files_to_append, extension=None)

Helper function for get_files

Parameters
  • files (list) – Accumulate file paths into this list

  • dir_name (str) – The base path of the files in file_list

  • files_to_append (list) – A list of file paths to loop over and accumulate

  • extension (str, optional) – Collect file paths with only this extension (e.g., ‘.pdf’)

IQDMPDF.utilities.are_all_strings_in_text(text, list_of_strings)

Check that all strings in list_of_strings exist in text

Parameters
  • text (str) – output from IQDMPDF.pdf_reader.convert_pdf_to_txt

  • list_of_strings (list of str) – a list of strings used to identify document type

Returns

Returns true if every string in list_of_strings is found in text data

Return type

bool

IQDMPDF.utilities.bbox_to_pos(bbox, mode)

Convert a bounding box to an x-y position

Parameters
  • bbox (list) – Bounding box from pdf_reader layout object, which is a list of four floats [x0, y0, x1, y1]

  • mode (str) – Options are combinations of top/center/bottom and right/center/left, e.g., ‘top-right’, ‘center-left’. ‘center’ is assumed to be ‘center-center’

IQDMPDF.utilities.create_arg_parser()

Create an argument parser

Returns

Argument parser for command-line use of IQDM-PDF

Return type

argparse.ArgumentParser

IQDMPDF.utilities.creation_date(path_to_file)

Try to get the date that a file was created, falling back to when it was last modified if that isn’t possible. See http://stackoverflow.com/a/39501288/1709587 for explanation.

Parameters

path_to_file (str) – Path to any file

Returns

Time stamp of file

Return type

float
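
The fallback behavior described for creation_date follows a common standard-library pattern, sketched below. This is an illustration of that pattern rather than the library's source, and the function name file_creation_time is hypothetical.

```python
import os
import platform
import tempfile


def file_creation_time(path_to_file):
    """Return the creation time stamp of a file, or its last-modified time."""
    if platform.system() == "Windows":
        return os.path.getctime(path_to_file)
    stat = os.stat(path_to_file)
    try:
        # Available on macOS and some other Unix systems
        return stat.st_birthtime
    except AttributeError:
        # Linux generally does not expose a creation time; use last modification
        return stat.st_mtime


with tempfile.NamedTemporaryFile() as tmp:
    created = file_creation_time(tmp.name)
```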

IQDMPDF.utilities.get_files(init_dir, search_sub_dir=True, extension=None)

Collect paths of all files in a directory

Parameters
  • init_dir (str) – Initial directory to begin scanning

  • search_sub_dir (bool) – Recursively search through sub-directories if True

  • extension (str, optional) – Collect file paths with only this extension (e.g., ‘.pdf’)

Returns

List of file paths

Return type

list

IQDMPDF.utilities.get_relative_path(path, relative_base)

Return a partial path with the specified base

Parameters
  • path (str) – A path with relative_base as a sub-component

  • relative_base (str) – A directory within path

Returns

The path with all components prior to relative_base removed

Return type

str
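
One way to read this description is sketched below: everything before relative_base is dropped, and the result starts at relative_base. Both that reading and the error raised for a missing base are assumptions, as is the function name path_from_base. PurePosixPath is used so the example behaves the same on any platform.

```python
from pathlib import PurePosixPath


def path_from_base(path, relative_base):
    """Drop all path components that come before relative_base."""
    parts = PurePosixPath(path).parts
    if relative_base not in parts:
        raise ValueError(f"{relative_base!r} is not a component of {path!r}")
    start = parts.index(relative_base)
    return str(PurePosixPath(*parts[start:]))


relative = path_from_base("/data/qa/reports/sncpatient/report.pdf", "reports")
```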

IQDMPDF.utilities.get_sorted_indices(some_list, reverse=False)

Get sorted indices of some_list

Parameters
  • some_list (list) – Any list compatible with sorted()

  • reverse (bool) – Reverse sort if True
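
Sorted indices are useful for reordering several parallel lists (for example x, y, and text) by one of them. A minimal sketch of the idea, with a hypothetical function name and sample values:

```python
def sorted_indices(some_list, reverse=False):
    """Return the indices that would sort some_list."""
    return sorted(range(len(some_list)), key=some_list.__getitem__, reverse=reverse)


# Order text blocks from the top of the page to the bottom (largest y first)
y_values = [432.94, 445.18, 420.7]
texts = ["Distance (mm)", "Difference (%)", "Threshold (%)"]
order = sorted_indices(y_values, reverse=True)
top_to_bottom = [texts[i] for i in order]
```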

IQDMPDF.utilities.is_in_tol(value, expected_value, tolerance)

Is the provided value within expected_value +/- tolerance

Parameters
  • value (int, float) – Value of interest

  • expected_value (int, float) – Expected value

  • tolerance (int, float) – Allowed deviation from expected_value

Returns

True if value is within expected_value +/- tolerance, exclusive

Return type

bool

IQDMPDF.utilities.is_numeric(val)

Check if value is numeric (float or int)

Parameters

val (any) – Any value

Returns

Returns true if float(val) doesn’t raise a ValueError

Return type

bool

IQDMPDF.utilities.run_multiprocessing(worker, queue, processes, callback=None)

Parallel processing

Parameters
  • worker (callable) – single parameter function to be called on each item in queue

  • queue (iterable) – A list of arguments for worker

  • processes (int) – Number of processes for multiprocessing.Pool

  • callback (callable) – Optional callback function called on each progress update; accepts the str representation of a tqdm object. A final call is sent with ‘complete’

Returns

List of returns from worker

Return type

list

Unit Testing

IQDM-PDF employs unit testing to ensure that updates don’t break previous examples. It also ensures that the identifiers assigned to a report parser are sufficiently unique.

New Example PDFs

Any modifications to report parsers require an example PDF to be included in tests/test_data/examples_reports. The expected results should be added to tests/test_data/expected_report_data.py.

Expected Report Data

The variable TEST_DATA in expected_report_data.py contains expected data and paths to PDFs for all vendors. An example output from TEST_DATA[vendor][example_description]:

{
  "path": join(DIRECTORIES["DELTA4_EXAMPLES"], "UChicago", "DCAM_example_1.pdf"),
  "data": summary_data
}

Where summary_data is the output from the report parser’s property summary_data. It’s important to use IQDMPDF.paths.DIRECTORIES to ensure source code and installed versions know where the test data is.

If adding a new vendor or report template, a new unit testing class can be added to tests/test_report_parsers.py in a fashion similar to below:

class TestNewVendor(TestReportParserBase, unittest.TestCase):
    def setUp(self):
        self.do_setup_for_vendor("new_vendor")

Then just update PARSERS near the top of test_report_parsers.py with a “new_vendor” key pointing to the new report parser.

Change Log for IQDM-PDF

v0.3.0 (2021.03.14)

  • Brand new Delta4 parser using only relative positions

v0.2.9 (2021.03.11)

  • Better date parsing for Delta4

  • Address “Set1” issue for long patient names with SNCPatientCustom

  • Add report_file_creation column

v0.2.8 (2021.03.07)

  • IQDM Analytics support from GUI

v0.2.7 (2021.03.04)

  • Updates to SNCPatient2020 parser

  • Ignore parsed values that are equal to column names

  • Added analysis_columns property for IQDM Analytics support

v0.2.6 (2021.02.11)

  • New Custom SNCPatient parser using relative positions

v0.2.5 (2021.01.27)

  • PTW VeriSoft: Collect ‘Set Zero’ data

  • Use csv standard library for CSV writing

v0.2.4 (2021.01.24)

  • Support for PTW VeriSoft

v0.2.3 (2021.01.21)

  • Added optional alternates in JSON templates

  • Added optional numeric flag to make sure value is or is not numerical

  • Added optional ignored flag to ignore any returned value in this array

v0.2.2 (2021.01.16)

  • Multi-threading support
