PDF Reader Module

Read PDF files into Python objects

class IQDMPDF.pdf_reader.CustomPDFReader(file_path, laparams_kwargs=None)

Bases: object

Custom PDF Parsing module

Initialize a CustomPDFReader object

Parameters

file_path (str) – Absolute file path to the PDF to be read

convert_pdf_to_text()

Extract text and coordinates from a PDF

get_bbox_of_data(text, return_all=False, include_text=False)

Get the bounding box for a given string

Parameters
  • text (str) – Check all parsed data for this string and return the first bounding box that contains it. Intended for a string that appears only once in the document

  • return_all (bool) – If true, then return a list containing all matches, in the order pdfminer.six found them

  • include_text (bool) – If true, also return the text data

Returns

A dict with keys “page” (int) and “bbox” ([x0, y0, x1, y1]). If include_text is True, “text” (str) will contain the text data. If return_all is True, a list of these dicts is returned.

Return type

dict, list
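The search described above can be sketched in plain Python. This is not the library's code. It assumes the parsed data is a flat list of hypothetical dicts with “page”, “bbox” and “text” keys, in extraction order. It also assumes that a failed search returns None, because the documented method's no-match behaviour is not stated here.

```python
def find_bbox(items, text, return_all=False, include_text=False):
    """Hypothetical stand-in for get_bbox_of_data.

    items: list of dicts with "page" (int), "bbox" ([x0, y0, x1, y1]) and
    "text" (str), in the order the PDF text was extracted.
    """
    matches = []
    for item in items:
        if text not in item["text"]:
            continue
        match = {"page": item["page"], "bbox": item["bbox"]}
        if include_text:
            match["text"] = item["text"]
        if not return_all:
            return match  # first match only
        matches.append(match)
    # Assumption: no match (with return_all=False) yields None
    return matches if return_all else None


# Usage with made-up layout data
items = [
    {"page": 0, "bbox": [36.0, 700.0, 140.0, 712.0], "text": "Patient Name:"},
    {"page": 1, "bbox": [36.0, 650.0, 110.0, 662.0], "text": "Gamma Index"},
]
print(find_bbox(items, "Gamma"))  # {'page': 1, 'bbox': [36.0, 650.0, 110.0, 662.0]}
```

With return_all=False, the first match wins. This is why the search string should be unique in the document.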

get_block_data(page, pos, tol=10, text_cleaner=None, numeric=None, ignored=None, mode='bottom-left')

Call PDFPageParser.get_block_data on the provided page

Parameters
  • page (int) – The index of the PDF page

  • pos (tuple of int, float) – The (x, y) coordinates of the text block to be retrieved

  • tol (int, float, tuple) – Maximum distance a block’s x or y-coordinate may be from pos. If a tuple is provided, the first value is the x tolerance and the second is the y tolerance

  • text_cleaner (callable, optional) – A function called on each text element (e.g., remove leading ‘:’)

  • numeric (bool, optional) – If true, only return value if it is numeric. If false, only return value if it is not numeric. Leave as None to ignore this feature.

  • ignored (list, optional) – Optionally provide a list of strings that should be ignored. If the value of the block data is in this list, the value will become an empty string instead

  • mode (str, optional) – Options are combinations of top/center/bottom and right/center/left, e.g., ‘top-right’, ‘center-right’. ‘center’ is assumed to be ‘center-center’. Default is ‘bottom-left’.

Returns

All text data that meet the input constraints

Return type

list of str

class IQDMPDF.pdf_reader.PDFPageParser(lt_objs, page_data, page_index=0)

Bases: object

Custom PDF Page Parsing module

Initialization of PDFPageParser

Parameters
  • lt_objs (list) – A layout object from PDFPageAggregator.get_result()._objs

  • page_data (dict) – A dictionary of lists, with keys ‘x’, ‘y’, ‘text’

  • page_index (int, optional) – The index of the page

get_block_data(pos, tol, text_cleaner=None, numeric=None, ignored=None, mode='bottom-left')

Get the text block data by x,y coordinates

Parameters
  • pos (list of int, float) – The (x,y) coordinates of the text block to be retrieved

  • tol (int, float, tuple) – Maximum distance a block’s x or y-coordinate may be from pos. If a tuple is provided, the first value is the x tolerance and the second is the y tolerance

  • text_cleaner (callable, optional) – A function called on each text element (e.g., remove leading ‘:’)

  • numeric (bool, optional) – If true, only return value if it is numeric. If false, only return value if it is not numeric. Leave as None to ignore this feature.

  • ignored (list, optional) – Optionally provide a list of strings that should be ignored. If the value of the block data is in this list, the value will become an empty string instead

  • mode (str, optional) – Options are combinations of top/center/bottom and right/center/left, e.g., ‘top-right’, ‘center-right’. ‘center’ is assumed to be ‘center-center’. Default is ‘bottom-left’.

Returns

All text data that meet the input constraints

Return type

list of str
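The filtering rules listed above can be combined into the short sketch below. get_block_data_sketch is a hypothetical name, not the library's method. The sketch assumes the following:

- page_data already holds one anchor point per block for the chosen mode.
- A distance equal to the tolerance still counts as a match.
- Blocks that fail the numeric check are skipped rather than blanked.

The documentation above does not settle any of these details.

```python
def is_numeric(val):
    """True if float(val) succeeds."""
    try:
        float(val)
    except ValueError:
        return False
    return True


def get_block_data_sketch(page_data, pos, tol, text_cleaner=None,
                          numeric=None, ignored=None):
    """Hypothetical sketch of the block filtering rules described above.

    page_data: dict of parallel lists with keys "x", "y", "text". The x/y
    values are assumed to already be the anchor point for the chosen mode.
    """
    # A scalar tol applies to both axes; a pair is (x_tol, y_tol)
    x_tol, y_tol = tol if isinstance(tol, (tuple, list)) else (tol, tol)
    results = []
    for x, y, text in zip(page_data["x"], page_data["y"], page_data["text"]):
        # Assumption: a distance equal to the tolerance still counts as a match
        if abs(x - pos[0]) > x_tol or abs(y - pos[1]) > y_tol:
            continue
        if text_cleaner is not None:
            text = text_cleaner(text)
        # Assumption: blocks failing the numeric check are skipped
        if numeric is not None and is_numeric(text) != numeric:
            continue
        if ignored and text in ignored:
            text = ""
        results.append(text)
    return results


# Usage with made-up page data
page = {
    "x": [72.0, 300.0, 300.5],
    "y": [700.0, 700.2, 500.0],
    "text": ["Pass rate:", ": 98.5", "Notes"],
}
print(get_block_data_sketch(page, (300, 700), tol=5,
                            text_cleaner=lambda s: s.strip(": ")))  # ['98.5']
```

Widening only the x tolerance, for example tol=(300, 5), also picks up the label on the same row. This shows why the tuple form is useful for table-like layouts.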

parse_obj(lt_objs)

Extract x, y, and text data from layout objects

Parameters

lt_objs (list) – A layout object from PDFPageAggregator.get_result()._objs

sort_all_data(sort_key, reverse=False)

Sort all parsed data by sort_key

Parameters
  • sort_key (str) – Either ‘x’ or ‘y’

  • reverse (bool) – Passed to the standard library sorted() function

sort_all_data_by_y()

Sort parsed data by y-coordinate

sub_sort_all_data_by_x()

Sort each row of data by x-coordinate, keeping y order
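The three sorting methods above can be sketched on a dict of parallel “x”/“y”/“text” lists. The helper names are hypothetical. The sketch makes two further assumptions, since the documentation specifies neither point:

- A "row" is a run of identical y values.
- A top-to-bottom order needs reverse=True, because PDF y-coordinates grow upward.

```python
def reorder(data, order):
    """Apply one index order to every parallel list in data."""
    return {key: [values[i] for i in order] for key, values in data.items()}


def sort_all_data(data, sort_key, reverse=False):
    """Sort all parallel lists by the values stored under sort_key."""
    keys = data[sort_key]
    order = sorted(range(len(keys)), key=keys.__getitem__, reverse=reverse)
    return reorder(data, order)


def sub_sort_by_x(data):
    """Sort each row (a run of identical y values) by x, keeping row order."""
    ys, order, start = data["y"], [], 0
    while start < len(ys):
        end = start
        while end < len(ys) and ys[end] == ys[start]:
            end += 1
        order.extend(sorted(range(start, end), key=lambda i: data["x"][i]))
        start = end
    return reorder(data, order)


# Usage: PDF y grows upward, so reverse=True gives top-to-bottom rows
data = {"x": [200, 50, 120], "y": [700, 700, 650], "text": ["b", "a", "c"]}
rows = sub_sort_by_x(sort_all_data(data, "y", reverse=True))
print(rows["text"])  # ['a', 'b', 'c']
```

Python's sorted() stays stable even with reverse=True. Items that share a y value therefore keep their relative order until the x sub-sort runs.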

IQDMPDF.pdf_reader.convert_pdf_to_txt(path)

Extract text from a PDF

Parameters

path (str) – Absolute file path to the PDF to be read

Returns

The text content of the PDF

Return type

str

File Processor

Process IMRT QA file(s) into CSV file(s)

IQDMPDF.file_processor.print_callback(msg)

Simple print callback for process_files

Parameters

msg (dict) – The message sent from process_files

IQDMPDF.file_processor.process_file(file_path, output_file, output_dir=None)

Process a PDF file with a parser class and write the data to CSV

Parameters
  • file_path (str) – PDF file to be processed

  • output_file (str) – The report type will be prepended to this value to form the output file name

  • output_dir (str, optional) – Save results to this directory, default is local directory

IQDMPDF.file_processor.process_file_worker(file_path)

Multiprocessing worker function

Parameters

file_path (str) – PDF file to be passed to ReportParser

Returns

{“data”: ReportParser.csv_data, “report_type”: ReportParser.report_type, “columns”: ReportParser.columns}

Return type

dict

IQDMPDF.file_processor.process_files(init_directory, ignore_extension=False, output_file=None, output_dir=None, no_recursive_search=False, callback=None, raise_errors=False, processes=1)

Process all PDF files with parser classes and write the data to CSV

Parameters
  • init_directory (str) – Initial scanning directory

  • ignore_extension (bool, optional) – Set to True to catch PDF files that are missing the .pdf extension

  • output_file (str, optional) – The report type will be prepended to this value to form each output file name

  • output_dir (str, optional) – Save results to this directory, default is local directory

  • no_recursive_search (bool, optional) – Ignore sub-directories if True

  • callback (callable) – A function to be called before each process_file call. It receives a dict with the keys “label” and “gauge”.

  • raise_errors (bool) – Set to True to allow errors to be raised (useful for debugging)

  • processes (int) – Number of parallel processes allowed
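Files of different report types have different columns. The results of process_file_worker must therefore be split by report type before they are written.

Below is a hedged sketch of that grouping step. Two details in it are illustrative guesses, not documented behaviour:

- the "<report_type>_<output_file>" rule in the hypothetical output_name;
- the assumption that failed files come back as None.

```python
from collections import defaultdict


def group_by_report_type(results):
    """Group worker outputs so each report type can be written to its own CSV.

    results: iterable of dicts shaped like process_file_worker's return value,
    {"data": ..., "report_type": str, "columns": list}. Falsy entries
    (assumed to represent files that failed to parse) are skipped.
    """
    grouped = defaultdict(lambda: {"columns": None, "rows": []})
    for result in results:
        if not result:
            continue
        group = grouped[result["report_type"]]
        group["columns"] = result["columns"]
        group["rows"].append(result["data"])
    return dict(grouped)


def output_name(report_type, output_file):
    """Hypothetical naming rule: prepend the report type to output_file."""
    return f"{report_type}_{output_file}"


# Usage with made-up worker results
results = [
    {"data": ["123", "98.5"], "report_type": "ExampleA", "columns": ["ID", "Pass"]},
    None,
    {"data": ["456", "99.1"], "report_type": "ExampleA", "columns": ["ID", "Pass"]},
    {"data": ["789", "97.0"], "report_type": "ExampleB", "columns": ["ID", "Pass"]},
]
for report_type, group in group_by_report_type(results).items():
    print(output_name(report_type, "results.csv"), len(group["rows"]))
```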

IQDMPDF.file_processor.validate_kwargs(kwargs, add_print_callback=True)

Process kwargs from main for process_files

Parameters
  • kwargs (dict) – Keyword arguments for main. See main.create_arg_parser for valid arguments

  • add_print_callback (bool) – If True, add a simple print function to be called at the start of each process_file call

Returns

A dict containing only the keywords applicable to process_files, or an empty dict if “init_directory” is missing (for example, when “print_version” is True and no directory is given)

Return type

dict
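One way to keep only the arguments that process_files accepts is to read its signature with inspect.signature. The sketch below uses a stub with the documented signature and a hypothetical filter_kwargs. It substitutes the built-in print for the package's print_callback, so it is not the actual validate_kwargs.

```python
import inspect


def process_files(init_directory, ignore_extension=False, output_file=None,
                  output_dir=None, no_recursive_search=False, callback=None,
                  raise_errors=False, processes=1):
    """Stub with the documented signature; the body is omitted here."""


def filter_kwargs(kwargs, add_print_callback=True, func=process_files):
    """Keep only the keyword arguments that func accepts.

    Returns {} when "init_directory" is missing, mirroring the rule above.
    """
    if not kwargs.get("init_directory"):
        return {}
    accepted = inspect.signature(func).parameters
    filtered = {key: value for key, value in kwargs.items() if key in accepted}
    if add_print_callback:
        filtered["callback"] = print  # stand-in for print_callback
    return filtered


# Usage with made-up command-line values
cli_args = {"init_directory": "qa_reports", "processes": 4, "print_version": False}
print(filter_kwargs(cli_args, add_print_callback=False))
```

Deriving the allowed keys from the signature keeps the filter in sync if process_files gains new parameters.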

IQDMPDF.file_processor.write_csv(file_path, rows, mode='w', newline='')

Create a csv.writer and call writerows(rows)

Parameters
  • file_path (str) – Path to the file

  • rows (list, iterable) – Items to be written to the file (input for csv.writer.writerows)

  • mode (str) – Optional string that specifies the mode in which the file is opened

  • newline (str) – Controls how universal newlines mode works. It can be None, ‘’, ‘\n’, ‘\r’, and ‘\r\n’
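A minimal sketch consistent with this description, using only the standard library. It is not the package's source. The default newline='' follows the csv module documentation's advice for files passed to csv.writer. Passing mode='a' appends rows to an existing file.

```python
import csv
import os
import tempfile


def write_csv(file_path, rows, mode="w", newline=""):
    """Open file_path and write rows with csv.writer.

    newline="" lets the csv module control line endings, as the csv
    documentation recommends.
    """
    with open(file_path, mode, newline=newline) as fp:
        csv.writer(fp).writerows(rows)


# Usage: write a header row, then append a data row
out_path = os.path.join(tempfile.mkdtemp(), "example.csv")
write_csv(out_path, [["patient_id", "pass_rate"], ["123", "98.5"]])
write_csv(out_path, [["456", "99.1"]], mode="a")
```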

Unified Report Parser

Unified IMRT QA report parser

class IQDMPDF.parsers.parser.ReportParser(file_path)

Bases: object

Determines which Report class to use, then processes the data.

Initialization class for ReportParser

Parameters

file_path (str) – File path pointing to an IMRT QA report

property columns

Get column headers for the CSV

Returns

Report columns + “report_file_creation” + “report_file_path”

Return type

list

property csv_data

Get a csv string from the selected ReportParser

Returns

CSV string of the report data, in the order of columns (including “report_file_creation” and “report_file_path”)

Return type

str

get_report()

Determine the report class, then return an instance of it with the data processed

Returns

Searches for a Report class with matching identifiers, processes the file, and returns the resulting Report class instance

Return type

ParserBase subclass

property report_type

Get report type of the selected ReportParser

Returns

Get ReportParser.report_type

Return type

str
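The class selection in ReportParser can be pictured as a first-match search. Each candidate parser lists identifiers, and the first class whose identifiers all occur in the PDF text is chosen. This is a hedged sketch: the classes, identifiers and select_report_class are hypothetical, not the package's parsers.

```python
class ExampleReportA:
    """Hypothetical parser; its identifiers are invented for illustration."""
    report_type = "ExampleA"
    identifiers = ["Example Vendor A", "Dose Comparison"]


class ExampleReportB:
    """Another hypothetical parser."""
    report_type = "ExampleB"
    identifiers = ["Example Vendor B"]


def select_report_class(text, candidates=(ExampleReportA, ExampleReportB)):
    """Return the first class whose identifiers all occur in text, else None."""
    for report_class in candidates:
        if all(identifier in text for identifier in report_class.identifiers):
            return report_class
    return None


# Usage: text would normally come from pdf_reader.convert_pdf_to_txt
report_class = select_report_class("Example Vendor B\nGamma summary")
print(report_class.report_type if report_class else "unsupported report")
```

Because the first match wins, identifier lists should be specific enough that two report formats cannot both match the same text.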

Generic Report Parser

Generic IMRT QA report parser

class IQDMPDF.parsers.generic.GenericReport(json_file_path, text_cleaner=None)

Bases: IQDMPDF.parsers.generic.ParserBase

Generic IMRT QA PDF report parser based on page, x, y values

Initialization of a GenericReport class

Parameters
  • json_file_path (str) – File path to a JSON file describing the PDF report. It should contain these keys (type): report_type (str), identifiers (list of str), and data (list). Each data element should have the format {‘column’: str, ‘page’: int, ‘pos’: [float, float]}. Optionally, an element can also supply ‘tol’, which is either an integer or a list of integers (i.e., [x_tol, y_tol]). Specifying ‘numeric’ with a boolean value ensures the value is (or is not) numeric; if the condition is not met, an empty string is returned. The JSON object can also have “alternates”, an array of data-like items that are checked in turn until a value for a column is found. “ignored” is another option: if a returned value is in this array, an empty string is returned instead. The value of “column” is automatically added to the “ignored” array.

  • text_cleaner (callable, optional) – A function called on each text element (e.g., remove leading ‘:’)

property summary_data

A summary of data from the QA report

Returns

Keys will match “column” elements from the JSON file. Values are of type str

Return type

dict
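A template following the structure described for json_file_path can be built as a Python dict and saved with json.dumps. Everything in the example below is invented, including the report type, identifiers, column names and coordinates.

The example shows only the required keys plus ‘tol’ and ‘numeric’. “alternates” and “ignored” are left out because the description does not pin down where they sit. check_template is a hypothetical helper that verifies only the documented required keys.

```python
import json

# Hypothetical template: the report type, identifiers, column names and
# coordinates are invented to illustrate the structure only.
template = {
    "report_type": "ExampleVendor",
    "identifiers": ["Example Vendor", "QA Summary"],
    "data": [
        {"column": "Patient Name", "page": 0, "pos": [72.0, 700.0]},
        {"column": "Pass Rate", "page": 0, "pos": [300.0, 520.0],
         "tol": [20, 5], "numeric": True},
        {"column": "Notes", "page": 1, "pos": [72.0, 400.0], "tol": 10},
    ],
}


def check_template(template):
    """Return a list of problems with the documented required keys (empty if none)."""
    problems = [f"missing key: {key}"
                for key in ("report_type", "identifiers", "data")
                if key not in template]
    for i, item in enumerate(template.get("data", [])):
        problems += [f"data[{i}] missing {key}"
                     for key in ("column", "page", "pos") if key not in item]
        tol = item.get("tol", 0)
        if isinstance(tol, list) and len(tol) != 2:
            problems.append(f"data[{i}] tol must be [x_tol, y_tol]")
    return problems


print(check_template(template))  # []
print(json.dumps(template, indent=2))  # text to save as the JSON file
```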

class IQDMPDF.parsers.generic.ParserBase

Bases: object

Base class for all Report Parser classes, not to be used alone

Initialize columns and identifiers

property csv_data

Get the summary_data values for all columns, formatted for csv.writer

Returns

Summary data as a list in the order of columns, with the file path automatically appended

Return type

list

is_text_data_valid(text)

Check that all identifiers are in text

Parameters

text (str) – Output from pdf_reader.convert_pdf_to_txt

Returns

True if and only if all identifiers are found in text

Return type

bool
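The contract a parser class follows can be sketched with a minimal base class and one subclass. MinimalParserBase, ExampleReport and the file_path attribute are hypothetical. The real ParserBase may store these values differently.

The sketch shows three documented behaviours:

- csv_data lists the values in column order and appends the file path.
- is_text_data_valid requires every identifier to appear in the text.
- Subclasses supply summary_data.

```python
class MinimalParserBase:
    """Hypothetical stand-in for ParserBase, following the description above."""

    def __init__(self):
        self.columns = []       # CSV column order
        self.identifiers = []   # strings that must all appear in the PDF text
        self.file_path = None   # hypothetical attribute holding the PDF path

    @property
    def summary_data(self):
        return {}  # subclasses supply the parsed values

    @property
    def csv_data(self):
        # Values in column order, with the file path appended at the end
        row = [self.summary_data.get(column, "") for column in self.columns]
        return row + [self.file_path]

    def is_text_data_valid(self, text):
        return all(identifier in text for identifier in self.identifiers)


class ExampleReport(MinimalParserBase):
    """Hypothetical subclass with hard-coded values instead of PDF parsing."""

    def __init__(self, file_path):
        super().__init__()
        self.file_path = file_path
        self.columns = ["Patient ID", "Pass Rate"]
        self.identifiers = ["Example Vendor"]

    @property
    def summary_data(self):
        return {"Patient ID": "12345", "Pass Rate": "98.5"}


report = ExampleReport("qa/report.pdf")
print(report.csv_data)  # ['12345', '98.5', 'qa/report.pdf']
```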

ScandiDos Delta4 Report Parser

Delta4 QA report parser

class IQDMPDF.parsers.delta4.Delta4Report

Bases: IQDMPDF.parsers.generic.ParserBase

Custom Delta4 report parser

Initialize Delta4Report class

property accepted_date

Get the QA accepted date

Returns

QA Accepted date from DICOM

Return type

str

property beam_count

Get the number of delivered beams in the report

Returns

The number of beams

Return type

int

property composite_tx_summary_data

Get the composite analysis data

Returns

A dict with keys ‘norm_dose’, ‘dev’, ‘dta’, ‘gamma_index’, and ‘dose_dev’

Return type

dict

property daily_corr

Get the daily correction factor

Returns

The daily correction factor

Return type

str

property energy

Beam energy

Returns

Energy of the first reported beam

Return type

str

property gamma_distance

Get the gamma distance criteria

Returns

Gamma analysis distance criteria

Return type

str

property gamma_dose

Get the Gamma Analysis dose criteria

Returns

Gamma dose criteria

Return type

str

property gamma_pass_criteria

Get the gamma analysis pass-rate criteria

Returns

Gamma pass-rate criteria

Return type

str

property measured_date

Get the measured date

Returns

Date of QA measurement

Return type

str

property patient_id

Get the patient ID

Returns

Patient ID

Return type

str

property patient_name

Get the patient name

Returns

Patient name

Return type

str

property plan_date

Get the plan date

Returns

Plan date from DICOM

Return type

str

property plan_name

Get the plan name

Returns

Plan name from DICOM

Return type

str

property radiation_dev

Get the radiation device

Returns

Radiation device per DICOM-RT Plan

Return type

str

property summary_data

A summary of data from the QA report

Returns

Keys will match “column” elements. Values are of type str

Return type

dict

property threshold

Get the minimum dose (%) included in analysis

Returns

Minimum dose threshold

Return type

str

SNC Patient Report Parser

SNC Patient report parser

class IQDMPDF.parsers.sncpatient.SNCPatientCustom

Bases: IQDMPDF.parsers.generic.ParserBase

Custom SNCPatient report parser

Initialize SNCPatientCustom class

property angle

Angle in QA File Parameter table

Returns

Angle

Return type

str

property depth

Depth in QA File Parameter table

Returns

Depth

Return type

str

property dist_param

Distance criteria

Returns

Distance criteria for analysis

Return type

str

property dose_comparison_type

Dose comparison type based on table title

Returns

Dose comparison type (e.g., Absolute)

Return type

str

property dose_diff_param

Dose difference criteria

Returns

Dose difference criteria for analysis

Return type

str

property dose_diff_threshold

Dose Diff Threshold

Returns

Dose Difference Threshold for analysis

Return type

str

property energy

Energy in QA File Parameter table

Returns

Energy

Return type

str

property failed_points

Number of points failing analysis

Returns

Number of points/detectors not meeting analysis criteria

Return type

str

property meas_uncertainty

Measurement Uncertainty

Returns

Whether or not measurement uncertainty is turned on

Return type

str

property notes

Custom note entered by report author

Returns

Text from the Notes block

Return type

str

property pass_rate

Passing rate of points

Returns

Percentage of points/detectors meeting analysis criteria

Return type

str

property passed_points

Number of points passing analysis

Returns

Number of points/detectors meeting analysis criteria

Return type

str

property patient_id

Patient ID in QA File Parameter table

Returns

Patient ID

Return type

str

property patient_name

Patient name in QA File Parameter table

Returns

Patient name

Return type

str

property plan_date

Plan date in QA File Parameter table

Returns

Plan date

Return type

str

property qa_date

Date in top-left of the report

Returns

QA report date

Return type

str

property rotation_angle

Rotation angle

Returns

Rotation angle applied to data for analysis

Return type

str

property sdd

SDD in QA File Parameter table

Returns

SDD

Return type

str

property ssd

SSD in QA File Parameter table

Returns

SSD

Return type

str

property summary_data

A summary of data from the QA report

Returns

Keys will match “column” elements. Values are of type str

Return type

dict

property summary_type

Title of the dose comparison table

Returns

Dose comparison type (e.g., Absolute)

Return type

str

property threshold_param

Dose threshold criteria

Returns

Minimum dose threshold for analysis

Return type

str

property total_points

Total Points

Returns

Total number of points/detectors used for analysis

Return type

str

property use_global

Use Global %

Returns

Whether or not Use Global % is turned on

Return type

str

property use_van_dyk

Use VanDyk

Returns

Whether or not Van Dyk criteria is turned on

Return type

str

class IQDMPDF.parsers.sncpatient.SNCPatientReport2020

Bases: IQDMPDF.parsers.generic.GenericReport

SNCPatientReport parser for the new format released in 2020

Initialization of an SNCPatientReport2020 class

PTW VeriSoft Report Parser

PTW VeriSoft report parser

class IQDMPDF.parsers.verisoft.VeriSoftReport

Bases: IQDMPDF.parsers.generic.ParserBase

PTW VeriSoft IMRT QA report parser

Initialize VeriSoftReport class

property abs_diff

Get all of the Absolute Difference values

Returns

‘mean’, ‘min’, ‘max’, ‘median’ Absolute Difference values, and ‘mean_units’, etc

Return type

dict

property abs_diff_max_pos

Get the max absolute dose diff position

Returns

‘x’ and ‘y’ positions of the maximum absolute dose diff value

Return type

dict

property abs_diff_min_pos

Get the min absolute dose diff position

Returns

‘x’ and ‘y’ positions of the min absolute dose diff value

Return type

dict

property calibrate_air_density

Get the Calibrate Air Density value

Returns

Calibrate Air Density from Manipulations table

Return type

str

property comment

Get the comment

Returns

Comment from Administrative Data table

Return type

str

property data_set_a

Get Data Set A file path

Returns

Data Set A file path

Return type

str

property data_set_b

Get Data Set B file path(s)

Returns

Strings after _data_set_b_index, joined into a single string

Return type

str

property date

Date printed in footer of report

Returns

Report date

Return type

str

property eval_dose_points

Evaluated Dose Points from Statistics table

Returns

Evaluated Dose Points

Return type

str

property eval_dose_points_percent

Evaluated Dose Points (%) from Statistics table

Returns

Evaluated Dose Points (%)

Return type

str

property failed_points

Failed Dose Points from Statistics table

Returns

Failed Dose Points

Return type

str

property failed_points_percent

Failed Dose Points (%) from Statistics table

Returns

Failed Dose Points (%)

Return type

str

property gamma_diff

Get all of the Gamma 2D values

Returns

Mean, min, max, median Gamma values from Gamma 2D

Return type

dict

property gamma_dist

Get the Gamma Distance to Agreement setting

Returns

DTA from Gamma 2D - Parameters

Return type

str

property gamma_dose

Get the Gamma Dose difference value

Returns

Gamma Dose Difference value from Gamma 2D - Parameters

Return type

str

property gamma_dose_info

Get the Gamma Dose difference info

Returns

Gamma Dose Difference normalization from Gamma 2D - Parameters

Return type

str

property gamma_max_pos

Get the max gamma position

Returns

‘x’ and ‘y’ positions of the maximum gamma value

Return type

dict

property gamma_min_pos

Get the min gamma position

Returns

‘x’ and ‘y’ positions of the minimum gamma value

Return type

dict

property institution

Get the institution

Returns

Institution from Administrative Data table

Return type

str

property num_dose_points

Number of Dose Points from Statistics table

Returns

Number of Dose Points

Return type

str

property pass_rate

Result from Statistics table

Returns

Dose point pass rate

Return type

str

property pass_result_color

Result color from Statistics table

Returns

Result color

Return type

str

property passed_points

Passed Dose Points from Statistics table

Returns

Passed Dose Points

Return type

str

property passed_points_percent

Passed Dose Points (%) from Statistics table

Returns

Passed Dose Points (%)

Return type

str

property passing_criteria

Passing Criteria from the Settings table

Returns

Passing criteria

Return type

str

property passing_green

Green threshold from the Settings table

Returns

Minimum pass rate for green status

Return type

str

property passing_red

Red threshold from the Settings table

Returns

Minimum pass rate for red status

Return type

str

property passing_yellow

Yellow threshold from the Settings table

Returns

Minimum pass rate for yellow status

Return type

str

property patient_id

Get the patient ID

Returns

Patient ID from Administrative Data table

Return type

str

property patient_name

Get the patient name

Returns

Patient name from Administrative Data table

Return type

str

property physicist

Get the physicist

Returns

Physicist from Administrative Data table

Return type

str

property set_zero

Get the Set Zero data

Returns

Get the Set Zero data from Manipulations table

Return type

dict

property summary_data

A summary of data from the QA report

Returns

Keys will match “column” elements. Values are of type str

Return type

dict

property threshold

Get the Gamma Dose threshold value

Returns

Gamma Dose threshold value from Gamma 2D - Parameters

Return type

str

property threshold_info

Get the Gamma Dose threshold info

Returns

Gamma Dose threshold info from Gamma 2D - Parameters

Return type

str

property version

VeriSoft version printed in footer of report

Returns

Software version

Return type

str

Utilities

Common functions for IQDM-PDF

IQDMPDF.utilities.append_files(files, dir_name, files_to_append, extension=None)

Helper function for get_files

Parameters
  • files (list) – Accumulate file paths into this list

  • dir_name (str) – The base path of the files in files_to_append

  • files_to_append (list) – A list of file paths to accumulate

  • extension (str, optional) – Collect file paths with only this extension (e.g., ‘.pdf’)

IQDMPDF.utilities.are_all_strings_in_text(text, list_of_strings)

Check that all strings in list_of_strings exist in text

Parameters
  • text (str) – Output from IQDMPDF.pdf_reader.convert_pdf_to_txt

  • list_of_strings (list of str) – a list of strings used to identify document type

Returns

Returns true if every string in list_of_strings is found in text data

Return type

bool

IQDMPDF.utilities.bbox_to_pos(bbox, mode)

Convert a bounding box to an x-y position

Parameters
  • bbox (list) – Bounding box from pdf_reader layout object, which is a list of four floats [x0, y0, x1, y1]

  • mode (str) – Options are combinations of top/center/bottom and right/center/left, e.g., ‘top-right’, ‘center-left’. ‘center’ is assumed to be ‘center-center’
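The mode names map naturally onto the corners, edge midpoints and centre of the box. The sketch below assumes the following:

- pdfminer-style coordinates, where y0 is the bottom edge and y1 the top edge.
- The result is an (x, y) tuple; the documentation gives no return type.

It illustrates the mapping and is not the package's implementation.

```python
def bbox_to_pos(bbox, mode):
    """Map a pdfminer-style bbox [x0, y0, x1, y1] to a single (x, y) point.

    Assumes PDF coordinates: y0 is the bottom edge and y1 the top edge.
    """
    x0, y0, x1, y1 = bbox
    if mode == "center":
        mode = "center-center"
    vertical, horizontal = mode.split("-")
    y = {"bottom": y0, "center": (y0 + y1) / 2, "top": y1}[vertical]
    x = {"left": x0, "center": (x0 + x1) / 2, "right": x1}[horizontal]
    return x, y


print(bbox_to_pos([10, 20, 30, 40], "top-right"))  # (30, 40)
print(bbox_to_pos([10, 20, 30, 40], "center"))     # (20.0, 30.0)
```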

IQDMPDF.utilities.create_arg_parser()

Create an argument parser

Returns

Argument parser for command-line use of IQDM-PDF

Return type

argparse.ArgumentParser

IQDMPDF.utilities.creation_date(path_to_file)

Try to get the date that a file was created, falling back to when it was last modified if that isn’t possible. See http://stackoverflow.com/a/39501288/1709587 for explanation.

Parameters

path_to_file (str) – Path to any file

Returns

Time stamp of file

Return type

float
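The fallback strategy from the linked answer can be written with standard-library calls only. This is a sketch of the approach, not necessarily the package's exact code. It works as follows:

- On Windows, os.path.getctime reports creation time.
- Elsewhere, it tries st_birthtime (present on macOS and other BSD-derived systems).
- Failing that, it falls back to the modification time st_mtime.

```python
import os
import platform
import tempfile


def creation_date(path_to_file):
    """Creation time where the platform exposes it, else last-modified time."""
    if platform.system() == "Windows":
        return os.path.getctime(path_to_file)
    stat = os.stat(path_to_file)
    try:
        return stat.st_birthtime  # available on macOS and other BSDs
    except AttributeError:
        # Typically Linux: no portable creation time, fall back to mtime
        return stat.st_mtime


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example")
print(creation_date(tmp.name))
```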

IQDMPDF.utilities.get_files(init_dir, search_sub_dir=True, extension=None)

Collect paths of all files in a directory

Parameters
  • init_dir (str) – Initial directory to begin scanning

  • search_sub_dir (bool) – Recursively search through sub-directories if True

  • extension (str, optional) – Collect file paths with only this extension (e.g., ‘.pdf’)

Returns

List of file paths

Return type

list
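A directory scan like get_files can be sketched with os.walk, which yields init_dir first. Stopping after that first iteration therefore gives a non-recursive search. This sketch folds the append_files step into the loop. Treating the extension match as case-insensitive is an assumption, made so that a file named “c.PDF” is still collected.

```python
import os
import tempfile


def get_files(init_dir, search_sub_dir=True, extension=None):
    """Collect file paths under init_dir, optionally filtered by extension."""
    files = []
    for dir_name, _sub_dirs, file_names in os.walk(init_dir):
        for file_name in file_names:
            # Assumption: the extension match is case-insensitive
            if extension is None or file_name.lower().endswith(extension.lower()):
                files.append(os.path.join(dir_name, file_name))
        if not search_sub_dir:
            break  # os.walk yields init_dir first, so stop after it
    return files


# Usage: build a small throwaway directory tree
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for rel_path in ("a.pdf", "b.txt", os.path.join("sub", "c.PDF")):
    open(os.path.join(root, rel_path), "w").close()
print(sorted(os.path.basename(p) for p in get_files(root, extension=".pdf")))
```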

IQDMPDF.utilities.get_relative_path(path, relative_base)

Return a partial path with the specified base

Parameters
  • path (str) – A path with relative_base as a sub-component

  • relative_base (str) – A directory within path

Returns

The path with all components prior to relative_base removed

Return type

str
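A pathlib-based sketch of the behaviour described above. It reads “all components prior to relative_base removed” as keeping relative_base itself. It also uses the first matching component if the name occurs more than once. Both choices are interpretations, not confirmed library behaviour.

```python
from pathlib import Path


def get_relative_path(path, relative_base):
    """Drop every component before relative_base; relative_base itself is kept."""
    parts = Path(path).parts
    index = parts.index(relative_base)  # raises ValueError if absent
    return str(Path(*parts[index:]))


print(get_relative_path("/data/qa/reports/2021/plan.pdf", "reports"))
```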

IQDMPDF.utilities.get_sorted_indices(some_list, reverse=False)

Get sorted indices of some_list

Parameters
  • some_list (list) – Any list compatible with sorted()

  • reverse (bool) – Reverse sort if True
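The entry above has no Returns section. The name suggests an "argsort": the list of indices that would put some_list in sorted order. A one-line sketch under that assumption:

```python
def get_sorted_indices(some_list, reverse=False):
    """Return the indices that would sort some_list (an argsort)."""
    return sorted(range(len(some_list)), key=some_list.__getitem__, reverse=reverse)


values = [30, 10, 20]
order = get_sorted_indices(values)
print(order)                       # [1, 2, 0]
print([values[i] for i in order])  # [10, 20, 30]
```

Applying the same index list to several parallel lists keeps them aligned, which is what sorting the x, y and text data together requires.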

IQDMPDF.utilities.is_in_tol(value, expected_value, tolerance)

Check whether the provided value is within expected_value +/- tolerance

Parameters
  • value (int, float) – Value of interest

  • expected_value (int, float) – Expected value

  • tolerance (int, float) – Allowed deviation from expected_value

Returns

True if value is within expected_value +/- tolerance, exclusive

Return type

bool
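"Exclusive" means a value exactly at expected_value ± tolerance is rejected. A sketch of that check as a chained comparison:

```python
def is_in_tol(value, expected_value, tolerance):
    """True if value is strictly inside expected_value +/- tolerance."""
    return expected_value - tolerance < value < expected_value + tolerance


print(is_in_tol(10.5, 10, 1))  # True
print(is_in_tol(11, 10, 1))    # False: the boundary is excluded
```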

IQDMPDF.utilities.is_numeric(val)

Check if value is numeric (float or int)

Parameters

val (any) – Any value

Returns

Returns true if float(val) doesn’t raise a ValueError

Return type

bool
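The documented rule is: True unless float(val) raises a ValueError. A sketch that follows it literally:

```python
def is_numeric(val):
    """True if float(val) succeeds; only ValueError is treated as 'not numeric'."""
    try:
        float(val)
    except ValueError:
        return False
    return True


for candidate in ("98.5", " 1e-3 ", "98.5%", "nan", 3):
    print(repr(candidate), is_numeric(candidate))
```

Because float() accepts surrounding whitespace, exponent notation, and the strings "nan" and "inf", these all count as numeric. A value such as None makes float() raise TypeError rather than ValueError. A check that follows the documented rule literally lets that exception propagate, so callers that may pass None would need to catch TypeError too.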

IQDMPDF.utilities.run_multiprocessing(worker, queue, processes, callback=None)

Parallel processing

Parameters
  • worker (callable) – Single-parameter function to be called on each item in queue

  • queue (iterable) – A list of arguments for worker

  • processes (int) – Number of processes for multiprocessing.Pool

  • callback (callable) – Optional callback function called on each progress update; it accepts the string representation of a tqdm object. The final call is sent with ‘complete’

Returns

List of returns from worker

Return type

list
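A sketch of the same idea with multiprocessing.Pool. Pool.imap returns results in queue order, so the returned list lines up with the inputs. To stay dependency-free, this version swaps the documented tqdm progress string for a plain "done/total" string. It still sends ‘complete’ at the end.

Keep these points in mind when using it:

- The worker must be picklable, such as a module-level function or a built-in like abs.
- On platforms that use the "spawn" start method, calls should sit under an `if __name__ == "__main__":` guard.

```python
from multiprocessing import Pool


def run_multiprocessing(worker, queue, processes, callback=None):
    """Apply worker to every item in queue with a process pool, preserving order.

    The documented version reports progress through tqdm; this sketch sends a
    plain "done/total" string instead, then "complete" at the end.
    """
    queue = list(queue)
    results = []
    with Pool(processes=processes) as pool:
        for done, result in enumerate(pool.imap(worker, queue), start=1):
            results.append(result)
            if callback is not None:
                callback(f"{done}/{len(queue)}")
    if callback is not None:
        callback("complete")
    return results


if __name__ == "__main__":
    progress = []
    print(run_multiprocessing(abs, [-1, -2, 3], processes=2, callback=progress.append))
    print(progress)
```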