PDF Reader Module
Read PDF files into Python objects
-
class
IQDMPDF.pdf_reader.
CustomPDFReader
(file_path, laparams_kwargs=None)[source]¶ Bases:
object
Custom PDF Parsing module
Initialize a CustomPDFReader object
- Parameters
file_path (str) – Absolute file path to the PDF to be read
-
get_bbox_of_data
(text, return_all=False, include_text=False)[source]¶ Get the bounding box for a given string
- Parameters
text (str) – Check all parsed data for this string. Return the first bounding box that contains this text. Intended for searching for a unique string
return_all (bool) – If true, then return a list containing all matches, in the order pdfminer.six found them
include_text (bool) – If true, also return the text data
- Returns
A dict with “page”->int and “bbox”->[x0, y0, x1, y1]. If include_text is true, “text”->str will contain the text data. If return_all is true, a list of these dict objects is returned.
- Return type
dict, list
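The return shapes described above can be illustrated with a small self-contained sketch. It does not call IQDMPDF; `find_bbox` and the `items` list are hypothetical stand-ins for the parsed data, and returning None when nothing matches is an assumption, since the documentation does not say what happens in that case.

```python
def find_bbox(items, text, return_all=False, include_text=False):
    """Hypothetical illustration of the documented return shapes."""
    matches = []
    for item in items:  # items are assumed to be in parsing order
        if text in item["text"]:
            match = {"page": item["page"], "bbox": item["bbox"]}
            if include_text:
                match["text"] = item["text"]
            if not return_all:
                return match  # first match only
            matches.append(match)
    # Assumption: no match yields None (or an empty list with return_all)
    return matches if return_all else None


items = [
    {"page": 0, "bbox": [72.0, 700.0, 150.0, 712.0], "text": "Patient Name: Doe"},
    {"page": 1, "bbox": [72.0, 500.0, 140.0, 512.0], "text": "Patient Name: Doe"},
]
first = find_bbox(items, "Patient Name")
everything = find_bbox(items, "Patient Name", return_all=True, include_text=True)
```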
-
get_block_data
(page, pos, tol=10, text_cleaner=None, numeric=None, ignored=None, mode='bottom-left')[source]¶ Use PDFPageParser.get_block_data for the provided page
- Parameters
page (int) – The index of the PDF page
pos (tuple of int, float) – The (x,y) coordinates of the text block to be retrieved
tol (int, float, tuple) – Maximum distance a block’s x or y-coordinate may be from pos. If a tuple is provided, first value is the x_tolerance, 2nd is y_tolerance
text_cleaner (callable, optional) – A function called on each text element (e.g., remove leading ‘:’)
numeric (bool, optional) – If true, only return value if it is numeric. If false, only return value if it is not numeric. Leave as None to ignore this feature.
ignored (list, optional) – Optionally provide a list of strings that should be ignored. If the value of the block data is in this list, the value will become an empty string instead
mode (str, optional) – Options are combinations of top/center/bottom and right/center/left, e.g., ‘top-right’, ‘center-right’. ‘center’ is assumed to be ‘center-center’. Default is ‘bottom-left’.
- Returns
All text data that meet the input constraints
- Return type
list of str
-
class
IQDMPDF.pdf_reader.
PDFPageParser
(lt_objs, page_data, page_index=0)[source]¶ Bases:
object
Custom PDF Page Parsing module
Initialization of PDFPageParser
- Parameters
lt_objs (list) – A layout object from PDFPageAggregator.get_result()._objs
page_data (dict) – A dictionary of lists, with keys ‘x’, ‘y’, ‘text’
page_index (int, optional) – The index of the page
-
get_block_data
(pos, tol, text_cleaner=None, numeric=None, ignored=None, mode='bottom-left')[source]¶ Get the text block data by x,y coordinates
- Parameters
pos (list of int, float) – The (x,y) coordinates of the text block to be retrieved
tol (int, float, tuple) – Maximum distance a block’s x or y-coordinate may be from pos. If a tuple is provided, first value is the x_tolerance, 2nd is y_tolerance
text_cleaner (callable, optional) – A function called on each text element (e.g., remove leading ‘:’)
numeric (bool, optional) – If true, only return value if it is numeric. If false, only return value if it is not numeric. Leave as None to ignore this feature.
ignored (list, optional) – Optionally provide a list of strings that should be ignored. If the value of the block data is in this list, the value will become an empty string instead
mode (str, optional) – Options are combinations of top/center/bottom and right/center/left, e.g., ‘top-right’, ‘center-right’. ‘center’ is assumed to be ‘center-center’. Default is ‘bottom-left’.
- Returns
All text data that meet the input constraints
- Return type
list of str
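As a rough sketch of how the parameters above interact, the following self-contained function filters a page_data dictionary (keys ‘x’, ‘y’, ‘text’) by position and tolerance. It is not IQDMPDF's implementation: `block_data_sketch` is a hypothetical name, the comparison is assumed to be exclusive, values failing the numeric test are assumed to be dropped, and the mode parameter is left out by assuming the stored coordinates already refer to the chosen anchor point.

```python
def _is_number(value):
    try:
        float(value)
    except ValueError:
        return False
    return True


def block_data_sketch(page_data, pos, tol, text_cleaner=None,
                      numeric=None, ignored=None):
    """Hypothetical filter over {'x': [...], 'y': [...], 'text': [...]}."""
    # A scalar tolerance applies to both axes; a pair is (x_tol, y_tol)
    x_tol, y_tol = tol if isinstance(tol, (tuple, list)) else (tol, tol)
    ignored = ignored or []
    matches = []
    for x, y, text in zip(page_data["x"], page_data["y"], page_data["text"]):
        # Assumption: a block exactly tol away is rejected (exclusive bound)
        if abs(x - pos[0]) >= x_tol or abs(y - pos[1]) >= y_tol:
            continue
        if text_cleaner is not None:
            text = text_cleaner(text)
        # Assumption: values failing the numeric requirement are skipped
        if numeric is not None and _is_number(text) != numeric:
            continue
        # Ignored values become empty strings, as documented
        matches.append("" if text in ignored else text)
    return matches


page = {"x": [10, 12, 200], "y": [700, 705, 700],
        "text": [": 98.5", "Pass Rate", "other"]}


def strip_colon(text):
    return text.lstrip(": ")


found = block_data_sketch(page, (10, 700), 10, text_cleaner=strip_colon)
```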
-
parse_obj
(lt_objs)[source]¶ Extract x, y, and text data from layout objects
- Parameters
lt_objs (list) – A layout object from PDFPageAggregator.get_result()._objs
File Processor
Process IMRT QA file(s) into CSV file(s)
-
IQDMPDF.file_processor.
print_callback
(msg)[source]¶ Simple print callback for process_files
- Parameters
msg (dict) – The message sent from process_files
-
IQDMPDF.file_processor.
process_file
(file_path, output_file, output_dir=None)[source]¶ Process a pdf file into a parser class, write data to csv
- Parameters
file_path (str) – PDF file to be processed
output_file (str) – Report type in file name will be prepended to this value
output_dir (str, optional) – Save results to this directory, default is local directory
-
IQDMPDF.file_processor.
process_file_worker
(file_path)[source]¶ Multiprocessing worker function
- Parameters
file_path (str) – PDF file to be passed to ReportParser
- Returns
{“data”: ReportParser.csv_data, “report_type”: ReportParser.report_type, “columns”: ReportParser.columns}
- Return type
dict
-
IQDMPDF.file_processor.
process_files
(init_directory, ignore_extension=False, output_file=None, output_dir=None, no_recursive_search=False, callback=None, raise_errors=False, processes=1)[source]¶ Process all pdf files into parser classes, write data to csv
- Parameters
init_directory (str) – initial scanning directory
ignore_extension (bool, optional) – Set to True to catch pdf files that are missing .pdf extension
output_file (str, optional) – Report type in file name will be prepended to this value
output_dir (str, optional) – Save results to this directory, default is local directory
no_recursive_search (bool, optional) – Ignore sub-directories if True
callback (callable) – Pointer to a function to be called before each process_file call. The parameter will be dict with keys of “label” and “gauge”.
raise_errors (bool) – Set to True to allow errors to be raised (useful for debugging)
processes (int) – Number of parallel processes allowed
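A callback for process_files receives a dict with the keys “label” and “gauge”. A minimal sketch of such a function is shown here; the name `progress_callback` is hypothetical, and since the documentation does not state what kind of value “gauge” carries, it is formatted without assuming a type.

```python
def progress_callback(msg):
    """Hypothetical callback: format and print one progress message."""
    line = "[{}] {}".format(msg["gauge"], msg["label"])
    print(line)
    return line


# Simulated message, shaped like the dict described for the callback parameter
example_line = progress_callback({"label": "report_01.pdf", "gauge": 0.5})

# With IQDMPDF installed, it could be passed along using the documented keyword:
# process_files("/path/to/reports", callback=progress_callback)
```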
-
IQDMPDF.file_processor.
validate_kwargs
(kwargs, add_print_callback=True)[source]¶ Process kwargs from main for process_files
- Parameters
kwargs (dict) – Keyword arguments for main. See main.create_arg_parser for valid arguments
add_print_callback (bool) – If true, add simple print function at the start of each process_file call
- Returns
Returns a dict containing only keywords applicable to process_files, or an empty dict if “init_directory” is missing or “print_version” is True and “init_directory” is missing
- Return type
dict
-
IQDMPDF.file_processor.
write_csv
(file_path, rows, mode='w', newline='')[source]¶ Create csv.writer, call writerows(rows)
- Parameters
file_path (str) – Path to file
rows (list, iterable) – Items to be written to file_pointer (input for csv.writer.writerows)
mode (str) – Optional string that specifies the mode in which the file is opened
newline (str) – Controls how universal newlines mode works. It can be None, ‘’, ‘\n’, ‘\r’, and ‘\r\n’
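The description “create csv.writer, call writerows(rows)” maps onto the standard library roughly as follows. This is a sketch with a hypothetical function name (`write_rows`), not the package's own code; it only uses `open` and `csv.writer` from the standard library.

```python
import csv
import os
import tempfile


def write_rows(file_path, rows, mode="w", newline=""):
    """Hypothetical equivalent: open the file, write all rows with csv.writer."""
    # newline="" lets the csv module control line endings itself
    with open(file_path, mode, newline=newline) as file_pointer:
        csv.writer(file_pointer).writerows(rows)


# Usage: write two rows to a temporary file, then append a third
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
write_rows(demo_path, [["column_a", "column_b"], ["1", "2"]])
write_rows(demo_path, [["3", "4"]], mode="a")

with open(demo_path, newline="") as file_pointer:
    demo_rows = list(csv.reader(file_pointer))
```

Passing mode='a' appends to an existing file instead of overwriting it, which is one plausible reason for exposing the mode parameter.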
Unified Report Parser
Unified IMRT QA report parser
-
class
IQDMPDF.parsers.parser.
ReportParser
(file_path)[source]¶ Bases:
object
Determines which Report class to use, then processes the data.
Initialization class for ReportParser
- Parameters
file_path (str) – File path pointing to an IMRT QA report
-
property
columns
¶ Get columns headers for csv
- Returns
Report columns + “report_file_creation” + “report_file_path”
- Return type
list
-
property
csv_data
¶ Get a csv string from the selected ReportParser
- Returns
Report columns + “report_file_creation” + “report_file_path”
- Return type
str
-
get_report
()[source]¶ Determine the report_class, then return class with data processed
- Returns
Searches for a Report Class with matching identifiers, processes the file and returns the Report Class
- Return type
ParserBase inherited class
-
property
report_type
¶ Get report type of the selected ReportParser
- Returns
Get ReportParser.report_type
- Return type
str
Generic Report Parser
Generic IMRT QA report parser
-
class
IQDMPDF.parsers.generic.
GenericReport
(json_file_path, text_cleaner=None)[source]¶ Bases:
IQDMPDF.parsers.generic.ParserBase
Generic IMRT QA PDF report parser based on page, x, y values
Initialization of a GenericReport class
- Parameters
json_file_path (str) – File path to a JSON file describing the PDF report. It should contain these keys (type): report_type (str), identifiers (list of str), and data (list). The format of each data element should be {‘column’: [str], ‘page’: [int], ‘pos’: [float, float]}. Optionally, you can also supply ‘tol’, which is either an integer or a list of integers (i.e., [x_tol, y_tol]). Specifying ‘numeric’ with a boolean value requires the value to be numeric (true) or non-numeric (false); an empty string is returned if the requirement is not met. The JSON object can also have “alternates”, which contains an array of data-like items that will be checked until a value for a column is found. “ignored” is another option: if a returned value is in this array, an empty string will be returned instead. The value of “column” is automatically added to the “ignored” array.
text_cleaner (callable, optional) – A function called on each text element (e.g., remove leading ‘:’)
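To make the JSON layout described for json_file_path more concrete, here is a sketch that builds such a template in Python and serializes it. All values (report type, identifiers, column names, coordinates) are invented for illustration, and placing “alternates” and “ignored” at the top level of the object is an assumption based on the wording above.

```python
import json

# Hypothetical template; every value below is made up for illustration
template = {
    "report_type": "example_report",
    "identifiers": ["Example QA Software", "Pass Rate"],
    "data": [
        {"column": "Patient Name", "page": 0, "pos": [72.0, 700.5]},
        {
            "column": "Pass Rate",
            "page": 0,
            "pos": [310.2, 455.0],
            "tol": [5, 10],   # [x_tol, y_tol]
            "numeric": True,  # only accept numeric values
        },
    ],
    # Assumption: fallback locations, checked until a value is found
    "alternates": [
        {"column": "Pass Rate", "page": 1, "pos": [310.2, 455.0], "numeric": True},
    ],
    # Assumption: values replaced by an empty string when encountered
    "ignored": ["N/A"],
}

json_text = json.dumps(template, indent=2)
round_trip = json.loads(json_text)
```

The resulting text could be saved to a file and its path passed as json_file_path.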
-
property
summary_data
¶ A summary of data from the QA report
- Returns
Keys will match “column” elements from the JSON file. Values are of type str
- Return type
dict
-
class
IQDMPDF.parsers.generic.
ParserBase
[source]¶ Bases:
object
Base class for all Report Parser classes, not to be used alone
Initialize columns and identifiers
-
property
csv_data
¶ Get the CSV data from summary_data for all columns, for use with csv.writer
- Returns
summary data as a list in order of columns. File path automatically appended to data
- Return type
list
ScandiDos Delta4 Report Parser
Delta4 QA report parser
-
class
IQDMPDF.parsers.delta4.
Delta4Report
[source]¶ Bases:
IQDMPDF.parsers.generic.ParserBase
Custom Delta4 report parser
Initialize Delta4Report class
-
property
accepted_date
¶ Get the QA accepted date
- Returns
QA Accepted date from DICOM
- Return type
str
-
property
beam_count
¶ Get the number of delivered beams in the report
- Returns
The number of beams
- Return type
int
-
property
composite_tx_summary_data
¶ Get the composite analysis data
- Returns
‘norm_dose’, ‘dev’, ‘dta’, ‘gamma_index’, and ‘dose_dev’
- Return type
dict
-
property
daily_corr
¶ Get the daily correction factor
- Returns
The daily correction factor
- Return type
str
-
property
energy
¶ Beam energy
- Returns
Energy of the first reported beam
- Return type
str
-
property
gamma_distance
¶ Get the gamma distance criteria
- Returns
Gamma analysis distance criteria
- Return type
str
-
property
gamma_dose
¶ Get the Gamma Analysis dose criteria
- Returns
Gamma dose criteria
- Return type
str
-
property
gamma_pass_criteria
¶ Get the gamma analysis pass-rate criteria
- Returns
Gamma pass-rate criteria
- Return type
str
-
property
measured_date
¶ Get the measured date
- Returns
Date of QA measurement
- Return type
str
-
property
patient_id
¶ Get the patient ID
- Returns
Patient ID
- Return type
str
-
property
patient_name
¶ Get the patient name
- Returns
Patient name
- Return type
str
-
property
plan_date
¶ Get the plan date
- Returns
Plan date from DICOM
- Return type
str
-
property
plan_name
¶ Get the plan name
- Returns
Plan name from DICOM
- Return type
str
-
property
radiation_dev
¶ Get the radiation device
- Returns
Radiation device per DICOM-RT Plan
- Return type
str
-
property
summary_data
¶ A summary of data from the QA report
- Returns
Keys will match “column” elements. Values are of type str
- Return type
dict
-
property
threshold
¶ Get the minimum dose (%) included in analysis
- Returns
Minimum dose threshold
- Return type
str
SNC Patient Report Parser
SNC Patient report parser
-
class
IQDMPDF.parsers.sncpatient.
SNCPatientCustom
[source]¶ Bases:
IQDMPDF.parsers.generic.ParserBase
Custom SNCPatient report parser
Initialize SNCPatientCustom class
-
property
angle
¶ Angle in QA File Parameter table
- Returns
Angle
- Return type
str
-
property
depth
¶ Depth in QA File Parameter table
- Returns
Depth
- Return type
str
-
property
dist_param
¶ Distance criteria
- Returns
Distance criteria for analysis
- Return type
str
-
property
dose_comparison_type
¶ Dose comparison type based on table title
- Returns
Dose comparison type (e.g., Absolute)
- Return type
str
-
property
dose_diff_param
¶ Dose difference criteria
- Returns
Dose difference criteria for analysis
- Return type
str
-
property
dose_diff_threshold
¶ Dose Diff Threshold
- Returns
Dose Difference Threshold for analysis
- Return type
str
-
property
energy
¶ Energy in QA File Parameter table
- Returns
Energy
- Return type
str
-
property
failed_points
¶ Number of points failing analysis
- Returns
Number of points/detectors not meeting analysis criteria
- Return type
str
-
property
meas_uncertainty
¶ Measurement Uncertainty
- Returns
Whether or not measurement uncertainty is turned on
- Return type
str
-
property
notes
¶ Custom note entered by report author
- Returns
Text from the Notes block
- Return type
str
-
property
pass_rate
¶ Passing rate of points
- Returns
Percentage of points/detectors meeting analysis criteria
- Return type
str
-
property
passed_points
¶ Number of points passing analysis
- Returns
Number of points/detectors meeting analysis criteria
- Return type
str
-
property
patient_id
¶ Patient ID in QA File Parameter table
- Returns
Patient ID
- Return type
str
-
property
patient_name
¶ Patient name in QA File Parameter table
- Returns
Patient name
- Return type
str
-
property
plan_date
¶ Plan date in QA File Parameter table
- Returns
Plan date
- Return type
str
-
property
qa_date
¶ Date in top-left of the report
- Returns
QA report date
- Return type
str
-
property
rotation_angle
¶ Rotation angle
- Returns
Rotation angle applied to data for analysis
- Return type
str
-
property
sdd
¶ SDD in QA File Parameter table
- Returns
SDD
- Return type
str
-
property
ssd
¶ SSD in QA File Parameter table
- Returns
SSD
- Return type
str
-
property
summary_data
¶ A summary of data from the QA report
- Returns
Keys will match “column” elements. Values are of type str
- Return type
dict
-
property
summary_type
¶ Title of the dose comparison table
- Returns
Dose comparison type (e.g., Absolute)
- Return type
str
-
property
threshold_param
¶ Dose threshold criteria
- Returns
Minimum dose threshold for analysis
- Return type
str
-
property
total_points
¶ Total Points
- Returns
Total number of points/detectors used for analysis
- Return type
str
-
property
use_global
¶ Use Global %
- Returns
Whether or not Use Global % is turned on
- Return type
str
-
property
use_van_dyk
¶ Use VanDyk
- Returns
Whether or not Van Dyk criteria is turned on
- Return type
str
-
class
IQDMPDF.parsers.sncpatient.
SNCPatientReport2020
[source]¶ Bases:
IQDMPDF.parsers.generic.GenericReport
SNCPatientReport parser for the new format released in 2020
Initialization of a SNCPatientReport2020 class
PTW VeriSoft Report Parser
PTW VeriSoft report parser
-
class
IQDMPDF.parsers.verisoft.
VeriSoftReport
[source]¶ Bases:
IQDMPDF.parsers.generic.ParserBase
PTW VeriSoft IMRT QA report parser
Initialize VeriSoftReport class
-
property
abs_diff
¶ Get all of the Absolute Difference values
- Returns
‘mean’, ‘min’, ‘max’, ‘median’ Absolute Difference values, and ‘mean_units’, etc
- Return type
dict
-
property
abs_diff_max_pos
¶ Get the max absolute dose diff position
- Returns
‘x’ and ‘y’ positions of the maximum absolute dose diff value
- Return type
dict
-
property
abs_diff_min_pos
¶ Get the min absolute dose diff position
- Returns
‘x’ and ‘y’ positions of the min absolute dose diff value
- Return type
dict
-
property
calibrate_air_density
¶ Get the Calibrate Air Density value
- Returns
Calibrate Air Density from Manipulations table
- Return type
str
-
property
comment
¶ Get the comment
- Returns
Comment from Administrative Data table
- Return type
str
-
property
data_set_a
¶ Get Data Set A file path
- Returns
Data Set A file path
- Return type
str
-
property
data_set_b
¶ Get Data Set B file path(s)
- Returns
Strings after _data_set_b_index joined by
- Return type
str
-
property
date
¶ Date printed in footer of report
- Returns
Report date
- Return type
str
-
property
eval_dose_points
¶ Evaluated Dose Points from Statistics table
- Returns
Evaluated Dose Points
- Return type
str
-
property
eval_dose_points_percent
¶ Evaluated Dose Points (%) from Statistics table
- Returns
Evaluated Dose Points (%)
- Return type
str
-
property
failed_points
¶ Failed Dose Points from Statistics table
- Returns
Failed Dose Points
- Return type
str
-
property
failed_points_percent
¶ Failed Dose Points (%) from Statistics table
- Returns
Failed Dose Points (%)
- Return type
str
-
property
gamma_diff
¶ Get all of the Gamma 2D values
- Returns
Mean, min, max, median Gamma values from Gamma 2D
- Return type
dict
-
property
gamma_dist
¶ Get the Gamma Distance to Agreement setting
- Returns
DTA from Gamma 2D - Parameters
- Return type
str
-
property
gamma_dose
¶ Get the Gamma Dose difference value
- Returns
Gamma Dose Difference value from Gamma 2D - Parameters
- Return type
str
-
property
gamma_dose_info
¶ Get the Gamma Dose difference info
- Returns
Gamma Dose Difference normalization from Gamma 2D - Parameters
- Return type
str
-
property
gamma_max_pos
¶ Get the max gamma position
- Returns
‘x’ and ‘y’ positions of the maximum gamma value
- Return type
dict
-
property
gamma_min_pos
¶ Get the min gamma position
- Returns
‘x’ and ‘y’ positions of the minimum gamma value
- Return type
dict
-
property
institution
¶ Get the institution
- Returns
Institution from Administrative Data table
- Return type
str
-
property
num_dose_points
¶ Number of Dose Points from Statistics table
- Returns
Number of Dose Points
- Return type
str
-
property
pass_rate
¶ Result from Statistics table
- Returns
Dose point pass rate
- Return type
str
-
property
pass_result_color
¶ Result color from Statistics table
- Returns
Result color
- Return type
str
-
property
passed_points
¶ Passed Dose Points from Statistics table
- Returns
Passed Dose Points
- Return type
str
-
property
passed_points_percent
¶ Passed Dose Points (%) from Statistics table
- Returns
Passed Dose Points (%)
- Return type
str
-
property
passing_criteria
¶ Passing Criteria from the Settings table
- Returns
Passing criteria
- Return type
str
-
property
passing_green
¶ Green threshold from the Settings table
- Returns
Minimum pass rate for green status
- Return type
str
-
property
passing_red
¶ Red threshold from the Settings table
- Returns
Minimum pass rate for red status
- Return type
str
-
property
passing_yellow
¶ Yellow threshold from the Settings table
- Returns
Minimum pass rate for yellow status
- Return type
str
-
property
patient_id
¶ Get the patient ID
- Returns
Patient ID from Administrative Data table
- Return type
str
-
property
patient_name
¶ Get the patient name
- Returns
Patient name from Administrative Data table
- Return type
str
-
property
physicist
¶ Get the physicist
- Returns
Physicist from Administrative Data table
- Return type
str
-
property
set_zero
¶ Get the Set Zero data
- Returns
Set Zero data from Manipulations table
- Return type
dict
-
property
summary_data
¶ A summary of data from the QA report
- Returns
Keys will match “column” elements. Values are of type str
- Return type
dict
-
property
threshold
¶ Get the Gamma Dose threshold value
- Returns
Gamma Dose threshold value from Gamma 2D - Parameters
- Return type
str
-
property
threshold_info
¶ Get the Gamma Dose threshold info
- Returns
Gamma Dose threshold info from Gamma 2D - Parameters
- Return type
str
-
property
version
¶ VeriSoft version printed in footer of report
- Returns
Software version
- Return type
str
Utilities
Common functions for IQDM-PDF
-
IQDMPDF.utilities.
append_files
(files, dir_name, files_to_append, extension=None)[source]¶ Helper function for get_files
- Parameters
files (list) – Accumulate file paths into this list
dir_name (str) – The base path of the files in file_list
files_to_append (list) – A list of file paths to loop over and accumulate
extension (str, optional) – Collect file paths with only this extension (e.g., ‘.pdf’)
-
IQDMPDF.utilities.
are_all_strings_in_text
(text, list_of_strings)[source]¶ Check that all strings in list_of_strings exist in text
- Parameters
text (str) – output from IQDMPDF.pdf_reader.convert_pdf_to_text
list_of_strings (list of str) – a list of strings used to identify document type
- Returns
Returns true if every string in list_of_strings is found in text data
- Return type
bool
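The check described above amounts to a membership test over every identifier string. The sketch below shows that idea together with one way it could be used to pick a report type; the function names and the `IDENTIFIERS` table are hypothetical and are not taken from IQDMPDF.

```python
def all_strings_in_text(text, list_of_strings):
    """True only if every string occurs somewhere in text."""
    return all(string in text for string in list_of_strings)


# Hypothetical identifier table, for illustration only
IDENTIFIERS = {
    "report_a": ["Vendor A", "Gamma"],
    "report_b": ["Vendor B", "Pass Rate"],
}


def guess_report_type(text):
    """Return the first report type whose identifiers all appear in text."""
    for report_type, identifiers in IDENTIFIERS.items():
        if all_strings_in_text(text, identifiers):
            return report_type
    return None


guessed = guess_report_type("Vendor B QA Summary\nPass Rate: 99.1%")
```

Note that an empty list of strings makes the check trivially true, so identifier lists should not be left empty.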
-
IQDMPDF.utilities.
bbox_to_pos
(bbox, mode)[source]¶ Convert a bounding box to an x-y position
- Parameters
bbox (list) – Bounding box from pdf_reader layout object, which is a list of four floats [x0, y0, x1, y1]
mode (str) – Options are combinations of top/center/bottom and right/center/left, e.g., ‘top-right’, ‘center-left’. ‘center’ is assumed to be ‘center-center’
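A sketch of the conversion is given here under two assumptions: the y-axis increases upward, as in pdfminer.six layout coordinates, so ‘top’ maps to y1; and the result is returned as an (x, y) tuple, which the documentation does not specify. The function name is hypothetical.

```python
def bbox_to_pos_sketch(bbox, mode):
    """Hypothetical conversion of [x0, y0, x1, y1] to a single (x, y) point."""
    x0, y0, x1, y1 = bbox
    if mode == "center":
        mode = "center-center"
    # The mode is written vertical first, then horizontal (e.g., 'top-right')
    vertical, horizontal = mode.split("-")
    x = {"left": x0, "center": (x0 + x1) / 2, "right": x1}[horizontal]
    # Assumption: y grows upward, so y0 is the bottom edge and y1 the top
    y = {"bottom": y0, "center": (y0 + y1) / 2, "top": y1}[vertical]
    return x, y


bottom_left = bbox_to_pos_sketch([0, 0, 10, 20], "bottom-left")
center = bbox_to_pos_sketch([0, 0, 10, 20], "center")
```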
-
IQDMPDF.utilities.
create_arg_parser
()[source]¶ Create an argument parser
- Returns
Argument parser for command-line use of IQDM-PDF
- Return type
argparse.ArgumentParser
-
IQDMPDF.utilities.
creation_date
(path_to_file)[source]¶ Try to get the date that a file was created, falling back to when it was last modified if that isn’t possible. See http://stackoverflow.com/a/39501288/1709587 for explanation.
- Parameters
path_to_file (str) – Path to any file
- Returns
Time stamp of file
- Return type
float
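The fallback behavior described above (creation time where available, otherwise last modification time) is commonly written as in the sketch below, which follows the approach of the linked Stack Overflow answer. The function name is hypothetical and this is not necessarily the package's exact code.

```python
import os
import platform
import tempfile


def creation_date_sketch(path_to_file):
    """Best-effort file creation time stamp, falling back to mtime."""
    if platform.system() == "Windows":
        return os.path.getctime(path_to_file)
    stat = os.stat(path_to_file)
    try:
        # Available on macOS and some BSDs
        return stat.st_birthtime
    except AttributeError:
        # Typical on Linux: settle for the last modification time
        return stat.st_mtime


with tempfile.NamedTemporaryFile() as tmp:
    stamp = creation_date_sketch(tmp.name)
```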
-
IQDMPDF.utilities.
get_files
(init_dir, search_sub_dir=True, extension=None)[source]¶ Collect paths of all files in a directory
- Parameters
init_dir (str) – Initial directory to begin scanning
search_sub_dir (bool) – Recursively search through sub-directories if True
extension (str, optional) – Collect file paths with only this extension (e.g., ‘.pdf’)
- Returns
List of file paths
- Return type
list
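A directory scan with these three parameters can be sketched with os.walk. The function below is a hypothetical stand-in rather than the package's implementation, and the case-insensitive extension comparison is an assumption.

```python
import os
import tempfile


def collect_files(init_dir, search_sub_dir=True, extension=None):
    """Hypothetical scan: gather file paths, optionally filtered by extension."""
    files = []
    for dir_name, _sub_dirs, file_names in os.walk(init_dir):
        for name in file_names:
            # Assumption: extensions are compared case-insensitively
            if extension is None or \
                    os.path.splitext(name)[1].lower() == extension.lower():
                files.append(os.path.join(dir_name, name))
        if not search_sub_dir:
            break  # os.walk yields init_dir first; stop before descending
    return files


# Usage on a throwaway directory tree
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for relative in ("a.pdf", "b.txt", os.path.join("sub", "c.pdf")):
    open(os.path.join(root, relative), "w").close()

pdf_names = sorted(os.path.basename(p)
                   for p in collect_files(root, extension=".pdf"))
top_level_names = sorted(os.path.basename(p)
                         for p in collect_files(root, search_sub_dir=False))
```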
-
IQDMPDF.utilities.
get_relative_path
(path, relative_base)[source]¶ Return a partial path with the specified base
- Parameters
path (str) – A path with relative_base as a sub-component
relative_base (str) – A directory within path
- Returns
The path with all components prior to relative_base removed
- Return type
str
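One way to read “all components prior to relative_base removed” is sketched below. It assumes relative_base is a single directory name and uses POSIX-style separators so the result is the same on every platform; the function name is hypothetical.

```python
from pathlib import PurePosixPath


def relative_from(path, relative_base):
    """Hypothetical helper: drop every path component before relative_base."""
    parts = PurePosixPath(path).parts
    index = parts.index(relative_base)  # raises ValueError if not present
    return str(PurePosixPath(*parts[index:]))


shortened = relative_from("/home/user/reports/2020/a.pdf", "reports")
```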
-
IQDMPDF.utilities.
get_sorted_indices
(some_list, reverse=False)[source]¶ Get sorted indices of some_list
- Parameters
some_list (list) – Any list compatible with sorted()
reverse (bool) – Reverse sort if True
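Sorted indices are the positions of the elements in sorted order, not the sorted values themselves. A minimal sketch with a hypothetical function name:

```python
def sorted_indices(some_list, reverse=False):
    """Indices that would sort some_list."""
    return sorted(range(len(some_list)),
                  key=some_list.__getitem__, reverse=reverse)


ascending = sorted_indices([30, 10, 20])
descending = sorted_indices([30, 10, 20], reverse=True)
```

Here index 1 comes first in ascending order because the smallest value, 10, sits at position 1.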
-
IQDMPDF.utilities.
is_in_tol
(value, expected_value, tolerance)[source]¶ Is the provided value within expected_value +/- tolerance
- Parameters
value (int, float) – Value of interest
expected_value (int, float) – Expected value
tolerance (int, float) – Allowed deviation from expected_value
- Returns
True if value is within expected_value +/- tolerance, exclusive
- Return type
bool
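Because the bounds are described as exclusive, a value sitting exactly on expected_value + tolerance does not pass. A sketch of that comparison, using a hypothetical function name:

```python
def is_in_tol_sketch(value, expected_value, tolerance):
    """True if value lies strictly inside expected_value +/- tolerance."""
    return expected_value - tolerance < value < expected_value + tolerance


inside = is_in_tol_sketch(5.9, 5, 1)
on_edge = is_in_tol_sketch(6, 5, 1)
```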
-
IQDMPDF.utilities.
is_numeric
(val)[source]¶ Check if value is numeric (float or int)
- Parameters
val (any) – Any value
- Returns
Returns true if float(val) doesn’t raise a ValueError
- Return type
bool
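The documented rule (true if float(val) does not raise a ValueError) can be written as the following sketch, with a hypothetical function name:

```python
def is_numeric_sketch(val):
    """True if float(val) succeeds without a ValueError."""
    try:
        float(val)
    except ValueError:
        return False
    return True


from_text = is_numeric_sketch("3.14")
from_word = is_numeric_sketch("abc")
```

Two consequences of this rule are worth keeping in mind: strings such as "1e3" and "nan" count as numeric because float accepts them, and a value like None raises a TypeError rather than returning False in this sketch.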
-
IQDMPDF.utilities.
run_multiprocessing
(worker, queue, processes, callback=None)[source]¶ Parallel processing
- Parameters
worker (callable) – single parameter function to be called on each item in queue
queue (iterable) – A list of arguments for worker
processes (int) – Number of processes for multiprocessing.Pool
callback (callable) – Optional callback function on progress update, accepts the string representation of a tqdm object. Final call sent with ‘complete’
- Returns
List of returns from worker
- Return type
list