Welcome to the documentation for IQDM-PDF!¶
IQDM-PDF¶
What does it do?¶
Scans a directory for IMRT QA reports and parses data into a CSV.
Other information¶
This library is part of the IMRT QA Data Mining (IQDM) project for the AAPM IMRT Working Group (WGIMRT).
Free software: MIT license
Documentation: Read the docs
Tested on Python 3.6, 3.7, 3.8, 3.9
Dependencies¶
Install¶
Latest PyPI release:
$ pip install iqdmpdf
Install from source:
$ python setup.py install
If you do not have a C++ compiler installed, you might have issues with installing the latest version of pdfminer.six. The following might resolve your issue:
$ pip install pdfminer.six==20200726
Usage¶
To scan a directory for IMRT QA report files and generate a results .csv file into your current directory:
$ iqdmpdf [init_directory]
As of v0.2.2, files can be processed in parallel. For example, you can run 4 parallel processes with the following:
$ iqdmpdf [init_directory] -n 4
usage: iqdmpdf [-h] [-ie] [-od OUTPUT_DIR] [-of OUTPUT_FILE] [-ver] [-nr]
               [-re] [-n PROCESSES]
               [init_directory]

Command line interface for IQDM-PDF

positional arguments:
  init_directory        Initiate scan here

optional arguments:
  -h, --help            show this help message and exit
  -ie, --ignore-extension
                        Script will check all files, not just ones with .pdf
                        extensions
  -od OUTPUT_DIR, --output-dir OUTPUT_DIR
                        Output stored in local directory by default, specify
                        otherwise here
  -of OUTPUT_FILE, --output-file OUTPUT_FILE
                        Output will be saved as <report_type>_results_<time-
                        stamp>.csv by default. Define this tag to customize
                        file name after <report_type>_
  -ver, --version       Print the IQDM version
  -nr, --no-recursive-search
                        Include this flag to skip sub-directories
  -re, --raise-errors   Allow failed file parsing to halt the program
  -n PROCESSES, --processes PROCESSES
                        Enable multiprocessing, set number of parallel
                        processes
Vendor Compatibility¶
We plan to support many vendors. If a vendor's report layout is very consistent, a new JSON file in the report_templates directory is essentially all that is needed. Additional documentation for custom templates can be found here.
Sun Nuclear: SNC Patient
ScandiDos: Delta4
PTW: VeriSoft
Credits¶
Development Lead¶
Dan Cutright
Contributors¶
Marc Chamberland
Aditya Panchal
Test Data¶
Example IMRT QA reports used for unit testing and design are available here.
- Dan Cutright, University of Chicago Hospital
delta4/UChicago
sncpatient/UChicago
- Marc Chamberland, University of Vermont Health Network
sncpatient/UVermontHealthNetwork
- Serpil Kucuker Dogan, Northwestern Memorial Hospital
sncpatient/Northwestern_Memorial
sncpatient2020/Northwestern_Memorial
- Aditya Panchal, AMITA Health
verisoft/AMITA_Health
- Michael Snyder, Beaumont Health
sncpatient/Beaumont
How It Works¶
IQDM-PDF uses pdfminer.six to extract text and coordinates from IMRT QA PDF files.
Step 1: Match Report Parser¶
Each report parser has an identifiers property which contains words and phrases used to uniquely pair a PDF to a report parser. If all of the identifiers are found in the PDF text, that report parser will be selected.
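The matching rule can be sketched in a few lines. This is only an illustration of the "all identifiers must be present" idea, not IQDM-PDF's own code: the function names, parser names, and identifier strings below are hypothetical.

```python
from typing import Dict, List, Optional


def all_identifiers_found(text: str, identifiers: List[str]) -> bool:
    """Return True only if every identifier occurs somewhere in text."""
    return all(identifier in text for identifier in identifiers)


def match_report_type(text: str, parsers: Dict[str, List[str]]) -> Optional[str]:
    """Return the name of the first parser whose identifiers all appear."""
    for name, identifiers in parsers.items():
        if all_identifiers_found(text, identifiers):
            return name
    return None


# Hypothetical identifiers, for illustration only
EXAMPLE_PARSERS = {
    "vendor_a": ["Vendor A QA Report", "Gamma Index"],
    "vendor_b": ["Vendor B Summary", "Pass Rate"],
}

pdf_text = "Vendor B Summary\nPatient: Doe\nPass Rate: 98.2%"
selected = match_report_type(pdf_text, EXAMPLE_PARSERS)
print(selected)
```

Because the first full match wins in this sketch, identifier lists need to be specific enough that no two parsers can match the same report.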
Step 2: Parse Data by Text Box Coordinates¶
The text data is collected with the selected report parser, which is stored by page and bounding box coordinates. Report parsers can look up a text value by page and coordinate.
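As a rough mental model, the parsed text can be pictured as a mapping from page index to text blocks, each with a bounding box. The sketch below assumes that structure and a simple tolerance check on the bottom-left corner; the names PAGES and lookup are hypothetical, and the bounding boxes are taken from the sample output shown later in this document.

```python
from typing import Dict, List, Tuple

# page index -> list of (bbox, text); bbox is [x0, y0, x1, y1]
Block = Tuple[List[float], str]

PAGES: Dict[int, List[Block]] = {
    0: [
        ([6.24, 432.94, 51.47, 439.9], "Distance (mm)"),
        ([79.2, 432.94, 88.84, 439.9], ": 2"),
        ([6.24, 420.7, 49.8, 427.66], "Threshold (%)"),
        ([79.2, 420.7, 98.37, 427.66], ": 10.0"),
    ]
}


def lookup(pages, page, pos, tol=10.0):
    """Return text of blocks whose bottom-left corner is within tol of pos."""
    x, y = pos
    return [
        text
        for bbox, text in pages.get(page, [])
        if abs(bbox[0] - x) < tol and abs(bbox[1] - y) < tol
    ]


print(lookup(PAGES, 0, (79.2, 420.7), tol=5))
```

A tight tolerance keeps neighbouring rows out of the result; with a larger tolerance the lookup may return more than one block.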
Step 3: Apply Template¶
Unless customized logic is needed, the GenericReport class can be used, which reads in a JSON file containing three keys: report_type, identifiers, and data. Required keys of each data element are column, page, and pos. For further customization, see the get_block_data function documentation in CustomPDFReader. All keys from a data element (except column) are passed to get_block_data.
Check out the report templates on GitHub for examples.
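To make the template layout concrete, here is a small sketch that loads a template and checks for the keys named in the GenericReport documentation (report_type, identifiers, data, and column/page/pos per data element). The report_type and identifiers values are made up, and validate_template is a hypothetical helper, not part of IQDM-PDF.

```python
import json

# Hypothetical template; report_type and identifiers are made up
TEMPLATE_TEXT = """
{
    "report_type": "example_vendor",
    "identifiers": ["Example Vendor QA", "Threshold (%)"],
    "data": [
        {"column": "Distance (mm)", "page": 0, "pos": [79.2, 432.94]},
        {"column": "Threshold (%)", "page": 0, "pos": [79.2, 420.7]}
    ]
}
"""

REQUIRED_TOP_LEVEL = ("report_type", "identifiers", "data")
REQUIRED_DATA_KEYS = ("column", "page", "pos")


def validate_template(template: dict) -> list:
    """Return a list of human-readable problems; empty if none are found."""
    problems = [
        f"missing key: {key}" for key in REQUIRED_TOP_LEVEL if key not in template
    ]
    for index, element in enumerate(template.get("data", [])):
        for key in REQUIRED_DATA_KEYS:
            if key not in element:
                problems.append(f"data[{index}] missing key: {key}")
    return problems


template = json.loads(TEMPLATE_TEXT)
print(validate_template(template))
```

Running a check like this before adding a template to the package can catch a missing pos or page early, before any PDF is parsed.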
In the simplest case, a report parser class looks something like the following [source]:
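# Imports needed by this snippet
from os.path import join
from IQDMPDF.parsers.generic import GenericReport
from IQDMPDF.paths import DIRECTORIES

class SNCPatientReport2020(GenericReport):
    """SNCPatientReport parser for the new format released in 2020"""
    def __init__(self):
        """Initialization of a SNCPatientReport class"""
        template = join(DIRECTORIES["REPORT_TEMPLATES"], "sncpatient2020.json")
        GenericReport.__init__(self, template)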
Then update the REPORT_CLASSES list in parser.py to include the new report parser class.
Step 4: Iterate¶
From the command-line, you can iterate over all files in a provided directory, and save the results into a CSV file per vendor/template:
$ iqdmpdf your/initial/dir
Or from a Python console:
>>> from IQDMPDF.file_processor import process_files
>>> process_files("your/initial/dir")
Non-Template Based Parsing¶
If the data in the reports have varying coordinates, the code needs more customization. See the Delta4 report parser for examples/inspiration.
Generally speaking, the LAParams for pdfminer.six are customized (e.g., char_margin, line_margin) so that sections of the IMRT QA report text are collected into one block. Keywords are then used to connect data to variable names. Another trick is to look up the positions of boxes containing keywords, then use the y-position to search laterally for another block of text (used frequently in the PTW VeriSoft parser).
These methods are needed if reports have variable templates, fonts, or font sizes. So far, all of IQDM-PDF’s parsers are non-template based, with the exception of the new SNC Patient format introduced in 2020.
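The lateral-search trick can be sketched as follows: find the block containing a keyword, then pick the nearest block to its right whose y-position is about the same. This is a simplified illustration under assumed data structures; the function names and the cleanup of the leading colon are hypothetical, not the VeriSoft parser's actual code.

```python
from typing import List, Optional, Tuple

Block = Tuple[List[float], str]  # ([x0, y0, x1, y1], text)

BLOCKS: List[Block] = [
    ([6.24, 432.94, 51.47, 439.9], "Distance (mm)"),
    ([79.2, 432.94, 88.84, 439.9], ": 2"),
    ([6.24, 420.7, 49.8, 427.66], "Threshold (%)"),
    ([79.2, 420.7, 98.37, 427.66], ": 10.0"),
]


def find_keyword_block(blocks: List[Block], keyword: str) -> Optional[Block]:
    """Return the first block whose text contains keyword, else None."""
    for block in blocks:
        if keyword in block[1]:
            return block
    return None


def value_right_of(blocks: List[Block], keyword: str, y_tol: float = 2.0) -> Optional[str]:
    """Find the keyword, then return the nearest block to its right on the same row."""
    label = find_keyword_block(blocks, keyword)
    if label is None:
        return None
    label_bbox = label[0]
    candidates = [
        (bbox[0], text)
        for bbox, text in blocks
        if abs(bbox[1] - label_bbox[1]) <= y_tol and bbox[0] > label_bbox[0]
    ]
    if not candidates:
        return None
    # Smallest x0 is the closest block to the right; drop the leading ": "
    return min(candidates)[1].lstrip(": ").strip()


print(value_right_of(BLOCKS, "Threshold (%)"))
```

Because the search is relative to the keyword's position, it keeps working when the whole table shifts up or down between reports.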
Building a New Template¶
Currently, building a new JSON template requires some Python scripting to determine coordinates. The output from the following code will show all text bounding box coordinates and contents.
>>> from IQDMPDF.pdf_reader import CustomPDFReader
>>> data = CustomPDFReader("path/to/report.pdf")
>>> print(data)
Below is a sample of the output from: example_reports/sncpatient/UChicago/DCAM_example_1.pdf
page_index: 0, data_index: 21
bbox: [6.24, 445.18, 140.33, 463.88]
Absolute Dose Comparison
Difference (%)
page_index: 0, data_index: 22
bbox: [79.2, 445.18, 88.84, 452.14]
: 2
page_index: 0, data_index: 23
bbox: [6.24, 432.94, 51.47, 439.9]
Distance (mm)
page_index: 0, data_index: 24
bbox: [79.2, 432.94, 88.84, 439.9]
: 2
page_index: 0, data_index: 25
bbox: [6.24, 420.7, 49.8, 427.66]
Threshold (%)
page_index: 0, data_index: 26
bbox: [79.2, 420.7, 98.37, 427.66]
: 10.0
The data object in the resulting JSON file for this data would look like:
[
{"column": "Difference (%)", "page": 0, "pos": [79.2, 441.02]},
{"column": "Distance (mm)", "page": 0, "pos": [79.2, 432.94]},
{"column": "Threshold (%)", "page": 0, "pos": [79.2, 420.7]}
]
Note that the value for column doesn’t need to match any text in the PDF.
The pos element is assumed to be the bottom-left corner of the bounding box by default. If the PDF layout has centered or right-aligned elements, you can specify mode to be any combination of bottom/center/top and left/center/right (e.g., top-right or center-left; center is equivalent to center-center).
For example, if an element is more consistently found at the center of a bounding box, the data element could look like:
{
    "column": "Difference (%)",
    "page": 0,
    "pos": [88.79, 424.18],
    "mode": "center"
}
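To work out a pos value for a given mode, a bounding box can be converted to a single point. The sketch below assumes PDF-style coordinates (y increases upward, so y0 is the bottom edge); bbox_to_pos_sketch is a hypothetical stand-in written for illustration, not the library's bbox_to_pos.

```python
def bbox_to_pos_sketch(bbox, mode="bottom-left"):
    """Convert [x0, y0, x1, y1] to an (x, y) point for the given mode."""
    x0, y0, x1, y1 = bbox
    if mode == "center":
        mode = "center-center"
    vertical, horizontal = mode.split("-")
    x = {"left": x0, "center": (x0 + x1) / 2, "right": x1}[horizontal]
    y = {"bottom": y0, "center": (y0 + y1) / 2, "top": y1}[vertical]
    return x, y


# Bounding box of the ": 10.0" block from the sample output
bbox = [79.2, 420.7, 98.37, 427.66]
center_x, center_y = bbox_to_pos_sketch(bbox, "center")
print(round(center_x, 3), round(center_y, 3))
```

For this box the center works out to roughly (88.79, 424.18), which is the kind of value that would go into pos when mode is set to center.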
IQDM PDF¶
PDF Reader Module¶
Read PDF files into python objects
- class IQDMPDF.pdf_reader.CustomPDFReader(file_path, laparams_kwargs=None)
Bases: object
Custom PDF Parsing module
Initialize a CustomPDFReader object
- Parameters
file_path (str) – Absolute file path to the PDF to be read
- get_bbox_of_data(text, return_all=False, include_text=False)
Get the bounding box for a given string
- Parameters
text (str) – Check all parsed data for this string. Return the first bounding box that contains this text. Meant to search for a unique str
return_all (bool) – If true, then return a list containing all matches, in the order pdfminer.six found them
include_text (bool) – If true, also return the text data
- Returns
“page”->int and “bbox”->[x0, y0, x1, y1]. If include_text is true, “text”->str will contain the text data. If return_all is true, return a list of these dict objects.
- Return type
dict, list
- get_block_data(page, pos, tol=10, text_cleaner=None, numeric=None, ignored=None, mode='bottom-left')
Use PDFPageParser.get_block_data for the provided page
- Parameters
page (int) – The index of the PDF page
pos (tuple of int, float) – The (x,y) coordinates of the text block to be retrieved
tol (int, float, tuple) – Maximum distance a block’s x or y-coordinate may be from pos. If a tuple is provided, first value is the x_tolerance, 2nd is y_tolerance
text_cleaner (callable, optional) – A function called on each text element (e.g., remove leading ‘:’)
numeric (bool, optional) – If true, only return value if it is numeric. If false, only return value if it is not numeric. Leave as None to ignore this feature.
ignored (list, optional) – Optionally provide a list of strings that should be ignored. If the value of the block data is in this list, the value will become an empty string instead
mode (str, optional) – Options are combinations of top/center/bottom and right/center/left, e.g., ‘top-right’, ‘center-right’. ‘center’ is assumed to be ‘center-center’. Default is ‘bottom-left’.
- Returns
All text data that meet the input constraints
- Return type
list of str
- class IQDMPDF.pdf_reader.PDFPageParser(lt_objs, page_data, page_index=0)
Bases: object
Custom PDF Page Parsing module
Initialization of PDFPageParser
- Parameters
lt_objs (list) – A layout object from PDFPageAggregator.get_result()._objs
page_data (dict) – A dictionary of lists, with keys ‘x’, ‘y’, ‘text’
page_index (int, optional) – The index of the page
- get_block_data(pos, tol, text_cleaner=None, numeric=None, ignored=None, mode='bottom-left')
Get the text block data by x,y coordinates
- Parameters
pos (list of int, float) – The (x,y) coordinates of the text block to be retrieved
tol (int, float, tuple) – Maximum distance a block’s x or y-coordinate may be from pos. If a tuple is provided, first value is the x_tolerance, 2nd is y_tolerance
text_cleaner (callable, optional) – A function called on each text element (e.g., remove leading ‘:’)
numeric (bool, optional) – If true, only return value if it is numeric. If false, only return value if it is not numeric. Leave as None to ignore this feature.
ignored (list, optional) – Optionally provide a list of strings that should be ignored. If the value of the block data is in this list, the value will become an empty string instead
mode (str, optional) – Options are combinations of top/center/bottom and right/center/left, e.g., ‘top-right’, ‘center-right’. ‘center’ is assumed to be ‘center-center’. Default is ‘bottom-left’.
- Returns
All text data that meet the input constraints
- Return type
list of str
- parse_obj(lt_objs)
Extract x, y, and text data from layout objects
- Parameters
lt_objs (list) – A layout object from PDFPageAggregator.get_result()._objs
File Processor¶
Process IMRT QA file(s) into CSV file(s)
- IQDMPDF.file_processor.print_callback(msg)
Simple print callback for process_files
- Parameters
msg (dict) – The message sent from process_files
- IQDMPDF.file_processor.process_file(file_path, output_file, output_dir=None)
Process a pdf file into a parser class, write data to csv
- Parameters
file_path (str) – PDF file to be processed
output_file (str) – Report type in file name will be prepended to this value
output_dir (str, optional) – Save results to this directory, default is local directory
- IQDMPDF.file_processor.process_file_worker(file_path)
Multiprocessing worker function
- Parameters
file_path (str) – PDF file to be passed to ReportParser
- Returns
{“data”: ReportParser.csv_data, “report_type”: ReportParser.report_type, “columns”: ReportParser.columns}
- Return type
dict
- IQDMPDF.file_processor.process_files(init_directory, ignore_extension=False, output_file=None, output_dir=None, no_recursive_search=False, callback=None, raise_errors=False, processes=1)
Process all pdf files into parser classes, write data to csv
- Parameters
init_directory (str) – initial scanning directory
ignore_extension (bool, optional) – Set to True to catch pdf files that are missing .pdf extension
output_file (str, optional) – Report type in file name will be prepended to this value
output_dir (str, optional) – Save results to this directory, default is local directory
no_recursive_search (bool, optional) – Ignore sub-directories if True
callback (callable) – Pointer to a function to be called before each process_file call. The parameter will be dict with keys of “label” and “gauge”.
raise_errors (bool) – Set to True to allow errors to be raised (useful for debugging)
processes (int) – Number of parallel processes allowed
- IQDMPDF.file_processor.validate_kwargs(kwargs, add_print_callback=True)
Process kwargs from main for process_files
- Parameters
kwargs (dict) – Keyword arguments for main. See main.create_arg_parser for valid arguments
add_print_callback (bool) – If true, add simple print function at the start of each process_file call
- Returns
Returns a dict containing only keywords applicable to process_files, or an empty dict if “init_directory” is missing or “print_version” is True and “init_directory” is missing
- Return type
dict
- IQDMPDF.file_processor.write_csv(file_path, rows, mode='w', newline='')
Create csv.writer, call writerows(rows)
- Parameters
file_path (str) – path to file
rows (list, iterable) – Items to be written to file_pointer (input for csv.writer.writerows)
mode (str) – optional string that specifies the mode in which the file is opened
newline (str) – controls how universal newlines mode works. It can be None, ‘’, ‘\n’, ‘\r’, and ‘\r\n’
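A minimal sketch of a function with the same shape, using only the standard library, is shown here. It is an illustration of the described behaviour (open the file, create a csv.writer, call writerows), not a copy of the library's source; the name write_csv_sketch, the explicit UTF-8 encoding, and the sample rows are assumptions.

```python
import csv
import os
import tempfile


def write_csv_sketch(file_path, rows, mode="w", newline=""):
    """Open file_path and write rows with csv.writer.writerows."""
    with open(file_path, mode, newline=newline, encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


rows = [["patient_id", "pass_rate"], ["ANON, 1", "98.2"]]
path = os.path.join(tempfile.mkdtemp(), "example_results.csv")
write_csv_sketch(path, rows)

# Read the file back to confirm the round trip, including the quoted comma
with open(path, newline="", encoding="utf-8") as f:
    round_trip = list(csv.reader(f))
print(round_trip)
```

Passing newline='' lets the csv module control line endings itself, which is what the Python documentation recommends for files handed to csv.writer.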
Unified Report Parser¶
Unified IMRT QA report parser
- class IQDMPDF.parsers.parser.ReportParser(file_path)
Bases: object
Determines which Report class to use, then processes the data.
Initialization class for ReportParser
- Parameters
file_path (str) – File path pointing to an IMRT QA report
- property columns
Get columns headers for csv
- Returns
Report columns + “report_file_creation” + “report_file_path”
- Return type
list
- property csv_data
Get a csv string from the selected ReportParser
- Returns
Report columns + “report_file_creation” + “report_file_path”
- Return type
str
- get_report()
Determine the report_class, then return class with data processed
- Returns
Searches for a Report Class with matching identifiers, processes the file and returns the Report Class
- Return type
ParserBase inherited class
- property report_type
Get report type of the selected ReportParser
- Returns
Get ReportParser.report_type
- Return type
str
Generic Report Parser¶
Generic IMRT QA report parser
- class IQDMPDF.parsers.generic.GenericReport(json_file_path, text_cleaner=None)
Bases: IQDMPDF.parsers.generic.ParserBase
Generic IMRT QA PDF report parser based on page, x, y values
Initialization of a GenericReport class
- Parameters
json_file_path (str) – File path to a JSON file describing the PDF report. It should contain these keys (type): report_type (str), identifiers (list of str), and data (list). The format of each data element should be {‘column’: [str], ‘page’: [int], ‘pos’: [float, float]}. Optionally, you can also supply ‘tol’, which is either an integer or a list of integers (i.e., [x_tol, y_tol]). Also, specifying ‘numeric’ with a boolean value will ensure the value is or is not numeric (and return an empty string if not met). The JSON object can also have “alternates”, which contains an array of data-like items that will be checked until a value for a column is found. “ignored” is another option: if a value is returned that is in this array, an empty string will be returned instead. The value of “column” is automatically added to the “ignored” array.
text_cleaner (callable, optional) – A function called on each text element (e.g., remove leading ‘:’)
- property summary_data
A summary of data from the QA report
- Returns
Keys will match “column” elements from the JSON file. Values are of type str
- Return type
dict
- class IQDMPDF.parsers.generic.ParserBase
Bases: object
Base class for all Report Parser classes, not to be used alone
Initialize columns and identifiers
- property csv_data
Get a CSV data of summary_data for all columns for csv.writer
- Returns
summary data as a list in order of columns. File path automatically appended to data
- Return type
list
ScandiDos Delta4 Report Parser¶
Delta4 QA report parser
- class IQDMPDF.parsers.delta4.Delta4Report
Bases: IQDMPDF.parsers.generic.ParserBase
Custom Delta4 report parser
Initialize Delta4Report class
- property accepted_date
Get the QA accepted date
- Returns
QA Accepted date from DICOM
- Return type
str
- property beam_count
Get the number of delivered beams in the report
- Returns
The number of beams
- Return type
int
- property composite_tx_summary_data
Get the composite analysis data
- Returns
‘norm_dose’, ‘dev’, ‘dta’, ‘gamma_index’, and ‘dose_dev’
- Return type
dict
- property daily_corr
Get the daily correction factor
- Returns
The daily correction factor
- Return type
str
- property energy
Beam energy
- Returns
Energy of the first reported beam
- Return type
str
- property gamma_distance
Get the gamma distance criteria
- Returns
Gamma analysis distance criteria
- Return type
str
- property gamma_dose
Get the Gamma Analysis dose criteria
- Returns
Gamma dose criteria
- Return type
str
- property gamma_pass_criteria
Get the gamma analysis pass-rate criteria
- Returns
Gamma pass-rate criteria
- Return type
str
- property measured_date
Get the measured date
- Returns
Date of QA measurement
- Return type
str
- property patient_id
Get the patient ID
- Returns
Patient ID
- Return type
str
- property patient_name
Get the patient name
- Returns
Patient name
- Return type
str
- property plan_date
Get the plan date
- Returns
Plan date from DICOM
- Return type
str
- property plan_name
Get the plan name
- Returns
Plan name from DICOM
- Return type
str
- property radiation_dev
Get the radiation device
- Returns
Radiation device per DICOM-RT Plan
- Return type
str
- property summary_data
A summary of data from the QA report
- Returns
Keys will match “column” elements. Values are of type str
- Return type
dict
- property threshold
Get the minimum dose (%) included in analysis
- Returns
Minimum dose threshold
- Return type
str
SNC Patient Report Parser¶
SNC Patient report parser
- class IQDMPDF.parsers.sncpatient.SNCPatientCustom
Bases: IQDMPDF.parsers.generic.ParserBase
Custom SNCPatient report parser
Initialize SNCPatientCustom class
- property angle
Angle in QA File Parameter table
- Returns
Angle
- Return type
str
- property depth
Depth in QA File Parameter table
- Returns
Depth
- Return type
str
- property dist_param
Distance criteria
- Returns
Distance criteria for analysis
- Return type
str
- property dose_comparison_type
Dose comparison type based on table title
- Returns
Dose comparison type (e.g., Absolute)
- Return type
str
- property dose_diff_param
Dose difference criteria
- Returns
Dose difference criteria for analysis
- Return type
str
- property dose_diff_threshold
Dose Diff Threshold
- Returns
Dose Difference Threshold for analysis
- Return type
str
- property energy
Energy in QA File Parameter table
- Returns
Energy
- Return type
str
- property failed_points
Number of points failing analysis
- Returns
Number of points/detectors not meeting analysis criteria
- Return type
str
- property meas_uncertainty
Measurement Uncertainty
- Returns
Whether or not measurement uncertainty is turned on
- Return type
str
- property notes
Custom note entered by report author
- Returns
Text from the Notes block
- Return type
str
- property pass_rate
Passing rate of points
- Returns
Percentage of points/detectors meeting analysis criteria
- Return type
str
- property passed_points
Number of points passing analysis
- Returns
Number of points/detectors meeting analysis criteria
- Return type
str
- property patient_id
Patient ID in QA File Parameter table
- Returns
Patient ID
- Return type
str
- property patient_name
Patient name in QA File Parameter table
- Returns
Patient name
- Return type
str
- property plan_date
Plan date in QA File Parameter table
- Returns
Plan date
- Return type
str
- property qa_date
Date in top-left of the report
- Returns
QA report date
- Return type
str
- property rotation_angle
Rotation angle
- Returns
Rotation angle applied to data for analysis
- Return type
str
- property sdd
SDD in QA File Parameter table
- Returns
SDD
- Return type
str
- property ssd
SSD in QA File Parameter table
- Returns
SSD
- Return type
str
- property summary_data
A summary of data from the QA report
- Returns
Keys will match “column” elements. Values are of type str
- Return type
dict
- property summary_type
Title of the dose comparison table
- Returns
Dose comparison type (e.g., Absolute)
- Return type
str
- property threshold_param
Dose threshold criteria
- Returns
Minimum dose threshold for analysis
- Return type
str
- property total_points
Total Points
- Returns
Total number of points/detectors used for analysis
- Return type
str
- property use_global
Use Global %
- Returns
Whether or not Use Global % is turned on
- Return type
str
- property use_van_dyk
Use VanDyk
- Returns
Whether or not Van Dyk criteria is turned on
- Return type
str
- class IQDMPDF.parsers.sncpatient.SNCPatientReport2020
Bases: IQDMPDF.parsers.generic.GenericReport
SNCPatientReport parser for the new format released in 2020
Initialization of a SNCPatientReport class
PTW VeriSoft Report Parser¶
PTW VeriSoft report parser
- class IQDMPDF.parsers.verisoft.VeriSoftReport
Bases: IQDMPDF.parsers.generic.ParserBase
PTW VeriSoft IMRT QA report parser
Initialize VeriSoftReport class
- property abs_diff
Get all of the Absolute Difference values
- Returns
‘mean’, ‘min’, ‘max’, ‘median’ Absolute Difference values, and ‘mean_units’, etc
- Return type
dict
- property abs_diff_max_pos
Get the max absolute dose diff position
- Returns
‘x’ and ‘y’ positions of the maximum absolute dose diff value
- Return type
dict
- property abs_diff_min_pos
Get the min absolute dose diff position
- Returns
‘x’ and ‘y’ positions of the min absolute dose diff value
- Return type
dict
- property calibrate_air_density
Get the Calibrate Air Density value
- Returns
Calibrate Air Density from Manipulations table
- Return type
str
- property comment
Get the comment
- Returns
Comment from Administrative Data table
- Return type
str
- property data_set_a
Get Data Set A file path
- Returns
Data Set A file path
- Return type
str
- property data_set_b
Get Data Set B file path(s)
- Returns
Strings after _data_set_b_index joined by
- Return type
str
- property date
Date printed in footer of report
- Returns
Report date
- Return type
str
- property eval_dose_points
Evaluated Dose Points from Statistics table
- Returns
Evaluated Dose Points
- Return type
str
- property eval_dose_points_percent
Evaluated Dose Points (%) from Statistics table
- Returns
Evaluated Dose Points (%)
- Return type
str
- property failed_points
Failed Dose Points from Statistics table
- Returns
Failed Dose Points
- Return type
str
- property failed_points_percent
Failed Dose Points (%) from Statistics table
- Returns
Failed Dose Points (%)
- Return type
str
- property gamma_diff
Get all of the Gamma 2D values
- Returns
Mean, min, max, median Gamma values from Gamma 2D
- Return type
dict
- property gamma_dist
Get the Gamma Distance to Agreement setting
- Returns
DTA from Gamma 2D - Parameters
- Return type
str
- property gamma_dose
Get the Gamma Dose difference value
- Returns
Gamma Dose Difference value from Gamma 2D - Parameters
- Return type
str
- property gamma_dose_info
Get the Gamma Dose difference info
- Returns
Gamma Dose Difference normalization from Gamma 2D - Parameters
- Return type
str
- property gamma_max_pos
Get the max gamma position
- Returns
‘x’ and ‘y’ positions of the maximum gamma value
- Return type
dict
- property gamma_min_pos
Get the min gamma position
- Returns
‘x’ and ‘y’ positions of the minimum gamma value
- Return type
dict
- property institution
Get the institution
- Returns
Institution from Administrative Data table
- Return type
str
- property num_dose_points
Number of Dose Points from Statistics table
- Returns
Number of Dose Points
- Return type
str
- property pass_rate
Result from Statistics table
- Returns
Dose point pass rate
- Return type
str
- property pass_result_color
Result color from Statistics table
- Returns
Result color
- Return type
str
- property passed_points
Passed Dose Points from Statistics table
- Returns
Passed Dose Points
- Return type
str
- property passed_points_percent
Passed Dose Points (%) from Statistics table
- Returns
Passed Dose Points (%)
- Return type
str
- property passing_criteria
Passing Criteria from the Settings table
- Returns
Passing criteria
- Return type
str
- property passing_green
Green threshold from the Settings table
- Returns
Minimum pass rate for green status
- Return type
str
- property passing_red
Red threshold from the Settings table
- Returns
Minimum pass rate for red status
- Return type
str
- property passing_yellow
Yellow threshold from the Settings table
- Returns
Minimum pass rate for yellow status
- Return type
str
- property patient_id
Get the patient ID
- Returns
Patient ID from Administrative Data table
- Return type
str
- property patient_name
Get the patient name
- Returns
Patient name from Administrative Data table
- Return type
str
- property physicist
Get the physicist
- Returns
Physicist from Administrative Data table
- Return type
str
- property set_zero
Get the Set Zero data
- Returns
Get the Set Zero data from Manipulations table
- Return type
dict
- property summary_data
A summary of data from the QA report
- Returns
Keys will match “column” elements. Values are of type str
- Return type
dict
- property threshold
Get the Gamma Dose threshold value
- Returns
Gamma Dose threshold value from Gamma 2D - Parameters
- Return type
str
- property threshold_info
Get the Gamma Dose threshold info
- Returns
Gamma Dose threshold info from Gamma 2D - Parameters
- Return type
str
- property version
VeriSoft version printed in footer of report
- Returns
Software version
- Return type
str
Utilities¶
Common functions for IQDM-PDF
- IQDMPDF.utilities.append_files(files, dir_name, files_to_append, extension=None)
Helper function for get_files
- Parameters
files (list) – Accumulate file paths into this list
dir_name (str) – The base path of the files in file_list
files_to_append (list) – A list of file paths to accumulate
extension (str, optional) – Collect file paths with only this extension (e.g., ‘.pdf’)
- IQDMPDF.utilities.are_all_strings_in_text(text, list_of_strings)
Check that all strings in list_of_strings exist in text
- Parameters
text (str) – output from IQDMPDF.pdf_reader.convert_pdf_to_text
list_of_strings (list of str) – a list of strings used to identify document type
- Returns
Returns true if every string in list_of_strings is found in text data
- Return type
bool
- IQDMPDF.utilities.bbox_to_pos(bbox, mode)
Convert a bounding box to an x-y position
- Parameters
bbox (list) – Bounding box from pdf_reader layout object, which is a list of four floats [x0, y0, x1, y1]
mode (str) – Options are combinations of top/center/bottom and right/center/left, e.g., ‘top-right’, ‘center-left’. ‘center’ is assumed to be ‘center-center’
- IQDMPDF.utilities.create_arg_parser()
Create an argument parser
- Returns
Argument parsers for command-line use of IQDM-PDF
- Return type
argparse.ArgumentParser
- IQDMPDF.utilities.creation_date(path_to_file)
Try to get the date that a file was created, falling back to when it was last modified if that isn’t possible. See http://stackoverflow.com/a/39501288/1709587 for explanation.
- Parameters
path_to_file (str) – Path to any file
- Returns
Time stamp of file
- Return type
float
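The fallback described for creation_date can be sketched with the standard library as below. This follows the commonly cited approach (creation time on Windows, birth time where the platform exposes it, otherwise last-modified time) and is an assumption about the general technique, not a copy of the library's code; creation_date_sketch is a hypothetical name.

```python
import os
import platform
import tempfile


def creation_date_sketch(path_to_file):
    """Return a creation time stamp, or the last-modified time if unavailable."""
    if platform.system() == "Windows":
        return os.path.getctime(path_to_file)
    stat = os.stat(path_to_file)
    try:
        # Available on macOS and some BSDs
        return stat.st_birthtime
    except AttributeError:
        # Typically Linux: no birth time exposed, so use last-modified time
        return stat.st_mtime


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"qa")
timestamp = creation_date_sketch(tmp.name)
os.remove(tmp.name)
print(timestamp > 0)
```

On platforms without a birth time, the value changes whenever the file is modified, so it is best read as an approximate report date.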
- IQDMPDF.utilities.get_files(init_dir, search_sub_dir=True, extension=None)
Collect paths of all files in a directory
- Parameters
init_dir (str) – Initial directory to begin scanning
search_sub_dir (bool) – Recursively search through sub-directories if True
extension (str, optional) – Collect file paths with only this extension (e.g., ‘.pdf’)
- Returns
List of file paths
- Return type
list
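A directory scan with the same parameters could look roughly like the following. It is a sketch built on os.walk and os.listdir; the name get_files_sketch, the case-insensitive extension match, and the sorted return value are assumptions made for the example rather than documented behaviour.

```python
import os
import tempfile


def get_files_sketch(init_dir, search_sub_dir=True, extension=None):
    """Collect file paths under init_dir, optionally filtered by extension."""
    files = []
    if search_sub_dir:
        for dir_name, _, file_names in os.walk(init_dir):
            for name in file_names:
                files.append(os.path.join(dir_name, name))
    else:
        for name in os.listdir(init_dir):
            path = os.path.join(init_dir, name)
            if os.path.isfile(path):
                files.append(path)
    if extension is not None:
        # Assumption: match the extension case-insensitively
        files = [
            f for f in files if os.path.splitext(f)[1].lower() == extension.lower()
        ]
    return sorted(files)


# Build a small throwaway directory tree to scan
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for rel in ("a.pdf", "notes.txt", os.path.join("sub", "b.PDF")):
    with open(os.path.join(root, rel), "w") as f:
        f.write("")

found = get_files_sketch(root, extension=".pdf")
print([os.path.basename(p) for p in found])
```

Setting search_sub_dir=False corresponds to the -nr command-line flag, and leaving extension as None corresponds to -ie.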
- IQDMPDF.utilities.get_relative_path(path, relative_base)
Return a partial path with the specified base
- Parameters
path (str) – A path with relative_base as a sub-component
relative_base (str) – A directory within path
- Returns
The path with all components prior to relative_base removed
- Return type
str
- IQDMPDF.utilities.get_sorted_indices(some_list, reverse=False)
Get sorted indices of some_list
- Parameters
some_list (list) – Any list compatible with sorted()
reverse (bool) – Reverse sort if True
IQDMPDF.utilities.
is_in_tol
(value, expected_value, tolerance)[source]¶ Is the provided value within expected_value +/- tolerance
- Parameters
value (int, float) – Value of interest
expected_value (int, float) – Expected value
tolerance (int, float) – Allowed deviation from expected_value
- Returns
True if value is within within expected_value +/- tolerance, exclusive
- Return type
bool
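The "exclusive" wording means a value sitting exactly on the edge of the tolerance window does not count as a match. A one-line sketch of that check follows; is_in_tol_sketch is a hypothetical name used for illustration.

```python
def is_in_tol_sketch(value, expected_value, tolerance):
    """True if value lies strictly between expected_value -/+ tolerance."""
    return expected_value - tolerance < value < expected_value + tolerance


# A template y-position of 441.02 still matches a block at 445.18
# when the tolerance is 10 (the default tol of get_block_data)
print(is_in_tol_sketch(441.02, 445.18, 10))
```

This is why a pos value in a template does not need to reproduce a bounding box coordinate exactly, as long as it lands within the tolerance.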
- IQDMPDF.utilities.is_numeric(val)
Check if value is numeric (float or int)
- Parameters
val (any) – Any value
- Returns
Returns true if float(val) doesn’t raise a ValueError
- Return type
bool
- IQDMPDF.utilities.run_multiprocessing(worker, queue, processes, callback=None)
Parallel processing
- Parameters
worker (callable) – single parameter function to be called on each item in queue
queue (iterable) – A list of arguments for worker
processes (int) – Number of processes for multiprocessing.Pool
callback (callable) – Optional callback function on progress update, accepts str rep of tqdm object. Final call sent with ‘complete’
- Returns
List of returns from worker
- Return type
list
Unit Testing¶
IQDM-PDF employs unit testing to ensure that updates don’t break previous examples. It also ensures that the identifiers assigned to a report parser are sufficiently unique.
New Example PDFs¶
Any modifications to report parsers require an example PDF to be included in tests/test_data/examples_reports. The expected results should be added to tests/test_data/expected_report_data.py.
Expected Report Data¶
The variable TEST_DATA in expected_report_data.py contains expected data and paths to PDFs for all vendors. An example output from TEST_DATA[vendor][example_description]:
{
"path": join(DIRECTORIES["DELTA4_EXAMPLES"], "UChicago", "DCAM_example_1.pdf"),
"data": summary_data
}
Where summary_data is the output from the report parser’s summary_data property. It’s important to use IQDMPDF.paths.DIRECTORIES to ensure source code and installed versions know where the test data is.
If adding a new vendor or report template, a new unit testing class can be added to tests/test_report_parsers.py in a fashion similar to below:
class TestNewVendor(TestReportParserBase, unittest.TestCase):
    def setUp(self):
        self.do_setup_for_vendor("new_vendor")
Then just update PARSERS near the top of test_report_parsers.py with a “new_vendor” key pointing to the new report parser.
Change Log for IQDM-PDF¶
v0.3.0 (2021.03.14)¶
Brand new Delta4 parser using only relative positions
v0.2.9 (2021.03.11)¶
Better date parsing for Delta4
Address “Set1” issue for long patient names with SNCPatientCustom
Add report_file_creation column
v0.2.8 (2021.03.07)¶
IQDM Analytics support from GUI
v0.2.7 (2021.03.04)¶
Updates to SNCPatient2020 parser
Ignore parsed values that are equal to column names
Added analysis_columns property for IQDM Analytics support
v0.2.6 (2021.02.11)¶
New Custom SNCPatient parser using relative positions
v0.2.5 (2021.01.27)¶
PTW VeriSoft: Collect ‘Set Zero’ data
Use csv standard library for CSV writing
v0.2.4 (2021.01.24)¶
Support for PTW VeriSoft
v0.2.3 (2021.01.21)¶
Added optional alternates in JSON templates
Added optional numeric flag to make sure value is or is not numerical
Added optional ignored flag to ignore any returned value in this array
v0.2.2 (2021.01.16)¶
Multi-threading support