Part 2: Automated medical physics quality control (QC) in Radiation Oncology


Deep learning models are used for various radiation oncology tasks to improve efficiency, increase accuracy and reduce errors. For example, these models may be used for:

  • outlining organs at risk on CT or MRI image data;
  • predicting patient-specific quality assurance results;
  • developing novel dose calculation algorithms [1]; and
  • comparing proton and photon radiation therapy plans to determine whether a patient would benefit from proton therapy [2].

Dr. Michael Douglass, PhD, CMPS, MACPSEM

Young Achiever Award 2021


Medical Physicist & Research Fellow

Royal Adelaide Hospital,

University of Adelaide,

Australian Bragg Centre for Proton Therapy and Research


One largely overlooked area is routine machine performance quality control (QC) of linear accelerators and other radiation delivery devices. In a recent literature review on QC of radiation therapy equipment, I found only three articles describing machine learning for linear accelerator-specific QA tasks [3,4]. Yet many, if not most, medical physics departments worldwide use some form of automated analysis as part of their routine linear accelerator QA.

Tasks involving phantom imaging tests on the linear accelerator EPID or on-board kV imaging system are the easiest to automate. Some examples of image QC accuracy, reproducibility or function tests are:

  • kV and MV image quality measurements; and
  • physical checks of the linear accelerator:
    • mechanical isocentre;
    • radiation field size;
    • radiation “star shots”; and
    • ‘Picket Fence’ MLC tests.

If the regular QA tests of the EPID show stable dosimetric and geometric performance, and the method of setting up the test phantom is reproducible, then software can be written to automatically detect and analyse the EPID or kV images.

‘In-house’ developed scripts or commercial automated QA solutions can make routine linear accelerator QA significantly more efficient and less time-consuming. Most of our department’s monthly QA software reliably automates the data analysis and storage of results. But there are occasions when the test data differ from the normal situation, and the code fails to analyse them properly.

Software analysis failure could be due to:

  • EPID electronic noise; or
  • dead pixels affecting EPID image quality; or
  • poor phantom setup for the test.

With the proviso that all results continue to be rigorously checked by staff to guard against unforeseen discrepancies, we decided to explore the use of deep learning software to automate linear accelerator QC checks. If successful, a knowledge-based approach could potentially overcome the limitations of traditional, rule-based analysis of EPID images. A more robust analysis procedure could then assist with medical physics QC tests by automatically eliminating the aforementioned software analysis failures.


Example of deep learning analysis: Linear accelerator QC tests

We decided to test this hypothesis by automating the analysis of EPID images obtained with the Winston-Lutz test. This is an important test because it determines (amongst other things) the congruence of the room lasers with the linear accelerator’s radiation isocentre point in space (Figure 1).

Figure 1. Winston-Lutz set-up with Linac and lasers. (Figure from: M. Darcis, G. Leurs, et al., Automated Winston-Lutz test for efficient quality control in stereotactic radiosurgery, 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), October 2017.)

A phantom containing a centrally placed radio-opaque ball bearing is positioned on the treatment couch. The ball bearing (indicated by cross-lines scribed on the phantom surface) is aligned to the point in space described by the intersecting lateral, vertical and inferior room lasers.

With the phantom ball bearing positioned at the linear accelerator isocentre, EPID panel or film images are recorded during linear accelerator X-ray beam exposures at multiple gantry, collimator and couch angles (Figure 2). The test measures the alignment accuracy between the laser isocentre and the central axis of the linear accelerator radiation field at all angles and orientations; this is referred to as a measure of their congruence. The test is also suitable for identifying mechanical issues such as gantry sag or multileaf collimator (MLC) misalignment.
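In practice, each EPID image in a Winston-Lutz run yields a ball-bearing centre and a radiation field centre, and the test result is the set of displacements between them across all angles. A minimal Python sketch of that final arithmetic (the angles, coordinates and pixel pitch below are illustrative, not measured values):

```python
import math

# Hypothetical per-image results from a Winston-Lutz run: each entry records
# the gantry angle plus the measured ball-bearing and radiation-field centres
# in EPID pixels (all values are illustrative).
measurements = [
    {"gantry": 0,   "bb": (512.4, 511.8), "field": (512.0, 512.0)},
    {"gantry": 90,  "bb": (511.6, 512.9), "field": (512.0, 512.0)},
    {"gantry": 180, "bb": (513.1, 511.2), "field": (512.0, 512.0)},
    {"gantry": 270, "bb": (512.2, 512.5), "field": (512.0, 512.0)},
]

PIXEL_MM = 0.336  # assumed EPID pixel pitch at isocentre (mm/pixel)

def displacement_mm(m):
    """2D distance between the ball-bearing and field centres, in mm."""
    dx = (m["bb"][0] - m["field"][0]) * PIXEL_MM
    dy = (m["bb"][1] - m["field"][1]) * PIXEL_MM
    return math.hypot(dx, dy)

shifts = [displacement_mm(m) for m in measurements]
print(f"max BB-to-field displacement: {max(shifts):.2f} mm")
```

The maximum (or mean) displacement over all gantry, collimator and couch angles is what is compared against the departmental tolerance.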


Automated Analysis of EPID Images

The deep learning model must first be trained to analyse the Winston-Lutz images. Each training image requires a mask identifying which pixels correspond to the ball bearing shown in the radiation field; from many such annotated examples, the model learns the relationship. As with the organ-at-risk segmentation tasks mentioned earlier, producing these masks manually is very time-consuming and suffers the same subjectivity and variability issues (i.e. which pixels belong to the ball bearing and which to the MLC field).

Figure 2. Examples of EPID images obtained during a Winston-Lutz test. Both images contain a radio-opaque ball bearing at the centre of an MLC defined radiation field. The left image uses a rectangular MLC field, while the right uses a circular MLC field [7].

Traditionally, these images would be analysed automatically with a Canny edge detection algorithm (or similar), which uses the change in contrast across the image to locate the ball bearing in the MLC X-ray field. The method is acceptable for most cases, but there are situations where the measured EPID images differ in some small way from the baseline images used to calibrate the code.

For example, a few bright pixels in the image (caused by electronic or imaging noise in the EPID panel) can be sufficient to disturb the precisely calibrated threshold values of the edge detection code and yield an inaccurate prediction of the ball bearing or MLC field position.
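To see why a hand-tuned threshold is fragile, consider a toy stand-in for such a pipeline: a fixed-threshold gradient edge detector applied to a synthetic EPID-like image, with and without a few hot pixels. This is an illustration of the failure mode only, not the clinical code; all values are invented.

```python
import numpy as np

def make_image(noise_pixels=0, seed=0):
    """Synthetic EPID-like image: bright MLC field, dark ball bearing,
    optional random hot pixels mimicking electronic noise."""
    rng = np.random.default_rng(seed)
    img = np.zeros((120, 120))
    yy, xx = np.mgrid[:120, :120]
    img[(np.abs(yy - 60) < 30) & (np.abs(xx - 60) < 30)] = 1.0  # MLC field
    img[(yy - 60) ** 2 + (xx - 60) ** 2 < 8 ** 2] = 0.2          # ball bearing
    for _ in range(noise_pixels):                                 # hot pixels
        img[rng.integers(0, 120), rng.integers(0, 120)] = 5.0
    return img

def edge_pixels(img, threshold=0.3):
    """Fixed-threshold gradient edge detector (stand-in for tuned Canny)."""
    gy, gx = np.gradient(img)
    return np.hypot(gx, gy) > threshold

clean = edge_pixels(make_image())
noisy = edge_pixels(make_image(noise_pixels=5))
print(clean.sum(), noisy.sum())  # hot pixels create spurious extra "edges"
```

Each hot pixel produces a steep local gradient that the fixed threshold happily classifies as an edge, corrupting any subsequent contour fit.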

So, while Canny edge detection could quickly produce image and segmentation pairs for training a deep learning model, any inaccuracies in the Canny-derived ball bearing and MLC field masks would be learned by the model, making it no more accurate than the edge detection method it was meant to replace.


An Alternative Approach – Deep Learning and Synthetic Data

We decided to adopt a synthetic data approach, as used in other research areas. Generating the data this way enables us to produce high-accuracy training images with equally accurate segmentation masks of the ball bearing target in the MLC X-ray field.

Our initial idea was to build a Monte Carlo Linac model using one of the popular toolkits (such as GATE or TOPAS) to simulate the linear accelerator radiation source, the X-ray interactions with the phantom and ball bearing, and the EPID response, producing a synthesised Winston-Lutz EPID image.

Monte Carlo simulations are technically challenging and time-consuming for such a project. Even once the model is appropriately commissioned and tested, there’s still a need to:

  • run the simulations for a sufficient time to maintain a statistically high signal-to-noise ratio; and then
  • write the code to create masks of the ball bearing and MLC field used to train the deep learning model.

It’s a nightmare!


A novel approach to generating synthetic x-ray images

A second option was to synthesise fake (but visually realistic) EPID images of linear accelerator QC tests, without modelling the Linac in detail. To do this, we developed a method inspired by the movie industry.


The Optical Path Tracing Method

To test the idea, we used an open-source 3D modelling and animation package called Blender [5, 6] to model a simple Linac. The model needed only the relative distances between the Linac target, MLC bank, isocentre and EPID panel [7]. The movie industry uses tools like Blender to produce the visual effects of our favourite Disney animated films and to bring the Star Wars movies to life. Some typical examples of renders produced by Blender are shown in Figure 3 (a and b).

Figure 3(a). Blender software uses optical path tracing engines to produce photo-realistic images by simulating many of the optical properties of photons and their interactions with various materials.
Figure 3(b). Computer generated art using Blender. Figures 3(a) and 3(b) images were entered into the 2021 Better Healthcare Technology Foundation ‘ARSENIC’ competition. (https://photographyinmedicalphysics.com/winners-arsenic-2021/)

Simulating EPID images with Blender does not require any specialised or detailed knowledge of the composition or geometry of the Linac components. The appearance of the synthesised WL images is the result of some simple visual-effects trickery. Software like Blender uses optical path-tracing engines to produce photo-realistic images by simulating many of the optical properties of photons and their interactions with various materials. This is analogous to the approach used by standard Monte Carlo radiation transport toolkits like GATE or TOPAS, except that those toolkits simulate high-energy particles rather than visible light.

To build up a virtual “photograph” of the simulated scene, the paths of the optical photons are simulated one by one between the light sources and the camera (in Blender, actually traced from the camera back to the light source). The amount of noise in the image decreases as the number of photons simulated, generally referred to as the number of samples, increases. This is analogous to the number of histories used in traditional Monte Carlo simulations. The path-tracing engine used in Blender [5] (called “Cycles”) is a simplified approximation of real optical physics. It is a non-spectral render engine, meaning it cannot simulate wavelength-dependent properties such as dispersion.
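The sample-count analogy can be illustrated numerically: repeated estimates of a single pixel's value from N random "photon" contributions scatter less as N grows, falling roughly as 1/sqrt(N), the standard Monte Carlo result. This numpy sketch is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def pixel_estimates(n_samples, n_repeats=2000):
    """Estimate one pixel's brightness n_repeats times, each time by
    averaging n_samples random contributions in [0, 1). The true value
    is the mean, 0.5; the spread of the estimates is the render noise."""
    contributions = rng.random((n_repeats, n_samples))
    return contributions.mean(axis=1)

noise_16 = pixel_estimates(16).std()
noise_256 = pixel_estimates(256).std()
print(f"noise at  16 samples: {noise_16:.4f}")
print(f"noise at 256 samples: {noise_256:.4f}")  # ~4x lower, sqrt(256/16)
```

The same square-root law is why both path-traced renders and Monte Carlo dose calculations become expensive when a very clean result is required.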

The geometry used to simulate the Winston-Lutz EPID images consists of a “spot-light” style optical light source placed at the approximate location of the x-ray target source in a linear accelerator.

The multi-leaf collimators (MLCs) were then included in the geometry, represented as a series of rectangular planes, each 5 mm wide. Because optical photon transport was used to mimic the radiographic properties of the EPID images, the thickness of the MLC leaves was irrelevant: unlike X-rays, optical photons cannot pass through the simulated leaves regardless of their thickness.

The phantom used for our routine monthly QA and Winston-Lutz test was simulated as a square Perspex phantom containing six ceramic ball bearings (only the central ball bearing is used for the WL test). The dimensions were set to match our real phantom.

Finally, the EPID was simulated by placing a Blender camera object at the same source-to-imager distance used for QA. The camera was set to orthographic mode to remove perspective from the simulated images.

The model’s geometry was fully customisable from within Blender using a custom Python script. This enables properties such as:

  • MLC field shape;
  • light source intensity and size; and
  • phantom offset

to be conveniently modified and randomised for each training image.
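As a rough sketch of what that per-image randomisation looks like, the snippet below draws one set of scene parameters per training image. The class, parameter names and ranges are illustrative placeholders; in the real workflow the values would be applied to the Blender scene through its Python API (bpy).

```python
import random
from dataclasses import dataclass

# Hypothetical parameter set for one synthetic WL image; names and ranges
# are invented for illustration, not taken from the published workflow.
@dataclass
class WLSceneParams:
    field_size_mm: float       # MLC field edge length
    source_size_mm: float      # light ("x-ray") source size
    source_intensity: float    # relative source brightness
    phantom_offset_mm: tuple   # (x, y) displacement of the phantom/BB

def random_scene(rng):
    """Draw one randomised scene configuration."""
    return WLSceneParams(
        field_size_mm=rng.uniform(15.0, 25.0),
        source_size_mm=rng.uniform(0.5, 2.0),
        source_intensity=rng.uniform(0.8, 1.2),
        phantom_offset_mm=(rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)),
    )

rng = random.Random(0)
params = [random_scene(rng) for _ in range(3)]  # one set per training image
for p in params:
    print(p)
```

Randomising the scene for every render is what gives the trained model its robustness: it never sees the same field shape, offset or brightness twice.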


Synthesising Randomised Training Data

The simplicity of this synthesised training data approach makes deep learning for automated machine QA more practical for physicists and researchers not wishing to spend extensive time commissioning a traditional Linac Monte Carlo simulation.

The Python script was configured to randomise simulation parameters such as:

  • MLC leaf end rounding;
  • phantom offset;
  • EPID offset;
  • x-ray intensity;
  • MLC field shape;
  • x-ray source size; and
  • phantom scatter

for each synthetic image produced by Blender.

Figure 4. Examples of some synthetic WL EPID fields generated in the current work, with the paired binary label images showing the different parts of the WL fields. Red is out of field, green is in-field, and blue is the ball bearing.

A binary label image is also produced, indicating which pixels in the synthetic image correspond to the ball bearing and which to the MLC field. The synthesis method therefore yields a perfect set of matched image pairs for training a deep learning model, without the limitations of manually annotated ground-truth data (Figure 4).
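The "perfect label" property follows directly from drawing the scene ourselves: the class of every pixel is known by construction, so the image and its label map can be emitted together. A simplified numpy sketch (geometry and grey levels are illustrative stand-ins for the Blender render):

```python
import numpy as np

def synth_pair(size=120, field_half=30, bb_radius=8, bb_centre=(60, 60)):
    """Generate one synthetic WL image together with its exact label map."""
    yy, xx = np.mgrid[:size, :size]
    in_field = (np.abs(yy - size // 2) < field_half) & \
               (np.abs(xx - size // 2) < field_half)
    in_bb = (yy - bb_centre[0]) ** 2 + (xx - bb_centre[1]) ** 2 < bb_radius ** 2

    image = np.where(in_field, 1.0, 0.05)   # bright MLC field, dark outside
    image[in_bb] = 0.3                      # radio-opaque BB attenuates beam

    label = np.zeros((size, size), dtype=np.uint8)  # 0 = out of field
    label[in_field] = 1                             # 1 = in field
    label[in_bb] = 2                                # 2 = ball bearing
    return image, label

image, label = synth_pair()
print(image.shape, np.unique(label))  # (120, 120) [0 1 2]
```

Because `image` and `label` are derived from the same geometry, the mask is exact to the pixel, with no annotator subjectivity.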

The resolution of the training data is fully customisable in Blender and is limited only by the hardware of the computer used to render the synthetic images. Once the Blender scene was configured, the required number of synthetic images and corresponding image masks is entered into the Python script, which is then executed. Each image pair takes approximately 2-3 seconds to render and is automatically exported to separate folders for training the deep learning model.


Deep Winston-Lutz

The deep learning model, which we named deepWL (deep Winston-Lutz), was a typical three-level U-Net-style model that takes a 120×120 pixel WL image and returns a labelled binary image map with pixels representing either the MLC field or the ball bearing. The loss function (objective function) used for training was a weighted combination of categorical cross-entropy and Dice loss. During training, the loss function provides the metric that ensures the model weights are optimised in the right direction. The categorical cross-entropy component maximises the probability that the model labels a particular pixel correctly as ball bearing or MLC field, while the Dice component encourages segmentations that maintain the correct shape of each contour.
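The combined objective can be sketched in a few lines of numpy. The 50/50 weighting and the exact Dice formulation below are illustrative choices, not the published configuration; the paper [7] gives the actual details.

```python
import numpy as np

def combined_loss(probs, one_hot, ce_weight=0.5, eps=1e-7):
    """Weighted categorical cross-entropy + Dice loss.
    probs, one_hot: arrays of shape (H, W, n_classes)."""
    # Cross-entropy: average negative log-probability of the true class.
    ce = -np.mean(np.sum(one_hot * np.log(probs + eps), axis=-1))
    # Soft Dice per class: overlap between prediction and target.
    inter = np.sum(probs * one_hot, axis=(0, 1))
    denom = np.sum(probs + one_hot, axis=(0, 1))
    dice = 1.0 - np.mean((2.0 * inter + eps) / (denom + eps))
    return ce_weight * ce + (1.0 - ce_weight) * dice

# A perfect prediction should score a loss of (essentially) zero.
target = np.zeros((4, 4, 2))
target[..., 0] = 1.0
perfect = target.copy()
loss_perfect = combined_loss(perfect, target)
print(f"perfect-prediction loss: {loss_perfect:.2e}")
```

The cross-entropy term drives per-pixel accuracy, while the Dice term penalises contours whose overall shape and area disagree with the target, which is what preserves the circular BB outline.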

Given the small image size required for this model, the deep learning model could be trained without a specialised GPU and was trained entirely on a typical desktop PC in approximately one day, making this approach feasible for any radiation oncology department.


Results: Testing the model

Figure 5. The predicted probability map from deepWL, indicating which regions of the EPID image the model believes belong to the radiation field and which to the ball bearing. Yellow indicates a 100% probability of ball bearing. The probability of a pixel belonging to the radiation field decreases sharply at the edge of the ball bearing.


Once the model was trained and validated on the synthetic training data, it was evaluated on real measured data from an actual EPID and compared with a typical edge detection algorithm. A typical prediction, shown in Figure 5, is the calculated probability that each pixel in the WL image is part of the MLC field. The figure shows that the model has correctly identified the MLC field region and excluded the area it predicts belongs to the ball bearing.

With this information and a specified confidence threshold, the probability map can be converted into contours corresponding to the MLC field and the ball bearing boundary. The centre of each contour can then be calculated as the weighted or unweighted geometric mean of its pixels, enabling the relative displacement between the two to be determined.
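That post-processing step, thresholding the probability maps and differencing the region centroids, can be sketched as follows, using synthetic stand-ins for the predicted maps:

```python
import numpy as np

def centroid(mask):
    """Unweighted centroid (row, col) of a binary region."""
    ys, xs = np.nonzero(mask)
    return ys.mean(), xs.mean()

# Synthetic stand-ins for the model's per-class probability maps: a square
# field centred at (60, 60) and a BB deliberately offset to (62, 59).
size = 120
yy, xx = np.mgrid[:size, :size]
p_field = ((np.abs(yy - 60) < 30) & (np.abs(xx - 60) < 30)).astype(float)
p_bb = ((yy - 62) ** 2 + (xx - 59) ** 2 < 8 ** 2).astype(float)

threshold = 0.5  # confidence threshold applied to the probability maps
field_c = centroid(p_field > threshold)
bb_c = centroid(p_bb > threshold)
dy, dx = bb_c[0] - field_c[0], bb_c[1] - field_c[1]
print(f"BB offset from field centre: ({dy:+.1f}, {dx:+.1f}) pixels")  # (+2.0, -1.0)
```

Multiplying the pixel offsets by the EPID pixel pitch converts them to the millimetre displacements reported by the test.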

Figure 6. Activation maps from various feature sets in the deepWL model, effectively showing what the deep learning model considers the essential features in the Winston-Lutz images for determining the offset of the ball bearing from the centre of the radiation field. Dark blue indicates unimportant features; brighter yellow or green indicates important ones.


Figure 6 shows a small subset of activation maps from the deepWL model. These images were generated by passing a real WL image through the model and observing which pixels in each feature map it activates. In essence, they show what the deep learning model “thinks” are the essential features of the EPID images for identifying the MLC field and ball bearing contours. The bright regions correspond to each of the four edges of the MLC field and to the ball bearing, mimicking how a human would analyse a WL image manually.

Figure 7. A comparison of Winston Lutz predicted displacements by DeepWL and Canny for three different Varian EPID panels.

The model was tested on measurements obtained from three different EPID models – a Varian AS1200, AS1000 and AS500 – each with a different image resolution. Figure 7 compares the ball bearing versus MLC field displacements predicted by deepWL and by a traditional Canny edge detection algorithm. As expected, there is a linear relationship between these predictions for all EPID panels.

Figure 8. Predicted ball bearing displacements (mm) using Canny edge detection and DeepWL compared to predictions performed manually by a medical physicist.

However, further analysis revealed that the deepWL predictions were more consistent with manual human annotations than the Canny edge detection approach (Figure 8).

Figure 9. Predictions of the ball bearing and MLC field using deepWL and Canny edge detection. The deep learning approach produces more accurate and robust predictions in extreme/outlier cases.


Figure 9 shows an extreme example in which the phantom has been displaced by an unrealistically large amount relative to the MLC field, together with the contours predicted by the deep learning and edge detection methods. As the figure demonstrates, the DL model accurately detects both the ball bearing and the radiation field, while the Canny edge detection approach has difficulty separating the two regions.


Discussion

One exciting result from this project was that the deep learning model produced ball bearing contours whose shape was more consistent with the actual ball bearing than those from the Canny approach. For example, the deep learning contours of the ball bearing were statistically more circular than the edge detection contours, consistent with the spherical shape of the real ball bearing. Circularity was evaluated in terms of the eccentricity of the binary pixel map of the ball bearing.
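Eccentricity from second-order image moments is one standard way to quantify this circularity; a small numpy sketch (0 for a perfect circle, approaching 1 for an elongated region; the test shapes are illustrative):

```python
import numpy as np

def eccentricity(mask):
    """Eccentricity of a binary region from its second-order central
    moments, via the equivalent-ellipse axis variances."""
    ys, xs = np.nonzero(mask)
    y0, x0 = ys.mean(), xs.mean()
    mu20 = np.mean((xs - x0) ** 2)
    mu02 = np.mean((ys - y0) ** 2)
    mu11 = np.mean((xs - x0) * (ys - y0))
    common = np.sqrt(4 * mu11 ** 2 + (mu20 - mu02) ** 2)
    lam1 = (mu20 + mu02 + common) / 2  # major-axis variance
    lam2 = (mu20 + mu02 - common) / 2  # minor-axis variance
    return np.sqrt(1 - lam2 / lam1)

yy, xx = np.mgrid[:100, :100]
circle = (yy - 50) ** 2 + (xx - 50) ** 2 < 20 ** 2
ellipse = ((yy - 50) / 10.0) ** 2 + ((xx - 50) / 20.0) ** 2 < 1
print(f"circle:  {eccentricity(circle):.3f}")   # near 0
print(f"ellipse: {eccentricity(ellipse):.3f}")  # clearly elongated
```

Applied to predicted BB masks, a lower eccentricity indicates a contour closer to the expected circular projection of the spherical ball bearing.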


Conclusions

In summary, the essential findings from this study were:

  • Deep learning models for analysing EPID based linear accelerator QA are more robust to changes in EPID image features such as image quality and test setup variations.
  • The deep learning model produces predictions of the shape of the target objects (in this case, the ball bearing) which are more consistent with the actual shape of the ball bearing. In essence, the knowledge-based approach of deep learning considers both the learned appearance of a ball bearing and the raw pixel values.
  • The relative simplicity of QA EPID images (compared with clinical CT data, for example) makes synthesising training data for deep learning models feasible without requiring extra “real” data.
  • Optical path-tracing for generating synthetic x-ray EPID images is a viable and relatively easy approach for generating synthetic data. It may be a more straightforward option than conventional Monte Carlo radiation transport simulations.

So, while this approach shows signs of being superior to conventional automated methods for the WL test, the improvement is unlikely to be clinically significant for this particular test.

However, this approach could be utilised for more complex quality assurance tests and provide more robust results. For example, imagine imaging a CatPhan or a similar image quality phantom for routine quality assurance.

Conventional algorithms for analysing these images require the phantoms to be aligned within a small region so that the algorithm can search pre-defined areas of interest and look for specific features in those regions to perform the analysis. If the phantom were rotated significantly, the features would no longer lie within the bounding boxes, and the analysis would fail.

Now consider a deep learning analysis model trained to detect these features in scenarios where the phantom is rotated or flipped as well as nominally aligned. The deep learning model applies learned knowledge rather than following a fixed series of instructions, and so has the potential to be more robust to setup variations than conventional analysis techniques.

Although not used in the current work, the way the synthetic data were generated means that, in addition to the masks for the BB and MLC fields, Blender can output the precise displacement values themselves. Another deep learning model could therefore potentially be trained to predict the BB’s displacement from the MLC field directly, without any secondary analysis of the BB and MLC contours.

Please read our paper published in Physica Medica [7] for more technical details on this workflow.

References

[1] J. Keal, A.M. Caraça Santos, S. Penfold, M. Douglass, Radiation Dose Calculation in 3D Heterogeneous Media Using Artificial Neural Networks, Medical Physics, 48(5), 2021. https://doi.org/10.1002/mp.14780

[2] J. Kouwenberg, J. Penninkhof, B. Heijmen, Model-based patient pre-selection for intensity-modulated proton therapy (IMPT) using automated treatment planning and machine learning, Radiotherapy and Oncology, 2021. https://doi.org/10.1016/j.radonc.2021.02.034

[3] El Naqa, I., Irrer, J., Ritter, T.A., DeMarco, J., Al-Hallaq, H., Booth, J., Kim, G., Alkhatib, A., Popple, R., Perez, M., Farrey, K. and Moran, J.M. (2019), Machine learning for automated quality assurance in radiotherapy: A proof of principle using EPID data description. Med. Phys., 46: 1914-1921. https://doi.org/10.1002/mp.13433

[4] Wei Zhao, Ishan Patil, Bin Han, Yong Yang, Lei Xing, Emil Schüler, 2020, Beam data modelling of linear accelerators (linacs) through machine learning and its potential applications in fast and robust Linac commissioning and quality assurance, Radiotherapy and Oncology, 153, 122-129, https://doi.org/10.1016/j.radonc.2020.09.057.

[5] My contribution to the release of Blender version 3.0 in which I demonstrated some of my medical physics research conducted using Blender. https://youtu.be/rJ48-SYY1sQ

[6] The Blender website. https://www.blender.org/

[7] M.J.J. Douglass, J.A. Keal, DeepWL: Robust EPID based Winston-Lutz analysis using deep learning, synthetic image generation and optical path-tracing, Physica Medica, Volume 89, 2021, Pages 306-316, ISSN 1120-1797. https://doi.org/10.1016/j.ejmp.2021.08.012

————————————————————————————-


Michael Douglass PhD, 1 May 2022