Digitizing Diagnoses With Histopathological Slides

Published by Daniel Lichtblau

Wolfram Research, Champaign, Illinois, United States of America

These findings are described in the article entitled Cancer diagnosis through a tandem of classifiers for digitized histopathological slides, recently published in the journal PLoS One (2019). This work was conducted by Daniel Lichtblau from Wolfram Research and Catalin Stoean from the University of Craiova.

Tissue slides are frequently used to make medical diagnoses. One example is the use of hematoxylin and eosin (H&E) stained slides to assess the presence and grade (severity) of cancer. Once the slides are available, they are typically evaluated by trained pathologists.

While this usually leads to appropriate diagnoses, there are several potential issues. One is that different pathologists might (and sometimes do) grade the same slide differently. Another is that the same pathologist might, on different occasions, grade the same slide differently (lighting, fatigue, etc. all play a role here). Yet another is that as diagnostic technology becomes less expensive and more widely used, there may be locales for which samples can be prepared but there are not sufficiently many trained pathologists to process them. For all these reasons, automated diagnosis software is viewed as a way to ameliorate the workload and also to get second (or third) opinions, in a way that is unbiased (or, more correctly, tends to have different biases from those of human pathologists).

Prior literature has made good use of image processing and machine learning (ML) methods for the purpose of automating diagnoses. A typical algorithm workflow involves the following steps:

  1. Image segmentation and related methods for obtaining various “measures” in given images.
  2. Feeding many such measures into ML classifiers, with a view toward determining which features are “important” as predictors of the actual diagnosis.
  3. Further training of classifiers using the determined features.

Such methods tend to require considerable time, computational resources, and a good idea in advance of what set of image features might be useful.

Our approach is more direct. We let the ML classifiers determine relevant features, and only provide training data in the form of slides and corresponding diagnoses (as determined by more than one pathologist, under careful conditions). In addition, we employ an unrelated method that grades an image by its proximity to “nearby” images of known grade (based on prior published work involving tandem usage of Fourier and Principal Components methods, by the first author).
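The proximity-based grading idea can be illustrated with a minimal Python/NumPy sketch. This is not the authors' implementation: the choice of features (low-frequency FFT magnitudes), the number of principal components, the k-nearest-neighbor majority vote, and all function names here are assumptions made purely for illustration.

```python
import numpy as np

def fourier_features(images, keep=8):
    """Flattened low-frequency FFT magnitudes for a stack of grayscale images.

    `images` has shape (n_images, height, width); `keep` is a hypothetical
    cutoff on how many low frequencies to retain along each axis.
    """
    spectra = np.abs(np.fft.fft2(images, axes=(-2, -1)))
    return spectra[:, :keep, :keep].reshape(len(images), -1)

def fit_pca(features, n_components=5):
    """Return the feature mean and the leading principal axes (via SVD)."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]

def grade_by_neighbors(train_images, train_grades, query_images, k=3):
    """Give each query image the majority grade of its k nearest training images.

    `train_grades` must be a NumPy array of non-negative integer grades.
    """
    train_f = fourier_features(train_images)
    mean, axes = fit_pca(train_f)
    train_z = (train_f - mean) @ axes.T
    query_z = (fourier_features(query_images) - mean) @ axes.T
    # Pairwise Euclidean distances in the reduced space: (n_query, n_train).
    dists = np.linalg.norm(query_z[:, None, :] - train_z[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_grades[nearest]
    return np.array([np.bincount(v).argmax() for v in votes])
```

Because such a grader never sees the features learned by a neural network or other ML classifier, its errors have a chance of falling on different images, which is the property the ensemble relies on.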

The benefit is that this method tends to correlate less strongly with the ML classifiers than they correlate with one another. Informally, it makes its mistakes in different places, and thus serves to offset incorrect grading from the standard ML approaches. Along the same lines, we use a validation method to create an ensemble weighting of the multiple classifiers. We also provide a confidence measure: by applying thresholds to the overall probabilities, we can assess the reliability of a given diagnosis. For the main data set in this study, this measure shows that roughly 70% of the diagnoses are quite trustworthy (and correct), with most if not all errors occurring in the rest.
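As a rough illustration of validation-based ensemble weighting with a confidence threshold, consider the Python/NumPy sketch below. The article does not specify the weighting scheme, so weighting each classifier in proportion to its validation accuracy is an assumption, as are the function names and the 0.8 threshold.

```python
import numpy as np

def validation_weights(val_probs, val_labels):
    """Weight each classifier by its accuracy on held-out validation data.

    `val_probs` is a sequence of arrays, one per classifier, each of shape
    (n_samples, n_classes). Accuracy-proportional weights are an assumption,
    not the scheme used in the paper.
    """
    acc = np.array([(p.argmax(axis=1) == val_labels).mean() for p in val_probs])
    return acc / acc.sum()

def ensemble_predict(test_probs, weights, threshold=0.8):
    """Combine class probabilities and flag the high-confidence predictions.

    Returns the predicted labels and a boolean mask that is True where the
    top combined probability is at least `threshold` (a hypothetical value).
    """
    # Weighted sum over classifiers: result has shape (n_samples, n_classes).
    combined = np.tensordot(weights, np.asarray(test_probs), axes=1)
    labels = combined.argmax(axis=1)
    confident = combined.max(axis=1) >= threshold
    return labels, confident
```

Diagnoses falling below the threshold are the natural candidates to route to a human pathologist for review.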

The main novelty in this work is the methodology for creating an ensemble score. Tests on three benchmark data sets show that our approach is competitive with methods that require more intensive processing. A second aspect, of independent interest, is that the benchmark tests cover two different cancer types (colorectal and breast), thus giving some confidence that the methodology might be extensible.

There are some future directions under consideration. We first note that H&E tissue images can pose certain difficulties for machine learning methods. One is that results should be independent of slide orientation. Another is that different levels of coloration might be due to different lab set-ups or lighting differences in creating electronic images from actual slides. A third is that tissue inhomogeneities sometimes arise from boundaries with unrelated tissue rather than benign/malignancy borders. Possible future experiments involve color deconvolution (to offset the effect of inter-lab differences), and use of images averaged over rotations to minimize the impact of both orientation and tissue inhomogeneities. Also, we might extend to a different cancer type, such as leukemia, for which there exists a large benchmark set of stained slide images.
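One simple reading of "images averaged over rotations" is sketched below in Python/NumPy. It is an assumption about how such an experiment might be set up, not something taken from the paper: it uses only the four 90-degree rotations and requires square images, and the function name is hypothetical.

```python
import numpy as np

def average_over_rotations(image):
    """Average a square image over its four 90-degree rotations.

    Works for grayscale (H, W) or color (H, W, C) arrays; rotation is applied
    in the plane of the first two axes. The result is unchanged by any further
    90-degree rotation.
    """
    if image.shape[0] != image.shape[1]:
        raise ValueError("expects a square image (height == width)")
    rotations = [np.rot90(image, k, axes=(0, 1)) for k in range(4)]
    return np.mean(rotations, axis=0)
```

An alternative with the same goal is to leave the images untouched and instead average a classifier's predicted probabilities over the rotated copies of each image.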

About The Author

Daniel Lichtblau

Daniel Lichtblau is a Mathematica developer at the Wolfram Research Kernel Group.

