Objective of the challenge
The objective of the CoNIC grand challenge was to develop algorithms that perform segmentation, classification, and counting of six different types of nuclei within the current largest known publicly available nucleus-level dataset in computational pathology (CPath), containing around half a million labelled nuclei.
This challenge consists of two separate tasks:
Task 1: Nuclear segmentation and classification:
The first task requires participants to segment nuclei within the tissue, while also classifying each nucleus into one of the following categories: epithelial, lymphocyte, plasma, eosinophil, neutrophil, or connective tissue.
Task 2: Prediction of cellular composition:
For the second task, participants must predict how many nuclei of each class are present in each input image.
The output of Task 1 can be used directly to solve Task 2, but the two can also be treated as independent tasks.
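For example, if Task 1 produces a nuclear instance map and a per-pixel class map, the Task 2 counts can be read off by counting the unique instance ids per class. A minimal sketch, assuming this particular map layout and class ordering (neither is specified by the text above):

```python
import numpy as np

# Class ordering here is an assumption for illustration only.
CLASSES = ["epithelial", "lymphocyte", "plasma",
           "eosinophil", "neutrophil", "connective"]

def cellular_composition(instance_map, class_map):
    """Derive Task 2 counts from a Task 1 prediction: for each class,
    count the unique instance ids appearing under that class label."""
    counts = {}
    for idx, name in enumerate(CLASSES, start=1):
        ids = np.unique(instance_map[class_map == idx])
        counts[name] = int((ids > 0).sum())  # id 0 is background
    return counts
```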
Introduction
Nuclear segmentation, classification, and quantification within H&E-stained histology images enable the extraction of interpretable cell-based features that can be used in downstream models in computational pathology (CPath). To help drive forward research and innovation in automatic nuclei recognition in CPath, the Colon Nuclei Identification and Counting (CoNIC) Challenge requires researchers to develop algorithms that perform segmentation, classification, and counting of six different types of nuclei within the current largest known publicly available nucleus-level dataset in CPath, containing around half a million labelled nuclei.
Methods
Datasets used
The Lizard dataset was used for this challenge; it is currently the largest known publicly available dataset for nuclear instance segmentation in computational pathology. The dataset consists of Hematoxylin and Eosin (H&E)-stained histology images at 20x objective magnification (~0.5 microns/pixel) from six different data sources.
For each image, an instance segmentation and a classification mask are provided. Within the dataset, each nucleus is assigned to one of the following categories:
- Epithelial
- Lymphocyte
- Plasma
- Eosinophil
- Neutrophil
- Connective tissue
Training set and methods
The Lizard dataset contains 238 images of varying size, from which we extracted patches.
- 204 of the 238 images are used for training and the remaining 34 for validation.
- We extracted 244x244 patches from each image, with a random overlap of between 150 and 200 pixels.
- We applied mirror padding at the edges to resize each patch to 256x256.
- We used Reinhard color normalization at the patch level to address the stain color variation across data sources.
- We also added an H branch, where H refers to the Hematoxylin component.
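The patch-extraction and padding steps above can be sketched as follows. The helper name and the exact sliding strategy are assumptions; the text only specifies the 244x244 patch size, the 150-200 pixel random overlap, and the mirror padding to 256x256:

```python
import numpy as np

def extract_patches(image, patch=244, pad_to=256, lo=150, hi=200, rng=None):
    """Slide a 244x244 window with a random 150-200 px overlap between
    neighbouring patches, then mirror-pad each patch to 256x256.
    A sketch, not the authors' exact extraction code."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = image.shape[:2]
    margin = (pad_to - patch) // 2  # 6 px on each side for 244 -> 256
    patches = []
    y = 0
    while y + patch <= h:
        x = 0
        while x + patch <= w:
            p = image[y:y + patch, x:x + patch]
            p = np.pad(p, ((margin, margin), (margin, margin), (0, 0)),
                       mode="reflect")  # mirror padding at the edges
            patches.append(p)
            x += patch - int(rng.integers(lo, hi + 1))  # stride = 244 - overlap
        y += patch - int(rng.integers(lo, hi + 1))
    return patches
```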
In this case study, we aim to use the above unique characteristic of the H&E stain as Hematoxylin-aware guidance for the segmentation network. To achieve this, we apply a color decomposition technique (Ruifrok et al., 2001) to separate the Hematoxylin component from the original RGB image. This approach is commonly used as a color-normalization pre-processing step in traditional methods because of its robustness to color inconsistency in H&E-stained WSIs.
We assume that the grey level in each RGB channel is linear in the light transmission rate T = I/I0, where I0 is the incident light and I is the transmitted light. Each specific stain is therefore characterized by a specific absorption factor c for the light in each of the three RGB channels, and we can model the relationship between the amount of stain and its absorption using the Beer-Lambert law (Parson, 2007), under which the optical density OD = -log10(I/I0) is proportional to the stain amount.
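Because the optical density is linear in the stain amount, the Hematoxylin component can be recovered by unmixing the per-pixel OD with the stain matrix of Ruifrok et al. (2001). A minimal sketch using the widely published H&E stain vectors (the exact vectors and log base used by the authors are assumptions):

```python
import numpy as np

# Ruifrok & Johnston H&E stain OD vectors (rows: Hematoxylin, Eosin,
# residual) -- commonly published values, assumed here.
HED = np.array([[0.650, 0.704, 0.286],
                [0.072, 0.990, 0.105],
                [0.268, 0.570, 0.776]])
HED /= np.linalg.norm(HED, axis=1, keepdims=True)

def hematoxylin_component(rgb):
    """Beer-Lambert: OD = -log10(I / I0). Unmix the per-pixel OD into
    stain amounts and return the Hematoxylin channel."""
    rgb = np.asarray(rgb, dtype=np.float64) / 255.0  # transmission T = I / I0
    od = -np.log10(np.clip(rgb, 1e-6, 1.0))          # optical density
    amounts = od @ np.linalg.inv(HED)                # per-pixel stain amounts
    return amounts[..., 0]                           # H channel only
```

The resulting H channel can then be fed to the network's H branch alongside the RGB input.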
The data augmentation techniques used are affine transformation, rotation, flipping, and Gaussian blur. Some background-free images, containing only nuclei of the different classes, are also introduced using the instance-map labels.
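A simplified sketch of such a joint image-and-mask augmentation, restricted here to 90-degree rotations, flips, and a cheap separable blur (the full affine transforms and the blur parameters used by the authors are not specified in the text):

```python
import numpy as np

def augment(image, mask, rng):
    """Apply a random 90-degree rotation and horizontal flip jointly to
    image and mask, plus an approximate Gaussian blur on the image only.
    A sketch of the augmentations described above, not the exact pipeline."""
    k = int(rng.integers(0, 4))
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.random() < 0.5:
        image, mask = np.flip(image, axis=1), np.flip(mask, axis=1)
    if rng.random() < 0.5:
        # 1-2-1 binomial kernel as a cheap separable Gaussian approximation
        kern = np.array([1.0, 2.0, 1.0]) / 4.0
        for ax in (0, 1):
            image = np.apply_along_axis(
                lambda v: np.convolve(v, kern, mode="same"), ax, image)
    return image, mask
```

Geometric transforms are applied to image and mask together so labels stay aligned; the blur touches only the image, since blurring an instance map would corrupt its ids.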
We used a StepLR scheduler to decay the learning rate and Adam as the optimizer. Training was performed on a single split. The model was trained for 70 epochs, after which we observed that the loss plateaued.
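For reference, StepLR decays the learning rate by a constant factor every fixed number of epochs. A minimal stand-alone sketch of that schedule (the base rate, step size, and decay factor below are assumed values, not reported in the text):

```python
def step_lr(base_lr, epoch, step_size=20, gamma=0.5):
    """StepLR schedule: multiply the learning rate by gamma once every
    step_size epochs. step_size and gamma here are illustrative only."""
    return base_lr * gamma ** (epoch // step_size)
```

With these assumed parameters, a 70-epoch run would see the rate halved at epochs 20, 40, and 60.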
