Classification#

Classification is the process of assigning images to predetermined categories. It can be used to discriminate images by criteria of interest, e.g. different cell types, phenotypes or tissue types.

../_images/classification.png

Classification of different types of circulating tumor cells (CTCs) [1]#

Labelling#

To train a neural network for classification, the images must be labelled accordingly.

../_images/classif_tools.png

Classification tools#

Different tools and functions in MIA exist to label images for classification.

To label an image, select a tool and use it to assign a class label to the image (see Tools below).

Note

There is no save option as all changes during labelling are saved immediately. Use reset or ctrl + z to undo the last labelling action.

All labels are saved in a subfolder inside the currently active folder, with the same file name as the currently selected image but with .npz as the file extension.
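The internal layout of these .npz archives is not documented here, but as a rough sketch (the file path and key names are hypothetical), a label file can be inspected with NumPy:

```python
import numpy as np

# Hypothetical path: one .npz per image, stored in a subfolder of the
# active folder. The actual key names inside the archive may differ.
archive = np.load("labels/cell_01.npz", allow_pickle=True)
for key in archive.files:
    print(key, archive[key])
```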

Tools#

As classification only needs a single label for each image, the tools are simpler than in other applications and you do not have to explicitly click inside the image.

Pressing assignclass or F1 assigns the currently selected class to the image.

Pressing setclass or F2 assigns the currently selected class to the image and selects the next image.

Tip

The classification label is shown as a border around the image in the color of the corresponding class.

Training#

For details about neural network training see Training.

Neural Network architectures#

For classification there is no extra architecture implemented, meaning that the architecture is identical to the backbone.

Tip

All architectures are implemented without fully connected layers, even if the original architecture had them (like the VGG nets).
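To illustrate the idea, the following minimal Keras sketch (not MIA's actual code; the class count and input size are assumptions) builds a classifier from a VGG16 backbone without its fully connected top, optionally with ImageNet weights:

```python
import tensorflow as tf

NUM_CLASSES = 3  # placeholder: e.g. three cell types

# Backbone without the original fully connected layers (include_top=False),
# initialized with ImageNet weights as in the pretrained option above.
backbone = tf.keras.applications.VGG16(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)

# Global pooling + a dense softmax layer replace the fully connected head.
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.summary()
```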

The following backbones are currently supported, either untrained or with weights pre-trained on the ImageNet dataset [2]:

| Model | Options | Ref. |
|---|---|---|
| DenseNet | densenet121, densenet169, densenet201 | [3] |
| EfficientNet | efficientnetb0, efficientnetb1, efficientnetb2, efficientnetb3, efficientnetb4, efficientnetb5, efficientnetb6, efficientnetb7 | [4] |
| Inception | inceptionv3, inceptionresnetv2 | [5] |
| MobileNet | mobilenet, mobilenetv2 | [6] |
| NasNet | nasnetlarge, nasnetmobile | [7] |
| ResNet | resnet18, resnet34, resnet50, resnet101, resnet152, resnet50v2, resnet101v2, resnet152v2 | [8] |
| ResNeXt | resnext50, resnext101 | [9] |
| SE-ResNet | seresnet18, seresnet34, seresnet50, seresnet101, seresnet152 | [10] |
| SE-ResNeXt | seresnext50, seresnext101 | [10] |
| SENet | senet154 | [10] |
| VGG | vgg16, vgg19 | [11] |
| Xception | xception | [12] |

Tip

  • Generally, the number behind the backbone name gives either the number of convolutional layers (e.g. resnet18) or the model version (e.g. inceptionv3).

  • When you have limited computing resources, use a small network architecture or a network optimized for efficiency (e.g. mobilenetv2).

  • Of the supported backbones, nasnetlarge shows the highest performance on ImageNet classification.

  • Of the supported backbones, mobilenet has the fastest processing time and the fewest parameters.

Losses and Metrics#

For classification, several objective functions are available for neural network optimization; they directly impact the model training. Metrics are used to measure the performance of the trained model, but are independent of the optimization and the training process. The loss and metric functions can be set in Train Model → Settings.

Cross Entropy#

The cross entropy loss is a widely used objective function for classification. It is defined as:

\[L_{CE} = -\sum_{i=1}^{n}{p_i \log(q_i)},\]

with \(p_i\) the true label and \(q_i\) the model prediction for the \(i\)-th class.
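For a one-hot label the sum reduces to the negative log-probability of the true class; a small NumPy check (values chosen purely for illustration):

```python
import numpy as np

p = np.array([0.0, 1.0, 0.0])   # true one-hot label
q = np.array([0.1, 0.7, 0.2])   # model prediction (softmax output)

eps = 1e-12                      # guard against log(0)
ce = -np.sum(p * np.log(q + eps))
print(ce)                        # -log(0.7) ≈ 0.357
```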

Focal Loss#

The focal loss is an extension of the cross entropy, which improves performance for unbalanced datasets [13]. It is defined as follows:

\[L_{FL} = -\sum_{i=1}^{n}{(1-q_i)^\gamma p_i \log(q_i)},\]

with \(\gamma\) as the focusing parameter; the default is \(\gamma = 2\).
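Reusing the NumPy example from the cross entropy section, the extra factor \((1-q_i)^\gamma\) down-weights classes the model already predicts confidently:

```python
import numpy as np

p = np.array([0.0, 1.0, 0.0])   # true one-hot label
q = np.array([0.1, 0.7, 0.2])   # model prediction

gamma = 2.0                      # focusing parameter (default)
eps = 1e-12
fl = -np.sum((1 - q) ** gamma * p * np.log(q + eps))
print(fl)                        # (1 - 0.7)^2 * -log(0.7) ≈ 0.032
```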

Kullback-Leibler Divergence#

The Kullback-Leibler divergence, sometimes referred to as relative entropy, is defined as follows:

\[L_{KL} = \sum_{i=1}^{n}{p_i (\log(p_i)-\log(q_i))}.\]
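Note that for one-hot labels the \(p_i \log(p_i)\) term vanishes, so the KL divergence coincides with the cross entropy. A small NumPy check with soft labels (illustrative values):

```python
import numpy as np

p = np.array([0.1, 0.8, 0.1])   # soft true distribution
q = np.array([0.2, 0.6, 0.2])   # model prediction

eps = 1e-12
kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)))
print(kl)                        # ≈ 0.092
```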

Accuracy#

The accuracy measures the fraction of images that are classified correctly:

\[L_{acc} = \frac{t_p + t_n}{t_p + t_n + f_p + f_n},\]

with \(t_p\) the true positives (\(p_i=1\) and \(q_i=1\)), \(t_n\) the true negatives (\(p_i=0\) and \(q_i=0\)), \(f_p\) the false positives (\(p_i=0\) and \(q_i=1\)) and \(f_n\) the false negatives (\(p_i=1\) and \(q_i=0\)). The accuracy is a misleading measure for imbalanced data.
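As a quick illustration (arrays chosen for the example), the confusion counts and the accuracy can be computed directly from binary label vectors:

```python
import numpy as np

y_true = np.array([1, 1, 0, 0, 1, 0])   # true labels
y_pred = np.array([1, 0, 0, 1, 1, 0])   # predicted labels

tp = np.sum((y_true == 1) & (y_pred == 1))   # 2
tn = np.sum((y_true == 0) & (y_pred == 0))   # 2
fp = np.sum((y_true == 0) & (y_pred == 1))   # 1
fn = np.sum((y_true == 1) & (y_pred == 0))   # 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)                              # 4/6 ≈ 0.667
```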

Precision#

The precision measures how many of all predicted positives actually are positive:

\[L_{prec} = \frac{t_p}{t_p + f_p}.\]

Recall#

The recall measures how many of the actual positives are predicted as positive:

\[L_{rec} = \frac{t_p}{t_p + f_n}.\]

F1-Score#

The F1-score is a balanced measure of precision and recall:

\[L_{F1} = 2 \frac{precision \cdot recall}{precision + recall}.\]
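Continuing the confusion counts from the accuracy example above, precision, recall, and the F1-score (their harmonic mean) follow directly:

```python
# Counts from the accuracy example: tp=2, fp=1, fn=1.
tp, fp, fn = 2, 1, 1

precision = tp / (tp + fp)                          # 2/3 ≈ 0.667
recall = tp / (tp + fn)                             # 2/3 ≈ 0.667
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.667
print(precision, recall, f1)
```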