Melanoma is the deadliest form of skin cancer and a serious disease worldwide. Early detection of melanoma from medical images significantly increases the survival rate. However, accurate recognition of melanoma is an extremely challenging task. Since the joint use of image enhancement techniques and deep convolutional neural networks (DCNNs) has proven successful elsewhere, the combination could have discriminatory power for skin lesion diagnosis as well. To test this hypothesis, we propose an aggregation algorithm for skin lesion diagnosis that uses a DCNN to extract local features and classify medical images for melanoma. All experiments are performed using the data provided in the International Skin Imaging Collaboration (ISIC) 2018 challenge, Skin Lesion Analysis towards Melanoma Detection. Experimental results show that our algorithm achieves excellent classification results for melanoma diagnosis.
Keywords: Skin Cancer, Deep Learning, Convolutional Neural Networks, Melanoma Classification
Even if we carefully protect ourselves from the sun all summer, it is essential to remain observant about our skin in other seasons. Throughout the year, we should examine our skin from head to toe at least once a month, looking for any unusual lesions. To tell whether a spot is out of the ordinary, we need to know what we are looking for. As a general rule, to spot either melanomas or non-melanoma skin cancers, such as basal cell carcinoma and squamous cell carcinoma, we should take note of any new moles or growths, and of any existing growths that begin to grow or change noticeably in any other way. Other possible warning signs are lesions that itch or bleed.
Since melanoma is a very harmful skin disease, it is crucial to detect it early. Melanoma is the type of cancer whose incidence is increasing most rapidly worldwide. Example images of melanoma are shown in Figure 1. Melanoma arises from the pigment cells in human skin. According to the annual report of the American Cancer Society (ACS), 99,550 new cases of melanoma were diagnosed in the United States in 2018, and the estimated deaths from this disease were up to 13,460. Early detection and treatment of skin cancer may increase the survival rate worldwide. Dermoscopy was developed to improve the diagnostic performance for skin cancer and melanoma, and it is widely used to achieve a higher recognition accuracy for melanoma. Unfortunately, visual inspection and recognition by dermatologists is usually challenging because of the strong similarity among the different skin lesion types, and it takes a long time, especially with a large number of patients. Recently, several studies have been proposed within the International Skin Imaging Collaboration (ISIC) Challenge to classify dermoscopy images into seven classes: Melanoma, Melanocytic Nevus, Basal Cell Carcinoma, Actinic Keratosis, Benign Keratosis, Dermatofibroma, and Vascular Lesion.
In recent years, deep learning techniques have been widely acknowledged as the most powerful tool for image classification, since various DCNNs, such as VGGNet and ResNet, have won the ImageNet Challenge. However, DCNNs have also been widely criticized because they may suffer from over-fitting when the training dataset is not large enough. Although a pre-trained DCNN model can transfer the image representation ability learned from large-scale datasets, such as ImageNet, to generic visual recognition tasks, the rigid architectures of DCNNs limit their ability to deal with images in which objects vary widely in shape, size, and clutter.
II. Related Work
This section provides a review of previous research on DCNN for pattern recognition, in particular medical image analysis.
Since models built with deep learning have achieved promising results on various tasks, we believe that a CNN-based approach may tackle the issue effectively. Although the real power of CNNs has been discovered only recently, applications of CNNs in medical image analysis can be traced to the 1990s, when they were used for computer-aided detection of microcalcifications in digital mammography. With the revival of CNNs owing to the development of powerful GPU computing, the medical imaging literature has witnessed a new generation of computer-aided detection systems that show superior performance in many tasks, including computer-aided detection of lymph nodes in CT images, automatic polyp detection in colonoscopy videos, and automatic detection of mitotic cells in histopathology images. Research and development of CNNs in medical image analysis is not limited to disease detection systems, however; CNNs have recently been used for skin lesion classification, pancreas segmentation in CT images, multimodality isointense infant brain image segmentation, and neuronal membrane segmentation in electron microscopy images.
In this paper, we propose several image enhancement algorithms joined with a DCNN for skin lesion classification. Recently proposed, well-known architectures, such as ResNet50 v1, Inception v3, Xception, DenseNet 201, and Inception-ResNet v2, have achieved fairly high performance on classification tasks. Thus, in order to evaluate the performance of our architecture, we compare our work against these models.
III. Proposed Approach
A. Image Enhancement
The training images for the melanoma classification task, 10,015 images in total, were downloaded from the ISIC 2018 “Skin Lesion Analysis towards Melanoma Detection” grand challenge datasets [8-9]. An example image for each disease category is shown in Figure 2.
Fig. 2. Example images from Skin Lesion Image dataset.
There are seven possible disease categories:
- Melanoma
- Melanocytic Nevus
- Basal Cell Carcinoma
- Actinic keratosis / Bowen’s disease (intraepithelial carcinoma)
- Benign keratosis (solar lentigo / seborrheic keratosis / lichen planus-like keratosis)
- Dermatofibroma
- Vascular Lesion
However, we classify images into melanoma versus non-melanoma groups. Because the ISIC Challenge datasets comprise dermoscopic skin lesion images taken with different dermoscopes and camera devices all over the world, it is important to pre-process the images to normalize color and illumination.
Even with a deep and robust CNN algorithm, a high recognition rate cannot be achieved unless a large number of training images is provided. This is because of the over-fitting problem, where a network trained with a small number of images cannot generalize well to new, unseen test data. To address this problem, we increased the number of training images using data augmentation techniques. We enhanced every input image by pre-processing the dataset with a color constancy algorithm and by adjusting brightness, saturation, and contrast by a random factor in the range [0.9, 1.1]. The pre-processed results are shown in Figure 3.
Fig. 3. Results of applying enhancement techniques to images found in the dataset.
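The enhancement step can be illustrated with a minimal sketch. The text does not name the color constancy algorithm that was used, so the gray-world method below is an assumption chosen for simplicity, and the function names `gray_world` and `random_jitter` are hypothetical. Only the jitter range [0.9, 1.1] comes from the text.

```python
import numpy as np

def gray_world(image):
    """Gray-world color constancy (an assumed choice, not confirmed by the paper).

    image: float array of shape (H, W, 3) with values in [0, 1].
    """
    channel_means = image.reshape(-1, 3).mean(axis=0)
    gray = channel_means.mean()
    # Scale each channel so that its mean matches the overall mean intensity.
    gains = gray / np.maximum(channel_means, 1e-8)
    return np.clip(image * gains, 0.0, 1.0)

def random_jitter(image, rng, low=0.9, high=1.1):
    """Scale brightness, saturation, and contrast by random factors in [low, high]."""
    b, s, c = rng.uniform(low, high, size=3)
    out = image * b                              # brightness: scale all values
    luma = out.mean(axis=2, keepdims=True)       # per-pixel gray value
    out = luma + (out - luma) * s                # saturation: distance from gray
    mean = out.mean()
    out = mean + (out - mean) * c                # contrast: distance from global mean
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.uniform(0.2, 0.6, size=(8, 8, 3))
image[..., 0] *= 0.5                             # simulate a color cast
balanced = gray_world(image)
augmented = random_jitter(balanced, rng)
```

After `gray_world`, the three channel means coincide, which removes a global color cast of the kind introduced by different acquisition devices.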
B. DCNNs for image classification.
We start the second step by training the multi-class classification problem on five DCNN models: ResNet50 v1, Inception v3, Xception, DenseNet 201, and Inception-ResNet v2. A global average pooling layer is added after the base network, followed by three fully connected layers including the output layer. We use softmax cross-entropy as the loss function. Each model is trained for 300 epochs; the training and validation losses converge after around 240 epochs. As we discovered during the image enhancement process, the classes of the dataset are extremely imbalanced, so we use class weights to balance the dataset: higher weights are given to samples in small classes and lower weights to samples in large classes.
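The class weighting can be sketched as follows. The exact weighting formula is not given in the text, so inverse class frequency is assumed here, and the helper name `inverse_frequency_weights` and the toy labels are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return a weight per class that is inversely proportional to its frequency.

    Assumed formula: n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Toy example: a large class and a small class.
labels = ["nevus"] * 6 + ["melanoma"] * 2
weights = inverse_frequency_weights(labels)
```

With integer class indices as keys, a dictionary of this form can be passed to the `class_weight` argument of Keras `Model.fit`.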
C. Novel CNN Model
On top of that, we have created a simple CNN structure for the classification of lesion images using some of the ideas fundamental to LeNet, which was applied to recognize hand-written characters. Considering the fact that we need a deeper network for this task, we add more layers, and the final version of our network consists of 9 layers, as shown in Figure 4. The network has four convolutional and two pooling layers for feature extraction, and three fully-connected layers, in the end, for classification.
Fig. 4. The architecture of the deep CNN for skin lesion disease classification. The network consists of nine layers: four convolutional layers (a parallelogram with solid lines), two max-pooling layers (a parallelogram with dotted lines), and three fully connected layers at the end. Blue numbers under parallelogram denote the number of filters, black numbers denote the widths and heights of the feature maps and red numbers denote the number of neurons in the fully-connected layers.
The main modification we make to the LeNet approach in our CNN model is the use of the adaptive piecewise linear (APL) activation function instead of traditional activation functions in the convolutional layers. This change improves the performance of the network, although it introduces additional learnable parameters that increase the training time (the detailed experimental results are shown in Section IV). Average pooling and max pooling are the most commonly used pooling methods in deep learning models. The former uses the average activation value over a pooling region, whereas the latter selects the maximum activation value. Max pooling is used in our network. We apply the same method in all pooling layers over a [image: image1.png] pixel window with stride 2.
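The difference between the two pooling methods can be seen in a small sketch. The window size is not recoverable from the text, so a 2×2 window is assumed here; only the stride of 2 is stated. The function `pool2d` is a hypothetical helper for a single-channel feature map.

```python
import numpy as np

def pool2d(feature_map, window=2, stride=2, mode="max"):
    """Pool a 2-D feature map; window=2 is an assumption, stride=2 follows the text."""
    h, w = feature_map.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            region = feature_map[i * stride:i * stride + window,
                                 j * stride:j * stride + window]
            out[i, j] = reduce_fn(region)
    return out

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 1., 2., 3.],
              [1., 0., 3., 2.]])
max_pooled = pool2d(x, mode="max")    # [[4, 8], [1, 3]]
avg_pooled = pool2d(x, mode="mean")   # [[2.5, 6.5], [0.5, 2.5]]
```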
D. Adaptive Piecewise Linear Units
The activation function $h_i(x)$ of an APL unit $i$ is a sum of hinge-shaped functions,

$$h_i(x) = \max(0, x) + \sum_{s=1}^{S} a_i^s \max(0, -x + b_i^s). \tag{1}$$

The result of Eq. (1) is a piecewise linear activation function, which computes the output of each neuron in the feature extraction layers. Here, $S$ denotes the number of hinges and is set as a hyperparameter. The variables $a_i^s$ and $b_i^s$ for $s \in \{1, \dots, S\}$ are learned using gradient descent during training. The variables $a_i^s$ control the slopes of the linear segments, while the variables $b_i^s$ determine the locations of the hinges.
Using APL units requires additional parameters to be learned; their number is 2SM, where M denotes the total number of hidden units in the network. As the number of hidden units (M) in our network is predefined, the proper value of S, which defines the complexity of the activation function, is determined during the validation process, as explained in Section IV.
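A minimal NumPy sketch of a single APL unit is given here for illustration, assuming the standard APL form h(x) = max(0, x) + Σ_s a^s max(0, −x + b^s). The function names and the example values of `a` and `b` are hypothetical; in the network these values are learned per unit.

```python
import numpy as np

def apl(x, a, b):
    """Adaptive piecewise linear activation for one unit.

    a[s] sets the slope contribution and b[s] the hinge location for hinge s.
    """
    x = np.asarray(x, dtype=float)
    out = np.maximum(0.0, x)                      # ReLU term
    for a_s, b_s in zip(a, b):
        out = out + a_s * np.maximum(0.0, -x + b_s)
    return out

def apl_extra_parameters(num_hinges, num_units):
    """Additional learnable parameters: 2 * S * M (one a and one b per hinge per unit)."""
    return 2 * num_hinges * num_units

y = apl([-1.0, 0.0, 2.0], a=[0.5], b=[0.0])       # one hinge at the origin
relu_like = apl([-1.0, 0.0, 2.0], a=[], b=[])     # S = 0 reduces to ReLU
extra = apl_extra_parameters(num_hinges=2, num_units=100)
```

With S = 0 the unit reduces to a ReLU, which makes the comparison between activation functions a matter of changing only S and the learned hinge parameters.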
Our training procedure was identical for each pre-trained model. First, we removed the final fully connected layer of each pre-trained network and replaced it with a randomly initialized (Glorot uniform) matrix. We penalized the fully connected layer with both an L2 and an L1 regularizer, each weighted at 0.01. We then optimized these regularization terms along with a categorical softmax cross-entropy term weighted at 1.0 using the Adam optimizer. All weights were optimized at all steps of training. Our initial learning rate was $10^{-4}$, and when the validation accuracy did not improve for 10 steps, we scaled the learning rate by a factor of 0.1. The remaining parameters of the Adam algorithm were left at their default values.
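The learning rate rule described above can be sketched in plain Python. The class name `PlateauSchedule` is hypothetical, and the text does not say which mechanism was actually used; the Keras `ReduceLROnPlateau` callback provides comparable behavior.

```python
class PlateauSchedule:
    """Scale the learning rate by `factor` after `patience` evaluations without improvement."""

    def __init__(self, initial_lr=1e-4, factor=0.1, patience=10):
        self.lr = initial_lr
        self.factor = factor
        self.patience = patience
        self.best = float("-inf")
        self.stale = 0          # evaluations since the last improvement

    def update(self, val_accuracy):
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0
        return self.lr

schedule = PlateauSchedule()
schedule.update(0.80)                      # first evaluation sets the best value
for _ in range(9):
    lr_before = schedule.update(0.70)      # nine evaluations without improvement
lr_after = schedule.update(0.70)           # tenth triggers the decay
```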
Each of the models we optimized took input tensors with spatial sizes of either 299 × 299 (Inception, Xception, and Inception-ResNet) or 224 × 224 (ResNet and DenseNet). For evaluation and prediction, we first scaled the images to either 329 × 329 or 254 × 254, cropped the central region of the appropriate size, and fed it to each model’s respective preprocessing function. For training, we first scaled each image to normally distributed random dimensions centered at 30 more than the desired dimensions, with a standard deviation of 15. We then randomly rotated the image using bilinear interpolation, randomly cropped a region, flipped it vertically and/or horizontally, and randomly adjusted the brightness and contrast with a maximum change of 30% in both cases.
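The size handling can be sketched with two small helpers. The names `center_crop` and `sample_training_size` are hypothetical, image resizing itself is left to an image library, and clamping the sampled size to at least the crop size is an assumption added so that a crop is always possible.

```python
import numpy as np

def center_crop(image, size):
    """Crop the central size x size region from an (H, W, C) image."""
    h, w = image.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return image[top:top + size, left:left + size]

def sample_training_size(target, rng, margin=30, std=15):
    """Draw a scaled size from N(target + margin, std), as described for training."""
    size = int(round(rng.normal(target + margin, std)))
    return max(size, target)   # assumption: never smaller than the crop size

# Evaluation path for a 299 x 299 model: scale to 329 x 329, then crop the center.
scaled = np.zeros((329, 329, 3))
eval_input = center_crop(scaled, 299)

rng = np.random.default_rng(0)
train_sizes = [sample_training_size(224, rng) for _ in range(100)]
```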
F. Network Training
The network is trained using stochastic gradient descent (SGD), and all training-relevant hyperparameters are set as given in Table I.
TABLE I. Hyperparameters for Training
- Initial learning rate
- Learning rate decay coefficient
We initialize the weights using Xavier initialization.
IV. Experiments and Results
All experiments are performed using the Keras software package on the Ubuntu 16.04 operating system, running on a PC with an Intel(R) Core(TM) i7-7700HQ CPU at 2.80 GHz and an Nvidia GTX 1050 Ti GPU.
A. Experiments with Proposed Method
To demonstrate the advantages of using APL units over traditional ReLU and hyperbolic tangent (tanh) functions, we compare the results of our deep CNN model with APL units in the hidden layers against the same model with ReLU and with tanh functions. We change only the activation functions of our model; all other hyperparameters remain unchanged. We train all networks on the training dataset, which consists of 10,015 skin images, and evaluate them on a test set of 1,500 images. The evaluation results, given in Table II, show that our network performs better with APL units in the hidden layers than with ReLU or tanh functions.
TABLE II. Evaluation of performance of our deep CNN with different activation functions.
- Activation functions
- Test accuracy
- With data augmentation
- Without data Augmentation
- APL units
Moreover, our network with APL units reaches its highest training accuracy of around 98% after about 65 epochs and then remains stable. The training accuracy of the same network with ReLU and tanh functions does not exceed 96%, and it becomes stable only after about 150 and 200 epochs, respectively. This means that our network consumes fewer computing resources and learns the features of skin lesions faster when it has APL units in the hidden layers than when it has ReLU or tanh functions. The detailed training process of our network with different activation functions is illustrated in Figure 5. The confusion matrix for our CNN method is presented in Figure 6.
Fig. 5. The training curves of the deep CNN with different activation functions.
B. Experiments and results with ensemble
At the time of writing, our performance on the challenge hold-out set is unknown, but we report here our results on a held-out 10% of the provided training set. We evaluated our models after every 200 training steps using 20 random batches of class-balanced validation data. The top overall validation accuracy reached by each of the models during training is shown in Table III.
C. Comparisons with other methods.
To show the effectiveness of our approach, we compared it with other existing methods. We conducted experiments with four of the best-performing algorithms in the ISIC challenge. Experimental results are displayed in Table IV.
TABLE III. Validation accuracy for each model during training
- Accuracy (%)
- Our Model
Fig. 6. Confusion matrix of our model. Visualization of error rates across skin lesion types.
TABLE IV. Comparisons with other methods
- Image size
- Accuracy (%)
- Jinyi Zou
- Shlomo Kashani
- M. K. Amro
- Our method
In this work, we show that it is possible to obtain a high classification accuracy by training on medical images with multiple pre-trained models and a newly developed CNN model for 7-class skin lesion classification. First, we applied several image enhancement techniques to improve the quality of the images. Then, we trained on all images using different pre-trained models and using our proposed network. The experiments show that, with a carefully designed data augmentation scheme and transfer learning algorithm, prevalent deep models pre-trained on natural images can be successfully trained for skin lesion diagnosis, as can a CNN model trained from scratch. In our experiments, the latter approach outperformed the transfer learning approaches.