Abstract: Skin diseases such as Melanoma and Carcinoma are often hard to detect at an early stage, and even harder to tell apart. Melanoma is widely recognized as the most dangerous type of skin cancer because it is much more likely than other types to spread to other parts of the body if it is not diagnosed and treated early. The Support Vector Machine (SVM), a machine learning algorithm, can be used to classify such skin diseases. In this paper, we propose a method to identify whether a given sample is affected by Melanoma. The method consists of the following steps:

- collecting labeled, pre-processed images;
- flattening each image and storing its pixel intensities in an array;
- appending all such arrays into a database;
- training an SVM with a suitable kernel on the labeled data;
- using the trained SVM to classify new samples.

The achieved classification accuracy is about 90%.
Keywords—Melanoma, array, images, Carcinoma, classify, machine, algorithm
A. Background and Motivation
Skin diseases are among the diseases whose number of cases is steadily increasing. In India alone, about 200 million people suffer from some form of skin disease. People often neglect skin diseases and do not seek the necessary treatment. This is especially common in rural and economically backward areas, owing to factors such as lack of awareness, poverty, and lack of resources. The situation is even worse for Melanoma skin cancer: about 132,000 melanoma skin cancers are reported to occur globally each year.

Even when patients do consult a physician, it is difficult for the physician to determine exactly which skin disease the patient has. Melanoma in particular is hard to distinguish without further tests. In men, it is often found on the skin of the head or neck, or between the shoulders and the hips. In women, it is often found on the skin of the lower legs or between the shoulders and the hips.

Besides SVMs, neural networks can also be used to classify diseases. However, SVMs are preferred here over neural networks for several reasons:

- They have a strong theoretical foundation.
- Their training is a convex quadratic programming problem, so they reach the global optimum.
- They avoid the problem of choosing an appropriate number of parameters.
- They are less prone to overfitting.
- They need less memory to store the predictive model, which is also more readable.
Many research papers have been published on using different algorithms to identify diseases, but there is very little research on using one particular method to classify two or more different diseases. In this study, we propose an efficient technique that uses an SVM, a machine learning algorithm, to identify whether a skin lesion is benign or malignant. The SVM is trained and tested on a database of pre-processed images. This technique can help diagnose Melanoma skin cancer efficiently.
C. Related Work
Various research works in the field of image processing have addressed the detection of Melanoma. Some have used MATLAB to investigate the best formats for carrying out the analysis. Pre-processing techniques such as hair removal, image centering, shading-effect correction, vignette correction, and black-border cropping have been applied. For segmentation, techniques such as Otsu's thresholding, color space transformation, the watershed algorithm, and the c-means algorithm have been used. Other works in this field have introduced image-based screening techniques to differentiate similar diseases, as well as multi-class SVM classifiers.
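As an illustration of one segmentation technique cited above, the following is a minimal NumPy sketch of Otsu's thresholding. It is not part of the pipeline proposed in this paper, which works on pre-enhanced images. The sketch assumes an 8-bit grayscale input, and the function name `otsu_threshold` is hypothetical. The method picks the gray level that maximizes the between-class variance of the two resulting pixel classes.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Otsu's method for an 8-bit grayscale image (hypothetical helper).

    Returns the threshold t that maximizes the between-class variance when
    pixels are split into the classes {value <= t} and {value > t}.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()                  # probability of each gray level
    omega = np.cumsum(p)                   # weight of the class {value <= t}
    mu = np.cumsum(p * np.arange(256))     # first moment of that class
    mu_total = mu[-1]                      # mean gray level of the whole image
    num = (mu_total * omega - mu) ** 2
    den = omega * (1.0 - omega)
    # Between-class variance; thresholds that leave one class empty score 0.
    sigma_b2 = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return int(np.argmax(sigma_b2))
```

For an image whose pixels are either 50 or 200, any threshold from 50 to 199 separates the two groups perfectly, and this sketch returns the first of them, 50. A color lesion image would first have to be converted to grayscale.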
Figure 1: The proposed Block Diagram
A. Pre-enhanced images
The images are taken from a publicly available image database in which each image has been verified to show a particular disease. The images in this database are pre-enhanced, i.e., they have already undergone techniques such as hair removal, centering, and softening.
Figure 2: Pre-enhanced images taken from verified website
B. Image flattening and image to array
The pre-enhanced images are converted to RGB format and resized to 64 × 64 pixels. All the pixel values of the three color channels are then flattened into a one-dimensional array, and the pixel intensities are scaled to the range [0, 1]. Each image therefore yields 64 × 64 × 3 = 12288 values.
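The flattening and scaling step can be sketched as follows. This is a minimal version, assuming each image has already been loaded as a 64 × 64 × 3 `uint8` array. One way to obtain such an array is Pillow's `Image.open(path).convert("RGB").resize((64, 64))` followed by `np.asarray(...)`; how the authors loaded their images is not stated. The function name `image_to_vector` is hypothetical, and scaling by 255 is assumed as the way intensities are mapped to [0, 1].

```python
import numpy as np

IMG_SIZE = 64  # images are resized to 64 x 64 pixels, as described in the text

def image_to_vector(rgb: np.ndarray) -> np.ndarray:
    """Flatten a 64x64 RGB uint8 image into a 1-D vector scaled to [0, 1].

    reshape(-1) uses row-major order, so the vector holds the R, G, B values
    of the first pixel, then those of the second pixel, and so on.
    """
    if rgb.shape != (IMG_SIZE, IMG_SIZE, 3):
        raise ValueError(f"expected ({IMG_SIZE}, {IMG_SIZE}, 3), got {rgb.shape}")
    # 8-bit intensities lie in 0..255, so dividing by 255 maps them to [0, 1].
    return rgb.reshape(-1).astype(np.float64) / 255.0
```

Each call returns a vector of exactly 64 × 64 × 3 = 12288 values.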
C. Database of image Intensities
The one-dimensional arrays of all the images are then appended, one row per image, to form a database of the pixel intensities of all the images.
Figure 3: Database of pixel intensities of all the images
As Figure 3 shows, the pixel intensities of all the images form a database, which is then given to the SVM as its training input.
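The database described above can be represented as a matrix with one row per image and one column per pixel intensity. The following minimal sketch assumes the class labels are kept alongside it as a separate vector, using the 0/1 coding of Table 2. The function name `build_database` is hypothetical, and the paper does not say how its database is stored.

```python
from typing import Sequence

import numpy as np

def build_database(vectors: Sequence[np.ndarray], labels: Sequence[int]):
    """Append each image's 1-D intensity vector as one row of a matrix.

    Returns X with shape (n_images, n_features) and y with shape (n_images,).
    """
    if len(vectors) != len(labels):
        raise ValueError("each image vector needs exactly one label")
    X = np.vstack(vectors)             # row i holds the pixel intensities of image i
    y = np.asarray(labels, dtype=int)  # 0 = Melanoma, 1 = Non-Melanoma (Table 2 coding)
    return X, y
```

For 64 × 64 RGB images, X has 12288 columns.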
D. Training the SVM
A Support Vector Machine is a supervised machine learning algorithm that can classify data into two or more classes. It separates the classes with a hyperplane that maximizes the margin between them. A kernel function implicitly maps the data into a feature space in which this hyperplane is sought. The accuracy and precision of an SVM depend mainly on the kernel used and on the hyperparameters that shape the decision boundary, so a suitable kernel has to be chosen to achieve good results.

In this study, a linear kernel is used to distinguish Melanoma from Non-Melanoma. Linear kernels are suitable when the classes to be separated do not share many features, i.e., when they are approximately linearly separable. The linear kernel is also the simplest of all the available kernels. When the two classes share more features, other kernels such as polynomial kernels may be needed to achieve better accuracy and precision. For kernels such as the radial basis function (RBF) kernel, a parameter gamma controls how far the influence of a single training sample reaches, and hence how closely the decision boundary follows the training data. The linear kernel's equation is as follows:

$$k(x, y) = x^{\top} y + c$$
Here, x and y are input vectors and c is an optional constant. When the trained SVM classifies a new input vector x, the kernel is evaluated between x and every support vector from the training data, so the decision reduces to a weighted sum of inner products. All the pixel-intensity values stored in the database, as described above, are given as input to the SVM. Because the data are labeled, the SVM is trained to separate the classes.
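The paper does not state which SVM implementation was used. A library solver such as scikit-learn's `SVC(kernel="linear")` would be the usual choice in Python. So that the example needs only NumPy, the sketch below swaps in kernelized Pegasos (Shalev-Shwartz et al.), a stochastic sub-gradient SVM solver without an explicit bias term. It is not presented as the authors' method.

The sketch shows the two points made above:

- The linear kernel is k(x, y) = x^T y + c.
- A new sample is scored by summing kernel values between it and the support vectors.

All function names and the parameters `lam` and `n_iter` are hypothetical.

```python
import numpy as np

def linear_kernel(A, B, c=0.0):
    """k(x, y) = x^T y + c for every row x of A and every row y of B."""
    return A @ B.T + c

def polynomial_kernel(A, B, c=1.0, degree=2):
    """k(x, y) = (x^T y + c)^degree, a non-linear alternative kernel."""
    return (A @ B.T + c) ** degree

def train_kernel_svm(X, y, kernel=linear_kernel, lam=0.01, n_iter=2000, seed=0):
    """Train a kernel SVM with kernelized Pegasos (stochastic sub-gradient, no bias).

    X: (n_samples, n_features) intensity matrix; y: 0 = Melanoma, 1 = Non-Melanoma.
    Returns dual coefficients coef_j = alpha_j * y_j / (lam * n_iter); samples
    with coef_j != 0 are the support vectors.
    """
    y_pm = np.where(np.asarray(y) == 0, 1.0, -1.0)  # SVM labels: Melanoma -> +1
    K = kernel(X, X)                                 # Gram matrix of all training pairs
    alpha = np.zeros(len(X))                         # margin-violation counts
    rng = np.random.default_rng(seed)
    for t in range(1, n_iter + 1):
        i = rng.integers(len(X))                     # pick one training sample at random
        # Current margin of sample i: y_i * (1 / (lam * t)) * sum_j alpha_j y_j k(x_j, x_i)
        margin = y_pm[i] * ((alpha * y_pm) @ K[:, i]) / (lam * t)
        if margin < 1.0:                             # hinge loss is active: update alpha_i
            alpha[i] += 1.0
    return alpha * y_pm / (lam * n_iter)

def predict(X_new, X_train, coef, kernel=linear_kernel):
    """Score = sum over support vectors of coef_j * k(x_new, x_j); map sign to 0/1."""
    sv = coef != 0
    scores = kernel(np.atleast_2d(X_new), X_train[sv]) @ coef[sv]
    return np.where(scores >= 0, 0, 1)               # 0 = Melanoma, 1 = Non-Melanoma
```

The same kernel must be passed to `train_kernel_svm` and to `predict`. Two notes on the parameters:

- With c > 0, the constant in the linear kernel acts as an implicit bias term, because x^T y + c equals the inner product of x and y each extended by an extra coordinate sqrt(c).
- `lam` plays the role of 1/(nC) for the usual soft-margin parameter C and n training samples.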
E. SVM classification
When an unseen sample is given to the trained SVM, the SVM classifies it on the basis of what it learned from the training samples. In this way, the SVM determines whether the image belongs to Melanoma or Non-Melanoma. Because the “enum” function is used to assign the labels, samples matched to Melanoma are labeled 0 and unmatched samples are labeled 1.
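The paper does not show how its “enum” function is applied. One plausible reading is Python's built-in `enumerate` over a list of class names, sketched below. The list `CATEGORIES` and the helper `decode` are hypothetical. With this ordering, Melanoma receives label 0 and Non-Melanoma label 1, matching Table 2.

```python
# Hypothetical class names; the position in this list becomes the integer label.
CATEGORIES = ["Melanoma", "Non-Melanoma"]

# enumerate pairs each class name with its index: Melanoma -> 0, Non-Melanoma -> 1.
LABEL_OF = {name: idx for idx, name in enumerate(CATEGORIES)}
NAME_OF = dict(enumerate(CATEGORIES))

def decode(predictions):
    """Turn SVM outputs (0/1) back into class names, as reported in Table 2."""
    return [NAME_OF[int(p)] for p in predictions]
```

For example, `decode([0, 1, 0])` gives `["Melanoma", "Non-Melanoma", "Melanoma"]`.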
The result obtained with an SVM depends on how well it has been trained. In our experiments, the more images the SVM was trained with, the higher the accuracy. We trained the SVM with different numbers of input images and achieved the following results:
Table 1: Number of images, precision achieved, and time taken for the code to run
As Table 1 shows, precision increases with the number of training images, and so does the run time. In Python, about 600 images were processed in approximately 9 minutes, whereas in MATLAB the same task took about 2 hours. This is the main reason the study was continued in Python.
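The values in Table 1 cannot be reproduced here. The following minimal sketch shows one way such figures could be measured: accuracy, precision, and wall-clock time around the training call, evaluated on images held out from training. Two assumptions apply:

- Precision is computed in its standard sense, with Melanoma (label 0) as the positive class; the paper does not define the term.
- All function names are hypothetical.

```python
import time

import numpy as np

MELANOMA = 0  # positive class, following the labeling in Table 2

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def precision(y_true, y_pred, positive=MELANOMA):
    """Fraction of samples predicted as `positive` that truly are positive."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    predicted_pos = y_pred == positive
    if not predicted_pos.any():
        return 0.0  # no positive predictions: define precision as 0
    return float(np.mean(y_true[predicted_pos] == positive))

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def split_indices(n, test_fraction=0.2, seed=0):
    """Randomly split sample indices 0..n-1 into (train, test) index arrays."""
    perm = np.random.default_rng(seed).permutation(n)
    n_test = int(round(n * test_fraction))
    return perm[n_test:], perm[:n_test]
```

To obtain the rows of a table like Table 1, the training call would be wrapped in `timed` for each training-set size, and `accuracy` and `precision` would be computed on the held-out indices.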
Table 2: Classification of Melanoma and Non-Melanoma (columns: Image no., Predicted result)
Table 2 shows the SVM's classification of individual samples into Melanoma and Non-Melanoma. Here, ‘0’ represents Melanoma and ‘1’ represents Non-Melanoma.
Figure 4: Image histograms of matched results
Figure 4 shows the image histograms of the matched results. The RGB planes of the two images are very similar to each other, which is consistent with both images being classified as malignant (Melanoma).
Figure 5: Image histograms of unmatched results
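The histograms in Figures 4 and 5 are compared visually. As a sketch of how that comparison could be quantified, the following code computes a normalized 256-bin histogram for each RGB plane and a per-channel histogram intersection. The similarity measure is an added assumption, not the authors' method, and the function names are hypothetical. Inputs are assumed to be 8-bit RGB arrays.

```python
import numpy as np

def rgb_histograms(img: np.ndarray) -> np.ndarray:
    """Return a (3, 256) array: one normalized 256-bin histogram per R, G, B plane."""
    hists = np.stack([np.bincount(img[..., ch].ravel(), minlength=256) for ch in range(3)])
    return hists / hists.sum(axis=1, keepdims=True)

def histogram_intersection(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Per-channel similarity in [0, 1]; 1 means identical intensity distributions."""
    return np.minimum(rgb_histograms(img_a), rgb_histograms(img_b)).sum(axis=1)
```

Because the histograms are normalized, images of different sizes can be compared directly. Values close to 1 in all three channels correspond to the "very similar RGB planes" described for Figure 4.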
IV. Conclusion and Future work
In this study, we showed that an SVM can be used effectively to classify samples as Melanoma or Non-Melanoma. We also observed that better accuracy and precision are achieved when the SVM is trained with more images.
Future work includes developing a product that can differentiate between two different types of cancer, such as Melanoma and Carcinoma. Other kernels, such as polynomial kernels, may also yield higher accuracy.
- World Health Organization, “Skin cancers.” [Online]. Available: http://www.who.int/uv/faq/skincancer/en/index1.html. Accessed December 2018.
- Landis, Sarah H., et al. “Cancer Statistics, 1999.” CA: A Cancer Journal for Clinicians 49.1 (1999): 8-31.
- Codella, Noel, et al. “Deep learning, sparse coding, and SVM for melanoma recognition in dermoscopy images.” International Workshop on Machine Learning in Medical Imaging. Springer, Cham, 2015.
- Mustafa, Suleiman, Ali Baba Dauda, and Mohammed Dauda. “Image processing and SVM classification for melanoma detection.” 2017 International Conference on Computing Networking and Informatics (ICCNI). IEEE, 2017.
- Vala, Hetal J., and Astha Baxi. “A review on Otsu image segmentation algorithm.” International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) 2.2 (2013): 387-389.
- Manerkar, Mugdha S., et al. “Automated skin disease segmentation and classification using multi-class SVM classifier.” (2016).
- Manerkar, Mugdha S., et al. “Classification of skin disease using multi SVM classifier.” 3rd International Conference on Electrical, Electronics, Engineering Trends, Communication, Optimization, and Sciences, 2016.
- Image dataset: https://www.kaggle.com/drscarlat/melanoma
- Bono, Aldo, et al. “The ABCD system of melanoma detection.” Cancer 85.1 (1999): 72-77.
- Alquran, Hiam, et al. “The melanoma skin cancer detection and classification using support vector machine.” 2017 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT). IEEE, 2017.
- Rosipal, Roman, Leonard J. Trejo, and Bryan Matthews. “Kernel PLS-SVC for linear and nonlinear classification.” Proceedings of the 20th International Conference on Machine Learning (ICML-03). 2003.
- Smits, Guido F., and Elizabeth M. Jordaan. “Improved SVM regression using mixtures of kernels.” Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN’02 (Cat. No. 02CH37290). Vol. 3. IEEE, 2002.
- Souza, Cesar. “Kernel functions for machine learning applications.” 17 March 2010. [Online]. Available: http://crsouza.com/2010/03/17/kernel-functions-for-machine-learning-applications
- Fatima, Ruksar, Mohammed Zafar Ali Khan, and K. P. Dhruve. “Computer-aided multi-parameter extraction system to aid early detection of skin cancer melanoma.” International Journal of Computer Science and Network Security 12.10 (2012): 74-86.
- Lv, Xiao. “A novel defect inspection approach using image processing and support vector machines in bolts.” 2015 Seventh International Conference on Measuring Technology and Mechatronics Automation. IEEE, 2015.
- Abe, Shigeo. Support Vector Machines for Pattern Classification. Vol. 2. London: Springer, 2005.
- Dreiseitl, Stephan, et al. “A comparison of machine learning methods for the diagnosis of pigmented skin lesions.” Journal of Biomedical Informatics 34.1 (2001): 28-36.