Categorizing Breast And Prostate Cancer By Selecting Features Subsets Based On Support Vector Machine-Recursive Features Elimination

ABSTRACT

Deaths from prostate and breast cancer are expected to keep rising. These are among the most common cancers in men and women, respectively, across the globe. Machine learning can help reduce the number of deaths from these diseases by supporting early detection, and one such application is the classification of prostate cancer and breast cancer data. The cancer data used here have many features, but not all of them are essential. In this study, we use Support Vector Machine-Recursive Feature Elimination (SVM-RFE) as the feature selection method, which produces a ranked list of features. Applying this method to the classification of prostate cancer and breast cancer data yields high evaluation scores: an accuracy of 96.50%, a precision of 96.56%, and a recall of 96.50%.

Introduction

Cancer is a disease caused by abnormal cell growth. These cells arise from changes in gene expression and then develop into a population of cells that can attack specific tissues[1]. This is very dangerous because it can cause death. According to the 2018 Global Cancer (GLOBOCAN) statistics[2] from the International Agency for Research on Cancer (IARC), among the 18.1 million cancer cases, prostate cancer was the second most common cancer in men, while breast cancer was the most common cancer in women. To date, no consistently effective way to treat cancer has been found.

In prostate cancer, there is an uncontrolled growth of cancer cells formed in prostate tissue. It is one of the most common cancers in men, and the number of cases will continue to increase in many countries. In breast cancer, there is an uncontrolled growth of cancer cells formed in breast tissue. The growing cancer cells form lumps that can spread to other tissues within the body; such a lump is known as a malignant tumor. Cancer data have many features that carry information about the cancer itself. However, not all features are relevant. The benefits of feature selection in machine learning are a reduction in the amount of data needed for learning, higher predictive accuracy, data that are easier to understand, and shorter execution time.

In the field of health, many methods have been used to diagnose breast cancer and prostate cancer. In this study, we use a computational approach based on machine learning. The proposed method is Support Vector Machine-Recursive Feature Elimination (SVM-RFE). Feature selection and classification methods are expected to make a significant contribution to the health sector, especially in diagnosing prostate cancer and breast cancer. Previous studies on the classification of prostate and breast cancer have used various methods, such as Convolutional Neural Networks, Logistic Regression, and Decision Trees.

Methods

Support Vector Machines

The basic idea of the SVM method is to form an optimal plane, or hyperplane, that separates the data into classes. The optimal hyperplane separates the classes while lying as far as possible from the closest patterns of each class, where patterns are the points that make up a dataset[3]-[5]. Suppose the training set is D = {(x_i, y_i)}, i = 1, ..., N, with two classes. It consists of N input vectors x_1, ..., x_N, and each y_i is the class label of x_i (malignant cancer or benign cancer).
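
As a minimal sketch of how a linear hyperplane assigns classes, the snippet below evaluates the decision value w.x + b and the width of the margin, 2 / ||w||. The weight vector, bias, points, and the +1/-1 label encoding are hypothetical values chosen only for illustration. They are not taken from the study's data.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign decides the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def predict(w, b, x):
    """Return +1 (e.g. malignant) or -1 (e.g. benign)."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the margin hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

# Hypothetical hyperplane and labelled points (assumptions, for illustration).
w, b = [1.0, 1.0], -3.0
points = [([1.0, 1.0], -1), ([2.0, 2.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], -1)]

predictions = [predict(w, b, x) for x, _ in points]
labels = [y for _, y in points]
print(predictions == labels, margin_width(w))
```

Here every point has a decision value of exactly +1 or -1, so all four points sit on the margin and play the role of the closest patterns.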

Support Vector Machines-Recursive Feature Elimination

SVM-RFE combines Support Vector Machines with Recursive Feature Elimination (RFE). RFE selects features recursively: in each iteration it ranks the remaining features and discards the one with the smallest ranking value. In SVM-RFE, the ranking value comes from the weights of a trained SVM, so each iteration removes the least relevant feature, namely the one with the lowest weight. For speed, more than one feature can be removed per iteration.
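
The elimination loop can be sketched as follows. This is an illustrative, standard-library-only version and not the implementation used in the study. The linear SVM is trained with plain sub-gradient descent on the regularised hinge loss. The ranking value is the squared weight, as in Guyon et al. (listed in the references), and treating this as the study's criterion is an assumption. The toy data and the feature names signal, weak, and noise are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch sub-gradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # point lies inside or beyond the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y, feature_names):
    """Return feature names from first eliminated to last surviving."""
    remaining = list(range(len(feature_names)))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        scores = [wj * wj for wj in w]       # ranking value: squared weight
        worst = scores.index(min(scores))    # lowest-weight feature
        eliminated.append(feature_names[remaining.pop(worst)])
    return eliminated

# Hypothetical toy data: "signal" separates the classes strongly, "weak" does
# so weakly, and "noise" takes the same values in both classes.
names = ["signal", "weak", "noise"]
X = [[2.0, 0.5, 0.25], [1.5, 0.25, -0.25], [2.5, 0.5, 0.5], [1.0, 0.25, -0.5],
     [-2.0, -0.5, 0.25], [-1.5, -0.25, -0.25], [-2.5, -0.5, 0.5], [-1.0, -0.25, -0.5]]
y = [1, 1, 1, 1, -1, -1, -1, -1]

ranking = svm_rfe(X, y, names)
print(ranking)
```

Reading the returned list in reverse gives the features from most to least relevant. This sketch removes one feature per iteration. Removing several at once would mean dropping the few lowest-scoring features before retraining.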

Performance Evaluation of Model

A classification model maps data to predicted classes. Four cases are possible. If a sample has a positive label and is classified as positive, it is a true positive (TP); if it is classified as negative, it is a false negative (FN). If a sample has a negative label and is classified as negative, it is a true negative (TN); if it is classified as positive, it is a false positive (FP). From a classifier and a dataset, a 2 × 2 confusion matrix can be formed.
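
The four cases can be counted directly from pairs of true and predicted labels. The sketch below is illustrative. The label vectors are made up, and encoding the positive class as 1 is an assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN for a binary classification problem."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1   # positive label, classified positive
            else:
                fn += 1   # positive label, classified negative
        else:
            if predicted == positive:
                fp += 1   # negative label, classified positive
            else:
                tn += 1   # negative label, classified negative
    return tp, fn, fp, tn

# Hypothetical labels (1 = positive class, 0 = negative class).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
confusion_matrix = [[tp, fn],
                    [fp, tn]]  # rows: actual class, columns: predicted class
print(confusion_matrix)
```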

The classification report provides the following measures. Accuracy is the fraction of all samples that are classified correctly. Precision measures how many of the samples predicted as positive are truly positive. Recall measures how many of the truly positive samples the model captures and labels as positive. The F1 score is the harmonic mean of the model's precision and recall.
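
These measures follow directly from the four confusion-matrix counts. The sketch below uses hypothetical counts, not the study's results.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts chosen for illustration only.
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The function assumes at least one predicted positive and one actual positive. Otherwise precision or recall would involve a division by zero and would need a separate convention.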

Experiments and Results

Data

The prostate cancer and breast cancer data were obtained from the Kaggle website. The prostate cancer data contain 100 observations, of which 62% are malignant and 38% are benign. The breast cancer data contain 569 observations, of which 212 are malignant and 357 are benign. The features of each dataset are listed, in ranked order, in the Results section.
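
The class balance implied by these counts can be checked with a few lines of arithmetic. The snippet uses only the numbers stated above. The prostate counts of 62 and 38 follow from the stated percentages of 100 observations. The share of the larger class is also the accuracy a classifier would reach by always predicting that class, which gives a simple reference point for the reported scores.

```python
def class_shares(counts):
    """Return each class's share of the dataset in percent, rounded to 0.1."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

# Counts taken from the dataset description.
breast = {"malignant": 212, "benign": 357}
prostate = {"malignant": 62, "benign": 38}

breast_shares = class_shares(breast)
prostate_shares = class_shares(prostate)

# Share of the larger class = accuracy of always predicting that class.
breast_baseline = max(breast_shares.values())
prostate_baseline = max(prostate_shares.values())
print(breast_shares, prostate_shares)
```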

Results

This section covers the results and analysis of classifying prostate and breast cancer with SVM-RFE. The ranking scores obtained using Equation (7) for the prostate cancer features are listed below in increasing order of feature weight: ['fractal_dimension', 'smoothness', 'compactness', 'symmetry', 'radius', 'texture', 'perimeter', 'area']

The feature with the highest weight is area, with a weight of 23992022.23703918, while the feature with the lowest weight is fractal_dimension, with a weight of only 1.5849710602904137. The ranking scores obtained using Equation (7) for the breast cancer features are listed below in increasing order of feature weight: ['fractal dimension error', 'smoothness error', 'concave points error', 'mean fractal dimension', 'symmetry error', 'mean smoothness', 'compactness error', 'concavity error', 'worst fractal dimension', 'radius error', 'worst smoothness', 'mean symmetry', 'mean concave points', 'mean compactness', 'worst symmetry', 'worst concave points', 'mean concavity', 'texture error', 'worst compactness', 'worst concavity', 'perimeter error', 'mean radius', 'worst radius', 'mean texture', 'mean perimeter', 'worst texture', 'worst perimeter', 'area error', 'mean area', 'worst area']

The feature with the highest weight is worst area, with a weight of 147512379.03601986, while the feature with the lowest weight is fractal dimension error, with a weight of only 0.14560705396198254.
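
The study reports keeping the 8 highest-weighted of the 30 breast cancer features and the 2 highest-weighted of the 8 prostate cancer features. Given lists in increasing order of weight, such a subset is simply the tail of the list. The sketch below applies that reading to the two rankings above. The helper top_k is a hypothetical name, and the resulting subsets are an inference from the rankings, not a list stated in the study.

```python
def top_k(ranked_ascending, k):
    """Return the k highest-weighted features, best first."""
    if not 0 < k <= len(ranked_ascending):
        raise ValueError("k must be between 1 and the number of features")
    return ranked_ascending[-k:][::-1]

# Rankings as reported above, in increasing order of feature weight.
prostate_ranked = ['fractal_dimension', 'smoothness', 'compactness',
                   'symmetry', 'radius', 'texture', 'perimeter', 'area']
breast_ranked = [
    'fractal dimension error', 'smoothness error', 'concave points error',
    'mean fractal dimension', 'symmetry error', 'mean smoothness',
    'compactness error', 'concavity error', 'worst fractal dimension',
    'radius error', 'worst smoothness', 'mean symmetry',
    'mean concave points', 'mean compactness', 'worst symmetry',
    'worst concave points', 'mean concavity', 'texture error',
    'worst compactness', 'worst concavity', 'perimeter error', 'mean radius',
    'worst radius', 'mean texture', 'mean perimeter', 'worst texture',
    'worst perimeter', 'area error', 'mean area', 'worst area']

prostate_top = top_k(prostate_ranked, 2)
breast_top = top_k(breast_ranked, 8)
print(prostate_top)
print(breast_top)
```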

Conclusion

We categorized breast and prostate cancer by selecting feature subsets with Support Vector Machine-Recursive Feature Elimination. For the breast cancer data, the 8 features with the highest SVM weights were selected from 30 features, while for the prostate cancer data, the 2 features with the highest SVM weights were selected from 8 features. In the SVM-RFE experiment, worst area had the highest score for breast cancer, and area had the highest score for prostate cancer. The model achieved an accuracy of 96.50%, a precision of 96.56%, and a recall of 96.50%. In future work, SVM-RFE should be optimized to provide a more consistent feature selection process.

References

[1] National Cancer Institute (NCI). What is Cancer? https://www.cancer.gov/about-cancer/understanding/what-is-cancer
[2] IARC Global Cancer Observatory. 2018.
[3] Jakkula, Vikramaditya. 'Tutorial on support vector machine (svm).' School of EECS, Washington State University 37 (2006).
[4] Learning: Support Vector Machines. https://www.youtube.com/watch?v=_PwhiWxHK8o&t=25s
[5] Qifeng Zhou, Wencai Hong, Guifang Shao and Weiyou Cai, 'A new SVM-RFE approach towards ranking problem,' 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems, Shanghai, 2009, pp. 270-273.
[6] Bustamam, Alhadi & Bachtiar, Anas & Sarwinda, Devvi. (2019). 'Selecting Features Subsets Based on Support Vector Machine-Recursive Features Elimination and One Dimensional-Naïve Bayes Classifier using Support Vector Machines for Classification of Prostate and Breast Cancer'. Procedia Computer Science. 157. 450-458. 10.1016/j.procs.2019.08.238.
[7] Guyon, I., Weston, J., Barnhill, S., Vapnik, V. (2002) 'Gene selection for cancer classification using support vector machines.' Mach. Learn 46: 389-422.
[8] A. Adorada, R. Permatasari, P. W. Wirawan, A. Wibowo and A. Sujiwo, 'Support Vector Machine - Recursive Feature Elimination (SVM - RFE) for Selection of MicroRNA Expression Features of Breast Cancer,' 2018 2nd International Conference on Informatics and Computational Sciences (ICICoS), Semarang, Indonesia, 2018, pp. 1-4.
[9] P. A. Mundra and J. C. Rajapakse, 'SVM-RFE With MRMR Filter for Gene Selection,' in IEEE Transactions on NanoBioscience, vol. 9, no. 1, pp. 31-37, March 2010.

Cite this paper

Categorizing Breast And Prostate Cancer By Selecting Features Subsets Based On Support Vector Machine-Recursive Features Elimination. (2021, October 04). Edubirdie. Retrieved April 19, 2024, from https://edubirdie.com/examples/categorizing-breast-and-prostate-cancer-by-selecting-features-subsets-based-on-support-vector-machine-recursive-features-elimination/
