This paper presents the Chicken Swarm Optimization (CSO) algorithm for feature selection, applied to the prediction of cervical cancer. Cervical cancer occurs in the cells of the cervix, the lower part of the uterus that connects to the vagina. Various strains of the human papillomavirus (HPV), a sexually transmitted infection, play a role in causing most cervical cancers. When a woman is exposed to HPV, her immune system typically prevents the virus from doing harm. In a small group of women, however, the virus survives for years and contributes to the process that causes some cells on the surface of the cervix to become cancer cells. The risk of cervical cancer can be reduced by having screening tests and receiving a vaccine that protects against HPV infection. Feature selection can be framed as an optimization problem and plays a vital role in the field of machine learning. In recent years the amount of data available for processing in machine learning problems has grown rapidly, and many of the recorded features are redundant. Feature selection was introduced to address this problem: it eliminates redundant features so that a better subset of features is obtained, which reduces the dimensionality of the dataset. CSO is a new bio-inspired optimization technique, proposed here for feature selection in the prediction of cervical cancer. It mimics the hierarchical order in a chicken swarm, which includes roosters, hens, and chicks, and exploits the swarm intelligence of the chickens to solve optimization problems. CSO can attain very good results in terms of optimization accuracy. In CSO the chicken swarm is divided into several groups, each consisting of a single rooster and a number of hens and chicks. Different kinds of chickens follow different laws of motion, and the chickens compete with one another under a specific hierarchical order.
Keywords: Cervical Cancer, Chicken Swarm Optimization, Feature Selection, Machine Learning, Evolutionary Algorithms, Classification, Nature Inspired
Cancer starts when cells in the body begin to grow out of control. Cells in nearly any part of the body can become cancer cells. Cervical cancer starts in the cells lining the cervix, the lower part of the uterus (womb). The cervix connects the body of the uterus to the vagina. Most cervical cancers are squamous cell cancers. Cervical cancer tends to occur during midlife. It is most frequently diagnosed in women between the ages of 35 and 44. It rarely affects women under the age of twenty, and more than 15 percent of diagnoses are made in women older than 65. Early-stage cervical cancer generally produces no signs or symptoms, but the signs and symptoms of more advanced cervical cancer may include pelvic pain, vaginal bleeding, or watery, bloody vaginal discharge that may be heavy and have a foul odor. It is not clear what causes cervical cancer, but it is certain that HPV plays a role. HPV is very common, and most women with the virus never develop cervical cancer. This means that other factors, such as environment and lifestyle, also determine whether a person will develop cervical cancer.
Feature selection, also known as variable selection, is one of the core concepts in machine learning and can hugely impact the performance of a model. The features used to train a model have a great influence on the performance the model can achieve, and irrelevant or partially relevant features can degrade it. Feature selection should therefore be one of the first steps in the design of any machine learning model. Feature selection is the process of automatically or manually selecting those features that contribute most to the prediction variable or output. Over the past few years the volume of data has grown steadily, which brings many problems with it. Larger datasets are more prone to noise, which needs to be treated because otherwise it can reduce the performance of the result. This is where feature selection comes into play. Feature selection reduces the computational cost of the model as well as the complexity of the dataset. Feature selection methods can be categorized into filter, wrapper, and embedded methods.
The evolutionary strategy is a scalable alternative to reinforcement learning (RL). Although less efficient than RL, evolutionary strategies offer many benefits. An evolutionary strategy can be defined as an algorithm that provides the user with a set of candidate solutions to a problem. The candidates are evaluated with an objective function that takes a given solution and returns a single fitness value. Based on the fitness of the current solutions, the algorithm then produces the next generation of candidate solutions, which are more likely to produce better results than the current generation. The iterative process stops once the best-known solution is satisfactory for the user.
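The generate, evaluate, and regenerate loop described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the objective (`sphere`), the elitist selection scheme, the Gaussian mutation, and every parameter value are hypothetical choices and are not taken from this paper.

```python
import random


def sphere(x):
    """Toy objective (assumption): lower is better, minimum 0 at the origin."""
    return sum(v * v for v in x)


def evolve(objective, dim=5, pop_size=20, elite=5, sigma=0.3,
           generations=60, seed=0):
    """Minimise `objective` with a simple elitist evolutionary strategy."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5) for _ in range(dim)]
                  for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        # Evaluate: rank the candidates by their single fitness value.
        population.sort(key=objective)
        parents = population[:elite]
        best_history.append(objective(parents[0]))
        # Regenerate: keep the parents and add mutated copies of them.
        children = [[v + rng.gauss(0, sigma) for v in rng.choice(parents)]
                    for _ in range(pop_size - elite)]
        population = parents + children
    population.sort(key=objective)
    return population[0], best_history


best, history = evolve(sphere)
```

Because the best candidates are carried over unchanged, the best fitness in `history` can never get worse from one generation to the next.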
A new bio-inspired algorithm, Chicken Swarm Optimization (CSO), is proposed for optimization applications. The chicken swarm contains several groups, and each group comprises a dominant rooster, a couple of hens, and chicks. How the swarm is divided and which role each chicken takes depend on the fitness values of the chickens themselves. The chickens with the best fitness values act as roosters, each of which is the head rooster of a group. The chickens with the worst fitness values are designated as chicks. The others are hens. The hens randomly choose which group to live in. The mother-child relationship between the hens and the chicks is also established randomly.
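The role assignment described above can be sketched as follows. This is a minimal sketch under assumptions, not the implementation used in this paper: fitness is treated as a value to minimise, the function name `assign_roles` and its arguments are hypothetical, and placing each chick in its mother's group is an added assumption.

```python
import random


def assign_roles(fitness, n_roosters, n_chicks, n_mothers, seed=0):
    """Split a swarm into roosters, hens and chicks by fitness.

    Assumption: lower fitness is better.
    """
    rng = random.Random(seed)
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    roosters = order[:n_roosters]                    # best fitness values
    chicks = order[len(order) - n_chicks:]           # worst fitness values
    hens = order[n_roosters:len(order) - n_chicks]   # everyone else

    # Each rooster heads its own group; hens pick a group at random.
    group = {r: r for r in roosters}
    for h in hens:
        group[h] = rng.choice(roosters)

    # The mother-child relationship is established at random.
    mothers = rng.sample(hens, n_mothers)
    mother_of = {}
    for c in chicks:
        mother_of[c] = rng.choice(mothers)
        group[c] = group[mother_of[c]]   # assumption: chick joins mother's group
    return roosters, hens, chicks, group, mother_of


fitness = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.6, 0.4, 1.0]
roosters, hens, chicks, group, mother_of = assign_roles(fitness, 2, 3, 2)
```

With this toy fitness list, chickens 1 and 5 become roosters and chickens 0, 6, and 9 become chicks. The hen and mother assignments depend on the random seed.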
- The Chicken Swarm Optimization algorithm for feature selection is discussed.
- Chicken Swarm Optimization is used as a search strategy to find the optimal features.
- Random Forest and k-nearest neighbors are used to evaluate the quality of the selected features.
- The main goal in proposing and implementing Chicken Swarm Optimization is to keep it as easy, simple, and understandable as possible.
- To evaluate the results, two different classifiers are used: (i) k-nearest neighbors (k-NN) and (ii) Random Forest.
The rest of the paper is organized as follows. The background study is explained in Section 2. The proposed method is discussed in Section 3. Results are discussed in Section 4. Comparisons are made in Section 5, and finally Section 6 concludes the work with future scope.
2.1 Machine Learning Methods
In 2007, Muhammed Fahri Unlersen, Kadir Sabanci, and Muciz Ozcan proposed machine learning methods, namely KNN and MLP, for determining the possibility of cervical cancer. That study presents these two well-known machine learning methods, which gave the best performance. The algorithms can be defined as follows:
2.1.1 K-Nearest Neighbours:
KNN is a non-parametric method used for the classification and regression of data. In both regression and classification, the input consists of the k closest training examples in the feature space. In the study mentioned above, the KNN method is applied to the determination of cervical cancer, and the number of neighbors is varied from 1 to 90.
Algorithm 1: (K-Nearest Neighbours (KNN)).
- Let k be the number of nearest neighbors and S be the set of training samples.
- For each point in S:
- 2.1 Calculate the distance between the query point and the current point from S.
- 2.2 Store the distance in an ordered set.
- Sort the ordered set of distances in ascending order.
- Select the first k entries from the sorted list.
- Get the labels of these entries.
- If the task is regression, return the mean of the selected k labels.
- If the task is classification, return the mode of the selected k labels.
Fig. 1 Explaining the kNN algorithm.
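The steps of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration and not the code used in the cited study: the function name `knn_predict` is hypothetical, and the Euclidean distance and the toy data are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k, task="classification"):
    """Predict the label of `query` from its k nearest training samples."""
    # Steps 2.1 and 2.2: distance from the query to every training point.
    distances = [(math.dist(x, query), label)
                 for x, label in zip(train_X, train_y)]
    # Sort by distance only, then keep the labels of the first k entries.
    distances.sort(key=lambda pair: pair[0])
    labels = [label for _, label in distances[:k]]
    if task == "regression":
        return sum(labels) / len(labels)           # mean of the k labels
    return Counter(labels).most_common(1)[0][0]    # mode of the k labels


train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
train_y = [0, 0, 0, 1, 1, 1]
prediction = knn_predict(train_X, train_y, (0.2, 0.1), k=3)
```

Varying `k` in a loop and recording the accuracy for each value would reproduce the kind of neighbor sweep described in Section 2.1.1.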
2.1.2 Multilayer Perceptron:
The multilayer perceptron (MLP) is a class of feedforward artificial neural networks. An MLP consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. Except for the input nodes, each node is a neuron that uses a non-linear activation function. In the study mentioned above, the MLP method is applied to the determination of cervical cancer, and the number of hidden-layer neurons that gives the best result is investigated.
Algorithm 2: (MultiLayer Perceptron (MLP)).
- for all input neurons j do
- set $a_j = x_j$
- end for
- for all hidden and output neurons j in topological order do
- set $net_j = w_{j0} + \sum_{k \in Pred(j)} w_{jk} a_k$
- set $a_j = f_{log}(net_j)$
- end for
- for all output neurons j do
- assemble $a_j$ in output vector y
- end for
- return y
Fig. 2 MultiLayer Perceptron
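The forward pass of Algorithm 2 can be sketched as follows for a fully connected, layered network. This is a minimal illustration under assumptions: the layer-by-layer representation (`layers` as a list of weight rows and biases) is a hypothetical simplification of the general topological ordering, and $f_{log}$ is taken to be the logistic function.

```python
import math


def logistic(z):
    """Logistic activation, written f_log in Algorithm 2."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, layers):
    """Forward pass of a layered MLP.

    `layers` is a list of (weights, biases) pairs, where weights[j][k]
    connects input k of the layer to neuron j and biases[j] plays the
    role of w_j0.
    """
    a = list(x)                                  # a_j = x_j for input neurons
    for weights, biases in layers:
        a = [logistic(b + sum(w * ak for w, ak in zip(row, a)))
             for row, b in zip(weights, biases)]
    return a                                     # activations of the output layer


# Two inputs, three hidden neurons, one output, all weights zero.
layers = [([[0.0, 0.0]] * 3, [0.0] * 3), ([[0.0, 0.0, 0.0]], [0.0])]
y = mlp_forward([1.0, 2.0], layers)
```

With all weights and biases set to zero, every neuron receives a net input of 0, so the single output is 0.5.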
2.2 Feature Selection:
Feature selection, also known as variable selection, is one of the core concepts in machine learning and can hugely impact the performance of a model. The features used to train a model have a great influence on the performance the model can achieve, and irrelevant or partially relevant features can degrade it. Feature selection should therefore be one of the first steps in the design of any machine learning model. It is the process of automatically or manually selecting those features that contribute most to the prediction variable or output. As datasets grow they become more prone to noise, which reduces the performance of the result if it is not treated. Feature selection addresses this, and it also reduces the computational cost of the model and the complexity of the dataset. Feature selection methods can be categorized into filter, wrapper, and embedded methods.
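As an illustration of the wrapper category, a candidate feature subset can be scored by training a classifier on only the selected columns. The sketch below is hypothetical and is not the fitness function used in this paper: it assumes a binary mask over the features, leave-one-out 1-nearest-neighbor accuracy as the classifier score, and a weight `alpha` that trades the error rate against the fraction of features kept.

```python
import math


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-NN classifier on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0                     # an empty subset cannot classify
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = math.inf, None
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[c] - xj[c]) ** 2 for c in cols)
            if d < best_d:
                best_d, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def subset_fitness(X, y, mask, alpha=0.99):
    """Lower is better: weighted error rate plus fraction of features kept."""
    error = 1.0 - loo_1nn_accuracy(X, y, mask)
    kept = sum(mask) / len(mask)
    return alpha * error + (1.0 - alpha) * kept


# Toy data: column 0 separates the classes, column 1 is noise.
X = [(0.0, 9.0), (0.1, 1.0), (0.2, 5.0), (1.0, 8.5), (1.1, 1.2), (1.2, 5.1)]
y = [0, 0, 0, 1, 1, 1]
with_noise = subset_fitness(X, y, [1, 1])
without_noise = subset_fitness(X, y, [1, 0])
```

On this toy data, dropping the noisy column gives a lower (better) fitness than keeping both columns. A search strategy would look for masks that minimise this value.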
3.1 Proposed chicken Swarm optimization
The recently introduced Chicken Swarm Optimization has been applied to the publicly available Cervical Cancer (Risk Factors) dataset to solve the feature selection problem and detect the occurrence of the disease at an early stage. The results are compared with two well-known machine learning methods, kNN and MLP, and with the various algorithms surveyed in an earlier comparative study. The performance of the proposed algorithm has been evaluated using two machine learning models, kNN and Random Forest. The implementation has been carried out using Python and its libraries.
Algorithm: Chicken Swarm Optimisation (CSO)
The flowchart of CSO is shown in Fig. 3.
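For orientation, the position updates of the three kinds of chickens can be sketched as below. The update rules follow the original CSO publication listed in the references and should be checked against it. This is not the implementation used in this paper. The function names, the small constant `EPS`, and the assumptions that fitness is non-negative and minimised are illustrative choices. How a continuous position is turned into a feature subset is not covered here.

```python
import math
import random

EPS = 1e-12  # small constant to avoid division by zero (assumption)


def rooster_update(x, f_self, f_other, rng):
    """Rooster: scale its position by Gaussian noise. The noise variance
    shrinks when a randomly chosen other rooster has better fitness."""
    if f_self <= f_other:
        variance = 1.0
    else:
        variance = math.exp((f_other - f_self) / (abs(f_self) + EPS))
    sigma = math.sqrt(variance)
    return [v * (1.0 + rng.gauss(0.0, sigma)) for v in x]


def hen_update(x, f_self, x_rooster, f_rooster, x_other, f_other, rng):
    """Hen: follow the rooster of its group and compete with another chicken.
    Assumes non-negative fitness values (lower is better)."""
    s1 = math.exp((f_self - f_rooster) / (abs(f_self) + EPS))
    s2 = math.exp(f_other - f_self)
    return [v + s1 * rng.random() * (r - v) + s2 * rng.random() * (o - v)
            for v, r, o in zip(x, x_rooster, x_other)]


def chick_update(x, x_mother, fl):
    """Chick: move towards its mother; fl is the follow coefficient."""
    return [v + fl * (m - v) for v, m in zip(x, x_mother)]


rng = random.Random(0)
new_hen = hen_update([0.5, 0.5], 0.4, [0.1, 0.2], 0.1, [0.9, 0.8], 0.6, rng)
new_chick = chick_update([0.0, 0.0], [1.0, 2.0], fl=1.0)
```

With `fl=1.0` the chick moves exactly onto its mother's position. Smaller values of `fl` make it follow only part of the way.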
3.2 Implementation of the proposed method
In this section, the experimental setup, parameters, dataset, and implementation of the proposed approach are discussed.
3.2.1 Experimental Setup
CSO contains six parameters. Chickens are kept primarily as a food source, and only hens lay eggs, which are also a source of food, so keeping hens is more favorable for humans. Thus the hen parameter should be greater than the rooster parameter. Because of individual differences, not all hens lay eggs at the same time, so the hen parameter should also be greater than the mother hen parameter. We also assume that the adult chicken population surpasses that of the chicks, i.e. the chick parameter. Finally, the value of the swarm parameter should be neither too big nor too small; after many tests, values between 5 and 30 were found to generate the best results.
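The ordering constraints stated above can be captured in a small configuration check. The sketch below is hypothetical: the class name `CSOParams`, its field names, and the choice to count the adults as roosters plus hens are assumptions, and only the four population parameters named in the text are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CSOParams:
    """Population parameters of CSO (hypothetical field names)."""
    roosters: int
    hens: int
    mother_hens: int
    chicks: int

    def validate(self):
        """Return a list of violated constraints (empty if all hold)."""
        problems = []
        if not self.hens > self.roosters:
            problems.append("hens must outnumber roosters")
        if not self.hens > self.mother_hens:
            problems.append("hens must outnumber mother hens")
        # Assumption: mother hens are a subset of the hens.
        if not self.roosters + self.hens > self.chicks:
            problems.append("adults must outnumber chicks")
        return problems


ok = CSOParams(roosters=2, hens=6, mother_hens=3, chicks=2).validate()
bad = CSOParams(roosters=5, hens=3, mother_hens=4, chicks=10).validate()
```

The first configuration satisfies every constraint, while the second violates all three.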
The Cervical Cancer (Risk Factors) dataset is publicly available at the machine learning repository. The dataset was collected at ‘Hospital Universitario de Caracas’ in Caracas, Venezuela. It comprises demographic information, habits, and historic medical records of 858 patients. Several patients decided not to answer some of the questions because of privacy concerns, which results in missing values. The dataset focuses on the prediction of the indicators and diagnosis of cervical cancer, and its features are well suited to predicting cervical cancer at an early stage. The characteristics of the dataset are:
- Data Set Characteristics: Multivariate
- Attribute Characteristics: Integer, Real
- Associated Tasks: Classification
- No. Of Instances: 858
- No. Of Attributes: 36
- Missing Values: Yes
- Area: Life
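Because the dataset contains missing values, they have to be handled before feature selection. The sketch below shows one possible treatment, median imputation, on a tiny made-up sample. It is not the preprocessing used in this paper: the column names, the sample rows, and the assumption that missing entries are marked with `?` are all hypothetical.

```python
import csv
import io
import statistics

# Made-up sample in CSV form; "?" marks a missing answer (assumption).
SAMPLE = """age,smokes_years,label
18,?,0
25,2.0,0
34,?,1
41,10.0,1
52,6.0,0
"""


def load_with_median_imputation(text, missing="?"):
    """Parse CSV text and replace missing entries by the column median."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys())
    medians = {}
    for c in columns:
        observed = [float(r[c]) for r in rows if r[c] != missing]
        medians[c] = statistics.median(observed)
    return [{c: float(r[c]) if r[c] != missing else medians[c]
             for c in columns}
            for r in rows]


data = load_with_median_imputation(SAMPLE)
```

In this sample the observed values of `smokes_years` are 2.0, 10.0, and 6.0, so both missing entries are filled with the median 6.0.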
4. Results And Discussions
The proposed Chicken Swarm Optimization Method has been applied to the selected Cervical Cancer (Risk Factors) Dataset and the results calculated are discussed in this section.
In this section, the proposed Chicken Swarm Optimization is compared with two different studies on the detection of cervical cancer. These studies are:
In 2018, Yasha Singh, Dhruv Shrivatsva, P.S. Chand, and Dr. Surrinder Singh published a paper in which they compared, in chronological order, the various algorithms used in recent times for the screening of cervical cancer. The comparison with this study is shown in Fig. 4. In 2007, Muhammed Fahri Unlersen, Kadir Sabanci, and Muciz Ozcan proposed machine learning methods, namely KNN and MLP, for determining the possibility of cervical cancer. The comparison with this study is shown in Fig. 5.
Fig. 4 Accuracy comparison with the algorithms in the study by Singh et al.
Fig. 5 Accuracy comparison with the algorithms in the study by Unlersen et al.
The proposed Chicken Swarm Optimization achieves a best accuracy of 99.53% for feature selection on the Cervical Cancer (Risk Factors) dataset, with a computational time of only a few seconds.
The proposed Chicken Swarm Optimization clearly outperforms the kNN and MLP algorithms. It also outperforms all the algorithms shown in Fig. 5. We can also infer that the Chicken Swarm Optimization algorithm outperforms the other algorithms in feature selection without harming the accuracy of the original result. We therefore argue that feature selection using the Chicken Swarm Optimization algorithm can be used in various practical applications and can play a significant role in the detection of cervical cancer at an early stage.
6. Conclusions And Future Works:
In this work, the Chicken Swarm Optimization algorithm has been proposed for feature selection. It is used to obtain a minimized set of features and to determine the comparative accuracy and computational time without degrading performance. The dataset discussed in Section 3.2.3 is used as input to CSO. The proposed algorithm selects fewer features and achieves a higher accuracy, 99.53%, than the other algorithms compared. The results show that the proposed Chicken Swarm Optimization algorithm outperforms the other feature selection algorithms. Researchers can apply the proposed algorithm for feature selection and use it for the early detection of cervical cancer.
In this paper, the cervical cancer dataset has been used. The same dataset can also be used to evaluate feature selection methods proposed in the future, and to help detect cervical cancer at an early stage.
Chicken Swarm Optimization is a very fast optimization algorithm. CSO can attain very good results in terms of optimization accuracy. It can be applied to various other feature selection datasets, as it achieves good optimization in terms of accuracy as well as robustness.
Feature selection, i.e. variable selection, is used when there is a need to eliminate redundant features so that a better subset of features is obtained, which reduces the dimensionality of the dataset.
- Cervical cancer: https://www.cancer.org/cancer/cervical-cancer/about/what-is-cervical-cancer.html
- Occurrences: http://www.nccc-online.org/hpvcervical-cancer/cervical-cancer-overview/
- Causes and symptoms: https://www.mayoclinic.org/diseases-conditions/cervical-cancer/symptoms-causes/syc-20352501
- Feature selection: https://towardsdatascience.com/feature-selection-techniques-in-machine-learning-with-python-f24e7da3f36e
- Evolutionary strategy: http://blog.otoro.net/2017/10/29/visual-evolution-strategies/
- Chicken Swarm Optimization: https://www.researchgate.net/publication/278691165_A_New_Bio-inspired_Algorithm_Chicken_Swarm_Optimization
- KNN: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
- MLP: http://ml.informatik.uni-freiburg.de/former/_media/documents/teaching/ss09/ml/mlps.pdf
- Cervical cancer dataset: https://archive.ics.uci.edu/ml/datasets/Cervical+cancer+%28Risk+Factors%29