Abstract
This paper explores the classification of an audio-signal feature dataset to diagnose Parkinson’s disease (PD). PD mainly affects the central nervous system, and patients typically have a low-volume voice with a monotone quality. James Parkinson first described the disease as a neurological syndrome. In this research, we build a program to find the best classifier for the dataset. We use the audio feature dataset from the UCI dataset repository, which contains the UPDRS score as the value to predict. The extracted features were used in an XGBoost classifier model to predict PD; the classifier achieves an accuracy of 96% and a Matthews correlation coefficient (MCC) of 89%.
INTRODUCTION
Parkinson’s disease is described as a neurodegenerative disorder caused by the death of dopamine-generating cells. The loss of dopaminergic neurons in the midbrain decreases the achievable rate of communication. Parkinson’s disease affects the central nervous system and, through it, the motor system; the main PD symptoms are tremor, rigidity, and movement disorders. About 90% of people with Parkinson’s disease have a speech impairment, yet only 3% to 4% of PD patients receive speech therapy. Age is one of the essential risk factors for PD.
The Unified Parkinson’s Disease Rating Scale (UPDRS) is used to assess the severity of PD with the help of clinical expertise and experience. The speech of PD patients shows changes in its frequency spectrum as patients lose motor control. The low-frequency region therefore provides essential information for distinguishing the speech impairments in PD.
In this paper, we perform feature selection on the audio feature dataset created by Max Little of the University of Oxford, and we train an XGBoost classifier on it. The features were selected using a genetic algorithm, and we achieved an accuracy of 96%. We chose XGBoost because it is fast and performs well; the goal is to improve accuracy.
METHODOLOGY
In this research, we applied four machine learning algorithms: KNN, Logistic Regression, Decision Tree, and XGBoost. We implemented these models to find the best one for the respective dataset.
XGBoost
XGBoost is a boosting algorithm: a statistical learning method derived from the gradient boosting decision tree, with better performance and optimization. We used XGBoost because of its efficiency and feasibility. It accepts both dense and sparse matrices as input, and for classification the label is a numeric vector of integers starting from 0. The number of boosting iterations can be set for the model. Given a dataset with n samples and d features per sample, the model's prediction is obtained from the outputs of its decision trees.
To predict the final result, the model minimizes an objective in which l is the loss function, which measures the error in prediction.
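For reference, the standard formulation of gradient tree boosting, as commonly given in the XGBoost literature, can be written as below. The notation is an assumption and may differ from the equations the authors originally used: f_k denotes the k-th decision tree, K the number of trees (boosting iterations), and Ω a regularization term that penalizes tree complexity.

```latex
% Dataset with n samples and d features per sample
\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}, \qquad x_i \in \mathbb{R}^{d}

% Prediction: sum of the outputs of K decision trees
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)

% Objective: loss l over all samples plus a complexity penalty per tree
\mathcal{L} = \sum_{i=1}^{n} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k)
```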
KNN
The KNN algorithm assumes that similar data points lie close to each other. As the value of k decreases towards 1, predictions become less stable; with a higher k, predictions become more stable and, up to a point, more accurate.
In KNN, classification is based on the value of k: the algorithm calculates the distances between data points and assigns the class that is most common among the k nearest neighbors.
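The distance-and-vote idea can be illustrated with a minimal sketch. It assumes Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy data are hypothetical and are not taken from the paper's dataset or implementation.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Sort by distance only, then keep the labels of the k closest samples
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the neighbors is the prediction
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up two-feature data: class 0 clusters near the origin, class 1 near (1, 1)
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
           (0.8, 0.9), (0.9, 0.8), (0.85, 0.95)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.2, 0.2), k=3))
print(knn_predict(train_X, train_y, (0.9, 0.9), k=3))
```

With k = 1 the prediction depends on a single neighbor, which is why small values of k tend to be more sensitive to noise.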
Evaluation Metrics
The success of the classifiers is measured by accuracy, sensitivity, specificity, and MCC. Accuracy is the ratio of correctly classified instances to the total number of instances.
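The four metrics can be computed from the counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function names and the label vectors are made up for illustration, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and MCC."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Illustrative labels only: 3 TP, 3 TN, 1 FP, 1 FN
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(evaluate(y_true, y_pred))
```

Note that MCC ranges from -1 to 1, so a value reported as a percentage corresponds to that coefficient multiplied by 100.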
Data preprocessing:
MinMaxScaler and Normalizer are preprocessing methods in scikit-learn; we select the method based on the values of our features. Machine learning algorithms such as KNN, LR, and NN perform better and faster when features are on similar scales. We suggest MinMaxScaler for preprocessing: it subtracts the feature's minimum value and divides by its range, where the range is the difference between the maximum and the minimum. By default, MinMaxScaler returns values in the range 0 to 1.
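The arithmetic behind min-max scaling can be sketched in plain Python for a single feature column. This is an illustration of the formula rather than scikit-learn's implementation; the function name `min_max_scale`, the sample values, and the handling of a constant column are assumptions made here.

```python
def min_max_scale(column, feature_range=(0.0, 1.0)):
    """Rescale one feature column to `feature_range` using (x - min) / (max - min)."""
    lo, hi = feature_range
    col_min, col_max = min(column), max(column)
    span = col_max - col_min  # the range: maximum minus minimum
    if span == 0:
        # Assumption: map a constant column to the lower bound to avoid dividing by zero
        return [lo for _ in column]
    return [lo + (value - col_min) / span * (hi - lo) for value in column]


# Made-up feature values
values = [2.0, 4.0, 6.0, 10.0]
scaled = min_max_scale(values)
print(scaled)
```

In practice the minimum and maximum would be computed on the training split only and then reused for the test split, so that the test data does not influence the scaling.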
CONCLUSION
The objective of the study was to find the best classifier for the Parkinson’s disease voice feature dataset; to that end, the classification accuracy of the models was studied and compared. The XGBoost classifier obtained a classification accuracy of 96% on the given PD dataset. If the dataset changes, the same procedure can be used to find the best classifier for the respective voice feature dataset.