Diabetes Risk Prediction Using Machine Learning

Abstract

Changing lifestyles and food habits, such as lack of proper sleep, lack of exercise and poor diet, have led to a rapid increase in the number of people with diabetes, so it is necessary to reduce this number. The proposed system predicts a person’s risk of developing diabetes and classifies it into one of three categories: low, medium or high. Depending on the risk level, a diet plan or a nearby diabetologist is suggested. The risk level is derived from lifestyle parameters, which avoids complex medical jargon. The main advantages of the proposed system are its simplicity, ease of use and easy access. The system uses random forest, a supervised machine learning algorithm, to classify a person into the appropriate risk category based on inputs that are expert-approved lifestyle parameters. The accuracy of the proposed system using the random forest algorithm is 88%. The system allows users to understand and analyze their lifestyle habits and encourages them to adopt a more active lifestyle and better eating habits according to their risk category. In this way, the system contributes to creating a healthier society as a whole.

Introduction

Diabetes is a rapidly growing disorder, driven by unhealthy lifestyles and imbalanced nutrition. Finding ways to prevent it at an early stage and spreading awareness about it have therefore become a necessity. The range of age groups affected by diabetes is widening every day, which is why diabetes risk prediction has become the need of the hour. A diabetes risk predictor helps users learn their risk level. Knowing their risk level, users can take preventive measures before diabetes actually sets in. The proposed system therefore plays a vital role in keeping people informed and cautious.

A few systems for calculating the risk of diabetes have recently appeared online. Usually named “Diabetes Risk Calculator”, they estimate a person’s risk of getting diabetes and also provide trivia about the disease. Most of these systems do not apply machine learning algorithms; instead, they predict risk from fixed value ranges for a few parameters. As a result, the calculated risk is not very reliable. Other systems include technical parameters that users cannot enter without medical help. This affects the accuracy of the prediction and makes the systems harder to use, and therefore less economical and less user-friendly.

Drawing inspiration from these systems while taking their drawbacks into account, the proposed system calculates the risk using a machine learning algorithm called random forest, which is intended to make the prediction more accurate and more reliable. Apart from the risk classification, the proposed system also gives diet suggestions and a list of nearby diabetologists based on the user’s location. The proposed system therefore combines the strengths of the previous systems and improves on them. In this way, an effective system for predicting the risk of diabetes can be provided to society.

Training and Testing

The first step in training and testing was to choose a programming language and a platform. Python was chosen as the language, and Jupyter Notebook, run through Anaconda, was selected as the platform on which training and testing are done.

The next step is accessing the collected dataset. The pandas library is used to import and read the data, which is stored in .csv format. The pd.read_csv function reads a comma-separated values (CSV) file into a DataFrame, and display(data.head()) previews the first rows. The data cleaning features of pandas, such as filling, replacing or imputing null values, were also used.
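The loading and cleaning step can be sketched as follows. This is a minimal illustration, not the authors' notebook: the column names and rows are hypothetical stand-ins for the survey data, an in-memory string replaces the real .csv file, print is used in place of the notebook's display, and filling numeric gaps with the column median is only one possible imputation choice.

```python
import io

import pandas as pd

# Hypothetical stand-in for the collected survey file; the real column
# names and values are not given in the text.
csv_text = """gender,age,smoking,sleep_hours,risk
male,34,no,7,low
female,51,yes,,high
female,45,no,6,medium
male,,yes,5,high
"""

# pd.read_csv accepts a file path or a file-like object and returns a DataFrame.
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())          # preview the first rows
print(data.isnull().sum())  # count missing values per column

# One simple imputation choice (an assumption): fill gaps in numeric
# columns with that column's median.
numeric_cols = data.select_dtypes(include="number").columns
data[numeric_cols] = data[numeric_cols].fillna(data[numeric_cols].median())
```

With a real file, the io.StringIO object would be replaced by the path to the .csv file.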

The data is then encoded into numeric labels using LabelEncoder. Since the collected data is in string format, it cannot be processed by the model until the string values are converted to numeric values. LabelEncoder does this column by column, assigning each distinct string an integer according to the alphabetical order of the values in that column.
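A short sketch of column-wise label encoding with scikit-learn's LabelEncoder is given below. The two columns and their values are hypothetical examples; keeping one fitted encoder per column in a dictionary is an assumption made here so that the integer codes can be mapped back to strings later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical string-valued columns standing in for the survey answers.
data = pd.DataFrame({
    "smoking": ["no", "yes", "no", "yes"],
    "risk": ["low", "high", "medium", "high"],
})

# Fit one encoder per column and keep it, so codes can be decoded later
# with encoder.inverse_transform.
encoders = {}
for column in data.columns:
    encoder = LabelEncoder()
    data[column] = encoder.fit_transform(data[column])
    encoders[column] = encoder

# classes_ lists the original strings in sorted order; the position of a
# string in this array is its integer code.
print(encoders["risk"].classes_)
print(data)
```

Because the codes follow alphabetical order, "high" is encoded as 0, "low" as 1 and "medium" as 2, so the integers carry no ordinal meaning. That is acceptable for a class label, but worth keeping in mind when reading predictions.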

After this, the dataset is split into training and testing data. The training data contains known outputs, and the model learns from it so that it can generalize to other data afterwards. The test data is held back in order to check the model’s predictions.

The test set should be big enough to give meaningful results and to represent the dataset as a whole. The aim is to produce a model that generalizes and classifies new data well, so the test set serves as a proxy for new data. The model does about as well on the test data as it does on the training data. The scikit-learn library is used to divide the data: its model_selection module provides the train_test_split function. Using this function, the dataset is split into training and testing sets in a 70:30 ratio.

The dataset is first separated into two parts: X, which holds the independent features, and y, which holds the dependent variable, that is, the risk class. X is then split into two sets, X_train and X_test, and y is likewise split into y_train and y_test.
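The separation into X and y and the 70:30 split can be sketched as below. The ten rows and the column names are hypothetical, and the fixed random_state is an assumption added only to make the split repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical, already label-encoded data; "risk" is the target column.
data = pd.DataFrame({
    "age": [23, 35, 47, 52, 61, 29, 44, 58, 33, 66],
    "smoking": [0, 1, 0, 1, 1, 0, 0, 1, 0, 1],
    "risk": [0, 1, 1, 2, 2, 0, 1, 2, 0, 2],
})

X = data.drop(columns=["risk"])  # independent features
y = data["risk"]                 # dependent variable (risk class)

# test_size=0.3 holds back 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(len(X_train), len(X_test))  # 7 3
```

On a larger dataset, passing stratify=y to train_test_split would keep the proportion of each risk class similar in both sets; it is left out here because the toy data is too small for it to be useful.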

Performance Comparison

After the split into training and test data, the collected data was used to train and test three different algorithms. The goal was to see which algorithm performs best, with performance measured as test accuracy. The three algorithms compared are K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Random Forest.
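A comparison of this kind can be sketched as follows. The data is synthetic, generated with make_classification as a stand-in for the survey dataset, and all three models use scikit-learn's default hyperparameters apart from a fixed random_state; both are assumptions. The accuracies it prints therefore say nothing about the figures obtained on the real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic three-class data standing in for the encoded survey responses.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Fit each model on the training set and score it on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Using the same split for all three models keeps the comparison fair, since every model is scored on exactly the same test rows.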

Methodology

The proposed system includes a web application through which the user interacts with the machine learning model. The user fills in a form with the thirteen basic lifestyle parameters, such as gender, age, calorie intake, heredity, smoking, alcohol consumption, mental issues, daily physical activity, sleeping pattern, blood pressure, PCOS and dark skin patches. The values entered in the form are passed to the already trained and tested machine learning model, which returns the risk level for those parameter values. The calculated risk is then displayed in the web application, and the user can take the necessary steps accordingly, such as visiting a nearby diabetologist or working on the parameters that increase the risk of getting diabetes.
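The hand-off from form values to the trained model might look like the sketch below. Everything specific in it is hypothetical: the three-feature subset, the toy training rows, the integer coding of the risk classes and the predict_risk helper are illustrations only, and the web framework that would call this function is left out.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical subset of the form fields, in the order used for training.
FEATURES = ["age", "smoking", "daily_activity_minutes"]
# Hypothetical coding of the three risk classes.
RISK_LABELS = {0: "low", 1: "medium", 2: "high"}

# Toy training data so that the example runs on its own.
train = pd.DataFrame({
    "age": [22, 25, 30, 41, 45, 48, 58, 63, 67],
    "smoking": [0, 0, 0, 0, 1, 0, 1, 1, 1],
    "daily_activity_minutes": [60, 45, 50, 30, 20, 30, 10, 5, 0],
    "risk": [0, 0, 0, 1, 1, 1, 2, 2, 2],
})
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(train[FEATURES], train["risk"])


def predict_risk(form_values: dict) -> str:
    """Return the risk label for one submitted form (hypothetical helper)."""
    missing = [name for name in FEATURES if name not in form_values]
    if missing:
        raise ValueError(f"missing form fields: {missing}")
    # Build a one-row frame with columns in the same order as in training.
    row = pd.DataFrame([form_values], columns=FEATURES)
    return RISK_LABELS[int(model.predict(row)[0])]


print(predict_risk({"age": 52, "smoking": 1, "daily_activity_minutes": 15}))
```

In a deployed application the model would be trained once, saved, and loaded by the web server rather than refitted at start-up.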

Conclusion

The system can predict a person’s risk of getting diabetes before its onset, thereby encouraging people to adopt a healthier and more active lifestyle. Its ease of use is another factor that enables people to make full use of it. The system is still at a nascent stage and has a lot of scope for improvement. It can be made more accurate by collecting more data, and it can be extended to predict the risk of type I diabetes as well as gestational diabetes. The diet suggestions can also be customized according to each user’s individual habits.

Cite this paper

Diabetes Risk Prediction Using Machine Learning. (2022, Jun 09). Edubirdie. Retrieved April 25, 2024, from https://edubirdie.com/examples/diabetes-risk-prediction-using-machine-learning/
