Predicting Readmission of Diabetic Patients Using Machine Learning: Analytical Essay

1. Dataset Description

UCI Machine Learning repository - Diabetes 130-US hospitals for years 1999-2008 Data Set

This research uses a publicly available dataset taken from the Center for Clinical and Translational Research, Virginia Commonwealth University. It consists of roughly 100,000 encounter records collected across 130 US hospitals and from various healthcare providers over 10 years (1999 – 2008) [1]. It has fifty features describing diabetic patients’ hospital encounters, including whether the patient was readmitted. Based on our review of the dataset, the features most likely to affect our model are:

  • Admission source – One of 21 distinct values describing how the patient was admitted
  • Discharge disposition – One of 29 distinct values indicating where the patient was discharged to
  • Medication changes – Indicates whether the patient’s medication was changed
  • Diagnosis information – Diagnoses coded using ICD-9 (International Classification of Diseases, Ninth Revision) [2]
  • Drug usage – Lists dosage information for 23 different drugs
  • Readmission time – Shows whether the patient was readmitted within 30 days, after 30 days, or not at all

The data is initially split into an 80% training set and a 20% test set. In addition, 5-fold cross-validation is to be applied to obtain more reliable estimates of each model’s evaluation metrics.
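The split and cross-validation described above could be set up roughly as follows. This is only a sketch: it uses synthetic three-class data from scikit-learn in place of the real dataset, and the choice of logistic regression, the stratified split, and the fixed random seeds are our assumptions rather than decisions stated in the proposal.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic three-class data standing in for the real dataset (assumption).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8,
    n_classes=3, random_state=0,
)

# 80% training, 20% test; stratify keeps class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 5-fold cross-validation on the training portion only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")

print(X_train.shape, X_test.shape)
print("mean CV accuracy:", cv_scores.mean())
```

Keeping the test set out of the cross-validation loop means it can still serve as an untouched final check once a model has been chosen.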

2. Introduction

Background: Machine learning techniques have been applied to a considerable number of problems in the healthcare sector. We plan on researching one such problem: hospital readmission. Readmissions are not only costly but also put patients’ health at risk. Moreover, hospital readmission has been a decisive factor in ranking the credibility of health centers. An increase in hospital visits after discharge is costly and time-consuming for both hospitals and patients [3].

Major studies [4] propose that an unplanned readmission within 30 days indicates a treatment or diagnosis error that could have been avoided. However, a readmission after 30 days depends on the patient’s lifestyle or several other factors [5]. So, early prediction of patient readmission becomes an important task.

Existing models in similar research predict readmission within 30 days of discharge [6]. Our research predicts unplanned readmission in diabetic patients using multiclass classification: whether a patient is readmitted within 30 days, readmitted after 30 days, or not readmitted at all. The primary tasks include data preprocessing steps such as data reduction, data cleaning, and data transformation. Furthermore, a good model requires extracting essential features, so we plan on using various feature selection algorithms to obtain the best features. Using these features, different models such as Random Forest, Support Vector Machine, Logistic Regression, Multilayer Perceptron, Naïve Bayes, and an ensemble model are to be tested and compared on evaluation metrics (accuracy, precision, recall, F1-score, and AUC).

3. Methodology/Approaches

Following are the goals of our research:

Predict if the patient will be:

  • Readmitted within 30 days (<30)
  • Readmitted after 30 days (>30)
  • Not readmitted (No)
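For a multiclass classifier, these three outcomes have to be turned into class labels. A minimal sketch of that mapping is shown below; the column name `readmitted`, the label strings `NO`, `<30`, and `>30`, and the integer codes are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample of the target column; the label strings are assumed.
readmitted = pd.Series(["NO", "<30", ">30", "NO", ">30"], name="readmitted")

# Map each outcome to an integer class for multiclass classification.
class_map = {"NO": 0, "<30": 1, ">30": 2}
y = readmitted.map(class_map)

print(y.tolist())  # [0, 1, 2, 0, 2]
```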

To achieve the goal, we will perform the following tasks:

  1. Task 1: Data Analysis for Decision Making

The first step includes collecting the data, analyzing it by plotting graphs of various features, checking the correlation among the features, and interpreting the results. Based on the results, an idea about essential features and outliers is obtained.
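The correlation check mentioned above might look like the following sketch. The data here is randomly generated, the column names are hypothetical stand-ins for numeric features, and the 0.8 threshold is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Hypothetical numeric features; num_medications is built to track time_in_hospital.
time_in_hospital = rng.integers(1, 15, size=n)
df = pd.DataFrame({
    "time_in_hospital": time_in_hospital,
    "num_medications": time_in_hospital * 2 + rng.integers(0, 5, size=n),
    "num_lab_procedures": rng.integers(1, 100, size=n),
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# List each pair of features whose absolute correlation exceeds the threshold.
threshold = 0.8
high = [
    (a, b, round(corr.loc[a, b], 2))
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]

print(high)
```

Strongly correlated pairs found this way are candidates for removal or combination in the later feature selection step.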

  2. Task 2: Data Cleaning

The data contains ‘?’ in place of standard missing-value markers such as ‘NaN’ or ‘NULL’. So, converting these placeholders to proper missing values and removing redundant features becomes an important task. This step also includes replacing or modifying dirty data.
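One way to handle the ‘?’ placeholder is to convert it to NaN so that pandas treats it as missing. The sketch below uses a small hand-made frame; the column names and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Small hypothetical sample in which '?' marks a missing value.
df = pd.DataFrame({
    "race": ["Caucasian", "?", "AfricanAmerican", "?"],
    "weight": ["?", "?", "[75-100)", "?"],
    "gender": ["Female", "Male", "Male", "Female"],
})

# Replace the placeholder with NaN so that isna(), dropna(), etc. work as expected.
cleaned = df.replace("?", np.nan)

# Count missing values per column.
missing_counts = cleaned.isna().sum()
print(missing_counts)
```

When the data is loaded from a CSV file, passing `na_values="?"` to `pandas.read_csv` should achieve the same result at load time.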

  3. Task 3: Data Preprocessing

Process missing data: Features that have more than 50% missing data and are irrelevant to predicting the target variable are removed.
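The missing-data threshold could be applied as in the sketch below. It covers only the 50% criterion; judging whether a feature is relevant to the target is left out. The sample frame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample where NaN already marks missing values.
df = pd.DataFrame({
    "weight": [np.nan, np.nan, "[75-100)", np.nan],
    "race": ["Caucasian", np.nan, "AfricanAmerican", "Caucasian"],
    "age": ["[50-60)", "[60-70)", "[70-80)", "[40-50)"],
})

# Fraction of missing values in each column.
missing_fraction = df.isna().mean()

# Drop every column where more than half of the values are missing.
to_drop = missing_fraction[missing_fraction > 0.5].index.tolist()
reduced = df.drop(columns=to_drop)

print(to_drop)
print(list(reduced.columns))
```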

Encode categorical data: Categorical features such as gender and race are to be encoded using OneHotEncoder and label encoding.

Scale features and apply transformation: Some of the features in the dataset are highly skewed. So, to reduce the skew, we plan to use various transformation functions such as normalization, the sigmoid function, the log function, and the cube root function.
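The effect of the log and cube root transformations on a skewed feature can be checked as in this sketch. The feature is randomly generated from a log-normal distribution as a stand-in for a right-skewed count, and `log1p` is used instead of a plain logarithm so that zero values would not cause problems; both are our assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical right-skewed, non-negative feature.
feature = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=1000))

# Apply two of the candidate transformations.
log_transformed = np.log1p(feature)   # log(1 + x), defined at x = 0
cbrt_transformed = np.cbrt(feature)   # cube root

# Compare sample skewness before and after.
raw_skew = feature.skew()
log_skew = log_transformed.skew()
cbrt_skew = cbrt_transformed.skew()

print("raw:", raw_skew, "log1p:", log_skew, "cbrt:", cbrt_skew)
```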

  4. Task 4: Feature Selection and Addition

Selecting essential features for the model: In machine learning, when there are too many features, it is better to select only the relevant ones. We plan to use various algorithms such as SelectKBest, SelectPercentile, and the Boruta algorithm for feature selection.
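Two of the selectors named above are available in scikit-learn and could be used as in the following sketch. The data is synthetic, and the ANOVA F-test scoring function, `k=5`, and `percentile=25` are assumptions chosen for illustration. Boruta is provided by a separate package and is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

# Synthetic three-class data with 5 informative features out of 20 (assumption).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)

# Keep the 5 features with the highest ANOVA F-scores.
k_best = SelectKBest(score_func=f_classif, k=5)
X_k = k_best.fit_transform(X, y)

# Keep the top 25% of features by the same score.
percentile = SelectPercentile(score_func=f_classif, percentile=25)
X_p = percentile.fit_transform(X, y)

print(X_k.shape, X_p.shape)
print("selected column indices:", k_best.get_support(indices=True))
```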

Feature addition: By combining some of the features in the dataset, we can create additional features that may help to predict the target variable better.
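As one possible example of feature addition, several visit counts could be summed into a single utilization feature. The column names and the derived feature `total_prior_visits` below are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical visit counts per patient encounter.
df = pd.DataFrame({
    "number_outpatient": [0, 2, 1],
    "number_emergency": [1, 0, 0],
    "number_inpatient": [0, 3, 1],
})

# Combine the three counts into one new feature.
visit_columns = ["number_outpatient", "number_emergency", "number_inpatient"]
df["total_prior_visits"] = df[visit_columns].sum(axis=1)

print(df["total_prior_visits"].tolist())  # [1, 5, 2]
```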

  5. Task 5: Model Building

After splitting the data into training and test sets, the candidate algorithms are trained on the training set, the best model is selected based on accuracy, and that model is then evaluated on the test data.

We plan to use the following multiclass classification algorithms:

  • Logistic Regression
  • Random Forest
  • Decision trees
  • ExtraTreeClassifier
  • Linear Discriminant Analysis (LDA)
  • LogisticRegressionCV
  • LinearSVC
  • Gaussian Naïve Bayes
  • XGBoost
  • Support vector machine
  • Neural networks (MLPClassifier, feed-forward backpropagation network)
  • GradientBoosting
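Comparing several of these classifiers could be organized as in the sketch below. It covers only four scikit-learn models on synthetic data, selects the best one by mean cross-validated accuracy, and then scores it once on the held-out test set. The data, hyperparameters, and the subset of models are assumptions, not the final setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the preprocessed dataset (assumption).
X, y = make_classification(
    n_samples=600, n_features=15, n_informative=6,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# A subset of the candidate models, with mostly default settings.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "gaussian_nb": GaussianNB(),
}

# Mean 5-fold cross-validated accuracy on the training set for each model.
cv_accuracy = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Pick the best model, refit it on all training data, and score it on the test set.
best_name = max(cv_accuracy, key=cv_accuracy.get)
best_model = candidates[best_name].fit(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)

print(cv_accuracy)
print(best_name, test_accuracy)
```

Selecting on cross-validated accuracy rather than test accuracy keeps the test score an honest estimate for the chosen model.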

  6. Task 6: Evaluation and Prediction

Evaluation metrics are essential to satisfy the goal of the research. We plan to evaluate the models using various metrics such as the confusion matrix, F1 score, precision, and recall.
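These metrics can be computed per class with scikit-learn, as in the sketch below. The true and predicted labels are made up for illustration, and macro averaging of the F1 score is our assumption.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_recall_fscore_support

# Hypothetical true and predicted labels for eight patients.
y_true = ["NO", "<30", ">30", "NO", ">30", "<30", "NO", "NO"]
y_pred = ["NO", ">30", ">30", "NO", "NO", "<30", "NO", ">30"]
labels = ["<30", ">30", "NO"]

# Rows are true classes and columns are predicted classes, in the order of `labels`.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Per-class precision, recall, and F1 score.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, zero_division=0
)

# A single summary number that weights all three classes equally.
macro_f1 = f1_score(y_true, y_pred, labels=labels, average="macro")

print(cm)
print(precision, recall, f1)
print("macro F1:", macro_f1)
```

Per-class recall for the within-30-days class is worth reporting separately, since that class is the one the introduction ties to avoidable readmissions.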

4. Potential Tools

  • Python
  • Jupyter Notebook
  • Kaggle Notebook
  • Google Colab

5. Potential Timeline

15th March-30th March
  • Data analysis for decision making
  • Data cleaning
  • Data preprocessing

31st March-5th April
  • Feature selection and addition
  • Progress report

6th April-25th April
  • Model building
  • Evaluation and prediction

26th April-1st May
  • Project report
  • Project presentation

6. References

  1. Beata Strack, 'Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records,' BioMed Research International, 2014. [Online]. Available:
  2. 'Wikipedia,' 30 December 2019. [Online]. Available:
  3. N. Hammoudeh, 'Predicting Hospital Readmission among Diabetics using Deep Learning,' November 2018. [Online]. Available:
  4. D. Mordaunt, 'Improving 30-day readmission risk predictions using machine learning,' in Health Informatics New Zealand (HiNZ) Conference, 2016.
  5. '30-day unplanned readmission and death measures,' 2017. [Online]. Available:
  6. Ti’jay Goudjerkan, 'Predicting 30-Day Hospital Readmission for Diabetes Patients using Multilayer Perceptron,' International Journal of Advanced Computer Science and Applications, vol. 10, no. 2, pp. 268-275, 2019.

Cite this paper

Predicting Readmission of Diabetic Patients Using Machine Learning: Analytical Essay. (2022, September 27). Edubirdie. Retrieved July 18, 2024, from
