predicting-30-day-readmission

Module 5 Final Project - Predicting 30-day Hospital Readmission

Student name: Cynthia Pedrasa
Student pace: self paced
Scheduled project review date/time: Monday, Jun 15, 2020 Time: 10:00am-11:00am (EDT)
Instructor name: Jeff Herman
Deliverables:

  1. Jupyter Notebooks:
    a. Data Preprocessing
    b. Model Evaluation

  2. Blog Post:

  3. Executive Summary:

Introduction

For the Module 5 final project, we chose a binary classification problem: predicting the 30-day readmission risk of patients with diabetes.

Objectives

Hospital readmissions are associated with unfavorable patient outcomes and high financial costs.

A successful predictive model will help the Healthcare Organization:

The Project

### Data Science Workflow

Understanding the typical workflow of the data science process is important for business understanding and problem solving. Using the OSEMN framework, the student works through its steps in an iterative, non-linear process.

In this Module 5 project, the student builds and tests several binary classification algorithms to predict 30-day hospital readmissions of patients with diabetes, based on electronic medical records. The models are tuned to improve accuracy, and the model with the best score is selected to make predictions.

Part I of the project covers the introduction, data loading, and data scrubbing through completion of preprocessing of the final dataframe. Part II covers data modeling, tuning, evaluation, performance metrics, and finalizing and saving the model for later prediction use.
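
Since Part II compares models on performance metrics, the following minimal sketch shows how accuracy, precision, recall, and F1 can be computed from binary labels. The function name and the toy labels are illustrative assumptions, not results from this project. The example also hints at why accuracy alone can mislead when the positive class (readmitted within 30 days) is rare.

```python
def classification_metrics(y_true, y_pred):
    """Compute basic binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data (hypothetical): 2 readmissions out of 10 encounters.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
# A model that never predicts readmission still scores well on accuracy.
always_negative = [0] * len(y_true)
metrics = classification_metrics(y_true, always_negative)
print(metrics)
```

On this toy data the always-negative predictor looks accurate but finds none of the readmissions, which is one reason to compare recall and F1 alongside accuracy when selecting the final model.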

In this project we would like to find the answers to the following questions:

Hospital Readmissions Diabetes Data Set

UCI Machine Learning Datasets Repository

The dataset represents 10 years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks. It includes over 50 features representing patient and hospital outcomes. Information was extracted from the database for encounters that satisfied the following criteria.

  1. It is an inpatient encounter (a hospital admission)
  2. It is a diabetic encounter, that is, one during which any kind of diabetes was entered to the system as a diagnosis.
  3. The length of stay was at least 1 day and at most 14 days.
  4. Laboratory tests were performed during the encounter.
  5. Medications were administered during the encounter.

DATA-SPECIFIC INFORMATION FOR: [diabetic_data.csv]

  1. Number of variables: 50
  2. Number of instances/rows: 101766
  3. Variable List:

Target Variable:

| Attribute | Description |
| --- | --- |
| Readmitted | Days to inpatient readmission. Values: “<30” if the patient was readmitted in less than 30 days, “>30” if the patient was readmitted in more than 30 days, and “No” for no record of readmission |
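
Because the project is framed as binary classification, the three-valued target has to be collapsed into two classes. The sketch below assumes that “<30” is the positive class and that “>30” and “No” are both treated as negative; the function name `to_binary_target` is hypothetical and the notebooks may encode the target differently.

```python
def to_binary_target(readmitted: str) -> int:
    """Return 1 for a readmission within 30 days, otherwise 0.

    Assumption: ">30" and "No" are both mapped to the negative class.
    """
    return 1 if readmitted.strip() == "<30" else 0


# Small illustrative sample of raw target values.
raw_labels = ["<30", ">30", "No", "<30"]
binary_labels = [to_binary_target(value) for value in raw_labels]
print(binary_labels)
```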

Predictors:

| Attribute | Description |
| --- | --- |
| Encounter ID | Unique identifier of an encounter |
| Patient number | Unique identifier of a patient |
| Race | Values: Caucasian, Asian, African American, Hispanic, and other |
| Gender | Values: male, female, and unknown/invalid |
| Age | Grouped in 10-year intervals: [0, 10), [10, 20), …, [90, 100) |
| Weight | Weight in pounds |
| Admission type | Integer identifier corresponding to 9 distinct values, for example, emergency, urgent, elective, newborn, and not available |
| Discharge disposition | Integer identifier corresponding to 29 distinct values, for example, discharged to home, expired, and not available |
| Admission source | Integer identifier corresponding to 21 distinct values, for example, physician referral, emergency room, and transfer from a hospital |
| Time in hospital | Integer number of days between admission and discharge |
| Payer code | Integer identifier corresponding to 23 distinct values, for example, Blue Cross/Blue Shield, Medicare, and self-pay |
| Medical specialty | Integer identifier of a specialty of the admitting physician, corresponding to 84 distinct values, for example, cardiology, internal medicine, family/general practice, and surgeon |
| Number of lab procedures | Number of lab tests performed during the encounter |
| Number of procedures | Number of procedures (other than lab tests) performed during the encounter |
| Number of medications | Number of distinct generic names administered during the encounter |
| Number of outpatient visits | Number of outpatient visits of the patient in the year preceding the encounter |
| Number of emergency visits | Number of emergency visits of the patient in the year preceding the encounter |
| Number of inpatient visits | Number of inpatient visits of the patient in the year preceding the encounter |
| Diagnosis 1 | The primary diagnosis (coded as first three digits of ICD9); 848 distinct values |
| Diagnosis 2 | Secondary diagnosis (coded as first three digits of ICD9); 923 distinct values |
| Diagnosis 3 | Additional secondary diagnosis (coded as first three digits of ICD9); 954 distinct values |
| Number of diagnoses | Number of diagnoses entered to the system |
| Glucose serum test result | Indicates the range of the result or if the test was not taken. Values: “>200,” “>300,” “normal,” and “none” if not measured |
| A1c test result | Indicates the range of the result or if the test was not taken. Values: “>8” if the result was greater than 8%, “>7” if the result was greater than 7% but less than 8%, “normal” if the result was less than 7%, and “none” if not measured |
| Change of medications | Indicates if there was a change in diabetic medications (either dosage or generic name). Values: “change” and “no change” |
| Diabetes medications | Indicates if there was any diabetic medication prescribed. Values: “yes” and “no” |
| 24 features for medications | For the generic names: metformin, repaglinide, nateglinide, chlorpropamide, glimepiride, acetohexamide, glipizide, glyburide, tolbutamide, pioglitazone, rosiglitazone, acarbose, miglitol, troglitazone, tolazamide, examide, sitagliptin, insulin, glyburide-metformin, glipizide-metformin, glimepiride-pioglitazone, metformin-rosiglitazone, and metformin-pioglitazone, the feature indicates whether the drug was prescribed or there was a change in the dosage. Values: “up” if the dosage was increased during the encounter, “down” if the dosage was decreased, “steady” if the dosage did not change, and “no” if the drug was not prescribed |
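
Several of the predictors above are categorical strings that need a numeric form before modeling. The following is a minimal sketch of two possible encodings: converting an age interval to its midpoint, and splitting a medication value into "prescribed" and "dosage changed" flags. The helper names and the exact string format of the age brackets are assumptions; the preprocessing notebook may use a different encoding.

```python
import re
from typing import Tuple


def age_midpoint(bracket: str) -> float:
    """Convert an age interval such as "[70, 80)" to its midpoint (75.0)."""
    # Assumption: the bracket contains exactly two integers (lower, upper).
    bounds = [int(number) for number in re.findall(r"\d+", bracket)]
    if len(bounds) != 2:
        raise ValueError(f"unexpected age bracket: {bracket!r}")
    low, high = bounds
    return (low + high) / 2


def medication_flags(value: str) -> Tuple[int, int]:
    """Map a medication value to (prescribed, dosage_changed) flags."""
    normalized = value.strip().lower()
    prescribed = int(normalized != "no")             # "up", "down", "steady"
    dosage_changed = int(normalized in {"up", "down"})
    return prescribed, dosage_changed


print(age_midpoint("[70, 80)"))     # midpoint of the 70-80 interval
print(medication_flags("steady"))   # prescribed, dosage unchanged
```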
  4. Missing data codes:

| Attribute | Number of null values |
| --- | --- |
| race | 2273 |
| weight | 98569 |
| payer_code | 40256 |
| medical_specialty | 49949 |
| diag_1 | 21 |
| diag_2 | 358 |
| diag_3 | 1423 |
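
Counts like the ones above can be reproduced by scanning the CSV for a missing-value marker. The sketch below assumes that missing entries are coded as “?” in `diabetic_data.csv`; the function name `count_missing` and the in-memory sample rows are hypothetical and only stand in for the real file.

```python
import csv
import io
from collections import Counter

MISSING_MARKER = "?"  # assumption: how missing values are coded in the file


def count_missing(csv_file, marker=MISSING_MARKER):
    """Count missing values per column in an open CSV file object."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        for column, value in row.items():
            if column is None:
                continue  # skip surplus fields in malformed rows
            if value is None or value.strip() in ("", marker):
                counts[column] += 1
    return dict(counts)


# Tiny in-memory sample (made-up rows) in place of the real dataset.
sample = io.StringIO(
    "race,weight,diag_1\n"
    "Caucasian,?,250\n"
    "?,?,428\n"
    "AfricanAmerican,?,?\n"
)
missing = count_missing(sample)
print(missing)

# With the real file, the call would look like:
# with open("diabetic_data.csv", newline="") as f:
#     print(count_missing(f))
```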

Prerequisites

You may need to install some software and packages.

  1. Install Anaconda (https://docs.anaconda.com/anaconda/install/)

  2. Install Scikit-learn (https://anaconda.org/anaconda/scikit-learn)
    conda install -c anaconda scikit-learn
    
  3. Install Imbalanced-Learn Library (https://anaconda.org/conda-forge/imbalanced-learn)
    conda install -c conda-forge imbalanced-learn
    
  4. Install XGBoost Library (https://anaconda.org/conda-forge/xgboost)
    conda install -c conda-forge xgboost
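
After installation, a quick check can confirm that the packages are importable in the active environment. This is a minimal sketch using only the standard library; the `REQUIRED` mapping and the function name are illustrative, with the import names `sklearn`, `imblearn`, and `xgboost` corresponding to the three packages listed above.

```python
import importlib.util

# Package name (as installed) -> module name (as imported).
REQUIRED = {
    "scikit-learn": "sklearn",
    "imbalanced-learn": "imblearn",
    "xgboost": "xgboost",
}


def missing_packages(required=REQUIRED):
    """Return the names of packages whose modules cannot be found."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


not_installed = missing_packages()
if not_installed:
    print("Missing packages:", ", ".join(not_installed))
else:
    print("All required packages are installed.")
```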
    

Acknowledgments