Loan Eligibility Prediction Using ML
The idea behind this project is to understand how we can predict the loan status of a customer based on data from previous loans.
We are going to use widely used concepts such as classification with tree ensembles (Random Forest Classifier & Extra Trees Classifier) and feature importance.
What will we be doing?
- Explore the data
- Draw some conclusions from the data set (visualizing the data)
- Find which features are most important
- Split the data into Train and Test
- Train a model
- Evaluate the model performance
The dataset that we are going to use can be downloaded from this link:
https://www.kaggle.com/altruistdelhite04/loan-prediction-problem-dataset
Code for predicting loan status
# importing the required libraries
import numpy as np
import pandas as pd
# reading the training data
data = pd.read_csv('train_u6lujuX_CVtuZ9i.csv')
# understanding the data: first five rows and (rows, columns)
print(data.head())
print(data.shape)
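Before encoding anything, it helps to count the missing values in each column and check how balanced the target is. The sketch below uses a small hypothetical frame that has a few of the loan columns. It is not the real file. On the actual data, the same two calls would be data.isnull().sum() and data['Loan_Status'].value_counts(normalize=True).

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows shaped like the loan data (values are made up, not from the real file)
toy = pd.DataFrame({
    "Gender": ["Male", "Female", np.nan, "Male"],
    "Credit_History": [1.0, 0.0, 1.0, np.nan],
    "LoanAmount": [128.0, np.nan, 66.0, 120.0],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

# Number of missing values in each column
missing = toy.isnull().sum()
print(missing)

# Share of approved (Y) vs rejected (N) loans
balance = toy["Loan_Status"].value_counts(normalize=True)
print(balance)
```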
# removing unnecessary columns (Loan_ID is only an identifier)
data.drop(['Loan_ID'], axis=1, inplace=True)
# one-hot encoding: every distinct value of each listed column becomes its own 0/1 column
c_data = pd.get_dummies(data=data, columns=['Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term', 'Credit_History', 'Property_Area', 'Loan_Status'])
print(c_data.head())
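pd.get_dummies handles numeric columns such as ApplicantIncome the same way it handles text categories. Each distinct value gets its own indicator column, named column_value. This explains why the encoded frame becomes so wide and where names like ApplicantIncome_10000 and CoapplicantIncome_0.0 come from. The sketch below shows this on a hypothetical four-row frame.

```python
import pandas as pd

# Hypothetical toy frame: one categorical column and one numeric income column
toy = pd.DataFrame({
    "Property_Area": ["Urban", "Semiurban", "Rural", "Urban"],
    "ApplicantIncome": [5849, 4583, 3000, 5849],
})

# Each distinct value (sorted) becomes a separate 0/1 column
encoded = pd.get_dummies(toy, columns=["Property_Area", "ApplicantIncome"])
print(list(encoded.columns))
```

The three distinct incomes turn into three separate columns. A common alternative, which is not what this post does, is to dummy-encode only the categorical columns and pass numeric ones like ApplicantIncome and LoanAmount to the model as numbers.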
# visualising feature importance
from sklearn.ensemble import ExtraTreesClassifier
import matplotlib.pyplot as plt
# independent columns: everything except the two Loan_Status dummies
x_1 = c_data.drop(columns=['Loan_Status_N', 'Loan_Status_Y'])
# target column, i.e. loan status (1 = approved)
y_1 = c_data['Loan_Status_Y']
clf = ExtraTreesClassifier()
clf.fit(x_1, y_1)
print(clf.feature_importances_)
Output Window:
[0.00669548 0.00604855 0.00583627 ... 0.00706081 0.010482 0.0048473 ]
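feature_importances_ holds one score per input column, and the scores add up to 1. A higher score means the trees relied more on that column when splitting. The sketch below checks this on hypothetical synthetic data in which only column 0 decides the label. The data and the random_state value are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Hypothetical synthetic data: column 0 decides the label, columns 1 and 2 are noise
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 3))
y = X[:, 0]

clf = ExtraTreesClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# One non-negative score per column, normalised to sum to 1
importances = clf.feature_importances_
print(importances)
```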
# ranking features by importance (each score labelled with its column name)
feat_imp = pd.Series(clf.feature_importances_, index=x_1.columns).sort_values(ascending=False)
top_imp_features = feat_imp.nlargest(5)
print(top_imp_features)
feat_imp.nlargest(5).plot(kind='barh')
Output Window:
Credit_History_0.0 0.128080
Credit_History_1.0 0.089349
CoapplicantIncome_0.0 0.010859
Property_Area_Semiurban 0.010482
ApplicantIncome_10000 0.009264
dtype: float64
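When the importances are wrapped in a pd.Series indexed by column name, the ranking is easy to read. The top names can then be used directly to pick model inputs. The sketch below uses hypothetical scores (illustrative values, not the model's output). It also shows that nlargest already returns its values in descending order, so sorting first is optional.

```python
import pandas as pd

# Hypothetical importance scores (illustrative values, not output from the real model)
names = ["Gender_Male", "Credit_History_0.0", "Credit_History_1.0", "Married_No"]
feat_imp = pd.Series([0.02, 0.13, 0.09, 0.01], index=names)

# nlargest returns the k highest values already sorted in descending order
top2 = feat_imp.nlargest(2)
print(top2)

# The index of the ranked Series holds the column names to keep as model inputs
selected = list(top2.index)

# Hypothetical one-hot frame with those columns; keep just the chosen features
toy = pd.DataFrame(0, index=range(3), columns=names)
x_sel = toy[selected]
print(x_sel.shape)  # (3, 2)
```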
# splitting the data into training and test sets using the selected features
x = c_data[['Credit_History_0.0', 'Credit_History_1.0', 'Property_Area_Semiurban', 'ApplicantIncome_10000']]
y = c_data['Loan_Status_Y']
from sklearn.model_selection import train_test_split
# 70% train / 30% test; random_state fixes the shuffle so the split is reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=5)
# training a random forest of 100 trees
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_estimators=100)
rfc.fit(x_train, y_train)
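With a fractional test_size, scikit-learn rounds the test set size up. The sketch below assumes the training file has 614 rows, which is what the commonly distributed version of this Kaggle file contains; check it against data.shape. Under that assumption, the split gives 185 test rows and 429 training rows. The placeholder features and labels are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assumption: 614 rows, the size of the commonly distributed training file
n_rows = 614
X = np.arange(n_rows).reshape(-1, 1)  # placeholder feature column (row index)
y = np.arange(n_rows) % 2             # placeholder labels

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=5)

# Test size is ceil(0.3 * 614) = 185; the remaining 429 rows go to training
print(len(x_train), len(x_test))  # 429 185
```

The same random_state always yields the same shuffle, so the accuracy reported below refers to one fixed split.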
# reading the test data (path relative to the working directory, like the training file)
data_2 = pd.read_csv('test_Y3wMUE5_7gLdaTN.csv')
# removing unnecessary columns from the test data
data_2.drop(['Loan_ID'], axis=1, inplace=True)
# one-hot encoding the test data
c2_data = pd.get_dummies(data=data_2, columns=['Credit_History', 'CoapplicantIncome', 'Property_Area', 'ApplicantIncome'])
# selecting the same four feature columns used in training; any missing from the test file are filled with 0
x2 = c2_data.reindex(columns=x.columns, fill_value=0)
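Encoding a separate test file has a pitfall. If a value seen in training never occurs in the test file, get_dummies will not create that column, and selecting it by name raises a KeyError. DataFrame.reindex with fill_value=0 aligns the test columns to the training columns. The sketch below uses hypothetical two- and three-row frames.

```python
import pandas as pd

# Hypothetical train/test frames; the test frame has no 'Semiurban' row
train = pd.DataFrame({"Property_Area": ["Urban", "Semiurban", "Rural"]})
test = pd.DataFrame({"Property_Area": ["Urban", "Rural"]})

train_dummies = pd.get_dummies(train, columns=["Property_Area"])
test_dummies = pd.get_dummies(test, columns=["Property_Area"])
print(list(test_dummies.columns))  # ['Property_Area_Rural', 'Property_Area_Urban']

# Reindex to the training columns: missing dummies become 0, extra ones are dropped
test_aligned = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)
print(test_aligned)
```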
# predicting loan status for the held-out split (x_test) and for the unlabeled Kaggle test file (x2)
y_pred = rfc.predict(x_test)
print(y_pred)
predicted_y = rfc.predict(x2)
print(predicted_y)
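The model predicts the Loan_Status_Y dummy, so its output is 1/True for approved and 0/False otherwise. Loan_ID is dropped before encoding, so the IDs have to be set aside first if you want to match predictions back to applicants. The sketch below assumes that approach. The IDs and predictions are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical test rows; Loan_ID is copied aside before the column is dropped
test = pd.DataFrame({"Loan_ID": ["LP0001", "LP0002", "LP0003"],
                     "Credit_History": [1.0, 0.0, 1.0]})
loan_ids = test["Loan_ID"].copy()
test = test.drop(columns=["Loan_ID"])

# Hypothetical model output for those rows (1 = Loan_Status_Y predicted)
predicted = np.array([1, 0, 1])

# Map the dummy back to the original Y/N labels and pair each with its ID
result = pd.DataFrame({"Loan_ID": loan_ids,
                       "Loan_Status": np.where(predicted == 1, "Y", "N")})
print(result)
```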
# measuring accuracy on the held-out split
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy', accuracy * 100)
Output Window:
Accuracy 84.32432432432432
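Accuracy is the number of correct predictions divided by the number of test rows. As an arithmetic check, assume the held-out split has 185 rows (30% of 614, rounded up). Then 84.324...% corresponds to 156 correct predictions, because 156 / 185 = 0.84324... The sketch below rebuilds that ratio with hypothetical label arrays and confirms that accuracy_score equals the mean of the element-wise matches.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels: 185 test rows, of which 156 predictions are correct
y_true = np.ones(185, dtype=int)
y_pred = np.concatenate([np.ones(156, dtype=int), np.zeros(29, dtype=int)])

acc = accuracy_score(y_true, y_pred)
manual = np.mean(y_true == y_pred)  # fraction of matching positions
print(acc * 100, manual * 100)
```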
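This project gives you hands-on experience with several core machine learning concepts in Python: exploring a dataset, one-hot encoding, ranking features by importance, training a classifier, and evaluating it on held-out data. After completing it, you will be able to start your own beginner-level machine learning projects.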
Happy Coding!!!