# Loan Eligibility Prediction Using ML

The idea behind this project is to understand how we can predict the loan status of a customer from the data of previous loans.

We will use some of the most widely used concepts, such as regression, classifiers (Random Forest Classifier and Extra Trees Classifier) and feature importance.

What will we be doing?

• Explore the Data
• Draw some conclusions from the dataset (visualizing the data)
• Find which feature has more importance
• Split the data into Train and Test
• Train a model
• Evaluate the model performance

https://www.kaggle.com/altruistdelhite04/loan-prediction-problem-dataset

# Code for predicting loan status

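```python
# importing important libraries
import numpy as np
import pandas as pd

# loading the training data; the file name is an assumption, so use the
# path of the training CSV you downloaded from the Kaggle page above
data = pd.read_csv('train.csv')

# understanding the data: (number of rows, number of columns)
print(data.shape)
```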
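Before encoding anything, it helps to count the missing values in each column, because `pd.get_dummies` (with its default `dummy_na=False`) turns a missing value into zeros in every dummy column of that feature. The sketch below does the counting in plain Python on a few made-up rows that use a subset of the Kaggle column names. It is illustrative only. On the real DataFrame, `data.isnull().sum()` gives the same per-column counts.

```python
import csv
import io

# A few made-up rows using some of the Kaggle column names (illustrative only);
# an empty field stands for a missing value.
SAMPLE_CSV = """Loan_ID,Gender,Married,Credit_History,Loan_Status
LP0001,Male,Yes,1.0,Y
LP0002,,No,0.0,N
LP0003,Female,Yes,,Y
LP0004,Male,,1.0,Y
"""

def count_missing(csv_text):
    """Return {column: number of empty or absent fields} for a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value is None or value.strip() == "":
                counts[name] += 1
    return counts

missing = count_missing(SAMPLE_CSV)
print(missing)
# {'Loan_ID': 0, 'Gender': 1, 'Married': 1, 'Credit_History': 1, 'Loan_Status': 0}
```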

```python
# removing unnecessary columns (Loan_ID is only an identifier)
data.drop(['Loan_ID'], axis=1, inplace=True)

# converting the columns into dummy (one-hot) variables: one 0/1 column per distinct value
c_data = pd.get_dummies(data=data, columns=['Gender', 'Married', 'Dependents', 'Education',
                                            'Self_Employed', 'ApplicantIncome', 'CoapplicantIncome',
                                            'LoanAmount', 'Loan_Amount_Term', 'Credit_History',
                                            'Property_Area', 'Loan_Status'])
```
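`pd.get_dummies` creates one indicator column per distinct value and names it `<column>_<value>`. That is why numeric columns such as `ApplicantIncome` expand into many columns (one per distinct income), and why a feature named `ApplicantIncome_10000` shows up later in this project. The plain-Python sketch below mimics that naming on made-up rows. It is a simplification, not pandas' implementation: it encodes one column per call, treats `None` as missing, and orders the new columns by sorted value.

```python
def one_hot(rows, column):
    """Return new rows where `column` is replaced by 0/1 indicator columns
    named f"{column}_{value}", one per distinct non-missing value."""
    # Distinct non-missing values, sorted so the column order is deterministic.
    values = sorted({row[column] for row in rows if row[column] is not None})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for value in values:
            # A missing value (None) gets 0 in every indicator column.
            new_row[f"{column}_{value}"] = int(row[column] == value)
        encoded.append(new_row)
    return encoded

# Made-up rows (illustrative only).
rows = [
    {"ApplicantIncome": 10000, "Credit_History": 1.0},
    {"ApplicantIncome": 4000, "Credit_History": 0.0},
    {"ApplicantIncome": 10000, "Credit_History": None},
]
encoded = one_hot(one_hot(rows, "Credit_History"), "ApplicantIncome")
for row in encoded:
    print(row)
```

The third row shows how a missing `Credit_History` ends up as 0 in both `Credit_History_0.0` and `Credit_History_1.0`.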

```python
# visualising feature importance
from sklearn.ensemble import ExtraTreesClassifier
import matplotlib.pyplot as plt

# independent columns: every dummy column except the two Loan_Status columns
x_1 = c_data.drop(columns=['Loan_Status_N', 'Loan_Status_Y'])
# target column, i.e. Loan_Status_Y (1 if the loan was approved)
y_1 = c_data['Loan_Status_Y']

clf = ExtraTreesClassifier()
clf.fit(x_1, y_1)
print(clf.feature_importances_)
```

Output Window:

`[0.00669548 0.00604855 0.00583627 ... 0.00706081 0.010482   0.0048473 ]`
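In scikit-learn, `feature_importances_` for tree ensembles such as `ExtraTreesClassifier` are impurity-based. Each split's weighted decrease in Gini impurity is credited to the feature it splits on. These decreases are summed over each tree's nodes, normalized, and averaged across the trees. The sketch below computes only the building block: the Gini decrease from splitting one node on a 0/1 dummy column. It uses made-up labels and illustrates why a column that separates approved and rejected loans well (as `Credit_History_0.0` appears to) earns a large share.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(feature, labels):
    """Impurity decrease from splitting on a 0/1 feature:
    parent impurity minus the size-weighted impurity of the two children."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children

# Made-up labels: Y = approved, N = rejected (illustrative only).
labels = ["Y", "Y", "Y", "N", "N", "Y"]
credit_history_0 = [0, 0, 0, 1, 1, 0]    # separates the classes perfectly
property_semiurban = [1, 0, 1, 1, 0, 0]  # unrelated to the label here

print(gini_decrease(credit_history_0, labels))    # about 0.444 (all of the parent impurity)
print(gini_decrease(property_semiurban, labels))  # about 0.0
```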

```python
# finding out the features with the most weight
feat_imp = pd.Series(clf.feature_importances_, index=x_1.columns).sort_values(ascending=False)
top_imp_features = feat_imp.nlargest(5)
print(top_imp_features)
feat_imp.nlargest(5).plot(kind='barh')
```
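`feat_imp.nlargest(5)` pairs each importance with its column name and keeps the five largest. The plain-Python equivalent below uses `heapq.nlargest`. It also shows the normalization step that makes scikit-learn's importances sum to 1. The names and raw scores are illustrative and are not values from this dataset.

```python
import heapq

def top_features(names, raw_scores, k):
    """Normalize raw importance scores to sum to 1, then return the k
    (name, share) pairs with the largest shares, largest first."""
    total = sum(raw_scores)
    shares = [score / total for score in raw_scores]
    return heapq.nlargest(k, zip(names, shares), key=lambda pair: pair[1])

names = ["Credit_History_0.0", "Credit_History_1.0", "Property_Area_Semiurban",
         "ApplicantIncome_10000", "Gender_Male"]
raw_scores = [6.0, 4.0, 0.5, 0.4, 0.1]  # made-up values

for name, share in top_features(names, raw_scores, 3):
    print(f"{name:<25} {share:.3f}")
```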

Output Window:

```
Credit_History_0.0         0.128080
Credit_History_1.0         0.089349
CoapplicantIncome_0.0      0.010859
Property_Area_Semiurban    0.010482
ApplicantIncome_10000      0.009264
dtype: float64
```

```python
# splitting the data into train and test sets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

x = c_data[['Credit_History_0.0', 'Credit_History_1.0', 'Property_Area_Semiurban', 'ApplicantIncome_10000']]
y = c_data['Loan_Status_Y']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=5)

# training a random forest of 100 trees on the selected features
rfc = RandomForestClassifier(n_estimators=100)
rfc.fit(x_train, y_train)
```
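With `test_size=0.3`, `train_test_split` shuffles the rows and reserves 30% of them for testing. scikit-learn rounds the test count up (`ceil`). Assuming the Kaggle training file has 614 rows, this gives 185 test rows and 429 training rows. The sketch below reproduces that bookkeeping with Python's `random` module. It will not select the same rows as scikit-learn, even with the same seed, because scikit-learn shuffles with NumPy's random generator.

```python
import math
import random

def split_indices(n_samples, test_size=0.3, seed=5):
    """Shuffle row indices reproducibly and split them into (train, test).
    The test count is rounded up, as scikit-learn does for a float test_size."""
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # same seed -> same shuffle on every run
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(614)
print(len(train_idx), len(test_idx))  # 429 185
```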

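```python
# loading the separate test file (file name assumed) and removing unnecessary columns
data_2 = pd.read_csv('test.csv')
data_2.drop(['Loan_ID'], axis=1, inplace=True)

# converting the test data into dummy variables, then keeping exactly the training
# features; a dummy column that does not occur in the test file is filled with 0
c2_data = pd.get_dummies(data=data_2, columns=['Credit_History', 'CoapplicantIncome', 'Property_Area', 'ApplicantIncome'])
features = ['Credit_History_0.0', 'Credit_History_1.0', 'Property_Area_Semiurban', 'ApplicantIncome_10000']
x2 = c2_data.reindex(columns=features, fill_value=0)
```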
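`pd.get_dummies` only creates columns for values that actually occur in the data it receives. If no applicant in the test file earns exactly 10000, there is no `ApplicantIncome_10000` column at all, and selecting that column by name raises a `KeyError`. One common remedy is to reindex the encoded test data to the training feature list and fill absent columns with 0. In pandas this is `DataFrame.reindex(columns=..., fill_value=0)`. The plain-Python sketch below shows the idea on made-up encoded rows.

```python
def align_to_features(encoded_rows, features, fill_value=0):
    """Keep exactly the training feature columns, in training order;
    columns the test encoding never produced are filled with fill_value."""
    return [[row.get(name, fill_value) for name in features] for row in encoded_rows]

features = ["Credit_History_0.0", "Credit_History_1.0",
            "Property_Area_Semiurban", "ApplicantIncome_10000"]

# Made-up encoded test rows: nobody earns exactly 10000, so that dummy column was
# never created, and each row has an income column that training never used.
test_rows = [
    {"Credit_History_0.0": 0, "Credit_History_1.0": 1, "Property_Area_Semiurban": 1,
     "ApplicantIncome_4500": 1},
    {"Credit_History_0.0": 1, "Credit_History_1.0": 0, "Property_Area_Semiurban": 0,
     "ApplicantIncome_3200": 1},
]
matrix = align_to_features(test_rows, features)
print(matrix)  # [[0, 1, 1, 0], [1, 0, 0, 0]]
```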

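```python
# predicting loan status: y_pred for the held-out split (true labels known),
# predicted_y for the test file (no Loan_Status column, so it cannot be scored)
y_pred = rfc.predict(x_test)
print(y_pred)
predicted_y = rfc.predict(x2)
print(predicted_y)
```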
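Both `predict` calls pass each row through all 100 trees. In scikit-learn, `RandomForestClassifier.predict` does not count hard votes. It averages the trees' class-probability estimates and returns the class with the highest mean probability. The minimal sketch below shows that aggregation step for one applicant, using made-up per-tree probabilities that were chosen so that soft averaging and a hard majority vote disagree.

```python
from collections import Counter

def forest_predict(tree_probas, classes):
    """Average per-tree class probabilities and return (predicted class, mean probabilities).
    tree_probas holds one probability list per tree, aligned with `classes`."""
    n_trees = len(tree_probas)
    mean = [sum(p[i] for p in tree_probas) / n_trees for i in range(len(classes))]
    best = max(range(len(classes)), key=lambda i: mean[i])
    return classes[best], mean

classes = [0, 1]  # values of Loan_Status_Y: 0 = not approved, 1 = approved
# Made-up outputs of three trees for one applicant (illustrative only).
tree_probas = [[0.45, 0.55], [0.45, 0.55], [0.95, 0.05]]

label, mean = forest_predict(tree_probas, classes)
# For comparison: each tree's own prediction, then a hard majority vote.
votes = [classes[max(range(len(classes)), key=lambda i: p[i])] for p in tree_probas]
hard_label = Counter(votes).most_common(1)[0][0]
print(label, hard_label)  # averaging picks 0, a hard vote would pick 1
```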

```python
# finding out the accuracy on the held-out split
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)
print('Accuracy', accuracy * 100)
```

Output Window:

`Accuracy 84.32432432432432`
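`accuracy_score` is the fraction of test rows whose predicted label equals the true label. Assume the held-out split has 185 rows, that is, 30% of a 614-row training file rounded up. Then the printed 84.324...% corresponds to 156 correct predictions, because 156 / 185 = 0.8432... The function below is a plain-Python version of that calculation. It is checked against this arithmetic using made-up labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# 185 made-up test labels, of which the first 156 are predicted correctly.
y_true = [1] * 185
y_pred = [1] * 156 + [0] * 29
print("Accuracy", accuracy(y_true, y_pred) * 100)
```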

This project gives you good experience in exploring diverse concepts in Python. After completing it, you will be able to start your own beginner-level machine learning projects.

Happy Coding!!! 