Cardiovascular Disease Detection using Different Machine Learning Algorithms

#Importing Libraries

#Importing Dataset

df

df.shape

(70000, 13)

df.isnull().sum()

id             0
age            0
gender         0
height         0
weight         0
ap_hi          0
ap_lo          0
cholesterol    0
gluc           0
smoke          0
alco           0
active         0
cardio         0
dtype: int64

#Describing the whole Dataset

#Showing the number of 0s and 1s in cardio (1 = cardiovascular disease)

0    35021
1    34979
Name: cardio, dtype: int64
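The two counts above suggest the classes are almost perfectly balanced. A quick check, using only the numbers printed above (the variable names are illustrative and not from the original notebook):

```python
# Class counts copied from the value counts printed above.
counts = {0: 35021, 1: 34979}

total = sum(counts.values())
positive_share = counts[1] / total  # fraction of rows with cardio == 1

print(total)                     # 70000
print(round(positive_share, 4))  # 0.4997
```

With a split this close to 50/50, plain accuracy is a reasonable first metric for the models that follow.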

#Plotting the Dataset

#Scatter Plots

#Correlation

#Correlation Matrix Visualization

<AxesSubplot:>

sns.countplot(data=df, x="cardio")

<AxesSubplot:xlabel='cardio', ylabel='count'>

Visualizing Each Column against the Output Column

sns.countplot(data=df, x="gender", hue="cardio")

<AxesSubplot:xlabel='gender', ylabel='count'>

sns.countplot(data = df, x = 'age', hue = 'cardio')

<AxesSubplot:xlabel='age', ylabel='count'>

#Converting the age from days into whole years (rounded)

0        50.0
1        55.0
2        52.0
3        48.0
4        48.0
         ... 
69995    53.0
69996    62.0
69997    52.0
69998    61.0
69999    56.0
Name: new_age, Length: 70000, dtype: float64
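The code for this cell is not shown in the post. The values are consistent with dividing the age in days by 365 and rounding to the nearest year; the sketch below assumes exactly that and checks it against the first five rows of the dataset. In pandas the same idea would likely be written as `df['new_age'] = (df['age'] / 365).round()`, which is an assumption rather than the author's confirmed code.

```python
# Ages in days for rows 0-4 of the dataset, as displayed in the post.
age_days = [18393, 20228, 18857, 17623, 17474]

# Assumption: a year is taken as 365 days and the result is rounded
# to the nearest whole year, then kept as a float like the output above.
new_age = [float(round(days / 365)) for days in age_days]

print(new_age)  # [50.0, 55.0, 52.0, 48.0, 48.0]
```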

sns.countplot(data = df, x = 'new_age', hue = 'cardio')

<AxesSubplot:xlabel='new_age', ylabel='count'>

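#Note: drop() returns a new DataFrame; the result is not assigned here, so df keeps new_age and it stays among the feature columns below
df.drop(['new_age'], axis = 'columns')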

Dividing Features and Label Columns

x = df.drop(['cardio', 'id'], axis = 'columns')

#Feature Columns

y = df['cardio']

#Output Column

0        0
1        1
2        1
3        1
4        0
        ..
69995    0
69996    1
69997    1
69998    1
69999    0
Name: cardio, Length: 70000, dtype: int64

Dividing into Training and Testing Data

from sklearn.model_selection import train_test_split

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size = .30, random_state = 1)

#Showing xtrain

#Showing ytrain

23561    0
34858    1
54953    0
59230    0
1730     1
        ..
49100    1
20609    0
21440    0
50057    1
5192     1
Name: cardio, Length: 49000, dtype: int64
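The lengths shown above follow directly from the 30% test fraction. A small arithmetic sketch (variable names are illustrative):

```python
n_rows = 70000     # number of rows, from df.shape above
test_size = 0.30   # fraction passed to train_test_split

n_test = round(n_rows * test_size)  # rows held out for testing
n_train = n_rows - n_test           # rows left for training

print(n_train, n_test)  # 49000 21000
```

The 21,000 test rows are the same figure that appears as the total support in every classification report below.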

Model Development using Random Forest

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(n_estimators = 100)

rfc.fit(xtrain, ytrain)

RandomForestClassifier()

pred = rfc.predict(xtest)

rfc.score(xtest, ytest)

0.7165714285714285

from sklearn.metrics import classification_report, confusion_matrix

cr = classification_report(ytest, pred)

print (cr)

              precision    recall  f1-score   support

           0       0.70      0.74      0.72     10352
           1       0.73      0.69      0.71     10648

    accuracy                           0.72     21000
   macro avg       0.72      0.72      0.72     21000
weighted avg       0.72      0.72      0.72     21000

cm = confusion_matrix(ytest, pred)

print (cm)

[[7681 2671]
 [3281 7367]]
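The score and the classification report can both be recovered from this confusion matrix, where rows are actual labels and columns are predicted labels. A minimal sketch in plain Python, using the Random Forest numbers above (the variable names are illustrative):

```python
# Random Forest confusion matrix from above: rows = actual, columns = predicted.
cm = [[7681, 2671],
      [3281, 7367]]

tn, fp = cm[0]  # actual 0: predicted 0, predicted 1
fn, tp = cm[1]  # actual 1: predicted 0, predicted 1

accuracy = (tn + tp) / (tn + fp + fn + tp)
precision_1 = tp / (tp + fp)  # of the predicted positives, how many are correct
recall_1 = tp / (tp + fn)     # of the actual positives, how many are found

print(round(accuracy, 4), round(precision_1, 2), round(recall_1, 2))
# 0.7166 0.73 0.69
```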

#Visualization of the Confusion Matrix

Text(33.0, 0.5, 'Actual Label')

Model Development using Decision Tree

from sklearn.tree import DecisionTreeClassifier

dtc = DecisionTreeClassifier()

dtc.fit(xtrain, ytrain)

DecisionTreeClassifier()

pred = dtc.predict(xtest)

dtc.score(xtest, ytest)

0.6322380952380953

cr = classification_report(ytest, pred)

print (cr)

              precision    recall  f1-score   support

           0       0.62      0.64      0.63     10352
           1       0.64      0.63      0.63     10648

    accuracy                           0.63     21000
   macro avg       0.63      0.63      0.63     21000
weighted avg       0.63      0.63      0.63     21000

cm = confusion_matrix(ytest, pred)

print (cm)

[[6614 3738]
 [3985 6663]]

#Visualization of the Confusion Matrix

Text(33.0, 0.5, 'Actual Label')

Model Development using Support Vector Machine

from sklearn.svm import SVC

svm = SVC()

svm.fit(xtrain, ytrain)

SVC()

svm.score(xtest, ytest)

0.6046666666666667

pred = svm.predict(xtest)

cr = classification_report(ytest, pred)

print (cr)

              precision    recall  f1-score   support

           0       0.59      0.66      0.62     10352
           1       0.63      0.55      0.58     10648

    accuracy                           0.60     21000
   macro avg       0.61      0.61      0.60     21000
weighted avg       0.61      0.60      0.60     21000

cm = confusion_matrix(ytest, pred)

print (cm)

[[6884 3468]
 [4834 5814]]

#Visualization of the Confusion Matrix

Text(33.0, 0.5, 'Actual Label')

Model Development using Logistic Regression

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()

lr.fit(xtrain, ytrain)

LogisticRegression()

lr.score(xtest, ytest)

0.7041904761904761

pred = lr.predict(xtest)

cr = classification_report(ytest, pred)

print (cr)

              precision    recall  f1-score   support

           0       0.68      0.75      0.71     10352
           1       0.73      0.66      0.69     10648

    accuracy                           0.70     21000
   macro avg       0.71      0.70      0.70     21000
weighted avg       0.71      0.70      0.70     21000

cm = confusion_matrix(ytest, pred)

print (cm)

[[7744 2608]
 [3604 7044]]

#Visualization of the Confusion Matrix

Text(33.0, 0.5, 'Actual Label')

Model Development using Gaussian Naive Bayes

from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()

gnb.fit(xtrain, ytrain)

GaussianNB()

gnb.score(xtest, ytest)

0.5910952380952381

pred = gnb.predict(xtest)

cr = classification_report(ytest, pred)

print (cr)

              precision    recall  f1-score   support

           0       0.55      0.89      0.68     10352
           1       0.74      0.30      0.43     10648

    accuracy                           0.59     21000
   macro avg       0.64      0.60      0.55     21000
weighted avg       0.65      0.59      0.55     21000

cm = confusion_matrix(ytest, pred)

print (cm)

[[9207 1145]
 [7442 3206]]
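Gaussian Naive Bayes misses most of the positive cases (7,442 false negatives), which is why its F1 scores sit well below its precision for class 1. The sketch below recomputes the per-class F1, the macro average and the weighted average from the confusion matrix above; the helper name `f1_for_class` is illustrative.

```python
# Gaussian Naive Bayes confusion matrix from above: rows = actual, columns = predicted.
cm = [[9207, 1145],
      [7442, 3206]]

support = [sum(row) for row in cm]  # actual samples per class: [10352, 10648]

def f1_for_class(k):
    """F1 for class k, computed as 2*TP / (2*TP + FP + FN)."""
    tp = cm[k][k]
    fp = sum(cm[r][k] for r in range(2)) - tp  # predicted k but actually other
    fn = sum(cm[k]) - tp                       # actually k but predicted other
    return 2 * tp / (2 * tp + fp + fn)

f1 = [f1_for_class(0), f1_for_class(1)]
macro_f1 = sum(f1) / len(f1)                                    # unweighted mean
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)

print([round(v, 2) for v in f1], round(macro_f1, 2), round(weighted_f1, 2))
# [0.68, 0.43] 0.55 0.55
```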

#Visualization of the Confusion Matrix

Text(33.0, 0.5, 'Actual Label')

Model Development using K-Nearest Neighbors

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()

knn.fit(xtrain, ytrain)

KNeighborsClassifier()

knn.score(xtest, ytest)

0.682047619047619

pred = knn.predict(xtest)

cr = classification_report(ytest, pred)

print (cr)

              precision    recall  f1-score   support

           0       0.67      0.71      0.69     10352
           1       0.70      0.66      0.68     10648

    accuracy                           0.68     21000
   macro avg       0.68      0.68      0.68     21000
weighted avg       0.68      0.68      0.68     21000

cm = confusion_matrix(ytest, pred)

print (cm)

[[7328 3024]
 [3653 6995]]

#Visualization of the Confusion Matrix

Text(33.0, 0.5, 'Actual Label')

Model Development using Linear Discriminant Analysis

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis()

lda.fit(xtrain, ytrain)

LinearDiscriminantAnalysis()

lda.score(xtest, ytest)

0.6458095238095238

pred = lda.predict(xtest)

cr = classification_report(ytest, pred)

print (cr)

              precision    recall  f1-score   support

           0       0.63      0.69      0.66     10352
           1       0.67      0.60      0.63     10648

    accuracy                           0.65     21000
   macro avg       0.65      0.65      0.65     21000
weighted avg       0.65      0.65      0.65     21000

cm = confusion_matrix(ytest, pred)

print (cm)

[[7176 3176]
 [4262 6386]]

#Visualization of the Confusion Matrix

Text(33.0, 0.5, 'Actual Label')

x = np.array(["RF", "DT", "SVM", "LR", "GNB", "KNN", "LDA"])
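These labels presumably feed a comparison chart whose code did not survive in the post. As a stand-in, the sketch below ranks the seven models by test accuracy and also reports recall for the positive class, using only the confusion matrices printed above. The dictionary and function names are illustrative.

```python
# Test-set confusion matrices reported above (rows = actual, columns = predicted).
matrices = {
    "RF":  [[7681, 2671], [3281, 7367]],
    "DT":  [[6614, 3738], [3985, 6663]],
    "SVM": [[6884, 3468], [4834, 5814]],
    "LR":  [[7744, 2608], [3604, 7044]],
    "GNB": [[9207, 1145], [7442, 3206]],
    "KNN": [[7328, 3024], [3653, 6995]],
    "LDA": [[7176, 3176], [4262, 6386]],
}

def accuracy(cm):
    """Share of all test rows that were classified correctly."""
    return (cm[0][0] + cm[1][1]) / sum(map(sum, cm))

def recall_positive(cm):
    """Share of actual cardio == 1 rows that were predicted as 1."""
    return cm[1][1] / sum(cm[1])

# Model names ordered from highest to lowest accuracy.
ranking = sorted(matrices, key=lambda name: accuracy(matrices[name]), reverse=True)

for name in ranking:
    cm = matrices[name]
    print(f"{name:<4} accuracy={accuracy(cm):.4f}  recall(1)={recall_positive(cm):.2f}")
```

On these numbers Random Forest and Logistic Regression lead, while Gaussian Naive Bayes has both the lowest accuracy and the lowest recall for the positive class.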


Summary statistics (output of the "Describing the whole Dataset" step):

	id	age	gender	height	weight	ap_hi	ap_lo	cholesterol	gluc	smoke	alco	active	cardio
count	70000.000000	70000.000000	70000.000000	70000.000000	70000.000000	70000.000000	70000.000000	70000.000000	70000.000000	70000.000000	70000.000000	70000.000000	70000.000000
mean	49972.419900	19468.865814	1.349571	164.359229	74.205690	128.817286	96.630414	1.366871	1.226457	0.088129	0.053771	0.803729	0.499700
std	28851.302323	2467.251667	0.476838	8.210126	14.395757	154.011419	188.472530	0.680250	0.572270	0.283484	0.225568	0.397179	0.500003
min	0.000000	10798.000000	1.000000	55.000000	10.000000	-150.000000	-70.000000	1.000000	1.000000	0.000000	0.000000	0.000000	0.000000
25%	25006.750000	17664.000000	1.000000	159.000000	65.000000	120.000000	80.000000	1.000000	1.000000	0.000000	0.000000	1.000000	0.000000
50%	50001.500000	19703.000000	1.000000	165.000000	72.000000	120.000000	80.000000	1.000000	1.000000	0.000000	0.000000	1.000000	0.000000
75%	74889.250000	21327.000000	2.000000	170.000000	82.000000	140.000000	90.000000	2.000000	1.000000	0.000000	0.000000	1.000000	1.000000
max	99999.000000	23713.000000	2.000000	250.000000	200.000000	16020.000000	11000.000000	3.000000	3.000000	1.000000	1.000000	1.000000	1.000000

Correlation matrix (output of the "Correlation" step):

	id	age	gender	height	weight	ap_hi	ap_lo	cholesterol	gluc	smoke	alco	active	cardio
id	1.000000	0.003457	0.003502	-0.003038	-0.001830	0.003356	-0.002529	0.006106	0.002467	-0.003699	0.001210	0.003755	0.003799
age	0.003457	1.000000	-0.022811	-0.081515	0.053684	0.020764	0.017647	0.154424	0.098703	-0.047633	-0.029723	-0.009927	0.238159
gender	0.003502	-0.022811	1.000000	0.499033	0.155406	0.006005	0.015254	-0.035821	-0.020491	0.338135	0.170966	0.005866	0.008109
height	-0.003038	-0.081515	0.499033	1.000000	0.290968	0.005488	0.006150	-0.050226	-0.018595	0.187989	0.094419	-0.006570	-0.010821
weight	-0.001830	0.053684	0.155406	0.290968	1.000000	0.030702	0.043710	0.141768	0.106857	0.067780	0.067113	-0.016867	0.181660
ap_hi	0.003356	0.020764	0.006005	0.005488	0.030702	1.000000	0.016086	0.023778	0.011841	-0.000922	0.001408	-0.000033	0.054475
ap_lo	-0.002529	0.017647	0.015254	0.006150	0.043710	0.016086	1.000000	0.024019	0.010806	0.005186	0.010601	0.004780	0.065719
cholesterol	0.006106	0.154424	-0.035821	-0.050226	0.141768	0.023778	0.024019	1.000000	0.451578	0.010354	0.035760	0.009911	0.221147
gluc	0.002467	0.098703	-0.020491	-0.018595	0.106857	0.011841	0.010806	0.451578	1.000000	-0.004756	0.011246	-0.006770	0.089307
smoke	-0.003699	-0.047633	0.338135	0.187989	0.067780	-0.000922	0.005186	0.010354	-0.004756	1.000000	0.340094	0.025858	-0.015486
alco	0.001210	-0.029723	0.170966	0.094419	0.067113	0.001408	0.010601	0.035760	0.011246	0.340094	1.000000	0.025476	-0.007330
active	0.003755	-0.009927	0.005866	-0.006570	-0.016867	-0.000033	0.004780	0.009911	-0.006770	0.025858	0.025476	1.000000	-0.035653
cardio	0.003799	0.238159	0.008109	-0.010821	0.181660	0.054475	0.065719	0.221147	0.089307	-0.015486	-0.007330	-0.035653	1.000000
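The last row of the correlation matrix above shows how strongly each column is linearly related to the `cardio` label. Sorting those values by absolute size gives a quick ranking; the sketch below copies the numbers from that row, and the variable names are illustrative.

```python
# Correlation of each column with cardio, copied from the last row of the matrix.
cardio_corr = {
    "id": 0.003799, "age": 0.238159, "gender": 0.008109, "height": -0.010821,
    "weight": 0.181660, "ap_hi": 0.054475, "ap_lo": 0.065719,
    "cholesterol": 0.221147, "gluc": 0.089307, "smoke": -0.015486,
    "alco": -0.007330, "active": -0.035653,
}

# Rank columns by the magnitude of their correlation, strongest first.
ranked = sorted(cardio_corr, key=lambda col: abs(cardio_corr[col]), reverse=True)

for col in ranked:
    print(f"{col:<12} {cardio_corr[col]:+.6f}")
```

Age, cholesterol and weight come out on top, and `id` is essentially uncorrelated with the label, which is consistent with dropping it from the features.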


Md. Alamgir Hossain

The full dataset (output of df):

	id	age	gender	height	weight	ap_hi	ap_lo	cholesterol	gluc	smoke	alco	active	cardio
0	0	18393	2	168	62.0	110	80	1	1	0	0	1	0
1	1	20228	1	156	85.0	140	90	3	1	0	0	1	1
2	2	18857	1	165	64.0	130	70	3	1	0	0	0	1
3	3	17623	2	169	82.0	150	100	1	1	0	0	1	1
4	4	17474	1	156	56.0	100	60	1	1	0	0	0	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...
69995	99993	19240	2	168	76.0	120	80	1	1	1	0	1	0
69996	99995	22601	1	158	126.0	140	90	2	2	0	0	1	1
69997	99996	19066	2	183	105.0	180	90	3	1	0	1	0	1
69998	99998	22431	1	163	72.0	135	80	1	2	0	0	0	1
69999	99999	20540	1	170	72.0	120	80	2	1	0	0	1	0

Feature columns (output of x):

	age	gender	height	weight	ap_hi	ap_lo	cholesterol	gluc	smoke	alco	active	new_age
0	18393	2	168	62.0	110	80	1	1	0	0	1	50.0
1	20228	1	156	85.0	140	90	3	1	0	0	1	55.0
2	18857	1	165	64.0	130	70	3	1	0	0	0	52.0
3	17623	2	169	82.0	150	100	1	1	0	0	1	48.0
4	17474	1	156	56.0	100	60	1	1	0	0	0	48.0
...	...	...	...	...	...	...	...	...	...	...	...	...
69995	19240	2	168	76.0	120	80	1	1	1	0	1	53.0
69996	22601	1	158	126.0	140	90	2	2	0	0	1	62.0
69997	19066	2	183	105.0	180	90	3	1	0	1	0	52.0
69998	22431	1	163	72.0	135	80	1	2	0	0	0	61.0
69999	20540	1	170	72.0	120	80	2	1	0	0	1	56.0

Training features (output of xtrain):

	age	gender	height	weight	ap_hi	ap_lo	cholesterol	gluc	smoke	alco	active	new_age
23561	16136	2	169	71.0	100	80	1	1	1	0	1	44.0
34858	14615	1	158	69.0	140	80	2	1	0	0	1	40.0
54953	20507	1	164	65.0	120	80	1	1	0	0	1	56.0
59230	16720	1	153	53.0	100	60	1	1	0	0	1	46.0
1730	21050	1	159	71.0	140	90	1	1	0	0	1	58.0
...	...	...	...	...	...	...	...	...	...	...	...	...
49100	21289	2	175	78.0	120	80	1	1	0	0	1	58.0
20609	19116	1	164	68.0	120	80	1	1	0	0	0	52.0
21440	18049	2	178	82.0	120	80	1	1	0	0	1	49.0
50057	21957	1	169	77.0	120	80	1	1	0	0	0	60.0
5192	20671	1	174	65.0	160	90	2	2	0	0	1	57.0
