Cardiovascular Disease Detection using Different Machine Learning Algorithms: A Complete Project


Download the dataset here: Dataset for Cardiovascular Disease Detection


 

Cardiovascular Disease Detection using Different Machine Learning Algorithms

Out[4]:
          id    age  gender  height  weight  ap_hi  ap_lo  cholesterol  gluc  smoke  alco  active  cardio
0          0  18393       2     168    62.0    110     80            1     1      0     0       1       0
1          1  20228       1     156    85.0    140     90            3     1      0     0       1       1
2          2  18857       1     165    64.0    130     70            3     1      0     0       0       1
3          3  17623       2     169    82.0    150    100            1     1      0     0       1       1
4          4  17474       1     156    56.0    100     60            1     1      0     0       0       0
...      ...    ...     ...     ...     ...    ...    ...          ...   ...    ...   ...     ...     ...
69995  99993  19240       2     168    76.0    120     80            1     1      1     0       1       0
69996  99995  22601       1     158   126.0    140     90            2     2      0     0       1       1
69997  99996  19066       2     183   105.0    180     90            3     1      0     1       0       1
69998  99998  22431       1     163    72.0    135     80            1     2      0     0       0       1
69999  99999  20540       1     170    72.0    120     80            2     1      0     0       1       0

70000 rows × 13 columns

Out[5]:
(70000, 13)
Out[6]:
id             0
age            0
gender         0
height         0
weight         0
ap_hi          0
ap_lo          0
cholesterol    0
gluc           0
smoke          0
alco           0
active         0
cardio         0
dtype: int64
Out[7]:
                 id           age        gender        height        weight         ap_hi         ap_lo   cholesterol          gluc         smoke          alco        active        cardio
count  70000.000000  70000.000000  70000.000000  70000.000000  70000.000000  70000.000000  70000.000000  70000.000000  70000.000000  70000.000000  70000.000000  70000.000000  70000.000000
mean   49972.419900  19468.865814      1.349571    164.359229     74.205690    128.817286     96.630414      1.366871      1.226457      0.088129      0.053771      0.803729      0.499700
std    28851.302323   2467.251667      0.476838      8.210126     14.395757    154.011419    188.472530      0.680250      0.572270      0.283484      0.225568      0.397179      0.500003
min        0.000000  10798.000000      1.000000     55.000000     10.000000   -150.000000    -70.000000      1.000000      1.000000      0.000000      0.000000      0.000000      0.000000
25%    25006.750000  17664.000000      1.000000    159.000000     65.000000    120.000000     80.000000      1.000000      1.000000      0.000000      0.000000      1.000000      0.000000
50%    50001.500000  19703.000000      1.000000    165.000000     72.000000    120.000000     80.000000      1.000000      1.000000      0.000000      0.000000      1.000000      0.000000
75%    74889.250000  21327.000000      2.000000    170.000000     82.000000    140.000000     90.000000      2.000000      1.000000      0.000000      0.000000      1.000000      1.000000
max    99999.000000  23713.000000      2.000000    250.000000    200.000000  16020.000000  11000.000000      3.000000      3.000000      1.000000      1.000000      1.000000      1.000000
Out[8]:
0    35021
1    34979
Name: cardio, dtype: int64
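The code cells behind Out[4] to Out[8] are not shown here. As a rough sketch, outputs of this shape usually come from the pandas calls below. The five rows are copied from the head of the table above; reading the real file (for example with `pd.read_csv`) is left out because the file name and separator are not given.

```python
import pandas as pd

# First five rows of the dataset, copied from the table above.
columns = ["id", "age", "gender", "height", "weight", "ap_hi", "ap_lo",
           "cholesterol", "gluc", "smoke", "alco", "active", "cardio"]
rows = [
    [0, 18393, 2, 168, 62.0, 110, 80, 1, 1, 0, 0, 1, 0],
    [1, 20228, 1, 156, 85.0, 140, 90, 3, 1, 0, 0, 1, 1],
    [2, 18857, 1, 165, 64.0, 130, 70, 3, 1, 0, 0, 0, 1],
    [3, 17623, 2, 169, 82.0, 150, 100, 1, 1, 0, 0, 1, 1],
    [4, 17474, 1, 156, 56.0, 100, 60, 1, 1, 0, 0, 0, 0],
]
df = pd.DataFrame(rows, columns=columns)

shape = df.shape                             # (rows, columns)
missing = df.isnull().sum()                  # missing values per column
summary = df.describe()                      # count, mean, std, min, quartiles, max
class_counts = df["cardio"].value_counts()   # rows per class label

print(shape)
print(missing)
print(class_counts)
```

On the full dataset the two classes are almost perfectly balanced (35021 vs 34979), which is why plain accuracy is a usable headline metric in the sections that follow.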
Out[11]:
                    id        age     gender     height     weight      ap_hi      ap_lo  cholesterol       gluc      smoke       alco     active     cardio
id            1.000000   0.003457   0.003502  -0.003038  -0.001830   0.003356  -0.002529     0.006106   0.002467  -0.003699   0.001210   0.003755   0.003799
age           0.003457   1.000000  -0.022811  -0.081515   0.053684   0.020764   0.017647     0.154424   0.098703  -0.047633  -0.029723  -0.009927   0.238159
gender        0.003502  -0.022811   1.000000   0.499033   0.155406   0.006005   0.015254    -0.035821  -0.020491   0.338135   0.170966   0.005866   0.008109
height       -0.003038  -0.081515   0.499033   1.000000   0.290968   0.005488   0.006150    -0.050226  -0.018595   0.187989   0.094419  -0.006570  -0.010821
weight       -0.001830   0.053684   0.155406   0.290968   1.000000   0.030702   0.043710     0.141768   0.106857   0.067780   0.067113  -0.016867   0.181660
ap_hi         0.003356   0.020764   0.006005   0.005488   0.030702   1.000000   0.016086     0.023778   0.011841  -0.000922   0.001408  -0.000033   0.054475
ap_lo        -0.002529   0.017647   0.015254   0.006150   0.043710   0.016086   1.000000     0.024019   0.010806   0.005186   0.010601   0.004780   0.065719
cholesterol   0.006106   0.154424  -0.035821  -0.050226   0.141768   0.023778   0.024019     1.000000   0.451578   0.010354   0.035760   0.009911   0.221147
gluc          0.002467   0.098703  -0.020491  -0.018595   0.106857   0.011841   0.010806     0.451578   1.000000  -0.004756   0.011246  -0.006770   0.089307
smoke        -0.003699  -0.047633   0.338135   0.187989   0.067780  -0.000922   0.005186     0.010354  -0.004756   1.000000   0.340094   0.025858  -0.015486
alco          0.001210  -0.029723   0.170966   0.094419   0.067113   0.001408   0.010601     0.035760   0.011246   0.340094   1.000000   0.025476  -0.007330
active        0.003755  -0.009927   0.005866  -0.006570  -0.016867  -0.000033   0.004780     0.009911  -0.006770   0.025858   0.025476   1.000000  -0.035653
cardio        0.003799   0.238159   0.008109  -0.010821   0.181660   0.054475   0.065719     0.221147   0.089307  -0.015486  -0.007330  -0.035653   1.000000
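The `cardio` row of the correlation matrix is easier to read once it is sorted. The sketch below copies that row from the output above and ranks the columns by absolute correlation with the label. With a DataFrame `df` in hand, the matrix itself would typically come from `df.corr()`; that is an assumption, since the original cell is not shown.

```python
# Correlation of each column with the cardio label, copied from the matrix above.
cardio_corr = {
    "id": 0.003799, "age": 0.238159, "gender": 0.008109, "height": -0.010821,
    "weight": 0.181660, "ap_hi": 0.054475, "ap_lo": 0.065719,
    "cholesterol": 0.221147, "gluc": 0.089307, "smoke": -0.015486,
    "alco": -0.007330, "active": -0.035653,
}

# Rank by absolute value so that negative correlations are not overlooked.
ranked = sorted(cardio_corr.items(), key=lambda item: abs(item[1]), reverse=True)

for name, value in ranked:
    print(f"{name:<12}{value:+.6f}")
```

Age, cholesterol, and weight come out on top, while `id` comes last, as expected for a row identifier.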
[Figure: plot of the correlation matrix; axis labels not preserved]
[Figure: count plot of cardio; x-axis: cardio, y-axis: count]

Visualizing Each Column against the Output Column

[Figure: count plot of gender; x-axis: gender, y-axis: count]
[Figure: count plot of age; x-axis: age, y-axis: count]
Out[17]:
0        50.0
1        55.0
2        52.0
3        48.0
4        48.0
         ... 
69995    53.0
69996    62.0
69997    52.0
69998    61.0
69999    56.0
Name: new_age, Length: 70000, dtype: float64
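The `new_age` values are consistent with the `age` column (recorded in days) divided by 365 and rounded to the nearest year; for example, 18393 / 365 ≈ 50.39, which rounds to 50.0. The exact expression used in the notebook is not shown, so treat the following as a sketch of one way to arrive at the same numbers.

```python
import pandas as pd

# Ages in days for the first five rows of the dataset.
df = pd.DataFrame({"age": [18393, 20228, 18857, 17623, 17474]})

# Assumed conversion: days -> years, rounded to the nearest whole year.
df["new_age"] = (df["age"] / 365).round()

print(df["new_age"].tolist())
```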
[Figure: count plot of new_age; x-axis: new_age, y-axis: count]
Out[19]:
          id    age  gender  height  weight  ap_hi  ap_lo  cholesterol  gluc  smoke  alco  active  cardio
0          0  18393       2     168    62.0    110     80            1     1      0     0       1       0
1          1  20228       1     156    85.0    140     90            3     1      0     0       1       1
2          2  18857       1     165    64.0    130     70            3     1      0     0       0       1
3          3  17623       2     169    82.0    150    100            1     1      0     0       1       1
4          4  17474       1     156    56.0    100     60            1     1      0     0       0       0
...      ...    ...     ...     ...     ...    ...    ...          ...   ...    ...   ...     ...     ...
69995  99993  19240       2     168    76.0    120     80            1     1      1     0       1       0
69996  99995  22601       1     158   126.0    140     90            2     2      0     0       1       1
69997  99996  19066       2     183   105.0    180     90            3     1      0     1       0       1
69998  99998  22431       1     163    72.0    135     80            1     2      0     0       0       1
69999  99999  20540       1     170    72.0    120     80            2     1      0     0       1       0

70000 rows × 13 columns

Dividing Features and Label Columns

Out[21]:
         age  gender  height  weight  ap_hi  ap_lo  cholesterol  gluc  smoke  alco  active  new_age
0      18393       2     168    62.0    110     80            1     1      0     0       1     50.0
1      20228       1     156    85.0    140     90            3     1      0     0       1     55.0
2      18857       1     165    64.0    130     70            3     1      0     0       0     52.0
3      17623       2     169    82.0    150    100            1     1      0     0       1     48.0
4      17474       1     156    56.0    100     60            1     1      0     0       0     48.0
...      ...     ...     ...     ...    ...    ...          ...   ...    ...   ...     ...      ...
69995  19240       2     168    76.0    120     80            1     1      1     0       1     53.0
69996  22601       1     158   126.0    140     90            2     2      0     0       1     62.0
69997  19066       2     183   105.0    180     90            3     1      0     1       0     52.0
69998  22431       1     163    72.0    135     80            1     2      0     0       0     61.0
69999  20540       1     170    72.0    120     80            2     1      0     0       1     56.0

70000 rows × 12 columns

Out[23]:
0        0
1        1
2        1
3        1
4        0
        ..
69995    0
69996    1
69997    1
69998    1
69999    0
Name: cardio, Length: 70000, dtype: int64
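The feature table above holds every column except `id` and `cardio`, and the label series is `cardio` itself. One way to get that split, sketched on two sample rows (the variable names `X` and `y` are assumptions, not taken from the notebook):

```python
import pandas as pd

# Two sample rows with the same columns as the full dataset plus new_age.
df = pd.DataFrame({
    "id": [0, 1], "age": [18393, 20228], "gender": [2, 1],
    "height": [168, 156], "weight": [62.0, 85.0],
    "ap_hi": [110, 140], "ap_lo": [80, 90],
    "cholesterol": [1, 3], "gluc": [1, 1], "smoke": [0, 0],
    "alco": [0, 0], "active": [1, 1], "cardio": [0, 1],
    "new_age": [50.0, 55.0],
})

# Features: drop the row identifier and the label column.
X = df.drop(columns=["id", "cardio"])
# Label: the cardio column on its own.
y = df["cardio"]

print(list(X.columns))
print(y.tolist())
```

Note that the feature table keeps both `age` (in days) and `new_age` (in years), so the same information reaches the models twice.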

Dividing into Training and Testing Data

Out[26]:
         age  gender  height  weight  ap_hi  ap_lo  cholesterol  gluc  smoke  alco  active  new_age
23561  16136       2     169    71.0    100     80            1     1      1     0       1     44.0
34858  14615       1     158    69.0    140     80            2     1      0     0       1     40.0
54953  20507       1     164    65.0    120     80            1     1      0     0       1     56.0
59230  16720       1     153    53.0    100     60            1     1      0     0       1     46.0
1730   21050       1     159    71.0    140     90            1     1      0     0       1     58.0
...      ...     ...     ...     ...    ...    ...          ...   ...    ...   ...     ...      ...
49100  21289       2     175    78.0    120     80            1     1      0     0       1     58.0
20609  19116       1     164    68.0    120     80            1     1      0     0       0     52.0
21440  18049       2     178    82.0    120     80            1     1      0     0       1     49.0
50057  21957       1     169    77.0    120     80            1     1      0     0       0     60.0
5192   20671       1     174    65.0    160     90            2     2      0     0       1     57.0

49000 rows × 12 columns

Out[27]:
23561    0
34858    1
54953    0
59230    0
1730     1
        ..
49100    1
20609    0
21440    0
50057    1
5192     1
Name: cardio, Length: 49000, dtype: int64
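49000 training rows, together with the 21000 test rows reported as `support` in the sections below, correspond to a 70/30 split of the 70000 rows. A sketch with scikit-learn's `train_test_split`, using placeholder arrays in place of the real features; `random_state=42` is an arbitrary choice, not taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of rows and feature columns.
X = np.zeros((70000, 12))
y = np.zeros(70000, dtype=int)

# test_size=0.3 holds out 30% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)
```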

Model Development using Random Forest

Out[30]:
RandomForestClassifier()
Out[32]:
0.7165714285714285
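The outputs in this and the following sections all follow one pattern: fit a classifier with default settings, predict on the test set, then report accuracy, a classification report, and a confusion matrix. Below is a minimal sketch of that pattern, run on synthetic data from `make_classification` because the dataset is not bundled here. The helper `evaluate` is hypothetical, and its numbers will not match the ones in this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit the model and return accuracy, text report, and confusion matrix."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return (
        accuracy_score(y_test, y_pred),
        classification_report(y_test, y_pred),
        confusion_matrix(y_test, y_pred),
    )


# Synthetic stand-in for the 12 feature columns (NOT the real data).
X, y = make_classification(n_samples=1000, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

accuracy, report, matrix = evaluate(
    RandomForestClassifier(random_state=0), X_train, X_test, y_train, y_test
)
print(accuracy)
print(report)
print(matrix)
```

Passing `DecisionTreeClassifier()`, `SVC()`, `LogisticRegression()`, `GaussianNB()`, `KNeighborsClassifier()`, or `LinearDiscriminantAnalysis()` instead reuses the same helper unchanged.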
              precision    recall  f1-score   support

           0       0.70      0.74      0.72     10352
           1       0.73      0.69      0.71     10648

    accuracy                           0.72     21000
   macro avg       0.72      0.72      0.72     21000
weighted avg       0.72      0.72      0.72     21000

[[7681 2671]
 [3281 7367]]
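Every figure in the classification report can be recomputed from the confusion matrix. The arithmetic below assumes scikit-learn's layout (rows are actual classes, columns are predicted classes), which agrees with the row sums 10352 and 10648 matching the `support` column.

```python
# Confusion matrix printed above: rows = actual class, columns = predicted class.
tn, fp = 7681, 2671   # actual 0: predicted 0, predicted 1
fn, tp = 3281, 7367   # actual 1: predicted 0, predicted 1

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
precision_1 = tp / (tp + fp)   # of the rows predicted 1, how many are 1
recall_1 = tp / (tp + fn)      # of the rows that are 1, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(total)                   # 21000
print(round(accuracy, 4))      # 0.7166
print(round(precision_1, 2))   # 0.73
print(round(recall_1, 2))      # 0.69
print(round(f1_1, 2))          # 0.71
```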
[Figure: confusion matrix plot; y-axis label: Actual Label]

Model Development using Decision Tree

Out[40]:
DecisionTreeClassifier()
Out[42]:
0.6322380952380953
              precision    recall  f1-score   support

           0       0.62      0.64      0.63     10352
           1       0.64      0.63      0.63     10648

    accuracy                           0.63     21000
   macro avg       0.63      0.63      0.63     21000
weighted avg       0.63      0.63      0.63     21000

[[6614 3738]
 [3985 6663]]
[Figure: confusion matrix plot; y-axis label: Actual Label]

Model Development using Support Vector Machine

Out[50]:
SVC()
Out[51]:
0.6046666666666667
              precision    recall  f1-score   support

           0       0.59      0.66      0.62     10352
           1       0.63      0.55      0.58     10648

    accuracy                           0.60     21000
   macro avg       0.61      0.61      0.60     21000
weighted avg       0.61      0.60      0.60     21000

[[6884 3468]
 [4834 5814]]
[Figure: confusion matrix plot; y-axis label: Actual Label]

Model Development using Logistic Regression

Out[60]:
LogisticRegression()
Out[61]:
0.7041904761904761
              precision    recall  f1-score   support

           0       0.68      0.75      0.71     10352
           1       0.73      0.66      0.69     10648

    accuracy                           0.70     21000
   macro avg       0.71      0.70      0.70     21000
weighted avg       0.71      0.70      0.70     21000

[[7744 2608]
 [3604 7044]]
[Figure: confusion matrix plot; y-axis label: Actual Label]

Model Development using Gaussian Naive Bayes

Out[70]:
GaussianNB()
Out[71]:
0.5910952380952381
              precision    recall  f1-score   support

           0       0.55      0.89      0.68     10352
           1       0.74      0.30      0.43     10648

    accuracy                           0.59     21000
   macro avg       0.64      0.60      0.55     21000
weighted avg       0.65      0.59      0.55     21000

[[9207 1145]
 [7442 3206]]
[Figure: confusion matrix plot; y-axis label: Actual Label]

Model Development using K-Nearest Neighbors

Out[80]:
KNeighborsClassifier()
Out[81]:
0.682047619047619
              precision    recall  f1-score   support

           0       0.67      0.71      0.69     10352
           1       0.70      0.66      0.68     10648

    accuracy                           0.68     21000
   macro avg       0.68      0.68      0.68     21000
weighted avg       0.68      0.68      0.68     21000

[[7328 3024]
 [3653 6995]]
[Figure: confusion matrix plot; y-axis label: Actual Label]

Model Development using Linear Discriminant Analysis

Out[90]:
LinearDiscriminantAnalysis()
Out[91]:
0.6458095238095238
              precision    recall  f1-score   support

           0       0.63      0.69      0.66     10352
           1       0.67      0.60      0.63     10648

    accuracy                           0.65     21000
   macro avg       0.65      0.65      0.65     21000
weighted avg       0.65      0.65      0.65     21000

[[7176 3176]
 [4262 6386]]
[Figure: confusion matrix plot; y-axis label: Actual Label]
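
To compare the seven models side by side, the accuracy of each can be recomputed from its confusion matrix: correct predictions on the diagonal divided by the 21000 test rows. The matrices below are copied from the outputs above; nothing is retrained.

```python
# Confusion matrices from the sections above: [[TN, FP], [FN, TP]].
matrices = {
    "Random Forest": [[7681, 2671], [3281, 7367]],
    "Decision Tree": [[6614, 3738], [3985, 6663]],
    "Support Vector Machine": [[6884, 3468], [4834, 5814]],
    "Logistic Regression": [[7744, 2608], [3604, 7044]],
    "Gaussian Naive Bayes": [[9207, 1145], [7442, 3206]],
    "K-Nearest Neighbors": [[7328, 3024], [3653, 6995]],
    "Linear Discriminant Analysis": [[7176, 3176], [4262, 6386]],
}


def accuracy(matrix):
    """Share of test rows on the diagonal (correct predictions)."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


# Highest accuracy first.
ranking = sorted(
    ((name, accuracy(m)) for name, m in matrices.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, acc in ranking:
    print(f"{name:<30}{acc:.4f}")
```

On this split Random Forest (0.7166) and Logistic Regression (0.7042) lead, and Gaussian Naive Bayes (0.5911) trails.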
