Objectives Introduction Notations Example References
ANALYSIS OF ISCHEMIC HEART DISEASE DATA USING LOGIC REGRESSION
Nadeem Shafique Butt, Muhammad Qaiser Shahbaz, Asif Hanif
27 Nov 2010
Objectives
Many observational studies aim to establish whether certain risk factors are associated with a disease. In some situations it is important to study higher-order interactions, but with commonly available methods this is difficult, particularly when the covariates are binary. We describe the “Logic Regression” method proposed by Ruczinski et al. (2003). For illustration we use data collected from January 2006 to December 2008 from patients admitted to the cardiology department at Mayo Hospital, Lahore.
Introduction
Regression is among the most important tools in statistics for analyzing data and making inferences about associations between predictors and a response. However, in most regression problems the model relates the predictors to the response only through their main effects. Interactions between predictors are sometimes considered as well, but are usually kept very simple (two-way or three-way at most).
Logic Regression: Given a set of binary predictors X, “Logic Regression” tries to create new and better predictors for the response by considering Boolean combinations of those binary predictors.
Example: If the response variable is binary as well, the method attempts to find decision rules such as: if X1, X2, X3 and X4 are true, or X5 or X6 but not X7 are true, then the response is more likely to be close to zero.
The method tries to find Boolean statements involving the binary predictors that enhance the prediction of the response variable.
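As a rough illustration of such a decision rule, the sketch below evaluates the Boolean statement from the example on a few made-up records. The grouping (X1 and X2 and X3 and X4) or ((X5 or X6) and not X7) is one reading of the sentence above, and the rows in `toy` are hypothetical, not study data.

```python
def rule(x):
    """Evaluate (X1 and X2 and X3 and X4) or ((X5 or X6) and not X7).

    x is a sequence of seven 0/1 values in the order X1..X7 (assumed order).
    """
    x1, x2, x3, x4, x5, x6, x7 = (bool(v) for v in x)
    return int((x1 and x2 and x3 and x4) or ((x5 or x6) and not x7))

# Hypothetical records, one row per case.
toy = [
    [1, 1, 1, 1, 0, 0, 0],  # X1..X4 all true: first clause fires
    [0, 1, 1, 1, 1, 0, 0],  # X5 true and X7 false: second clause fires
    [0, 1, 1, 1, 1, 0, 1],  # X7 true blocks the second clause
    [0, 0, 0, 0, 0, 0, 0],  # nothing true
]

fired = [rule(row) for row in toy]
print(fired)
```

Cases where the rule evaluates to 1 are the ones the example describes as more likely to have a response close to zero.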
The aim of this technique is to find those combinations of binary variables that have the highest predictive power for the response. These combinations are Boolean logic expressions, and since the predictors are binary, any combination of predictors is also binary.
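One hedged way to picture "highest predictive power" is to score candidate Boolean combinations by how often they disagree with the response and keep the one with the fewest errors. The sketch below does this exhaustively for expressions of the form "A op B" on toy data. The data, the restricted family of candidates, and the misclassification count used as the score are all assumptions made for illustration, not the procedure of the LogicReg package.

```python
from itertools import combinations, product

# Toy data (hypothetical): each row is (X1, X2, X3); y is the binary response.
X = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (0, 0, 0), (1, 1, 1), (0, 1, 0), (1, 0, 0)]
y = [1, 0, 0, 0, 0, 0, 1]
names = ["X1", "X2", "X3"]

def candidates(k):
    """Yield (label, function) for every 'A op B' with optional negations."""
    for i, j in combinations(range(k), 2):
        for neg_i, neg_j, op in product([False, True], [False, True], ["and", "or"]):
            def f(row, i=i, j=j, neg_i=neg_i, neg_j=neg_j, op=op):
                a = bool(row[i]) != neg_i  # comparing with the flag negates when it is set
                b = bool(row[j]) != neg_j
                return int(a and b) if op == "and" else int(a or b)
            left = ("not " if neg_i else "") + names[i]
            right = ("not " if neg_j else "") + names[j]
            yield f"({left}) {op} ({right})", f

def misclassified(f):
    """Number of cases where the candidate rule disagrees with the response."""
    return sum(f(row) != target for row, target in zip(X, y))

best_label, best_rule = min(candidates(len(names)), key=lambda c: misclassified(c[1]))
best_errors = misclassified(best_rule)
print(best_label, best_errors)
```

Every candidate returns 0 or 1 for every case, which is the point made above: a Boolean combination of binary predictors is itself a binary predictor.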
It can easily be seen from the table that higher-order interaction terms will be all zeros. The solution in this case is to find a classification rule that correctly assigns a case to either Y=0 or Y=1 using a Boolean equation.
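The point about all-zero interaction terms can be sketched on a small made-up data set (this is not the table from the slides). A four-way product term is 1 only when every predictor equals 1, so in a small sample it can be zero for every case and carry no information, while a Boolean rule on the same predictors can still separate the classes. The data and the rule below are hypothetical.

```python
from math import prod

# Hypothetical data: 6 cases, 4 binary predictors (X1, X2, X3, X4).
X = [
    (1, 1, 0, 0),
    (0, 0, 1, 1),
    (1, 0, 1, 0),
    (0, 1, 0, 1),
    (1, 1, 1, 0),
    (0, 0, 0, 1),
]
y = [1, 1, 0, 0, 1, 0]

# Four-way interaction term X1*X2*X3*X4: equals 1 only when all four are 1.
interaction = [prod(row) for row in X]

# A Boolean rule on the same predictors: (X1 and X2) or (X3 and X4).
rule = [int((a and b) or (c and d)) for a, b, c, d in X]

print(interaction)  # the product column
print(rule)         # the Boolean rule, case by case
```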
Search Algorithms: Given a fixed number of predictors, there are only finitely many Boolean expressions that yield different predictions. If there are “k” predictors, there are 2^(2^k) different prediction scenarios. And if there are “n” cases and “k” predictors, there might be up to k^n different logic trees. A greedy search algorithm is used to find the Boolean combination of the predictors that maximizes predictability.
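The count 2^(2^k) can be checked by brute force for small k: k binary predictors have 2^k possible input patterns, and a prediction scenario assigns 0 or 1 to each pattern. The sketch below enumerates these assignments directly; the function name is made up for this illustration.

```python
from itertools import product

def count_boolean_functions(k):
    """Count distinct 0/1 assignments over all input patterns of k binary predictors."""
    inputs = list(product([0, 1], repeat=k))            # 2**k input patterns
    tables = set(product([0, 1], repeat=len(inputs)))   # one output bit per pattern
    return len(tables)

counts = {k: count_boolean_functions(k) for k in (1, 2, 3)}
print(counts)
```

The count grows very quickly (k = 5 already gives 2^32, about 4.3 billion), which is why an exhaustive search is replaced by a greedy one.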
In this paper we use the Logic Regression technique proposed by Ruczinski et al. (2003) to model ischemic heart disease data, using the “LogicReg” package for R.
Notations
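Logic expression L in the binary predictors $X_1, X_2, \ldots, X_7$ (operator and complement placement not recoverable from the source)
Operators
$\wedge$ : AND
$\vee$ : OR
$X^c$ : NOT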
Example
Details of Data
Data Collection Duration:
Jan 2006 – Dec 2008
Venue:
Cardiology Department, Mayo Hospital Lahore
Patient Definition:
Under treatment for chest pain, cardiac failure and syncope.
History Recorded:
Clinical features and cardiovascular risk factors such as hypertension, DM, smoking habits and dyslipidaemia.
Exclusion criteria:
Severe liver disease, CLD, acute and chronic inflammatory diseases, immunological diseases and severe anemia.
Finally, coronary angiography was performed on all patients using the Judkins technique.
Variable   Label                                       Codes
HD         CHD                                         0=Normal, 1=CHD
Gender     Gender                                      0=Female, 1=Male
DM         Diabetes Mellitus                           0=No, 1=Yes
HTN        Hypertension                                0=No, 1=Yes
IHDF       Family history of ischemic heart disease    0=No, 1=Yes
FH         Family history of hypertension              0=No, 1=Yes
DF         Family history of Diabetes Mellitus         0=No, 1=Yes
Smoking    Smoking history                             0=No, 1=Yes
Viral      Viral ailment                               0=No, 1=Yes
Logic Regression Model
L1: +4.15 * ((((not DM) or (not Gender)) or (HTN or (not IHDF))) and ((Smoking or DF) and ((not DF) or (not HTN))))
L2: -2.35 * (((DM or (not DF)) and IHDF) or ((Gender and (not DF)) or (FH and (not Gender))))
L3: +3.6 * (((IHDF and (not DM)) or ((not Gender) or (not Smoking))) and ((DF or Smoking) or (FH or Gender)))
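To make the fitted expressions concrete, the sketch below evaluates L1, L2 and L3 for one patient and sums the weighted terms. The expressions and coefficients are transcribed from the slide; the patient profile is hypothetical, and because the slide does not show an intercept, only the combined contribution of the three logic terms is computed. How that sum enters the final model (for example as part of a logit) is not stated on the slide and is left open here.

```python
def logic_terms(p):
    """Return (L1, L2, L3) as 0/1 for a dict of 0/1 risk-factor codes."""
    DM, Gender, HTN, IHDF = bool(p["DM"]), bool(p["Gender"]), bool(p["HTN"]), bool(p["IHDF"])
    FH, DF, Smoking = bool(p["FH"]), bool(p["DF"]), bool(p["Smoking"])
    L1 = ((((not DM) or (not Gender)) or (HTN or (not IHDF)))
          and ((Smoking or DF) and ((not DF) or (not HTN))))
    L2 = (((DM or (not DF)) and IHDF)
          or ((Gender and (not DF)) or (FH and (not Gender))))
    L3 = (((IHDF and (not DM)) or ((not Gender) or (not Smoking)))
          and ((DF or Smoking) or (FH or Gender)))
    return int(L1), int(L2), int(L3)

# Coefficients as shown on the slide; the intercept is not given there.
COEF = (4.15, -2.35, 3.6)

# Hypothetical patient: male smoker with hypertension, no diabetes,
# no family history of IHD, hypertension or diabetes.
patient = {"DM": 0, "Gender": 1, "HTN": 1, "IHDF": 0, "FH": 0, "DF": 0, "Smoking": 1}

terms = logic_terms(patient)
score = sum(c * t for c, t in zip(COEF, terms))
print(terms, round(score, 2))
```

The variable Viral does not appear in any of the three expressions, so it is not needed to evaluate them.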
• Logic Trees
References
1.
Thanks