Apr 29, 2013

Stroke Prediction

Stroke is the third leading cause of death in the United States. It is also the principal cause of serious long-term disability. Stroke risk prediction can contribute significantly to its prevention and early treatment. Numerous medical studies and data analyses have been conducted to identify effective predictors of stroke.

Traditional studies adopted features (risk factors) that were verified by clinical trials or selected manually by medical experts. For example, one well-known study by Lumley and others [1] built a 5-year stroke prediction model using a set of 16 manually selected features. However, manually selected features can miss important indicators: past studies have shown that there are additional risk factors for stroke, such as creatinine level and time to walk 15 feet, among others.

The Framingham Study [2] surveyed a wide range of stroke risk factors, including blood pressure, the use of anti-hypertensive therapy, diabetes mellitus, cigarette smoking, prior cardiovascular disease, and atrial fibrillation. With the large number of features in current medical datasets, identifying and verifying each risk factor manually is a cumbersome task. Machine learning algorithms can efficiently identify the features most strongly related to stroke occurrence from such a large feature set. Doing so can improve the accuracy of stroke risk prediction and can also help discover new risk factors.
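To make the idea of automatic feature selection concrete, here is a minimal sketch in Python. It is not the method used in any of the studies cited here; it is a simple filter that ranks features by the absolute value of their correlation with a binary outcome. The data is synthetic, and the feature names (feature_a, noise_1, and so on) and the helper functions are hypothetical, chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, labels, names):
    """Return (name, |correlation with label|) pairs, strongest first."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, abs(pearson(column, labels))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def make_synthetic_data(n=2000, seed=0):
    """Synthetic data (an assumption): only the first two features
    influence the outcome; the remaining three are pure noise."""
    rng = random.Random(seed)
    names = ["feature_a", "feature_b", "noise_1", "noise_2", "noise_3"]
    rows, labels = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in names]
        risk = 1.5 * row[0] + 1.0 * row[1] + rng.gauss(0, 1)
        rows.append(row)
        labels.append(1 if risk > 1.0 else 0)
    return rows, labels, names


rows, labels, names = make_synthetic_data()
ranking = rank_features(rows, labels, names)
top_two = [name for name, _ in ranking[:2]]

for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

On this synthetic data the two informative features should rank above the noise columns. A univariate filter like this scores each feature in isolation, so it can miss features that matter only in combination; it is meant as an illustration of the general idea rather than a recipe for clinical data.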

In a study by Khosla and others [3], a machine-learning-based predictive model was built on stroke data, and several feature selection methods were investigated. Their model, which was based on automatically selected features, outperformed existing stroke models. In addition, they were able to identify risk factors that had not been discovered by traditional approaches. The newly identified factors include:

Total medications
Any ECG abnormality
Min. ankle arm ratio
Maximal inflation level
Calculated 100 point score
General health
Minimental score 35 point

It’s exciting to see machine learning play a more important role in medicine and health management.


[1] T. Lumley, R. A. Kronmal, M. Cushman, T. A. Manolio, and S. Goldstein. A stroke prediction score in the elderly: Validation and web-based application. Journal of Clinical Epidemiology, 55(2):129–136, February 2002.
[2] P. A. Wolf, R. B. D'Agostino, A. J. Belanger, and W. B. Kannel. Probability of stroke: A risk profile from the Framingham study. Stroke, 22:312–318, March 1991.
[3] A. Khosla, Y. Cao, C. C.-Y. Lin, H.-K. Chiu, J. Hu, and H. Lee. An integrated machine learning approach to stroke prediction. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 183–192. ACM, 2010.