### Learning to Prevent Healthcare-Associated Infections: Leveraging Data Across Time and Space to Improve Local Predictions

2014

Abstract

The proliferation of electronic medical records holds out the promise of using machine learning and data mining to build models that will help healthcare providers improve patient outcomes. However, building useful models from these datasets presents many technical problems. Among the challenges are the large number of factors (both intrinsic and extrinsic) influencing a patient's risk of an adverse outcome, the inherent evolution of that risk over time, the relative rarity of adverse outcomes, institutional differences, and the lack of ground truth. In this thesis we tackle these challenges in the context of predicting healthcare-associated infections (HAIs). HAIs are a serious problem in US acute care hospitals, affecting approximately 4% of all inpatients on any given day. Despite best efforts to reduce incidence, HAIs remain stubbornly prevalent. We hypothesize that one reason is the lack of an effective clinical tool for accurately measuring patient risk. We therefore develop accurate models for predicting which patients are at risk of acquiring an infection with Clostridium difficile (a common HAI). In contrast to previous ...

### Prediction of USD/JPY Exchange Rate Time Series Directional Status by KNN with Dynamic Time Warping as Distance Function

Abstract

Exchange rate prediction has been a challenging topic over the past decade. Various studies have tried to improve prediction accuracy in terms of both level error and directional status error. The aim of this paper is to introduce a methodology that uses KNN (K-nearest neighbors) and DTW (dynamic time warping) to improve fluctuation prediction and to achieve better evaluation parameters than other studies in the financial market forecasting literature. The study uses the USD/JPY (United States Dollar/Japanese Yen) exchange rate time series, and the results show improved prediction of the direction of the series. USD/JPY exchange rates from 1971 to 2012 are collected and partitioned into 30-element segments, reflecting the monthly cyclic behavior of the time series. These segments are then divided into two sets in a 7:3 ratio, and KNN with DTW as the similarity function is used to find the 3 nearest neighbors. Using a function also introduced in this research, the directional status of the last element is predicted. The prediction result is then compared with other results in the exchange rate prediction literature.
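The following minimal Python sketch shows the moving parts of the KNN-with-DTW procedure described above. It follows the stated setup: 30-element segments, a 7:3 split, k = 3 neighbors, and DTW as the distance. Several details are assumptions, however:

- the absolute-difference DTW cost;
- the chronological split in the hypothetical `split_7_3`;
- the up/down labeling in `direction`.

The paper's own "chosen function" for predicting direction is not described here. The hypothetical `knn_predict_direction` therefore substitutes a simple majority vote over the neighbors' final moves.

```python
import math
from collections import Counter


def make_segments(series, length=30):
    """Cut a series into consecutive, non-overlapping segments of `length` points."""
    return [series[i:i + length] for i in range(0, len(series) - length + 1, length)]


def split_7_3(segments):
    """Hypothetical chronological 7:3 split into reference and evaluation sets."""
    cut = (7 * len(segments)) // 10
    return segments[:cut], segments[cut:]


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    # d[i][j] = cost of the cheapest warping path aligning a[:i] with b[:j]
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def direction(segment):
    """+1 if the last element is above the one before it, else -1 (assumed labeling)."""
    return 1 if segment[-1] > segment[-2] else -1


def knn_predict_direction(reference_segments, query_prefix, k=3):
    """Majority vote over the directions of the k reference segments whose prefixes
    (all but the last element) are closest to `query_prefix` under DTW."""
    scored = sorted(
        (dtw_distance(seg[:-1], query_prefix), direction(seg)) for seg in reference_segments
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

In use, each evaluation segment `seg` would supply `seg[:-1]` as the query and `direction(seg)` as the ground truth. The quadratic DTW table is cheap for 30-point segments; longer series usually need a warping-window constraint.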

### How to Classify Tutorial Dialogue? Comparing Feature Vectors vs. Sequences

Abstract

A key issue in using machine learning to classify tutorial dialogues is how to represent time-varying data. Standard classifiers take as input a feature vector and output its predicted label. It is possible to formulate tutorial dialogue classification problems in this way. However, a feature vector representation requires mapping a dialogue onto a fixed number of features and does not innately exploit its sequential nature. In contrast, this paper explores a recent method that classifies sequences, using a technique new to the Educational Data Mining (EDM) community: Hidden Conditional Random Fields (HCRFs) [Quattoni et al., 2007]. We illustrate its application to a data set from Project LISTEN's Reading Tutor and compare it to three baselines using the same data, cross-validation splits, and feature set. Our technique produces state-of-the-art classification accuracy in predicting reading task completion. We consider the contributions of this paper to be (i) introducing HCRFs to the EDM community, (ii) formulating tutorial dialogue classification as a sequence classification problem, and (iii) evaluating and comparing dialogue classification.
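A short sketch can illustrate the contrast between the two representations. The per-turn fields `latency` and `words` are hypothetical stand-ins, not the Reading Tutor feature set, and no HCRF is implemented. The point is only that a fixed-length summary cannot distinguish dialogues that differ solely in turn order. A per-turn sequence keeps that information for a sequence classifier.

```python
from statistics import mean


def to_feature_vector(dialogue):
    """Collapse a variable-length dialogue into a fixed-length vector.
    Each turn is a dict with hypothetical numeric fields 'latency' and 'words';
    the summary statistics below ignore the order of turns."""
    latencies = [turn["latency"] for turn in dialogue]
    words = [turn["words"] for turn in dialogue]
    return [len(dialogue), mean(latencies), max(latencies), mean(words), sum(words)]


def to_sequence(dialogue):
    """Keep one small feature vector per turn, in order, for a sequence classifier."""
    return [[turn["latency"], turn["words"]] for turn in dialogue]
```

For example, a dialogue and its reversal map to the same feature vector but to different sequences.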

### Protein Data Representation: A Survey

Abstract

One of the critical issues in bioinformatics is the data structure used to represent protein data. This representation underlies operations such as sequence alignment, structure alignment, and motif finding. In this paper, a survey of different representations and well-known data structures used for protein data is presented from a computer science perspective. The survey summarizes efforts in protein data representation and approximation. It could therefore serve as a basic reference for research aimed at developing bioinformatics applications.
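As a concrete illustration of one simple sequence-level representation (not taken from the survey), a protein's primary structure can be stored as a string over the 20 standard amino-acid letters and expanded into a one-hot matrix. Collapsing that matrix into amino-acid composition gives a crude fixed-length approximation that discards residue order. The names `one_hot` and `composition` are illustrative only.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode a protein sequence as an L x 20 binary matrix, one row per residue."""
    rows = []
    for aa in sequence.upper():
        if aa not in INDEX:
            raise ValueError(f"non-standard residue: {aa!r}")
        row = [0] * len(AMINO_ACIDS)
        row[INDEX[aa]] = 1
        rows.append(row)
    return rows


def composition(sequence):
    """Fixed-length approximation: fraction of each amino acid, order discarded."""
    matrix = one_hot(sequence)
    return [sum(column) / len(matrix) for column in zip(*matrix)]
```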

### Spectral Learning of Infinite Mixture of Hidden Markov Models for Human Action Recognition

Abstract

In this work, we approach the sequence clustering and classification problem using an infinite mixture. For each class, we learn a mixture of multiple observation Hidden Markov Models (HMMs). Exact inference requires evaluating an intractable integral over the HMM parameters. Assuming that the observed sequences are sufficiently long, we approximate this integral with a spectral method based on the approach of Hsu et al. [1]. We apply the resulting algorithm to the human action recognition problem. We show that clustering in the training phase improves classification accuracy, since many human action sequences tend to be multimodal. Our results also suggest improved accuracy compared with a more conventional Expectation Maximization approach.
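The spectral step builds on the observable-operator algorithm of Hsu et al. That algorithm estimates an HMM from statistics of the first three observations using an SVD instead of EM.

The sketch below shows only that single-HMM estimator. It does not implement the infinite mixture or the clustering. It assumes discrete observations with at least as many symbols as hidden states, and it needs NumPy.

With exact moments of a small HMM (`hmm_moments`), the spectral estimate should match the forward algorithm up to floating-point error. With `empirical_moments`, the estimates converge only as the amount of data grows.

```python
import numpy as np


def hmm_moments(pi, T, O):
    """Exact low-order moments of an HMM, used here to check the estimator.
    T[i, j] = Pr[h_{t+1}=i | h_t=j], O[x, j] = Pr[x_t=x | h_t=j], pi = initial distribution."""
    P1 = O @ pi
    P21 = O @ T @ np.diag(pi) @ O.T
    P3x1 = np.array([O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T for x in range(O.shape[0])])
    return P1, P21, P3x1


def empirical_moments(seqs, n):
    """Estimate the same moments from the first three symbols of each training sequence."""
    P1, P21, P3x1 = np.zeros(n), np.zeros((n, n)), np.zeros((n, n, n))
    for s in seqs:
        x1, x2, x3 = s[0], s[1], s[2]
        P1[x1] += 1
        P21[x2, x1] += 1
        P3x1[x2, x3, x1] += 1  # P3x1[x][i, j] = Pr[x3=i, x2=x, x1=j]
    return P1 / len(seqs), P21 / len(seqs), P3x1 / len(seqs)


def spectral_hmm(P1, P21, P3x1, m):
    """Observable-operator parameters (b1, binf, B_x) from moments, with m hidden states."""
    U = np.linalg.svd(P21)[0][:, :m]  # top-m left singular vectors of P21
    b1 = U.T @ P1
    binf = np.linalg.pinv(P21.T @ U) @ P1
    right = np.linalg.pinv(U.T @ P21)
    B = [U.T @ P3x1[x] @ right for x in range(len(P3x1))]
    return b1, binf, B


def spectral_prob(seq, b1, binf, B):
    """Pr[x_1..x_t] estimated as binf^T B_{x_t} ... B_{x_1} b1."""
    v = b1
    for x in seq:
        v = B[x] @ v
    return float(binf @ v)


def forward_prob(seq, pi, T, O):
    """Reference answer from the standard forward algorithm."""
    alpha = pi.copy()
    for x in seq:
        alpha = T @ (O[x] * alpha)
    return float(alpha.sum())
```

In a classifier, one such model per class (or per mixture component) could score a test sequence with `spectral_prob`. The estimated values are not guaranteed to be non-negative when the moments are noisy.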

### Convolution Kernels for Discriminative Learning from Streaming Text

Abstract

Time series modeling is an important problem with many applications in different domains. Here we consider discriminative learning from time series, where we seek to predict an output response variable based on time series input. We develop a method based on convolution kernels to model discriminative learning over streams of text. Our method outperforms competitive baselines on three synthetic and two real datasets, covering rumour frequency modeling and popularity prediction tasks.
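One simple way to build a convolution kernel over text streams is to sum a product of part kernels over all pairs of posts. Here the part kernels are a Gaussian kernel on timestamps and bag-of-words cosine similarity. This is a generic illustration under assumed choices, not the kernel developed in the paper; the assumptions are the `bandwidth` parameter, whitespace tokenization, and the `(timestamp, text)` stream format. Because sums and products of positive semidefinite kernels remain positive semidefinite, the result can be used in any kernel method.

```python
import math
from collections import Counter


def bow_cosine(a, b):
    """Cosine similarity between bag-of-words count vectors of two short texts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def stream_kernel(s, t, bandwidth=1.0):
    """Convolution kernel between two streams, each a list of (timestamp, text) pairs:
    the sum over all pairs of posts of (time similarity) x (text similarity)."""
    total = 0.0
    for ts, xs in s:
        for tt, xt in t:
            k_time = math.exp(-((ts - tt) ** 2) / (2 * bandwidth ** 2))
            total += k_time * bow_cosine(xs, xt)
    return total
```

A Gram matrix such as `[[stream_kernel(a, b) for b in streams] for a in streams]` could then be passed to any regressor or classifier that accepts a precomputed kernel.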

### Approaches for Pattern Discovery Using Sequential Data Mining

Abstract

In this chapter we first introduce sequence data. We then discuss different approaches, studied in the literature, for mining patterns from sequence data. Apriori-based methods and pattern-growth methods are the earliest and most influential methods for sequential pattern mining. There is also a vertical-format-based method, which works on a dual representation of the sequence database. Work has also been done on mining patterns with constraints, mining closed patterns, mining patterns from multi-dimensional databases, mining closed repetitive gapped subsequences, and other forms of sequential pattern mining. Some works also focus on mining incremental patterns and mining from stream data. We present at least one method of each of these types and discuss their advantages and disadvantages. We conclude with a summary.
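The following simplified level-wise miner, in the spirit of GSP, makes the Apriori-based family concrete. It assumes each sequence element is a single item; full GSP also handles itemsets and time constraints. Support is the number of sequences that contain the pattern as a possibly gapped subsequence. A length-(k + 1) candidate is kept only if every length-k subsequence obtained by dropping one element is already frequent, which is the Apriori property.

```python
def is_subsequence(pattern, seq):
    """True if the items of `pattern` occur in `seq` in order, gaps allowed."""
    it = iter(seq)
    return all(item in it for item in pattern)  # each `in` consumes the iterator


def support(pattern, db):
    """Number of sequences in `db` that contain `pattern`."""
    return sum(1 for seq in db if is_subsequence(pattern, seq))


def apriori_sequential(db, min_support):
    """Level-wise (Apriori-style) mining of frequent single-item sequential patterns."""
    items = sorted({x for seq in db for x in seq})
    frequent = {}
    level = [(x,) for x in items if support((x,), db) >= min_support]
    while level:
        for p in level:
            frequent[p] = support(p, db)
        singles = [p[0] for p in frequent if len(p) == 1]
        candidates = []
        for p in level:
            for x in singles:
                c = p + (x,)
                # Prune: every drop-one subsequence of c must already be frequent.
                if all(c[:i] + c[i + 1:] in frequent for i in range(len(c))):
                    candidates.append(c)
        level = [c for c in candidates if support(c, db) >= min_support]
    return frequent
```

Each level rescans the database to count support. Pattern-growth and vertical-format methods are designed to avoid this repeated scanning.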

### On Finding the Point Where There Is No Return: Turning Point Mining on Game Data

Abstract

Gaming expertise is usually accumulated by playing or watching many game instances and identifying critical moments in them, called turning points. Turning point rules (abbreviated as TPRs) are game patterns that almost always lead to irreversible outcomes. In this paper, we formulate the notion of an irreversible outcome property, which can be combined with pattern mining to automatically extract TPRs from any given game dataset. Specifically, we extend the well-known PrefixSpan sequence mining algorithm by incorporating the irreversible outcome property. To show the usefulness of TPRs, we apply them to Tetris, a popular game. We mine TPRs from Tetris games and generate challenging game sequences to help train an intelligent Tetris algorithm. Our experimental results show that (1) TPRs can be found automatically from historical game data with reasonable scalability, and (2) our TPRs help the Tetris algorithm perform better when it is trained with challenging game sequences.
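For reference, the sketch below is a minimal version of plain PrefixSpan, which mines patterns by growing prefixes over prefix-projected databases. It handles single-item sequences, such as a hypothetical log of Tetris piece types. It does not implement the irreversible outcome property or any TPR filtering from the paper. It only shows the projection step that such an extension would build on.

```python
def prefixspan(db, min_support):
    """Mine all frequent sequential patterns by prefix-projected pattern growth.
    Support is the number of sequences containing the pattern (gaps allowed)."""
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per projected sequence.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, sup in sorted(counts.items()):
            if sup < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = sup
            # Project: keep the suffix after the first occurrence of `item`.
            new_db = []
            for seq in projected:
                if item in seq:
                    suffix = seq[seq.index(item) + 1:]
                    if suffix:
                        new_db.append(suffix)
            mine(pattern, new_db)

    mine((), [list(s) for s in db])
    return results
```

Unlike the level-wise approach, no candidates are generated. Each frequent prefix is extended only with items that actually occur in its projected database.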