More Features Are Not Always Better: Evaluating Generalizing Models in Incident Type Classification of Tweets
Abstract
Social media represents a rich source of up-to-date information about events such as incidents. The sheer amount of available information makes machine learning approaches a necessity for further processing. This learning problem often involves regionally restricted datasets, such as data from only one city. Because social media data such as tweets vary considerably across different cities, training efficient models requires labeling data from each city of interest, which is costly and time-consuming. In this study, we investigate which features are most suitable for training generalizable models, i.e., models that perform well across different datasets. We reimplemented the most popular features from the state of the art, in addition to other novel approaches, and evaluated them on data from ten different cities. We show that many sophisticated features are not necessarily valuable for training a generalized model and are outperformed by classic features such as plain word n-grams and character n-grams.
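For readers unfamiliar with the "classic" features the abstract credits, the sketch below shows one plausible way to turn a tweet into word n-gram and character n-gram counts. It is a minimal illustration under stated assumptions, not the paper's implementation: the helper names (`word_ngrams`, `char_ngrams`, `extract_features`), the lowercase whitespace tokenization, and the n ranges (1 to 2 for words, 3 to 4 for characters) are all hypothetical choices, since the abstract does not specify the exact configuration.

```python
from collections import Counter


def word_ngrams(text: str, n_values=(1, 2)) -> Counter:
    """Count word n-grams; tokenization is a naive lowercase whitespace split (assumption)."""
    tokens = text.lower().split()
    feats = Counter()
    for n in n_values:
        # Slide a window of n tokens over the tweet; yields nothing if n > len(tokens).
        for i in range(len(tokens) - n + 1):
            feats["w:" + " ".join(tokens[i:i + n])] += 1
    return feats


def char_ngrams(text: str, n_values=(3, 4)) -> Counter:
    """Count character n-grams over the lowercased raw string (n range is an assumption)."""
    s = text.lower()
    feats = Counter()
    for n in n_values:
        for i in range(len(s) - n + 1):
            feats["c:" + s[i:i + n]] += 1
    return feats


def extract_features(text: str) -> Counter:
    """Combine both feature families into one sparse bag; prefixes keep them apart."""
    feats = word_ngrams(text)
    feats.update(char_ngrams(text))
    return feats


if __name__ == "__main__":
    print(extract_features("Car crash on I-90"))
```

Character n-grams are often credited with robustness to spelling variants, hashtags, and place-specific tokens, which is one intuition for why such simple features might transfer across cities; in practice a library vectorizer (for example, scikit-learn's `CountVectorizer` with `analyzer="char"` and an `ngram_range`) would typically replace this hand-rolled version.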