Results 1 - 10 of 628
An extensive empirical study of feature selection metrics for text classification
- J. of Machine Learning Research
, 2003
"... Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison ..."
Cited by 496 (15 self)
Collaborative filtering with privacy via factor analysis
- In Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
, 2002
"... Collaborative filtering is valuable in e-commerce, and for direct recommendations for music, movies, news etc. But today’s systems use centralized databases and have several disadvantages, including privacy risks. As we move toward ubiquitous computing, there is a great potential for individuals to ..."
Cited by 210 (9 self)
Peer-to-Peer Information Retrieval Using Self-Organizing Semantic Overlay Networks
, 2003
"... Content-based full-text search is a challenging problem in Peer-to-Peer (P2P) systems. Traditional approaches have either been centralized or use flooding to ensure accuracy of the results returned. In this paper, we present pSearch, a decentralized non-flooding P2P information retrieval system. pSea ..."
Cited by 242 (7 self)
show that pSearch can achieve performance comparable to centralized information retrieval systems by searching only a small number of nodes. For a system with 128,000 nodes and 528,543 documents (from news, magazines, etc.), pSearch searches only 19 nodes and transmits only 95.5KB data during
A Contextual-Bandit Approach to Personalized News Article Recommendation
"... Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamic ..."
Cited by 178 (16 self)
Periods, Capitalized Words, etc.
- Computational Linguistics
, 1999
"... In this paper we present an approach which tackles three problems: sentence boundary disambiguation, disambiguation of capitalized words when they are used in positions where capitalization is expected and identification of abbreviations. All these tasks are important tasks of text normalization, which ..."
Cited by 7 (0 self)
on capitalized words and about 99.3-99.7% accuracy on sentence boundaries. This performance is the highest quoted in the literature for the tasks. We also present the results of applying our system to a corpus of news in Russian and training a part-of-speech tagger which uses a maximum entropy model
equipment vendors and manufacturers, etc.
, 1996
"... erous Internet sites and listservs information for printers on pollution prevention, regulatory compliance and s information may be useful to other printers. information on web sites and listservs of which th P b Sites A’s electronic library of ’ n on pollution preve and environment ude: contacts, t ..."
, training and news; federal regulations, executive orders, laws; pollution prevention technical information, resources. on-line access to the tion Prevention Reference Register. It also incorporates the “Solvents Alternatives Umbrella” which allows users to tion about substitutes for toxic s many as 16
Periods, Capitalized Words, etc.
"... In this paper we present an approach which tackles three problems: sentence boundary disambiguation, disambiguation of capitalized words when they are used in positions where capitalization is expected and identification of abbreviations. All these tasks are important tasks of text normalization, whic ..."
on capitalized words and about 99.3-99.7% accuracy on sentence boundaries. This performance is the highest quoted in the literature for the tasks. We also present the results of applying our system to a corpus of news in Russian and training a part-of-speech tagger which uses a maximum entropy model
TAGME: On-the-fly annotation of short text fragments (by Wikipedia entities). Available on http://arxiv.org/abs/1006.3498
"... We designed and implemented Tagme, a system that is able to efficiently and judiciously augment a plain-text with pertinent hyperlinks to Wikipedia pages. The specialty of Tagme with respect to known systems [5, 8] is that it may annotate texts which are short and poorly composed, such as snippets o ..."
Cited by 82 (6 self)
of search-engine results, tweets, news, etc. This annotation is extremely informative, so any task that is currently addressed using the bag-of-words paradigm could benefit from using this annotation to draw upon (the millions of) Wikipedia pages and their inter-relations. Categories and Subject Descriptors
Who Says What to Whom on Twitter
, 2011
"... We study several longstanding questions in media communications research, in the context of the microblogging service Twitter, regarding the production, flow, and consumption of information. To do so, we exploit a recently introduced feature of Twitter known as “lists” to distinguish between elite ..."
Cited by 136 (7 self)
the media produces the most information, but celebrities are the most followed. We also find significant homophily within categories: celebrities listen to celebrities, while bloggers listen to bloggers, etc.; however, bloggers in general rebroadcast more information than the other categories. Next we re
Ranking a stream of news
- In WWW ’05: Proceedings of the 14th international conference on World Wide Web
, 2005
"... According to a recent survey made by Nielsen NetRatings, searching on news articles is one of the most important activities online. Indeed, Google, Yahoo, MSN and many others have proposed commercial search engines for indexing news feeds. Despite this commercial interest, no academic research has foc ..."
Cited by 39 (1 self)
,000 pieces of news, produced in two months by more than 2000 news sources belonging to 13 different categories (World, U.S., Europe, Sports, Business, etc.). This collection is extracted from the index of comeToMyHead, an academic news search engine available online.