Results 1 - 10 of 628

An extensive empirical study of feature selection metrics for text classification

by George Forman, Isabelle Guyon, André Elisseeff - J. of Machine Learning Research, 2003
"... Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison ..."
Cited by 496 (15 self)

Collaborative filtering with privacy via factor analysis

by John Canny - In Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, 2002
"... Collaborative filtering is valuable in e-commerce, and for direct recommendations for music, movies, news etc. But today’s systems use centralized databases and have several disadvantages, including privacy risks. As we move toward ubiquitous computing, there is a great potential for individuals to ..."
Cited by 210 (9 self)

Peer-to-Peer Information Retrieval Using Self-Organizing Semantic Overlay Networks

by Chunqiang Tang, Zhichen Xu, Sandhya Dwarkadas, 2003
"... Content-based full-text search is a challenging problem in Peer-to-Peer (P2P) systems. Traditional approaches have either been centralized or use flooding to ensure accuracy of the results returned. In this paper, we present pSearch, a decentralized non-flooding P2P information retrieval system. pSea ..."
Cited by 242 (7 self)
show that pSearch can achieve performance comparable to centralized information retrieval systems by searching only a small number of nodes. For a system with 128,000 nodes and 528,543 documents (from news, magazines, etc.), pSearch searches only 19 nodes and transmits only 95.5KB data during

A Contextual-Bandit Approach to Personalized News Article Recommendation

by Lihong Li, Wei Chu, John Langford, Robert E. Schapire
"... Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamic ..."
Cited by 178 (16 self)

Periods, Capitalized Words, etc.

by Andrei Mikheev - Computational Linguistics, 1999
"... In this paper we present an approach which tackles three problems: sentence boundary disambiguation, disambiguation of capitalized words when they are used in positions where capitalization is expected and identification of abbreviations. All these tasks are important tasks of text normalization, which ..."
Cited by 7 (0 self)
on capitalized words and about 99.3-99.7% accuracy on sentence boundaries. This performance is the highest quoted in the literature for the tasks. We also present the results of applying our system to a corpus of news in Russian and training a part-of-speech tagger which uses a maximum entropy model

equipment vendors and manufacturers, etc.

by unknown authors, 1996
"... erous Internet sites and listservs information for printers on pollution prevention, regulatory compliance and s information may be useful to other printers. information on web sites and listservs of which th P b Sites A’s electronic library of ’ n on pollution preve and environment ude: contacts, t ..."
, trainin and news; federal regulations, executive orders, laws; pollution prevention technical information, resources. on-line access to the tion Prevention Reference Register. It also incorporates the “Solvents Alternatives Umbrella” which allows users to tion about substitutes for toxic s many as 16

Periods, Capitalized Words, etc.

by Andrei Mikheev
"... In this paper we present an approach which tackles three problems: sentence boundary disambiguation, disambiguation of capitalized words when they are used in positions where capitalization is expected and identification of abbreviations. All these tasks are important tasks of text normalization, whic ..."
on capitalized words and about 99.3-99.7% accuracy on sentence boundaries. This performance is the highest quoted in the literature for the tasks. We also present the results of applying our system to a corpus of news in Russian and training a part-of-speech tagger which uses a maximum entropy model

TAGME: On-the-fly annotation of short text fragments (by Wikipedia entities). Available on http://arxiv.org/abs/1006.3498

by Paolo Ferragina, Ugo Scaiella
"... We designed and implemented Tagme, a system that is able to efficiently and judiciously augment a plain-text with pertinent hyperlinks to Wikipedia pages. The specialty of Tagme with respect to known systems [5, 8] is that it may annotate texts which are short and poorly composed, such as snippets o ..."
Cited by 82 (6 self)
of search-engine results, tweets, news, etc. This annotation is extremely informative, so any task that is currently addressed using the bag-of-words paradigm could benefit from using this annotation to draw upon (the millions of) Wikipedia pages and their inter-relations. Categories and Subject Descriptors

Who Says What to Whom on Twitter

by Shaomei Wu, Winter A. Mason, 2011
"... We study several longstanding questions in media communications research, in the context of the microblogging service Twitter, regarding the production, flow, and consumption of information. To do so, we exploit a recently introduced feature of Twitter known as “lists” to distinguish between elite ..."
Cited by 136 (7 self)
the media produces the most information, but celebrities are the most followed. We also find significant homophily within categories: celebrities listen to celebrities, while bloggers listen to bloggers etc; however, bloggers in general rebroadcast more information than the other categories. Next we re

Ranking a stream of news

by Gianna M. Del Corso, Antonio Gullí, Francesco Romani - In WWW ’05: Proceedings of the 14th international conference on World Wide Web, 2005
"... According to a recent survey made by Nielsen NetRatings, searching on news articles is one of the most important activity online. Indeed, Google, Yahoo, MSN and many others have proposed commercial search engines for indexing news feeds. Despite this commercial interest, no academic research has foc ..."
Cited by 39 (1 self)
,000 pieces of news, produced in two months by more than 2000 news sources belonging to 13 different categories (World, U.S, Europe, Sports, Business, etc). This collection is extracted from the index of comeToMyHead, an academic news search engine available online.

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University