CiteSeer: The Scientific Literature Digital Library

About CiteSeer

CiteSeer is a scientific literature digital library and search engine that focuses primarily on the literature in computer and information science. CiteSeer aims to improve the dissemination of, and feedback on, the scientific literature and to provide improvements in functionality, usability, availability, cost, comprehensiveness, efficiency, and timeliness in access to scientific and scholarly knowledge. CiteSeer was developed in 1997 at the NEC Research Institute, Princeton, New Jersey, by Steve Lawrence, Lee Giles, and Kurt Bollacker. It is now hosted at the Pennsylvania State University's College of Information Sciences and Technology under the direction of Professor Lee Giles. Isaac Councill is the CiteSeer administrator and technical director. The CiteSeer model was used to create a similar search engine, SmealSearch, for academic business documents. CiteSeer also provides mirrors at other sites.

CiteSeer was the first digital library and search engine to provide automated citation indexing and citation linking using the method of autonomous citation indexing.

Rather than creating just another digital library, CiteSeer attempts to provide resources such as algorithms, data, metadata, services, techniques, and software that can be used to promote other digital libraries. CiteSeer indexes PostScript and PDF research articles on the Web, and provides the following features.


Summary of CiteSeer

Autonomous Citation Indexing (ACI)

CiteSeer uses ACI to automatically create a citation index that can be used for literature search and evaluation. Compared to traditional citation indices, ACI provides improvements in cost, availability, comprehensiveness, efficiency, and timeliness. For more information, see Digital Libraries and Autonomous Citation Indexing.

All cited documents

CiteSeer computes citation statistics and related documents for all articles cited in the database, not just the indexed articles.

Reference linking

Like many online publishers, CiteSeer lets users browse the database by following citation links.

Citation context

CiteSeer can show the context of citations to a given paper, allowing a researcher to quickly and easily see what other researchers have to say about an article of interest.

Awareness and tracking

CiteSeer provides automatic notification of new citations to given papers, and new papers matching a user profile.

Related documents

CiteSeer locates related documents using citation-based and word-based measures and displays an active, continuously updated bibliography for each document.

Similar documents

CiteSeer shows the percentage of matching sentences between documents.
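
As a rough illustration of a sentence-overlap measure, the sketch below reports the percentage of sentences in one document that also appear in another. It is not CiteSeer's implementation: the function names, the regular-expression sentence splitter, and the exact-match rule after case and whitespace normalization are all assumptions made for this example.

```python
import re


def split_sentences(text):
    """Split text on sentence-ending punctuation (a naive, assumed rule)
    and normalize case and whitespace so trivial differences do not matter."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(p.lower().split()) for p in parts if p.strip()]


def percent_matching_sentences(doc_a, doc_b):
    """Return the percentage of sentences in doc_a that also occur in doc_b."""
    sents_a = split_sentences(doc_a)
    if not sents_a:
        return 0.0
    sents_b = set(split_sentences(doc_b))
    matches = sum(1 for s in sents_a if s in sents_b)
    return 100.0 * matches / len(sents_a)


if __name__ == "__main__":
    a = "One fish. Two fish. Red fish. Blue fish."
    b = "Two fish. Blue fish. Green fish."
    print(percent_matching_sentences(a, b))
```

The measure is asymmetric in this sketch: it is computed relative to the sentences of the first document, so swapping the arguments can give a different percentage.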

Full-text indexing

CiteSeer indexes the full text of articles and citations. Full Boolean, phrase, and proximity search is supported.
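
To make the idea of Boolean and phrase search over a full-text index concrete, here is a minimal positional inverted index. It is a sketch under stated assumptions, not CiteSeer's indexing code: the names `tokenize`, `build_index`, `search_and`, and `search_phrase` are hypothetical, only AND queries and exact phrases are covered, and proximity search is left out.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs maps a document id to its text.
    Returns a mapping: term -> {document id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def search_and(index, terms):
    """Boolean AND: ids of documents that contain every term."""
    sets = [set(index.get(t.lower(), {})) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_phrase(index, phrase):
    """Phrase search: ids of documents where the terms occur consecutively."""
    terms = tokenize(phrase)
    if not terms:
        return set()
    hits = set()
    for doc_id in search_and(index, terms):
        for start in index[terms[0]][doc_id]:
            # Every later term must sit exactly k positions after the first.
            if all(start + k in index[terms[k]][doc_id]
                   for k in range(1, len(terms))):
                hits.add(doc_id)
                break
    return hits


if __name__ == "__main__":
    docs = {
        1: "autonomous citation indexing",
        2: "citation of autonomous agents",
        3: "indexing text",
    }
    index = build_index(docs)
    print(search_and(index, ["autonomous", "citation"]))
    print(search_phrase(index, "autonomous citation"))
```

Storing word positions, rather than only document ids, is what makes phrase matching possible; a proximity query could reuse the same positions with a looser distance test.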

Query-sensitive summaries

CiteSeer provides the context of how query terms are used in articles instead of a generic summary, improving the efficiency of search.
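
A query-sensitive summary can be approximated with a keyword-in-context snippet: show a few words on either side of each query term rather than a fixed abstract. The sketch below illustrates the idea only. The function name `query_snippets`, the `window` and `max_snippets` parameters, and the whitespace tokenization are assumptions, not details of CiteSeer.

```python
import re


def query_snippets(text, query_terms, window=5, max_snippets=3):
    """Return short excerpts of text centered on occurrences of query terms.

    window is the number of words kept on each side of a matching word.
    """
    words = text.split()
    terms = {t.lower() for t in query_terms}
    snippets = []
    last_end = -1  # index of the last word already covered by a snippet
    for i, word in enumerate(words):
        token = re.sub(r"\W+", "", word).lower()  # strip punctuation
        if token in terms and i > last_end:
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            snippets.append(" ".join(words[start:end]))
            last_end = end - 1
            if len(snippets) == max_snippets:
                break
    return snippets


if __name__ == "__main__":
    text = ("Citation indexing links papers. Autonomous citation indexing "
            "builds the index automatically from documents on the Web.")
    for snippet in query_snippets(text, ["autonomous"], window=2):
        print(snippet)
```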

Citation graph analysis

CiteSeer analyzes the graph of citations, e.g. to provide hubs and authorities ranking (à la Kleinberg).
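
For readers unfamiliar with hubs and authorities, the following is a small sketch of Kleinberg's iterative scheme applied to a citation graph. It is illustrative only: the function name `hits`, the dictionary input format, and the fixed iteration count are assumptions, and nothing here is taken from CiteSeer's own code. A paper's authority score is the sum of the hub scores of the papers citing it, and its hub score is the sum of the authority scores of the papers it cites.

```python
import math
from collections import defaultdict


def hits(citations, iterations=50):
    """citations maps a paper id to the list of paper ids it cites.
    Returns (hub, authority) score dictionaries, each L2-normalized."""
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)

    # Reverse edges: for each paper, the papers that cite it.
    cited_by = defaultdict(list)
    for src, targets in citations.items():
        for dst in targets:
            cited_by[dst].append(src)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of citing papers.
        auth = {n: sum(hub[s] for s in cited_by[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}

        # Hub update: sum of authority scores of cited papers.
        hub = {n: sum(auth[d] for d in citations.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


if __name__ == "__main__":
    graph = {"A": ["C"], "B": ["C"], "C": []}
    hub, auth = hits(graph)
    print(max(auth, key=auth.get))  # the most authoritative paper
```

In this toy graph, papers A and B both cite C, so C receives all of the authority weight while A and B share the hub weight equally.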

Page images

CiteSeer allows quick and easy viewing of page images.

Up-to-date

CiteSeer is regularly updated.

Powerful search

For example, CiteSeer allows author initials to be used to narrow a citation search.

Harvesting of articles

CiteSeer uses search engines and crawling plus document submissions to harvest papers on the Web.

Metadata of articles

CiteSeer automatically extracts and provides metadata from all indexed articles.

Links to other metadata

When possible, CiteSeer automatically links to other metadata resources such as DBLP and the ACM Digital Library.

Acknowledgement indexing

CiteSeer is the first search engine and digital library to offer automatic acknowledgement indexing.

Freely available

The full source code of CiteSeer is available at no cost for non-commercial use.




Credits:

Many have contributed to CiteSeer and its continuing development. In a list in which some are surely missing, we would like to thank Joshua Alspector, Jose Nelson Amaral, Anders Ardo, Bill Arms, Shumeet Baluja, Arunava Banerjee, Eric Baum, Donna Bergmark, Levent Bolelli, Shannon Bradshaw, Vivek Bhatnagar, Jay Budzik, Robert Cameron, Jack Carroll, Rich Caruana, Ingemar Cox, Sandip Debnath, Seyda Ertekin, Scott Fahlman, Gary Flake, Ed Fox, Eugene Garfield, Bill Gear, Paul Ginsparg, Eric Glover, Abby Goodrum, Marco Gori, Allan Gottlieb, Jim Gray, Hui Han, Steve Hanson, Stevan Harnad, Eric Hellman, Haym Hirsh, Steve Hitchcock, Jian Huang, Gerd Hoff, Ernesto Di Iorio, Jim Jansen, Paul Kantor, Jon Kleinberg, Thomas Krichel, Bob Krovetz, Carl Lagoze, Andrea LaPaugh, Wang-Chien Lee, Jay Lepreau, Michael Lesk, Huajing Li, Marco Maggini, Eren Manavoglu, Andrew McCallum, Steve Minton, Tom Mitchell, Prasenjit Mitra, Finn Nielsen, Michael Nelson, Craig Nevill-Manning, Andrew Ng, Andrew Odlyzko, David Pennock, Yves Petinot, Brian Pinkerton, Alexandrin Popescul, Augusto Pucci, Betsy Richmond, Ben Schafer, Bruce Schatz, Terrence Sejnowski, Anand Sivasubramaniam, Warren Smith, Yang Song, Amanda Spink, Yang Sun, Harold Stone, Pradeep Teregowda, Kostas Tsioutsiouliklis, Valerie Tucci, Lyle Ungar, Frits Vaandrager, Moshe Vardi, David Waltz, James Ze Wang, Ian Witten, Hongyuan Zha, Ding Zhou, and Ziming Zhuang.

Special credits:

Cuauhtémoc Rivera designed the main search page.

CiteSeer.IST extracts title and author information from the headers of PostScript files, an extraction first performed by Andrew Ng.

The New Zealand Digital Library was the first to index the full text of PostScript research articles.

The idea of citation indexing of the scientific literature originated with Dr. Eugene Garfield.