  Building a scalable and accurate copy detection mechanism (1996) [53 citations — 9 self]

by Narayanan Shivakumar, Hector Garcia-Molina
In Proceedings of the 1st ACM Conference on Digital Libraries (DL'96)
http://www-db.stanford.edu/~shiva/research/Pubs/performance.ps

Abstract:

Publishers are often reluctant to offer valuable digital documents on the Internet for fear that they will be retransmitted or copied widely. A copy detection mechanism can help identify such copying. For example, publishers may register their documents with a copy detection server, which can then automatically check public sources such as UseNet articles and Web sites for potential illegal copies. The server can search for exact copies and also for cases where significant portions of documents have been copied. In this paper we study, for the first time, the performance of various copy detection mechanisms, including their disk storage requirements, main memory requirements, response times for registration, and response times for querying. We also contrast performance with the accuracy of the mechanisms (how well they detect partial copies). The results are obtained using SCAM, an experimental server we have implemented, and a collection of 50,000 netnews articles.
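
The register-then-query workflow described in the abstract can be illustrated with a minimal sketch. This is not the SCAM implementation: the abstract does not specify SCAM's chunking or scoring, so the overlapping three-word chunks, the in-memory inverted index, the overlap-fraction score, and every name below (`chunks`, `CopyDetectionServer`, `register`, `query`, `threshold`) are assumptions made purely for illustration.

```python
from collections import defaultdict


def chunks(text, k=3):
    """Return the set of overlapping k-word chunks in text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


class CopyDetectionServer:
    """Toy registration/query server (hypothetical; not SCAM itself)."""

    def __init__(self, k=3):
        self.k = k
        self.index = defaultdict(set)  # chunk -> ids of registered documents
        self.sizes = {}                # doc id -> number of distinct chunks

    def register(self, doc_id, text):
        """Store every chunk of a publisher's document in the index."""
        doc_chunks = chunks(text, self.k)
        self.sizes[doc_id] = len(doc_chunks)
        for c in doc_chunks:
            self.index[c].add(doc_id)

    def query(self, text, threshold=0.5):
        """Map each registered doc id to the fraction of its chunks that
        appear in text, keeping only fractions at or above threshold."""
        hits = defaultdict(int)
        for c in chunks(text, self.k):
            for doc_id in self.index.get(c, ()):
                hits[doc_id] += 1
        return {d: n / self.sizes[d] for d, n in hits.items()
                if self.sizes[d] and n / self.sizes[d] >= threshold}


server = CopyDetectionServer(k=3)
server.register("a", "the quick brown fox jumps over the lazy dog")
exact = server.query("the quick brown fox jumps over the lazy dog")
partial = server.query("someone wrote that the quick brown fox jumps over a fence")
```

In this sketch an exact copy scores 1.0 and a partial copy scores the fraction of the registered document's chunks it reuses. The index size grows with the number of distinct chunks, which hints at the trade-off between storage cost and partial-copy accuracy that the paper measures.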
