by Narayanan Shivakumar, Hector Garcia-Molina
In Proceedings of the 1st ACM Conference on Digital Libraries (DL'96)
http://www-db.stanford.edu/~shiva/research/Pubs/performance.ps
Abstract:
Often, publishers are reluctant to offer valuable digital documents on the Internet for fear that they will be re-transmitted or copied widely. A Copy Detection Mechanism can help identify such copying. For example, publishers may register their documents with a copy detection server, and the server can then automatically check public sources such as UseNet articles and Web sites for potential illegal copies. The server can search for exact copies, and also for cases where significant portions of documents have been copied. In this paper we study, for the first time, the performance of various copy detection mechanisms, including the disk storage requirements, main memory requirements, response times for registration, and response times for querying. We also contrast performance with the accuracy of the mechanisms (how well they detect partial copies). The results are obtained using SCAM, an experimental server we have implemented, and a collection of 50,000 netnews articles.