Email thread reassembly using similarity matching
- In Proc. of CEAS, 2006
Cited by 25 (0 self)
Email thread reassembly is the task of linking messages by parent-child relationships. In this paper, we present two approaches to address this problem. One exploits previously undocumented header information from the Microsoft Exchange Protocol. The other uses string similarity metrics and a heuristic algorithm to reassemble threads in the absence of header information. The pros and cons of both methods are discussed. The similarity matching method is evaluated using the Enron email corpus and found to perform well.
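As a rough illustration of the header-free approach, the sketch below links a message to its most similar earlier message. The abstract does not say which similarity metrics or heuristics the paper uses, so `difflib.SequenceMatcher`, the `find_parent` helper, the dictionary layout, and the 0.5 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_parent(message, earlier, threshold=0.5):
    """Return the most similar earlier message, or None if none reaches the threshold.

    `threshold` is an illustrative value, not one taken from the paper.
    """
    best, best_score = None, threshold
    for candidate in earlier:
        score = similarity(message["body"], candidate["body"])
        if score >= best_score:
            best, best_score = candidate, score
    return best


# Hypothetical messages: m3 quotes m1, so m1 should be chosen as its parent.
messages = [
    {"id": "m1", "body": "Can we move the budget review to Friday?"},
    {"id": "m2", "body": "Lunch menu for the offsite is attached."},
    {"id": "m3", "body": "Friday works. > Can we move the budget review to Friday?"},
]
parent = find_parent(messages[2], messages[:2])
```

A reply that quotes its parent shares a long common substring with it, which is why even a plain character-level ratio can recover the link in this toy case.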
Window-based Enterprise Expert Search
- In Proceedings of the 15th Text REtrieval Conference (TREC), 2006
Cited by 6 (2 self)
This is the first year of participation by the City University Centre of Interactive System Research (CISR) in the Expert Search Task. In this paper, we describe an expert search experiment based on window-based techniques: we build a profile for each expert from the text surrounding the expert's name and email address in the documents. We then use traditional IR techniques to search for and rank experts. Our experiment is run on Okapi, with BM25 as the ranking model. Results show that parameter b does affect retrieval effectiveness, and that a smaller value for b produces better results.
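To make the role of the parameter b concrete, here is a minimal BM25 scorer. It is a sketch, not the Okapi system used in the paper: the function name `bm25_scores`, the default `k1=1.2`, the non-negative IDF variant, and the toy documents are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # b sets how strongly document length is normalized:
        # b = 0 ignores length, b = 1 normalizes fully.
        norm = 1 - b + b * len(d) / avgdl
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (an assumption, not necessarily Okapi's).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


# Hypothetical expert profiles of different lengths, each mentioning "xml" once.
profiles = [
    ["xml", "schema"],
    ["xml", "schema", "mail", "list", "archive", "thread"],
]
no_length_norm = bm25_scores(["xml"], profiles, b=0.0)
with_length_norm = bm25_scores(["xml"], profiles, b=0.75)
```

With `b=0.0` the two profiles score the same because length is ignored; with `b=0.75` the shorter profile scores higher. A smaller b therefore penalizes long profiles less, which is the knob the abstract reports as mattering.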
An exploratory study of the W3C mailing list test collection for retrieval of emails with pro/con arguments
- Presented at The Third Conference on Email and Anti-Spam, 2006
Cited by 4 (1 self)
The W3C mailing list test collection, an information retrieval test collection for email, was developed for the TREC Enterprise Search track in 2005. One task in that track was to retrieve emails that contribute at least one pro/con related to a specific topic. This paper describes the test collection and presents a preliminary evaluation of its suitability for evaluating such systems, including an analysis of topic types found in the collection, characterization of inter-assessor agreement on pro/con judgments, and an example of the evaluation results that can be obtained using the collection. There is clear evidence that the collection is useful in its present form, but several areas for improvement can be identified. In particular, some topic types found in the collection do not seem well suited to pro/con judgment. The paper concludes with suggestions for future work on the design of test collections and information retrieval systems for this task.
Headerless, Quoteless, but not Hopeless? Using Pairwise Email Classification to Disentangle Email Threads
Thread disentanglement is the task of separating out conversations whose thread structure is implicit, distorted, or lost. In this paper, we perform email thread disentanglement through pairwise classification, using text similarity measures on non-quoted texts in emails. We show that i) content text similarity metrics outperform style and structure text similarity metrics in both a class-balanced and class-imbalanced setting, and ii) although feature performance is dependent on the semantic similarity of the corpus, content features are still effective even when controlling for semantic similarity. We make available the Enron Threads Corpus, a newly-extracted corpus of 70,178 multi-email threads with emails from the Enron Email Corpus.
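The pairwise setup can be sketched as follows: strip quoted lines, compute a content similarity between the two remaining texts, and decide whether the pair belongs to the same thread. The paper trains a classifier over several similarity features; the single bag-of-words cosine feature, the fixed 0.3 threshold, the `>` quoting convention, and all function names here are simplifying assumptions.

```python
import math
import re
from collections import Counter


def strip_quoted(body: str) -> str:
    """Drop lines starting with '>' (an assumed quoting convention)."""
    kept = [line for line in body.splitlines() if not line.lstrip().startswith(">")]
    return "\n".join(kept)


def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def same_thread(email_a: str, email_b: str, threshold: float = 0.3) -> bool:
    """Stand-in for a trained classifier: threshold one content-similarity feature."""
    return cosine(strip_quoted(email_a), strip_quoted(email_b)) >= threshold


# Hypothetical emails for illustration.
first = "The gas pipeline contract needs legal review before signing."
reply = (
    "> The gas pipeline contract needs legal review\n"
    "Legal review of the pipeline contract is scheduled for Monday."
)
unrelated = "Fantasy football picks are due tonight."
```

Here `same_thread(first, reply)` holds even though the quoted line is discarded, because the new text of the reply still shares content words with the first message, while `same_thread(first, unrelated)` does not.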