Fawcett, T., & Provost, F. (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery, 3, 291--316.
....of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, N.T. Hong Kong. adafu@cse.cuhk.edu.hk, jtang@cse.cuhk.edu.hk 1 Introduction We are interested in the problem of outlier detection, which is the discovery of data that deviate a lot from other data patterns. Hawkins [7] characterizes an outlier in a quite intuitive way as follows: An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism. Most methods in the early work that detects outliers independently have been developed ....
T. Fawcett, F. Provost: "Adaptive Fraud Detection", Data Mining and Knowledge Discovery Journal, Kluwer Academic Publishers, Vol. 1, No. 3, 1997, pp 291 - 316.
....research projects that also applied data mining to intrusion detection. An overview of these projects and a general treatment of data mining in computer security can be found in a recent book edited by Barbara and Jajodia [4]. Data mining for fraud detection is investigated by Fawcett and Provost [15], and by Chan and Stolfo [10]. Alarm correlation systems [12, 14, 40, 42] try to group alarms so that the alarms of the same group pertain to the same phenomenon (e.g. the same attack). In that way, they offer a more condensed view on the security issues raised by an IDS. The work by Dain and ....
T. Fawcett and F. Provost. Adaptive Fraud Detection. 1997.
....data, especially in biomedical engineering, telecommunications, geospatial exploration, and climate and Earth ecosystem modeling. BY JIAWEI HAN, RUSS B. ALTMAN, VIPIN KUMAR, HEIKKI MANNILA, AND DARYL PREGIBON TERRY MIURA billing; it enables data mining applications in toll-fraud detection [1] and consumer marketing [2]. Perhaps the best known marketing application of data mining, albeit via unconfirmed anecdote, concerns MCI's Friends & Family promotion launched in the domestic U.S. market in 1991. As the anecdote goes, market researchers observed relatively small subgraphs in this ....
Fawcett, T. and Provost, F. Adaptive fraud detection. Data Min. Knowl. Disc. 1 (1997), 291--316, 1997.
....these are more likely to represent cases of fraud. Fraud detection in insurance, banking and telecommunications are major application areas for data mining. Detected outliers can indicate individuals or groups of customers that have behaviour outside the range of what is considered normal [8, 6, 21]. Studies from the field of statistics have typically considered outliers to be residuals or deviations from a regression or density model of the data: An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different ....
T. Fawcett and F. Provost. Adaptive fraud detection. Data Mining and Knowledge Discovery Journal, 1(3):291--316, 1997.
....constructed in a truly one to one manner since these rules are specified by the expert rather than learned from the data and are applicable only to groups of customers. In addition to the developments in the industry, the profiling problem was also studied in the data mining academic community in [30, 31, 6, 2, 24]. In particular, [30, 31] studied this problem within the context of fraud detection in the cellular phone industry. This was done by learning rules pertaining to individual customers from the cellular phone usage data using the rule learning system RL [26]. However, these discovered rules were ....
....since these rules are specified by the expert rather than learned from the data and are applicable only to groups of customers. In addition to the developments in the industry, the profiling problem was also studied in the data mining academic community in [30, 31, 6, 2, 24]. In particular, [30, 31] studied this problem within the context of fraud detection in the cellular phone industry. This was done by learning rules pertaining to individual customers from the cellular phone usage data using the rule learning system RL [26]. However, these discovered rules were used not for the purpose of ....
T. Fawcett and F. Provost. Adaptive fraud detection. Journal of Data Mining and Knowledge Discovery, 1(3):291-316, 1997.
....feature values that vary for different instances. In such a case, although misclassification type is the same, costs can be considerably diverse. Fawcett et al. are among the first ones who incorporated feature dependent costs in their classification problem. In cellular cloning fraud detection [23] used a variable cost matrix based on the fraudulent airtime used. Naturally, this is due to the more cost of prolonged fake calls. Static cost notion is inappropriate in such situations. Since the credit card fraud detection domain is extremely dependent on the dollar amount of each credit card ....
T. Fawcett and F. Provost. Adaptive fraud detection. Journal of Data Mining and Knowledge Discovery, 1(3):291-316, 1997.
....distributed databases of information about transaction behaviors to produce models of probably fraudulent transactions. An orthogonal approach to modeling transactions would be to model user behavior. An application of this method, but in cellular phone fraud detection, has been examined in [Fawcett & Provost, 1997]. The key difficulties in our strategy are: financial companies do not share their data for a number of (competitive and legal) reasons; the databases that companies maintain on transaction behavior are huge and growing rapidly; real time analysis is highly desirable to update models when new ....
.... correctly a fraudulent transaction x_i and FA(C_j, x_i) (FA: False Alarm) returns one when classifier C_j misclassifies a legitimate transaction x_i. [Footnote: In comparing the classifiers, one can replace the TP-FP spread, which defines a certain family of curves in the ROC plot, with a different metric or even with a complete analysis [Provost & Fawcett, 1997; 1998] in the ROC space.] I(x_i, Y) inspects the transaction amount transamt of transaction x_i, and returns one if it is greater than the overhead Y and zero otherwise, while n denotes the number of examples in the data set used in the ....
Fawcett, T., and Provost, F. 1997. Adaptive fraud detection. Data Mining and Knowledge Discovery 1(3):291--316.
....in a truly one to one manner since these rules are specified by the expert rather than learned from the data and are applicable only to groups of customers. In addition to the developments in the industry, the profiling problem was also studied in the data mining academic community in [FP96, FP97, ASY98, AT99, Cha99]. In particular, [FP96, FP97] studied this problem within the context of fraud detection in the cellular phone industry. This was done by learning rules pertaining to individual customers from the cellular phone usage data using the rule learning system RL [CP90]. However, ....
....are specified by the expert rather than learned from the data and are applicable only to groups of customers. In addition to the developments in the industry, the profiling problem was also studied in the data mining academic community in [FP96, FP97, ASY98, AT99, Cha99]. In particular, [FP96, FP97] studied this problem within the context of fraud detection in the cellular phone industry. This was done by learning rules pertaining to individual customers from the cellular phone usage data using the rule learning system RL [CP90]. However, these discovered rules were used not for the purpose ....
T. Fawcett and F. Provost. Adaptive fraud detection. Journal of Data Mining and Knowledge Discovery, 1(3):291--316, 1997.
....of evaded tax is maximized, and minimizing audit costs, i.e. define subjects to be selected for audit in such a way that the resources needed to carry out the audits are minimized. The capability of designing systems for supporting this form of decisions poses technically precise challenges [3, 7]: is there a KDD methodology for audit planning which may be tuned according to these options a desiderata also referred to as h# s p data mining What and how data mining tools may be adopted? How extracted knowledge may be combined with domain knowledge to obtain useful audit models? We ....
Fawcett, T and Provost, F. "Adaptive Fraud Detection", Data Mining and Knowledge Discovery, pp. 291-316, 1997.
....failures in the past 10 seconds) should be used. Traditional feature selection techniques, as discussed in the machine learning literature, can not be directly applied here since prior work typically does not consider sequential correlation of features across record boundaries. Fawcett and Provost (Fawcett and Provost, 1997) presented some very interesting ideas on automatic selection of features for a cellular fraud detector. Their method is very effective in detecting superimposition fraud in which fraudulent activities are conducted using a legitimate account. Many intrusions can not be easily categorized as ....
....or session records into a single attribute string is not appropriate since each attribute carries distinct and yet important information. Section 5.2.2 demonstrates that it is important to be able to analyze the data using different combinations of the attributes. In DC-1 (Detector Constructor) (Fawcett and Provost, 1997), a rule learning step is first used to obtain each customer's fraudulent patterns, and rule selection is then used to obtain a set of general fraudulent patterns for the entire population. A monitor construction step is used to obtain the sensitivity measures of different users to these general ....
Fawcett, T. and F. Provost: 1997, `Adaptive Fraud Detection'. Data Mining and Knowledge Discovery 1, 291--316.
....account the misclassification costs of the class attribute instances. That is they assume that all the instances of a class attribute have the same cost, an assumption that is rarely valid in real world applications. A widely known example of these cost sensitive applications is fraud detection [5]. A method to address this problem is the use of cost sensitive classification. Indeed if we specify the costs attached to each class value in a cost matrix, a cost sensitive classifier can use these value when classifying a new example. This change in the classification task is likely to affect ....
T.Fawcett and F.Provost. (1997) Adaptive fraud detection. Data mining and Knowledge Discovery, 1(3). http://www.croftj.net/~fawcett/DMKD-97.ps.gz
....to useless results (for example, [1] and [8]). Has statistics had any success in analyzing massive data? The rest of this paper considers that question in the context of fraud detection, a topic that has previously been discussed in the data mining knowledge discovery literature (e.g., [2] and [13]). More information on our approach and other approaches to fraud detection is given in [3]. 3 Statistical Fraud Detection 3.1 Describing Account Variability Our goal is a system for detecting telecommunications fraud as it happens, whether it is calling card fraud, in which a stolen credit ....
Fawcett, T. and Provost, F. (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery, 1:291--316.
....for card cloning. Although a velocity trap is a powerful method of detecting card cloning, it is ineffective against other types of fraud. Therefore there is great interest in detection systems which detect fraud based on an analysis of behavioral patterns (Barson et al. 1996, Burge et al. 1997, Fawcett and Provost, 1997, Taniguchi et al. 1998). In an absolute analysis, a user is classified as a fraudster based on features derived from daily statistics summarizing the call pattern such as the average number of calls. In a differential analysis, the detection is based on measures describing the changes in those ....
Fawcett, T. and Provost, F. (1997). Adaptive Fraud Detection. Journal of Data Mining and Knowledge Discovery, Vol. 1, No. 3, pp. 1-28.
....is another two dimensional map. Example 3: Fraud detection in transactions. Many applications of machine learning involve analyzing time series of transactions (e.g. telephone calls, insurance claims, TCP connection attempts) to identify changes in behavior associated with fraudulent activity (Fawcett & Provost, 1997). This can be formalized as a problem of mapping an input sequence of transactions to an output sequence of alarms. Example 4: Finding all volcanoes on Venus (Burl, Asker, Smyth, Fayyad, Perona, Crumpler, & Aubele, 1998). Many visual applications involve scanning images to identify objects of ....
Fawcett, T., & Provost, F. (1997). Adaptive fraud detection. Knowledge Discovery and Data Mining, 1, 291--316.
....Instead, efforts tend to be concentrated on new learning algorithms or feature selection methods to try to win improvements in performance. However, recent publications concerning machine learning in other domains give reason to believe that feature engineering techniques can improve performance [FAP97, KHM98]. Furthermore, the recent successful application of symbolic machine learning techniques to text classification (notably with the RIPPER decision rule learning system [COS96, COH96a]) opens up new avenues for feature engineering techniques to improve performance. 1 The terms text classification ....
Tom Fawcett and Foster Provost. Adaptive Fraud Detection. In Data Mining and Knowledge Discovery 1-28. 1997.
....massive data. But, has statistics had any success in analyzing massive data? The rest of this paper considers that question in the context of fraud detection, a topic that has previously been discussed in the data mining and knowledge discovery literature (e.g. [Burge and Shawe-Taylor, 1997] and [Fawcett and Provost, 1997]). More information on fraud detection from the perspective in this paper is given in [Cahill et al., 2000]. 3 Statistical Fraud Detection There are many kinds of telecommunications fraud. In calling card fraud, a stolen credit card is used to place a call. In wireless fraud, a cellular phone may ....
Fawcett, T. and Provost, F. (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery, 1:291--316.
....2.2.1 ERROR COST CONDITIONAL ON INDIVIDUAL CASE The cost of a classification error may depend on the nature of the particular case. For example, in detection of fraud, the cost of missing a particular case of fraud will depend on the amount of money involved in that particular case (Fawcett and Provost, 1996, 1997). Similarly, the cost of a certain kind of mistaken medical diagnosis may be conditional on the particular patient who is misdiagnosed. For example, the misdiagnosis may be more costly in elderly patients. 2 It may be possible to represent this situation with a constant error cost by ....
....The sensor readings must be classified as either alarm or no alarm. The cost of the classification depends on whether the classification is correct and also on the timeliness of the classification. The alarm is not useful unless there is sufficient time for an adequate response to the alarm (Fawcett and Provost, 1996, 1997, 1999). Again, it may be possible to represent this situation with a constant error cost by distinguishing sub classes. Instead of two classes, alarm and no alarm, there could be alarm with lots of time, alarm with a little time, alarm with no time, and no alarm. Again, this is an ....
Fawcett, T., and Provost, F.J. (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery, 1 (3).
....expect thresholds to apply and restrict their activity on any one account to levels that cannot be detected by most thresholding systems. Thus, there are both too many false alarms for legitimate calling and too many missed cases of fraud. Thresholding can be improved, though. For example, Fawcett and Provost (1997) develop an innovative method for choosing account-specific thresholds rather than universal thresholds that apply to all accounts or all accounts in a segment. Their procedure takes daily traffic summaries for a set of accounts that experienced at least 30 days of fraud-free traffic before being hit ....
....and access to account information. The account-specific thresholds can be updated periodically by re-fitting trees and sequentially selecting the account summaries to threshold. Re-training requires more resources than running the detection algorithm does, but re-training may be needed infrequently. Fawcett and Provost (1997) describe an application of their methods to a set of fewer than 1,000 accounts, each of which had at least 30 days of fraud-free activity followed by a period of wireless cloning fraud. Account-specific thresholding has limitations, though. Perhaps most importantly, a procedure that requires a ....
Fawcett, T. and Provost, F. (1997). Adaptive fraud detection, Data Mining and Knowledge Discovery 1: 291-316.
....expect thresholds to apply and restrict their activity on any one account to levels that cannot be detected by most thresholding systems. Thus, there are both too many false alarms for legitimate calling and too many missed cases of fraud. Thresholding can be improved, though. For example, Fawcett and Provost, 1997 develop an innovative method for choosing account specific thresholds rather than universal thresholds that apply to all accounts or all accounts in a segment. Their procedure takes daily traffic summaries for a set of accounts that experienced at least 30 days of fraud free traffic before being ....
....access to account information. The account specific thresholds can be updated periodically by re-fitting trees and sequentially selecting the account summaries to threshold. Re-training requires more resources than running the detection algorithm does, but re-training may be needed infrequently. Fawcett and Provost, 1997 describe an application of their methods to a set of fewer than 1,000 accounts, each of which had at least 30 days of fraud free activity followed by a period of wireless cloning fraud. Account specific thresholding has limitations, though. Perhaps most importantly, a procedure that requires ....
Fawcett, T. and Provost, F. (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery, 1:291--316.
....rules and subsequently proceed to choose the classifier that results in the smallest error. This strategy is brittle as it fails to take into account the facts that, firstly, the class of interest is typically under represented (e.g. detecting tumours, engine faults, credit card fraud, etc.) (Fawcett and Provost 1997, Kadirkamanathan and Patel 1998, Tarassenko, Hayton, Cerneaz and Brady 1995). Secondly, the error costs associated with, say, informing a group of healthy patients that they are ill is not intuitively the same as telling a group of ill patients that they are healthy. Furthermore, due to economic ....
....that they are ill is not intuitively the same as telling a group of ill patients that they are healthy. Furthermore, due to economic pressures or other constraints, we might wish to choose the percentage of these error costs that we are prepared to accept. This operating cost can vary over time (Fawcett and Provost 1997, Kubat, Holte and Matwin 1998, Melvin 1996). Here, we will consider strategies for combining classifiers that take into account these different costs and provide an optimal solution for the entire range of operating conditions. We now turn our attention to the problem of selecting samples from one ....
Fawcett, T. and Provost, F. (1997). Adaptive fraud detection, Data Mining and Knowledge Discovery 3(1): 291-316.
....to select the best classifier if the costs and class frequencies are known ahead of time. But often they are not fixed until the time of application, making ROC analysis important. The relationship between decision theory and ROC analysis is discussed in Lusted's book [7]. In Fawcett and Provost's [4, 5] work on cellular fraud detection, they noted that the cost and amount of fraud varies over time and location. This was one motivation for their research into ROC analysis. Our own experience with imbalanced classes [6] dealt with the detection of oil spills and the number of non-spills far ....
T. Fawcett and F. Provost. Adaptive fraud detection. Journal of Data Mining and Knowledge Discovery, 1:195--215, 1997.
....standard learning methods which assume a balanced distribution of the classes. For example, the problem occurs and hinders classification in applications as diverse as the detection of oil spills in satellite radar images (Kubat, Holte, & Matwin 1998), the detection of fraudulent telephone calls (Fawcett & Provost 1997), and in-flight helicopter gearbox fault monitoring (Japkowicz, Myers, & Gluck 1995). To this point, there have only been a few attempts at dealing with the class imbalance problem (e.g. Paz I would like to thank Danny Silver and Afzal Upal for their very helpful comments on a draft of this ....
....and Afzal Upal for their very helpful comments on a draft of this paper. 1 In this paper, we only consider the case of concept-learning. However, the discussion also applies to multi class problems. zani et al. 1994), (Japkowicz, Myers, & Gluck 1995), (Ling & Li 1998), (Kubat & Matwin 1997), (Fawcett & Provost 1997), (Kubat, Holte, & Matwin 1998), and these attempts were mostly conducted in isolation. In particular, there has not been, to date, any systematic strive to link specific types of imbalances to the degree of inadequacy of standard classifiers. Furthermore, no comparison of the various methods ....
Fawcett, T. E., and Provost, F. 1997. Adaptive fraud detection. Data Mining and Knowledge Discovery 1(3):291--316.
....planning, e.g. the tradeoff between maximizing audit benefits vs. minimizing audit costs. 1 Introduction Fraud detection is becoming a central application area for knowledge discovery in databases, as it poses challenging technical and methodological problems, many of which are still open [1, 2]. A major task in fraud detection is that of constructing models, or profiles, of fraudulent behavior, which may serve in decision support systems for: preventing frauds (a priori fraud detection) or planning audit strategies (a posteriori fraud detection). The first case is typical of ....
Fawcett, T, Provost, F., "Adaptive Fraud Detection", Data Mining and Knowledge Discovery, Vol. 1, No. 1, pp. 291-316, (1997).
....local outliers proposed in this paper. Given the importance of the area, fraud detection has received more attention than the general area of outlier detection. Depending on the specifics of the application domains, elaborate fraud models and fraud detection algorithms have been developed (e.g. [8], [6]). In contrast to fraud detection, the kinds of outlier detection work discussed so far are more exploratory in nature. Outlier detection may indeed lead to the construction of fraud models. 3 Problems of Current (non local) Approaches As we have seen in section 2, most of the existing work ....
Fawcett T., Provost F.: "Adaptive Fraud Detection", Data Mining and Knowledge Discovery Journal, Kluwer Academic Publishers, Vol. 1, No. 3, pp. 291-316.
....rules and subsequently proceed to choose the classifier that results in the smallest error. This strategy is brittle as it fails to take into account the facts that, firstly, the class of interest is typically under represented (e.g. detecting tumours, engine faults, credit card fraud, etc.) (Fawcett and Provost 1997, Kadirkamanathan and Patel 1998, Tarassenko, Hayton, Cerneaz and Brady 1995). Secondly, the error costs associated with, say, informing a group of healthy patients that they are ill is not intuitively the same as telling a group of ill patients that they are healthy. Furthermore, due to economic ....
....that they are ill is not intuitively the same as telling a group of ill patients that they are healthy. Furthermore, due to economic pressures or other constraints, we might wish to choose the percentage of these error costs that we are prepared to accept. This operating cost can vary over time (Fawcett and Provost 1997, Kubat, Holte and Matwin 1998, Melvin 1996). Here, we will consider strategies for combining classifiers that take into account these different costs and provide an optimal solution for the entire range of operating conditions. We now turn our attention to the problem of selecting samples from ....
Fawcett, T. and Provost, F. (1997). Adaptive fraud detection, Data Mining and Knowledge Discovery 3(1): 291--316.
....error rate. While academic research has continued to improve the misclassification error rate, applications in business, medicine, and science have shown that real problems require more subtle measures of performance (see, for example, Pazzani, Merz, Murphy, Ali, Hume, & Brunk, 1994; Fawcett & Provost, 1997; Kubat, Holte, & Matwin, 1997; Provost, Fawcett, & Kohavi, 1998; Bradford, Kunz, Kohavi, Brunk, & Brodley, 1998). In particular, one important problem is that in these applications, different kinds of errors have different costs. For example, in medical applications, the cost of a false positive ....
Fawcett, T., & Provost, F. (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery, 1 (3).
....This suggests that, in this case, the BAYES classifier tends to predict the majority class. Figure 2 plots FN versus FP in the three domains and each curve, composed of data points from different distributions, represents a learning algorithm. These curves are the same as the ROC curves (Provost & Fawcett 1997) since TP is 1 - FN (we chose to use FN because TP is not used for our discussion in this paper). The plot allows us to compare the FN and FP performance of different algorithms. The ideal classifier would appear at the lower left corner (0,0). In the fraud domain we observe that BAYES is ....
....P(N) × Cost(FP) × FP (1), where P(P) and P(N) are the probabilities of positives and negatives, Cost(FN) and Cost(FP) are the respective costs of a false negative and a false positive, FN and FP are the respective FN and FP rates. This function was also defined by Provost and Fawcett (1997). We further define the cost ratio as: CostRatio = Cost(FN) / Cost(FP) (2). Three cases can be derived from Equation 1: 1. Error rate: ErrorRate = P(P) × FN + P(N) × FP (3). From Equations 1 and 3: P(P) × Cost(FN) P(P) P(N) × Cost(FP) P(N) Cost(FN) Cost(FP) ....
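The quantities named in the context above (class priors, per-error costs, and FN/FP rates) can be combined as in the following minimal sketch. The left-hand term of Equation 1 is cut off in the quoted snippet, so the form P(P) × Cost(FN) × FN used here is an assumption made by symmetry with the visible term; the function and parameter names are hypothetical and are not taken from the cited paper.

```python
def average_cost(p_pos, cost_fn, fn_rate, cost_fp, fp_rate):
    """Expected cost per example.

    ASSUMPTION: the truncated first term of Equation 1 is taken to be
    P(P) * Cost(FN) * FN, mirroring the visible P(N) * Cost(FP) * FP term.
    """
    p_neg = 1.0 - p_pos  # two-class problem: P(N) = 1 - P(P)
    return p_pos * cost_fn * fn_rate + p_neg * cost_fp * fp_rate


def cost_ratio(cost_fn, cost_fp):
    """Equation 2: cost of a false negative relative to a false positive."""
    return cost_fn / cost_fp


def error_rate(p_pos, fn_rate, fp_rate):
    """Equation 3: prior-weighted sum of the FN and FP rates."""
    return p_pos * fn_rate + (1.0 - p_pos) * fp_rate


if __name__ == "__main__":
    # With equal unit costs the average cost reduces to the error rate.
    print(average_cost(0.2, 1.0, 0.5, 1.0, 0.1))
    print(error_rate(0.2, 0.5, 0.1))
    print(cost_ratio(10.0, 2.0))
```

Under this assumed form, setting both costs to 1 makes the average cost coincide with the error rate, which matches the first of the three cases the snippet lists.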
Fawcett, T., and Provost, F. 1997. Adaptive fraud detection.
....to measure the improvements in accuracy produced by various algorithms. While academic research has continued to improve the misclassification error rate, applications in business, medicine, and science have shown that real problems require more subtle measures of performance (see, for example, (Fawcett & Provost, 1997; Kubat, Holte, & Matwin, 1997; Provost, Fawcett, & Kohavi, 1998)). In particular, one important problem is that in these applications, different kinds of errors have different costs. For example, in medical applications, the cost of a false positive diagnosis is usually the cost of putting the ....
Fawcett, T., & Provost, F. (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery, 1 (3).
....P(N) × Cost(FP) × FP (1), where P(P) and P(N) are the probabilities of positives and negatives, Cost(FN) and Cost(FP) are the respective costs of a false negative and a false positive, FN and FP are the respective FN and FP rates. This function was also defined by Provost and Fawcett (1997). We further define the cost ratio as: CostRatio = Cost(FN) / Cost(FP) (2). Three cases can be derived from Equation 1: 1. Error rate: ErrorRate = P(P) × FN + P(N) × FP (3). [Figure: Credit Card Fraud (C4.5); x-axis: Distribution of minority class; curves: FN, FP] ....
....(or accuracy) is commonly used in evaluating learning algorithms; cost sensitive learning has not been well investigated. In a bibliography collected by Turney (1998) on cost sensitive learning, 34 articles were published between 1974 and 1997, an average of fewer than 1.5 articles per year. Fawcett and Provost (1997) considered non-uniform cost per error in their cellular phone fraud detection task and exhaustively searched (with a fixed increment) for the Linear Threshold Unit's threshold that minimizes the total cost. Without modifying the learning algorithms, our approach handles non-uniform cost per error ....
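The exhaustive, fixed-increment threshold search mentioned in the context above can be illustrated with a small sketch. This is not the cited authors' implementation: it assumes the detector's output scores are already available, that each missed fraud case carries its own cost, and that every false alarm costs a fixed amount. All names and the sample data are hypothetical.

```python
def best_threshold(scores, labels, fn_costs, fp_cost, lo=0.0, hi=1.0, step=0.01):
    """Scan thresholds lo, lo + step, ..., hi and return (threshold, total_cost)
    for the first threshold with the lowest total cost.

    scores   -- detector outputs; an alarm is raised when score >= threshold
    labels   -- 1 for fraud, 0 for legitimate
    fn_costs -- per-example cost of missing that example (ignored for label 0)
    fp_cost  -- fixed cost of one false alarm (an assumption of this sketch)
    """
    n_steps = int(round((hi - lo) / step))
    best_t, best_cost = None, float("inf")
    for k in range(n_steps + 1):
        t = lo + k * step
        cost = 0.0
        for score, label, miss_cost in zip(scores, labels, fn_costs):
            alarm = score >= t
            if label == 1 and not alarm:
                cost += miss_cost   # missed fraud: case-specific cost
            elif label == 0 and alarm:
                cost += fp_cost     # false alarm: fixed cost
        if cost < best_cost:        # strict '<' keeps the first minimum
            best_t, best_cost = t, cost
    return best_t, best_cost


if __name__ == "__main__":
    scores = [0.95, 0.85, 0.35, 0.15]
    labels = [1, 0, 1, 0]
    fn_costs = [50.0, 0.0, 30.0, 0.0]
    print(best_threshold(scores, labels, fn_costs, fp_cost=10.0, step=0.1))
```

Scanning a grid keeps the search independent of the learning algorithm, at the price of a resolution limited by the chosen increment.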
Fawcett, T., and Provost, F. 1997. Adaptive fraud detection. Data Mining and Knowledge Discovery 1:291--316.
....between skewed class distribution and unequal error costs occurs in many computer vision applications, in which a vision system generates thousands of candidates but only a handful correspond to objects of interest. It also holds in many other applications of machine learning, such as fraud detection (Fawcett & Provost 1997), discourse analysis (Soderland & Lehnert 1994), and telecommunications risk management (Ezawa, Singh, & Norton 1996). These issues raise two challenges. First, they suggest the need for modified learning algorithms that can achieve high accuracy on the minority class. Second, they require an ....
....in terms of false positives and false negatives. Given information about the relative costs of errors, say from conversations with domain experts or from a domain analysis, we could then compute a weighted accuracy for each algorithm that takes cost into account (e.g. Pazzani et al. 1994; Fawcett & Provost 1997). However, in this case, we had no access to image analysts or enough information about the results of their interpretations to determine the actual costs for the domain. In such situations, rather than aiming for a single performance measure, as typically done in machine learning ex Rooftop ....
Fawcett, T., and Provost, F. 1997. Adaptive fraud detection. Data Mining and Knowledge Discovery 1:291--316.
....to measure the improvements in accuracy produced by various algorithms. While academic research has continued to improve the misclassification error rate, applications in business, medicine, and science have shown that real problems require more subtle measures of performance (see, for example, (Fawcett & Provost, 1997; Kubat, Holte, & Matwin, 1997; Provost, Fawcett, & Kohavi, 1998)). In particular, one important problem is that in these applications, different kinds of errors have different costs. For example, in medical applications, the cost of a false positive diagnosis is usually the cost of putting the ....
Fawcett, T., & Provost, F. (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery, 1 (3).
Fawcett, T., & Provost, F. (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery, 3, 291--316.
Fawcett, T., & Provost, F. (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery, 1:3, 291--316.
.... are known precisely, and often are subject to change (Zahavi & Levin, 1997; Friedman & Wyatt, 1997; Klinkenberg & Thorsten, 2000). For example, in fraud detection we cannot ignore misclassification costs or the skewed class distribution, nor can we assume that our estimates are precise or static (Fawcett & Provost, 1997). We need a method for the management, comparison, and application of multiple classifiers that is robust in imprecise and changing environments. We describe the ROC convex hull (ROCCH) method, ....
....of imprecision that are common in real world environments. Specifically, costs and benefits usually are not known precisely, and target (prior) class distributions often are known only approximately as well. This observation has been made by many authors (Bradley, 1997; Catlett, 1995; Provost & Fawcett, 1997), and is in fact the concern of a large subfield of decision analysis (Weinstein & Fineberg, 1980). Imprecision also arises because the environment may change between the time the system is conceived and the time it is used, and even as it is used. For example, levels of fraud and levels of ....
[Article contains additional citation context not shown here]
Fawcett, T., & Provost, F. (1997). Adaptive Fraud Detection. Data Mining and Knowledge Discovery, 1 (3), 291-316.
.... Aronis, 1996). Human-computer collaborative discovery projects can use rule space search as a basic tool (Lee, Buchanan, & Aronis, 1998). Rule space search has even been used to automate other parts of the knowledge discovery process, for example for feature construction from complex data (Fawcett & Provost, 1997). These are just a few examples, taken from work in which we have been involved. They begin to illustrate the algorithm's flexibility. The notion of rule learning as search was discussed explicitly early on by Simon and Lea (Simon & Lea, 1973). Mitchell (Mitchell, 1982) provides a detailed overview ....
....confidence and complexity bounds to define interestingness. Weiss et al. (Weiss, Galen, & Tadepalli, 1990) describe a gat rule learning search program (PVM) and several pruning heuristics for maximizing predictive value. The RL programs (Clearwater & Provost, 1990; Provost & Buchanan, 1995; Fawcett & Provost, 1997) perform gat rule space search, and have been used with a variety of interestingness and pruning criteria, such as (defined below) various forms of confidence, support, complexity, w beam, and domain knowledge constraints. It is important to note that the basic rule space search applies whether or ....
[Article contains additional citation context not shown here]
Fawcett, T., & Provost, F. (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery, 1 (3), 291-316.
.... and Feigenbaum 1978). Examples of MetaDENDRAL-style rule learning include the Brute programs (Riddle, Segal, and Etzioni 1994; Segal and Etzioni 1994a), PVM (Weiss, Galen, and Tadepalli 1990), ITRULE (Smyth and Goodman 1992), the RL programs (Clearwater and Provost 1990; Provost and Buchanan 1995; Fawcett and Provost 1997), SE trees (Rymon 1993), and even Schlimmer's determination learning algorithm (Schlimmer 1993). These programs view rule learning as an explicit search of the rule space rooted at the rule with no conditions in the antecedent, with rules becoming more specific (by adding conditions) as they get ....
Fawcett, T. and F. Provost (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery 1 (3), 291--316.
....41 positive examples we have 896 negative examples; the majority class thus comprises almost 96% of the data. Highly imbalanced training sets occur in applications where the classifier is to detect a rare but important event, such as fraudulent telephone calls (Fawcett & Provost, 1997), unreliable telecommunications customers (Ezawa, Singh, & Norton, 1996), failures or delays in a manufacturing process (Riddle, Segal, & Etzioni, 1994), rare diagnoses such as the thyroid diseases in the UCI repository (Murphy & Aha, 1994), or carcinogenicity of chemical compounds (Lee, Buchanan ....
....set of images, and it will be applied on images that were not part of this set. This fact should be taken into account in the evaluation of the system. This problem has been mentioned by several other authors, including Burl et al. (this issue), Cherkauer and Shavlik (1994), Ezawa et al. (1996), Fawcett and Provost (1997), Kubat, Pfurtscheller and Flotzinger (1994), and Pfurtscheller, Flotzinger and Kalcher (1992). For instance, in the SKICAT system (Fayyad, Weir, & Djorgovski, 1993), the batches were plates, from which image regions were selected. When the system trained on images from one plate was applied to ....
[Article contains additional citation context not shown here]
Fawcett, T., & Provost, F. (1997). Adaptive Fraud Detection. Data Mining and Knowledge Discovery, 1(3), 291--316.
....flesh out these areas of weakness and others pointed out by the collected papers. It is important to emphasize that these lessons are very general, and are becoming more and more apparent as machine learning technologies are being applied more widely. In our own applied work, in fraud detection (Fawcett and Provost 1997), telecommunications network diagnosis (Danyluk and Provost 1993), and scientific discovery (Provost and Aronis 1996; Aronis, Provost and Buchanan 1996), the same research needs are evident. The authors point to many other published applications papers for further support. In order to obtain ....
Fawcett, T. and Provost, F. J. (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery, 1.
....example, the optimal cost-benefit tradeoffs and the target class priors seldom are known precisely, and often are subject to change. For example, in fraud detection we cannot ignore either the cost or class distribution, nor can we assume that our distribution specifications are precise or static (Fawcett & Provost, 1997). We need a method for the management, comparison, and application of multiple classifiers that is robust to imprecise and changing environments. We describe the ROC convex hull (ROCCH) method, which combines techniques from ROC analysis, decision analysis and computational geometry. The ROC ....
....accounts among a large population of customers, screening medical tests for rare diseases, and checking an assembly line for defective parts. Because the unusual or interesting class is rare among the general population, the class distribution is very skewed (Ezawa, Singh, & Norton, 1996; Fawcett & Provost, 1997; Kubat, Holte, & Matwin, 1998; Saitta & Neri, 1998). As the class distribution becomes more skewed, evaluation based on accuracy breaks down. Consider a domain where the classes appear in a 999:1 ratio. A simple rule, always classify as the maximum likelihood class, gives a 99.9% accuracy. This ....
Fawcett, T., & Provost, F. (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery, 1 (3), 291--316.
....than another. This fact is well documented, primarily in other fields (statistics, medical diagnosis, pattern recognition and decision theory). As an example, consider machine learning for fraud detection, where the cost of missing a case of fraud is quite different from the cost of a false alarm (Fawcett and Provost, 1997). Accuracy maximization also assumes that the class distribution (class priors) is known for the target environment. Unfortunately, for our benchmark data sets, we often do not know whether the existing distribution is the natural distribution, or whether it has been stratified. The iris data set ....
....First, the transformation is valid only for two-class problems. Whether it can be approximated effectively for multiclass problems is an open question. Second, we do not know appropriate costs for these data sets and, as noted by many applied researchers (Bradley, 1997; Catlett, 1995; Provost and Fawcett, 1997), assigning these costs precisely is virtually impossible. Third, as described above, generally we do not know whether the class distribution in a natural data set is the true target class distribution. Because of these uncertainties we cannot claim to be able to transform these ....
[Article contains additional citation context not shown here]
T. Fawcett and F. Provost. (1997) Adaptive fraud detection. Data Mining and Knowledge Discovery, 1(3). Available: http://www.croftj.net/~fawcett/DMKD-97.ps.gz.
No context found.
T. Fawcett & Provost, F. (1997), Adaptive fraud detection, Data Mining and Knowledge Discovery, 1(3):291--316.
No context found.
Fawcett, T. & Provost, F. (1997). Adaptive Fraud Detection. Data Mining and Knowledge Discovery 1(3): 291-316.
No context found.
Fawcett, T., and Provost, F.J. 1997. Adaptive Fraud Detection. Data Mining and Knowledge Discovery 1(3): 291-316.
No context found.
[Fawcett and Provost, 1997] Fawcett, T., Provost, F., Adaptive Fraud Detection.
No context found.
Fawcett T. and Provost F. (1997), Adaptive Fraud Detection. Data Mining and Knowledge Discovery, 1(3):291--316.
No context found.
Fawcett T. and Provost F. (1997), Adaptive Fraud Detection. Data Mining and Knowledge Discovery, 1(3):291--316.
No context found.
Tom Fawcett and Foster Provost. Adaptive fraud detection. Journal of Data Mining and Knowledge Discovery, 1(3):291--316, 1997.
No context found.
T. Fawcett, F. Provost: "Adaptive Fraud Detection", Data Mining and Knowledge Discovery Journal, Kluwer Academic Publishers, Vol. 1, No. 3, 1997, pp 291 - 316.
No context found.
Fawcett, T. E. and Provost, F., 1997. "Adaptive Fraud Detection", Data Mining and Knowledge Discovery, 1(3):291-316.