On-line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms
- Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000
Cited by 41 (4 self)
Outlier detection is a fundamental issue in data mining, specifically in fraud detection, network intrusion detection, network monitoring, etc. SmartSifter, which we abbreviate as SS, is an outlier detection engine addressing this problem from the viewpoint of statistical learning theory. This paper provides a theoretical basis for SS and empirically demonstrates its effectiveness. SS detects outliers in an on-line process through the on-line unsupervised learning of a probabilistic model (using a finite mixture model) of the information source. Each time a datum is input, SS employs an on-line discounting learning algorithm to learn the probabilistic model. A score is given to the datum based on the learned model, with a high score indicating a high possibility of being a statistical outlier. The novel features of SS are: 1) it is adaptive to non-stationary sources of data; 2) a score has a clear statistical/information-theoretic meaning; 3) it is computationally inexpensive; and 4) it can handle both categorical and continuous variables. An experimental application to network intrusion detection shows that SS was able to identify data with high scores that corresponded to attacks, with low computational costs. Further experimental application has identified a number of meaningful rare cases in actual health insurance pathology data from Australia's Health Insurance Commission.
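The score-then-update loop described in this abstract can be illustrated with a heavily simplified sketch. It is not the paper's algorithm: it assumes a single univariate Gaussian instead of a finite mixture, uses plain exponential forgetting as the discounting rule, and takes the logarithmic loss as the score. The names `DiscountingGaussian`, `score_stream`, and the `discount` parameter are hypothetical.

```python
import math


class DiscountingGaussian:
    """Univariate Gaussian tracked with exponential forgetting.

    Assumption: one Gaussian stands in for the finite mixture model,
    and `discount` plays the role of the discounting rate.
    """

    def __init__(self, discount=0.05, mean=0.0, var=1.0, min_var=1e-6):
        self.discount = discount
        self.mean = mean
        self.second_moment = var + mean * mean
        self.min_var = min_var  # floor that keeps the density well defined

    @property
    def var(self):
        return max(self.second_moment - self.mean ** 2, self.min_var)

    def score(self, x):
        """Logarithmic loss -log p(x) under the model learned so far."""
        v = self.var
        return 0.5 * math.log(2 * math.pi * v) + (x - self.mean) ** 2 / (2 * v)

    def update(self, x):
        """Blend the new datum in; older data decay geometrically."""
        r = self.discount
        self.mean = (1 - r) * self.mean + r * x
        self.second_moment = (1 - r) * self.second_moment + r * x * x


def score_stream(values, discount=0.05):
    """Score each datum before learning from it, one datum at a time."""
    model = DiscountingGaussian(discount=discount)
    scores = []
    for x in values:
        scores.append(model.score(x))
        model.update(x)
    return scores
```

Because each datum is scored before the model learns from it, a value far from the recent data receives a high score, and the forgetting factor lets the model follow a source whose distribution drifts over time.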
ATLaS: A native extension of SQL for data mining
- In SIAM International Conference on Data Mining (SDM
Cited by 8 (3 self)
A lack of power and extensibility in their query languages has seriously limited the generality of DBMSs and hampered their ability to support data mining applications. Thus, there is a pressing need for more general mechanisms for extending DBMSs to efficiently support database-centric data mining applications. To satisfy this need, we propose a new extensibility mechanism for SQL-compliant DBMSs, and demonstrate its power in supporting decision support applications. The key extension is the ability to define new table functions and aggregate functions in SQL, rather than in external procedural languages as Object-Relational (O-R) DBMSs currently do. This simple extension turns SQL into a powerful language for decision-support applications, including ROLAPs, time-series queries, stream-oriented processing, and data mining functions. First, we discuss the use of ATLaS for data mining applications, and then the architecture and techniques used in its realization.
Using Data Mining Techniques In Fiscal Fraud Detection
- In Proc. DaWaK'99, First Int. Conf. on Data Warehousing and Knowledge Discovery, 1999
Cited by 1 (0 self)
Planning adequate audit strategies is a key success factor in "a posteriori" fraud detection, e.g., in the fiscal and insurance domains, where audits are intended to detect tax evasion and fraudulent claims. This paper presents a case study illustrating how classification-based techniques can support the task of planning audit strategies. The proposed approach is sensitive to some conflicting issues of audit planning, e.g., the tradeoff between maximizing audit benefits and minimizing audit costs.
1 Introduction
Fraud detection is becoming a central application area for knowledge discovery in databases, as it poses challenging technical and methodological problems, many of which are still open [1, 2]. A major task in fraud detection is that of constructing models, or profiles, of fraudulent behavior, which may serve in decision support systems for: preventing frauds (a priori fraud detection), or planning audit strategies (a posteriori fraud de...
Risk Analysis applied to Tax Evasion using data mining methodology
Working papers do not necessarily reflect the official opinion of the Agenzia delle Entrate and are binding only on their authors. They may be freely used and reproduced for personal use, study, research, or other non-commercial purposes, provided that the source is cited using the following wording, printed in characters
Nondeterministic, Nonmonotonic Logic Databases
In this paper we consider an extension of Datalog with mechanisms for temporal, nonmonotonic, and nondeterministic reasoning, which we refer to as Datalog++. We show, by means of examples, its flexibility in expressing queries concerning aggregates and data cubes. We also show how iterated fixpoint and stable model semantics can be combined to clarify the semantics of Datalog++ programs and to support their efficient execution. Finally, we provide a more concrete implementation strategy, which serves as the basis for designing optimization techniques tailored to Datalog++.
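As background for the fixpoint semantics mentioned in this abstract, the following sketch shows naive bottom-up fixpoint evaluation for a plain, monotone Datalog program (transitive closure). It does not cover any of the Datalog++ extensions (temporal, nonmonotonic, or nondeterministic constructs); the function name `transitive_closure` and the `edge`/`path` predicates are illustrative assumptions.

```python
def transitive_closure(edges):
    """Naive fixpoint evaluation of the (assumed) program:

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).

    The rules are re-applied until no new fact is derived.
    """
    edges = set(edges)
    path = set(edges)  # facts produced by the first rule
    while True:
        # Apply the recursive rule once to everything known so far.
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_facts = derived - path
        if not new_facts:
            return path  # least fixpoint reached
        path |= new_facts
```

For a stratified program, an iterated fixpoint repeats this kind of computation stratum by stratum, so that negated predicates are fully computed before they are used.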
Splash: Integrated Ad-Hoc Querying of Data and Statistical Models
This paper presents a system called Splash, which integrates statistical modeling and SQL for the purpose of ad hoc querying and analysis. Splash supports a novel, simple, and practical abstraction of statistical modeling as an aggregate function, which in turn provides for natural integration with standard SQL queries and a relational DBMS. In addition, we introduce and implement a novel representatives operator to help explain statistical models using a limited number of representative examples. We present a proof-of-concept implementation of the system, which includes several performance optimizations. An experimental study indicates that our system scales well to large input datasets. Further, to demonstrate the simplicity and usability of the new abstractions, we conducted a case study using Splash to perform a series of exploratory analyses using network log data. Our study indicates that the query-based interface is simpler than a common data mining software package, and for ad hoc analysis, it often requires less programming effort to use.
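The idea of treating model fitting as an aggregate function can be illustrated with SQLite's user-defined aggregates, reached here through Python's standard `sqlite3` module. This is a sketch of the general abstraction only, not of Splash itself: the `LinearFit` class, the `linear_fit` SQL function, and the `log(host, t, bytes)` table are all hypothetical, and the model is an ordinary least-squares line returned as JSON text.

```python
import json
import sqlite3


class LinearFit:
    """Aggregate that fits y = slope * x + intercept by least squares."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def step(self, x, y):
        # Called once per row of the group; NULL inputs are skipped.
        if x is None or y is None:
            return
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def finalize(self):
        # Called once per group; returns the fitted model as JSON text.
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            return None  # not enough information to fit a line
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return json.dumps({"slope": slope, "intercept": intercept})


def fit_per_group(rows):
    """Load (host, t, bytes) rows and fit one model per host with GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("linear_fit", 2, LinearFit)
    conn.execute("CREATE TABLE log (host TEXT, t REAL, bytes REAL)")
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT host, linear_fit(t, bytes) FROM log GROUP BY host ORDER BY host"
    ).fetchall()
    conn.close()
    return {h: (json.loads(m) if m is not None else None) for h, m in result}
```

Since the model is just another aggregate, it composes with `WHERE`, `GROUP BY`, and joins in the same way as `SUM` or `AVG` would.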
A Comparison of Conventional and Online Fraud
- Securing Critical Infrastructures, Grenoble, October 2004
Fraud is a growing problem experienced by most organisations [1] as well as affecting the general public. Impersonation of an individual, using identity information stolen from them, is the fastest
Debt Detection in Social Security by Sequence Classification Using Both Positive and Negative Patterns
, 2004
Debt detection is important for improving payment accuracy in social security. Since debt detection from customer transactional data can generally be modelled as a fraud detection problem, a straightforward solution is to extract features from transaction sequences and build a sequence classifier for debts. The existing sequence classification methods based on sequential patterns consider only positive patterns. However, according to our experience in a large social security application, negative patterns are very useful in accurate debt detection. In this paper, we present a successful case study of debt detection in a large social security application. The central technique is building a sequence classifier using both positive and negative sequential patterns.
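One way to picture features built from positive and negative sequential patterns is the toy sketch below. It rests on assumptions not taken from the paper: a positive pattern is a list of events that must occur in order (gaps allowed), and a negative pattern is a prefix that occurs without a given event ever following it. The event codes and function names are hypothetical.

```python
def contains_in_order(sequence, pattern):
    """True if the items of `pattern` occur in `sequence` in order."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match.
    return all(item in remaining for item in pattern)


def matches_negative(sequence, prefix, absent):
    """True if `prefix` occurs in order and `absent` never follows it."""
    remaining = iter(sequence)
    if not all(item in remaining for item in prefix):
        return False
    # Only the events after the earliest prefix match are left here.
    return absent not in remaining


def pattern_features(sequence, positive, negative):
    """Binary feature vector: positive patterns first, then negative ones."""
    features = [int(contains_in_order(sequence, p)) for p in positive]
    features += [int(matches_negative(sequence, pre, ab)) for pre, ab in negative]
    return features
```

A vector such as `pattern_features(seq, positive, negative)` could then be passed to any standard classifier; how patterns are mined and selected is outside the scope of this sketch.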

