Formal Concept Sampling for Counting and Threshold-Free Local Pattern Mining
Cited by 7 (1 self)
We describe a Metropolis-Hastings algorithm for sampling formal concepts, i.e., closed (item-) sets, according to any desired strictly positive distribution. Important applications are (a) estimating the number of all formal concepts as well as (b) discovering any number of interesting, non-redundant, and representative local patterns. Setting (a) can be used for estimating the runtime of algorithms examining all formal concepts. An application of setting (b) is the construction of data mining systems that do not require any user-specified threshold like minimum frequency or confidence.
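To make the notion of a formal concept (closed itemset) concrete, here is a minimal Python sketch of the closure operator on a toy transaction database, with a brute-force count of all closed sets. It is not the Metropolis-Hastings sampler from the paper; the data and the names `TRANSACTIONS`, `closure`, and `all_closed_sets` are assumptions made for illustration. The brute-force enumeration is exponential in the number of items, which is the cost a sampling-based estimate is meant to avoid.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only).
TRANSACTIONS = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
]
ITEMS = frozenset().union(*TRANSACTIONS)


def support_set(itemset, transactions=TRANSACTIONS):
    """Return the indices of all transactions that contain every item of itemset."""
    return frozenset(i for i, t in enumerate(transactions) if itemset <= t)


def closure(itemset, transactions=TRANSACTIONS, items=ITEMS):
    """Return the largest itemset shared by all transactions containing itemset."""
    covered = [transactions[i] for i in support_set(itemset, transactions)]
    if not covered:
        # No transaction contains the itemset: by convention its closure is all items.
        return frozenset(items)
    return frozenset.intersection(*covered)


def all_closed_sets(transactions=TRANSACTIONS, items=ITEMS):
    """Enumerate closed itemsets by brute force (exponential; toy sizes only)."""
    closed = set()
    for size in range(len(items) + 1):
        for combo in combinations(sorted(items), size):
            closed.add(closure(frozenset(combo), transactions, items))
    return closed
```

An itemset is closed exactly when it equals its own closure. In this toy database every transaction that contains `b` also contains `a`, so `{b}` is not closed, while `{a, b}` is.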
Detecting reviewer bias through web-based association mining
- In 2nd Workshop on Information Credibility on the Web (WICOW 2008) at ACM CIKM’08
, 2008
Cited by 3 (0 self)
Online retailers and content distributors benefit from an active community that shares credible reviews and recommendations. Today, the most popular approach to encouraging credibility in these communities is self-regulation; community members rate reviews according to their accuracy and usefulness, thus helping to weed out reviews that are inaccurate. This self-regulation, while powerful, is limited by its insularity. Community members generally base their assessments on a reviewer’s comments and actions only within the community. This ignores relationships the reviewer has outside the community that may be quite relevant to evaluating the reviewer’s comments; for example, a relationship between an author and reviewer. We present a simple method for mining the Web to detect many such associations. Our method, together with self-regulation, provides for more comprehensive detection of bias in reviews by alerting the user to the potential for an undisclosed relationship between a reviewer and author. We provide preliminary results using book reviews in Amazon.com demonstrating that our approach is a high-precision method for detecting strong relationships between reviewers and authors that may contribute to reviewer bias.
Sherlock Holmes’ Evil Twin: On The Impact of Global Inference for Online Privacy
Cited by 2 (0 self)
User-supplied content—in the form of photos, videos, and text—is a crucial ingredient to many web sites and services today. However, many users who provide content do not realize that their uploads may be leaking personal information in forms hard to intuitively grasp. Correlation of seemingly innocuous information can create inference chains that tell much more about individuals than they are aware of revealing. We contend that adversaries can systematically exploit such relationships by correlating information from different sources in what we term global inference attacks: assembling a comprehensive understanding from individual pieces found at a variety of locations, Sherlock-style. Not only are such attacks already technically viable given the capabilities that today’s multimedia content analysis and correlation technologies readily provide, but we also find business models that provide adversaries with powerful incentives for pursuing them.
IRILD: an Information Retrieval based method for Information Leak Detection
Cited by 1 (0 self)
The traditional approach for detecting information leaks is to generate fingerprints of sensitive data, by partitioning and hashing it, and then to compare these fingerprints against outgoing documents. Unfortunately, this approach incurs a high computation cost, as every part of a document needs to be checked. As a result, it is not applicable to systems with a large number of documents that need to be protected. Additionally, the approach is prone to false positives if the fingerprints are common phrases. In this paper, we propose an improvement to this approach that offers much faster processing with fewer false positives. The core idea of our solution is to eliminate common phrases and non-sensitive phrases from the fingerprinting process. Non-sensitive phrases are identified by looking at available public documents of the organization that we want to protect from information leaks, and common phrases are identified with the help of a search engine. In this way, our solution both accelerates leak detection and increases the accuracy of the result. Experiments were conducted on real-world data to prove the efficiency and effectiveness of the proposed solution.
Keywords: privacy, information leaks, fingerprinting
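The fingerprinting idea in this abstract can be sketched in a few lines of Python. This is an illustrative sketch, not the IRILD implementation: it assumes word 3-grams as the partitioning unit and SHA-256 as the hash, and it replaces the search-engine lookup with a caller-supplied collection of common phrases. All function names are hypothetical.

```python
import hashlib
import re


def shingles(text, n=3):
    """Partition text into overlapping, lowercased word n-grams."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def fingerprint(phrase):
    """Hash one phrase; SHA-256 is an assumption, any stable hash would do."""
    return hashlib.sha256(phrase.encode("utf-8")).hexdigest()


def build_fingerprints(sensitive_text, public_texts=(), common_phrases=(), n=3):
    """Fingerprint sensitive text, skipping non-sensitive and common phrases.

    public_texts stands in for the organization's public documents.
    common_phrases stands in for the search-engine check and must already
    be word n-grams of the same length n.
    """
    excluded = set()
    for doc in public_texts:
        excluded |= shingles(doc, n)
    excluded |= {p.lower() for p in common_phrases}
    return {fingerprint(s) for s in shingles(sensitive_text, n) - excluded}


def leaked_fraction(outgoing_text, fingerprints, n=3):
    """Return the share of sensitive fingerprints found in an outgoing document."""
    if not fingerprints:
        return 0.0
    hits = {fingerprint(s) for s in shingles(outgoing_text, n)} & fingerprints
    return len(hits) / len(fingerprints)
```

Excluding phrases before hashing shrinks the fingerprint set, which is where both the speed-up and the reduction in false positives would come from in a scheme of this kind.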
Understanding Data Leak Prevention
Cited by 1 (0 self)
Data leaks involve the release of sensitive information to an untrusted third party, intentionally or otherwise. Many vendors currently offer data leak prevention products; surprisingly, however, there is very little academic research on this problem. In this paper, we attempt to motivate future work in this area through a review of the field and related research questions. Specifically, we define the data leak prevention problem, describe current approaches, and outline potential research directions in the field. As part of this discussion, we explore the idea that while intrusion detection techniques may be applicable to many aspects of the data leak prevention problem, the problem is distinct enough that it requires its own solutions.
Project Proposal- Detecting Privacy Leakage
The recent emergence of cloud computing has shown enormous potential to affect every aspect of our daily lives. This wave of technological advance, however, raises new privacy concerns that make people hesitate to adopt cloud computing. On the other hand, some new technologies give us a way to preserve privacy. Motivated by this dual role of privacy and the need for privacy-preserving mechanisms, I will investigate the problem of detecting privacy leakage in a cloud computing environment.
2 What needs to be done to solve this problem?
First, I need to define the scope of the problem to investigate by surveying related work and categorizing the types of privacy violation/leakage detection. Once I decide on the scope of the problem, I need to define a model and make assumptions about the model and adversaries. This might require me to study background knowledge of statistical inference in data mining or intrusion detection systems.
3 What has been done?
Krekke [5] identified two types of privacy violation detections and discussed a
Children’s Hospital of Eastern Ontario
We built a system that prevents leaks of personal health information inadvertently disclosed in heterogeneous text data. The system works with free-form texts. We empirically tested the system on files gathered from peer-to-peer file exchange networks. This study presents our text analysis apparatus. We discuss the adaptation of lexical sources used in the medical and scientific domains for the analysis of personal health information.
Adaptation of Language Resources
, 2009
"... The Workshop is endorsed by FLaReNet Project ..."
A First Step towards Privacy Leakage Diagnosis and Protection
Shinsaku Kiyomoto and Toshiaki Tanaka
In this paper, we present a first step toward designing a privacy leakage diagnosis and protection system using two privacy definitions and a new definition, and then evaluate a prototype program. The diagnosis is based on major notions of privacy: k-anonymity and (c, l)-diversity. Furthermore, the diagnosis includes another method that analyzes the sensitivity of each attribute value. The prototype program achieves a computation time of less than 1 ms for the diagnosis and updating of data. Thus, it provides a privacy-leakage level within a feasible computation time.
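As a rough illustration of the k-anonymity part of such a diagnosis, the following Python sketch computes the largest k for which a table is k-anonymous with respect to a chosen set of quasi-identifiers. It is a minimal sketch, not the prototype described in the abstract; the sample records, the column names, and the function `k_anonymity_level` are assumptions, and (c, l)-diversity is not covered.

```python
from collections import Counter

# Hypothetical generalized records, for illustration only.
SAMPLE_RECORDS = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cold"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
]


def k_anonymity_level(records, quasi_identifiers):
    """Return the largest k for which the records are k-anonymous.

    Records are grouped by their quasi-identifier values; the table is
    k-anonymous when every group contains at least k records, so the
    answer is the size of the smallest group.
    """
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())
```

With `("zip", "age")` as quasi-identifiers the sample table has groups of sizes 2 and 3, so it is 2-anonymous. Treating `disease` as a quasi-identifier as well splits those groups further and lowers the level.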
VNS Group of Inst.
Association rule mining (ARM) is widely researched in the field of data mining, and many ARM approaches are well investigated in the literature. The major issue with ARM is that a huge number of frequent patterns cannot directly produce factual knowledge. To find factual knowledge and discover inferences, we propose a novel approach, AFIRM, which follows a two-step procedure: first, discover frequent patterns by applying an ARM algorithm; second, discover inferences by adopting fuzzy c-means clustering. For performance analysis, we apply this approach to a clinical dataset containing patient symptom information and obtain, as hidden knowledge, the diseases that most affected patients over a couple of months or in a session.
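The clustering step relies on fuzzy c-means, in which each point belongs to every cluster to some degree. The Python sketch below shows only the standard membership update, u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)), for one point and fixed cluster centers. It is not the AFIRM procedure; the function name `fcm_memberships` and the default fuzzifier m = 2 are assumptions.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Return the fuzzy c-means membership degrees of point in each cluster.

    point and each center are equal-length sequences of floats;
    m > 1 is the fuzzifier (larger values give softer memberships).
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with one or more centers belongs fully to them.
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]
```

The memberships of a point always sum to 1. A full fuzzy c-means run would alternate this update with recomputing each center as a membership-weighted mean until the values stabilize.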