Integrating Conflicting Data: The Role of Source Dependence
Cited by 21 (5 self)
Many data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, require integrating data from multiple sources. Each of these sources provides a set of values, and different sources can often provide conflicting values. To present quality data to users, it is critical that data integration systems can resolve conflicts and discover true values. Typically, we expect a true value to be provided by more sources than any particular false one, so we can take the value provided by the majority of the sources as the truth. Unfortunately, a false value can be spread through copying, which makes truth discovery extremely tricky. In this paper, we consider how to find true values from conflicting information when there are a large number of sources, among which some may copy from others. We present a novel approach that considers dependence between data sources in truth discovery. Intuitively, if two data sources provide a large number of common values and many of these values are rarely provided by other sources (e.g., particular false values), it is very likely that one copies from the other. We apply Bayesian analysis to decide dependence between sources and design an algorithm that iteratively detects dependence and discovers truth from conflicting information. We also extend our model by considering the accuracy of data sources and the similarity between values. Our experiments on synthetic data as well as real-world data show that our algorithm can significantly improve the accuracy of truth discovery and is scalable when there are a large number of data sources.
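The majority-vote baseline described in this abstract, and the way copying can mislead it, can be illustrated with a minimal Python sketch. This is not the paper's Bayesian dependence-detection algorithm. The source names, the toy data, and the assumption that the copiers are already known are all hypothetical.

```python
from collections import Counter


def majority_vote(claims):
    """Return the most frequently claimed value for each object.

    claims: dict mapping source name -> dict of object -> claimed value.
    """
    votes = {}
    for source_values in claims.values():
        for obj, value in source_values.items():
            votes.setdefault(obj, Counter())[value] += 1
    return {obj: counter.most_common(1)[0][0] for obj, counter in votes.items()}


# Hypothetical toy data: S3 and S4 copy a false value from S2.
claims = {
    "S1": {"capital_of_australia": "Canberra"},
    "S2": {"capital_of_australia": "Sydney"},
    "S3": {"capital_of_australia": "Sydney"},
    "S4": {"capital_of_australia": "Sydney"},
    "S5": {"capital_of_australia": "Canberra"},
}

# Naive voting counts the copied value three times.
naive = majority_vote(claims)

# Assumption: the copiers are known in advance, so their votes are dropped.
known_copiers = {"S3", "S4"}
independent = {s: v for s, v in claims.items() if s not in known_copiers}
corrected = majority_vote(independent)
```

In this toy case the naive vote picks the copied false value, while voting over the independent sources recovers the true one. Deciding which sources are copiers is the hard part that the abstract's approach addresses.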
Combining Forecast Densities from VARs with Uncertain Instabilities
, 2008
Cited by 8 (7 self)
Clark and McCracken (2008) argue that combining real-time point forecasts from VARs of output, prices and interest rates improves point forecast accuracy in the presence of uncertain model instabilities. In this paper, we generalize their approach to consider forecast density combinations and evaluations. Whereas Clark and McCracken (2008) show that the point forecast errors from particular equal-weight pairwise averages are typically comparable to or better than those of benchmark univariate time series models, we show that neither approach produces accurate real-time forecast densities for recent US data. If greater weight is given to models that allow for the shifts in volatilities associated with the Great Moderation, predictive density accuracy improves substantially.
Sailing the Information Ocean with Awareness of Currents: Discovery and Application of Source Dependence
Cited by 7 (5 self)
The Web has enabled the availability of a huge amount of useful information, but has also eased the ability to spread false information and rumors across multiple sources, making it hard to distinguish between what is true and what is not. Recent examples include the premature Steve Jobs obituary, the second bankruptcy of United Airlines, the creation of black holes by the operation of the Large Hadron Collider, etc. Since it is important to permit the expression of dissenting and conflicting opinions, it would be a fallacy to try to ensure that the Web provides only consistent information. However, to help in separating the wheat from the chaff, it is essential to be able to determine dependence between sources. Given the huge number of data sources and the vast volume of conflicting data available on the Web, doing so in a scalable manner is extremely challenging and has not yet been addressed by existing work. In this paper, we present a set of research problems and propose some preliminary solutions on the issues involved in discovering dependence between sources. We also discuss how this knowledge can benefit a variety of technologies, such as data integration and Web 2.0, that help users manage and access the totality of the available information from various sources.
Density forecast combination
- National Institute of Economic and Social Research Discussion Paper No
Cited by 6 (6 self)
In this paper we investigate whether and how far density forecasts can sensibly be combined to produce a “better” pooled density forecast. In so doing we bring together two important but hitherto largely unrelated areas of the forecasting literature in economics, density forecasting and forecast combination. We provide simple Bayesian methods of pooling information across alternative density forecasts. We illustrate the proposed techniques in an application to two widely used published density forecasts for U.K. inflation. We examine whether in practice improved density forecasts for inflation, one year ahead, might have been obtained if one had combined the Bank of England and NIESR density forecasts or “fan charts”.
Logarithmic Pooling of Priors Linked by a Deterministic Simulation Model
- Journal of Computational and Graphical Statistics
, 1999
Cited by 4 (1 self)
We consider Bayesian inference when priors and likelihoods are both available for inputs and outputs of a deterministic simulation model. This problem is fundamentally related to the issue of aggregating (i.e. pooling) expert opinion. We survey alternative strategies for aggregation, then describe computational approaches for implementing pooled inference for simulation models. Our approach (1) numerically transforms all priors to the same space, (2) uses log pooling to combine priors, and (3) then draws standard Bayesian inference. We use importance sampling methods, including an iterative, adaptive approach which is more flexible and has less bias in some instances than a simpler alternative. Our exploratory examples are the first steps toward extension of the approach for highly complex and even noninvertible models. Key Words: Prior Coherization, Adaptive Importance Sampling, Bayesian Statistics, Model Inversion. 1 Introduction Much research of natural processes and systems is bas...
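Step (2) of the approach in this abstract, log pooling, can be sketched on a discrete grid: the pooled density is proportional to the weighted geometric mean of the individual priors. The function name, the example priors, and the weights below are hypothetical. The sketch assumes the priors are already expressed on the same space and are strictly positive, and it does not cover the transformation or importance-sampling steps.

```python
import math


def log_pool(densities, weights):
    """Logarithmic pool of discrete densities defined on a shared grid.

    pooled[k] is proportional to prod_i densities[i][k] ** weights[i].
    Assumes every density value is strictly positive.
    """
    n = len(densities[0])
    unnormalised = [
        math.exp(sum(w * math.log(d[k]) for d, w in zip(densities, weights)))
        for k in range(n)
    ]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical priors from two experts over three candidate values.
prior_a = [0.7, 0.2, 0.1]
prior_b = [0.1, 0.2, 0.7]
pooled = log_pool([prior_a, prior_b], [0.5, 0.5])
```

With equal weights and mirror-image priors, the pooled density is symmetric. Giving one expert weight 1 and the other weight 0 returns that expert's prior unchanged.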
Combining Expert Judgment By Hierarchical Modeling: An Application To Physician Staffing
, 1998
Cited by 3 (0 self)
Expert panels are playing an increasingly important role in U.S. health policy decision making. A fundamental issue in these applications is how to synthesize the judgments of individual experts into a group judgment. In this paper we propose an approach to synthesis based on Bayesian hierarchical models, and apply it to the problem of determining physician staffing at medical centers operated by the U.S. Department of Veterans Affairs (VA). Our starting point is the so-called supra-Bayesian approach to synthesis, whose principal motivation in the present context is to generate an estimate of the uncertainty associated with a panel's evaluation of the number of physicians required under specified conditions. Hierarchical models are particularly natural in this context since variability in the experts' judgments results in part from heterogeneity in their baseline experiences at different VA medical centers. We derive alternative hierarchical Bayes synthesis distributions for the number ...
Description of the PRC CEO Algorithm for TREC-2
- The Second Text REtrieval Conference (TREC-2), volume NIST Special Publication 500-215
, 1993
Cited by 2 (1 self)
This paper describes an application of the Combination of Expert Opinion technique to combine the results of multiple retrieval methods used on the TREC-2 collection. The methods being combined were weighted by their TREC-1 performance. 1. Introduction This paper describes work done on the TREC-2 project at PRC Inc. in collaboration with Professor Edward Fox and his colleagues at Virginia Polytechnic Institute and State University (VPI&SU). The reader should refer to the description of their system included in these working notes for further details on the common processing of the TREC-2 data shared by PRC and VPI&SU (Fox et al. 1993). PRC used its algorithm, the Combination of Expert Opinion (CEO), to combine the results of VPI&SU's runs. VPI&SU used a different combination technique for their final results. Originally the intent was that the CEO algorithm would be integrated with the SMART system used by VPI&SU. Both upper and lower level combination of results would take place, i.e...
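The general idea of weighting retrieval methods by their earlier performance can be illustrated with a plain weighted-sum fusion of document scores. This is only a sketch of that idea, not the CEO algorithm, whose details the abstract does not give. The run names, scores, and weights are hypothetical, and scores are assumed to be normalised to a common range.

```python
def weighted_fusion(runs, weights):
    """Rank documents by a weighted sum of their scores across retrieval runs.

    runs: dict mapping run name -> dict of doc id -> score.
    weights: dict mapping run name -> weight, for example a performance
        figure measured on an earlier collection (an assumption here).
    """
    total_weight = sum(weights[name] for name in runs)
    combined = {}
    for name, scores in runs.items():
        share = weights[name] / total_weight
        for doc_id, score in scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + share * score
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical runs and weights.
runs = {
    "vector": {"d1": 0.9, "d2": 0.4, "d3": 0.1},
    "boolean": {"d1": 0.2, "d2": 0.8, "d3": 0.3},
}
weights = {"vector": 0.3, "boolean": 0.7}
ranking = weighted_fusion(runs, weights)
```

Because the "boolean" run carries more weight here, its preferred document d2 ends up ranked above d1.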
A New Model for Probabilistic Information Retrieval on the Web
- Second SIAM International Conference on Data Mining (SDM 2002) Workshop on Web Analytics
, 2002
Cited by 1 (1 self)
Academic research in information retrieval did not make its way into commercial retrieval products until the last 15 years. Early web search engines also made little use of information retrieval research, in part because of significant differences in the retrieval environment on the Web, such as higher transaction volume and much shorter queries. Recently, however, academic research has taken root in search engines. This paper describes recent developments with a probabilistic retrieval model originating prior to the Web, but with features which could lead to effective retrieval on the Web. Just as graph structure algorithms make use of the graph structure of hyper-linking on the Web, which can be considered a form of relevance judgments, the model of this paper suggests how relevance judgments of web searchers, not just web authors, can be taken into account in ranking. This paper also shows how the combination of expert opinion probabilistic information retrieval model can be made computationally efficient through a new derivation of the mean and standard deviation of the model’s main probability distribution.
Measuring Output Gap Uncertainty
, 2009
Cited by 1 (0 self)
We propose a methodology for producing density forecasts for the output gap in real time using a large number of vector autoregressions in inflation and output gap measures. Density combination utilizes a linear mixture of experts framework to produce potentially non-Gaussian ensemble densities for the unobserved output gap. In our application, we show that data revisions substantially alter our probabilistic assessments of the output gap using a variety of output gap measures derived from univariate detrending filters. The resulting ensemble produces well-calibrated forecast densities for US inflation in real time, in contrast to those from simple univariate autoregressions which ignore the contribution of the output gap. Combining evidence from both linear trends and more flexible univariate detrending filters induces strong multi-modality in the predictive densities for the unobserved output gap. The peaks associated with these two detrending methodologies indicate output gaps of opposite sign for some observations, reflecting the pervasive nature of model uncertainty in our US data.
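A linear mixture of component densities can be multi-modal even when each component is Gaussian, which is the effect this abstract reports. The sketch below shows this under assumed values: two Gaussian components with means of opposite sign and equal weights. The numbers are hypothetical and are not taken from the paper.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def linear_pool(x, components, weights):
    """Linear mixture density: sum_i weights[i] * N(x; mean_i, sd_i)."""
    return sum(w * normal_pdf(x, m, s) for w, (m, s) in zip(weights, components))


# Hypothetical component densities for the output gap (mean, sd), in percent.
components = [(-2.0, 0.5), (1.5, 0.5)]
weights = [0.5, 0.5]

at_negative_peak = linear_pool(-2.0, components, weights)
at_positive_peak = linear_pool(1.5, components, weights)
at_midpoint = linear_pool(-0.25, components, weights)
```

The mixture is far denser near each component mean than at the midpoint between them, so the ensemble density has two peaks of opposite sign.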
Supra-Bayesian Pooling Of Priors Linked By A Deterministic Simulation Model
Deterministic simulation models are used to guide decision-making and enhance understanding of complex systems such as disease transmission, population dynamics, and tree plantation growth. Bayesian inference about parameters in deterministic simulation models can require the pooling of expert opinion. One class of approaches to pooling expert opinion in this context is supra-Bayesian pooling, in which expert opinion is treated as data for an ultimate decision maker. This article details and compares two supra-Bayesian approaches: "event updating" and "parameter updating". The suitability of each approach in the context of deterministic simulation models is assessed based on theoretical properties, performance on examples, and the selection and sensitivity of required hyperparameters. In general, we favor a parameter updating approach because it uses more intuitive hyperparameters and performs sensibly on examples, and because the alternative event updating approach fails to exhibit a desirable property (relative propensity consistency) in all cases. Inference in deterministic simulation models is an increasingly important statistical and practical problem, and supra-Bayesian methods represent one viable option for achieving a sensible pooling of expert opinion.

