Results 1 - 10 of 305
Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions
 INTERNATIONAL JOURNAL OF MATHEMATICAL MODELS AND METHODS IN APPLIED SCIENCES
, 2007
Cited by 93 (0 self)
Distance or similarity measures are essential to solving many pattern recognition problems such as classification, clustering, and retrieval. Various distance/similarity measures that are applicable to comparing two probability density functions (pdfs for short) are reviewed and categorized in both syntactic and semantic relationships. A correlation coefficient and a hierarchical clustering technique are adopted to reveal similarities among numerous distance/similarity measures.
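To illustrate the kind of comparison this abstract describes, the sketch below evaluates a few distance measures on randomly generated discrete pdfs and computes the Pearson correlation between the measures. The four measures chosen, the scaling of the Hellinger distance, the random test data, and all function names are assumptions made for this example and are not taken from the paper. The hierarchical clustering step mentioned in the abstract is omitted.

```python
import math
import random


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def city_block(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def hellinger(p, q):
    # One common scaling (an assumption): sqrt(1/2 * sum (sqrt p - sqrt q)^2).
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def kullback_leibler(p, q):
    # Requires strictly positive entries in both pdfs.
    return sum(a * math.log(a / b) for a, b in zip(p, q))


MEASURES = {
    "euclidean": euclidean,
    "city_block": city_block,
    "hellinger": hellinger,
    "kullback_leibler": kullback_leibler,
}


def random_pdf(rng, bins):
    # Small offset keeps every bin positive so the logarithm is defined.
    weights = [rng.random() + 1e-3 for _ in range(bins)]
    total = sum(weights)
    return [w / total for w in weights]


def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def measure_correlations(pairs=200, bins=8, seed=0):
    """Correlate the values each measure assigns to the same pdf pairs."""
    rng = random.Random(seed)
    samples = [(random_pdf(rng, bins), random_pdf(rng, bins)) for _ in range(pairs)]
    values = {name: [f(p, q) for p, q in samples] for name, f in MEASURES.items()}
    names = list(MEASURES)
    return {(a, b): pearson(values[a], values[b]) for a in names for b in names}


if __name__ == "__main__":
    for (a, b), r in measure_correlations().items():
        if a < b:
            print(f"{a:>16} vs {b:<16} r = {r:.3f}")
```

Measures whose values correlate strongly on the same inputs would be candidates for the same group in a subsequent clustering step.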
Development of quantitative structure-activity relationships and its application in rational drug design
Cited by 57 (1 self)
Abstract: Quantitative structure-activity relationships are mathematical models constructed on the hypothesis that the structure of chemical compounds is related to their biological activity. A linear regression model is often used to estimate and/or predict the nature of the relationship between a measured activity and some measured or calculated descriptors. Linear regression helps to answer three main questions: does the biological activity depend on structural information; if so, is the relationship linear; and if yes, how good is the model at predicting the biological activity of new compound(s). This manuscript presents the steps of linear regression analysis, moving from theoretical knowledge to an example conducted on sets of endocrine disrupting chemicals. Keywords: robust regression; validation; diagnostic; pre
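As a minimal sketch of the linear regression step described above, the following fits a one-descriptor least-squares line and reports the coefficient of determination. The descriptor and activity values are made-up toy numbers, not data from the paper, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, with R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Toy values (assumed for illustration only): one descriptor, one activity.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = fit_line(descriptor, activity)

if __name__ == "__main__":
    print(f"activity = {intercept:.2f} + {slope:.2f} * descriptor, R^2 = {r_squared:.4f}")
```

A high R^2 on the fitting data addresses only the first two questions; predictive quality for new compounds would need validation on compounds held out of the fit.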
Karl Pearson and the Chi-Squared Test
 International Statistical Review
, 1983
Relative Density-Ratio Estimation for Robust Distribution Comparison
Cited by 27 (18 self)
Divergence estimators based on direct approximation of density-ratios without going through separate approximation of numerator and denominator densities have been successfully applied to machine learning tasks that involve distribution comparison such as outlier detection, transfer learning, and two-sample homogeneity test. However, since density-ratio functions often possess high fluctuation, divergence estimation is still a challenging task in practice. In this paper, we propose to use relative divergences for distribution comparison, which involves approximation of relative density-ratios. Since relative density-ratios are always smoother than corresponding ordinary density-ratios, our proposed method is favorable in terms of the nonparametric convergence speed. Furthermore, we show that the proposed divergence estimator has asymptotic variance independent of the model complexity under a parametric setup, implying that the proposed estimator hardly overfits even with complex models. Through experiments, we demonstrate the usefulness of the proposed approach.
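The smoothness claim can be made concrete with a small numerical sketch. It assumes the commonly cited form of the alpha-relative density-ratio, r_alpha(x) = p(x) / (alpha * p(x) + (1 - alpha) * q(x)), which the abstract itself does not state; the two Gaussian densities, the value of `ALPHA`, and the function names are likewise assumptions. The densities here are known in closed form, whereas the paper's setting estimates the ratio from samples.

```python
import math


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def relative_density_ratio(p, q, alpha):
    # Assumed definition: p / (alpha * p + (1 - alpha) * q).
    # For alpha > 0 the denominator is at least alpha * p, so the ratio is at most 1 / alpha.
    return p / (alpha * p + (1.0 - alpha) * q)


ALPHA = 0.5
GRID = [i / 10 for i in range(-40, 41)]


def ratios_on_grid(alpha=ALPHA):
    """Return (x, ordinary ratio, relative ratio) for p = N(0, 1), q = N(0, 0.5^2)."""
    rows = []
    for x in GRID:
        p = normal_pdf(x, 0.0, 1.0)
        q = normal_pdf(x, 0.0, 0.5)
        rows.append((x, p / q, relative_density_ratio(p, q, alpha)))
    return rows


if __name__ == "__main__":
    rows = ratios_on_grid()
    print("largest ordinary ratio:", max(o for _, o, _ in rows))
    print("largest relative ratio:", max(r for _, _, r in rows))
```

In the tails, where q is much smaller than p, the ordinary ratio grows very large while the relative ratio stays below 1 / ALPHA.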
The size distribution of Chinese cities
 Regional Science and Urban Economics
Cited by 22 (0 self)
This paper uses Chinese urban data to investigate two important issues regarding city size distribution: the way in which cities of different sizes grow relative to each other, and which distribution is the best approximation of city size distribution. First, we examine how cities of different sizes grow relative to each other. While most of the empirical literature suggests that the relative size and rank of cities remain stable over time, we find that the reform period since 1980 delivered significant structural change in the Chinese urban system. The city size distribution remains stable before the reform but shows a convergent pattern of growth in the post-reform period. Secondly, we use the Pearson goodness-of-fit test to examine which distribution is the best approximation of city size distribution. A parallel study of city size distribution in China and the U.S. reveals substantial differences: a lognormal distribution in the case of China and a Pareto distribution in the case of the U.S. JEL classification: C24 R12 Keyword: Zipf’s Law
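For readers unfamiliar with the Pearson goodness-of-fit test mentioned above, here is a minimal sketch of the statistic for binned counts. The observed counts and the hypothesised bin probabilities are invented for illustration and have nothing to do with the paper's city data; `pearson_chi_square` is a hypothetical helper name. The degrees of freedom used here assume that no distribution parameters were estimated from the data; fitting a lognormal or Pareto distribution first would reduce them.

```python
def pearson_chi_square(observed, expected_probs):
    """Pearson statistic: sum over bins of (O - E)^2 / E, with E = n * probability."""
    n = sum(observed)
    expected = [n * p for p in expected_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    degrees_of_freedom = len(observed) - 1  # assumes no estimated parameters
    return statistic, degrees_of_freedom


# Invented example: 100 observations in four bins, tested against equal probabilities.
observed_counts = [18, 22, 20, 40]
hypothesised = [0.25, 0.25, 0.25, 0.25]

statistic, dof = pearson_chi_square(observed_counts, hypothesised)

# Upper 5% critical value of the chi-squared distribution with 3 degrees of freedom.
CRITICAL_5PCT_DF3 = 7.815

if __name__ == "__main__":
    verdict = "rejected" if statistic > CRITICAL_5PCT_DF3 else "not rejected"
    print(f"chi-squared = {statistic:.2f}, dof = {dof}, fit {verdict} at the 5% level")
```

Comparing candidate distributions would repeat this calculation with the bin probabilities implied by each one and look at which gives the smaller statistic relative to its degrees of freedom.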
Stochastic grammatical inference with Multinomial Tests
, 2002
Cited by 18 (1 self)
We present a new statistical framework for stochastic grammatical inference algorithms based on a state merging strategy. We propose to use multinomial statistical tests to decide which states should be merged. This approach has three main advantages. First, since it is not based on asymptotic results, the small-sample case can be dealt with specifically. Second, all the probabilities associated with a state are included in a single test, so that statistical evidence is accumulated. Third, a statistical score is associated with each possible merging operation and can be used for a best-first strategy. Improvement over a classical stochastic grammatical inference algorithm is shown on artificial data.
Quantum Hypothesis Testing and Non-Equilibrium Statistical Mechanics
Cited by 13 (7 self)
We extend the mathematical theory of quantum hypothesis testing to the general W*-algebraic setting and explore its relation with recent developments in non-equilibrium quantum statistical mechanics. In particular, we relate the large deviation principle for the full counting statistics of entropy flow to quantum hypothesis testing of the arrow of time.
Three Centuries of Categorical Data Analysis: Log-linear Models and Maximum Likelihood Estimation
Cited by 13 (2 self)
The common view of the history of contingency tables is that it begins in 1900 with the work of Pearson and Yule, but it extends back at least into the 19th century. Moreover it remains an active area of research today. In this paper we give an overview of this history focussing on the development of log-linear models and their estimation via the method of maximum likelihood. S. N. Roy played a crucial role in this development with two papers co-authored with his students S. K. Mitra and Marvin Kastenbaum, at roughly the midpoint temporally in this development. Then we describe a problem that eluded Roy and his students, that of the implications of sampling zeros for the existence of maximum likelihood estimates for log-linear models. Understanding the problem of nonexistence is crucial to the analysis of large sparse contingency tables. We introduce some relevant results from the application of algebraic geometry to the study of this statistical problem.
Refinement Inequalities Among Symmetric Divergence Measures The Australian
 Available online at: arXiv:math.PR/0501303 v1 19
, 2005
Cited by 13 (9 self)
Abstract. There are three classical divergence measures in the literature on information theory and statistics, namely, Jeffreys-Kullback-Leibler’s J-divergence, Sibson-Burbea-Rao’s Jensen-Shannon divergence and Taneja’s arithmetic-geometric mean divergence. These bear an interesting relationship to each other and are based on logarithmic expressions. Divergence measures such as Hellinger discrimination, symmetric χ²-divergence, and triangular discrimination are not based on logarithmic expressions. These six divergence measures are symmetric with respect to probability distributions. In this paper some interesting inequalities among these symmetric divergence measures are studied. Refinement over these inequalities is also given. Some inequalities due to Dragomir et al. [6] are also improved.
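The six measures named in this abstract can be sketched for discrete distributions as follows. The abstract gives no formulas, so the definitions below are the forms commonly used in the literature and should be read as assumptions, including the constant factors; the function names and the example distributions are also invented here. All entries of both distributions must be strictly positive.

```python
import math


def j_divergence(p, q):
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))


def jensen_shannon(p, q):
    return 0.5 * sum(
        a * math.log(2 * a / (a + b)) + b * math.log(2 * b / (a + b))
        for a, b in zip(p, q)
    )


def arithmetic_geometric(p, q):
    return sum(
        (a + b) / 2 * math.log((a + b) / (2 * math.sqrt(a * b)))
        for a, b in zip(p, q)
    )


def hellinger_discrimination(p, q):
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def symmetric_chi_square(p, q):
    return sum((a - b) ** 2 * (a + b) / (a * b) for a, b in zip(p, q))


def triangular_discrimination(p, q):
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q))


DIVERGENCES = [
    j_divergence,
    jensen_shannon,
    arithmetic_geometric,
    hellinger_discrimination,
    symmetric_chi_square,
    triangular_discrimination,
]

# Example distributions (assumed for illustration).
P = [0.5, 0.5]
Q = [0.9, 0.1]

if __name__ == "__main__":
    for divergence in DIVERGENCES:
        forward, backward = divergence(P, Q), divergence(Q, P)
        print(f"{divergence.__name__:>26}: D(P,Q) = {forward:.6f}, D(Q,P) = {backward:.6f}")
```

Each function returns the same value when its arguments are swapped, which is the symmetry property the abstract refers to. Any ordering among the printed values is an observation about this one pair of distributions and not a statement of the paper's inequalities.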