Results 1–9 of 9
Decoding binary node labels from censored edge measurements: Phase transition and efficient recovery
, 2014
Abstract

Cited by 12 (6 self)
We consider the problem of clustering a graph G into two communities by observing a subset of the vertex correlations. Specifically, we consider the inverse problem with observed variables Y = B_G x ⊕ Z, where B_G is the incidence matrix of a graph G, x is the vector of unknown vertex variables (with a uniform prior) and Z is a noise vector with Bernoulli(ε) i.i.d. entries. All variables and operations are Boolean. This model is motivated by coding, synchronization, and community detection problems. In particular, it corresponds to a stochastic block model or a correlation clustering problem with two communities and censored edges. Without noise, exact recovery (up to global flip) of x is possible if and only if the graph G is connected, with a sharp threshold at the edge probability log(n)/n for Erdős–Rényi random graphs. The first goal of this paper is to determine how the edge probability p needs to scale to allow exact recovery in the presence of noise. Defining the degree (oversampling) rate of the graph by α = np/log(n), it is shown that exact recovery is possible if and only if α > 2/(1 − 2ε)^2 + o(1/(1 − 2ε)^2). In other words, 2/(1 − 2ε)^2 is the information-theoretic threshold for exact recovery at low SNR. In addition, an efficient recovery algorithm based on semidefinite programming is proposed and shown to succeed in the threshold regime up to twice the optimal rate. For a deterministic graph G, defining the degree rate as α = d/log(n), where d is the minimum degree of the graph, it is shown that the proposed method achieves the rate α > 4((1 + λ)/(1 − λ)^2)/(1 − 2ε)^2 + o(1/(1 − 2ε)^2), where 1 − λ is the spectral gap of the graph G. A preliminary version of this paper appeared in ISIT 2014 [ABBS14].
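As a minimal sketch of the quantities in this abstract, the Python below evaluates the leading terms of the two rates (dropping the o(·) corrections, so they are only meaningful as asymptotic guides, mainly at low SNR) and implements the noiseless case: when ε = 0, each observation y_uv = x_u ⊕ x_v fixes the relative label of an edge's endpoints, so breadth-first propagation from one anchored vertex recovers x up to a global flip exactly when G is connected. The function names `exact_recovery_threshold`, `sdp_rate_deterministic`, and `recover_noiseless` are hypothetical, and this is not the paper's semidefinite program, which is what handles the noisy case.

```python
from collections import deque
from typing import List, Optional, Tuple


def exact_recovery_threshold(eps: float) -> float:
    """Leading term 2 / (1 - 2*eps)^2 of the exact-recovery threshold on alpha = n*p / log(n)."""
    if not 0.0 <= eps < 0.5:
        raise ValueError("eps must lie in [0, 1/2)")
    return 2.0 / (1.0 - 2.0 * eps) ** 2


def sdp_rate_deterministic(eps: float, lam: float) -> float:
    """Leading term 4 * ((1 + lam) / (1 - lam)^2) / (1 - 2*eps)^2 of the SDP rate on alpha = d / log(n)."""
    if not 0.0 <= eps < 0.5 or lam >= 1.0:
        raise ValueError("need eps in [0, 1/2) and lam < 1 (positive spectral gap)")
    return 4.0 * ((1.0 + lam) / (1.0 - lam) ** 2) / (1.0 - 2.0 * eps) ** 2


def recover_noiseless(n: int, observations: List[Tuple[int, int, int]]) -> Optional[List[int]]:
    """Recover x (up to a global flip) from noise-free y_uv = x_u XOR x_v.

    Returns None when G is disconnected: some vertex then has no path of
    observed edges to the anchor, so its label is undetermined.
    """
    adj: List[List[Tuple[int, int]]] = [[] for _ in range(n)]
    for u, v, y in observations:
        adj[u].append((v, y))
        adj[v].append((u, y))
    x: List[Optional[int]] = [None] * n
    x[0] = 0  # anchor vertex 0 to fix the global flip
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v, y in adj[u]:
            if x[v] is None:
                x[v] = x[u] ^ y  # y_uv = x_u XOR x_v, hence x_v = x_u XOR y_uv
                queue.append(v)
    if any(label is None for label in x):
        return None
    return [int(label) for label in x]
```

For example, at ε = 0.25 the leading term of the threshold is 2/(0.5)^2 = 8, and `recover_noiseless(4, [(0, 1, 1), (1, 2, 0), (2, 3, 1), (0, 3, 0)])` returns `[0, 1, 1, 0]`. With noise, a single flipped edge would corrupt every label downstream of it in the BFS tree, which is why a global method such as the SDP is needed.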
Near-optimal joint object matching via convex relaxation. arXiv preprint arXiv:1402.1473
, 2014
Abstract

Cited by 9 (1 self)
Joint object matching aims at aggregating information from a large collection of similar instances (e.g. images, graphs, shapes) to improve the correspondences computed between pairs of objects, typically by exploiting global map compatibility. Despite some practical advances on this problem, from the theoretical point of view the error-correction ability of existing algorithms is limited by a constant barrier: none of them can provably recover the correct solution when more than a constant fraction of input correspondences are corrupted. Moreover, prior approaches focus mostly on fully similar objects, while it is practically more demanding and realistic to match instances that are only partially similar to each other. In this paper, we propose an algorithm to jointly match multiple objects that exhibit only partial similarities, where the provided pairwise feature correspondences can be densely corrupted. By encoding a consistent partial map collection into a 0-1 semidefinite matrix, we attempt recovery via a two-step procedure, that is, a spectral technique followed by a parameter-free convex program called MatchLift. Under a natural randomized model, MatchLift exhibits near-optimal error-correction ability, i.e. it guarantees the recovery of the ground-truth maps even when a dominant fraction of the inputs are randomly corrupted. We evaluate the proposed algorithm on various benchmark data sets including synthetic examples and real-world examples, all of which confirm the practical applicability of the proposed algorithm.
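The encoding step mentioned in this abstract can be illustrated with a small sketch, assuming (as a simplification, not taken from the paper) that every point of every object is labeled by its index in a shared universe of size m. Stacking one-hot rows into Y gives X = Y Y^T, a 0-1 matrix that is positive semidefinite (it is a Gram matrix) with rank at most m, and whose (i, j) block is the partial map from object i to object j. The helper names `lift`, `gram`, and `block` are hypothetical, and the code covers only this encoding, not the spectral step or MatchLift itself.

```python
from typing import List

Matrix = List[List[int]]


def lift(assignments: List[List[int]], universe_size: int) -> Matrix:
    """One one-hot row per point: assignments[i][p] is the universe index of point p in object i."""
    rows: Matrix = []
    for obj in assignments:
        for u in obj:
            row = [0] * universe_size
            row[u] = 1
            rows.append(row)
    return rows


def gram(Y: Matrix) -> Matrix:
    """X = Y Y^T; X[a][b] = 1 exactly when points a and b map to the same universe element."""
    return [[sum(p * q for p, q in zip(ra, rb)) for rb in Y] for ra in Y]


def block(X: Matrix, sizes: List[int], i: int, j: int) -> Matrix:
    """Extract the (i, j) block, i.e. the 0-1 map matrix from object i to object j."""
    starts = [sum(sizes[:k]) for k in range(len(sizes))]
    r0, c0 = starts[i], starts[j]
    return [row[c0:c0 + sizes[j]] for row in X[r0:r0 + sizes[i]]]


# Three objects; the third covers only part of the universe (partial similarity).
assign = [[0, 1, 2], [2, 0, 1], [1, 2]]
X = gram(lift(assign, 3))
```

Here `block(X, [3, 3, 2], 0, 2)` has an all-zero first row, since point 0 of the first object has no counterpart in the partially similar third object; this is how the lifted matrix represents partial maps.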
Linear inverse problems on Erdős–Rényi graphs: Information-theoretic limits and efficient recovery
Abstract

Cited by 7 (5 self)
This paper considers the inverse problem with observed variables Y = B_G X ⊕ Z, where B_G is the incidence matrix of a graph G, X is the vector of unknown vertex variables with a uniform prior, and Z is a noise vector with Bernoulli(ε) i.i.d. entries. All variables and operations are Boolean. This model is motivated by coding, synchronization, and community detection problems. In particular, it corresponds to a stochastic block model or a correlation clustering problem with two communities and censored edges. Without noise, exact recovery of X is possible if and only if the graph G is connected, with a sharp threshold at the edge probability log(n)/n for Erdős–Rényi random graphs. The first goal of this paper is to determine how the edge probability p needs to scale to allow exact recovery in the presence of noise. Defining the degree (oversampling) rate of the graph by α = np/log(n), it is shown that exact recovery is possible if and only if α > 2/(1 − 2ε)^2 + o(1/(1 − 2ε)^2). In other words, 2/(1 − 2ε)^2 is the information-theoretic threshold for exact recovery at low SNR. In addition, an efficient recovery algorithm based on semidefinite programming is proposed and shown to succeed in the threshold regime up to twice the optimal rate. Full version available in [1].
Tight error bounds for structured prediction
, 2014
Abstract

Cited by 3 (0 self)
Structured prediction tasks in machine learning involve the simultaneous prediction of multiple labels. This is typically done by maximizing a score function on the space of labels, which decomposes as a sum of pairwise elements, each depending on two specific labels. Intuitively, the more pairwise terms are used, the better the expected accuracy. However, there is currently no theoretical account of this intuition. This paper takes a significant step in this direction. We formulate the problem as classifying the vertices of a known graph G = (V, E), where the vertices and edges of the graph are labelled and correlate semi-randomly with the ground truth. We show that the prospects for achieving low expected Hamming error depend on the structure of the graph G in interesting ways. For example, if G is a very poor expander, like a path, then large expected Hamming error is inevitable. Our main positive result shows that, for a wide class of graphs including 2D grid graphs common in machine vision applications, there is a polynomial-time algorithm with small and information-theoretically near-optimal expected error. Our results provide a first step toward a theoretical justification for the empirical success of the efficient approximate inference algorithms that are used for structured prediction in models where exact inference is intractable.
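The intuition about poor expanders can be made concrete with a toy example; this is an illustration only, not the paper's model or algorithm. On a path with labels in {−1, +1}, recovering labels from edge observations alone means multiplying along the path, so a single corrupted edge flips every label after it and the Hamming error grows with the length of the path. The function names are hypothetical.

```python
from typing import List


def decode_path_from_edges(first_label: int, edge_obs: List[int]) -> List[int]:
    """Propagate labels along a path using only edge observations.

    Labels and observations are in {-1, +1}; edge_obs[i] is the observed
    product Y_i * Y_{i+1}, so Y_{i+1} = Y_i * edge_obs[i].
    """
    labels = [first_label]
    for e in edge_obs:
        labels.append(labels[-1] * e)
    return labels


def hamming_error(pred: List[int], truth: List[int]) -> int:
    """Number of vertices whose predicted label differs from the truth."""
    return sum(p != t for p, t in zip(pred, truth))


truth = [1] * 6
pred = decode_path_from_edges(1, [1, 1, -1, 1, 1])  # third edge corrupted
```

Here `pred` is `[1, 1, 1, -1, -1, -1]`, so one bad edge costs three vertex errors; a corrupted first edge would cost five. On a well-connected graph such as a grid, many alternative paths between vertices let redundant observations outvote isolated errors.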
Asymptotic Mutual Information for the Two-Groups Stochastic Block Model
, 2015
Abstract
We develop an information-theoretic view of the stochastic block model, a popular statistical model for the large-scale structure of complex networks. A graph G from such a model is generated by first assigning vertex labels at random from a finite alphabet, and then connecting vertices with edge probabilities depending on the labels of the endpoints. In the case of the symmetric two-group model, we establish an explicit ‘single-letter’ characterization of the per-vertex mutual information between the vertex labels and the graph. The explicit expression of the mutual information is intimately related to estimation-theoretic quantities and, in particular, reveals a phase transition at the critical point for community detection. Below the critical point the per-vertex mutual information is asymptotically the same as if edges were independent. Correspondingly, no algorithm can estimate the partition better than random guessing. Conversely, above the threshold, the per-vertex mutual information is strictly smaller than the independent-edges upper bound. In this regime there exists a procedure that estimates the vertex labels better than random guessing.
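The abstract does not spell out the critical point. In the commonly used parametrization, assumed here, with within-group edge probability a/n and across-group probability b/n, the critical point is λ = (a − b)^2 / (2(a + b)) = 1. The sketch below computes λ and an overlap score normalized so that random guessing scores about 0 and exact recovery up to a global label swap scores 1. The names `snr`, `detectable`, and `overlap` are hypothetical.

```python
from typing import Sequence


def snr(a: float, b: float) -> float:
    """lambda = (a - b)^2 / (2 (a + b)) for edge probabilities a/n (within) and b/n (across)."""
    return (a - b) ** 2 / (2.0 * (a + b))


def detectable(a: float, b: float) -> bool:
    """True above the assumed critical point lambda = 1, where better-than-random estimation is possible."""
    return snr(a, b) > 1.0


def overlap(estimate: Sequence[int], truth: Sequence[int]) -> float:
    """Agreement up to a global label swap, rescaled: about 0 for random guessing, 1 for exact recovery."""
    n = len(truth)
    agree = sum(e == t for e, t in zip(estimate, truth))
    return 2.0 * max(agree, n - agree) / n - 1.0
```

For instance, `snr(5, 1)` is 16/12 ≈ 1.33, above the critical point, while `snr(3, 1)` is 0.5, below it, where the abstract's result says no estimator can do better than random guessing, i.e. the overlap stays near 0.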
ISIT 2015 Tutorial: Information Theory and Machine Learning
Abstract
We are in the midst of a data deluge, with an explosion in the volume and richness of data sets in fields including social networks, biology, natural language processing, and computer vision, among others. In all of these areas, machine learning has been extraordinarily successful in providing tools and practical algorithms for extracting information from massive data sets (e.g., genetics, multispectral imaging, Google and Facebook). Despite this tremendous practical success, relatively little attention has been paid to fundamental limits and tradeoffs, and information theory has a crucial role to play in this context. The goal of this tutorial is to demonstrate how information-theoretic techniques and concepts can be brought to bear on machine learning problems in unorthodox and fruitful ways. We discuss how any learning problem can be formalized in a Shannon-theoretic sense, albeit one that involves nontraditional notions of codewords and channels. This perspective allows information-theoretic tools (including information measures, Fano's inequality, random coding arguments, and so on) to be brought to bear on learning problems. We illustrate this broad perspective with discussions of several learning problems, including sparse approximation, dimensionality reduction, graph recovery, clustering, and community detection. We emphasise recent results establishing the fundamental limits of graphical model learning and community detection. We also discuss the distinction between the learning-theoretic capacity when arbitrary "decoding" algorithms are allowed, and notions of computationally constrained capacity. Finally, a number of open problems and conjectures at the interface of information theory and machine learning will be discussed.
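As one concrete instance of the tools listed in this abstract, Fano's inequality lower-bounds the error probability of any estimator of X drawn uniformly from M hypotheses: P_e ≥ 1 − (I(X; Y) + log 2)/log M, with all logarithms in the same base (natural logarithms below). A minimal sketch, with a hypothetical function name:

```python
import math


def fano_lower_bound(mutual_info_nats: float, num_hypotheses: int) -> float:
    """Fano lower bound on the error probability for X uniform on M hypotheses.

    P_e >= 1 - (I(X; Y) + log 2) / log M, with I(X; Y) in nats.
    """
    if num_hypotheses < 2:
        raise ValueError("need at least two hypotheses")
    bound = 1.0 - (mutual_info_nats + math.log(2)) / math.log(num_hypotheses)
    return max(0.0, bound)  # the bound is vacuous when negative
```

In a learning problem the "codewords" are the hypotheses (for example, candidate graphs or labelings), so the bound says reliable recovery needs I(X; Y) to grow at least like log M.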
Consistent Partial Matching of Shape Collections via Sparse Modeling
Abstract
Figure 1: A partial multiway correspondence obtained with our approach on a heterogeneous collection of shapes. Our method does not require initial pairwise maps as input, as it actively seeks a reliable correspondence by operating directly over the space of joint, cycle-consistent matches. Partially similar as well as outlier shapes are automatically detected and accounted for by adopting a sparse model for the joint correspondence. A subset of all matches is shown for visualization purposes.
Recent efforts in the area of joint object matching approach the problem by taking as input a set of pairwise maps, which are then jointly optimized across the whole collection so that certain accuracy and consistency criteria are satisfied. One natural requirement is cycle-consistency, namely the fact that map composition should give the same result regardless of the path taken in the shape collection. In this paper, we introduce a novel approach to obtain consistent matches without requiring initial pairwise solutions to be given as input. We do so by optimizing a joint measure of metric distortion directly over the space of cycle-consistent maps; in order to allow for partially similar and extra-class shapes, we formulate the problem as a series of quadratic programs with sparsity-inducing constraints, making our technique a natural candidate for analyzing collections with a large presence of outliers. The particular form of the problem allows us to leverage results and tools from the field of evolutionary game theory. This enables a highly efficient optimization procedure which assures accurate and provably consistent solutions in a matter of minutes in collections with hundreds of shapes.
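Cycle-consistency, as described in this abstract, can be checked directly on a collection of point maps. This sketch assumes maps are stored as Python dicts keyed by (source, target) shape index and flags every triple i → j → k where composing through j disagrees with the direct map i → k on a point where both are defined, which tolerates partial maps. It is only a consistency test, not the paper's sparse quadratic-program formulation, and all names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple

PointMap = Dict[int, int]
Maps = Dict[Tuple[int, int], PointMap]


def compose(f: PointMap, g: PointMap) -> PointMap:
    """Return g(f(p)) for every p where both partial maps are defined."""
    return {p: g[f[p]] for p in f if f[p] in g}


def maps_from_universe(assign: List[List[int]]) -> Maps:
    """Build all pairwise maps induced by a shared universe labeling (consistent by construction)."""
    maps: Maps = {}
    for i, j in permutations(range(len(assign)), 2):
        inverse_j = {u: q for q, u in enumerate(assign[j])}
        maps[(i, j)] = {p: inverse_j[u] for p, u in enumerate(assign[i]) if u in inverse_j}
    return maps


def cycle_violations(maps: Maps, n_shapes: int) -> List[Tuple[int, int, int, int]]:
    """List (i, j, k, p) where the path i -> j -> k and the direct map i -> k disagree on point p."""
    bad = []
    for i, j, k in permutations(range(n_shapes), 3):
        if (i, j) not in maps or (j, k) not in maps or (i, k) not in maps:
            continue
        via_j = compose(maps[(i, j)], maps[(j, k)])
        direct = maps[(i, k)]
        for p, q in via_j.items():
            if p in direct and direct[p] != q:
                bad.append((i, j, k, p))
    return bad


# Third shape is partially similar: it has only two of the three universe points.
maps = maps_from_universe([[0, 1, 2], [2, 0, 1], [1, 2]])
```

The induced collection has no violations; corrupting a single pairwise map, e.g. setting `maps[(0, 1)] = {0: 2, 1: 1, 2: 0}`, makes some triples disagree.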
Exact Recovery Threshold in the Binary Censored Block Model
, 2015
Abstract
Binary censored block model: G = ([n], E) and ε ∈ [0, 1/2].
1. Color the vertices green or red arbitrarily.
2. If the endpoints have the same color, color the edge blue w.p. 1 − ε (orange w.p. ε).
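A small sampler for this model, as a sketch: vertex colors play the role of the binary labels and edge colors the role of the noisy edge observations. The fragment only states step 2, so the rule for endpoints of different colors (orange w.p. 1 − ε, blue w.p. ε) is an assumption here that mirrors the XOR observation model Y = B_G x ⊕ Z. The Erdős–Rényi edge set uses p = α log(n)/n, and the helper names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def erdos_renyi_edges(n: int, alpha: float, seed: int = 0) -> List[Edge]:
    """Edges of G(n, p) with p = alpha * log(n) / n (capped at 1)."""
    p = min(1.0, alpha * math.log(n) / n)
    rng = random.Random(seed)
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


def sample_bcbm(n: int, edges: List[Edge], eps: float, seed: int = 0) -> Tuple[List[str], Dict[Edge, str]]:
    """Color vertices green/red, then color each edge blue/orange with noise level eps."""
    rng = random.Random(seed)
    # Step 1: the coloring is arbitrary; it is drawn at random here.
    colors = [rng.choice(["green", "red"]) for _ in range(n)]
    edge_colors: Dict[Edge, str] = {}
    for u, v in edges:
        same = colors[u] == colors[v]
        flipped = rng.random() < eps
        # Step 2: same colors -> blue w.p. 1 - eps, orange w.p. eps.
        # Assumed mirror rule: different colors -> orange w.p. 1 - eps, blue w.p. eps.
        edge_colors[(u, v)] = "blue" if same != flipped else "orange"
    return colors, edge_colors


edges = erdos_renyi_edges(20, 3.0, seed=1)
colors, edge_colors = sample_bcbm(20, edges, 0.0, seed=2)
```

With ε = 0 every blue edge joins same-colored vertices, so the coloring is recoverable up to a green/red swap whenever the graph is connected; raising ε toward 1/2 makes edge colors less informative.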
How Hard is Inference for Structured Prediction?
, 2015
Abstract
Structured prediction tasks in machine learning involve the simultaneous prediction of multiple labels. This is often done by maximizing a score function on the space of labels, which decomposes as a sum of pairwise elements, each depending on two specific labels. The goal of this paper is to develop a theoretical explanation of the empirical effectiveness of heuristic inference algorithms for solving such structured prediction problems. We study the minimum achievable expected Hamming error in such problems, highlighting the case of 2D grid graphs, which are common in machine vision applications. Our main theorems provide tight upper and lower bounds on this error, as well as a polynomial-time algorithm that achieves the bound.