New Algorithms for Learning Incoherent and Overcomplete Dictionaries
, 2014
Cited by 16 (0 self)
In sparse recovery we are given a matrix A ∈ R^{n×m} (“the dictionary”) and a vector of the form AX where X is sparse, and the goal is to recover X. This is a central notion in signal processing, statistics and machine learning. But in applications such as sparse coding, edge detection, compression and super resolution, the dictionary A is unknown and has to be learned from random examples of the form Y = AX where X is drawn from an appropriate distribution; this is the dictionary learning problem. In most settings, A is overcomplete: it has more columns than rows. This paper presents a polynomial-time algorithm for learning overcomplete dictionaries; the only previously known algorithm with provable guarantees is the recent work of Spielman et al. (2012), who gave an algorithm for the undercomplete case, which is rarely the case in applications. Our algorithm applies to incoherent dictionaries, which have been a central object of study since they were introduced in the seminal work of Donoho and Huo (1999). In particular, a dictionary is µ-incoherent if each pair of columns has inner product at most µ/√n. The algorithm makes natural stochastic assumptions about the unknown sparse vector X, which can contain k ≤ c min(√n/(µ log n), m^{1/2−η}) nonzero entries (for any η > 0). This is close to the best k allowable by the best sparse recovery algorithms even if one knows the dictionary A exactly. Moreover, both the running time and sample complexity depend on log 1/ε, where ε is the target accuracy, and so our algorithms converge very quickly to the true dictionary. Our algorithm can also tolerate substantial amounts of noise provided it is incoherent with respect to the dictionary (e.g., Gaussian). In the noisy setting, our running time and sample complexity depend polynomially on 1/ε, and this is necessary.
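To make the incoherence definition and the sampling model Y = AX concrete, here is a minimal sketch in plain Python. It is not the paper's learning algorithm. It assumes unit-norm, nonzero columns (so that the inner-product bound µ/√n is meaningful), and the helper names `normalize_columns`, `incoherence`, and `sample`, as well as the ±1 distribution on the nonzero entries of X, are hypothetical choices made only for illustration.

```python
import math
import random


def normalize_columns(A):
    """Scale each column of A (given as a list of rows) to unit Euclidean norm."""
    n, m = len(A), len(A[0])
    norms = [math.sqrt(sum(A[i][j] ** 2 for i in range(n))) for j in range(m)]
    return [[A[i][j] / norms[j] for j in range(m)] for i in range(n)]


def incoherence(A):
    """Smallest mu such that |<A_j, A_k>| <= mu / sqrt(n) for all columns j != k."""
    n, m = len(A), len(A[0])
    max_inner = 0.0
    for j in range(m):
        for k in range(j + 1, m):
            inner = sum(A[i][j] * A[i][k] for i in range(n))
            max_inner = max(max_inner, abs(inner))
    return max_inner * math.sqrt(n)


def sample(A, k, rng):
    """Draw one example Y = AX where X has exactly k nonzero entries."""
    n, m = len(A), len(A[0])
    X = [0.0] * m
    for j in rng.sample(range(m), k):
        # Assumed distribution for illustration: random signs on the support.
        X[j] = rng.choice([-1.0, 1.0])
    Y = [sum(A[i][j] * X[j] for j in range(m)) for i in range(n)]
    return Y, X


# An overcomplete dictionary: n = 2 rows, m = 3 columns.
A = normalize_columns([[1.0, 0.0, 1.0],
                       [0.0, 2.0, 1.0]])
mu = incoherence(A)
Y, X = sample(A, 2, random.Random(0))
```

In this toy dictionary the third column makes an inner product of 1/√2 with each of the other two, so the computed µ is (1/√2)·√2 = 1.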
Modeling and Detecting Community Hierarchies
Cited by 4 (3 self)
Community detection has in recent years emerged as an invaluable tool for describing and quantifying interactions in networks. In this paper we propose a theoretical model that explicitly formalizes both the tight connections within each community and the hierarchical nature of the communities. We further present an efficient algorithm that provably detects all the communities in our model. Experiments demonstrate that our definition successfully models real-world communities, and our algorithm compares favorably with existing approaches.
Provable Algorithms for Machine Learning Problems
, 2013
Cited by 2 (0 self)
Modern machine learning algorithms can extract useful information from text, images and videos. All these applications involve solving NP-hard problems in the average case using heuristics. What properties of the input allow it to be solved efficiently? Theoretically analyzing the heuristics is often very challenging, and few results were known. This thesis takes a different approach: we identify natural properties of the input, then design new algorithms that provably work assuming the input has these properties. We are able to give new, provable and sometimes practical algorithms for learning tasks related to text corpora, images and social networks. The first part of the thesis presents new algorithms for learning thematic structure in documents. We show that, under a reasonable assumption, it is possible to provably learn many topic models, including the famous Latent Dirichlet Allocation. Our algorithm is the first provable algorithm for topic modeling. An implementation runs 50 times faster than the latest MCMC implementation and produces comparable results. The second part of the thesis provides ideas for provably learning deep, sparse representations. We start with sparse linear representations, and give the first algorithm for the dictionary learning problem with provable guarantees. Then we apply similar ideas to deep learning: under reasonable assumptions our algorithms can learn a deep network built by denoising autoencoders. The final part of the thesis develops a framework for learning latent variable models. We demonstrate how various latent variable models can be reduced to orthogonal tensor decomposition, and then be solved using the tensor power method. We give a tight perturbation analysis for the tensor power method, which reduces the number of samples required to learn many latent variable models.
In theory, the assumptions in this thesis help us understand why intractable problems in machine learning can often be solved; in practice, the results suggest inherently new approaches for machine learning. We hope the assumptions and algorithms inspire new research problems and learning algorithms.
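The abstract names the tensor power method without describing it. As a rough illustration only, the sketch below shows the standard power iteration u ← T(I, u, u)/‖T(I, u, u)‖ on a symmetric third-order tensor with an exact orthogonal decomposition and positive weights. It is not the thesis's implementation and omits deflation, restarts, and any perturbation analysis; the function names `build_tensor`, `apply_tensor`, and `power_iteration` are hypothetical.

```python
import math
import random


def build_tensor(weights, vectors):
    """T = sum_i w_i * (v_i tensor v_i tensor v_i), for orthonormal vectors v_i."""
    d = len(vectors[0])
    T = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for w, v in zip(weights, vectors):
        for a in range(d):
            for b in range(d):
                for c in range(d):
                    T[a][b][c] += w * v[a] * v[b] * v[c]
    return T


def apply_tensor(T, u):
    """Return T(I, u, u): entry a is the sum over b, c of T[a][b][c] * u[b] * u[c]."""
    d = len(u)
    return [sum(T[a][b][c] * u[b] * u[c] for b in range(d) for c in range(d))
            for a in range(d)]


def power_iteration(T, rng, iters=50):
    """Iterate u <- T(I,u,u) / ||T(I,u,u)|| from a random unit vector."""
    d = len(T)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in u))
    u = [x / norm for x in u]
    for _ in range(iters):
        w = apply_tensor(T, u)
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    # Eigenvalue estimate T(u, u, u) at the final iterate.
    lam = sum(a * b for a, b in zip(apply_tensor(T, u), u))
    return lam, u


# Toy tensor with components e1 (weight 3) and e2 (weight 1).
T = build_tensor([3.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
lam, u = power_iteration(T, random.Random(0))
```

Which component the iteration finds depends on the random starting vector: it converges to the v_i that maximizes w_i·|⟨u, v_i⟩| at the start, so `lam` ends up close to either 3 or 1 here.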
Uncovering the Small Community Structure in Large Networks: A Local Spectral Approach
Cited by 1 (1 self)
Large graphs arise in a number of contexts and understanding their structure and extracting information from them is an important research area. Early algorithms on mining communities have focused on the global structure, and often run in time functional to the size of the entire graph. Nowadays, as we often explore networks with billions of vertices and find communities of size hundreds, it is crucial to shift our attention from macroscopic structure to microscopic structure when dealing with large networks. A growing body of work has been adopting local expansion methods in order to identify the community from a few exemplary seed members. In this paper, we propose a novel approach for finding overlapping communities called LEMON (Local Expansion
Why Steiner-tree type algorithms work for community detection
We consider the problem of reconstructing a specific connected community S ⊂ V in a graph G = (V,E), where each node v is associated with a signal whose strength grows with the likelihood that v belongs to S. This problem appears in social or protein interaction networks, the latter also referred to as the signaling pathway reconstruction problem (Bailly-Bechet et al., 2011). We study this community reconstruction problem under several natural generative models, and make the following two contributions. First, in the context of social networks, where the signals are modeled as bounded-supported random variables, we design an efficient algorithm for recovering most members in S with well-controlled false positive overhead, by utilizing the network structure for a large family of “homogeneous” generative models. This positive result is complemented by an information-theoretic lower bound for the case where the network structure is unknown or the network is heterogeneous. Second, we consider the case in which the graph represents a protein interaction network, where it is customary to consider signals that have unbounded support; here we generalize our first contribution to give the first theoretical justification of why existing Steiner-tree type heuristics work well in practice.
Provably Fast Inference of Latent Features from Networks with Applications to Learning Social Circles and Multilabel Classification
A well-known phenomenon in social networks is homophily, the tendency of agents to connect with similar agents. A derivative of this phenomenon is the emergence of communities. Another phenomenon observed in numerous networks is the existence of certain agents that belong simultaneously to multiple communities. An understanding of these phenomena constitutes a central research topic of network science. In this work we focus on a fundamental theoretical question related to the above phenomena with various applications: given an undirected graph G, can we efficiently infer the latent vertex features which explain the observed network structure under the assumption of a generative model that exhibits homophily? We propose a probabilistic generative model with the property that the probability of an edge between two vertices is a nondecreasing function of the common features they possess. This property is true for many real-world networks and, surprisingly, is ignored by many popular overlapping community detection methods, as was shown recently by the empirical work of Yang and Leskovec [44]. Our main theoretical contribution is the first provably rapidly mixing Markov chain for inferring latent features. On the experimental side, we verify the efficiency of our method in terms of run times, where we observe that it significantly outperforms state-of-the-art methods. Our method is more than 2,400 times faster than a state-of-the-art machine learning method [37] and typically provides nontrivial speedups compared to BigClam [43]. Furthermore, we verify on real data with ground truth available that our method efficiently learns high-quality labelings. We use our method to learn social circles from Twitter ego-networks and perform multilabel classification.
Fixed-Points of Social Choice: An Axiomatic Approach to Network Communities
, 2014