Video Suggestion and Discovery for YouTube: Taking Random Walks Through the View Graph
, 2008
Cited by 93 (6 self)
The rapid growth of the number of videos in YouTube provides enormous potential for users to find content of interest to them. Unfortunately, given the difficulty of searching videos, the size of the video repository also makes the discovery of new content a daunting task. In this paper, we present a novel method based upon the analysis of the entire user–video graph to provide personalized video suggestions for users. The resulting algorithm, termed Adsorption, provides a simple method to efficiently propagate preference information through a variety of graphs. We extensively test the results of the recommendations on a three month snapshot of live data from YouTube.
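To make the idea of walking a user–video graph concrete, here is a minimal sketch of a three-step random walk (user → video → co-viewer → video) over a bipartite view graph. It is not the Adsorption algorithm from the paper; the function name `suggest_videos`, the `views` dictionary layout, and the toy data are assumptions made purely for illustration.

```python
from collections import defaultdict


def suggest_videos(views, user, top_k=3):
    """Score unseen videos by a 3-step random walk starting at `user`.

    `views` maps each user to the set of videos they watched (hypothetical
    layout). Each step picks an outgoing edge uniformly at random.
    """
    # Invert the view graph: video -> set of users who watched it.
    viewers = defaultdict(set)
    for viewer, videos in views.items():
        for video in videos:
            viewers[video].add(viewer)

    watched = views.get(user, set())
    scores = defaultdict(float)
    for video in watched:
        p_video = 1.0 / len(watched)                  # step 1: user -> video
        for other in viewers[video]:
            p_other = p_video / len(viewers[video])   # step 2: video -> co-viewer
            for candidate in views[other]:
                # step 3: co-viewer -> one of their videos
                scores[candidate] += p_other / len(views[other])

    # Keep only videos the user has not watched; break ties by name.
    unseen = [(score, video) for video, score in scores.items() if video not in watched]
    unseen.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(video, score) for score, video in unseen[:top_k]]


if __name__ == "__main__":
    toy_views = {
        "ann": {"v1", "v2"},
        "bob": {"v2", "v3"},
        "cat": {"v1", "v4", "v5"},
    }
    print(suggest_videos(toy_views, "ann"))
```

The walk gives more weight to videos reached through co-viewers with short watch lists, which is one simple way that preference information can flow along graph edges.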
Locally constrained diffusion process on locally densified distance spaces with applications to shape retrieval.
 In Proc. of CVPR,
, 2009
Robust and Scalable Graph-Based Semisupervised Learning
, 2012
Cited by 12 (7 self)
Graph-based semisupervised learning (GSSL) provides a promising paradigm for modeling the manifold structures that may exist in massive data sources in high-dimensional spaces. It has been shown effective in propagating a limited amount of initial labels to a large amount of unlabeled data, matching the needs of many emerging applications such as image annotation and information retrieval. In this paper, we provide reviews of several classical GSSL methods and a few promising methods in handling challenging issues often encountered in web-scale applications. First, to successfully incorporate the contaminated noisy labels associated with web data, label diagnosis and tuning techniques applied to GSSL are surveyed. Second, to support scalability to the gigantic scale (millions or billions of samples), recent solutions based on anchor graphs are reviewed. To help researchers pursue new ideas in this area, we also summarize a few popular data sets and software tools publicly available. Important open issues are discussed at the end to stimulate future research.
NODE CLASSIFICATION IN SOCIAL NETWORKS
Cited by 11 (0 self)
When dealing with large graphs, such as those that arise in the context of online social networks, a subset of nodes may be labeled. These labels can indicate demographic values, interest, beliefs or other characteristics of the nodes (users). A core problem is to use this information to extend the labeling so that all nodes are assigned a label (or labels). In this chapter, we survey techniques that have been proposed for this problem. We consider two broad categories: methods based on iterative application of traditional classifiers using graph information as features, and methods which propagate the existing labels via random walks. We adopt a common perspective on these methods to highlight the similarities between different approaches within and across the two categories. We also describe some extensions and related directions to the central problem of graph labeling.
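The second category mentioned above, propagating existing labels along edges, can be sketched as follows. This is a generic iterative label propagation with clamped seed nodes, not any specific method from the surveyed chapter; the names `propagate_labels` and `predict`, the adjacency-dictionary layout, and the toy graph are assumptions for illustration.

```python
def propagate_labels(adjacency, seeds, num_iters=50):
    """Spread seed labels over a weighted graph by repeated neighbor averaging.

    adjacency: node -> {neighbor: edge weight} (hypothetical layout)
    seeds:     node -> {label: score} for the nodes that are already labeled
    """
    labels = sorted({label for dist in seeds.values() for label in dist})
    scores = {node: dict(seeds.get(node, {})) for node in adjacency}

    for _ in range(num_iters):
        updated = {}
        for node, neighbors in adjacency.items():
            if node in seeds:
                # Clamp labeled nodes to their known labels.
                updated[node] = dict(seeds[node])
                continue
            total = sum(neighbors.values())
            # Each unlabeled node takes the weighted average of its neighbors.
            updated[node] = {
                label: (
                    sum(w * scores[nb].get(label, 0.0) for nb, w in neighbors.items()) / total
                    if total else 0.0
                )
                for label in labels
            }
        scores = updated
    return scores


def predict(scores):
    """Pick the highest-scoring label for every node that received any mass."""
    return {
        node: max(dist, key=dist.get)
        for node, dist in scores.items()
        if dist and max(dist.values()) > 0
    }


if __name__ == "__main__":
    # Path graph a - b - c - d with one seed label at each end.
    graph = {
        "a": {"b": 1.0},
        "b": {"a": 1.0, "c": 1.0},
        "c": {"b": 1.0, "d": 1.0},
        "d": {"c": 1.0},
    }
    seed_labels = {"a": {"red": 1.0}, "d": {"blue": 1.0}}
    print(predict(propagate_labels(graph, seed_labels)))
```

Each score can be read as the probability that a random walk started at the node hits a seed carrying that label first, which is why nodes closer to a seed lean toward its label.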
Regularized multiclass semisupervised boosting
 In CVPR
, 2007
Cited by 10 (3 self)
Many semisupervised learning algorithms only deal with binary classification. Their extension to the multiclass problem is usually obtained by repeatedly solving a set of binary problems. Additionally, many of these methods do not scale very well with respect to a large number of unlabeled samples, which limits their applications to large-scale problems with many classes and unlabeled samples. In this paper, we directly address the multiclass semisupervised learning problem by an efficient boosting method. In particular, we introduce a new multiclass margin-maximizing loss function for the unlabeled data and use the generalized expectation regularization for incorporating cluster priors into the model. Our approach enables efficient usage of very large data sets. The performance and efficiency of our method are demonstrated both on standard machine learning data sets and on challenging object categorization tasks.
Semi-Supervised Learning with the Graph Laplacian: The Limit of Infinite Unlabelled Data
, 2009
Cited by 10 (3 self)
We study the behavior of the popular Laplacian Regularization method for Semi-Supervised Learning at the regime of a fixed number of labeled points but a large number of unlabeled points. We show that in R^d, d ≥ 2, the method is actually not well-posed, and as the number of unlabeled points increases the solution degenerates to a noninformative function. We also contrast the method with the Laplacian Eigenvector method, and discuss the “smoothness” assumptions associated with this alternate method.
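For readers unfamiliar with the regularizer under study, the sketch below computes the standard graph Laplacian smoothness penalty in two equivalent forms: the quadratic form f^T L f with L = D - W, and the sum over edges of w_ij (f_i - f_j)^2. It only illustrates the penalty itself, not the limiting behavior analyzed in the paper; the function names and the three-node toy graph are assumptions for illustration.

```python
def graph_laplacian(weights):
    """Return L = D - W for a symmetric weight matrix with a zero diagonal."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def quadratic_form(laplacian, f):
    """Compute f^T L f."""
    n = len(f)
    return sum(f[i] * laplacian[i][j] * f[j] for i in range(n) for j in range(n))


def pairwise_penalty(weights, f):
    """Compute the sum over edges i < j of w_ij * (f_i - f_j)^2."""
    n = len(f)
    return sum(
        weights[i][j] * (f[i] - f[j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with unit edge weights.
    W = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
    L = graph_laplacian(W)
    smooth = [1.0, 0.5, 0.0]   # changes gradually along the path
    abrupt = [1.0, 1.0, 0.0]   # jumps across a single edge
    print(quadratic_form(L, smooth), pairwise_penalty(W, smooth))
    print(quadratic_form(L, abrupt), pairwise_penalty(W, abrupt))
```

Laplacian regularization seeks a function that matches the known labels while keeping this penalty small, so functions that vary gradually across heavily weighted edges are preferred.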
COSNet: a Cost Sensitive Neural Network for Semisupervised Learning in Graphs
Cited by 6 (4 self)
The semisupervised problem of learning node labels in graphs consists, given a partial graph labeling, in inferring the unknown labels of the unlabeled vertices. Several machine learning algorithms have been proposed for solving this problem, including Hopfield networks and label propagation methods; however, some issues have been only partially considered, e.g. the preservation of the prior knowledge and the imbalance between positive and negative labels. To address these issues, we propose a Hopfield-based cost-sensitive neural network algorithm (COSNet). The method factorizes the solution of the problem into two parts: 1) the subnetwork composed of the labeled vertices is considered, and the network parameters are estimated through a supervised algorithm; 2) the estimated parameters are extended to the subnetwork composed of the unlabeled vertices, and the attractor reached by the dynamics of this subnetwork is used to predict the labeling of the unlabeled vertices. The proposed method embeds in the neural algorithm the "a priori" knowledge coded in the labeled part of the graph, and separates node labels and neuron states, allowing positive and negative node labels to be weighted differently. Moreover, COSNet introduces an efficient cost-sensitive strategy that learns near-optimal parameters of the network in order to take into account the imbalance between positive and negative node labels. Finally, the dynamics of the network are restricted to its unlabeled part, preserving the minimization of the overall objective function and significantly reducing the time complexity of the learning algorithm. COSNet has been applied to the genome-wide prediction of gene function in a model organism. The results, compared with those obtained by other semisupervised label propagation algorithms and supervised machine learning methods, show the effectiveness of the proposed approach.
Semi-Supervised Learning Using Greedy Max-Cut
Cited by 4 (1 self)
Graph-based semisupervised learning (SSL) methods play an increasingly important role in practical machine learning systems, particularly in agnostic settings when no parametric information or other prior knowledge is available about the data distribution. Given the constructed graph represented by a weight matrix, transductive inference is used to propagate known labels to predict the values of all unlabeled vertices. Designing a robust label diffusion algorithm for such graphs is a widely studied problem and various methods have recently been suggested. Many of these can be formalized as regularized function estimation through the minimization of a quadratic cost. However, most existing label diffusion methods minimize a univariate cost with the classification function as the only variable of interest. Since the observed labels seed the diffusion process, such univariate frameworks are extremely sensitive to the initial label choice and any label noise. To alleviate the dependency on the initial observed labels, this article proposes a bivariate formulation for graph-based SSL, where both the binary label information and a continuous classification function are arguments of the optimization. This bivariate formulation is shown to be equivalent to a linearly constrained Max-Cut problem. Finally an efficient solution via greedy gradient Max-Cut
Large-Scale Machine Learning for Classification and Search
, 2012
Cited by 2 (1 self)
With the rapid development of the Internet, tremendous amounts of data including images and videos, up to millions or billions, can now be collected for training machine learning models. Inspired by this trend, this thesis is dedicated to developing large-scale machine learning techniques for the purpose of making classification and nearest neighbor search practical on gigantic databases. Our first approach is to explore data graphs to aid classification and nearest neighbor search. A graph offers an attractive way of representing data and discovering essential information such as the neighborhood structure. However, both the graph construction process and graph-based learning techniques become computationally prohibitive at a large scale. To this end, we present an efficient large graph construction approach and subsequently apply it to develop scalable semisupervised learning and unsupervised hashing algorithms. Our unique contributions on the graph-related topics include: 1. Large Graph Construction: Conventional neighborhood graphs such as kNN graphs require quadratic time complexity, which is inadequate for the large-scale applications mentioned above. To overcome this bottleneck, we present a novel graph construction approach,