Results 1 - 10 of 64
L.S.: Learning optimal ranking with tensor factorization for tag recommendation. In: KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009
Abstract - Cited by 60 (3 self)
Tag recommendation is the task of predicting a personalized list of tags for a user given an item. This is important for many websites with tagging capabilities like Last.fm or Delicious. In this paper, we propose a method for tag recommendation based on tensor factorization (TF). In contrast to other TF methods like higher order singular value decomposition (HOSVD), our method RTF (‘ranking with tensor factorization’) directly optimizes the factorization model for the best personalized ranking. RTF handles missing values and learns from pairwise ranking constraints. Our optimization criterion for TF is motivated by a detailed analysis of the problem and of interpretation schemes for the observed data in tagging systems. In all, RTF directly optimizes for the actual problem using a correct interpretation of the data. We provide a gradient descent algorithm to solve our optimization problem. We also provide an improved learning and prediction method with runtime complexity analysis for RTF. The prediction runtime of RTF is independent of the number of observations and only depends on the factorization dimensions. Besides the theoretical analysis, we empirically show that our method outperforms other state-of-the-art tag recommendation methods like FolkRank, PageRank and HOSVD both in quality and prediction runtime.
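A minimal sketch of the pairwise-ranking idea, assuming a CP-style factorization instead of the paper's Tucker model and a single observed tagging triple (all sizes, data, and hyperparameters below are illustrative):

```python
import random
from math import exp

random.seed(0)
n_tags, k = 4, 2

# CP-style model: score(u, i, t) = sum_f Uf[u][f] * Vf[i][f] * Tf[t][f]
# (RTF itself uses a full Tucker model with a core tensor; this CP
# simplification is an illustrative assumption.)
Uf = [[random.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(2)]
Vf = [[random.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(2)]
Tf = [[random.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n_tags)]

def score(u, i, t):
    return sum(Uf[u][f] * Vf[i][f] * Tf[t][f] for f in range(k))

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

# One observed post: user 0 tagged item 0 with tag 1; every other tag
# of that post yields a pairwise "ranked below" constraint.
u, i, t_pos, lr = 0, 0, 1, 0.1
for _ in range(1000):
    for t_neg in range(n_tags):
        if t_neg == t_pos:
            continue
        # Gradient ascent on ln sigmoid(score_pos - score_neg):
        # push the observed tag above each unobserved tag.
        g = 1.0 - sigmoid(score(u, i, t_pos) - score(u, i, t_neg))
        uu, vv = Uf[u][:], Vf[i][:]
        tp, tn = Tf[t_pos][:], Tf[t_neg][:]
        for f in range(k):
            Uf[u][f] += lr * g * vv[f] * (tp[f] - tn[f])
            Vf[i][f] += lr * g * uu[f] * (tp[f] - tn[f])
            Tf[t_pos][f] += lr * g * uu[f] * vv[f]
            Tf[t_neg][f] -= lr * g * uu[f] * vv[f]

# After training, the observed tag outranks the unobserved ones.
top = max(range(n_tags), key=lambda t: score(u, i, t))
```

Note how the model only ever compares tag scores within a post, which is the "pairwise ranking constraints" interpretation of the data that the abstract refers to.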
HADI: Mining radii of large graphs. ACM Transactions on Knowledge Discovery from Data, 2010
Abstract - Cited by 33 (10 self)
Given large, multi-million node graphs (e.g., Facebook, web-crawls, etc.), how do they evolve over time? How are they connected? What are the central nodes and the outliers? In this paper we define the Radius plot of a graph and show how it can answer these questions. However, computing the Radius plot is prohibitively expensive for graphs reaching the planetary scale. There are two major contributions in this paper: (a) We propose HADI (HAdoop DIameter and radii estimator), a carefully designed and fine-tuned algorithm to compute the radii and the diameter of massive graphs, that runs on top of the Hadoop/MapReduce system, with excellent scale-up on the number of available machines; (b) We run HADI on several real-world datasets including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed. Thanks to HADI, we report fascinating patterns on large networks, like the surprisingly small effective diameter, the multi-modal/bi-modal shape of the Radius plot, and its palindrome motion over time.
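On a graph small enough to traverse directly, the Radius plot is just the collection of per-node eccentricities, computable by BFS (HADI instead approximates neighborhood sizes with sketch-based counting to scale to billions of edges; this exact version is only a conceptual sketch):

```python
from collections import deque

def eccentricity(adj, src):
    """Longest shortest-path distance from src, via BFS."""
    dist = {src: 0}
    q = deque([src])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return max(dist.values())

def radius_plot(adj):
    """Map each node to its radius (here: its exact eccentricity)."""
    return {v: eccentricity(adj, v) for v in adj}

# A 5-node path graph: 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
radii = radius_plot(path)
diameter = max(radii.values())  # the diameter is the max eccentricity
print(radii, diameter)          # {0: 4, 1: 3, 2: 2, 3: 3, 4: 4} 4
```

The all-pairs BFS above costs O(n·m), which is exactly what becomes prohibitive "for graphs reaching the planetary scale" and motivates the approximate, MapReduce-based estimator.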
A Unified Framework for Providing Recommendations in Social Tagging Systems Based on Ternary Semantic Analysis
Abstract - Cited by 28 (4 self)
Social Tagging is the process by which many users add metadata, in the form of keywords, to annotate and categorize items (songs, pictures, web links, products, etc.). Social tagging systems (STSs) can provide three different types of recommendations: They can recommend 1) tags to users, based on what tags other users have used for the same items, 2) items to users, based on tags they have in common with other similar users, and 3) users with common social interests, based on common tags on similar items. However, users may have different interests for an item, and items may have multiple facets. In contrast to the current recommendation algorithms, our approach develops a unified framework to model the three types of entities that exist in a social tagging system: users, items, and tags. These data are modeled by a 3-order tensor, on which multiway latent semantic analysis and dimensionality reduction are performed using both the Higher Order Singular Value Decomposition (HOSVD) method and the Kernel-SVD smoothing technique. We perform an experimental comparison of the proposed method against state-of-the-art recommendation algorithms with two real data sets (Last.fm and BibSonomy). Our results show significant improvements in terms of effectiveness measured through recall/precision. Index Terms: Social tags, recommender systems, tensors, HOSVD.
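The core operation of such a framework, a Higher Order SVD of the 3-order user-item-tag tensor, can be sketched in a few lines of NumPy (the toy tensor, its sizes, and the absence of truncation and Kernel-SVD smoothing are all illustrative simplifications):

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: move `mode` to the front, flatten the rest."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def hosvd(X):
    """Higher Order SVD: one factor matrix per mode, plus a core tensor."""
    factors = [np.linalg.svd(unfold(X, m), full_matrices=False)[0]
               for m in range(X.ndim)]
    core = X
    for m, U in enumerate(factors):
        # Multiply mode m of the core by U^T
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, m, 0),
                                        axes=1), 0, m)
    return core, factors

# A toy 3-order (user x item x tag) tensor of 0/1 tag assignments
X = np.zeros((4, 3, 5))
X[0, 0, 1] = X[1, 0, 1] = X[2, 2, 3] = X[3, 1, 4] = 1.0

core, factors = hosvd(X)

# Multiplying back recovers X exactly, since no mode was truncated
R = core
for m, U in enumerate(factors):
    R = np.moveaxis(np.tensordot(U, np.moveaxis(R, m, 0), axes=1), 0, m)
assert np.allclose(R, X)
```

In the recommendation setting one keeps only the leading left-singular vectors per mode, and the (approximately) reconstructed tensor entries serve as scores for unobserved user-item-tag triples.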
Radius Plots for Mining Tera-byte Scale Graphs: Algorithms, Patterns, and Observations
Abstract - Cited by 22 (16 self)
Given large, multi-million node graphs (e.g., Facebook, web-crawls, etc.), how do they evolve over time? How are they connected? What are the central nodes and the outliers of the graphs? We show that the Radius Plot (pdf of node radii) can answer these questions. However, computing the Radius Plot is prohibitively expensive for graphs reaching the planetary scale. There are two major contributions in this paper: (a) We propose HADI (HAdoop DIameter and radii estimator), a carefully designed and fine-tuned algorithm to compute the diameter of massive graphs, that runs on top of the Hadoop/MapReduce system, with excellent scale-up on the number of available machines; (b) We run HADI on several real-world datasets including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed. Thanks to HADI, we report fascinating patterns on large networks, like the surprisingly small effective diameter, the multi-modal/bi-modal shape of the Radius Plot, and its palindrome motion over time.
GigaTensor: Scaling Tensor Analysis Up by 100 Times - Algorithms and Discoveries
Abstract - Cited by 21 (6 self)
Many data are modeled as tensors, or multidimensional arrays. Examples include the predicates (subject, verb, object) in knowledge bases, hyperlinks and anchor texts in Web graphs, sensor streams (time, location, and type), social networks over time, and DBLP conference-author-keyword relations. Tensor decomposition is an important data mining tool with various applications including clustering, trend detection, and anomaly detection. However, current tensor decomposition algorithms are not scalable for large tensors with mode lengths in the billions and hundreds of millions of nonzeros: the largest tensor in the literature has mode lengths in the thousands and hundreds of thousands of nonzeros. Consider a knowledge base tensor consisting of about 26 million noun-phrases. The intermediate data explosion problem, associated with naive implementations of tensor decomposition algorithms, would require the materialization and the storage of a matrix whose largest dimension would be ≈ 7·10^14; this amounts to ∼10 petabytes, or equivalently a few data centers' worth of storage, thereby rendering the tensor analysis of this knowledge base, in the naive way, practically impossible. In this paper, we propose GIGATENSOR, a scalable distributed algorithm for large-scale tensor decomposition. GIGATENSOR exploits the sparseness of real-world tensors, and avoids the intermediate data explosion problem by carefully redesigning the tensor decomposition algorithm. Extensive experiments show that our proposed GIGATENSOR solves 100× bigger problems than existing methods. Furthermore, we employ GIGATENSOR to analyze a very large real-world knowledge base tensor and present our astounding findings, which include the discovery of potential synonyms among millions of noun-phrases (e.g., the noun ‘pollutant’ and the noun-phrase ‘greenhouse gases’).
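The intermediate data explosion and the sparsity trick that avoids it can be sketched on a toy tensor: the matricized-tensor-times-Khatri-Rao-product (MTTKRP) at the heart of CP decomposition is computed nonzero by nonzero, so the large Khatri-Rao-structured matrix is formed below only to verify the result (sizes and values are illustrative; this shows the sparsity idea, not the full distributed algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3

# A sparse tensor in coordinate (COO) form: (i, j, k, value)
nnz = [(0, 1, 2, 1.0), (1, 4, 0, 2.0), (3, 2, 5, -1.0), (0, 0, 0, 0.5)]
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

# Mode-1 MTTKRP computed nonzero by nonzero: the (J*K x R)
# Khatri-Rao-structured matrix is never materialized.
M = np.zeros((I, R))
for i, j, k, v in nnz:
    M[i] += v * B[j] * C[k]  # elementwise product, one row per nonzero

# Verify against the dense textbook formula (unfolded tensor times the
# Khatri-Rao-structured matrix) -- feasible only because this toy is tiny.
X = np.zeros((I, J, K))
for i, j, k, v in nnz:
    X[i, j, k] = v
kr = np.einsum('jr,kr->jkr', B, C).reshape(J * K, R)
dense = X.reshape(I, J * K) @ kr
assert np.allclose(M, dense)
```

The dense path materializes a J·K × R matrix; with mode lengths in the millions that matrix is exactly the ≈ 7·10^14-row object the abstract warns about, while the sparse path touches only the nonzeros.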
Multivis: Content-based social network exploration through multi-way visual analysis. In: SDM, 2009
Abstract - Cited by 17 (3 self)
With the explosion of social media, scalability becomes a key challenge. There are two main aspects to the problem: 1) data volume: how to manage and analyze huge datasets to efficiently extract patterns; 2) data understanding: how to facilitate understanding of the patterns by users? To address both aspects of the scalability challenge, we present a hybrid approach that leverages two complementary disciplines, data mining and information visualization. In particular, we propose 1) an analytic data model for content-based networks using tensors; 2) an efficient high-order clustering framework for analyzing the data; 3) a scalable context-sensitive graph visualization to present the clusters. We evaluate the proposed methods using both synthetic and real datasets. In terms of computational efficiency, the proposed methods are an order of magnitude faster than the baseline. In terms of effectiveness, we present several case studies of real corporate social networks.
A Classification for Community Discovery Methods in Complex Networks, 2011
Abstract - Cited by 16 (6 self)
Many real-world networks are intimately organized according to a community structure. Much research effort has been devoted to developing methods and algorithms that can efficiently highlight this hidden structure of a network, yielding a vast literature on what is today called community detection. Since network representations can be very complex, with many variants of the traditional graph model, each algorithm in the literature focuses on some of these properties and establishes, explicitly or implicitly, its own definition of community. According to this definition, each proposed algorithm then extracts the communities, which typically reflect only part of the features of real communities. The aim of this survey is to provide a ‘user manual’ for the community discovery problem. Given a meta definition of what a community in a social network is, our aim is to organize the main categories of community discovery methods based on the definition of community they adopt. Given a desired definition of community and the features of a problem (size of network, direction of edges, multidimensionality, and so on), this review paper is designed to provide a set of approaches that researchers could focus on. The proposed classification of community discovery methods is also useful for putting into perspective the many open problems in the field.
Multi-Way Compressed Sensing for Sparse Low-Rank Tensors, 2012
Abstract - Cited by 9 (1 self)
For linear models, compressed sensing theory and methods enable recovery of sparse signals of interest from few measurements—on the order of the number of nonzero entries as opposed to the length of the signal of interest. Results of similar flavor have more recently emerged for bilinear models, but no results are available for multilinear models of tensor data. In this contribution, we consider compressed sensing for sparse and low-rank tensors. More specifically, we consider low-rank tensors synthesized as sums of outer products of sparse loading vectors, and a special class of linear dimensionality-reducing transformations that reduce each mode individually. We prove interesting “oracle” properties showing that it is possible to identify the uncompressed sparse loadings directly from the compressed tensor data. The proofs naturally suggest a two-step recovery process: fitting a low-rank model in the compressed domain, followed by per-mode decompression. This two-step process is also appealing from a computational complexity and memory capacity point of view, especially for big tensor datasets.
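The per-mode compression scheme can be sketched for a rank-1, 3-way tensor with sparse loadings (the sizes, the Gaussian measurement matrices, and the rank-1 case are illustrative assumptions; the paper treats general low-rank sums):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30  # length of each mode
m = 8   # compressed length per mode

# Sparse loading vectors defining a rank-1 tensor X = a o b o c
a = np.zeros(n); a[[2, 17]] = [1.0, -2.0]
b = np.zeros(n); b[[5]] = [3.0]
c = np.zeros(n); c[[9, 20]] = [0.5, 1.5]
X = np.einsum('i,j,k->ijk', a, b, c)

# One random dimensionality-reducing matrix per mode
A1, A2, A3 = (rng.standard_normal((m, n)) for _ in range(3))

# Compress each mode individually: Y = X x1 A1 x2 A2 x3 A3
Y = np.einsum('pi,qj,rk,ijk->pqr', A1, A2, A3, X)

# By multilinearity, the compressed tensor is again rank 1 with
# loadings A1 a, A2 b, A3 c. So fitting a low-rank model in the
# compressed domain yields A_n times the loadings, and per-mode
# sparse recovery (decompression) can then identify a, b, c.
assert np.allclose(Y, np.einsum('i,j,k->ijk', A1 @ a, A2 @ b, A3 @ c))
```

The memory appeal is visible even here: Y has m³ = 512 entries versus n³ = 27,000 for X, and each decompression step is an ordinary (vector) sparse-recovery problem.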
A Tensor-based Factorization Model of Semantic Compositionality
Abstract - Cited by 8 (0 self)
In this paper, we present a novel method for the computation of compositionality within a distributional framework. The key idea is that compositionality is modeled as a multi-way interaction between latent factors, which are automatically constructed from corpus data. We use our method to model the composition of subject-verb-object triples. The method consists of two steps. First, we compute a latent factor model for nouns from standard co-occurrence data. Next, the latent factors are used to induce a latent model of three-way subject-verb-object interactions. Our model has been evaluated on a similarity task for transitive phrases, in which it exceeds the state of the art.
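The two-step construction can be sketched with toy latent factors (all matrices, their sizes, and the Tucker-style interaction core below are illustrative assumptions, not the paper's trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
n_nouns, n_verbs, k = 6, 3, 4

# Step 1: latent factor model for nouns (learned from co-occurrence
# data in the paper; random here for illustration), plus verb factors.
N = rng.standard_normal((n_nouns, k))
V = rng.standard_normal((n_verbs, k))

# Step 2: a core tensor G capturing three-way interactions between
# the latent factors of subject, verb, and object.
G = rng.standard_normal((k, k, k))

def compose(subj, verb, obj):
    """Score an (s, v, o) triple as a multi-way latent interaction."""
    return np.einsum('r,s,t,rst->', N[subj], V[verb], N[obj], G)

x = compose(0, 1, 2)
```

The einsum contracts each word's k-dimensional factor vector against one mode of G, so the score is a sum over all k³ latent-factor triples, which is the "multi-way interaction" the abstract describes.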
Cross-Tagging for Personalized Open Social Networking. In: Conference on Hypertext and Hypermedia, 2009
Abstract - Cited by 8 (0 self)
The Social Web is successfully established and poised for continued growth. Web 2.0 applications such as blogs, bookmarking, music, photo and video sharing systems are among the most popular, and all of them incorporate a social aspect, i.e., users can easily share information with other users. But due to the diversity of these applications – serving different aims – the Social Web is ironically divided. Blog users who write about music, for example, could benefit from other users registered in other social systems operating within the same domain, such as a social radio station. Although these sites are two different and disconnected systems, offering distinct services to the users, the fact that the domains are compatible could benefit users of both systems with interesting and multi-faceted information. In this paper we propose to automatically establish social links between distinct social systems through cross-tagging, i.e., enriching a social system with the tags of other similar social system(s). Since tags are known to increase the prediction quality of recommender systems (RS), we propose to quantitatively evaluate the extent to which users can benefit from cross-tagging by measuring the impact of different cross-tagging approaches on tag-aware RS for personalized resource recommendations. We conduct experiments on real-world data sets and empirically show the effectiveness of our approaches.