Results 1-10 of 69
Tensor Decompositions and Applications
 SIAM REVIEW
, 2009
Cited by 723 (18 self)
This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or N-way array. Decompositions of higher-order tensors (i.e., N-way arrays with N ≥ 3) have applications in psychometrics, chemometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, neuroscience, graph analysis, etc. Two particular tensor decompositions can be considered to be higher-order extensions of the matrix singular value decomposition: CANDECOMP/PARAFAC (CP) decomposes a tensor as a sum of rank-one tensors, and the Tucker decomposition is a higher-order form of principal components analysis. There are many other tensor decompositions, including INDSCAL, PARAFAC2, CANDELINC, DEDICOM, and PARATUCK2 as well as nonnegative variants of all of the above. The N-way Toolbox and Tensor Toolbox, both for MATLAB, and the Multilinear Engine are examples of software packages for working with tensors.
Beyond streams and graphs: Dynamic tensor analysis
 In KDD
, 2006
Cited by 113 (16 self)
How do we find patterns in author-keyword associations, evolving over time? Or in DataCubes, with product-branch-customer sales information? Matrix decompositions, like principal component analysis (PCA) and variants, are invaluable tools for mining, dimensionality reduction, feature selection, rule identification in numerous settings like streaming data, text, graphs, social networks and many more. However, they have only two orders, like author and keyword, in the above example. We propose to envision such higher order data as tensors, and tap the vast literature on the topic. However, these methods do not necessarily scale up, let alone operate on semi-infinite streams. Thus, we introduce the dynamic tensor analysis (DTA) method, and its variants. DTA provides a compact summary for high-order and high-dimensional data, and it also reveals the hidden correlations. Algorithmically, we designed DTA very carefully so that it is (a) scalable, (b) space efficient (it does not need to store the past) and (c) fully automatic with no need for user defined parameters. Moreover, we propose STA, a streaming tensor analysis method, which provides a fast, streaming approximation to DTA. We implemented all our methods, and applied them in two real settings, namely, anomaly detection and multiway latent semantic indexing. We used two real, large datasets, one on network flow data (100GB over 1 month) and one from DBLP (200MB over 25 years). Our experiments show that our methods are fast, accurate and that they find interesting patterns and outliers on the real datasets.
Unsupervised multiway data analysis: A literature survey
 IEEE Transactions on Knowledge and Data Engineering
, 2008
Cited by 82 (10 self)
Multiway data analysis captures multilinear structures in higher-order datasets, where data have more than two modes. Standard two-way methods commonly applied on matrices often fail to find the underlying structures in multiway arrays. With increasing number of application areas, multiway data analysis has become popular as an exploratory analysis tool. We provide a review of significant contributions in literature on multiway models, algorithms as well as their applications in diverse disciplines including chemometrics, neuroscience, computer vision, and social network analysis.
Scalable tensor decompositions for multi-aspect data mining
 In ICDM 2008: Proceedings of the 8th IEEE International Conference on Data Mining
, 2008
Cited by 64 (2 self)
Modern applications such as Internet traffic, telecommunication records, and large-scale social networks generate massive amounts of data with multiple aspects and high dimensionalities. Tensors (i.e., multiway arrays) provide a natural representation for such data. Consequently, tensor decompositions such as Tucker become important tools for summarization and analysis. One major challenge is how to deal with high-dimensional, sparse data. In other words, how do we compute decompositions of tensors where most of the entries of the tensor are zero? Specialized techniques are needed for computing the Tucker decompositions for sparse tensors because standard algorithms do not account for the sparsity of the data. As a result, a surprising phenomenon is observed by practitioners: despite the fact that there is enough memory to store both the input tensors and the factorized output tensors, memory overflows occur during the tensor factorization process. To address this intermediate blowup problem, we propose Memory-Efficient Tucker (MET). Based on the available memory, MET adaptively selects the right execution strategy during the decomposition. We provide quantitative and qualitative evaluation of MET on real tensors. It achieves over 1000X space reduction without sacrificing speed; it also allows us to work with much larger tensors that were too big to handle before. Finally, we demonstrate a data mining case study using MET.
TripleRank: Ranking semantic web data by tensor decomposition
 In ISWC
, 2009
Cited by 53 (0 self)
The Semantic Web fosters novel applications targeting a more efficient and satisfying exploitation of the data available on the web, e.g. faceted browsing of linked open data. Large amounts and high diversity of knowledge in the Semantic Web pose the challenging question of appropriate relevance ranking for producing fine-grained and rich descriptions of the available data, e.g. to guide the user along most promising knowledge aspects. Existing methods for graph-based authority ranking lack support for fine-grained latent coherence between resources and predicates (i.e. support for link semantics in the linked data model). In this paper, we present TripleRank, a novel approach for faceted authority ranking in the context of RDF knowledge bases. TripleRank captures the additional latent semantics of Semantic Web data by means of statistical methods in order to produce richer descriptions of the available data. We model the Semantic Web by a 3-dimensional tensor that enables the seamless representation of arbitrary semantic links. For the analysis of that model, we apply the PARAFAC decomposition, which can be seen as a multimodal counterpart to Web authority ranking with HITS. The results are groupings of resources and predicates that characterize their authority and navigational (hub) properties with respect to identified topics. We have applied TripleRank to multiple data sets from the linked open data community and gathered encouraging feedback in a user evaluation where TripleRank results have been exploited in a faceted browsing scenario.
Multilinear operators for higher-order decompositions
, 2006
Cited by 52 (9 self)
We propose two new multilinear operators for expressing the matrix compositions that are needed in the Tucker and PARAFAC (CANDECOMP) decompositions. The first operator, which we call the Tucker operator, is shorthand for performing an n-mode matrix multiplication for every mode of a given tensor and can be employed to concisely express the Tucker decomposition. The second operator, which we call the Kruskal operator, is shorthand for the sum of the outer products of the columns of N matrices and allows a divorce from a matricized representation and a very concise expression of the PARAFAC decomposition. We explore the properties of the Tucker and Kruskal operators independently of the related decompositions. Additionally, we provide a review of the matrix and tensor operations that are frequently used in the context of tensor decompositions.
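The two operators described in this abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function names (`mode_n_product`, `tucker_operator`, `kruskal_operator`) are hypothetical, and dense arrays are assumed throughout.

```python
import numpy as np

def mode_n_product(tensor, matrix, mode):
    """n-mode product: multiply `tensor` by `matrix` along axis `mode`."""
    # Contract the matrix columns with the chosen tensor axis; the new axis
    # comes out first, so move it back to position `mode`.
    result = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(result, 0, mode)

def tucker_operator(core, factors):
    """Apply one factor matrix to every mode of `core` (hypothetical name)."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_n_product(out, factor, mode)
    return out

def kruskal_operator(factors):
    """Sum of outer products of matching columns of the factor matrices."""
    rank = factors[0].shape[1]
    out = np.zeros(tuple(f.shape[0] for f in factors))
    for r in range(rank):
        component = factors[0][:, r]
        for f in factors[1:]:
            # Build the rank-one tensor a_r o b_r o c_r ... one mode at a time.
            component = np.multiply.outer(component, f[:, r])
        out += component
    return out
```

With a superdiagonal core of ones, `tucker_operator` reproduces `kruskal_operator`, which is the usual way of viewing PARAFAC as a constrained Tucker model.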
Multilayer networks
, 2014
Cited by 34 (7 self)
In most natural and engineered systems, a set of entities interact with each other in complicated patterns that can encompass multiple types of relationships, change in time, and include other types of complications. Such systems include multiple subsystems and layers of connectivity, and it is important to take such “multilayer” features into account to try to improve our understanding of complex systems. Consequently, it is necessary to generalize “traditional” network theory by developing (and validating) a framework and associated tools to study multilayer systems in a comprehensive fashion. The origins of such efforts date back several decades and arose in multiple disciplines, and now the study of multilayer networks has become one of the most important directions in network science. In this paper, we discuss the history of multilayer networks (and related concepts) and review the exploding body of work on such networks. To unify the disparate terminology in the large body of recent work, we discuss a general framework for multilayer networks, construct a dictionary
Using naming authority to rank data and ontologies for web search
 In 8th International Semantic Web Conference
, 2009
Cited by 33 (6 self)
The focus of web search is moving away from returning relevant documents towards returning structured data as results to user queries. A vital part in the architecture of search engines are link-based ranking algorithms, which however are targeted towards hypertext documents. Existing ranking algorithms for structured data, on the other hand, require manual input of a domain expert and are thus not applicable in cases where data integrated from a large number of sources exhibits enormous variance in vocabularies used. In such environments, the authority of data sources is an important signal that the ranking algorithm has to take into account. This paper presents algorithms for prioritising data returned by queries over web datasets expressed in RDF. We introduce the notion of naming authority which provides a correspondence between identifiers and the sources which can speak authoritatively for these identifiers. Our algorithm uses the original PageRank method to assign authority values to data sources based on a naming authority graph, and then propagates the authority values to identifiers referenced in the sources. We conduct performance and quality evaluations of the method on a large web dataset. Our method is schema-independent, requires no manual input, and has applications in search, query processing, reasoning, and user interfaces over integrated datasets.
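The two-stage idea in this abstract (PageRank over a source graph, then propagation to identifiers) might look roughly like the sketch below. It is an illustration under stated assumptions, not the paper's algorithm: the adjacency-matrix input, the damping factor of 0.85, the `references` mapping, and the choice to propagate by summing source ranks are all hypothetical.

```python
import numpy as np

def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank; adjacency[i, j] = 1 means source i links to j."""
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    out_degree = adjacency.sum(axis=1)
    # Row-normalise; sources without outgoing links spread rank uniformly.
    transition = np.where(out_degree[:, None] > 0,
                          adjacency / np.maximum(out_degree[:, None], 1.0),
                          1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1.0 - damping) / n + damping * (transition.T @ rank)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank

def propagate_to_identifiers(source_rank, references):
    """Assumed propagation rule: an identifier's score is the sum of the
    ranks of the sources that reference it (`references` is hypothetical)."""
    scores = {}
    for source, identifiers in references.items():
        for identifier in identifiers:
            scores[identifier] = scores.get(identifier, 0.0) + source_rank[source]
    return scores
```

A call such as `propagate_to_identifiers(pagerank(graph), {0: ["a"], 2: ["a", "b"]})` returns a dictionary of identifier scores; other propagation rules (for example, averaging) would fit the abstract's description equally well.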
EigenTrend: Trend Analysis in the Blogosphere Based on Singular Value Decompositions
, 2006
Cited by 30 (3 self)
The blogosphere, the totality of blog-related Web sites, has become a great source of trend analysis in areas such as product survey, customer relationship, and marketing. Existing approaches are based on simple counts, such as the number of entries or the number of links. In this paper, we introduce a novel concept, coined eigen-trend, to represent the temporal trend in a group of blogs with common interests and propose two new techniques for extracting eigen-trends in blogs. First, we propose a trend analysis technique based on the singular value decomposition. Extracted eigen-trends provide new insights into multiple trends on the same keyword. Second, we propose another trend analysis technique based on a higher-order singular value decomposition. This analyzes the blogosphere as a dynamic graph structure and extracts eigen-trends that reflect the structural changes of the blogosphere over time. Experimental studies based on synthetic data sets and a real blog data set show that our new techniques can reveal a lot of interesting trend information and insights in the blogosphere that are not obtainable from traditional count-based methods.
Scalable Tensor Factorizations with Missing Data
 SIAM INTERNATIONAL CONFERENCE ON DATA MINING
, 2010
Cited by 25 (1 self)
The problem of missing data is ubiquitous in domains such as biomedical signal processing, network traffic analysis, bibliometrics, social network analysis, chemometrics, computer vision, and communication networks, all domains in which data collection is subject to occasional errors. Moreover, these data sets can be quite large and have more than two axes of variation, e.g., sender, receiver, time. Many applications in those domains aim to capture the underlying latent structure of the data; in other words, they need to factorize data sets with missing entries. If we cannot address the problem of missing data, many important data sets will be discarded or improperly analyzed. Therefore, we need a robust and scalable approach for factorizing multiway arrays (i.e., tensors) in the presence of missing data. We focus on one of the most well-known tensor factorizations, CANDECOMP/PARAFAC (CP), and formulate the CP model as a weighted least squares problem that models only the known entries. We develop an algorithm called CP-WOPT (CP Weighted OPTimization) using a first-order optimization approach to solve the weighted least squares problem. Based on extensive numerical experiments, our algorithm is shown to successfully factor tensors with noise and up to 70% missing data. Moreover, our approach is significantly faster than the leading alternative and scales to larger problems. To show the real-world usefulness of CP-WOPT, we illustrate its applicability on a novel EEG (electroencephalogram) application where missing data is frequently encountered due to disconnections of electrodes.
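The weighted least squares formulation mentioned in this abstract can be illustrated for a three-way tensor as follows. This is a minimal sketch, not the CP-WOPT implementation: the function names are hypothetical, the weight tensor `W` is assumed to hold 1 for known entries and 0 for missing ones, and the choice of first-order solver that would consume the gradients is left open.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Three-way CP model: sum over r of the outer product a_r o b_r o c_r."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def weighted_cp_loss_and_grads(X, W, A, B, C):
    """Loss 0.5 * ||W * (X - [[A, B, C]])||^2 and its gradients.

    Missing entries of X must hold a finite placeholder (e.g. 0), because
    they are multiplied by the zero weights in W rather than skipped.
    """
    residual = W * (X - cp_reconstruct(A, B, C))
    loss = 0.5 * np.sum(residual ** 2)
    weighted = W * residual  # equals `residual` when W is a 0/1 mask
    grad_A = -np.einsum('ijk,jr,kr->ir', weighted, B, C)
    grad_B = -np.einsum('ijk,ir,kr->jr', weighted, A, C)
    grad_C = -np.einsum('ijk,ir,jr->kr', weighted, A, B)
    return loss, (grad_A, grad_B, grad_C)
```

The loss and gradients can be handed to any gradient-based optimizer. Because only entries with nonzero weight contribute, changing the placeholder values at missing positions leaves both the loss and the gradients unchanged.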