Multivis: Content-based social network exploration through multi-way visual analysis
- In SDM
, 2009
Cited by 17 (3 self)
With the explosion of social media, scalability becomes a key challenge. There are two main aspects of the problems that arise: 1) data volume: how to manage and analyze huge datasets to efficiently extract patterns, 2) data understanding: how to facilitate understanding of the patterns by users? To address both aspects of the scalability challenge, we present a hybrid approach that leverages two complementary disciplines, data mining and information visualization. In particular, we propose 1) an analytic data model for content-based networks using tensors; 2) an efficient high-order clustering framework for analyzing the data; 3) a scalable context-sensitive graph visualization to present the clusters. We evaluate the proposed methods using both synthetic and real datasets. In terms of computational efficiency, the proposed methods are an order of magnitude faster compared to the baseline. In terms of effectiveness, we present several case studies of real corporate social networks.
Clr: a collaborative location recommendation framework based on co-clustering
- In Proceedings of the 34th International ACM Conference on Research and Development in Information Retrieval (SIGIR)
, 2011
Cited by 16 (0 self)
GPS data tracked on mobile devices contains rich information about human activities and preferences. In this paper, GPS data is used in location-based services (LBSs) to provide collaborative location recommendations. We observe that most existing LBSs provide location recommendations by clustering the User-Location matrix. Since the User-Location matrix created based on GPS data is huge, there are two major problems with these methods. First, the number of similar locations that need to be considered in computing the recommendations can be numerous. As a result, the identification of truly relevant locations from numerous candidates is challenging. Second, the clustering process on a large matrix is time-consuming. Thus, when new GPS data arrives, complete re-clustering of the whole matrix is infeasible. To tackle these two problems, we propose the Collaborative Location Recommendation (CLR) framework for location recommendation. By considering activities (i.e., temporal preferences) and different user classes (i.e., Pattern Users, Normal Users, and Travelers) in the recommendation process, CLR is capable of generating more precise and refined recommendations to the users compared to the existing methods. Moreover, CLR employs a dynamic clustering algorithm CADC to cluster the trajectory data into groups of similar users, similar activities and similar locations efficiently by supporting incremental update of the groups when new GPS trajectory data arrives. We evaluate CLR with a real-world GPS dataset, and confirm that the CLR framework provides more accurate location recommendations compared to the existing methods.
Event detection from evolution of click-through data
- Department of Computer Science and Technology of Tsinghua University. His
Cited by 15 (0 self)
Previous efforts on event detection from the web have focused primarily on web content and structure data ignoring the rich collection of web log data. In this paper, we propose the first approach to detect events from the click-through data, which is the log data of web search engines. The intuition behind event detection from click-through data is that such data is often event-driven and each event can be represented as a set of query-page pairs that are not only semantically similar but also have similar evolution pattern over time. Given the click-through data, in our proposed approach, we first segment it into a sequence of bipartite graphs based on the user-defined time granularity. Next, the sequence of bipartite graphs is represented as a vector-based graph, which records the semantic and evolutionary relationships between queries and pages. After that, the vector-based graph is transformed into its dual graph, where each node is a query-page pair that will be used to represent real world events. Then, the problem of event detection is equivalent to the problem of clustering the dual graph of the vector-based graph. The clustering process is based on a two-phase graph cut algorithm. In the first phase, query-page pairs are clustered based on the semantic-based similarity such that each cluster in the result corresponds to a specific topic. In the second phase, query-page pairs related to the same topic are further clustered based on the evolution pattern-based similarity such that each cluster is expected to represent a specific event under the specific topic. Experiments with real click-through data collected from a commercial web search engine show that the proposed approach produces high quality results.
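The segmentation step in this abstract (slicing the click-through log by a time granularity and tracking how each query-page pair evolves) can be illustrated with a small sketch. This is not the authors' implementation: the `(timestamp, query, page)` record layout, the function names, the example hosts, and the use of cosine similarity between click-count vectors as an evolution-pattern similarity are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt

def evolution_vectors(log, granularity, num_slices):
    """Map each (query, page) pair to its click counts per time slice.

    `log` is an iterable of (timestamp, query, page) records (assumed layout).
    Slice t covers timestamps in [t * granularity, (t + 1) * granularity).
    """
    vectors = defaultdict(lambda: [0] * num_slices)
    for timestamp, query, page in log:
        t = int(timestamp // granularity)
        if 0 <= t < num_slices:
            vectors[(query, page)][t] += 1
    return dict(vectors)

def cosine(u, v):
    """Cosine similarity of two click-count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical toy log: two pairs clicked early, one pair clicked late.
log = [
    (0, "world cup", "fifa.example"),
    (1, "world cup", "fifa.example"),
    (1, "world cup final", "scores.example"),
    (5, "tax forms", "irs.example"),
    (6, "tax forms", "irs.example"),
]
vecs = evolution_vectors(log, granularity=2, num_slices=4)
early_a = vecs[("world cup", "fifa.example")]
early_b = vecs[("world cup final", "scores.example")]
late = vecs[("tax forms", "irs.example")]
```

In this toy log, the two pairs clicked in the same time slice have matching evolution patterns, while the pair clicked later shares no slice with them. A clustering step over such similarities would be the next stage, which the sketch leaves out.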
Multilinear Algebra For Analyzing Data With Multiple Linkages
, 2006
Cited by 12 (2 self)
Link analysis typically focuses on a single type of connection, e.g., two journal papers are linked because they are written by the same author. However, often we want to analyze data that has multiple linkages between objects, e.g., two papers may have the same keywords and one may cite the other. The goal of this paper is to show that multilinear algebra provides a tool for multilink analysis. We analyze five years of publication data from journals published by the Society for Industrial and Applied Mathematics. We explore how papers can be grouped in the context of multiple link types using a tensor to represent all the links between them. A PARAFAC decomposition on the resulting tensor yields information similar to the SVD decomposition of a standard adjacency matrix. We show how the PARAFAC decomposition can be used to understand the structure of the document space and define paper-paper similarities based on multiple linkages. Examples are presented where the decomposed tensor data is used to find papers similar to a body of work (e.g., related by topic or similar to a particular author's papers), find related authors using linkages other than explicit co-authorship or citations, distinguish between papers written by different authors with the same name, and predict the journal in which a paper was published.
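To make the tensor idea concrete, the sketch below stores links in a paper x paper x link-type array and fits a single rank-1 PARAFAC component by alternating updates. It is a simplified illustration only: the abstract does not give the rank, the fitting procedure, or the link weights the authors used, so the function name `rank1_cp`, the toy tensor, and the choice of one component are assumptions.

```python
from math import sqrt

def rank1_cp(X, sweeps=50):
    """Fit X[i][j][k] ~ lam * a[i] * b[j] * c[k] by alternating updates.

    Each update solves for one factor with the other two held fixed,
    then rescales it to unit length.
    """
    I, J, K = len(X), len(X[0]), len(X[0][0])
    a, b, c = [1.0] * I, [1.0] * J, [1.0] * K

    def unit(v):
        norm = sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm else v

    for _ in range(sweeps):
        a = unit([sum(X[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K))
                  for i in range(I)])
        b = unit([sum(X[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K))
                  for j in range(J)])
        c = unit([sum(X[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J))
                  for k in range(K)])
    lam = sum(X[i][j][k] * a[i] * b[j] * c[k]
              for i in range(I) for j in range(J) for k in range(K))
    return lam, a, b, c

# Hypothetical toy data: 3 papers, 2 link types (0 = shared keyword, 1 = citation).
X = [[[0.0, 0.0] for _ in range(3)] for _ in range(3)]
X[0][1][0] = X[1][0][0] = 1.0   # papers 0 and 1 share a keyword
X[1][2][0] = X[2][1][0] = 1.0   # papers 1 and 2 share a keyword
X[2][0][1] = 1.0                # paper 2 cites paper 0
lam, a, b, c = rank1_cp(X)
```

Here `a` and `b` score the papers and `c` weights the link types within the single component. A full analysis would fit several components, for which a dedicated tensor library is the more practical route.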
Personalized Online Document, Image and Video Recommendation via Commodity Eye-tracking
Cited by 11 (4 self)
We propose a new recommendation algorithm for online documents, images and videos, which is personalized. Our idea is to rely on the attention time of individual users captured through commodity eye-tracking as the essential clue. The prediction of user interest over a certain online item (a document, image or video) is based on the user’s attention time acquired using vision-based commodity eye-tracking during his previous reading, browsing or video watching sessions over the same type of online materials. After acquiring a user’s attention times over a collection of online materials, our algorithm can predict the user’s probable attention time over a new online item through data mining. Based on our proposed algorithm, we have developed a new online content recommender system for documents, images and videos. The recommendation results produced by our algorithm are evaluated by comparing with those manually labeled by users as well as by commercial search engines.
Search personalization through query and page topical analysis
Cited by 11 (2 self)
information on a number of topics. Since the users may have diverse backgrounds and may have different expectations for a given query, some search engines try to personalize their results to better match the overall interests of an individual user. This task involves two great challenges. First, the search engines need to be able to effectively identify the user interests and build a profile for every individual user. Second, once such a profile is available, the search engines need to rank the results in a way that matches the interests of a given user. In this article, we present our work towards a personalized Web search engine and we discuss how we addressed each of these challenges. Since users are typically not willing to provide information on their personal preferences, for the first challenge, we attempt to determine such preferences by examining the click history of each user. In particular, we leverage a topical ontology for estimating a user’s topic preferences based on her past searches, i.e. previously issued queries and pages visited for those queries. We then explore the semantic similarity between the user’s current query and the query-matching pages, in order to identify the user’s current topic preference. For the second challenge, we have developed a ranking function that uses the learned past and current topic preferences in order to rank the search results to better match the preferences of a given user. Our experimental evaluation on the Google query stream of human subjects over a period of one month shows that user preferences can be learned accurately through the use of our topical ontology and that our ranking function, which takes into account the learned user preferences, yields significant improvements in the quality of the search results.
To each his own: personalized content selection based on text comprehensibility
- In Proceedings of the 5th ACM International Conference on Web Search and Data Mining
, 2012
Cited by 10 (1 self)
Imagine a physician and a patient doing a search on antibiotic resistance. Or a chess amateur and a grandmaster conducting a search on Alekhine’s Defence. Although the topic is the same, arguably the two users in each case will satisfy their information needs with very different texts. Yet today search engines mostly adopt the one-size-fits-all solution, where personalization is restricted to topical preference. We found that users do not uniformly prefer simple texts, and that the text comprehensibility level should match the user’s level of preparedness. Consequently, we propose to model the comprehensibility of texts as well as the users’ reading proficiency in order to better explain how different users choose content for further exploration. We also model topic-specific reading proficiency, which allows us to better explain why a physician might choose to read sophisticated medical articles yet simple descriptions of SLR cameras. We explore different ways to build user profiles, and use collaborative filtering techniques to overcome data sparsity. We conducted experiments on large-scale datasets from a major Web search engine and a community question answering forum. Our findings confirm that explicitly modeling text comprehensibility can significantly improve content ranking (search results or answers, respectively).
Exploring online social activities for adaptive search personalization
- In Proceedings of the 19th ACM international conference on Information and knowledge management
, 2010
Cited by 8 (0 self)
The web has largely become a very social environment and will continue to become even more so. People are not only enjoying their social visibility on the Web but also increasingly participating in various social activities delivered through the Web. In this paper, we propose to explore a user’s public social activities, such as blogging and social bookmarking, to personalize Internet services. We believe that public social data provides a more acceptable way to derive user interests than more private data such as search histories and desktop data. We propose a framework that learns about users’ preferences from their activities on a variety of online social systems. As an example, we illustrate how to apply the user interests derived by our system to personalize search results. Furthermore, our system is adaptive; it observes users’ choices on search results and automatically adjusts the weights of different social systems during the information integration process, so as to refine its interest profile for each user. We have implemented our approach and performed experiments on real-world data collected from three large-scale online social systems. More than two hundred users worldwide who are active on the three social systems were tested. Our experimental results demonstrate the effectiveness of our personalized search approach. Our results also show that integrating information from multiple social systems usually leads to better personalized results than relying on the information from a single social system, and our adaptive approach further improves the performance of the personalization solution.
Probabilistic factor models for web site recommendation
- In SIGIR
, 2011
Cited by 7 (0 self)
Due to the prevalence of personalization and information filtering applications, modeling users’ interests on the Web has become increasingly important during the past few years. In this paper, aiming at providing accurate personalized Web site recommendations for Web users, we propose a novel probabilistic factor model based on dimensionality reduction techniques. We also extend the proposed method to collective probabilistic factor modeling, which further improves model performance by incorporating heterogeneous data sources. The proposed method is general, and can be applied to not only Web site recommendations, but also a wide range of Web applications, including behavioral targeting, sponsored search, etc. The experimental analysis on Web site recommendation shows that our method outperforms other traditional recommendation approaches. Moreover, the complexity analysis indicates that our approach can be applied to very large datasets since it scales linearly with the number of observations.
MultiRank: Co-Ranking for Objects and Relations in Multi-Relational Data
Cited by 7 (0 self)
The main aim of this paper is to design a co-ranking scheme for objects and relations in multi-relational data. It has many important applications in data mining and information retrieval. However, in the literature, there is a lack of a general framework to deal with multi-relational data for co-ranking. The main contribution of this paper is to (i) propose a framework (MultiRank) to determine the importance of both objects and relations simultaneously based on a probability distribution computed from multi-relational data; (ii) show the existence and uniqueness of such probability distribution so that it can be used for co-ranking for objects and relations very effectively; and (iii) develop an efficient iterative algorithm to solve a set of tensor (multivariate polynomial) equations to obtain such probability distribution. Extensive experiments on real-world data suggest that the proposed framework is able to provide a co-ranking scheme for objects and relations successfully. Experimental results have also shown that our algorithm is computationally efficient, and effective for identification of interesting and explainable co-ranking results.