Results 1 - 10 of 151
Web mining for web personalization
- ACM Transactions on Internet Technology, 2003
- Cited by 217 (6 self)
Web personalization is the process of customizing a Web site to the needs of specific users, taking advantage of the knowledge acquired from the analysis of the user’s navigational behavior (usage data) in correlation with other information collected in the Web context, namely, structure, content and user profile data. Due to the explosive growth of the Web, the domain of Web personalization has gained great momentum both in the research and commercial areas. In this article we present a survey of the use of Web mining for Web personalization. More specifically, we introduce the modules that comprise a Web personalization system, emphasizing the Web usage mining module. A review of the most common methods that are used as well as technical issues that occur is given, along with a brief overview of the most popular tools and applications available from software vendors. Moreover, the most important research initiatives in the Web usage mining and personalization areas are presented.
Model-based clustering and visualization of navigation patterns on a web site
- Data Mining and Knowledge Discovery, 2003
- Cited by 74 (0 self)
We present a new methodology for exploring and analyzing navigation patterns on a web site. The patterns that can be analyzed consist of sequences of URL categories traversed by users. In our approach, we first partition site users into clusters such that users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display these paths for users within that cluster. The clustering approach we employ is model-based (as opposed to distance-based) and partitions users according to the order in which they request web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization algorithm. The runtime of our algorithm scales linearly with the number of clusters and with the size of the data, and our implementation easily handles hundreds of thousands of user sessions in memory. In the paper, we describe the details of our method and a visualization tool based on it called WebCANVAS. We illustrate the use of our approach on user-traffic data from msnbc.com. Keywords: Model-based clustering, sequence clustering, data visualization, Internet, web
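As a rough illustration of the clustering idea in this abstract, the sketch below fits a mixture of first-order Markov chains with Expectation-Maximization. It is not the authors' implementation: the function name `fit_markov_mixture`, the integer encoding of URL categories, the smoothing constant `alpha`, and the random initialization are all assumptions made for this example.

```python
import math
import random


def _normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def fit_markov_mixture(sessions, n_states, n_clusters,
                       n_iter=50, alpha=0.01, seed=0):
    """EM for a mixture of first-order Markov chains (illustrative sketch).

    sessions: non-empty lists of integer category ids in range(n_states).
    alpha: small pseudo-count (an assumption) that keeps probabilities > 0.
    Returns (weights, init, trans, resp); resp[i][k] is the posterior
    probability that session i belongs to cluster k.
    """
    rng = random.Random(seed)
    # Start from random soft assignments of sessions to clusters.
    resp = [_normalize([rng.random() + 1e-3 for _ in range(n_clusters)])
            for _ in sessions]
    for _ in range(n_iter):
        # M-step: re-estimate parameters from responsibility-weighted counts.
        weights = [alpha] * n_clusters
        init = [[alpha] * n_states for _ in range(n_clusters)]
        trans = [[[alpha] * n_states for _ in range(n_states)]
                 for _ in range(n_clusters)]
        for seq, r in zip(sessions, resp):
            for k in range(n_clusters):
                weights[k] += r[k]
                init[k][seq[0]] += r[k]
                for a, b in zip(seq, seq[1:]):
                    trans[k][a][b] += r[k]
        weights = _normalize(weights)
        init = [_normalize(row) for row in init]
        trans = [[_normalize(row) for row in matrix] for matrix in trans]
        # E-step: posterior cluster probabilities, computed in log space.
        resp = []
        for seq in sessions:
            logp = []
            for k in range(n_clusters):
                lp = math.log(weights[k]) + math.log(init[k][seq[0]])
                for a, b in zip(seq, seq[1:]):
                    lp += math.log(trans[k][a][b])
                logp.append(lp)
            top = max(logp)
            resp.append(_normalize([math.exp(lp - top) for lp in logp]))
    return weights, init, trans, resp
```

Each pass of this sketch touches every transition once per cluster, so its cost grows in proportion to the number of clusters times the total session length. A hard cluster label for session `i` can be read off as the index of the largest entry in `resp[i]`.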
Session-Based Overload Control in QoS-Aware Web Servers
- 2002
- Cited by 61 (1 self)
With the explosive use of the Internet, contemporary web servers are susceptible to overloads; their services deteriorate drastically and often result in denial of service. In this paper, we propose two methods to prevent and control overloads in web servers by utilizing the session-based relationships among HTTP requests. First, we exploit the dependence among session-based requests by analyzing and predicting the reference patterns. Using the dependency relationships, we have derived traffic conformation functions that can be used for capacity planning and overload prevention in web servers. Second, we have proposed a dynamic weighted fair sharing (DWFS) scheduling algorithm to control overloads in web servers. DWFS is distinguished from other scheduling algorithms in that it aims to avoid processing requests that belong to sessions that are likely to be aborted in the near future. The experimental results demonstrate that DWFS can improve server responsiveness by as much as 50% while providing QoS support through service differentiation for a class of application environments.
Beyond DCG: User Behavior as a Predictor of a Successful Search
- Cited by 59 (15 self)
Web search engines are traditionally evaluated in terms of the relevance of web pages to individual queries. However, relevance of web pages does not tell the complete picture, since an individual query may represent only a piece of the user’s information need and users may have different information needs underlying the same queries. We address the problem of predicting user search goal success by modeling user behavior. We show empirically that user behavior alone can give an accurate picture of the success of the user’s web search goals, without considering the relevance of the documents displayed. In fact, our experiments show that models using user behavior are more predictive of goal success than those using document relevance. We build novel sequence models incorporating time distributions for this task and our experiments show that the sequence and time distribution models are more accurate than static models based on user behavior, or predictions based on document relevance.
Measuring the accuracy of sessionizers for web usage analysis
- In Proc. of the Workshop on Web Mining, First SIAM Internat. Conf. on Data Mining, 2001
- Cited by 53 (3 self)
Companies with web presence rely on web usage analysis to obtain insights on customer behavior, associations among products, impact of advertisement banners, web marketing campaigns and product promotions. The validity of these results depends heavily on the accurate reconstruction of the visitors' activities in the web site. To this end, many sites employ cookies that distinguish among different users coming from the same proxy server or anonymizer. However, the set of activities thus grouped together refer to the whole lifetime of a cookie at the user's host. The activities performed during each visit to the web site, the "sessions", are not grouped properly, thus prohibiting the monitoring of changes in the user's behaviour and in her interaction with the site during each session. The reconstruction of user sessions, the so-called "sessionizing", is blurred by client caches and multiple instantiations of the user's browser. Sessionizing tools exploit information on the site's topology and statistics on its usage, in order to assess the correct contents of a user session. These tools are based on heuristic rules and on assumptions about the site's usage, and are therefore prone to error. In this study, we provide a formal framework for the evaluation of the accuracy of sessionizing tools. We introduce a set of measures that compute the extent to which real sessions are successfully reconstructed by different sessionizers. The wide range of measures proposed reflects the fact that some web usage analysis applications require exact reconstruction of a session, while for others ordering and page revisits are not important. On the basis of these measures, we compute and evaluate a number of sessionizing tools using the log data of a real web site.
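To make the notions of a heuristic sessionizer and an accuracy measure concrete, here is a minimal sketch. It is not taken from the paper: the time-gap rule, the 30-minute default, and the names `sessionize` and `exact_reconstruction_rate` are assumptions chosen for illustration, and the measure shown is only the strictest kind the abstract mentions (exact reconstruction of a session).

```python
from collections import Counter


def sessionize(requests, max_gap=1800):
    """Split one visitor's requests into sessions with a time-gap heuristic.

    requests: list of (timestamp_in_seconds, page) pairs sorted by time.
    max_gap: a new session starts when two consecutive requests are more
             than this many seconds apart (30 minutes is an assumed default).
    """
    sessions = []
    current = []
    last_time = None
    for timestamp, page in requests:
        if last_time is not None and timestamp - last_time > max_gap:
            sessions.append(current)
            current = []
        current.append(page)
        last_time = timestamp
    if current:
        sessions.append(current)
    return sessions


def exact_reconstruction_rate(real_sessions, reconstructed):
    """Fraction of real sessions rebuilt with identical pages and order."""
    if not real_sessions:
        return 0.0
    # Treat the reconstructed sessions as a multiset so that each one can
    # match at most one real session.
    available = Counter(tuple(s) for s in reconstructed)
    hits = 0
    for session in real_sessions:
        key = tuple(session)
        if available[key] > 0:
            available[key] -= 1
            hits += 1
    return hits / len(real_sessions)
```

A looser measure, for applications where ordering and page revisits do not matter, could compare sessions as sets of pages instead of tuples.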
Semantic Web Mining -- State of the art and future directions
- 2006
- Cited by 43 (1 self)
Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. This survey analyzes the convergence of trends from both areas: More and more researchers are working on improving the results of Web Mining by exploiting semantic structures in the Web, and they make use of Web Mining techniques for building the Semantic Web. Last but not least, these techniques can be used for mining the Semantic Web itself. The Semantic Web is the second-generation WWW, enriched by machine-processable information which supports the user in his tasks. Given the enormous size even of today’s Web, it is impossible to manually enrich all of these resources. Therefore, automated schemes for learning the relevant information are increasingly being used. Web Mining aims at discovering insights about the meaning of Web resources and their usage. Given the primarily syntactical nature of the data being mined, the discovery of meaning is impossible based on these data only. Therefore, formalizations of the semantics of Web sites and navigation behavior are becoming more and more common. Furthermore, mining the Semantic Web itself is another upcoming application. We argue that the two areas Web Mining and Semantic Web need each other to fulfill their goals, but that the full potential of this convergence is not yet realized. This paper gives an overview of where the two areas meet today, and sketches ways of how a closer integration could be profitable. © 2006 Elsevier B.V. All rights reserved.
Zipf's Law for Web Surfers
- 2001
- Cited by 35 (4 self)
One of the main activities of web users, known as "surfing", is to follow links. Lengthy navigation often leads to disorientation when users lose track of the context in which they are navigating and are unsure how to proceed in terms of the goal of their original query. Studying navigation patterns of web users is thus important, since it can lead us to a better understanding of the problems users face when they are surfing. We derive Zipf's rank frequency law (i.e. an inverse power law) from an absorbing Markov chain model of surfers' behaviour assuming that less probable navigation trails are, on average, longer than more probable ones. In our model the probability of a trail is interpreted as the relevance (or "value") of the trail. We apply our model to two scenarios: in the first the probability of a user terminating the navigation session is independent of the number of links he has followed so far, and in the second the probability of a user terminating the navigation session i...
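The first scenario described above, where the chance of ending the session does not depend on how many links have been followed, can be sketched as follows. This is only an illustration under stated assumptions: the names `trail_probability` and `expected_links`, the nested-dictionary encoding of link probabilities, and the fixed starting page are hypothetical and do not come from the paper.

```python
def trail_probability(trail, link_probs, stop_prob):
    """Probability of following exactly this trail and then stopping.

    trail: list of page names, starting at an assumed fixed entry page.
    link_probs: link_probs[src][dst] is the probability of choosing the
                link src -> dst, given that the user keeps navigating.
    stop_prob: constant probability of ending the session at each page.
    """
    prob = 1.0
    for src, dst in zip(trail, trail[1:]):
        # Keep navigating, then pick this particular link.
        prob *= (1.0 - stop_prob) * link_probs[src][dst]
    # Finally the user stops, i.e. the chain is absorbed.
    return prob * stop_prob


def expected_links(stop_prob):
    """Mean number of links followed before absorption (geometric law)."""
    return (1.0 - stop_prob) / stop_prob
```

Because every extra link multiplies the probability by a factor below one, longer trails are less probable in this sketch, which matches the assumption stated in the abstract.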
Mining web log sequential patterns with position coded pre-order linked wap-tree
- Data Min. Knowl. Discov
- Cited by 34 (4 self)
Sequential mining is the process of applying data mining techniques to a sequential database for the purpose of discovering the correlation relationships that exist among an ordered list of events. An important application of sequential mining techniques is web usage mining, for mining web log accesses, where the sequences of web page accesses made by different web users over a period of time, through a server, are recorded. Web access pattern tree (WAP-tree) mining is a sequential pattern mining technique for web log access sequences, which first stores the original web access sequence database on a prefix tree, similar to the frequent pattern tree (FP-tree) for storing non-sequential data. The WAP-tree algorithm then mines the frequent sequences from the WAP-tree by recursively reconstructing intermediate trees, starting with suffix sequences and ending with prefix sequences. This paper proposes a more efficient approach for using the WAP-tree to mine frequent sequences, which totally eliminates the need to engage in numerous reconstructions of intermediate WAP-trees during mining. The proposed algorithm builds the frequent header node links of the original WAP-tree in a pre-order fashion and uses the position code of each node to identify the ancestor/descendant relationships between nodes of the tree. It then finds each frequent sequential pattern through progressive prefix sequence search, starting with its first prefix subsequence event. Experiments show huge performance gains over the WAP-tree technique.
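The role of position codes can be illustrated with a small prefix tree. The sketch below is not the paper's algorithm: it omits the frequent-event filtering, the header node links, and the mining step, and the binary coding rule it uses (first child appends "1" to its parent's code, each later child appends "0" to its previous sibling's code) is an assumption chosen so that ancestry can be tested by a prefix comparison. The names `Node`, `build_tree`, and `is_ancestor` are hypothetical.

```python
class Node:
    """One node of a prefix tree over web access sequences."""

    def __init__(self, event, code):
        self.event = event      # page or event label (None for the root)
        self.code = code        # binary position code as a string
        self.count = 0          # number of sequences passing through
        self.children = []      # children in insertion order


def build_tree(sequences):
    """Insert each access sequence into a position-coded prefix tree."""
    root = Node(None, "")
    for sequence in sequences:
        node = root
        for event in sequence:
            child = next((c for c in node.children if c.event == event), None)
            if child is None:
                if node.children:
                    # Later child: previous sibling's code plus "0".
                    code = node.children[-1].code + "0"
                else:
                    # First child: parent's code plus "1".
                    code = node.code + "1"
                child = Node(event, code)
                node.children.append(child)
            child.count += 1
            node = child
    return root


def is_ancestor(a, b):
    """True if node a is a proper ancestor of node b.

    Under the assumed coding rule, every descendant's code starts with
    the ancestor's code followed by "1".
    """
    return b.code.startswith(a.code + "1")
```

With codes like these, an ancestor/descendant check needs only the two code strings and no walk through the tree, which is the kind of shortcut the abstract attributes to position codes.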