Application of Sampling Methodologies to Network Traffic Characterization
, 1993
Cited by 88 (5 self)
The relative performance of different data collection methods in the assessment of various traffic parameters is significant when the amount of data generated by a complete trace of a traffic interval is computationally overwhelming, and even capturing summary statistics for all traffic is impractical. This paper presents a study of the performance of various methods of sampling in answering questions related to wide area network traffic characterization. Using a packet trace from a network environment that aggregates traffic from a large number of sources, we simulate various sampling approaches, including time-driven and event-driven methods, with both random and deterministic selection patterns, at a variety of granularities. Using several metrics which indicate the similarity between two distributions, we then compare the sampled traces to the parent population. Our results revealed that the time-triggered techniques did not perform as well as the packet-triggered ones. Furthermore, ...
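The sampling families named in this abstract (packet-triggered versus time-triggered, deterministic versus random) can be illustrated with a minimal sketch. The packet representation as `(timestamp, size)` tuples, the function names, and the bucket-based random selection are assumptions made for illustration; they are not taken from the paper.

```python
import random

def systematic_by_count(packets, k):
    """Packet-triggered, deterministic: keep every k-th packet."""
    return packets[::k]

def random_by_count(packets, k, rng):
    """Packet-triggered, random: keep one random packet from each bucket of k."""
    picked = []
    for start in range(0, len(packets), k):
        bucket = packets[start:start + k]
        picked.append(rng.choice(bucket))
    return picked

def systematic_by_time(packets, interval):
    """Time-triggered, deterministic: keep the first packet at or after each tick."""
    picked = []
    if not packets:
        return picked
    next_tick = packets[0][0]  # first tick coincides with the first arrival
    for ts, size in packets:
        if ts >= next_tick:
            picked.append((ts, size))
            # Advance the timer past this packet's timestamp.
            while next_tick <= ts:
                next_tick += interval
    return picked

# Hypothetical trace: (timestamp in seconds, size in bytes), sorted by time.
packets = [(i * 0.5, 40 + i) for i in range(10)]
by_count = systematic_by_count(packets, 5)
by_time = systematic_by_time(packets, 2.0)
```

Each sampled trace could then be compared with the full trace using whatever distribution-similarity metric is of interest, for example on the packet-size histogram.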
The Inferential Theory Of Learning: Developing Foundations for . . .
, 1993
Cited by 61 (15 self)
The development of multistrategy learning systems requires a clear understanding of the roles and the applicability conditions of different learning strategies. To this end, this chapter introduces the Inferential Theory of Learning that provides a conceptual framework for explaining logical capabilities of learning strategies, i.e., their competence. Viewing learning as a process of modifying the learner's knowledge by exploring the learner's experience, the theory postulates that any such process can be described as a search in a knowledge space, which involves the learner's experience, prior knowledge and the learning goal. The search operators are instantiations of knowledge transmutations, which are generic patterns of knowledge change. Transmutations may employ any basic type of inference: deduction, induction or analogy. Several fundamental knowledge transmutations are described in a novel and general way, such as generalization, abstraction, explanation and similization, and their counterparts, specialization, concretion, prediction and dissimilization, respectively. Generalization enlarges the reference set of a description (the set of entities that are being described). Abstraction reduces the amount of the detail about the reference set. Explanation generates premises that explain (or imply) the given properties of the reference set. Similization transfers knowledge from one reference set to a similar reference set. Using concepts of the theory, a multistrategy task-adaptive learning (MTL) methodology is outlined, and illustrated by an example. MTL dynamically adapts strategies to the learning task, defined by the input information, learner's background knowledge, and the learning goal. The goal of MTL research is to synergistically integrate a wide range of inferential learning strategies, such as empirical generalization, constructive induction, deductive generalization, explanation, prediction, abstraction, and similization. Keywords: learning theory, inference theory, multi...
Evaluating Strategies for Similarity Search on the Web
, 2002
Cited by 60 (3 self)
Finding pages on the Web that are similar to a query page (Related Pages) is an important component of modern search engines. A variety of strategies have been proposed for answering Related Pages queries, but comparative evaluation by user studies is expensive, especially when large strategy spaces must be searched (e.g., when tuning parameters). We present a technique for automatically evaluating strategies using Web hierarchies, such as Open Directory, in place of user feedback. We apply this evaluation methodology to a mix of document representation strategies, including the use of text, anchor-text, and links. We discuss the relative advantages and disadvantages of the various approaches examined. Finally, we describe how to efficiently construct a similarity index out of our chosen strategies, and provide sample results from our index.
An Information-Theoretic External Cluster-Validity Measure
- Research Report RJ 10219, IBM
, 2001
Cited by 48 (2 self)
In this paper we propose a measure of similarity/association between two partitions of a set of objects. Our motivation is the desire to use the measure to characterize the quality or accuracy of clustering algorithms by somehow comparing the clusters they produce with "ground truth" consisting of classes assigned to the patterns by manual means or some other means in whose veracity there is confidence. Such measures are referred to as "external". Our measure also allows clusterings with different numbers of clusters to be compared in a quantitative and principled way. Our evaluation scheme quantitatively measures how useful the cluster labels of the patterns are as predictors of their class labels. When all clusterings to be compared have the same number of clusters, the measure is equivalent to the mutual information between the cluster labels and the class labels. In cases where the numbers of clusters are different, however, it computes the reduction in the number of bits that w...
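For the special case the abstract mentions, where the clusterings being compared have the same number of clusters, the measure reduces to the mutual information between cluster labels and class labels. The sketch below computes that quantity in bits from two label lists; the function name and the list-based input format are assumptions, and the sketch does not attempt the paper's correction for differing numbers of clusters.

```python
from collections import Counter
from math import log2

def mutual_information_bits(cluster_labels, class_labels):
    """Empirical mutual information I(cluster; class) in bits."""
    if len(cluster_labels) != len(class_labels):
        raise ValueError("label lists must have the same length")
    n = len(cluster_labels)
    joint = Counter(zip(cluster_labels, class_labels))  # counts n(c, k)
    per_cluster = Counter(cluster_labels)                # counts n(c)
    per_class = Counter(class_labels)                    # counts n(k)
    mi = 0.0
    for (c, k), n_ck in joint.items():
        # p(c,k) * log2( p(c,k) / (p(c) * p(k)) ), written with raw counts.
        mi += (n_ck / n) * log2(n_ck * n / (per_cluster[c] * per_class[k]))
    return mi

# Clusters that reproduce the classes exactly carry one full bit here.
perfect = mutual_information_bits([0, 0, 1, 1], ["a", "a", "b", "b"])
# Clusters that are independent of the classes carry no information.
useless = mutual_information_bits([0, 1, 0, 1], ["a", "a", "b", "b"])
```

A higher value means the cluster labels are better predictors of the class labels, which is the sense of "useful" described in the abstract.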
Internet Traffic Characterization
, 1994
Cited by 45 (0 self)
Contents: 1 Introduction (1. The problem; 2. Overview of thesis; 3. Contribution of our work). 2 Taxonomy of traffic characteristics (1. Aggregation granularity; 2. Host versus network centric perspective; 3. Host centric perspective: 1. Delay and jitter ...)
Similarity of Attributes by External Probes
- In Knowledge Discovery and Data Mining
, 1997
Cited by 32 (7 self)
In data mining, similarity or distance between attributes is one of the central notions. Such a notion can be used to build attribute hierarchies etc. Similarity metrics can be user-defined, but an important problem is defining similarity on the basis of data. Several methods based on statistical techniques exist. For defining the similarity between two attributes A and B they typically consider only the values of A and B, not the other attributes. We describe how a similarity notion between attributes can be defined by considering the values of other attributes. The basic idea is that in a 0/1 relation r, two attributes A and B are similar if the subrelations σ_{A=1}(r) and σ_{B=1}(r) are similar. Similarity between the two relations is defined by considering the marginal frequencies of a selected subset of other attributes. We show that the framework produces natural notions of similarity. Empirical results on the Reuters-21578 document dataset show, for example, how natural classif...
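The idea of comparing two attributes through the rows in which each one equals 1 can be sketched as follows. Rows are modeled as sets of the attributes that are 1, and the distance is the sum of absolute differences between probe-attribute frequencies in the two subrelations; both the representation and this particular distance are assumptions chosen for illustration, not necessarily the exact definition used in the paper.

```python
def probe_distance(rows, a, b, probes):
    """Distance between attributes a and b as seen through probe attributes.

    rows   : list of sets, each holding the attributes equal to 1 in that row
    probes : the selected subset of other attributes used for comparison
    """
    rows_a = [r for r in rows if a in r]  # subrelation where a = 1
    rows_b = [r for r in rows if b in r]  # subrelation where b = 1
    if not rows_a or not rows_b:
        raise ValueError("both attributes must occur in at least one row")

    def freq(sub, d):
        # Marginal frequency of probe d within a subrelation.
        return sum(d in r for r in sub) / len(sub)

    return sum(abs(freq(rows_a, d) - freq(rows_b, d)) for d in probes)

# Hypothetical market-basket data.
rows = [
    {"cola", "chips"}, {"pepsi", "chips"},
    {"cola", "chips"}, {"pepsi", "chips"},
    {"beer", "diapers"}, {"beer"},
]
probes = ["chips", "diapers"]
d_cola_pepsi = probe_distance(rows, "cola", "pepsi", probes)
d_cola_beer = probe_distance(rows, "cola", "beer", probes)
```

In this toy data, cola and pepsi never co-occur, yet they come out as similar because they appear alongside the same probe attributes.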
Clustering ensembles: Models of consensus and weak partitions
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2005
Cited by 24 (1 self)
Clustering ensembles have emerged as a powerful method for improving both the robustness and the stability of unsupervised classification solutions. However, finding a consensus clustering from multiple partitions is a difficult problem that can be approached from graph-based, combinatorial or statistical perspectives. This study extends previous research on clustering ensembles in several respects. First, we introduce a unified representation for multiple clusterings and formulate the corresponding categorical clustering problem. Second, we propose a probabilistic model of consensus using a finite mixture of multinomial distributions in a space of clusterings. A combined partition is found as a solution to the corresponding maximum likelihood problem using the EM algorithm. Third, we define a new consensus function that is related to the classical intra-class variance criterion using the generalized mutual information definition. Finally, we demonstrate the efficacy of combining partitions generated by weak clustering algorithms that use data projections and random data splits. A simple explanatory model is offered for the behavior of combinations of such weak clustering components. Combination accuracy is analyzed as a function of several parameters that control the power and resolution of component partitions as well as the number of partitions. We also analyze clustering ensembles with incomplete information and the effect of missing cluster labels on the quality of overall consensus. Experimental results demonstrate the effectiveness of the proposed methods on several real-world datasets.
Confirmation-guided discovery of first-order rules with Tertius
- Machine Learning
, 2000
Cited by 23 (9 self)
This paper deals with learning first-order logic rules from data lacking an explicit classification predicate. Consequently, the learned rules are not restricted to predicate definitions as in supervised inductive logic programming. First-order logic offers the ability to deal with structured, multi-relational knowledge. Possible applications include first-order knowledge discovery, induction of integrity constraints in databases, multiple predicate learning, and learning mixed theories of predicate definitions and integrity constraints. One of the contributions of our work is a heuristic measure of confirmation, trading off novelty and satisfaction of the rule. The approach has been implemented in the Tertius system. The system performs an optimal best-first search, finding the k most confirmed hypotheses, and includes a non-redundant refinement operator to avoid duplicates in the search. Tertius can be adapted to many different domains by tuning its parameters, and it can deal eithe...
Automatic complex schema matching across web query interfaces: A correlation mining approach
- ACM Transactions on Database Systems
, 2003
Cited by 18 (3 self)
To enable information integration, schema matching is a critical step for discovering semantic correspondences of attributes across heterogeneous sources. While complex matchings are common, because of their far more complex search space, most existing techniques focus on simple 1:1 matchings. To tackle this challenge, this article takes a conceptually novel approach by viewing schema matching as correlation mining, for our task of matching Web query interfaces to integrate the myriad databases on the Internet. On this “deep Web,” query interfaces generally form complex matchings between attribute groups (e.g., {author} corresponds to {first name, last name} in the Books domain). We observe that the co-occurrence patterns across query interfaces often reveal such complex semantic relationships: grouping attributes (e.g., {first name, last name}) tend to be co-present in query interfaces and thus positively correlated. In contrast, synonym attributes are negatively correlated because they rarely co-occur. This insight enables us to discover complex matchings by a correlation mining approach. In particular, we develop the DCM framework, which consists of data preprocessing, dual mining of positive and negative correlations, and finally matching construction. We evaluate the DCM framework on manually extracted interfaces and the results show good accuracy for discovering complex matchings. Further, to automate the
Identifying Employee Competencies in Dynamic Work Domains: Methodological Considerations and a Case Study
- Journal of Universal Computer Science
, 2003
Cited by 18 (7 self)
We present a formalisation for employee competencies which is based on a psychological framework separating the overt behavioural level from the underlying competence level. On the competence level, employees draw on action potentials (knowledge, skills and abilities) which in a given situation produce performance outcomes on the behavioural level. Our conception is based on the competence-performance approach by [Korossy 1997] and [Korossy 1999], which uses mathematical structures to establish prerequisite relations on the competence and the performance level. From this framework, a methodology for assessing competencies in dynamic work domains is developed which utilises documents employees have created to assess the competencies they have been acquiring. By means of a case study, we show how the methodology and the resulting structures can be validated in an organisational setting. From the resulting structures, employee competency profiles can be derived and development planning can be supported. The structures also provide the means for making inferences within the competency assessment process which in turn

