Results 11 - 20 of 2,956
Statistical Analysis of a Telephone Call Center: A Queueing Science Perspective, 2004
Abstract - Cited by 242 (37 self)
A call center is a service network in which agents provide telephone-based services. Customers that seek these services are delayed in tele-queues. This paper summarizes an analysis of a unique record of call center operations. The data comprise a complete operational history of a small banking call center, call by call, over a full year. Taking the perspective of queueing theory, we decompose the service process into three fundamental components: arrivals, customer patience, and service durations. Each component involves different basic mathematical structures and requires a different style of statistical analysis. Some of the key empirical results are sketched, along with descriptions of the varied techniques required.
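The kind of moment estimation such a decomposition starts from can be sketched on simulated traffic; the rates below are invented for illustration and cover two of the three components the abstract names (arrivals and service durations):

```python
import random

random.seed(0)

# Simulated traffic: a Poisson arrival process (exponential interarrival
# times, rate lam) and exponential service durations (rate mu). Both rates
# are invented; they are not the bank's figures.
lam, mu, n = 2.0, 0.5, 100_000
interarrivals = [random.expovariate(lam) for _ in range(n)]
services = [random.expovariate(mu) for _ in range(n)]

# Moment estimates of each component's rate.
lam_hat = 1.0 / (sum(interarrivals) / n)  # arrivals per unit time
mu_hat = 1.0 / (sum(services) / n)        # service completions per unit time

# Offered load in Erlangs: the mean number of busy agents there would be
# if every caller waited, a first summary statistic in queueing analysis.
offered_load = lam_hat / mu_hat
print(round(lam_hat, 2), round(mu_hat, 2), round(offered_load, 2))
```

With these made-up rates the offered load comes out near 4 Erlangs, i.e. about four agents kept busy on average.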
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms, 2000
Abstract - Cited by 234 (8 self)
Twenty-two decision tree, nine statistical, and two neural network algorithms are compared on thirty-two datasets in terms of classification accuracy, training time, and (in the case of trees) number of leaves. Classification accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, spline-based algorithm called Polyclass at the top, although it is not statistically significantly different from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is Quest with linear splits, which ranks fourth and fifth, respectively. Although spline-based statistical algorithms tend to have good accuracy, they also require relatively long training times. Polyclass, for example, is third last in terms of median training time. It often requires hours of training compared to seconds for other algorithms. The Quest and logistic regression algor...
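The two accuracy criteria (mean error rate and mean rank of error rate) are easy to reproduce on a toy table; the error rates below are invented, and the table is 4 datasets by 3 algorithms rather than the paper's 32 by 33:

```python
import numpy as np

# Invented error rates: rows are 4 datasets, columns are 3 classifiers.
errors = np.array([
    [0.10, 0.12, 0.30],
    [0.05, 0.04, 0.20],
    [0.22, 0.25, 0.24],
    [0.15, 0.18, 0.35],
])

# Criterion 1: mean error rate per algorithm across datasets.
mean_error = errors.mean(axis=0)

# Criterion 2: rank the algorithms within each dataset (1 = best), then
# average the ranks. Double argsort yields ranks when there are no ties.
ranks = errors.argsort(axis=1).argsort(axis=1) + 1
mean_rank = ranks.mean(axis=0)

print(mean_error)  # 0.13, 0.1475, 0.2725
print(mean_rank)   # 1.25, 2.0, 2.75
```

Note that the two criteria need not order the algorithms identically: mean error rate is dominated by hard datasets, while mean rank weights every dataset equally, which is why the paper reports both.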
A longitudinal study of engineering student performance and retention. I. Success and failure in the introductory course - J. Engr. Education, 1993
Abstract - Cited by 188 (11 self)
A cohort of chemical engineering students has been taught in an experimental sequence of five chemical engineering courses, beginning with the introductory course in the Fall 1990 semester. Differences in academic performance have been observed between students from rural and small town backgrounds (“rural students,” N=55) and students from urban and suburban backgrounds (“urban students,” N=65), with the urban students doing better on almost every measure investigated. In the introductory course, 80% of the urban students and 55% of the rural students passed with a grade of C or better, with average grades of 2.63 for the urban students and 1.80 for the rural students (A=4.0). The urban group continued to earn higher grades in subsequent chemical engineering courses. After four years, 79% of the urban students and 64% of the rural students had graduated or were still enrolled in chemical engineering; the others had either transferred out of engineering or were no longer attending the university. This paper presents data on the students’ home and school backgrounds and speculates on possible causes of observed performance differences between the two populations. * Journal of Engineering Education, 83(3), 209–217 (1994). Charts in the published version have been converted to ...
Generalized linear mixed models: a practical guide for ecology and evolution - Trends in Ecology and Evolution, 2009
Abstract - Cited by 183 (1 self)
How should ecologists and evolutionary biologists analyze nonnormal data that involve random effects? Nonnormal data such as counts or proportions often defy classical statistical procedures. Generalized linear mixed models (GLMMs) provide a more flexible approach for analyzing nonnormal data when random effects are present. The explosion of research on GLMMs in the last decade has generated considerable uncertainty for practitioners in ecology and evolution. Despite the availability of accurate techniques for estimating GLMM parameters in simple cases, complex GLMMs are challenging to fit, and statistical inference such as hypothesis testing remains difficult. We review the use (and misuse) of GLMMs in ecology and evolution, discuss estimation and inference, and summarize ‘best-practice’ data analysis procedures for scientists facing this challenge.
Generalized linear mixed models: powerful but challenging tools. Researchers faced with nonnormal data often try shortcuts such as transforming data to achieve normality and homogeneity of variance, using nonparametric tests, or relying on the robustness of classical ANOVA to nonnormality for balanced designs. Instead of shoehorning their data into classical statistical frameworks, researchers should use statistical approaches that match their data. Generalized linear mixed models (GLMMs) combine the properties of two statistical frameworks that are widely used in ecology and evolution: linear mixed models (which incorporate random effects) and generalized linear models (which handle nonnormal data by using link functions and exponential-family [e.g. normal, Poisson or binomial] distributions). GLMMs are the best tool for analyzing nonnormal data that involve random effects: all one has to do, in principle, is specify a distribution, link function and structure of the random effects.
For example, in Box 1, we use a GLMM to quantify the magnitude of the genotype-environment interaction in the response of Arabidopsis to herbivory. To do so, we select a Poisson distribution with a logarithmic link (typical for count data) and specify that the total number of fruits per plant and the responses to fertilization and clipping could vary randomly across populations and across genotypes within a population. However, GLMMs are surprisingly challenging to use, even for statisticians. Although several software packages can handle GLMMs ...
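A minimal sketch of the fixed-effects core of such a model is a plain Poisson regression with log link, fit by Newton-Raphson on simulated counts. The random-effect layer (variation across populations and genotypes) is exactly what requires the specialized software the review discusses; the coefficients below are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated count data from a Poisson GLM with log link; true coefficients
# (0.5, 1.2) are invented for illustration.
n, b0, b1 = 5000, 0.5, 1.2
x = rng.normal(size=n)
y = rng.poisson(np.exp(b0 + b1 * x))

# Fit by Newton-Raphson, i.e. iteratively reweighted least squares for the
# canonical log link.
X = np.column_stack([np.ones(n), x])
beta = np.array([np.log(y.mean()), 0.0])  # start at the intercept-only fit
for _ in range(25):
    mu = np.exp(X @ beta)
    score = X.T @ (y - mu)             # gradient of the log-likelihood
    fisher = X.T @ (X * mu[:, None])   # expected information matrix
    beta = beta + np.linalg.solve(fisher, score)

print(beta.round(2))  # close to the true (0.5, 1.2)
```

Adding even one random effect replaces the closed-form information matrix with an integral over the random-effect distribution, which is why fitting full GLMMs is left to dedicated packages.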
Clustering categorical data: An approach based on dynamical systems, 1998
Abstract - Cited by 180 (1 self)
We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By “categorical data,” we mean tables with fields that cannot be naturally ordered by a metric, e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of non-linear dynamical systems. We discuss experiments on a variety of tables of synthetic and real data; we find that our iterative methods converge quickly to prominently correlated values of various categorical fields.
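The assigning-and-propagating of weights can be sketched as a power-iteration-style loop over a toy table. The table is invented, and this is a simplified reading of the idea, not the paper's exact update rule:

```python
# Toy version of the iterative weighting idea: every categorical value gets
# a weight; on each pass, a value's new weight is the summed weight of the
# values it co-occurs with, row by row; weights are then L2-normalized.
rows = [
    ("ford", "blue"), ("ford", "blue"), ("ford", "red"),
    ("bmw", "red"), ("bmw", "red"), ("opel", "green"),
]
values = sorted({v for row in rows for v in row})
w = {v: 1.0 for v in values}

for _ in range(50):
    new = {v: 0.0 for v in values}
    for row in rows:
        for v in row:
            new[v] += sum(w[u] for u in row if u != v)
    norm = sum(x * x for x in new.values()) ** 0.5
    w = {v: x / norm for v, x in new.items()}

# The well-connected block of co-occurring values keeps nearly all the
# weight, while the isolated pair ("opel", "green") decays toward zero.
print({v: round(x, 3) for v, x in w.items()})
```

This is exactly repeated multiplication by the co-occurrence matrix with normalization, so its fixed points are eigenvectors, which is the "non-linear dynamical systems" connection the abstract alludes to.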
Word sense disambiguation using a second language monolingual corpus - Computational Linguistics, 1994
Abstract - Cited by 166 (1 self)
This paper presents a new approach for resolving lexical ambiguities in one language using statistical data from a monolingual corpus of another language. This approach exploits the differences between mappings of words to senses in different languages. The paper concentrates on the problem of target word selection in machine translation, for which the approach is directly applicable. The presented algorithm identifies syntactic relations between words, using a source language parser, and maps the alternative interpretations of these relations to the target language, using a bilingual lexicon. The preferred senses are then selected according to statistics on lexical relations in the target language. The selection is based on a statistical model and on a constraint propagation algorithm, which simultaneously handles all ambiguities in the sentence. The method was evaluated using three sets of Hebrew and German examples and was found to be very useful for disambiguation. The paper includes a detailed comparative analysis of statistical sense disambiguation methods.
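The target-word-selection step can be caricatured in a few lines: score each candidate translation by its co-occurrence count with an already-translated context word in the target-language corpus. The lexicon and counts below are invented, and this shows only the count-based selection step, not the paper's statistical model or its constraint propagation across all ambiguities in a sentence:

```python
# Hypothetical bilingual lexicon and target-language co-occurrence counts;
# all entries are invented for illustration.
bilingual_lexicon = {"duty": ["tax", "obligation"]}
target_counts = {          # (verb, object) counts from a target corpus
    ("pay", "tax"): 120,
    ("pay", "obligation"): 7,
}

def select_translation(source_word: str, context_verb: str) -> str:
    # Prefer the candidate translation seen most often with the context
    # word in the target-language corpus.
    candidates = bilingual_lexicon[source_word]
    return max(candidates,
               key=lambda t: target_counts.get((context_verb, t), 0))

print(select_translation("duty", "pay"))  # prints "tax"
```

The key property is that no sense-annotated source-language data is needed: the monolingual target corpus alone supplies the disambiguating statistics.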
Unbiased recursive partitioning: a conditional inference framework, 2004
Abstract - Cited by 165 (12 self)
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of the exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases unbiased procedures have been suggested, but they lack a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well-defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented, and it is shown that the predictive performance of the resulting trees is as good as the performance of established exhaustive search procedures. It turns out that the partitions, and therefore the models, induced by the two approaches are structurally different, indicating the need for unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored and multivariate response variables, and arbitrary measurement scales of the covariates. Data from studies on animal abundance, glaucoma classification, node-positive breast cancer and mammography experience are re-analyzed.
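The flavor of a conditional-inference stopping rule can be sketched with a permutation test of independence between a covariate and the response: split only when the association is significant. This is a simplified stand-in for the paper's multiple-test framework, and the data below are simulated:

```python
import random

random.seed(1)

def perm_pvalue(x, y, n_perm=2000):
    # Permutation test of independence, using the absolute (unnormalized)
    # covariance between x and y as the test statistic.
    def stat(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        return abs(sum((u - ma) * (v - mb) for u, v in zip(a, b)))
    observed = stat(x, y)
    yy = list(y)
    hits = 0
    for _ in range(n_perm):
        random.shuffle(yy)
        if stat(x, yy) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one to avoid p = 0

x = list(range(40))
informative = [xi * 0.5 + random.gauss(0, 1) for xi in x]  # truly associated
noise = [random.gauss(0, 1) for _ in x]                    # independent of x

# Split on a covariate only when the independence hypothesis is rejected.
print(perm_pvalue(x, informative))  # tiny p-value: splitting is justified
print(perm_pvalue(x, noise))        # no systematic association with x
```

Because the decision to split is a hypothesis test rather than a search over all cutpoints, covariates with many possible splits gain no artificial advantage, which is the bias the framework removes.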
Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data, 2004
Abstract - Cited by 160 (16 self)
This chapter gives an overview of recent advances in latent variable analysis. Emphasis is placed on the strength of modeling obtained by using a flexible combination of continuous and categorical latent variables.