Fast maximum margin matrix factorization for collaborative prediction
In Proceedings of the 22nd International Conference on Machine Learning (ICML), 2005
Cited by 248 (6 self)
Maximum Margin Matrix Factorization (MMMF) was recently suggested (Srebro et al., 2005) as a convex, infinite-dimensional alternative to low-rank approximations and standard factor models. MMMF can be formulated as a semidefinite program (SDP) and learned using standard SDP solvers. However, current SDP solvers can only handle MMMF problems on matrices of dimensionality up to a few hundred. Here, we investigate a direct gradient-based optimization method for MMMF and demonstrate it on large collaborative prediction problems. We compare against results obtained by Marlin (2004) and find that MMMF substantially outperforms all nine methods he tested.
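The idea of optimizing a factorization directly by gradient steps, rather than through an SDP solver, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary (+1/-1) ratings, a fixed-rank factorization X = UV^T with a squared-norm penalty on both factors, a smoothed hinge loss, and plain gradient descent. The names `smooth_hinge`, `objective`, `fit`, and every default parameter value are hypothetical.

```python
import random

def smooth_hinge(z):
    """Return (loss, derivative) of a smoothed hinge at margin z (assumed loss choice)."""
    if z <= 0.0:
        return 0.5 - z, -1.0
    if z < 1.0:
        return 0.5 * (1.0 - z) ** 2, z - 1.0
    return 0.0, 0.0

def objective(U, V, observed, lam):
    """Norm penalty on both factors plus the loss summed over observed entries only."""
    reg = sum(x * x for row in U + V for x in row)
    loss = 0.0
    for (i, j), y in observed.items():
        z = y * sum(a * b for a, b in zip(U[i], V[j]))
        loss += smooth_hinge(z)[0]
    return 0.5 * lam * reg + loss

def fit(observed, n_rows, n_cols, rank=2, lam=0.01, lr=0.05, steps=500, seed=0):
    """Plain gradient descent on the factors; returns U, V and the objective history."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    history = [objective(U, V, observed, lam)]
    for _ in range(steps):
        # Gradient of the penalty term.
        gU = [[lam * x for x in row] for row in U]
        gV = [[lam * x for x in row] for row in V]
        # Gradient of the loss term, accumulated over observed entries.
        for (i, j), y in observed.items():
            z = y * sum(a * b for a, b in zip(U[i], V[j]))
            d = smooth_hinge(z)[1] * y
            for k in range(rank):
                gU[i][k] += d * V[j][k]
                gV[j][k] += d * U[i][k]
        U = [[x - lr * g for x, g in zip(row, grow)] for row, grow in zip(U, gU)]
        V = [[x - lr * g for x, g in zip(row, grow)] for row, grow in zip(V, gV)]
        history.append(objective(U, V, observed, lam))
    return U, V, history
```

Calling `fit({(0, 0): 1, (0, 1): -1, (1, 0): -1, (1, 1): 1, (2, 0): 1}, 3, 2)` trains on five observed entries of a 3 x 2 matrix; each step costs time proportional to the number of observed entries times the rank, which is what makes this style of optimization attractive for large, sparse rating matrices.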
Multi-relational Learning Using Weighted Tensor Decomposition with Modular Loss
Cited by 8 (0 self)
We propose a modular framework for multi-relational learning via tensor decomposition. In our learning setting, the training data contains multiple types of relationships among a set of objects, which we represent by a sparse three-mode tensor. The goal is to predict the values of the missing entries. To do so, we model each relationship as a function of a linear combination of latent factors. We learn this latent representation by computing a low-rank tensor decomposition, using quasi-Newton optimization of a weighted objective function. Sparsity in the observed data is captured by the weighted objective, leading to improved accuracy when training data is limited. Exploiting sparsity also improves efficiency, potentially up to an order of magnitude over unweighted approaches. In addition, our framework accommodates arbitrary combinations of smooth, task-specific loss functions, making it better suited for learning different types of relations. For the typical cases of real-valued functions and binary relations, we propose several loss functions and derive the associated parameter gradients. We evaluate our method on synthetic and real data, showing significant improvements in both accuracy and scalability over related factorization techniques.
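A weighted objective with a pluggable loss can be sketched as follows. The sketch assumes a CP-style three-mode model and weights of 1 for observed entries and 0 for missing ones; the paper's exact model and weighting may differ, and the names `cp_predict`, `squared_loss`, `logistic_loss`, and `weighted_objective` are hypothetical.

```python
import math

def cp_predict(A, B, R, i, j, k):
    """Entry (i, j, k) of a CP-style model: sum over factors f of A[i][f] * B[j][f] * R[k][f]."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], R[k]))

def squared_loss(pred, target):
    """Smooth loss for real-valued relations."""
    return 0.5 * (pred - target) ** 2

def logistic_loss(pred, target):
    """Smooth loss for binary relations, with target in {-1, +1}."""
    return math.log1p(math.exp(-target * pred))

def weighted_objective(A, B, R, observed, loss=squared_loss):
    """Sum the chosen loss over observed entries only; missing entries get weight 0."""
    return sum(
        loss(cp_predict(A, B, R, i, j, k), x)
        for (i, j, k), x in observed.items()
    )
```

Because `observed` is a dictionary keyed by `(i, j, k)`, the cost of one evaluation scales with the number of observed entries rather than with the size of the full tensor, and swapping `loss` changes the relation type without touching the rest of the code.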
Transductive Ordinal Regression
Cited by 5 (0 self)
Ordinal regression is commonly formulated as a multiclass problem with ordinal constraints. The challenge of designing accurate classifiers for ordinal regression generally increases with the number of classes involved, due to the large number of labeled patterns that are needed. Ordinal class labels, however, are often costly to calibrate or difficult to obtain. Unlabeled patterns, on the other hand, often exist in much greater abundance and are freely available. To benefit from the abundance of unlabeled patterns, we present a novel transductive learning paradigm for ordinal regression in this paper, namely Transductive Ordinal Regression (TOR). The key challenge of the present study lies in the precise, simultaneous estimation of both the ordinal class labels of the unlabeled data and the decision functions of the ordinal classes. The core elements of the proposed TOR include an objective function that caters to several commonly used loss functions cast in transductive settings, for general ordinal regression. A label swapping scheme that facilitates a strictly monotonic decrease in the objective function value is also introduced. Extensive numerical studies on commonly used benchmark datasets, including a real-world sentiment prediction problem, are then presented to showcase the characteristics and efficacy of the proposed transductive ordinal regression. Further, comparisons to recent state-of-the-art ordinal regression methods demonstrate that the introduced transductive learning paradigm for ordinal regression leads to robust and improved performance.
A Comparison of Different ROC Measures for Ordinal Regression
Cited by 4 (0 self)
Ordinal regression learning has characteristics of both multiclass classification and metric regression because labels take ordered, discrete values. In applications of ordinal regression, the misclassification cost among the classes often differs, and with different misclassification costs the common performance measures are not appropriate. Therefore we extend ROC analysis principles to ordinal regression. We derive an exact expression for the volume under the ROC surface (VUS) spanned by the true positive rates for each class and show its interpretation as the probability that a randomly drawn sequence with one object of each class is correctly ranked. Because the computation of VUS has a huge time complexity, we also propose three approximations to this measure. Furthermore, the properties of VUS and its relationship with the approximations are analyzed by simulation. The results demonstrate that optimizing various measures will lead to different models.
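The probabilistic interpretation of VUS can be made concrete with a brute-force sketch. It enumerates every sequence containing one object from each class and counts the sequences whose scores are strictly increasing in class order. Treating ties as incorrect is an assumption of this sketch, and the name `vus_brute_force` is hypothetical; the paper's exact expression and its approximations are not reproduced here.

```python
from itertools import product

def vus_brute_force(scores_by_class):
    """Fraction of sequences (one object per class, classes in ordinal order)
    whose scores are strictly increasing. Ties count as incorrect (assumption)."""
    total = 0
    correct = 0
    for combo in product(*scores_by_class):
        total += 1
        if all(a < b for a, b in zip(combo, combo[1:])):
            correct += 1
    return correct / total
```

For example, `vus_brute_force([[0.1, 0.4], [0.3, 0.5], [0.6, 0.9]])` examines 2 x 2 x 2 = 8 sequences. The number of sequences is the product of the class sizes, which illustrates why exact computation becomes expensive as the number of classes grows.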
Extracting Information from Informal Communication, 2007
Cited by 4 (0 self)
This thesis focuses on the problem of extracting information from informal communication. Textual informal communication, such as email, bulletin boards and blogs, has become a vast information resource. However, such information is poorly organized and difficult for a computer to understand due to lack of editing and structure. Thus, techniques which work well for formal text, such as newspaper articles, may be insufficient on informal text. One focus of ours is to advance the state of the art for subproblems of the information extraction task. We make contributions to the problems of named entity extraction, coreference resolution and context tracking. We channel our efforts toward methods which are particularly applicable to informal communication. We also consider a type of information which is somewhat unique to informal communication: preferences and opinions. Individuals often express their opinions on products and services in such communication. Others may read these “reviews” to try to predict their own experiences. However, humans do a poor job of aggregating and generalizing large sets of data. We
Dyadic Prediction Using a Latent Feature Log-Linear Model, 2010
Cited by 3 (1 self)
In dyadic prediction, labels must be predicted for pairs (dyads) whose members possess unique identifiers and, sometimes, additional features called side-information. Special cases of this problem include collaborative filtering and link prediction. We present the first model for dyadic prediction that satisfies several important desiderata: (i) labels may be ordinal or nominal, (ii) side-information can be easily exploited if present, (iii) with or without side-information, latent features are inferred for dyad members, (iv) it is resistant to sample-selection bias, (v) it can learn well-calibrated probabilities, and (vi) it can scale to very large datasets. To our knowledge, no existing method satisfies all the above criteria. In particular, many methods assume that the labels are ordinal and ignore side-information when it is present. Experimental results show that the new method is competitive with state-of-the-art methods for the special cases of collaborative filtering and link prediction, and that it makes accurate predictions on nominal data.
1 Limitations of existing dyadic prediction methods
In dyadic prediction, the training set consists of pairs of objects $\{(r_i, c_i)\}_{i=1}^{n}$, called dyads, with associated labels $\{y_i\}_{i=1}^{n}$. The task is to predict labels for unobserved dyads, i.e. for pairs $(r', c')$ that do not appear in the training set. A well-studied special case of dyadic prediction is where the dyads refer to (user, item) pairs,
Fuzzy Operator Trees for Modeling Rating Functions
Cited by 3 (2 self)
We introduce a new method for modeling rating (utility) functions which employs techniques from fuzzy set theory. The main idea is to build a hierarchical model, called a fuzzy operator tree (FOT), by recursively decomposing a rating criterion into subcriteria, and to combine the evaluations of these subcriteria by means of suitable aggregation operators. Apart from the model conception itself, we propose an evolutionary method for model calibration that fits the parameters of an FOT to exemplary ratings. The possibility of adapting an FOT to a given set of data also makes the approach interesting from a machine learning point of view.
On Minimizing the Position Error in Label Ranking, 2007
Cited by 2 (1 self)
Conventional classification learning allows a classifier to make a one-shot decision in order to identify the correct label. However, in many practical applications, the problem is not to give a single estimation, but to make repeated suggestions until the correct target label has been identified. Thus, the learner has to deliver a label ranking, that is, a ranking of all possible alternatives. In this paper, we discuss a loss function, called the position error, which is suitable for evaluating the performance of a label ranking algorithm in this setting. Moreover, we propose “ranking through iterated choice”, a general strategy for extending any multiclass classifier to this scenario. Its basic idea is to reduce label ranking to standard classification by successively predicting a most likely class label and retraining a model on the remaining classes. We demonstrate empirically that this procedure does indeed reduce the position error in comparison with a conventional approach that ranks the classes according to their estimated probabilities. We also address the issue of implementing ranking through iterated choice in a computationally efficient way.
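The two ideas in this abstract can be sketched in a few lines. The sketch takes the position error to be the 1-based position of the true label in the predicted ranking (some formulations may instead count the futile trials before it, which is one less). The callable `train` is a hypothetical stand-in for any multiclass learner that can be retrained on a subset of labels; neither function name comes from the paper.

```python
def position_error(ranking, true_label):
    """1-based position of the true label in a predicted ranking (1 is best).
    Assumption: the error is the position itself, not the number of futile trials."""
    return ranking.index(true_label) + 1

def rank_by_iterated_choice(train, labels, x):
    """Build a ranking by repeatedly predicting the most likely label and
    retraining on the labels that remain.

    `train(label_subset)` is a hypothetical callable returning a classifier
    f(x) -> label restricted to `label_subset`."""
    remaining = list(labels)
    ranking = []
    while remaining:
        classifier = train(remaining)   # retrain on the remaining classes
        top = classifier(x)             # predict a most likely label
        ranking.append(top)
        remaining.remove(top)
    return ranking
```

Written this naively, the procedure retrains once per label, which is the computational cost that an efficient implementation would need to avoid.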
Decision Rule-based Algorithm for Ordinal Classification based on Rank Loss Minimization
Cited by 1 (0 self)
Many classification problems are in fact ordinal in nature, i.e., the class labels are ordered. We introduce a decision rule algorithm, called RankRules, tailored for this type of problem, that is based on minimization of the rank loss. In general, the complexity of rank loss minimization is quadratic with respect to the number of training examples; however, we show that the introduced algorithm works in linear time (plus the time to sort attribute values, which is done once in the preprocessing phase). The rules are built using a boosting approach. The impurity measure used for building single rules is derived using one of four minimization techniques often encountered in boosting. We analyze these techniques, focusing on the trade-off between misclassification and coverage of the rule. RankRules is verified in a computational experiment that shows its competitiveness with other algorithms.
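The quadratic cost mentioned above is easy to see in a naive evaluation of the rank loss, which visits every pair of examples. The sketch below assumes the rank loss counts pairs whose labels are ordered but whose scores are not ordered the same way (ties in score count as errors); this definition and the name `rank_loss_naive` are assumptions, and the linear-time procedure of RankRules is not shown.

```python
def rank_loss_naive(scores, labels):
    """Count pairs (i, j) with labels[i] > labels[j] but scores[i] <= scores[j].
    Runs in O(n^2) time for n examples."""
    loss = 0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j] and scores[i] <= scores[j]:
                loss += 1
    return loss
```

For instance, with scores `[0.1, 0.5, 0.3]` and labels `[1, 2, 3]`, only the pair formed by the second and third examples is misordered.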