
## Multi-instance multi-label learning

Venue: Artificial Intelligence

Citations: 38 (16 self)

### Citations

6250 | LIBSVM: a library for support vector machines, 2001. Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
- Chang, Lin
Citation Context ...the two algorithms at different boosting rounds is shown in Appendix B (Fig. B.1); it can be observed that at those rounds the performance of the algorithms has become stable. Gaussian-kernel Libsvm [16] is used for Step 3a of MimlBoost. The MimlSvm and MimlSvmmi are also realized with Gaussian kernels. The parameter k of MimlSvm is set to be 20% of the number of training images. The performance ...

5222 | Convex Analysis
- Rockafellar
- 1970
Citation Context ...s impossible since ∆1,2 < ∆2,1 and 2φ′(0) + ∑i≠1,2 φ′(f1 − fi) ≤ 2φ′(0) < 0. Thus, we complete the proof. 6.4 Proof of Theorem 10 For a convex function φ we have φ′(x) ≤ φ′(y) for every x ≤ y (Rockafellar, 1997), and the derivative function φ′(x) is continuous for x ∈ R if φ is differentiable and convex. Since φ is nonincreasing, we have φ′(x) ≤ 0 for all x ∈ R, and without loss of generality, we assume φ′(x)...

3416 | UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html
- Blake, Merz
- 1998
Citation Context ...ulti-instance learning data sets are used, including Musk1, Musk2, Elephant, Tiger and Fox. Both Musk1 and Musk2 are drug activity prediction data sets, publicly available at the UCI machine learning repository [8]. Here every bag corresponds to a molecule, while every instance corresponds to a low-energy shape of the molecule [24]. Musk1 contains 47 positive bags and 45 negative bags, and the number of instances ...

2257 | Text categorization with support vector machines: Learning with many relevant features
- Joachims
- 1998
Citation Context ...ity reduction methods have been developed [74,85]. Roughly speaking, earlier approaches to multi-label learning attempt to divide multi-label learning into a number of two-class classification problems [36,72] or transform it into a label ranking problem [27,56], while some later approaches try to exploit the correlation between the labels [43,65,85]. Most studies on multi-label learning focus on text cate...

1652 | Machine learning in automated text categorization
- Sebastiani
- 2002
Citation Context ...describing common characteristics of the class l. Actually, this kind of prototype vector has already shown its usefulness in solving text categorization problems. For example, the Rocchio method [34,59] forms a prototype vector for each class by averaging all the documents (represented by weight vectors) of this class, and then classifies the test document by calculating the dot-products between the weight v...
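The Rocchio scheme described in this context (average each class's weight vectors into a prototype, then score a test document by dot product) can be sketched in a few lines. The function names and the tiny tf-idf-style vectors below are illustrative, not taken from the cited papers:

```python
import numpy as np

def rocchio_prototypes(docs, labels):
    """Average the weight vectors of each class into one prototype vector.

    docs: (n, d) array of document weight vectors (e.g. tf-idf);
    labels: length-n array of class ids. Returns {class: prototype}.
    """
    protos = {}
    for c in np.unique(labels):
        protos[c] = docs[labels == c].mean(axis=0)
    return protos

def rocchio_classify(protos, doc):
    """Assign the class whose prototype has the largest dot product with doc."""
    return max(protos, key=lambda c: float(np.dot(protos[c], doc)))
```

On two toy classes concentrated on different terms, a test document is assigned to the class whose averaged prototype it aligns with most.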

1491 | Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer
- Salton
- 1989
Citation Context ...have been removed from the vocabulary using the Smart stop-list [55]. It has been found that based on document frequency, the dimensionality of the data set can be reduced to 1-10% without loss of effectiveness [73]. Thus, we use the top 2% frequent words, and therefo...

1287 | A comparative study on feature selection in text categorization
- Yang, Pedersen
- 1997
Citation Context ...removed from the vocabulary using the Smart stop-list [55]. It has been found that based on document frequency, the dimensionality of the data set can be reduced to 1-10% without loss of effectiveness [73]. Thus, we use the top 2% frequent words, and therefore each instance is a 243-dimensional feature vector. The parameter configurations of RankSvm, MlSvm and Ml-knn are set in the same way as in Sec...

686 | Learning with Kernels: Support Vector
- Schölkopf, Smola
- 2001
Citation Context ... (15) $\frac{1}{T}\|\hat f\|^2_{\hat H} + \gamma \hat V(\{X_i\}_{i=1}^m, \{Y_i\}_{i=1}^m, \hat f)$. (16) Note that $\|\hat f\|^2_{\hat H}\colon [0,\infty) \to \mathbb{R}$ is a strictly monotonically increasing function. According to the representer theorem (Theorem 4.2 in [57]), each minimizer $\hat f$ of the functional risk in Eq. 16 admits a representation of the form $\hat f(x,t) = \sum_{t=1}^{T}\sum_{i=1}^{m}\big(\beta_{t,i0}\,\hat k((X_i,t),(x,t)) + \sum_{j=1}^{\hat n_i}\beta_{t,ij}\,\hat k((x_{ij},t),(x,t))\big)$, (17) where $\beta_{t,ij}$ ...

653 | An evaluation of statistical approaches to text categorization
- Yang
- 1999
Citation Context ...ity reduction methods have been developed [74,85]. Roughly speaking, earlier approaches to multi-label learning attempt to divide multi-label learning into a number of two-class classification problems [36,72] or transform it into a label ranking problem [27,56], while some later approaches try to exploit the correlation between the labels [43,65,85]. Most studies on multi-label learning focus on text cate...

652 | BoosTexter: A boosting-based system for text categorization
- Schapire, Singer
- 2000
Citation Context ...eously. For example, in text categorization, a document about national education service may cover several predefined topics, such as government and education, indicating the content in the document (Schapire and Singer, 2000); in bioinformatics, a gene sequence may be relevant to multiple functions, such as metabolism, transcription and protein synthesis, showing the functions of the gene within a cell's life cycle (Eli...

604 | Large margin methods for structured and interdependent output variables
- Tsochantaridis, Joachims, et al.
- 2005
Citation Context ...ant, we present an efficient algorithm which constructs a nested sequence of tighter relaxations of the original problem using the cutting plane method [40]. Similar to its use with structured prediction [64], we add a constraint (or a cut) that is most violated by the current solution, and then find the solution in the updated feasible region. Such a procedure will converge to an optimal (or ε-suboptimal) solut...

584 | Max margin markov networks
- Taskar, Guestrin, et al.
- 2003
Citation Context ... learning approaches, such as the boosting algorithm AdaBoost.MH (Schapire and Singer, 2000), the neural network algorithm BP-MLL (Zhang and Zhou, 2006), SVM-style algorithms (Elisseeff and Weston, 2002, Taskar et al., 2004, Hariharan et al., 2010), etc., in essence try to optimize some surrogate losses such as the exponential loss and hinge loss. There are many definitions of consistency, e.g., the Fisher consistency ...

308 | Support Vector Machines for Multiple-Instance Learning
- Andrews, Tsochantaridis, et al.
- 1999
Citation Context ...Relic [54] and Miti [9], neural network algorithms Bp-mip and extensions [77,90] and Rbf-mip [78], rule learning algorithm Ripper-mi [20], support vector machines and kernel methods mi-Svm and Mi-Svm [3], Dd-Svm [18], MissSvm [88], Mi-Kernel [32], Bag-Instance Kernel [19], Marginalized Mi-Kernel [42] and convex-hull method Ch-Fd [31], ensemble algorithms Mi-Ensemble [91], MiBoosting [70] and MilBoosting [6], l...

260 | Solving the multiple-instance problem with axis-parallel rectangles
- Dietterich, Lathrop, et al.
- 1997
Citation Context ...e. The goal is to label unseen bags correctly. Note that although the training bags are labeled, the labels of their instances are unknown. This learning framework was formalized by Dietterich et al. [24] when they were investigating drug activity prediction. Long and Tan [44] studied the Pac-learnability of multi-instance learning and showed that if the instances in the bags are independently drawn f...

246 | Learning multiple tasks with kernel methods
- Evgeniou, Micchelli, et al.
- 2005
Citation Context ...unction in Eq. 6 with an additional term Ω(f): $\Omega(f) + \gamma \cdot V(\{X_i\}_{i=1}^m, \{Y_i\}_{i=1}^m, f)$ (7) Here, γ is a regularization parameter balancing the model complexity Ω(f) and the empirical risk V. Inspired by [28], we assume that the relatedness among the labels can be measured by the mean function $w_0 = \frac{1}{T}\sum_{t=1}^{T} w_t$ (8) The original idea in [28] is to minimize $\sum_{t=1}^{T}\|w_t - w_0\|^2$ and meanwhile minimize $\|w_0\|$...

223 | A kernel method for multi-labelled classification
- Elisseeff, Weston
Citation Context ...and mountains. For learning with such objects, multi-label learning has attracted much attention during the past few years and many effective approaches have been developed (Schapire and Singer, 2000, Elisseeff and Weston, 2002, Zhou and Zhang, 2007, Zhang and Zhou, 2007, Hsu et al., 2009, Dembczyński et al., 2010, Petterson and Caetano, 2010). The consistency (also called Bayes consistency) of learning algorithms concerns...

221 | Sparse greedy matrix approximation for machine learning
- Smola, Schölkopf
- 2000
Citation Context ... sets and the solutions as all zeros (Line 1). Then, instead of testing all the constraints, which is rather expensive when there are lots of constraints, we use the speedup heuristic described in [61], i.e., we use p constraints to approximate the whole constraint set (Line 4). Smola and Schölkopf [61] have shown that when p is larger than 59, the selected violated constraint is with probability 0.95...
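The quoted guarantee has a one-line derivation: the probability that none of p uniformly sampled constraints falls in the top 5% most violated is 0.95^p, which drops below 0.05 once p reaches 59. A small sketch (the function name is ours, not from [61]):

```python
import math

def sample_size_for_top_fraction(top=0.05, confidence=0.95):
    """Smallest p with 1 - (1 - top)**p >= confidence: the number of uniform
    samples needed so that the most violated sampled constraint lies in the
    top `top` fraction of all constraints with the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - top))
```

`sample_size_for_top_fraction()` returns 59, matching the figure quoted from Smola and Schölkopf.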

204 | Learning multi-label scene classification
- Boutell, Luo, et al.
- 2004

191 | Image categorization by learning and reasoning with regions
- Chen, Wang
Citation Context ...nd Miti [9], neural network algorithms Bp-mip and extensions [77,90] and Rbf-mip [78], rule learning algorithm Ripper-mi [20], support vector machines and kernel methods mi-Svm and Mi-Svm [3], Dd-Svm [18], MissSvm [88], Mi-Kernel [32], Bag-Instance Kernel [19], Marginalized Mi-Kernel [42] and convex-hull method Ch-Fd [31], ensemble algorithms Mi-Ensemble [91], MiBoosting [70] and MilBoosting [6], logistic regre...

191 | The alternating decision tree learning algorithm
- Freund, Mason
- 1999
Citation Context ...ining phase, where training examples and their corresponding labels that are hard (easy) to predict get incrementally higher (lower) weights. Later, De Comité et al. [22] used alternating decision trees [30], which are more powerful than the decision stumps used in BoosTexter, to handle multi-label data and thus obtained the AdtBoost.MH algorithm. Probabilistic generative models have been found useful in multi-label le...

177 | Multiple Instance Boosting for Object Detection
- Viola, Platt, et al.
- 2005
Citation Context ...plied to diverse applications including image categorization [17,18], image retrieval [71,84], text categorization [3,60], web mining [86], spam detection [37], computer security [54], face detection [66,76], computer-aided medical diagnosis [31], etc. 3 The MIML Framework. Let X denote the instance space and Y the set of class labels. Then, formally, the MIML task is defined as: ...(a) Traditional supervi...

174 | Multi-Label Text Classification with a Mixture Model Trained by EM
- McCallum
- 1999
Citation Context ...ful than decision stumps used in BoosTexter to handle multi-label data and thus obtained the AdtBoost.MH algorithm. Probabilistic generative models have been found useful in multi-label learning. McCallum [47] proposed a Bayesian approach for multi-label document classification, where a mixture probabilistic model (one mixture component per category) is assumed to generate each document and an EM algorithm...

172 | ML-kNN: a lazy learning approach to multi-label learning. Pattern Recognition 40(7):2038–2048
- Zhang, Zhou
- 2007
Citation Context ...Backpropagation algorithm by employing an error function to capture the fact that the labels belonging to an instance should be ranked higher than those not belonging to that instance. Zhang and Zhou [80] also proposed the Ml-knn algorithm, which identifies the k nearest neighbors of the concerned instance and then assigns labels according to the maximum a posteriori principle. Elisseeff and Weston [2...
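The two steps this context attributes to Ml-knn (find the k nearest neighbors, then decide each label by maximum a posteriori) can be sketched as below. This is a simplified reading of the algorithm with our own function name, brute-force neighbor search, and Laplace smoothing; it is not the authors' implementation:

```python
import numpy as np

def ml_knn_predict(X_train, Y_train, x, k=10, s=1.0):
    """Minimal ML-kNN sketch: k nearest neighbours + per-label MAP decision.

    X_train: (n, d) feature matrix; Y_train: (n, q) binary label matrix;
    x: (d,) query instance; s: smoothing constant.
    """
    n, q = Y_train.shape
    # k nearest neighbours of the query by Euclidean distance
    dists = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dists)[:k]
    # c[l]: number of query neighbours carrying label l
    c = Y_train[nn].sum(axis=0).astype(int)

    y_pred = np.zeros(q, dtype=int)
    for l in range(q):
        # Prior P(label l present), estimated from label frequency (smoothed)
        prior1 = (s + Y_train[:, l].sum()) / (2 * s + n)
        prior0 = 1.0 - prior1
        # Likelihoods estimated by counting, over training instances, how many
        # of their own k neighbours carry label l, split by the instance's label
        kc1 = np.zeros(k + 1)
        kc0 = np.zeros(k + 1)
        for i in range(n):
            d_i = np.linalg.norm(X_train - X_train[i], axis=1)
            d_i[i] = np.inf                      # exclude the point itself
            nn_i = np.argsort(d_i)[:k]
            delta = int(Y_train[nn_i, l].sum())
            if Y_train[i, l] == 1:
                kc1[delta] += 1
            else:
                kc0[delta] += 1
        like1 = (s + kc1[c[l]]) / (s * (k + 1) + kc1.sum())
        like0 = (s + kc0[c[l]]) / (s * (k + 1) + kc0.sum())
        # MAP decision: keep label l iff the "present" posterior wins
        y_pred[l] = int(prior1 * like1 > prior0 * like0)
    return y_pred
```

On two well-separated clusters with opposite label sets, a query near either cluster inherits that cluster's labels.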

168 | The concave-convex procedure
- Yuille, Rangarajan
- 2003
Citation Context ...zation problem can be solved by Cccp [19,62], which is one of the most standard techniques for solving such kinds of non-convex optimization problems. Cccp is guaranteed to converge to a local minimum [75], and in many cases it can even converge to a global solution [25]. Here, for solving optimization problem 21, Cccp works by solving a sequence of convex quadratic problems. Concretely, given the initial...

160 | Hierarchical document categorization with support vector machines
- Cai, Hofmann
- 2004
Citation Context ...rization [22,33,39,47,56,65,74], and several studies aim to improve the performance of text categorization systems by exploiting additional information given by the hierarchical structure of classes [14,15,53] or unlabeled data [43]. In addition to text categorization, multi-label learning has also been found useful in many other tasks such as scene classification [11], image and video annotation [38,48], ...

154 | Multi-Instance Kernels - Gärtner, Flach, et al.

142 | The cutting-plane method for solving convex programs
- Kelley
- 1960
Citation Context ...hat most of the constraints in Eq. 24 are redundant, we present an efficient algorithm which constructs a nested sequence of tighter relaxations of the original problem using the cutting plane method [40]. Similar to its use with structured prediction [64], we add a constraint (or a cut) that is most violated by the current solution, and then find the solution in the updated feasible region. Such a procedure wil...
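Kelley's cutting-plane method cited here as [40] is easy to sketch for a one-dimensional convex objective: each iteration adds the tangent ("cut") at the current point and re-minimizes the piecewise-linear model, which is exactly the add-a-cut-then-re-solve loop the context describes. The grid-based master problem below is our simplification for illustration, not part of the cited algorithm:

```python
import numpy as np

def kelley_minimize(f, grad, lo, hi, tol=1e-6, max_iter=100):
    """Kelley's cutting-plane method on [lo, hi] (a sketch).

    Maintains linear lower bounds ("cuts") f(x_k) + f'(x_k)(x - x_k) and
    repeatedly minimizes their pointwise maximum; since the cut model lower
    bounds f, the gap between the best f value seen and the model minimum
    bounds the remaining error.
    """
    xs = np.linspace(lo, hi, 2001)         # grid for the 1-D master problem
    cuts = []                               # (slope, intercept) of each cut
    x = (lo + hi) / 2.0
    best = f(x)
    for _ in range(max_iter):
        g = grad(x)
        cuts.append((g, f(x) - g * x))      # tangent cut at the current x
        model = np.max([a * xs + b for a, b in cuts], axis=0)
        i = int(np.argmin(model))           # minimizer of the cut model
        x = float(xs[i])
        best = min(best, f(x))
        if best - model[i] <= tol:          # certified ε-suboptimality
            break
    return x, best
```

For f(x) = (x − 1)², the loop homes in on the minimizer x = 1 within a handful of cuts, mirroring the "converge to an optimal (or ε-suboptimal) solution" claim in the context.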

118 | Collective multi- label classification
- Ghamrawi, McCallum
- 2005
Citation Context ...g loss, one-error, coverage and average precision (Schapire and Singer, 2000, Zhang and Zhou, 2006); accuracy, precision, recall and F1 (Godbole and Sarawagi, 2004, Qi et al., 2007); subset accuracy (Ghamrawi and McCallum, 2005); etc. In this paper, we focus on two well-known losses, i.e., ranking loss and hamming loss, and leave the discussion on other losses to future work. Notice that all the above multi-label losses are...

108 | Active Learning
- Settles
- 2012
Citation Context ...are satisfied (i.e., the bag contains instances from every concept). Recently, research on multi-instance clustering [82], multi-instance semi-supervised learning [49] and multi-instance active learning [60] has also been reported. Multi-instance learning has also attracted the attention of the Ilp community. It has been suggested that multi-instance problems could be regarded as a bias on inductive log...

107 | Log-linear models for label ranking
- Dekel, Keshet, et al.
- 2004
Citation Context ...− fi(X)), (4) where φ is a convex real-valued function, which was chosen as the hinge loss φ(x) = (1−x)+ in (Elisseeff and Weston, 2002) and the exponential loss φ(x) = exp(−x) in (Schapire and Singer, 2000, Dekel et al., 2004, Zhang and Zhou, 2006). Before our discussion, it is necessary to introduce the notations $\Delta_{i,j} = \sum_{Y\colon y_i < y_j} p_Y a_Y$ and $\delta(i,j;k,l) = \sum_{Y\colon y_i < y_j,\, y_k < y_l} p_Y a_Y$, for a given vector p ∈ Λ and a non-ne...

100 | Knowledge discovery in multi-label phenotype data
- Clare, King
- 2001
Citation Context ...ext applications. Many other multi-label learning algorithms have been developed, such as decision trees, neural networks, k-nearest neighbor classifiers, support vector machines, etc. Clare and King [21] developed a multi-label version of C4.5 decision trees through modifying the definition of entropy. Zhang and Zhou [79] presented the multi-label neural network Bp-Mll, which is derived from the Backprop...

97 | Discriminative methods for multi-labeled classification
- Godbole, Sarawagi
- 2004
Citation Context ... learning by defining a specific cost function and the corresponding margin for multi-label models. Other kinds of multi-label Svms have been developed by Boutell et al. [11] and Godbole and Sarawagi [33]. In particular, by hierarchically approximating the Bayes optimal classifier for the H-loss, Cesa-Bianchi et al. [15] proposed an algorithm which outperforms simple hierarchical Svms. Recently, non...

95 | Support Vector Machines and the Bayes rule in classification
- Lin
- 2002
Citation Context ...o least square loss, and Breiman (2004) studied the convergence of arcing-style greedy boosting algorithms to the Bayes classifier. The consistency theory on support vector machines was developed in (Lin, 2002; Steinwart, 2005). The most influential and fundamental work (Zhang, 2004b; Bartlett et al., 2006) investigated the consistency for binary classification, in which many popular algorithms (e.g., boos...

93 | Multi-label prediction via compressed sensing
- Hsu, Kakade, et al.
- 2009
Citation Context ...ttracted much attention during the past few years and many effective approaches have been developed (Schapire and Singer, 2000, Elisseeff and Weston, 2002, Zhou and Zhang, 2007, Zhang and Zhou, 2007, Hsu et al., 2009, Dembczyński et al., 2010, Petterson and Caetano, 2010). The consistency (also called Bayes consistency) of learning algorithms concerns whether the expected risk of a learned function converges to ...

91 | Hierarchical multi-label prediction of gene function
- Barutcuoglu, Schapire, et al.
Citation Context ...ata [43]. In addition to text categorization, multi-label learning has also been found useful in many other tasks such as scene classification [11], image and video annotation [38,48], bioinformatics [7,12,13,21,27], and even association rule mining [50,63]. There is a lot of research on multi-instance learning, which studies the problem where a real-world object described by a number of instances is associated ...

90 | Correlative multi-label video annotation
- Qi, Hua, et al.
- 2007
Citation Context ...14,15,53] or unlabeled data [43]. In addition to text categorization, multi-label learning has also been found useful in many other tasks such as scene classification [11], image and video annotation [38,48], bioinformatics [7,12,13,21,27], and even association rule mining [50,63]. There is a lot of research on multi-instance learning, which studies the problem where a real-world object described by a n...

89 | Content-based image retrieval using multiple-instance learning
- Zhang, Goldman, et al.
- 2002
Citation Context ...as a bridge between propositional and relational learning. Multi-instance learning techniques have already been applied to diverse applications including image categorization [17,18], image retrieval [71,84], text categorization [3,60], web mining [86], spam detection [37], computer security [54], face detection [66,76], computer-aided medical diagnosis [31], etc. 3 The MIML Framework. Let X denote the in...

89 | Listwise approach to learning to rank - theory and algorithm
- Xia, Liu, et al.
- 2008
Citation Context ...died for binary classification (Zhang, 2004b, Steinwart, 2005, Bartlett et al., 2006), multi-class classification (Zhang, 2004a, Tewari and Bartlett, 2007), learning to rank (Cossock and Zhang, 2008, Xia et al., 2008, Duchi et al., 2010), etc. The consistency of multi-label learning, however, remains untouched, and to the best of our knowledge, this paper presents the first theoretical analysis on the consistency...

88 | Multilabel neural networks with applications to functional genomics and text categorization
- Zhang, Zhou
- 2006
Citation Context ...s, k-nearest neighbor classifiers, support vector machines, etc. Clare and King [21] developed a multi-label version of C4.5 decision trees through modifying the definition of entropy. Zhang and Zhou [79] presented the multi-label neural network Bp-Mll, which is derived from the Backpropagation algorithm by employing an error function to capture the fact that the labels belonging to an instance should be ...

87 | Parametric mixture models for multi-labeled text
- Ueda, Saito
- 2003
Citation Context ...stic model (one mixture component per category) is assumed to generate each document and an EM algorithm is employed to learn the mixture weights and the word distributions in each mixture component. Ueda and Saito [65] presented another generative approach, which assumes that the multi-label text has a mixture of characteristic words appearing in single-label text belonging to each of the multi-labels. It is noteworthy t...

75 | Kernel methods for missing variables
- Smola, Vishwanathan, et al.
- 2005
Citation Context ... than that of negative examples, this method incorporates a mechanism to deal with class imbalance. We employ the constrained concave-convex procedure (Cccp), which has well-studied convergence properties [62], to solve the resultant non-convex optimization problem. We also present a cutting plane algorithm that finds the solution efficiently. 5.1 The Loss Function. Given a set of MIML training examples {(X1,Y1),(X2,Y2)...

75 | Consistency of support vector machines and other regularized kernel classifiers - Steinwart - 2005

67 | On the consistency of multiclass classification methods - Tewari, Bartlett - 2007

66 | On learning from multi-instance examples: Empirical evaluation of a theoretical approach - Auer - 1997

64 | Supervised versus multiple instance learning: An empirical comparison
- Ray, Craven
- 2005
Citation Context ...nel [32], Bag-Instance Kernel [19], Marginalized Mi-Kernel [42] and convex-hull method Ch-Fd [31], ensemble algorithms Mi-Ensemble [91], MiBoosting [70] and MilBoosting [6], logistic regression algorithm Mi-lr [51], etc. Actually almost all popular machine learning algorithms have their multi-instance versions. Most algorithms attempt to adapt single-instance supervised learning algorithms to the multi-instance ...

63 | Learning with multiple labels
- Jin, Ghahramani
- 2003
Citation Context ...bel learning assumes that an instance can be associated with multiple valid labels, but there is also some work assuming that only one of the labels among those associated with an instance is correct [35]. ... One famous approach to solving multi-label problems is Schapire and Singer's AdaBoost.MH [56], which is an extension of AdaBoost and is the core of a successful multi-label learning system...

62 | Image Database Retrieval with Multiple-Instance Learning Techniques - Yang, Lozano-Perez

57 | A note on learning from multipleinstance examples
- Blum, Kalai
- 1998
Citation Context ... framework is NP-hard. Moreover, they presented a theoretical algorithm that does not require product distribution, which was transformed into a practical algorithm named Multinst [4]. Blum and Kalai [10] described a reduction from Pac-learning under the multi-instance learning framework to Pac-learning with one-sided random classification noise. They also presented an algorithm with smaller sample ...

57 | Correlated label propagation with application to multi-label learning
- Kang, Jin, et al.
- 2006
Citation Context ...14,15,53] or unlabeled data [43]. In addition to text categorization, multi-label learning has also been found useful in many other tasks such as scene classification [11], image and video annotation [38,48], bioinformatics [7,12,13,21,27], and even association rule mining [50,63]. There is a lot of research on multi-instance learning, which studies the problem where a real-world object described by a n...

57 | Bayes optimal multilabel classification via probabilistic classifier chains - Cheng, Dembczyński, et al. - 2010

55 | Logistic regression and boosting for labeled bags of instances
- Xu, Frank
- 2004
Citation Context ...thods mi-Svm and Mi-Svm [3], Dd-Svm [18], MissSvm [88], Mi-Kernel [32], Bag-Instance Kernel [19], Marginalized Mi-Kernel [42] and convex-hull method Ch-Fd [31], ensemble algorithms Mi-Ensemble [91], MiBoosting [70] and MilBoosting [6], logistic regression algorithm Mi-lr [51], etc. Actually almost all popular machine learning algorithms have their multi-instance versions. Most algorithms attempt to adapt single-...

54 | Attribute-value learning versus inductive logic programming: The missing links (extended abstract)
- Raedt
- 1998
Citation Context ...e logic programming, and the multi-instance paradigm could be the key between the propositional and relational representations, being more expressive than the former, and much easier to learn than the latter [23]. Alphonse and Matwin [1] approximated a relational learning problem by a multi-instance problem, fed the resulting data to feature selection techniques adapted from propositional representations, and...

52 | Semi-supervised multilabel learning by constrained non-negative matrix factorization
- Liu, Jin, et al.
- 2006
Citation Context ... for the H-loss, Cesa-Bianchi et al. [15] proposed an algorithm which outperforms simple hierarchical Svms. Recently, non-negative matrix factorization has also been applied to multi-label learning [43], and multi-label dimensionality reduction methods have been developed [74,85]. Roughly speaking, earlier approaches to multi-label learning attempt to divide multi-label learning into a number of two-c...

52 | Multiple instance regression
- Ray, Page
Citation Context ...ears of the research of multi-instance learning, most work considered multi-instance classification with discrete-valued outputs. Later, multi-instance regression with real-valued outputs was studied [2,52], and different versions of generalized multi-instance learning have been defined [58,68]. The main difference between standard multi-instance learning and generalized multi-instance learning is that ...

50 | Multi-label informed latent semantic indexing
- Yu, Yu, et al.
Citation Context ...performs simple hierarchical Svms. Recently, non-negative matrix factorization has also been applied to multi-label learning [43], and multi-label dimensionality reduction methods have been developed [74,85]. Roughly speaking, earlier approaches to multi-label learning attempt to divide multi-label learning into a number of two-class classification problems [36,72] or transform it into a label ranking prob...

48 | Approximating hyper-rectangles: learning and pseudorandom sets
- Auer, Long, et al.
- 1997
Citation Context ...ning and showed that if the instances in the bags are independently drawn from product distribution, the Apr (Axis-Parallel Rectangle) proposed by Dietterich et al. [24] is Pac-learnable. Auer et al. [5] showed that if the instances in the bags are not independent then Apr learning under the multi-instance learning framework is NP-hard. Moreover, they presented a theoretical algorithm that does not r...

48 | Statistical analysis of bayes optimal subset ranking
- Cossock, Zhang
Citation Context ... sufficient training data, and thus rarely adopted in practice. Much work has been devoted to the analysis of consistency for ranking problems under different learning settings, e.g., subset ranking (Cossock and Zhang, 2008), listwise ranking (Xia et al., 2008), top-...

46 | MMAC: A new multi-class, multi-label associative classification approach
- Thabtah, Cowling, et al.
- 2004
Citation Context ...-label learning has also been found useful in many other tasks such as scene classification [11], image and video annotation [38,48], bioinformatics [7,12,13,21,27], and even association rule mining [50,63]. There is a lot of research on multi-instance learning, which studies the problem where a real-world object described by a number of instances is associated with a single class label. Here the traini...

44 | Hierarchical classification: Combining bayes with svm
- Cesa-bianchi, Zaniboni
- 2006
Citation Context ...-label Svms have been developed by Boutell et al. [11] and Godbole and Sarawagi [33]. In particular, by hierarchically approximating the Bayes optimal classifier for the H-loss, Cesa-Bianchi et al. [15] proposed an algorithm which outperforms simple hierarchical Svms. Recently, non-negative matrix factorization has also been applied to multi-label learning [43], and multi-label dimensionality reduct...

44 | Pac learning axis-aligned rectangles with respect to product distributions from multiple-instance examples
- Long, Tan
- 1998
Citation Context ...ining bags are labeled, the labels of their instances are unknown. This learning framework was formalized by Dietterich et al. [24] when they were investigating drug activity prediction. Long and Tan [44] studied the Pac-learnability of multi-instance learning and showed that if the instances in the bags are independently drawn from product distribution, the Apr (Axis-Parallel Rectangle) proposed by D...

43 | Maximal margin labeling for multi-topic text categorization
- Kazawa, Izumitani, et al.
- 2005
Citation Context ...rm it into a label ranking problem [27,56], while some later approaches try to exploit the correlation between the labels [43,65,85]. Most studies on multi-label learning focus on text categorization [22,33,39,47,56,65,74], and several studies aim to improve the performance of text categorization systems by exploiting additional information given by the hierarchical structure of classes [14,15,53] or unlabeled data [43...

40 | Multiple-instance pruning for learning efficient cascade detectors
- Zhang, Viola
- 2007
Citation Context ...plied to diverse applications including image categorization [17,18], image retrieval [71,84], text categorization [3,60], web mining [86], spam detection [37], computer security [54], face detection [66,76], computer-aided medical diagnosis [31], etc. 3 The MIML Framework. Let X denote the instance space and Y the set of class labels. Then, formally, the MIML task is defined as: ...(a) Traditional supervi...

40 | Large scale max-margin multi-label classification with priors
- Hariharan, Zelnik-Manor, et al.
- 2010
Citation Context ... such as the boosting algorithm AdaBoost.MH (Schapire and Singer, 2000), the neural network algorithm BP-MLL (Zhang and Zhou, 2006), SVM-style algorithms (Elisseeff and Weston, 2002, Taskar et al., 2004, Hariharan et al., 2010), etc., in essence try to optimize some surrogate losses such as the exponential loss and hinge loss. There are many definitions of consistency, e.g., the Fisher consistency (Lin, 2002), infinite-sa...

37 | Learning hierarchical multi-category text classifcation models
- Rousu, Saunders, et al.
- 2005
Citation Context ...rization [22,33,39,47,56,65,74], and several studies aim to improve the performance of text categorization systems by exploiting additional information given by the hierarchical structure of classes [14,15,53] or unlabeled data [43]. In addition to text categorization, multi-label learning has also been found useful in many other tasks such as scene classification [11], image and video annotation [38,48], ...

35 | On the relation between multi-instance learning and semi-supervised learning
- Zhou, Xu
- 2007
Citation Context ...eural network algorithms Bp-mip and extensions [77,90] and Rbf-mip [78], rule learning algorithm Ripper-mi [20], support vector machines and kernel methods mi-Svm and Mi-Svm [3], Dd-Svm [18], MissSvm [88], Mi-Kernel [32], Bag-Instance Kernel [19], Marginalized Mi-Kernel [42] and convex-hull method Ch-Fd [31], ensemble algorithms Mi-Ensemble [91], MiBoosting [70] and MilBoosting [6], logistic regression algorith...

34 | A regularization framework for multiple-instance learning
- Cheung, Kwok
(Show Context)
Citation Context ...nsions [77,90] and Rbf-mip [78], rule learning algorithm Ripper-mi [20], support vector machines and kernel methods mi-Svm and Mi-Svm [3], Dd-Svm [18], MissSvm [88], Mi-Kernel [32], Bag-Instance Kernel =-=[19]-=-, Marginalized Mi-Kernel [42] and convex-hull method Ch-Fd [31], ensemble algorithms Mi-Ensemble [91], MiBoosting [70] and MilBoosting [6], logistic regression algorithm Mi-lr [51], etc. Actually almost all p...

34 | Learning single and multiple instance decision trees for computer security applications
- Ruffo
(Show Context)
Citation Context ...rithms have been developed during the past decade. To name a few, Diverse Density [45] and Em-dd [83], k-nearest neighbor algorithms Citation-knn and Bayesian-knn [67], decision tree algorithms Relic =-=[54]-=- and Miti [9], neural network algorithms Bp-mip and extensions [77,90] and Rbf-mip [78], rule learning algorithm Ripper-mi [20], support vector machines and kernel methods mi-Svm and Mi-Svm [3], Dd-Sv... |

34 | Multilabel Dimensionality Reduction via Dependence Maximization
- Zhang, Zhou
(Show Context)
Citation Context ...performs simple hierarchical Svms. Recently, non-negative matrix factorization has also been applied to multi-label learning [43], and multi-label dimensionality reduction methods have been developed =-=[74,85]-=-. Roughly speaking, earlier approaches to multi-label learning attempt to divide multi-label learning into a number of two-class classification problems [36,72] or transform it into a label ranking prob...
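
The decomposition strategy mentioned in this context, turning multi-label learning into one two-class problem per label, can be sketched as a minimal binary-relevance illustration. The threshold "classifier", the data, and all names below are ours for illustration only, not from the cited work:

```python
# Minimal binary-relevance sketch: decompose multi-label learning into
# one two-class problem per label, then union the per-label predictions.

def train_binary_relevance(examples, labels_per_example, label_set, fit):
    """Train one binary classifier per label.

    examples: list of feature vectors
    labels_per_example: list of label sets, aligned with examples
    fit: callable (examples, 0/1 targets) -> predict function
    """
    models = {}
    for label in label_set:
        targets = [1 if label in ys else 0 for ys in labels_per_example]
        models[label] = fit(examples, targets)
    return models

def predict_binary_relevance(models, x):
    """Predicted label set = all labels whose binary model fires."""
    return {label for label, model in models.items() if model(x) == 1}

# Toy 'fit': an interval rule on the first feature, chosen per label.
def toy_fit(examples, targets):
    pos = [x[0] for x, t in zip(examples, targets) if t == 1]
    if not pos:
        return lambda x: 0
    lo, hi = min(pos), max(pos)
    return lambda x: 1 if lo <= x[0] <= hi else 0

X = [[0.1], [0.2], [0.8], [0.9]]
Y = [{"cat"}, {"cat"}, {"dog"}, {"dog", "cat"}]
models = train_binary_relevance(X, Y, {"cat", "dog"}, toy_fit)
```

The same decomposition underlies many of the earlier approaches cited here; the later approaches differ mainly in modeling correlations that this per-label split ignores.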

31 | A unified model for multilabel classification and ranking
- Brinker, Fürnkranz, et al.
- 2006
(Show Context)
Citation Context ...ata [43]. In addition to text categorization, multi-label learning has also been found useful in many other tasks such as scene classification [11], image and video annotation [38,48], bioinformatics =-=[7,12,13,21, 27]-=-, and even association rule mining [50,63]. There is a lot of research on multi-instance learning, which studies the problem where a real-world object described by a number of instances is associated ... |

31 | MILES: Multiple-instance learning via embedded instance selection
(Show Context)
Citation Context ...tions have been introduced [29]. For example, in contrast to assuming that there is a key instance, some work has assumed that there is no key instance and every instance contributes to the bag label =-=[17,70]-=-. There is also an argument that the instances in the bags should not be treated independently [88]. All those assumptions have been put under the umbrella of multi-instance learning, and generally, i... |

31 | A framework for learning rules from multiple instance data
- Chevaleyre, Zucker
- 2001
(Show Context)
Citation Context ...ithms Citation-knn and Bayesian-knn [67], decision tree algorithms Relic [54] and Miti [9], neural network algorithms Bp-mip and extensions [77,90] and Rbf-mip [78], rule learning algorithm Ripper-mi =-=[20]-=-, support vector machines and kernel methods mi-Svm and Mi-Svm [3], Dd-Svm [18], MissSvm [88], Mi-Kernel [32], Bag-Instance Kernel [19], Marginalized Mi-Kernel [42] and convex-hull method Ch-Fd [31], ensembl...

31 | On the consistency of ranking algorithms
- Duchi, Mackey, et al.
- 2010
(Show Context)
Citation Context ... the expected risk of a learned function converges to the Bayes risk as the training sample size increases (Lin, 2002, Zhang, 2004b, Steinwart, 2005, Bartlett et al., 2006, Tewari and Bartlett, 2007, =-=Duchi et al., 2010-=-). Nowadays, it is well-accepted that a good learner should at least be consistent with large samples. It is noteworthy that, though many efforts have been devoted to multi-label learning, few theoret... |

30 | On generalized multiple-instance learning
- Scott, Zhang, et al.
- 2003
(Show Context)
Citation Context ...ssification with discrete-valued outputs. Later, multi-instance regression with real-valued outputs was studied [2,52], and different versions of generalized multi-instance learning have been defined =-=[58,68]-=-. The main difference between standard multi-instance learning and generalized multi-instance learning is that in standard multi-instance learning there is a single concept, and a bag is positive if i... |

29 | On multi-class cost-sensitive learning
- Zhou, Liu
- 2010
(Show Context)
Citation Context ..., the imbalance rate of y is: ibr(y) = (∑_{i=1, y∈Yi}^{m} ni/|Yi|) × (1 / ∑_{i=1}^{m} ni) = ∑_{i=1, y∈Yi}^{m} ni/(n×|Yi|), where n = ∑_{i=1}^{m} ni. There are many class-imbalance learning methods [69]. One of the most popular and effective methods is rescaling =-=[87]-=-, which can be incorporated into our framework easily. In short, after obtaining the estimated imbalance rate for every class label, we can use these rates to modulate the loss caused by different mis...

29 | Reverse multi-label learning
- Petterson, Caetano
- 2010
(Show Context)
Citation Context ...ears and many effective approaches have been developed (Schapire and Singer, 2000, Elisseeff and Weston, 2002, Zhou and Zhang, 2007, Zhang and Zhou, 2007, Hsu et al., 2009, Dembczyński et al., 2010, =-=Petterson and Caetano, 2010-=-). The consistency (also called Bayes consistency) of learning algorithms concerns whether the expected risk of a learned function converges to the Bayes risk as the training sample size increases (Li... |

28 | Multiple instance learning for computer aided diagnosis
- Fung, Dundar, et al.
- 2007
(Show Context)
Citation Context ...pper-mi [20], support vector machines and kernel methods mi-Svm and Mi-Svm [3], Dd-Svm [18], MissSvm [88], Mi-Kernel [32], Bag-Instance Kernel [19], Marginalized Mi-Kernel [42] and convex-hull method Ch-Fd =-=[31]-=-, ensemble algorithms Mi-Ensemble [91], MiBoosting [70] and MilBoosting [6], logistic regression algorithm Mi-lr [51], etc. Actually almost all popular machine learning algorithms have their multi-instance...

26 | Multiple-instance learning of real-valued data
- Amar, Dooly, et al.
- 2001
(Show Context)
Citation Context ...ears of the research of multi-instance learning, most work considered multi-instance classification with discrete-valued outputs. Later, multi-instance regression with real-valued outputs was studied =-=[2,52]-=-, and different versions of generalized multi-instance learning have been defined [58,68]. The main difference between standard multi-instance learning and generalized multi-instance learning is that ... |

26 | Ensembles of multiinstance learners
- Zhou, Zhang
(Show Context)
Citation Context ...es and kernel methods mi-Svm and Mi-Svm [3], Dd-Svm [18], MissSvm [88], Mi-Kernel [32], Bag-Instance Kernel [19], Marginalized Mi-Kernel [42] and convex-hull method Ch-Fd [31], ensemble algorithms Mi-Ensemble =-=[91]-=-, MiBoosting [70] and MilBoosting [6], logistic regression algorithm Mi-lr [51], etc. Actually almost all popular machine learning algorithms have their multi-instance versions. Most algorithms attempt 

24 | Multi-instance clustering with applications to multi-instance prediction
- Zhang, Zhou
- 2009
(Show Context)
Citation Context ...ing [58,68] there are multiple concepts, and a bag is positive only when all concepts are satisfied (i.e., the bag contains instances from every concept). Recently, research on multi-instance clustering =-=[82]-=-, multi-instance semi-supervised learning [49] and multi-instance active learning [60] have also been reported. Multi-instance learning has also attracted the attention of the Ilp community. It has be...

21 | Learning about individuals from group statistics
- KÜCK, FREITAS
- 2005
(Show Context)
Citation Context ... in MIML problems. We can roughly estimate the imbalance rate, which is the ratio of the number of positive instances to that of negative instances, for each class label using the strategy adopted by =-=[41]-=-. In detail, for a specific label y ∈ Y, we can divide the training bags {(X1,Y1),(X2,Y2),··· ,(Xm,Ym)} into two subsets, A1 = {(Xi,Yi)|y ∈ Yi} and A2 = {(Xi,Yi)|y /∈ Yi}. It is obvious that all the i... |
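
The estimation strategy described in this context, weighting each of bag Xi's ni instances by 1/|Yi| and normalizing by the total instance count, can be put into a short sketch. Variable names and toy data below are ours; only the weighting scheme follows the text:

```python
# Sketch of the per-label imbalance-rate estimate: each of bag Xi's
# ni instances carries weight 1/|Yi|, and ibr(y) sums those weights over
# bags whose label set contains y, normalized by the total instance count n.

def imbalance_rate(bags, label):
    """bags: list of (num_instances, label_set) pairs."""
    n = sum(ni for ni, _ in bags)                        # total instances
    weighted = sum(ni / len(ys) for ni, ys in bags if label in ys)
    return weighted / n

# Toy data: three bags with 3, 2 and 5 instances.
bags = [(3, {"a"}), (2, {"a", "b"}), (5, {"b"})]
```

For these bags, label "a" receives weight 3/1 + 2/2 = 4 out of 10 instances, so ibr("a") = 0.4, and likewise ibr("b") = 0.6.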

20 | A boosting approach to multiple instance learning
- Auer, Ortner
- 2004
(Show Context)
Citation Context ...vm [3], Dd-Svm [18], MissSvm [88], Mi-Kernel [32], Bag-Instance Kernel [19], Marginalized Mi-Kernel [42] and convex-hull method Ch-Fd [31], ensemble algorithms Mi-Ensemble [91], MiBoosting [70] and MilBoosting =-=[6]-=-, logistic regression algorithm Mi-lr [51], etc. Actually almost all popular machine learning algorithms have their multi-instance versions. Most algorithms attempt to adapt single-instance supervised ...

19 | Case-based multilabel ranking
- Brinker, Hüllermeier
- 2007
(Show Context)
Citation Context ...ata [43]. In addition to text categorization, multi-label learning has also been found useful in many other tasks such as scene classification [11], image and video annotation [38,48], bioinformatics =-=[7,12,13,21, 27]-=-, and even association rule mining [50,63]. There is a lot of research on multi-instance learning, which studies the problem where a real-world object described by a number of instances is associated ... |

19 | Multi-instance learning based web mining
- Zhou, Jiang, et al.
- 2005
(Show Context)
Citation Context ... learning. Multi-instance learning techniques have already been applied to diverse applications including image categorization [17,18], image retrieval [71,84], text categorization [3,60], web mining =-=[86]-=-, spam detection [37], computer security [54], face detection [66,76], computer-aided medical diagnosis [31], etc. 3 The MIML Framework Let X denote the instance space and Y the set of class labels. T... |

19 | Solving multi-instance problems with classifier ensemble based on constructive clustering
- Zhou, Zhang
(Show Context)
Citation Context ...iscrimination on instances to discrimination on bags [91]. Recently there is some proposal on adapting the multi-instance representation to single-instance algorithms by representation transformation =-=[93]-=-. It is worth mentioning that standard multi-instance learning [24] assumes that if a bag contains a positive instance then the bag is positive; this implies that there exists a key instance in a posi... |

18 | Multi-label learning by instance differentiation
- Zhang, Zhou
- 2007
(Show Context)
Citation Context ... MimlSvm, there are also many alternatives to realize the second step. For example, by using mi-Svm [3] to replace the MiBoosting used in MimlBoost and by using the two-layer neural network structure =-=[81]-=- to replace the MlSvm used in MimlSvm, we get MimlSvmmi and MimlNn respectively. Their performance is also evaluated in our experiments. We compare the MIML algorithms with several state-of-the-art al... |

18 | Em-dd: An improved multi-instance learning technique
- Zhang, Goldman
(Show Context)
Citation Context ...ller sample complexity than that of the algorithm of Auer et al. [5]. Many multi-instance learning algorithms have been developed during the past decade. To name a few, Diverse Density [45] and Em-dd =-=[83]-=-, k-nearest neighbor algorithms Citation-knn and Bayesian-knn [67], decision tree algorithms Relic [54] and Miti [9], neural network algorithms Bp-mip and extensions [77,90] and Rbf-mip [78], rule lea... |

17 | Multi-instance tree learning
- Blockeel, Page, et al.
(Show Context)
Citation Context ...en developed during the past decade. To name a few, Diverse Density [45] and Em-dd [83], k-nearest neighbor algorithms Citation-knn and Bayesian-knn [67], decision tree algorithms Relic [54] and Miti =-=[9]-=-, neural network algorithms Bp-mip and extensions [77,90] and Rbf-mip [78], rule learning algorithm Ripper-mi [20], support vector machines and kernel methods mi-Svm and Mi-Svm [3], Dd-Svm [18], MissS... |

15 | Marginalized multiinstance kernels
- Kwok, Cheung
(Show Context)
Citation Context ...78], rule learning algorithm Ripper-mi [20], support vector machines and kernel methods mi-Svm and Mi-Svm [3], Dd-Svm [18], MissSvm [88], Mi-Kernel [32], Bag-Instance Kernel [19], Marginalized Mi-Kernel =-=[42]-=- and convex-hull method Ch-Fd [31], ensemble algorithms Mi-Ensemble [91], MiBoosting [70] and MilBoosting [6], logistic regression algorithm Mi-lr [51], etc. Actually almost all popular machine learning algo...

14 | Neural networks for multiinstance learning
- Zhou, Zhang
- 2002
(Show Context)
Citation Context ...iverse Density [45] and Em-dd [83], k-nearest neighbor algorithms Citation-knn and Bayesian-knn [67], decision tree algorithms Relic [54] and Miti [9], neural network algorithms Bp-mip and extensions =-=[77,90]-=- and Rbf-mip [78], rule learning algorithm Ripper-mi [20], support vector machines and kernel methods mi-Svm and Mi-Svm [3], Dd-Svm [18], MissSvm [88], Mi-Kernel[32], Bag-Instance Kernel[19], Marginal... |

13 | A review of multi-instance learning assumptions
- Foulds, Frank
(Show Context)
Citation Context ... instance; many Svm algorithms defined the margin of a positive bag by the margin of its most positive instance [3,19]. As the research of multi-instance learning goes on, however, some other assumptions have been introduced =-=[29]-=-. For example, in contrast to assuming that there is a key instance, some work has assumed that there is no key instance and every instance contributes to the bag label [17,70]. There is also an argum...

11 | A multiple instance learning strategy for combating good word attacks on spam filters
- Jorgensen, Zhou, et al.
- 2008
(Show Context)
Citation Context ...ance learning techniques have already been applied to diverse applications including image categorization [17,18], image retrieval [71,84], text categorization [3,60], web mining [86], spam detection =-=[37]-=-, computer security [54], face detection [66,76], computer-aided medical diagnosis [31], etc. 3 The MIML Framework Let X denote the instance space and Y the set of class labels. Then, formally, the MI... |

11 | Improve multi-instance neural networks through feature selection
- Zhang, Zhou
- 2004
(Show Context)
Citation Context ...iverse Density [45] and Em-dd [83], k-nearest neighbor algorithms Citation-knn and Bayesian-knn [67], decision tree algorithms Relic [54] and Miti [9], neural network algorithms Bp-mip and extensions =-=[77,90]-=- and Rbf-mip [78], rule learning algorithm Ripper-mi [20], support vector machines and kernel methods mi-Svm and Mi-Svm [3], Dd-Svm [18], MissSvm [88], Mi-Kernel[32], Bag-Instance Kernel[19], Marginal... |

10 | Mining with rarity: problems and solutions: a unifying framework
- Weiss
- 2004
(Show Context)
Citation Context ... 1/|Yi|, where |Yi| returns the number of labels in Yi. Thus, the imbalance rate of y is: ibr(y) = (∑_{i=1, y∈Yi}^{m} ni/|Yi|) × (1 / ∑_{i=1}^{m} ni) = ∑_{i=1, y∈Yi}^{m} ni/(n×|Yi|), where n = ∑_{i=1}^{m} ni. There are many class-imbalance learning methods =-=[69]-=-. One of the most popular and effective methods is rescaling [87], which can be incorporated into our framework easily. In short, after obtaining the estimated imbalance rate for every class label, we can u...

10 | Adapting rbf neural networks to multi-instance learning
- Zhang, Zhou
- 2006
(Show Context)
Citation Context ...and Em-dd [83], k-nearest neighbor algorithms Citation-knn and Bayesian-knn [67], decision tree algorithms Relic [54] and Miti [9], neural network algorithms Bp-mip and extensions [77,90] and Rbf-mip =-=[78]-=-, rule learning algorithm Ripper-mi [20], support vector machines and kernel methods mi-Svm and Mi-Svm [3], Dd-Svm [18], MissSvm [88], Mi-Kernel[32], Bag-Instance Kernel[19], Marginalized Mi-Kernel[42... |

7 | Filtering multi-instance problems to reduce dimensionality in relational learning
- Alphonse, Matwin
(Show Context)
Citation Context ...ulti-instance paradigm could be the key between the propositional and relational representations, being more expressive than the former, and much easier to learn than the latter [23]. Alphonse and Matwin =-=[1]-=- approximated a relational learning problem by a multi-instance problem, fed the resulting data to feature selection techniques adapted from propositional representations, and then transformed the filt...

6 | Solving the multi-instance problem: a lazy learning approach
- Wang, Zucker
- 2000
(Show Context)
Citation Context ...5]. Many multi-instance learning algorithms have been developed during the past decade. To name a few, Diverse Density [45] and Em-dd [83], k-nearest neighbor algorithms Citation-knn and Bayesian-knn =-=[67]-=-, decision tree algorithms Relic [54] and Miti [9], neural network algorithms Bp-mip and extensions [77,90] and Rbf-mip [78], rule learning algorithm Ripper-mi [20], support vector machines and kernel... |

2 | optimization algorithm for solving the trust-region subproblem
- C
- 1998
(Show Context)
Citation Context ...he most standard techniques to solve such kind of non-convex optimization problems. Cccp is guaranteed to converge to a local minimum [75], and in many cases it can even converge to a global solution =-=[25]-=-. Here, for solving the optimization problem 21, Cccp works by solving a sequence of convex quadratic problems. Concretely, given the initial subgradient ∑_{j=1}^{ni} ρ_{ij}^t k′_I(x_{ij}) α^t of max_{j=1,···,ni} k′_I(x_{ij}) α...
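
The general Cccp (concave-convex procedure) idea referred to in this context, minimizing a convex-plus-concave objective by repeatedly linearizing the concave part and solving the resulting convex subproblem, can be illustrated on a one-dimensional toy function. This is a generic sketch of the procedure, not the paper's quadratic subproblem; the objective and names are ours:

```python
# Toy CCCP sketch: minimize f(x) = u(x) + v(x) with u convex and v concave
# by iterating: linearize v at the current point, minimize the convex surrogate.
# Here f(x) = x**4 - 2*x**2, so u(x) = x**4 (convex), v(x) = -2*x**2 (concave).

def cccp(x0, iters=50):
    x = x0
    for _ in range(iters):
        # Linearize v: v'(x) = -4x, giving the surrogate z**4 - 4*x*z.
        # Its minimizer satisfies 4*z**3 - 4*x = 0, i.e. z = x**(1/3) for x > 0,
        # so the convex subproblem has a closed-form solution here.
        x = x ** (1.0 / 3.0)
    return x

x_star = cccp(2.0)  # iterates descend toward the local minimum at x = 1
```

Each iteration solves a convex problem whose objective upper-bounds f, which is why the sequence of objective values never increases and the method converges to a local minimum, as noted above.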

1 | A kernel method for multi-labelled classification
- Elisseeff, Weston
- 2002
(Show Context)
Citation Context ...0] also proposed the Ml-knn algorithm, which identifies the k nearest neighbors of the concerned instance and then assigns labels according to the maximum a posteriori principle. Elisseeff and Weston =-=[27]-=- proposed the RankSvm algorithm for multi-label learning by defining a specific cost function and the corresponding margin for multi-label models. Other kinds of multi-label Svms have been developed b... |
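
The Ml-knn idea sketched in this context, finding the k nearest neighbors and then assigning each label by a maximum a posteriori rule on neighbor counts, can be compressed into a short illustration. The smoothing constant, toy data, and helper names below are ours, and details of the published algorithm are simplified:

```python
# Simplified ML-kNN-style sketch: per label, decide membership by a MAP rule
# on how many of the k nearest neighbors carry that label, with priors and
# likelihoods estimated from the training set (Laplace-smoothed by s).
from collections import Counter

def neighbor_count(X, Y, lab, x, k, exclude=None):
    """How many of x's k nearest training examples carry label `lab`."""
    idx = sorted((i for i in range(len(X)) if i != exclude),
                 key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))[:k]
    return sum(lab in Y[i] for i in idx)

def mlknn_predict(X, Y, labels, x, k=2, s=1.0):
    pred, m = set(), len(X)
    for lab in labels:
        prior1 = (s + sum(lab in ys for ys in Y)) / (2 * s + m)
        prior0 = 1.0 - prior1
        # Leave-one-out histograms of neighbor counts, split by membership.
        hist1, hist0 = Counter(), Counter()
        for j in range(m):
            cj = neighbor_count(X, Y, lab, X[j], k, exclude=j)
            (hist1 if lab in Y[j] else hist0)[cj] += 1
        c = neighbor_count(X, Y, lab, x, k)
        like1 = (s + hist1[c]) / (s * (k + 1) + sum(hist1.values()))
        like0 = (s + hist0[c]) / (s * (k + 1) + sum(hist0.values()))
        if prior1 * like1 > prior0 * like0:   # maximum a posteriori decision
            pred.add(lab)
    return pred

X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
Y = [{"a"}, {"a"}, {"a"}, {"b"}, {"b"}, {"b"}]
```

On this toy data a query near 0 gets label "a" and one near 1 gets label "b", since the MAP rule weighs the neighbor count for each label against its leave-one-out statistics.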

1 | MISSL: Multiple-instance semi-supervised learning
- Goldman
- 2006
(Show Context)
Citation Context ... bag is positive only when all concepts aresatisfied (i.e., thebag contains instances fromevery concept). Recently, research on multi-instance clustering [82], multi-instance semi-supervised learning =-=[49]-=- and multi-instance active learning [60] have also been reported. Multi-instance learning has also attracted the attention of the Ilp community. It has been suggested that multi-instance problems coul... |

1 | Multi-label associative classification of medical documents from MEDLINE
- Reformat
- 2005
(Show Context)
Citation Context ...-label learning has also been found useful in many other tasks such as scene classification [11], image and video annotation [38,48], bioinformatics [7,12,13,21, 27], and even association rule mining =-=[50,63]-=-. There is a lot of research on multi-instance learning, which studies the problem where a real-world object described by a number of instances is associated with a single class label. Here the traini... |