Cost-sensitive boosting for classification of imbalanced data
, 2007
Cited by 77 (1 self)
Classification of data with an imbalanced class distribution significantly limits the performance attainable by most standard classifier learning algorithms, which assume a relatively balanced class distribution and equal misclassification costs. The significant difficulty and frequent occurrence of the class imbalance problem indicate the need for extra research efforts. The objective of this paper is to investigate meta-techniques applicable to most classifier learning algorithms, with the aim of advancing the classification of imbalanced data. The AdaBoost algorithm is reported as a successful meta-technique for improving classification accuracy. The insight gained from a comprehensive analysis of the AdaBoost algorithm, in terms of its advantages and shortcomings in tackling the class imbalance problem, leads to the exploration of three cost-sensitive boosting algorithms, which are developed by introducing cost items into the learning framework of AdaBoost. Further analysis shows that one of the proposed algorithms tallies with the stagewise additive modelling in statistics to minimize the cost exponential loss. These boosting algorithms are also studied with respect to their weighting strategies towards different types of samples, and their effectiveness in identifying rare cases through experiments on several real world medical data sets, where the class imbalance problem prevails.
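As a rough illustration of what "introducing cost items into the learning framework of AdaBoost" can look like, the sketch below performs one boosting round in which a per-sample cost multiplies the usual AdaBoost weight. The function name, the cost values, and the choice to place the cost outside the exponent are all assumptions made for illustration; this is not claimed to reproduce any of the three algorithms proposed in the paper.

```python
import math


def cost_sensitive_update(weights, costs, labels, preds):
    """Run one illustrative cost-sensitive boosting round.

    labels and preds hold +1/-1 values; costs are hypothetical per-sample
    cost items (a higher cost marks a sample as more important to get right).
    Returns (alpha, new_weights).
    """
    rows = list(zip(weights, costs, labels, preds))
    # Cost-weighted mass of correctly and incorrectly classified samples.
    correct = sum(c * w for w, c, y, h in rows if y == h)
    wrong = sum(c * w for w, c, y, h in rows if y != h)
    if correct == 0 or wrong == 0:
        raise ValueError("need both correct and misclassified samples")
    alpha = 0.5 * math.log(correct / wrong)
    # Standard AdaBoost exponent, scaled by each sample's cost item.
    raw = [c * w * math.exp(-alpha * y * h) for w, c, y, h in rows]
    total = sum(raw)
    return alpha, [r / total for r in raw]


# One rare positive sample (cost 2.0) that the weak learner misses.
alpha, new_weights = cost_sensitive_update(
    weights=[0.25, 0.25, 0.25, 0.25],
    costs=[2.0, 1.0, 1.0, 1.0],
    labels=[1, -1, -1, -1],
    preds=[-1, -1, -1, -1],
)
```

In this toy round the misclassified rare sample ends up with half of the total weight, so the next weak learner is pushed to focus on it.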
L.C.D.: Ensemble methods for spoken emotion recognition in call-centers. Speech communication 49,
, 2007
Cited by 26 (0 self)
Machine-based emotional intelligence is a requirement for more natural interaction between humans and computer interfaces, and a basic level of accurate emotion perception is needed for computer systems to respond adequately to human emotion. Humans convey emotional information both intentionally and unintentionally via speech patterns. These vocal patterns are perceived and understood by listeners during conversation. This research aims to improve the automatic perception of vocal emotion in two ways. First, we compare two emotional speech data sources: natural, spontaneous emotional speech and acted or portrayed emotional speech. This comparison demonstrates the advantages and disadvantages of both acquisition methods and how these methods affect the end application of vocal emotion recognition. Second, we look at two classification methods which have not been applied in this field: stacked generalisation and unweighted vote. We show how these techniques can yield an improvement over traditional classification methods.
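Of the two ensemble methods named above, unweighted vote is the simpler to sketch: every base classifier casts one equal vote and the most frequent label wins. The function name, the emotion labels, and the tie-breaking rule (first label seen) are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter


def unweighted_vote(predictions):
    """Return the label predicted by the largest number of base classifiers.

    Each prediction counts equally. Ties go to the label seen first,
    which is an arbitrary choice made for this sketch.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    return Counter(predictions).most_common(1)[0][0]


# Three hypothetical base classifiers label the same utterance.
label = unweighted_vote(["angry", "neutral", "angry"])
```

Stacked generalisation differs in that a second-level learner is trained on the base classifiers' outputs rather than counting them equally.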
Exploring Interactions in High-Dimensional Genomic Data: An Overview of Logic Regression, with Applications
- Journal of Multivariate Analysis
, 2004
A systematic prediction of multiple drug-target interactions from chemical, genomic, and pharmacological data
- PLoS One 7: e37608. doi: 10.1371/journal.pone.0037608. PMID: 22666371
, 2012
Cited by 12 (2 self)
In silico prediction of drug-target interactions from heterogeneous biological data can advance our system-level search for drug molecules and therapeutic targets, efforts that have not yet reached full fruition. In this work, we report a systematic approach that efficiently integrates the chemical, genomic, and pharmacological information for drug targeting and discovery on a large scale, based on two powerful methods, Random Forest (RF) and Support Vector Machine (SVM). The performance of the derived models was evaluated and verified with internal five-fold cross-validation and four external independent validations. The optimal models show impressive prediction performance for drug-target interactions, with a concordance of 82.83%, a sensitivity of 81.33%, and a specificity of 93.62%. The consistency of the performances of the RF and SVM models demonstrates the reliability and robustness of the obtained models. In addition, the validated models were employed to systematically predict known/unknown drugs and targets involving the enzymes, ion channels, GPCRs, and nuclear receptors, which can be further mapped to functional ontologies such as target-disease associations and target-target interaction networks. This approach is expected to help fill the existing gap between
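For readers unfamiliar with the three reported measures, the sketch below computes them from the four cells of a binary confusion matrix. It assumes that "concordance" means overall agreement between predicted and true labels (accuracy); the abstract does not define the term, and the counts used here are invented for illustration rather than taken from the paper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity and concordance from confusion counts.

    tp/fn: interacting pairs predicted correctly / missed.
    tn/fp: non-interacting pairs predicted correctly / falsely flagged.
    Concordance is assumed here to be overall accuracy.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    concordance = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, concordance


# Made-up counts for a single validation fold.
sens, spec, conc = binary_metrics(tp=8, fn=2, tn=9, fp=1)
```

In a five-fold cross-validation, these values would typically be computed per held-out fold and then averaged.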
Opcode sequences as representation of executables for data-mining-based unknown malware detection
- INFORMATION SCIENCES 227
, 2013
Cited by 12 (0 self)
Malware can be defined as any type of malicious code that has the potential to harm a computer or network. The volume of malware is growing faster every year and poses a serious global security threat. Consequently, malware detection has become a critical topic in computer security. Currently, signature-based detection is the most widespread method used in commercial antivirus. In spite of the broad use of this method, it can detect malware only after the malicious executable has already caused damage and provided the malware is adequately documented. Therefore, the signature-based method consistently fails to detect new malware. In this paper, we propose a new method to detect unknown malware families. This model is based on the frequency of the appearance of opcode sequences. Furthermore, we describe a technique to mine the relevance of each opcode and assess the frequency of each opcode sequence. In addition, we provide empirical validation that this new method is capable of detecting unknown malware.
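The representation described above rests on how often each opcode sequence appears in an executable. A minimal sketch of that counting step follows, assuming the opcodes have already been extracted by a disassembler; the function name, the sequence length of 2, and the sample opcodes are illustrative choices, and the relevance weighting mentioned in the abstract is not modelled.

```python
from collections import Counter


def opcode_sequence_frequencies(opcodes, n=2):
    """Return the relative frequency of each length-n opcode sequence.

    opcodes is a list of mnemonic strings in program order. Sequences are
    taken with a sliding window of size n.
    """
    grams = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    return {gram: count / len(grams) for gram, count in counts.items()}


# Hypothetical opcode stream from a disassembled executable.
freqs = opcode_sequence_frequencies(["push", "mov", "push", "mov", "call"])
```

The resulting frequency vectors could then be fed to any standard classifier to separate malicious from benign executables.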
A growth model for RNA secondary structures
- Journal of Statistical Mechanics: Theory and Experiment
Cited by 11 (2 self)
A hierarchical model for the growth of planar arch structures for RNA secondary structures is presented, and shown to be equivalent to a tree-growth model. Both models can be solved analytically, giving access to scaling functions for large molecules, and corrections to scaling, checked by numerical simulations of up to 6500 bases. The equivalence of both models should be helpful in understanding more general tree-growth processes. PACS numbers: 87.14.gn, 87.15.bd, 02.10.Ox, 02.50.Ey
Prediction of protein-protein interaction sites by random forest algorithm with mRMR and IFS
- Article ID e43927,
, 2012
Cited by 8 (4 self)
Protein-protein interaction sites are the basis of biomolecule interactions, which are widely used in drug target identification and new drug discovery. Traditional site predictors of protein-protein interaction are mostly based on unbalanced datasets, so the classification results tend toward the negative class, resulting in a lower predictive accuracy for the positive class. A method called RBFIS (radial basis function improved by SMOTE) is presented in the paper to address the problem. The intelligent algorithm SMOTE is used to artificially synthesize the imbalanced datasets of negative sample classes. Simultaneously, the KNN algorithm is utilized to interpolate values between the minority class samples to generate new samples, making the sample data tend to balance as much as possible. Then, an RBF classifier is used to construct the site predictor of protein-protein interaction based on the processed quasi-equilibrium sample sets. The results of experiments indicated that the method improved recall and F-measure of the positive class by 12% and 25%, respectively, compared with traditional methods. Moreover, many rounds of experiments were performed for different combinations of features. It was observed that the key combination of multiple features can more efficiently improve the prediction performance. In conclusion, the studies we have performed show that the proposed method is better for dealing with the imbalanced protein interaction sites.
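The interpolation step described above can be sketched in a few lines: pick a minority-class sample, find its nearest minority neighbours, and place a synthetic sample at a random point on the line segment to one of them. The function names, the value of k, and the toy feature vectors are assumptions for illustration; the paper's exact settings are not given in the abstract.

```python
import math
import random


def nearest_neighbours(x, minority, k):
    """Return the k minority samples closest to x (Euclidean), excluding x."""
    others = [m for m in minority if m is not x]
    return sorted(others, key=lambda m: math.dist(x, m))[:k]


def smote_like_sample(x, minority, k, rng):
    """Create one synthetic sample between x and a random near neighbour."""
    neighbour = rng.choice(nearest_neighbours(x, minority, k))
    gap = rng.random()  # position along the segment, in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)]


# Toy minority class with two features per sample.
minority = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
synthetic = smote_like_sample(minority[0], minority, k=1, rng=random.Random(0))
```

Repeating this for many minority samples raises the minority count until the classes are roughly balanced, after which a classifier such as an RBF network can be trained on the augmented set.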
M.V.: A time series forest for classification and feature extraction. Information Sciences 239(pp
, 2013
Privileged information-based conditional regression forests for facial feature detection
- in IEEE Conf. Automatic Face and Gesture Recognition
, 2013
Cited by 7 (4 self)
In this paper we propose a method that utilises privileged information, that is, information that is available only at the training phase, in order to train Regression Forests for facial feature detection. Our method chooses the split functions at some randomly chosen internal tree nodes according to the information gain calculated from the privileged information, such as head pose or gender. In this way the training patches arrive at leaves that tend to have low variance both in displacements to facial points and in privileged information. At each leaf node, we learn both the probability of the privileged information and regression models conditioned on it. During testing, the marginal probability of privileged information is estimated and the facial feature locations are localised using the appropriate conditional regression models. The proposed model is validated by comparing with very recent methods on two challenging datasets, namely Labelled Faces in the Wild and Labelled Face Parts in the Wild.