A review of explanation and explanation in case-based reasoning
- Department of Computer Science. Trinity
, 2003
Applying machine learning to Chinese temporal relation resolution
- Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics
, 2004
Cited by 15 (0 self)
Temporal relation resolution involves extraction of temporal information explicitly or implicitly embedded in a language. This information is often inferred from a variety of interactive grammatical and lexical cues, especially in Chinese. For this purpose, inter-clause relations (temporal or otherwise) in a multiple-clause sentence play an important role. In this paper, a computational model based on machine learning and heterogeneous collaborative bootstrapping is proposed for analyzing temporal relations in a Chinese multiple-clause sentence. The model makes use of the fact that events are represented in different temporal structures. It takes into account the effects of linguistic features such as tense/aspect, temporal connectives, and discourse structures. A set of experiments has been conducted to investigate how linguistic features could affect temporal relation resolution.
Opcode sequences as representation of executables for data-mining-based unknown malware detection
- INFORMATION SCIENCES 227
, 2013
Cited by 12 (0 self)
Malware can be defined as any type of malicious code that has the potential to harm a computer or network. The volume of malware is growing faster every year and poses a serious global security threat. Consequently, malware detection has become a critical topic in computer security. Currently, signature-based detection is the most widespread method used in commercial antivirus. In spite of the broad use of this method, it can detect malware only after the malicious executable has already caused damage and provided the malware is adequately documented. Therefore, the signature-based method consistently fails to detect new malware. In this paper, we propose a new method to detect unknown malware families. This model is based on the frequency of the appearance of opcode sequences. Furthermore, we describe a technique to mine the relevance of each opcode and assess the frequency of each opcode sequence. In addition, we provide empirical validation that this new method is capable of detecting unknown malware.
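The abstract's central quantity, the frequency with which opcode sequences appear, can be illustrated with a minimal sketch. It only counts fixed-length opcode sequences in an already disassembled listing and is not the authors' implementation; the function name `opcode_ngram_frequencies`, the sequence length `n`, and the sample opcodes are all assumptions made for illustration, and the relevance weighting mentioned in the abstract is not modelled.

```python
from collections import Counter


def opcode_ngram_frequencies(opcodes, n=2):
    """Return the relative frequency of each length-n opcode sequence.

    `opcodes` is a list of mnemonic strings, assumed to come from a
    disassembler; this sketch does not perform disassembly itself.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the opcode list.
    ngrams = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    total = len(ngrams)
    if total == 0:
        return {}
    counts = Counter(ngrams)
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical opcode listing, invented for this example.
sample = ["push", "mov", "push", "mov", "call"]
freqs = opcode_ngram_frequencies(sample, n=2)
```

A vector of such frequencies, one entry per opcode sequence, is the kind of representation a classifier could then be trained on.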
Improving the Performance Stability of Inductive Expert Systems Under Input Noise
- Information Systems Research
, 1995
Cited by 6 (2 self)
Inductive expert systems typically operate with imperfect or noisy input attributes. We study design differences in inductive expert systems arising from implicit versus explicit handling of input noise. Most previous approaches use an implicit approach wherein inductive expert systems are constructed using input data of quality comparable to problems the system will be called upon to solve. We develop an explicit algorithm (ID3ecp) that uses a clean (without input errors) training set and an explicit measure of the input noise level and compare it to a traditional implicit algorithm, ID3p (the ID3 algorithm with the pessimistic pruning procedure). The novel feature of the explicit algorithm is that it injects noise in a controlled rather than random manner in order to reduce the performance variance due to noise. We show analytically that the implicit algorithm has the same expected partitioning behavior as the explicit algorithm. In contrast, however, the partitioning behavior of the explicit algorithm is shown to be more stable (i.e., lower variance) than the implicit algorithm. To extend the analysis to the predictive performance of the algorithms, a set of simulation experiments is described in which the average performance and coefficient of variation of performance of both algorithms are studied on real and artificial data sets. The experimental results confirm the analytical results and demonstrate substantial differences in stability of performance between the algorithms especially as the noise level increases.
Cluster-based algorithms for filling missing values
- 6th Pacific-Asia Conf. Knowledge Discovery and Data Mining, Lecture Notes in Artificial Intelligence 2336
, 2002
Cited by 5 (1 self)
We first survey existing methods to deal with missing values and report the results of an experimental comparative evaluation in terms of their processing cost and quality of imputing missing values. We then propose three cluster-based mean-and-mode algorithms to impute missing values. Experimental results show that these algorithms, which have linear complexity, can achieve quality comparable to that of sophisticated algorithms and are therefore applicable to large datasets.
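The general idea of mean-and-mode imputation within clusters can be sketched as follows. This is a minimal illustration under stated assumptions, not any of the paper's three algorithms: the cluster labels are assumed to be given already, missing values are assumed to be encoded as `None`, and the function name `impute_by_cluster` and the toy rows are invented for the example.

```python
from collections import Counter, defaultdict
from statistics import mean


def impute_by_cluster(rows, clusters):
    """Fill None values with the cluster mean (numeric columns)
    or the cluster mode (all other columns).

    `clusters[i]` is the cluster label of `rows[i]`; how the labels were
    obtained is outside the scope of this sketch.
    """
    by_cluster = defaultdict(list)
    for row, label in zip(rows, clusters):
        by_cluster[label].append(row)

    filled = []
    for row, label in zip(rows, clusters):
        new_row = list(row)  # copy so the input is not modified
        for col, value in enumerate(row):
            if value is not None:
                continue
            observed = [r[col] for r in by_cluster[label] if r[col] is not None]
            if not observed:
                continue  # no observed value in this cluster; leave the gap
            if all(isinstance(v, (int, float)) for v in observed):
                new_row[col] = mean(observed)
            else:
                new_row[col] = Counter(observed).most_common(1)[0][0]
        filled.append(new_row)
    return filled


# Toy data invented for this example: one numeric and one categorical column.
rows = [
    [1.0, "a"],
    [3.0, None],
    [None, "a"],
    [10.0, "b"],
    [None, "b"],
]
clusters = [0, 0, 0, 1, 1]
result = impute_by_cluster(rows, clusters)
```

Each row is visited once per missing cell, so the cost grows with the number of rows and the size of each cluster; precomputing per-cluster means and modes would be the natural next step for large datasets.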
Enzyme function prediction with interpretable models
- In Computational Systems Biology, Humana Press (R. Samudrala, J. McDermott, R. Bumgarner, Editors)
Cited by 4 (0 self)
Enzymes play central roles in metabolic pathways and the prediction of metabolic pathways in newly sequenced genomes usually starts with the assignment of genes to enzymatic reactions. However, genes with similar catalytic activity are not necessarily similar in sequence, and therefore the traditional sequence similarity-based approach often fails to identify the relevant enzymes, thus hindering efforts to map the metabolome of an organism. Here we study the direct relationship between basic protein properties and their function. Our goal is to develop a new tool for functional prediction (e.g. prediction of Enzyme Commission number) that can be used to complement and support other techniques based on sequence or structure information. In order to define this mapping we collected a set of 453 features and properties that characterize proteins and are believed to be related to structural and functional aspects of proteins. We introduce a mixture model of stochastic decision trees to learn the set of potentially complex relationships between features and function. To study these correlations, trees are created and tested on the Pfam classification of proteins, which is based on sequence, and the EC classification, which is based on enzymatic function. The model is very effective in learning highly diverged protein families or families that are not defined on the basis of sequence. The resulting tree structures highlight the properties that are strongly correlated with structural and functional aspects of protein families, and can be used to suggest a concise definition of a protein family.
A counter example to the stronger version of the binary tree hypothesis
- IN ECML-95 WORKSHOP ON STATISTICS, MACHINE LEARNING, AND KNOWLEDGE DISCOVERY IN DATABASES
, 1995
Automatic Prominence Classification in Swedish
Cited by 2 (1 self)
This study aims at automatically classifying levels of acoustic prominence on a dataset of 200 Swedish sentences of read speech by one male native speaker. Each word in the sentences was categorized by four speech experts into one of three groups depending on the level of prominence perceived. Six acoustic features at the syllable level and seven features at the word level were used. Two machine learning algorithms, namely Support Vector Machines (SVM) and Memory-Based Learning (MBL), were trained to classify the words into their respective classes. The MBL gave an average word-level accuracy of 69.08% and the SVM gave an average accuracy of 65.17% on the test set. These values were comparable with the average accuracy of the human annotators with respect to the average annotations. In this study, word duration was found to be the most important feature required for classifying prominence in Swedish read speech. Index Terms: Swedish prominence, SVM, MBL, syllable and word level features, word duration
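Memory-based learning is commonly realised as a k-nearest-neighbour classifier: training examples are stored as they are, and a new item receives the majority label of its closest stored neighbours. The sketch below illustrates that idea only. The two-dimensional feature vectors, the three class labels, the Euclidean distance, and the name `knn_predict` are assumptions for illustration and do not reproduce the study's features, data, or settings.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k stored examples nearest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Made-up feature vectors and prominence levels 0, 1, 2 (not data from the study).
train = [
    ((0.10, 0.20), 0),
    ((0.12, 0.25), 0),
    ((0.30, 0.50), 1),
    ((0.32, 0.55), 1),
    ((0.55, 0.90), 2),
    ((0.60, 0.95), 2),
]
low = knn_predict(train, (0.11, 0.22), k=3)
high = knn_predict(train, (0.58, 0.90), k=3)
```

In practice the features would be scaled or weighted before distances are computed, since a single wide-ranging feature can otherwise dominate the neighbour search.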
Mining Massive Archives of Mice Sounds with Symbolized Representations
Cited by 2 (2 self)
Many animals produce long sequences of vocalizations best described as “songs.” In some animals, such as crickets and frogs, these songs are relatively simple and repetitive chirps or trills. However, animals as diverse as whales, bats, birds and even the humble mice considered here produce intricate and complex songs. These songs are worthy of study in their own right. For example, the study of bird songs has helped to cast light on various questions in the nature vs. nurture debate. However, there is a particular reason why the study of mice songs can benefit mankind. The house mouse (Mus musculus) has long been an important model organism in biology and medicine, and it is by far the most commonly used genetically altered laboratory mammal to address human diseases. While there have been significant recent efforts to analyze mice songs, advances in sensor technology have created a situation where our ability to collect data far outstrips our ability to analyze it. In this work we argue that the time is ripe for archives of mice songs to fall into the purview of data mining. We show a novel technique for mining mice vocalizations directly in the visual (spectrogram) space that practitioners currently use. Working in this space allows us to bring an arsenal of data mining tools to bear on this important domain, including similarity search, classification, motif discovery and contrast set mining.