Results 1 - 10
of
104
Using information content to evaluate semantic similarity in a taxonomy
- In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95
, 1995
"... philip.resnikfleast.sun.com This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content. Experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judg ..."
Abstract
-
Cited by 526 (6 self)
- Add to MetaCart
philip.resnikfleast.sun.com This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content. Experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r = 0.90 for human subjects performing the same task), and significantly better than the traditional edge counting approach (r = 0.66). 1
Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language
, 1999
"... This article presents a measure of semantic similarityinanis-a taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach. The a ..."
Abstract
-
Cited by 320 (10 self)
- Add to MetaCart
This article presents a measure of semantic similarityinanis-a taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach. The article presents algorithms that take advantage of taxonomic similarity in resolving syntactic and semantic ambiguity, along with experimental results demonstrating their e#ectiveness. 1. Introduction Evaluating semantic relatedness using network representations is a problem with a long history in arti#cial intelligence and psychology, dating back to the spreading activation approach of Quillian #1968# and Collins and Loftus #1975#. Semantic similarity represents a special case of semantic relatedness: for example, cars and gasoline would seem to be more closely related than, say, cars and bicycles, but the latter pair are certainly more similar. Rada et al. #Rada, Mili, Bicknell, & Blett...
Determining Semantic Similarity among Entity Classes from Different Ontologies
- IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 2003
"... Semantic similarity measures play an important role in information retrieval and information integration. Traditional approaches to modeling semantic similarity compute the semantic distance between definitions within a single ontology. This single ontology is either a domain-independent ontology or ..."
Abstract
-
Cited by 119 (3 self)
- Add to MetaCart
Semantic similarity measures play an important role in information retrieval and information integration. Traditional approaches to modeling semantic similarity compute the semantic distance between definitions within a single ontology. This single ontology is either a domain-independent ontology or the result of the integration of existing ontologies. We present an approach to computing semantic similarity that relaxes the requirement of a single ontology and accounts for differences in the levels of explicitness and formalization of the different ontology specifications. A similarity function determines similar entity classes by using a matching process over synonym sets, semantic neighborhoods, and distinguishing features that are classified into parts, functions, and attributes. Experimental results with different ontologies indicate that the model gives good results when ontologies have complete and detailed representations of entity classes. While the combination of word matching and semantic neighborhood matching is adequate for detecting equivalent entity classes, feature matching allows us to discriminate among similar, but not necessarily equivalent, entity classes.
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches
- IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE
, 2006
"... This paper investigates the problem of partitioning a shared cache between multiple concurrently executing applications. The commonly used LRU policy implicitly partitions a shared cache on a demand basis, giving more cache resources to the application that has a high demand and fewer cache resource ..."
Abstract
-
Cited by 89 (0 self)
- Add to MetaCart
This paper investigates the problem of partitioning a shared cache between multiple concurrently executing applications. The commonly used LRU policy implicitly partitions a shared cache on a demand basis, giving more cache resources to the application that has a high demand and fewer cache resources to the application that has a low demand. However, a higher demand for cache resources does not always correlate with a higher performance from additional cache resources. It is beneficial for performance to invest cache resources in the application that benefits more from the cache resources rather than in the application that has more demand for the cache resources. This paper proposes utility-based cache partitioning (UCP), a low-overhead, runtime mechanism that partitions a shared cache between multiple applications depending on the reduction in cache misses that each application is likely to obtain for a given amount of cache resources. The proposed mechanism monitors each application at runtime using a novel, cost-effective, hardware circuit that requires less than 2kB of storage. The information collected by the monitoring circuits is used by a partitioning algorithm to decide the amount of cache resources allocated to each application. Our evaluation, with 20 multiprogrammed workloads, shows that UCP improves performance of a dual-core system by up to 23% and on average 11% over LRU-based cache partitioning.
SEA-CNN: Scalable processing of continuous k-nearest neighbor queries in spatio-temporal databases
- In ICDE
, 2005
"... Location-aware environments are characterized by a large number of objects and a large number of continuous queries. Both the objects and continuous queries may change their locations over time. In this paper, we focus on continuous k-nearest neighbor queries (CKNN, for short). We present a new algo ..."
Abstract
-
Cited by 53 (4 self)
- Add to MetaCart
Location-aware environments are characterized by a large number of objects and a large number of continuous queries. Both the objects and continuous queries may change their locations over time. In this paper, we focus on continuous k-nearest neighbor queries (CKNN, for short). We present a new algorithm, termed SEA-CNN, for answering continuously a collection of concurrent CKNN queries. SEA-CNN has two important features: incremental evaluation and shared execution. SEA-CNN achieves both efficiency and scalability in the presence of a set of concurrent queries. Furthermore, SEA-CNN does not make any assumptions about the movement of objects, e.g., the objects velocities and shapes of trajectories, or about the mutability of the objects and/or the queries, i.e., moving or stationary queries issued on moving or stationary objects. We provide theoretical analysis of SEA-CNN with respect to the execution costs, memory requirements and effects of tunable parameters. Comprehensive experimentation shows that SEA-CNN is highly scalable and is more efficient in terms of both I/O and CPU costs in comparison to other R-tree-based CKNN techniques. 1.
Application Level Fault Tolerance in Heterogeneous Networks of Workstations
- Journal of Parallel and Distributed Computing
, 1997
"... We have explored methods for checkpointing and restarting processes within the Distributed object migration environment (Dome), a C++ library of data parallel objects that are automatically distributed over heterogeneous networks of workstations (NOWs). System level checkpointing methods, although t ..."
Abstract
-
Cited by 48 (0 self)
- Add to MetaCart
We have explored methods for checkpointing and restarting processes within the Distributed object migration environment (Dome), a C++ library of data parallel objects that are automatically distributed over heterogeneous networks of workstations (NOWs). System level checkpointing methods, although transparent to the user, were rejected because they lack support for heterogeneity. We have implemented application level checkpointing which places the checkpoint and restart mechanisms within Dome's C++ objects. Application level checkpointing has been implemented with a library-based technique for the programmer and a more transparent preprocessor-based technique. Dome's implementation of checkpointing successfully checkpoints and restarts processes on different numbers of machines and different architectures. Results from executing Dome programs across a NOW with realistic failure rates have been experimentally determined and are compared with theoretical results. The overhead of checkpoi...
Adaptive insertion policies for high performance caching
- In Proceedings of the 35th International Symposium on Computer Architecture
, 2007
"... The commonly used LRU replacement policy is susceptible to thrashing for memory-intensive workloads that have a working set greater than the available cache size. For such applications, the majority of lines traverse from the MRU position to the LRU position without receiving any cache hits, resulti ..."
Abstract
-
Cited by 42 (3 self)
- Add to MetaCart
The commonly used LRU replacement policy is susceptible to thrashing for memory-intensive workloads that have a working set greater than the available cache size. For such applications, the majority of lines traverse from the MRU position to the LRU position without receiving any cache hits, resulting in inefficient use of cache space. Cache performance can be improved if some fraction of the working set is retained in the cache so that at least that fraction of the working set can contribute to cache hits. We show that simple changes to the insertion policy can significantly reduce cache misses for memory-intensive workloads. We propose the LRU Insertion Policy (LIP) which places the incoming line in the LRU position instead of the MRU position. LIP protects the cache from thrashing and results in close to optimal hitrate for applications that have a cyclic reference pattern. We also propose the Bimodal Insertion Policy (BIP) as an enhancement of LIP that adapts to changes in the working set while maintaining the thrashing protection of LIP. We finally propose a Dynamic Insertion Policy (DIP) to choose between BIP and the traditional LRU policy depending on which policy incurs fewer misses. The proposed insertion policies do not require any change to the existing cache structure, are trivial to implement, and have a storage requirement of less than two bytes. We show that DIP reduces the average MPKI of the baseline 1MB 16-way L2 cache by 21%, bridging two-thirds of the gap between LRU and OPT.
Improving Cluster Availability Using Workstation Validation
, 2002
"... We demonstrate a framework for improving the availability of cluster based Internet services. Our approach models Internet services as a collection of interconnected components, each possessing well defined interfaces and failure semantics. Such a decomposition allows designers to engineer high avai ..."
Abstract
-
Cited by 35 (11 self)
- Add to MetaCart
We demonstrate a framework for improving the availability of cluster based Internet services. Our approach models Internet services as a collection of interconnected components, each possessing well defined interfaces and failure semantics. Such a decomposition allows designers to engineer high availability based on an understanding of the interconnections and isolated fault behavior of each component, as opposed to ad-hoc methods. In this work, we focus on using the entire commodity workstation as a component because it possesses natural, fault-isolated interfaces. We define a failure event as a reboot because not only is a workstation unavailable during a reboot, but also because reboots are symptomatic of a larger class of failures, such as configuration and operator errors. Our observations of 3 distinct clusters show that the time between reboots is best modeled by a Weibull distribution with shape parameters of less than 1, implying that a workstation becomes more reliable the longer it has been operating. Leveraging this observed property, we design an allocation strategy which withholds recently rebooted workstations from active service, validating their stability before allowing them to return to service. We show via simulation that this policy leads to a 70-30 rule-of-thumb: For a constant utilization, approximately 70% of the workstation failures can be masked from end clients with 30% extra capacity added to the cluster, provided reboots are not strongly correlated. We also found our technique is most sensitive to the burstiness of reboots as opposed to absolute lengths of workstation uptimes.
Enhancing Lifetime and Security of PCM-Based Main Memory with Start-Gap Wear Leveling
"... Phase Change Memory (PCM) is an emerging memory technology that can increase main memory capacity in a cost-effective and power-efficient manner. However, PCM cells can endure only a maximum of 10 7- 10 8 writes, making a PCM based system have a lifetime of only a few years under ideal conditions. F ..."
Abstract
-
Cited by 22 (1 self)
- Add to MetaCart
Phase Change Memory (PCM) is an emerging memory technology that can increase main memory capacity in a cost-effective and power-efficient manner. However, PCM cells can endure only a maximum of 10 7- 10 8 writes, making a PCM based system have a lifetime of only a few years under ideal conditions. Furthermore, we show that non-uniformity in writes to different cells reduces the achievable lifetime of PCM system by 20x. Writes to PCM cells can be made uniform with Wear-Leveling. Unfortunately, existing wear-leveling techniques require large storage tables and indirection, resulting in significant area and latency overheads. We propose Start-Gap, a simple, novel, and effective wear-leveling technique that uses only two registers. By combining Start-Gap with simple address-space randomization techniques we show that the achievable lifetime of the baseline 16GB PCM-based system is boosted from 5 % (with no wear-leveling) to 97 % of the theoretical maximum, while incurring a total storage overhead of less than 13 bytes and obviating the latency overhead of accessing large tables. We also analyze the security vulnerabilities for memory systems that have limited write endurance, showing that under adversarial settings, a PCM-based system can fail in less than one minute. We provide a simple extension to Start-Gap that makes PCM-based systems robust to such malicious attacks. Categories and Subject Descriptors:
High-Level Fault Tolerance in Distributed Programs
, 1994
"... We have been developing high-level checkpoint and restart methods for Dome (Distributed Object Migration Environment) , a C++ library of data-parallel objects that are automatically distributed using PVM. There are several levels of programming abstraction at which fault tolerance mechanisms can be ..."
Abstract
-
Cited by 21 (3 self)
- Add to MetaCart
We have been developing high-level checkpoint and restart methods for Dome (Distributed Object Migration Environment) , a C++ library of data-parallel objects that are automatically distributed using PVM. There are several levels of programming abstraction at which fault tolerance mechanisms can be designed: high-level, where the checkpoint and restart are built into our C++ objects, but the program structure is severly constrained; high-level with preprocessing, where a preprocessor inserts extra C++ statements into the code to facilitate checkpoint and restart; and low-level, where periodically an interrupt causes a memory image to be written out. Because we consider portability (both of our libraries and of the checkpoints they produce) to be an important goal, we focus on the higher-level checkpointing methods. In addition, we describe an implementation of high-level checkpointing, demonstrate it on multiple architectures, and show that it is efficient enough to provide good expect...

