Results 1 - 10 of 139
Architecture and Quality in Data Warehouses: an Extended Repository Approach, 1999
"... This paper makes two ..."
Finding and ranking knowledge on the semantic web. In Proceedings of the 4th International Semantic Web Conference, 2005
Cited by 69 (2 self)
Abstract. Swoogle helps software agents and knowledge engineers find Semantic Web knowledge encoded in RDF and OWL documents on the Web. Navigating such a Semantic Web on the Web is difficult due to the paucity of explicit hyperlinks beyond the namespaces in URIrefs and the few inter-document links like rdfs:seeAlso and owl:imports. To address this issue, this paper proposes a novel Semantic Web navigation model providing additional navigation paths through Swoogle's search services such as the Ontology Dictionary. Using this model, we have developed algorithms for ranking the importance of Semantic Web objects at three levels of granularity: documents, terms and RDF graphs. Experiments show that Swoogle outperforms conventional web search engines and other ontology libraries in finding more ontologies, ranking their importance, and thus promoting the use and emergence of consensus ontologies.
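As a rough illustration of the general idea of ranking documents by their inter-document references, the sketch below runs a generic PageRank-style iteration over a hypothetical graph of Semantic Web documents. It is not the paper's actual ranking algorithm; the graph, URIs, and damping factor are illustrative assumptions only.

# Generic PageRank-style ranking over a hypothetical document reference graph,
# where an edge means "document A references document B" (e.g. via owl:imports
# or namespace use). Illustrative only; not Swoogle's own algorithm.

def rank_documents(references, damping=0.85, iterations=50):
    """references: dict mapping each document URI to the URIs it references."""
    docs = set(references)
    for targets in references.values():
        docs.update(targets)
    n = len(docs)
    rank = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        new_rank = {d: (1.0 - damping) / n for d in docs}
        for src in docs:
            targets = references.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling document: spread its rank uniformly over all documents.
                for d in docs:
                    new_rank[d] += damping * rank[src] / n
        rank = new_rank
    return rank

# Hypothetical example: three ontology documents referencing each other.
graph = {
    "http://example.org/ontoA": ["http://example.org/ontoB"],
    "http://example.org/ontoB": ["http://example.org/ontoA", "http://example.org/ontoC"],
    "http://example.org/ontoC": [],
}
print(sorted(rank_documents(graph).items(), key=lambda kv: -kv[1]))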
Class Noise vs. Attribute Noise: A Quantitative Study of Their Impacts. Artificial Intelligence Review
Cited by 62 (7 self)
Abstract. Real-world data is never perfect and can often suffer from corruptions (noise) that may impact interpretations of the data, models created from the data and decisions made based on the data. Noise can reduce system performance in terms of classification accuracy, time in building a classifier and the size of the classifier. Accordingly, most existing learning algorithms have integrated various approaches to enhance their learning abilities in noisy environments, but the existence of noise can still introduce serious negative impacts. A more reasonable solution might be to employ preprocessing mechanisms to handle noisy instances before a learner is formed. Unfortunately, little research has been conducted to systematically explore the impact of noise, especially from the noise handling point of view. This has made various noise processing techniques less significant, particularly when dealing with noise introduced in attributes. In this paper, we present a systematic evaluation of the effect of noise in machine learning. Instead of taking any unified theory of noise to evaluate the noise impacts, we differentiate noise into two categories, class noise and attribute noise, and analyze their impacts on system performance separately. Because class noise has been widely addressed in existing research efforts, we concentrate on attribute noise. We investigate the relationship between attribute noise and classification accuracy, the impact of noise at different attributes, and possible solutions for handling attribute noise. Our conclusions can guide interested readers in enhancing data quality by designing various noise handling mechanisms.
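To make the attribute-noise idea concrete, the following small experiment sketch (not the paper's protocol) injects random attribute noise at increasing rates into the training data and measures the resulting drop in classification accuracy. It assumes scikit-learn and numpy are installed; the dataset, classifier, and noise model are illustrative choices.

# Inject uniform random attribute noise and observe accuracy degradation.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def add_attribute_noise(X, rate, rng):
    """Corrupt a fraction `rate` of attribute values with uniform random values."""
    X_noisy = X.copy()
    n_values = X.size
    n_corrupt = int(rate * n_values)
    flat_idx = rng.choice(n_values, size=n_corrupt, replace=False)
    rows, cols = np.unravel_index(flat_idx, X.shape)
    lo, hi = X.min(axis=0), X.max(axis=0)
    X_noisy[rows, cols] = rng.uniform(lo[cols], hi[cols])
    return X_noisy

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for rate in (0.0, 0.1, 0.2, 0.4):
    clf = DecisionTreeClassifier(random_state=0)
    # Noise is added to training attributes only; the test set stays clean.
    clf.fit(add_attribute_noise(X_train, rate, rng), y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"attribute noise {rate:.0%}: accuracy {acc:.3f}")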
Data Cleansing: Beyond Integrity Analysis, 2000
Cited by 55 (0 self)
The paper analyzes the problem of data cleansing and automatically identifying potential errors in data sets. An overview of the small amount of existing literature concerning data cleansing is given. Methods for error detection that go beyond integrity analysis are reviewed and presented. The applicable methods include statistical outlier detection, pattern matching, clustering, and data mining techniques. Brief results supporting the use of such methods are given. The future research directions necessary to address the data cleansing problem are discussed. Keywords: data cleansing, data cleaning, data quality, error detection.
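As a minimal taste of one of the surveyed techniques, here is a statistical outlier detection sketch in Python: values whose z-score exceeds a threshold are flagged as potential errors. The example data and threshold are illustrative, not taken from the paper.

# Flag values lying far from the mean as potential data errors.
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return indices of values more than `threshold` std devs from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]

ages = [34, 29, 41, 38, 35, 220, 31, 36]  # 220 is a likely data-entry error
print(zscore_outliers(ages, threshold=2.0))  # -> [5]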
Design and Analysis of Quality Information for Data Warehouses. In Proc. of the 17th Int. Conf. on Conceptual Modeling (ER98), 1998
Cited by 39 (6 self)
Data warehouses are complex systems that have to deliver highly aggregated, high-quality data from heterogeneous sources to decision makers. Due to dynamic changes in requirements and the environment, data warehouse systems rely on meta databases to control their operation and to aid their evolution. In this paper, we present an approach to assessing the quality of the data warehouse via a semantically rich model of quality management in a data warehouse. The model allows stakeholders to design abstract quality goals that are translated into executable analysis queries on quality measurements in the data warehouse's meta database. The approach is being implemented using the ConceptBase meta database system.
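As a rough, hypothetical illustration of the idea of translating an abstract quality goal into an executable analysis query (the paper uses the ConceptBase meta database, whose query language is not reproduced here), the sketch below renders a goal as a SQL query over an assumed quality_measurement table. All class, table, and column names are invented for illustration.

# Toy translation of an abstract quality goal into an analysis query over
# hypothetical quality measurements stored in a meta database.
from dataclasses import dataclass

@dataclass
class QualityGoal:
    metaobject: str      # e.g. a source relation or a materialized view
    dimension: str       # e.g. "completeness", "freshness"
    threshold: float     # minimum acceptable measurement

def to_analysis_query(goal: QualityGoal) -> str:
    """Render the goal as a SQL query returning violating measurements."""
    return (
        "SELECT measured_at, value FROM quality_measurement "
        f"WHERE metaobject = '{goal.metaobject}' "
        f"AND dimension = '{goal.dimension}' "
        f"AND value < {goal.threshold} "
        "ORDER BY measured_at DESC"
    )

print(to_analysis_query(QualityGoal("customer_dim", "completeness", 0.98)))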
Towards Quality-Oriented Data Warehouse Usage and Evolution. In Advanced Information Systems Engineering, 11th International Conference CAiSE’99, Lecture Notes in Computer Science, 1999
Cited by 37 (6 self)
As a decision support information system, a data warehouse must provide a high level of data quality and quality of service. In the DWQ project we have proposed an architectural framework and a repository of metadata which describes all the data warehouse components in a set of metamodels, to which a quality metamodel is added, defining for each data warehouse metaobject the corresponding relevant quality dimensions and quality factors. Apart from this static definition of quality, we also provide an operational complement, that is, a methodology for using quality factors to achieve user quality goals. This methodology is an extension of the Goal-Question-Metric (GQM) approach, which allows (a) capturing the inter-relationships between different quality factors and (b) organizing them in order to fulfil specific quality goals. After summarizing the DWQ quality model, this paper describes the methodology we propose for using this quality model, as well as its impact on ...
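For readers unfamiliar with GQM, the sketch below shows only the bare Goal-Question-Metric structure that the methodology extends: a quality goal refined into questions, each answered by measurable quality factors. The classes and example names are hypothetical; the paper's extension (capturing inter-relationships between quality factors) is merely hinted at via a depends_on field.

# Bare-bones Goal-Question-Metric structure; names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    depends_on: list = field(default_factory=list)  # related quality factors

@dataclass
class Question:
    text: str
    metrics: list

@dataclass
class Goal:
    purpose: str
    questions: list

goal = Goal(
    purpose="Improve the accuracy of the sales data mart for analysts",
    questions=[
        Question("How current is the loaded data?",
                 [Metric("source-to-warehouse latency")]),
        Question("How many loaded rows violate integrity constraints?",
                 [Metric("constraint violation rate",
                         depends_on=["source data accuracy"])]),
    ],
)
print(len(goal.questions), "questions refine the goal")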
Estimating the Quality of Databases, 1998
Cited by 33 (0 self)
With more and more electronic information sources becoming widely available, the issue of the quality of these often-competing sources has become germane. We propose a standard for specifying the quality of databases, which is based on the dual concepts of data soundness and data completeness. The relational model of data is extended by associating a quality specification with each relation instance, and by extending its algebra to calculate the quality specifications of derived relation instances. This provides a method for calculating the quality of answers to arbitrary queries from the overall quality specification of the database. We show practical methods for estimating the initial quality specifications of given databases, and we report on experiments that test the validity of our methods. Finally, we describe how quality estimations are being applied in the Multiplex multidatabase system to resolve cross-database inconsistencies.
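As a toy illustration of propagating a quality specification through a query (not the paper's actual algebra), the sketch below attaches a (soundness, completeness) pair to each relation and combines the pairs multiplicatively for a join, under the simplifying assumption that the two specifications are independent.

# Toy propagation of (soundness, completeness) through a join; illustrative
# only, not the quality algebra defined in the paper.
from dataclasses import dataclass

@dataclass
class QualitySpec:
    soundness: float     # fraction of stored tuples that are correct
    completeness: float  # fraction of true tuples that are stored

def join_quality(left: QualitySpec, right: QualitySpec) -> QualitySpec:
    """Naive estimate: a joined tuple is correct/present only if both inputs are."""
    return QualitySpec(
        soundness=left.soundness * right.soundness,
        completeness=left.completeness * right.completeness,
    )

customers = QualitySpec(soundness=0.99, completeness=0.95)
orders = QualitySpec(soundness=0.97, completeness=0.90)
print(join_quality(customers, orders))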
Swoogle: Searching for knowledge on the semantic web. In Proceedings of the AAAI 05, 2005
Cited by 31 (0 self)
Most knowledge on the Web is encoded as natural language text, which is convenient for human users but very difficult for software agents to understand. Even with increased use of XML-encoded information, software agents ...
Automating the Approximate Record Matching Process. Information Sciences, 1999
Cited by 31 (5 self)
Data quality has many dimensions, one of which is accuracy. Accuracy is usually compromised by errors accidentally or intentionally introduced in a database system. These errors result in inconsistent, incomplete, or erroneous data elements. For example, a small variation in the representation of a data object produces a distinct instantiation of the object being represented. In order to improve the accuracy of the data stored in a database system, we need to compare them either with their real-world counterparts or with other data stored in the same or a different system. In this paper we address the problem of matching records that refer to the same entity by computing their similarity. Exact record matching has limited applicability in this context, since even simple errors like character transpositions cannot be captured in the record linking process. Our methodology deploys advanced data mining techniques to deal with the high computational and inferential complexity of approximate record matching.
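A minimal sketch of similarity-based record matching (a simpler stand-in for the data mining techniques the paper deploys): field values are compared with a generic string similarity measure so that small variations such as character transpositions still match. The field names, equal weights, and 0.85 threshold are illustrative assumptions.

# Approximate record matching via per-field string similarity.
from difflib import SequenceMatcher

def field_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def records_match(rec1: dict, rec2: dict, fields=("name", "street", "city"),
                  threshold=0.85) -> bool:
    """Average per-field similarity; declare a match above the threshold."""
    score = sum(field_similarity(rec1[f], rec2[f]) for f in fields) / len(fields)
    return score >= threshold

r1 = {"name": "Jonh Smith", "street": "12 Main St", "city": "Springfield"}
r2 = {"name": "John Smith", "street": "12 Main Street", "city": "Springfield"}
print(records_match(r1, r2))  # a typo and an abbreviation still match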