V. Raman and J. Hellerstein. An interactive framework for data cleaning. Technical report, University of California, Berkeley, 2000.


This paper is cited in the following contexts:
The Nimble XML Data Integration System - Draper, Halevy, Weld (2001)   (3 citations)

....and from there directly to query execution plans in the physical algebra. 3.2 Dynamic Data Cleaning. Although data cleaning and ETL capabilities are a crucial aspect of data warehousing and data integration, they have received only limited attention from the research community to date (e.g. [7, 15, 3, 10, 11, 16]). Data cleaning is difficult for the following reasons: Data anomalies: Values may be truncated, abbreviated, incorrect or missing. Corresponding records may contain inconsistent values for some fields. Keys, attributes, structures, and encoding conventions may differ across applications. In ....



Online Dynamic Reordering - Raman, Raman, Hellerstein (2000)   (2 citations)  Self-citation (Raman, Hellerstein)

....hypotheses; this is especially important for domain-specific patterns that can easily be seen by eyeballing a spreadsheet but are hard to find by formulating a query. Showing example data is also useful for data transformation, which is often necessary for data analysis and also for data cleansing [Raman and Hellerstein 2000]. Unfortunately, spreadsheets do not scale gracefully to large datasets. An inherent problem is that many spreadsheet behaviors are painfully slow on large datasets. Microsoft Excel 97, for example, restricts table size to at most 64K rows [Microsoft 1997], presumably to ensure interactive ....

.... (by address or cell content prefix). We are developing ABC, a scalable spreadsheet for data analysis, transformation, and cleaning, where fetching data from the source, applying transforms, sorting, scrolling, and jumping to particular positions are all instantaneous from the user's perspective [Raman and Hellerstein 2000]. We lower the response time as perceived by the user by processing (retrieving) items faster in the region around the scrollbar; the range to which an item belongs is inferred via a histogram (this could be stored as a precomputed statistic or be built on the fly [Gibbons et al. 1997; Chaudhuri ....
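The excerpt describes the scrollbar mechanism only in outline. The following is a minimal sketch, assuming an equi-depth histogram built from a sample of the sort key, of how a scrollbar position could be mapped to a key range and how pending items could be ordered so that those near that position come first. All function names are hypothetical, and the final sort-by-distance step is a deliberate simplification, not the online reordering algorithm of the cited work.

```python
from bisect import bisect_left


def equidepth_boundaries(sample, k):
    """Upper boundary of each of k equi-depth buckets (hypothetical helper)."""
    s = sorted(sample)
    n = len(s)
    if n < k:
        raise ValueError("need at least k sample values")
    # Bucket i covers sorted positions [i*n//k, (i+1)*n//k); keep its last value.
    return [s[(i + 1) * n // k - 1] for i in range(k)]


def bucket_of(key, boundaries):
    """Histogram bucket a key falls into; keys past the last boundary go to the last bucket."""
    return min(bisect_left(boundaries, key), len(boundaries) - 1)


def bucket_at_scrollbar(fraction, k):
    """Each equi-depth bucket holds about 1/k of the rows, so the scrollbar
    fraction (assumed to be in [0, 1]) maps linearly to a bucket index."""
    return max(0, min(int(fraction * k), k - 1))


def prioritize(pending, boundaries, fraction):
    """Order pending items so those whose bucket is closest to the scrollbar come first.
    Simplification: a plain stable sort, not an online reordering operator."""
    target = bucket_at_scrollbar(fraction, len(boundaries))
    return sorted(pending, key=lambda item: abs(bucket_of(item, boundaries) - target))


if __name__ == "__main__":
    bounds = equidepth_boundaries(list(range(100)), 4)
    print(bounds)                                      # [24, 49, 74, 99]
    print(prioritize([5, 60, 90, 40], bounds, 0.6))    # [60, 90, 40, 5]
```

With the scrollbar at 60% of a four-bucket histogram, items in the third bucket (keys 50 to 74) are handled first, then neighbouring buckets. An equi-depth histogram is chosen here because its buckets hold roughly equal row counts, which is what makes the linear mapping from scrollbar fraction to bucket reasonable.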


V. Raman and J. M. Hellerstein. An interactive framework for data cleaning. Working draft, 2000. http://control.cs.berkeley.edu/abc.


CiteSeer.IST - Copyright Penn State and NEC