Incrementally maintaining classification using an RDBMS (2011)

by M L Koc, C Ré
Results 1 - 4 of 4

Incremental knowledge base construction using DeepDive

by Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, Christopher Ré - Proceedings of the VLDB Endowment (PVLDB), 2015
Abstract - Cited by 7 (3 self)
Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration. Recent names used for this problem include dealing with dark data and knowledge base construction (KBC). In this work, we describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems, and we present techniques to make the KBC process more efficient. We observe that the KBC process is iterative, and we develop techniques to incrementally produce inference results for KBC systems. We propose two methods for incremental inference, based respectively on sampling and variational techniques. We also study the tradeoff space of these methods and develop a simple rule-based optimizer. DeepDive includes all of these contributions, and we evaluate DeepDive on five KBC systems, showing that it can speed up KBC inference tasks by up to two orders of magnitude with negligible impact on quality.
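The sampling-based incremental inference the abstract describes can be illustrated with a minimal sketch: rather than re-running inference from scratch when the KBC program changes, warm-start a Gibbs sampler from the state materialized in the previous run. The factor-graph encoding, weights, and function names below are illustrative assumptions, not DeepDive's actual implementation.

```python
import math
import random

def gibbs_marginals(unary, pairwise, n_vars, iters, state=None, seed=0):
    """Estimate P(x_i = 1) for binary variables in a tiny factor graph.

    unary:    {i: w}        adds weight w when x_i = 1
    pairwise: {(i, j): w}   adds weight w when x_i = x_j (agreement factor)
    state:    optional starting assignment (warm start)
    """
    rng = random.Random(seed)
    x = list(state) if state is not None else [0] * n_vars
    counts = [0] * n_vars
    for _ in range(iters):
        for i in range(n_vars):
            # Log-odds of x_i = 1 given the current assignment of neighbors.
            log_odds = unary.get(i, 0.0)
            for (a, b), w in pairwise.items():
                if a == i:
                    log_odds += w if x[b] == 1 else -w
                elif b == i:
                    log_odds += w if x[a] == 1 else -w
            p = 1.0 / (1.0 + math.exp(-log_odds))
            x[i] = 1 if rng.random() < p else 0
            counts[i] += x[i]
    return [c / iters for c in counts], x

# Initial run: cold-start inference over the original program.
unary, pairwise = {0: 2.0}, {(0, 1): 1.0}
marg, last_state = gibbs_marginals(unary, pairwise, 2, iters=2000)

# The program gains a new rule (a new factor and variable); warm-start from
# the materialized final state and run far fewer sweeps.
pairwise[(1, 2)] = 1.0
marg2, _ = gibbs_marginals(unary, pairwise, 3, iters=500,
                           state=last_state + [0])
```

Because the previous samples already sit near the posterior of the slightly changed model, the warm-started chain needs fewer sweeps to produce usable marginals; the paper studies when this sampling approach beats a variational alternative.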

Citation Context

... the incremental maintenance problem as one of approximate inference. Previous work in the database community has looked at how machine learning data products change in response both to new labels [24] and to new data [9,10]. In KBC, both the program and data change on each iteration. Our proposed approach can cope with both types of change simultaneously. The technical question is which approximat...

Learning Generalized Linear Models Over Normalized Data

by Arun Kumar, Jeffrey Naughton, Jignesh M. Patel
Abstract - Cited by 2 (1 self)
Enterprise data analytics is a booming area in the data management industry. Many companies are racing to develop toolkits that closely integrate statistical and machine learning techniques with data management systems. Almost all such toolkits assume that the input to a learning algorithm is a single table. However, most relational datasets are not stored as single tables due to normalization. Thus, analysts often perform key-foreign key joins before learning on the join output. This strategy of learning after joins introduces redundancy avoided by normalization, which could lead to poorer end-to-end performance and maintenance overheads due to data duplication. In this work, we take a step towards enabling and optimizing learning over joins for a common class of machine learning techniques called generalized linear models that are solved using gradient descent algorithms in an RDBMS setting. We present alternative approaches to learn over a join that are easy to implement over existing RDBMSs. We introduce a new approach named factorized learning that pushes ML computations through joins and avoids redundancy in both I/O and computations. We study the tradeoff space for all our approaches both analytically and empirically. Our results show that factorized learning is often substantially faster than the alternatives, but is not always the fastest, necessitating a cost-based approach. We also discuss extensions of all our approaches to multi-table joins as well as to Hive.
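The core factorized-learning idea, pushing the gradient computation through a key-foreign key join instead of materializing the joined table, can be sketched for a linear model with squared loss. The table shapes, names, and the in-memory NumPy setting are illustrative assumptions, not the paper's actual RDBMS implementation; both functions should return the same gradient.

```python
import numpy as np

def grad_materialized(S_x, S_y, S_fk, R_x, w):
    """Gradient of squared loss computed after materializing the join."""
    X = np.hstack([S_x, R_x[S_fk]])       # R's features duplicated per S row
    resid = X @ w - S_y
    return X.T @ resid / len(S_y)

def grad_factorized(S_x, S_y, S_fk, R_x, w):
    """Same gradient without materializing: push work through the join."""
    d_s = S_x.shape[1]
    w_s, w_r = w[:d_s], w[d_s:]
    partial = R_x @ w_r                   # one partial inner product per R tuple
    resid = S_x @ w_s + partial[S_fk] - S_y
    g_s = S_x.T @ resid
    agg = np.zeros(R_x.shape[0])          # sum residuals grouped by foreign key
    np.add.at(agg, S_fk, resid)
    g_r = R_x.T @ agg                     # touch each R tuple only once
    return np.concatenate([g_s, g_r]) / len(S_y)

rng = np.random.default_rng(0)
S_x = rng.normal(size=(100, 3))          # fact table: 100 rows, 3 features
R_x = rng.normal(size=(10, 4))           # dimension table: 10 rows, 4 features
S_fk = rng.integers(0, 10, size=100)     # key-foreign key references into R
S_y = rng.normal(size=100)
w = rng.normal(size=7)

g1 = grad_materialized(S_x, S_y, S_fk, R_x, w)
g2 = grad_factorized(S_x, S_y, S_fk, R_x, w)
```

The factorized version reads each dimension-table tuple once per gradient step instead of once per referencing fact-table row, which is where the I/O and computation savings over the materialized join come from.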

Citation Context

...roup of systems. We hope our work contributes to more research in this direction. Analytics systems that provide incremental maintenance over evolving data for some ML models have been studied before [19,25]. However, neither of those papers addresses learning over joins. It is interesting future work to study the interplay between these two problems. 7. CONCLUSION AND FUTURE WORK Key-foreign key joins are ...

DeepDive: A Data Management System for Automatic Knowledge Base Construction

by Ce Zhang, Pradap Konda, Emily Mallory, 2015
Abstract
ACKNOWLEDGMENTS I owe Christopher Ré my career as a researcher, the greatest dream of my life. Since the day I first met Chris and told him about my dream, he has done everything he could, as a scientist, an educator, and a friend, to help me. I am forever indebted to him for his completely honest criticisms and feedback, the most valuable gifts an advisor can give. His training equipped me with confidence and pride that I will carry for the rest of my career. He is the role model that I will follow. If my whole future career achieves an approximation of what he has done so far in his, I will be proud and happy. I am also indebted to Jude Shavlik and Miron Livny, who, after Chris left for Stanford, kindly helped me through all the paperwork and payments at Wisconsin. If it were not for their help, I would not have been able to continue my PhD studies. I am also profoundly grateful to Jude for being the chair of my committee. I am likewise grateful to Jeffrey Naughton, David Page, and Shanan Peters for serving on my committee; and Thomas Reps for his feedback during my defense. DeepDive would not have been possible without all its users. Shanan Peters was the first user, working with it before it even got its name. He spent three years going through a painful process with us before we understood the current abstraction of DeepDive. I am grateful to him for sticking with

Citation Context

... the incremental maintenance problem as one of approximate inference. Previous work in the database community has looked at how machine learning data products change in response both to new labels [110] and to new data [52,53]. In KBC, both the program and data change on each iteration. Our proposed approach can cope with both types of change simultaneously. The technical question is which approxima...

Feature Engineering for Knowledge Base Construction

by Amir Abbas Sadeghian, Zifei Shan, Jaeho Shin, Feiran Wang, Sen Wu, Ce Zhang
Abstract
Knowledge base construction (KBC) is the process of populating a knowledge base, i.e., a relational database together with inference rules, with information extracted from documents and structured sources. KBC blurs the distinction between two traditional database problems, information extraction and information integration. For the last several years, our group has been building knowledge bases with scientific collaborators. Using our approach, we have built knowledge bases that have comparable and sometimes better quality than those constructed by human volunteers. In contrast to these knowledge bases, which took experts a decade or more human years to construct, many of our projects are constructed by a single graduate student. Our approach to KBC is based on joint probabilistic inference and learning, but we do not see inference as either a panacea or a magic bullet: inference is a tool that allows us to be systematic in how we construct, debug, and improve the quality of such systems. In addition, inference allows us to construct these systems in a more loosely coupled way than traditional approaches. To support this idea, we have built the DeepDive system, which has the design goal of letting the user "think about features—not algorithms." We think of DeepDive as declarative in that one specifies what they want but not how to get it. We describe our approach with a focus on feature engineering, which we argue is an understudied problem relative to its importance to end-to-end quality.

Citation Context

...problem is distinct from incremental or online learning, as the focus is on maintaining the downstream data products. We have done some work in this direction in a simplified classifier-based setting [16]. • "Active Debugging" and Rule Learning. The current debugging infrastructure in DeepDive is "passive" in that DeepDive waits for instructions from the user for error analysis or adding more features...


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University