Systems Seminar - CSE
Provisioning Queries and Analytics
Add to Google Calendar
Provisioning is a technique for avoiding repeated expensive computations in what-if analysis. Given a query, an analyst formulates
k hypotheticals, each retaining some of the tuples of a database instance, possibly overlapping, and she wishes to answer the query
under scenarios, where a scenario is defined by a subset of the hypotheticals that are "turned on". We say that a query admits *compact provisioning* if given any database instance and any k hypotheticals, one can create a poly-size (in k) sketch that can then be used to answer the query under any of the 2^k possible scenarios without accessing the original instance.
In this paper, we focus on provisioning complex queries that combine relational algebra (the logical component), grouping, and statistics/analytics (the numerical component). We show that positive logical queries with grouping can be compactly provisioned while the
addition of negation or recursion requires the sketch size to be exponential in k. Similar lower bounds hold for the exact provisioning
of numerical queries but we show that queries that compute quantiles or linear regression (as well as simpler queries that compute count and sum/average of positive values) can be compactly provisioned to provide *approximate* answers to an arbitrary precision.
Joint work with Sepehr Assadi, Sanjeev Khanna, and Yang Li (University of Pennsylvania).
Val Tannen is a professor in the Department of Computer and Information Science of the University of Pennsylvania. He joined Penn after receiving his PhD from the Massachusetts Institute of Technology in 1987. After working for a time in Programming Languages, his current research interests are in Databases. Moreover, he has always been interested in applications of Logic to Computer Science and since 1994 he has also worked in Bioinformatics, leading a number of interdisciplinary projects. In Databases, he and his students and collaborators have worked on query language design and on models and systems for query optimization, parallel query processing, and data integration. More recently their work has focused on models and systems for data sharing, data provenance and the management of uncertain information. Tannen is an ACM Fellow.