Systems Seminar - CSE

PADS: An End-to-end System for Processing Ad Hoc Data

Mary Fernández

Enormous amounts of data exist in “well-behaved” formats such as relational tables and XML, which come equipped with extensive tool support. However, vast amounts of data also exist in non-standard or “ad hoc” data formats, which often lack standard or extensible tools. PADS is an end-to-end system for processing ad hoc data sources. The core of PADS consists of a declarative language, which allows data analysts to describe both the physical layout of ad hoc data and semantic properties of that data, and a data-description compiler, which produces customizable libraries for parsing the ad hoc data. A suite of tools built around this core includes statistical data-profiling tools, a query engine that permits viewing ad hoc sources as XML and querying them with XQuery, and an interactive front-end that helps users produce PADS descriptions quickly. In this talk, I will describe the PADS data-description language and the relationship between ad hoc data described by PADS and its realization in XML.

The PADS team includes its principal inventors Kathleen Fisher (AT&T Research) and Robert Gruber (Google Inc.); Mark Daly, Yitzhak Mandelbaum, and David Walker (Princeton University); and Xuan Zheng (University of Michigan). More information on PADS can be found at: http://www.padsproj.org.

Mary Fernández is Principal Technical Staff at AT&T Labs – Research. She works at the juncture of programming languages and database systems, in particular on domain-specific languages for data management problems, their formal semantics, techniques for their efficient implementation, and their interaction with general-purpose programming languages. She is co-editor of several of the XQuery W3C working drafts and is a principal designer and implementor of Galax, a complete, open-source implementation of XQuery (www.galaxquery.org). She is also an associate editor of ACM Transactions on Database Systems and serves on the advisory board of MentorNet (www.mentornet.net), an e-mentoring network for women in engineering and science.

Sponsored by

SSL