Systems Seminar - CSE

Clio: Translating Web Data

Renee Miller

We present a novel framework for mapping between any combination of XML
and relational schemas, in which a high-level user mapping is
translated into semantically meaningful queries that transform source
data into the target representation. Our approach works in two phases.
In the first phase, the high-level mapping, expressed as a set of
inter-schema correspondences, is converted into a set of mappings that
capture the design choices made in the source and target schemas
(including their hierarchical organization as well as their nested
referential constraints). The second phase translates these mappings
into queries over the source schemas that produce data satisfying the
constraints and structure of the target schema, and preserving the
semantic relationships of the source. Non-null target values may need
to be invented in this process. The mapping algorithm is complete in
that it produces all mappings that are consistent with the schema
constraints. We have implemented the translation algorithm in Clio, a
schema mapping tool, and present our experiences on several real
schemas.
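
As a rough illustration of the second phase, the sketch below turns a flat source relation into two target tables linked by a referential constraint, inventing a non-null key where the source has none. It is a minimal sketch under stated assumptions, not Clio's algorithm: the schemas, the `invent_id` helper, and the `translate` function are all hypothetical names introduced here.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical source relation: one row per employee, department name inlined.
source_rows: List[Row] = [
    {"ename": "Ann", "dname": "Sales"},
    {"ename": "Bob", "dname": "Sales"},
    {"ename": "Cy", "dname": "R&D"},
]


def invent_id(label: str, *args: str) -> str:
    """Invent a non-null value as a deterministic function of source values.

    The same arguments always yield the same invented value, so rows that
    agree on the department name end up sharing one department key.
    """
    return f"{label}({', '.join(args)})"


def translate(rows: List[Row]) -> Dict[str, List[Row]]:
    """Build hypothetical target tables Dept(id, name) and Emp(name, dept_id).

    The assumed target constraint is that every Emp.dept_id references an
    existing Dept.id, which the source schema has no column for.
    """
    depts: Dict[str, Row] = {}
    emps: List[Row] = []
    for row in rows:
        dept_id = invent_id("dept", row["dname"])
        # Keyed by the invented id, so each department is emitted once.
        depts[dept_id] = {"id": dept_id, "name": row["dname"]}
        emps.append({"name": row["ename"], "dept_id": dept_id})
    return {"Dept": list(depts.values()), "Emp": emps}


target = translate(source_rows)
```

Deriving the invented key from the department name, rather than generating a fresh value per row, is what keeps Ann and Bob associated with the same department in the target, which is one way of preserving a semantic relationship present in the source.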

The mappings produced by Clio can be used both for data integration,
where source data is queried through a virtual target schema, and for
data exchange. We discuss the often subtle difference between the
semantics of data integration and that of data exchange. This is joint
work with Ron Fagin, Mauricio Hernandez, Phokion Kolaitis, Lucian Popa,
and Yannis Velegrakis.

Renée J. Miller is an associate professor of computer science at the
University of Toronto. She received the 1997 Presidential Early Career
Award for Scientists and Engineers (PECASE), the highest honor bestowed
by the United States government on outstanding scientists and engineers
beginning their careers. She is a recipient of the NSF CAREER Award,
the Premier's Research Excellence Award, and an IBM Faculty Award. Her
research interests are in the efficient, effective use of large volumes
of complex, heterogeneous data. This interest spans heterogeneous
databases, data mining, and data warehousing. She received her PhD in
Computer Science from the University of Wisconsin, Madison and
bachelor's degrees in Mathematics and Cognitive Science from MIT.

Sponsored by

SSRL