Projects

In a page like this you would usually find a description of current and past projects, with beautiful words about the relevance and significance of these projects (even when they are completely irrelevant and useless) for the future of computer science and of all humanity, and with some links to project sites.

I am not breaking this unwritten rule, so ...

Gamma

This project is a joint activity with Dario Colazzo, currently at LRI, France. The research activity started in the last quarter of 2004 as a collateral research project and rapidly became my principal research activity (even though it is completely unfunded).

The basic idea of the project is to study the problem of maintaining schema mappings in XML p2p systems (and in data exchange systems too), so as to develop techniques for automatically discovering corrupted mappings. This problem is very important, as the quality of query answering depends on the quality of schema mappings.

For further information about Gamma, click here.

Our first approach, presented at PlanX 2005 and described here, was to derive the correctness of a mapping from the correctness of rewritten queries over the target schema: if a query Q1 on S is rewritten by a mapping m into a query Q2 over T, and Q2 does not match T, then m is certainly incorrect.
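The check above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the formalism of the paper: a schema is reduced to a dictionary from element names to allowed child names, a query is a simple root-to-leaf path, and a mapping is a plain renaming of element names.

```python
# Toy model (an assumption, not the paper's formalism):
# a schema maps each element name to the set of its allowed child names.
TARGET = {
    "library": {"item"},
    "item": {"name", "writer"},
    "name": set(),
    "writer": set(),
}

def matches(schema, root, path):
    """Return True if the path can be followed in the schema, starting at root."""
    if not path or path[0] != root:
        return False
    current = root
    for step in path[1:]:
        if step not in schema.get(current, set()):
            return False
        current = step
    return True

def rewrite(path, mapping):
    """Rewrite a source query Q1 into Q2 by renaming each step through the mapping."""
    return [mapping.get(step, step) for step in path]

def mapping_is_incorrect(q1, mapping, target, target_root):
    """If the rewritten query does not match the target schema, the mapping is wrong."""
    return not matches(target, target_root, rewrite(q1, mapping))

# Hypothetical mappings from a source schema rooted at "bib".
GOOD = {"bib": "library", "book": "item", "title": "name", "author": "writer"}
BAD = {"bib": "library", "book": "item", "title": "heading", "author": "writer"}

Q1 = ["bib", "book", "title"]
print(mapping_is_incorrect(Q1, GOOD, TARGET, "library"))
print(mapping_is_incorrect(Q1, BAD, TARGET, "library"))
```

Note that the implication only goes one way in this sketch too: a mapping that passes the check for one query has not been shown to be correct.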

This approach, even if very simple, has two main drawbacks. First, it is not complete, as some errors cannot be captured. Second, it depends on the query answering algorithms, hence it is computationally very expensive. To overcome these issues, we developed a second approach, presented at DBPL 2005 and described here, which is complete (all errors are captured) and is (almost) tractable. The basic idea of this technique is to infer (once) the output type of a mapping and to compare it with the target schema, according to a type projection relation. Type projection checking is in PTIME, while type inference for our mapping language is exponential.
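The "infer once, then compare" structure can be sketched as follows. This is a deliberately simplified stand-in and not the actual type projection relation or the real inference algorithm: types are assumed to be dictionaries from element names to allowed child names, the mapping is assumed to be a plain renaming (so inference here is trivial rather than exponential), and the comparison only checks that every produced element is allowed at its position in the target.

```python
# Simplified stand-in (an assumption): types are dictionaries from element
# names to the set of allowed child names.
SOURCE = {
    "bib": {"book"},
    "book": {"title", "author"},
    "title": set(),
    "author": set(),
}
TARGET = {
    "library": {"item"},
    "item": {"name", "writer"},
    "name": set(),
    "writer": set(),
}

def infer_output_type(source, mapping):
    """Infer, once, the type of the data the mapping produces from the source type."""
    return {
        mapping.get(parent, parent): {mapping.get(child, child) for child in children}
        for parent, children in source.items()
    }

def projection_errors(inferred, target):
    """List (parent, child) pairs produced by the mapping but not allowed by the target.

    A pair (name, None) means the element itself is unknown to the target.
    """
    errors = []
    for parent, children in inferred.items():
        if parent not in target:
            errors.append((parent, None))
            continue
        for child in sorted(children - target[parent]):
            errors.append((parent, child))
    return errors

# Hypothetical mappings, for illustration only.
GOOD = {"bib": "library", "book": "item", "title": "name", "author": "writer"}
BAD = {"bib": "library", "book": "item", "title": "heading", "author": "writer"}

print(projection_errors(infer_output_type(SOURCE, GOOD), TARGET))
print(projection_errors(infer_output_type(SOURCE, BAD), TARGET))
```

Unlike the query-based check, no query is needed here: the comparison looks at everything the mapping can produce, which is the intuition behind completeness.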

We have just completed a formal study of type projection, and we are currently trying to apply this operation to a data exchange setting.

XPeer

XPeer is a peer-to-peer (p2p) system for sharing and querying XML data. The system is being developed by the Database Group of the Computer Science Department of the University of Pisa in the context of the Italian FIRB Grid.IT project.

XPeer was initially outlined (in the first half of 2003) as a tool to perform resource discovery in a grid infrastructure (if you are wondering why so many grid projects are receiving so much money in Europe, I am wondering the same). The basic idea was to associate each resource in a grid environment with a short XML description, and to use XPeer to process simple XML queries on the resulting distributed XML database, so as to overcome the limitations of existing resource discovery systems for Grid platforms, which are essentially based on LDAP.
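To give a flavour of the original idea, here is a minimal sketch of resource discovery over short XML descriptions. The element names (`resource`, `name`, `os`, `cpus`) and the sample data are hypothetical, and the descriptions are held in a local list rather than spread over peers; this is not XPeer's actual description format or query interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource descriptions, one short XML document per grid resource.
DESCRIPTIONS = [
    "<resource><name>node01</name><os>Linux</os><cpus>8</cpus></resource>",
    "<resource><name>node02</name><os>Solaris</os><cpus>16</cpus></resource>",
    "<resource><name>node03</name><os>Linux</os><cpus>4</cpus></resource>",
    "<resource><name>node04</name><os>Linux</os><cpus>2</cpus></resource>",
]

def find_resources(descriptions, os_name, min_cpus):
    """Return the names of resources running os_name with at least min_cpus CPUs."""
    hits = []
    for text in descriptions:
        root = ET.fromstring(text)
        # Both conditions are read directly from the child elements of <resource>.
        if root.findtext("os") == os_name and int(root.findtext("cpus")) >= min_cpus:
            hits.append(root.findtext("name"))
    return hits

print(find_resources(DESCRIPTIONS, "Linux", 4))
```

In a real deployment the descriptions would live on different peers, and the interesting part is routing the query to the peers that hold matching data.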

Starting in the second half of 2003, we reshaped the system to make it more interesting from a scientific point of view: we transformed XPeer into a full-fledged p2p database system, so as to support complex XML queries (the FLWR core of XQuery) in a rather chaotic and unstable environment, where peers are free to do what they want. In particular, we designed XPeer as a self-configuring system, able to self-manage its overlay network, so as to adapt to changes in the workload as well as in the topology of the peer network, which was assumed to be very dynamic.

This assumption was one of the biggest mistakes we made (in particular, I made) in the design of XPeer. While mapping-based systems like Piazza set the mark, we dropped schema mappings from our system, as they were too expensive to manage in a dynamic environment.

We spent about one year designing the architecture and the protocols of the system (changing both of them many times), so the implementation really started in the late summer of 2004. This was the second biggest mistake we made.

After one year of hard implementation work, and after many rejected papers, the XPeer implementation is almost complete. By the second quarter of 2005 the Database Group should be able to start the deployment of the system as well as extensive experiments on its behavior.