Pisa - Dipartimento di Informatica - Research Evaluation Exercise 1999


Algorithms and Complexity:
Models and applications




Proposer: Fabrizio Luccio


Summary

This group has been active in the development of computational models and paradigms describing the functioning of real systems. Substantial experience has also been acquired in designing data structures. Based on this background, the research proceeds in the following three areas, which largely overlap. The most relevant part of our present and short-term research is in area 2.

1. Computational models. Development and experimental validation of models for I/O complexity, and parallel computation on clusters of PCs. Investigation of routing and fault tolerance in distributed systems. Possibly, experiments in molecular computing in collaboration with a group of biologists.

2. Design of algorithms and applications on very large data sets. The focus is on searching and updating large text collections stored on secondary storage devices, such as disks and CD-ROMs, through the efficient construction of index data structures for very large strings. Among the foreseen outputs is an extension of the well-known LEDA library to string processing.

3. Computational biology. Algorithms for genome distance, and for the inference of gene duplications inside a single genome.



State of the art and trends

The study of computational models should be aimed at describing the functioning of real systems. This has been a trend of the past decade, and has probably reached its peak in sequential computation with the studies of hierarchical memories and I/O complexity. Meanwhile, the first massive applications running on parallel machines have raised substantial interest in parallel algorithms. After the great wealth of studies on the abstract PRAM model had laid the basis for general design methodologies in this area, more realistic models have been proposed (BSP and coarse-grained models above all), under which the field of parallel algorithms and data structures is currently being reconsidered. Particular attention is now given to the case in which only limited parallelism is actually available, because many existing parallel machines include 10 to 100 processors, and known sequential algorithms can be extended to such machines without great effort. At the other extreme, massively parallel machines are mostly aimed at the solution of special problems, and the design of their algorithms is a self-standing discipline.

A companion set of studies has been developed for distributed systems. This has generally been kept separate from the field of parallel systems, although the division between the two is artificial in many cases. A unification effort has started quite recently.

Culturally related to this whole field, but quite distinct in nature, are the investigations on new paradigms of "natural computing", aimed at overcoming the limitations posed by exponential problems. In particular, data can be represented and processed through biochemical reactions. The practical impact of these models is not clear yet, but it is certainly interesting to experiment on them in the framework of a cooperation between biologists and computer scientists.

Aside from its intrinsic theoretical value, our interest in computational models is closely connected to the development of algorithms and applications for very large data sets, which constitute the bulk of our present and planned future research. This also includes some studies in genome information storage and rearrangement. Let us now examine our motivations in some detail.

Computers usually contain a hierarchy of memory levels: CPU registers, caches, random access memory, up to magnetic disks, CD-ROMs and tertiary storage devices. Each level has its own capacity and performance, ranging from a few kilobytes to several gigabytes of capacity, and from a few nanoseconds to several milliseconds, and possibly minutes, of access time. Programmers generally assume that all memory references require the same access time, an assumption that is reasonable when the data sets are not particularly large, or when they are referenced with some degree of locality. Modern operating systems take advantage of these conditions through caching and prefetching heuristics. Many applications, however, require processing and management of very large data sets. This occurs in the fields of multimedia, digital libraries, data compression, and genetics, just to cite a few. Another important source is spatial data representing multidimensional objects, to be managed in computer graphics and geographic information systems applications.

Data are spread over the whole memory hierarchy, and the data flow among the memory levels becomes the major bottleneck. Furthermore, locality in the pattern of memory references is usually not guaranteed, so the caching and prefetching methods currently in use do not produce an appreciable speed-up. A substantial gain in performance can only be attained by directly exploiting the memory hierarchy in the design of algorithms and data structures, through design methods different from those deployed thus far. An important document in this area is a survey by Jeffrey S. Vitter, and in particular the Parallel Disk Model (PDM), which is a reasonably accurate model for designing and analyzing algorithms and data structures for the special, yet interesting, case of a two-level memory.
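
For reference, in the PDM with N records, internal memory of size M, block size B, and D independent disks, the I/O complexity of sorting (the touchstone bound of this theory, reported in Vitter's survey) is

\[
\mathrm{Sort}(N) \;=\; \Theta\!\left(\frac{N}{DB}\,\log_{M/B}\frac{N}{B}\right)\ \text{I/Os},
\]

to be contrasted with the familiar \(\Theta(N \log N)\) comparisons of the internal-memory setting.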

Many elegant algorithms and data structures have been successfully designed and experimented with for very large data sets. Probably the most widespread data sets are made up of texts, and this motivates an important line of research on data structures and algorithms for string processing in massive textual data sets. Classical tools for manipulating external texts - like suffix trees, suffix arrays, Prefix B-trees, and inverted files - offer very good practical performance but are not optimal from a computational point of view. Theoretical results are therefore needed to better understand the properties and limitations of the textual indexing problem. On the other hand, there is a need to deliver easy-to-use and effective software tools for string processing tasks, which are frequent and computationally demanding, but not yet available. Another goal is to collect test sets for combinatorial algorithms on strings, in order to provide researchers in this area with a standard benchmark to check the performance of their programs.

We also give attention to themes of biological interest, because part of our studies will be in algorithms for genome analysis. Laboratories worldwide produce a huge and rapidly increasing amount of biological data. In particular, it is estimated that available biosequence data sets double in size every year. The need to store and update these data with great efficiency has motivated the rapid growth of computational biology, interest in which is increasing internationally at an unexpected pace (the main conference in the field, ACM RECOMB, had to limit participation to the first 300 registrants this year).

A major class of problems in computational biology is related to genome comparison and rearrangement. The aim of the latter is to gain information about the way genomes evolve. "Distance" between genomes is defined in terms of the number of mutations needed to transform one genome into another, where a mutation may derive from a combination of duplications, transpositions, inversions, translocations, and recombinations of whole genes. These studies require a strong algorithmic expertise, and must be conducted in cooperation with molecular biologists.
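
As a toy illustration of the flavor of such measures (not of the specific distance studied by this group), the following Python sketch computes the classical breakpoint distance between two genomes modeled as permutations of a common gene set; the gene orders used in the example are hypothetical.

```python
def breakpoint_distance(genome_a, genome_b):
    """Count adjacencies of genome_a absent from genome_b (unsigned genes,
    no duplicates): a crude but classical genome-rearrangement measure.
    """
    adjacent_in_b = {frozenset(p) for p in zip(genome_b, genome_b[1:])}
    return sum(frozenset(p) not in adjacent_in_b
               for p in zip(genome_a, genome_a[1:]))

if __name__ == "__main__":
    # Hypothetical gene orders differing by one inversion of "cd":
    print(breakpoint_distance(list("abcdef"), list("abdcef")))  # -> 2
```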


Relevant research activities at the Department

1. Computational models.

In recent years, this group has been active in the development of computational models and paradigms describing the functioning of real systems. Some theoretical foundations on hierarchical systems and parallel speed-up have been laid out [e.g., LP98]. The intrinsic limits of parallel computation have been investigated through the study of special Boolean functions [LP97], with unexpected implications in logical design [LP99]. Distributed systems have also been considered, studying redundancy for fault-tolerant functioning [e.g., DMP98, FLL98], and investigating the power of routing strategies [e.g., LMO98]. All these studies are still under way. Some experimentation is also starting now, for the validation of the disk models developed for sequential computation. Experiments are also starting to confirm parallel speed-up predictions on a simple network of ten PCs available at the department.

In the specific subject of DNA computing, a deep survey has been produced, identifying applications that potentially offer special performance, and discussing the laboratory feasibility of the corresponding computations. New, unusual complexity parameters have also been taken into account.

2. Design of algorithms and applications on very large data sets.

The focus is on searching and updating large text collections stored on secondary storage devices, such as disks and CD-ROMs, through the efficient construction of index data structures for very large strings. Among the foreseen outputs is an extension of the well-known LEDA library to string processing.

A significant tool in massive string processing is the "String B-tree" proposed in [FG95/99], which allows one to circumvent two major difficulties encountered in managing long strings. First, each long string is represented by a constant amount of space (independent of its length); second, the structure avoids the computational overhead usually incurred in comparing two long strings character by character. As a result, String B-trees make it possible to search for arbitrary strings in very large text archives with a negligible number of disk accesses, and to quickly maintain the archives under changes over time. There is preliminary evidence that String B-tree performance is also good in practice. Various preliminary results of this group described the features of the String B-tree and its applications [e.g., F97, FFM98, FL98, FG99, FG95/99]. Further theoretical study and experimental analysis is now necessary to refine and extend our results, and to proceed to relevant applications, as discussed later on.
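
A minimal sketch of the first idea, under simplifying assumptions: the strings are the suffixes of one text kept externally, and the index stores plain integer offsets. This toy flat version ignores the B-tree organization and the Patricia tries in the nodes that make the real String B-tree I/O-efficient (requires Python 3.10+ for bisect with a key function).

```python
import bisect

class OffsetIndex:
    """Toy flat index over the suffixes of an external text.

    Only integer offsets are stored (constant space per string); the
    characters are read on demand at comparison time. A real String
    B-tree arranges the offsets in B-tree nodes, each equipped with a
    Patricia trie to bound character comparisons per disk page.
    """

    def __init__(self, text: str):
        self.text = text  # stands in for the text kept on disk
        # Offsets sorted by the suffix they point to.
        self.offsets = sorted(range(len(text)), key=lambda i: text[i:])

    def search(self, pattern: str) -> list[int]:
        """Offsets whose suffix starts with `pattern` (its occurrences)."""
        prefix = lambda i: self.text[i:i + len(pattern)]
        lo = bisect.bisect_left(self.offsets, pattern, key=prefix)
        hi = bisect.bisect_right(self.offsets, pattern, key=prefix)
        return self.offsets[lo:hi]

if __name__ == "__main__":
    index = OffsetIndex("banana")
    print(index.search("an"))  # [3, 1]: "an" occurs at offsets 1 and 3
```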

One of the achievements in the processing of very large spatial data is a new randomized technique based on random sampling [CFM98]. This technique supports a general randomized approach for obtaining I/O-efficient solutions to various basic geometric problems, such as segment intersections, convex hulls, Voronoi diagrams, and batched planar point location. Several (internal-memory) algorithms for the segment intersections problem are known that execute an optimal number of (internal) operations, both sequential and parallel. In the external-memory model, however, no I/O-optimal solution is known, and the best algorithm, devised by Arge, Vengroff and Vitter (1995), is deterministic, involved, and non-optimal. Other external-memory techniques are known that I/O-optimally solve further geometric problems, such as 2- and 3-d convex hulls and batched planar point location; however, the resulting algorithms are complicated and require several non-trivial I/O-optimal subroutines for specific geometric subproblems. Our randomized incremental technique for external memory works for all the problems listed above, is simple, and gives optimal algorithms both in the expected number of I/Os and in internal operations.

Another achievement is a suite of efficient methods for organizing and maintaining solutions for a large class of problems, called decomposable problems, that are characterized by divide-and-conquer solutions in a dynamic setting [GI98, GI99]. We focus particularly on multidimensional data sets which must be kept simultaneously sorted under several total orderings: these orderings may be defined, and also changed dynamically, by the user throughout the lifetime of the data structures, according to the application at hand. Besides standard insertions and deletions of data, our proposed solution can efficiently perform split and concatenate operations on the whole data sets according to any ordering. This allows the user to dynamically rearrange any ordering of a segment of data faster than recomputing the new ordering from scratch, and to efficiently answer queries on the data contained in a given range of the current orderings. Our solution fully generalizes the notion of B-trees to higher dimensions by carefully combining space-driven and data-driven partitions. Balancing is easy, as we introduce a new multidimensional data structure, the cross-tree, which is the cross product of balanced trees. As a result, the cross-tree is competitive with other popular index data structures that require linear space, such as k-d trees, quad-trees, grid files, space filling curves, hB-trees, and R-trees.

Novel applications of geometric data structures to string matching problems have been proposed in [FMB99]. In the field of parallel algorithms, some results have been produced on PRAM computation on strings [FL96, GG99], while the efficient processing of string dictionaries in a coarse-grained model has been demonstrated in [FL99].

3. Computational biology.

Over the years, this group has contributed to the area of string processing with several studies, in particular addressing the problem of context-sensitive string matching, which seems to be particularly relevant in biological problems. Problems on the matching of trees have been solved under different conditions [e.g., LP95], and methodologies for the reconstruction of phylogenies have then been proposed and tested on primates. Quite recently, a study has been carried out on genome analysis in cooperation with researchers of Paris VI, aimed at the detection of duplicated genes within a single genome. The mathematical technique employed makes use of the theory of random graphs and trees.

In the field of biological data storage, another work, in cooperation with the Max Planck Institut für Informatik of Saarbrücken, has led to an apparently very relevant data structure for the organization and fast retrieval of very large genome data sets [BCF99].

This group participates in the following research projects:

1. Principal investigator. UNESCO project on "Large data processing: methods and applications", with focus on textual data and literary assets. Together with the Max Planck Institut für Informatik of Saarbrücken, Germany; the University of Warsaw, Poland; Mu'tah University, Jordan.

2. NATO grant on "Mathematical models and algorithms for coarse-grained computation". Together with INRIA of Sophia Antipolis, France; Carleton University of Ottawa, Canada.

3. Principal investigator. CNR of Italy, project in Bioinformatics. Together with the Universities of Milano, Parma, Genova, and Palermo.


Short term plans and expected results

As explained before, our research activities will be developed in the areas of: 1. computational models, 2. algorithms on very large data sets, and 3. computational biology. Although the greatest effort will be directed to the second area, all the activities will be strictly related. Therefore the following analytic description is somewhat artificial, and is maintained for clarity of presentation.

1. Computational models.

Two major research tasks are foreseen in the analysis and validation of models with memory hierarchy, both starting with simple experimental activities to be later refined, together with a theoretical validation of the model. It should be noted that this approach is the reverse of our past research, because experimentation will precede theoretical conclusions.

The activities of the first task will start with the collection of a repertoire of well-known and well-studied problems, such as matrix multiplication and sorting on disks, to set up a benchmark for the validation of the different disk models proposed in the literature (in particular, Vitter's hierarchy model and the classical disk drive model). The problems will be chosen so that the predicted performance of the same algorithm in the two models may vary significantly. Extensive experiments will be conducted to compare the predictions of the different disk models. The Linux/Unix operating system will be adopted, and the performance of algorithms will be studied by varying several parameters that determine the amount of I/O performed each time secondary storage is accessed.

The second task has to do with parallel processing on a cluster of PCs. Here the focus will be on speed-up results, with the aim of setting up a significant theory after experimental results have been gathered. Target problems will regard large string sets. Again, the aim is to predict and verify the performance of a coarse-grained model with local disks on problems for which well-established algorithms are available and within the expertise of this group. Since the latency and the per-byte transfer time of present networks are at least one order of magnitude smaller than the corresponding disk parameters, a given processor can retrieve data from the central memory of another processor faster than from its own disk. As a consequence we predict, and expect to verify, superlinear speed-up over sequential computation when the problem size exceeds certain bounds. For this research, a dedicated local network of ten PCs will be made available by another research group at the Dipartimento di Informatica of Pisa.
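
A back-of-the-envelope model of the effect we expect, sketched in Python with purely hypothetical parameter values (the actual figures will be measured in the experiments):

```python
# Back-of-envelope comparison: fetching blocks from a remote PC's RAM
# over the network vs. from the local disk, one request per block.
# All parameter values below are HYPOTHETICAL, for illustration only.

DISK_LATENCY_S = 10e-3   # avg. seek + rotation per disk access
DISK_BANDWIDTH = 10e6    # disk transfer rate, bytes/s
NET_LATENCY_S = 0.5e-3   # network round trip per request
NET_BANDWIDTH = 50e6     # network transfer rate, bytes/s

def fetch_time(num_blocks: int, block_bytes: int,
               latency_s: float, bandwidth: float) -> float:
    """Total time to fetch num_blocks blocks of block_bytes each."""
    return num_blocks * (latency_s + block_bytes / bandwidth)

if __name__ == "__main__":
    blocks, block_size = 10_000, 64 * 1024   # ~640 MB of data
    t_disk = fetch_time(blocks, block_size, DISK_LATENCY_S, DISK_BANDWIDTH)
    t_net = fetch_time(blocks, block_size, NET_LATENCY_S, NET_BANDWIDTH)
    # When the data fit in the cluster's combined RAM but not in one
    # PC's RAM, each remote-RAM fetch replaces a local-disk access,
    # which is the source of the predicted superlinear speed-up.
    print(f"local disk: {t_disk:7.1f} s, remote RAM: {t_net:7.1f} s")
```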

In the field of distributed algorithms, studies on patterns of faulty processors that prevent a whole system from working will be continued, mainly in cooperation with researchers of Carleton University of Ottawa.

As explained in section 2, the study of the limits of parallel computation has led this group to some unexpected results in logical design. Although this falls in a different field, we foresee that some attention will still be given to this research, to exploit the preliminary results produced thus far. For example, the definition of the logical network of a full multiplier has just started, under the new design approach.

In the field of molecular computation, the next step must be to seriously address the question of the practical laboratory feasibility of the different models proposed thus far. This is what some international groups are trying to do, joining the efforts of biologists and computer scientists. We expect to follow this new phase of research in DNA computing, joining a European group now emerging.

2. Design of algorithms and applications on very large data sets.

Our research will be crucially centered on the problem of searching and updating large text collections stored on secondary storage devices, such as disks and CD-ROMs. A main topic is the efficient construction of index data structures for large strings. Searching and updating a large index is clearly important, but the construction phase can be a serious bottleneck that can even prevent the index from being used in practice. Current algorithms for suffix array or suffix tree construction do not seem to offer interesting performance, so we plan to investigate this topic in more detail.

Another topic is the design and experimentation of novel indexing data structures for managing string and text collections in hierarchical memory (in particular, in secondary storage), and for supporting specialized search operations. We will focus on indexes for the new mark-up languages (e.g., HTML, XML, TEI), which allow one to plug some "structure" into the documents by associating some semantics with their various constituent parts (e.g., title, section, paragraph). Along these lines, we will extend the powerful data structures designed for "flat" textual documents to the new tagged documents, in order to allow for "semantic" searches. Specifically, we will start from the String B-tree and move toward new indexing data structures offering good trade-offs between access time and space occupancy. This will be the main activity and will constitute the technical prerequisite for the developments in the application areas of string processing, textual data processing and digital libraries. Additionally, we will investigate novel indexing data structures and algorithms for structured and semi-structured documents (HTML, XML, TEI), with experiments on real tagged data: WEB pages and literary documents.

A deliverable software library is the most rewarding outcome of the design of efficient algorithms and data structures. LEDA, an acronym for "Library of Efficient Data types and Algorithms", is a well-known example of a sizeable collection of basic data types and algorithms for geometrical and combinatorial problems, developed by Kurt Mehlhorn and his research group. Among its numerous valuable features are portability and the high-level specification of data structures. LEDA is very well known in academia and is increasingly used in commercial settings. Although strings are supported in LEDA, they are provided as basic data types only, without any data-structural or algorithmic support. We intend to provide a new library that extends LEDA to string data sets, takes the memory hierarchy into account, and integrates the research activities on index data structures described so far. This extension of LEDA to string processing will provide a platform available to theoreticians for experimenting with new string algorithms, to practitioners for prototyping and developing new programs, and to general users for easily learning string processing.
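
For concreteness, here is a minimal in-memory suffix array construction by prefix doubling (in the style of Manber and Myers), sketched in Python; it is exactly the in-memory assumption of such procedures that breaks down at the scale targeted by this research.

```python
def suffix_array(text: str) -> list[int]:
    """Prefix-doubling construction (Manber-Myers style).

    After round k, suffixes are ranked by their first 2**k characters;
    plain sorting gives O(n log^2 n) time overall, entirely in memory.
    """
    n = len(text)
    if n <= 1:
        return list(range(n))
    rank = [ord(c) for c in text]
    sa = list(range(n))
    k = 1
    while True:
        # Rank pair: (rank of first half, rank of second half or -1).
        key = lambda i: (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: fully sorted
            return sa
        k *= 2

if __name__ == "__main__":
    print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```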

Testing software is the most time-consuming task in the process of creating a library. Having a benchmark widely accepted in the scientific community helps a lot in code tuning. In this context, ACM and EATCS sponsored CATS, an acronym for "Combinatorial Algorithms Test Sets", to facilitate experimental research, algorithm selection for applications, and high-quality implementations of advanced algorithms and data structures. New experiments, problems and benchmarks are periodically added to the official WEB site (www.jea.acm.org/CATS) under the revision of the Editorial Board. String processing has not yet been covered, although its importance is widely recognized in Computer Science. To fill this gap, the proponents of this research group are in charge of setting up a European mirror of CATS at the University of Pisa, and will be part of the Editorial Board for the string processing problems. We will therefore provide extensive textual data sets characterized by several combinatorial parameters (e.g., frequency of words or q-grams, entropy, longest repetitions) to help algorithm designers compare algorithms, to collect other good test sets, and to identify significant subproblems that can be solved very efficiently. The target of this work is to contribute to the creation of a systematic experimental methodology in the area of string algorithms (analogously to what has been done in Data Compression and Numerical Analysis).
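
As a small illustration of the combinatorial parameters just mentioned, the following sketch computes q-gram frequencies and the order-0 empirical entropy of a text; the parameters actually adopted for the benchmark will be fixed by the CATS editorial process.

```python
from collections import Counter
from math import log2

def qgram_frequencies(text: str, q: int) -> Counter:
    """Frequency of each length-q substring (q-gram) of text."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))

def empirical_entropy(text: str) -> float:
    """Order-0 empirical entropy H0 = -sum_c (n_c/n) * log2(n_c/n),
    in bits per character; a reference lower bound for compressors.
    """
    n = len(text)
    return -sum((c / n) * log2(c / n) for c in Counter(text).values())

if __name__ == "__main__":
    sample = "mississippi"
    print(qgram_frequencies(sample, 2).most_common(3))
    print(f"H0 = {empirical_entropy(sample):.3f} bits/char")  # ~1.823
```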

3. Computational biology.

In the field of genome analysis we are currently exploring techniques for the inference of gene duplications inside a single genome, based on the "transformation distance". Our main short-term goal is to obtain a tree structure from pairwise distances. From this structure we expect to be able to infer a hierarchy that might correspond to the evolution of the genome itself. The biologists who are collaborating with us are responsible for validating our results and steering our choices.
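
A minimal sketch of one standard way to derive such a hierarchy (single-linkage agglomerative clustering over pairwise distances); the distances and gene names below are hypothetical, and the method actually adopted may differ.

```python
from itertools import combinations

def leaves(tree):
    """All leaf labels of a nested-tuple tree."""
    return [tree] if isinstance(tree, str) else [x for sub in tree for x in leaves(sub)]

def single_linkage_tree(items, dist):
    """Repeatedly merge the two closest clusters into a nested tuple.

    items: leaf labels; dist: dict frozenset({a, b}) -> distance.
    Single linkage: cluster distance = min over member pairs.
    """
    clusters = list(items)
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: min(dist[frozenset((x, y))]
                                 for x in leaves(pair[0])
                                 for y in leaves(pair[1])),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append((a, b))
    return clusters[0]

if __name__ == "__main__":
    # Hypothetical pairwise distances among four gene copies:
    d = {frozenset(p): v for p, v in [
        (("g1", "g2"), 1.0), (("g1", "g3"), 4.0), (("g1", "g4"), 5.0),
        (("g2", "g3"), 4.5), (("g2", "g4"), 5.5), (("g3", "g4"), 2.0),
    ]}
    print(single_linkage_tree(["g1", "g2", "g3", "g4"], d))
    # -> (('g1', 'g2'), ('g3', 'g4'))
```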


Long term scenarios

It does not seem particularly significant to analyze this point in depth, but some lines of tendency can be clearly drawn. The studies on computational models will remain of great importance in the next several years, until some basic conditions are hopefully met. Among them, a universally acceptable model of parallel computation should arise, capable of capturing the main features of data processing on real-life parallel machines. An even more stringent need exists in distributed computing. The interrelations between the two models will also have to be fully understood. In addition, models of molecular computation will have to be definitively validated by physical experiments. This group will certainly be active in this general area. The amount of available electronic data will continue to increase at spectacular speed in all fields, and the need for advanced data structures will become more stringent every year. On the theoretical side, it is now hard to foresee the direction to take in the long term. This will be largely decided along the way, on the basis of the theoretical and experimental results that the group will be able to attain in the next two years. In any case, it is certain that experimentation on data structures, and the construction of utility libraries, will continue over a span of several years. If the cooperation recently started with groups of biologists of Pisa, Florence and Paris proves fruitful in the short term, the development of algorithms for genome analysis will continue, possibly resuming on new bases the former ideas and results on context-dependent analysis. A necessary condition for the continuation of these studies is that, as we strongly hope, some results of biological relevance be found within the next two years.


Short curricula of the participants

Paolo Ferragina

Received the "Laurea in Scienze dell'Informazione" summa cum laude in 1992 and a Ph.D. in Computer Science in 1996, both from the University of Pisa. From Septemberb 1997 to September 1998, Post-Doc at the Max-Planck-Institut fur Informatik, Saarbrucken, Germany. From October 1998, Researcher (equivalent to Assistant Professor) at Dipartimento di Informatica, University of Pisa, Italy. Besides at the Max-Planck-Institut fur Informatik, he has spent research periods at the IBM Research Center in Rome, the Computer Science Department of the University of North Texas in Denton TX, AT&T Bell Laboratories in Murray-Hill, NJ. He has been invited speaker in several conferences and research symposia, including the German Conference on BioInformatics (GCB `97), the International School on Computational Biology at CISM, and the conference on Foundations of Software Technology and Theoretical Computer Science (FST & TCS 99) to be held in Chennai (India). His Ph.D. thesis was awarded the Doctoral Dissertation Thesis Award from the European Association for Theoretical Computer Science (EATCS) in 1996, and was ranked among the four finalists of the ACM Doctoral Dissertation Thesis Award in 1997. His first journal paper "Optical recognition of motor vehicle license plates" received the 1995 Best Land Transportation Paper Award from IEEE Vehicular Technology Society. His current research interests are in the analysis and design of algorithms and data structures; in particular ,very large sets of structured and textual data, and external memory usage. He is also interested in practical development of libraries oriented to strings, for textual and biological applications.

Roberto Grossi

Received the "Laurea in Scienze dell'Informazione" summa cum laude in 1988 and a Ph.D. in Computer Science in 1993, both from the University of Pisa. From 1993 to 1998, Assistant Professor in the Department of Systems and Computer Science at the University of Florence. Currently, Associate Professor at the University of Pisa, in the research group on algorithms and data structures at the Department of Informatica. Visiting graduate student in the Department of Computer Science at Columbia University. Visiting researcher at AT&T Bell Laboratories, International Institute of Computer Science at Berkeley, and Institute for Basic Research in Computer Science in Aarhus. (Co)author of more than thirty publications on international journals, such as Discrete Applied Mathematics, Information Processing Letters, Information and Computation, Journal of ACM, Journal of Algorithms, Journal of Complexity, SIAM Journal on Computing, Theoretical Computer Science, and refereed international conferences. Current interests in basic research on analysis and design of algorithms and data structures with particular emphasis on dynamic and external memory settings; experimental work to test the algorithmic performance of external memory computations. Interests in pattern matching on strings, matrices and trees with applications to information retrieval, digital libraries, databases and text indexing.

Fabrizio Luccio

Received the Dr. Ing. degree in electrical engineering from the Politecnico di Milano, Italy, in 1962, and the Libera Docenza in electronic computers from the Italian university system in 1968. He is currently a professor of computer science at the University of Pisa, Italy. After an industrial experience with Olivetti, he joined the Politecnico di Milano, starting his research activity in logical design and programming language translation. In 1966 he moved to M.I.T. as a staff member of Project MAC, working on compiling techniques. He then became a professor at the University of Southern California, and later at New York University, pursuing research in theoretical and algorithmic aspects of logical network synthesis. In 1971 he returned permanently to Italy, as lecturer and later professor of informatics at the University of Pisa. He has spent several sabbatical periods as a visiting professor at U.C.L.A., the University of Illinois, the National University of Hawaii, and Carleton University in Ottawa. He has been a visiting scientist at the T.J. Watson Research Center of IBM, and at the NTT LSI Laboratories in Morinosato, Japan. He is the author of over one hundred research papers published in international journals and conference proceedings, and of three widely circulated textbooks. His current research interests are in algorithm design, and in the relationship between abstract computational models and realistic computers and circuits. Professor Luccio is a Fellow of the IEEE and a member of the ACM.

Linda Pagli

Received the "Laurea in Scienze dell'Informazione" from the University of Pisa, Italy, in 1973. Since that year, she was associated as a researcher, with the Department of Informatica of the University of Pisa. In 1987 she was appointed Full Professor of Computer Science at the University of Salerno, to return to the university of Pisa in 1990. She teaches in the fields of ``Algorithm and Data Structures'' and ``Parallel and Distributed Computing''. In 1991 she has been Visiting Professor at Carleton University of Ottawa, Canada. She was then appointed coordinator of the study programs of the Department of Computer Science of Pisa for one triennium. Under her direction the study programs were completely revised. In the framework of a UNESCO program she has given many advances courses in informatics for university professors of developing countries. She has also been a professor at the National University of Somalia, and a lecturer at the university of Mu'tah in Jordan. She is referee of several international journal, and has served in the program commettee of many international conference. Her research activity, started in the field of data structures and sequential algorithm, has developed in the field of parallel and distributed algorithms and models of computation, which constitute her major present inetersts. Her research results are attested by more than fifthy papers published in leading scientific journals and proceedings of international conferences.

Ph.D. students: Gabriele Lenzini, Giuseppe Prencipe, Nadia Pisanti, Valentina Ciriani

All are graduates of the University of Pisa, respectively in the fourth, third, second and first year of the four-year doctoral curriculum in Informatics. Lenzini is active in bioinformatics, and has published in the field of phylogenetic trees. Prencipe is spending a one-year period at Carleton University in Ottawa, working in a joint research project on the theory and implementation of parallel algorithms. Pisanti holds a DEA degree from the Université Paris VII in Informatics with applications in biology, and is pursuing research in this field. Among other papers, she has published a very well known survey: "DNA computing: a survey", Bulletin of the EATCS 64 (1998) 188-216. Ciriani is starting now.


Some recent publications of the participants

[BCF99] Burkhardt S., Crauser A., Ferragina P., Lenhof H.P., Rivals E., Vingron M., "q-gram based database searching using a suffix array", Proc. RECOMB (1999) 77-83.

[CFM98] Crauser A., Ferragina P., Mehlhorn K., Meyer U., Ramos E., "Randomized external-memory algorithms for some geometric problems", Proc. ACM Symposium on Computational Geometry (1998) 259-268.

[DMP98] De Prisco R., Monti A. and Pagli L., "Testing and Reconfiguration of VLSI Linear arrays", Theoretical Computer Science 197 (1998) 171-188.

[F97] Ferragina P., "Dynamic text indexing under string updates", Journal of Algorithms 22 (1997) 296-328.

[FFM98] Farach M., Ferragina P., Muthukrishnan S., "Overcoming the memory bottleneck in suffix tree construction", Proc. IEEE FOCS (1998) 194-183.

[FG99] Ferragina P. and Grossi R., "Optimal on-line search and sublinear time update in string matching", SIAM Journal on Computing (to appear 1999). Preliminary conference version in: Proc. IEEE FOCS (1995) 604-612.

[FG95/99] Ferragina P. and Grossi R., "The String B-Tree: A New Data Structure for String Search in External Memory and its Applications", Journal of the ACM (to appear 1999). Preliminary conference version in: Proc. ACM STOC (1995) 693-702.

[FL96] Ferragina P. and Luccio F., "On the parallel dynamic dictionary matching problem: new results with applications", LNCS 1136, Springer-Verlag (1996) 261-275 (Proc. ESA96).

[FL98] Ferragina P., Luccio F., "Dynamic Dictionary Matching in External Memory", Information and Computation 146 (1998) 85-99.

[FL99] Ferragina P., Luccio F., "String Search in Coarse-Grained Parallel Computers", Algorithmica (1999, to appear).

[FLL98] Flocchini P., Lodi E., Luccio F., Pagli L. and Santoro N., "Irreversible dynamos in tori", LNCS 1470 (1998) 554-562 (Proc. Euro-Par 98).

[FMB99] Ferragina P., Muthukrishnan S., de Berg M., "Multi-Method dispatching: A geometric approach with applications to string matching", Proc. ACM STOC (1999) 483-491.

[GG99] Giancarlo R. and Grossi R., "Parallel Construction and Query of Index Data Structures for Pattern Matching on Square Matrices", Journal of Complexity (to appear 1999).

[GI98] Grossi R., Italiano G.F., "Efficient Splitting and Merging Algorithms for Order Decomposable Problems", Information and Computation (to appear 1999). Preliminary conference version in: Proc. ICALP (1997).

[GI99] Grossi R., Italiano G.F., "Efficient Techniques for Maintaining Multidimensional Keys in Linked Data Structures", Proc. ICALP (1999).

[LMO98] Luccio F., Mahafzah M., Omari M. and Pagli L., "Routing with the use of masks", Proc. 5th Internat. Coll. on Structural Information and Communication Complexity (1998).

[LP95] Luccio F. and Pagli L., "Approximate matching for two families of trees", Information and Computation 123 (1995) 111-120.

[LP97] Luccio F. and Pagli L., "An insight on PRAM computational bounds", Information Processing Letters 63 (1997) 331-336.

[LP98] Luccio F. and Pagli L., "Computing with time-varying data: sequential complexity and parallel speed up", Theory of Computing Systems (formerly: Mathematical Systems Theory) 31, 1 (1998) 5-26.

[LP99] Luccio F. and Pagli L., "On a new Boolean function with applications", IEEE Transactions on Computers 48, 3 (1999) 296-310.


Collaborators

N. Santoro (Carleton University, Ottawa); F. Dehne (Carleton University, Ottawa); R. Giancarlo (University of Palermo); G. Italiano (University of Rome); S. Muthukrishnan (Bell Labs, Murray Hill); P. Crescenzi (University of Florence); E. Lodi (University of Siena); M. Buiatti (Biologist, University of Florence); R. Marangoni (Biologist, University of Pisa); A. Viari (Biologist, University of Paris VI).


Keywords

Algorithms, Strings, Models of Computation, Very Large Data Sets, Computational Molecular Biology
