Giuseppe Ottaviano

About me

I am currently a research scientist at Facebook. I am mainly interested in the storage and indexing of large amounts of data, and applications to information retrieval and search engines. I am also interested in machine learning and computer vision.

I received my Ph.D. from the University of Pisa, where I also did a postdoc. After that I did another postdoc at ISTI-CNR. During my studies I worked for Ask.com and Bing, and done two internships at Microsoft Research Cambridge. You can find my CV here, or you can have a look at my LinkedIn page.

Publications

Antonio Mallia, Giuseppe Ottaviano, Elia Porciani, Nicola Tonellotto, Rossano Venturini Faster BlockMax WAND with Variable-sized Blocks In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2017
Laxman Dhulipala, Igor Kabiljo, Brian Karrer, Giuseppe Ottaviano, Sergey Pupyrev, Alon Shalita Compressing Graphs and Indexes with Recursive Graph Bisection In Proceeding of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2016
[arXiv]
Marco Genuzio, Giuseppe Ottaviano, Sebastiano Vigna Fast Scalable Construction of (Minimal Perfect Hash) Functions In Proceedings of the International Symposium on Experimental Algorithms (SEA), 2016
[arXiv]
Giuseppe Ottaviano, Nicola Tonellotto, Rossano Venturini Optimal Space-Time Tradeoffs for Inverted Indexes In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), 2015
[pdf] [code]
Roi Blanco, Giuseppe Ottaviano, Edgar Meij Fast and Space-efficient Entity Linking in Queries In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), 2015
[pdf] [code]
Roberto Grossi, Giuseppe Ottaviano Fast Compressed Tries through Path Decompositions In ACM Journal of Experimental Algorithmics, October 2014
[pdf] [code]
An earlier version appeared in Proceedings of the Meeting on Algorithm Engineering and Experiments (ALENEX), 2012
[arXiv] [slides]
Giuseppe Ottaviano, Rossano Venturini Partitioned Elias-Fano Indexes In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2014 Best Paper Award!
[pdf] [code] [slides (pptx)] [slides (pdf)]
Djamal Belazzougui, Paolo Boldi, Giuseppe Ottaviano, Rossano Venturini, Sebastiano Vigna Cache-Oblivious Peeling of Random Hypergraphs In Proceedings of the Data Compression Conference (DCC), 2014
[extended version on arXiv] [code]
Giuseppe Ottaviano Space-Efficient Data Structures for Collections of Textual Data Ph.D. thesis, June 2013
[pdf]
Roberto Grossi, Giuseppe Ottaviano Design of Practical Succinct Data Structures for Large Data Collections In Proceedings of the Symposium on Experimental Algorithms (SEA), 2013 Invited paper
[pdf]
Giuseppe Ottaviano, Pushmeet Kohli Compressible Motion Fields In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2013
[pdf] [supplementary material]
Paul Hsu, Giuseppe Ottaviano Space-Efficient Data Structures for Top-k Completion In Proceedings of the 22st World Wide Web Conference (WWW), 2013
[pdf] [slides]
Roberto Grossi, Giuseppe Ottaviano The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space In Proceedings of the Symposium on Principles of Database Systems (PODS), 2012
[arXiv] [slides] [poster]
Giuseppe Ottaviano, Roberto Grossi Semi-Indexing Semi-Structured Data in Tiny Space In Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM), 2011
[pdf] [slides] [code]

Code

ds2i: A collection of algorithms and data structures for inverted indexes, described in the papers Partitioned Elias-Fano Indexes and Optimal Space-Time Tradeoffs for Inverted Indexes.
emphf: A minimal perfect hashing library for large-scale key sets focused on speed and low memory usage. This implementation is significantly faster than similar libraries such as cmph. The algorithms implemented in the library are described in the paper Cache-Oblivious Peeling of Random Hypergraphs.
path decomposed tries: An implementation of the data structures described in the paper Fast Compressed Tries using Path Decomposition.
semi-index: An implementation of the algorithm described in the paper Semi-Indexing Semi-Structured Data in Tiny Space. The datasets used in the experiments can be downloaded here.
succinct: A collection of C++ succinct data structures. Many other implementations of the same structures already exist; this library however has some rare or unique features: all the code is 64-bit clean, so it can support very big datasets. Furthermore, the serialization scheme is designed to allow the binary data to be memory-mapped instead of loaded into memory.
It also fares extremely well compared to other implementations. In particular, the data structure used for balanced parentheses is very fast.
bpt: A tool to compile and install packages in a sandboxed directory. Like virtualenv, but not restricted to python packages.
It is very useful during development, when specific versions of the libraries must be compiled and installed. The directory containing the sandbox can be moved (even to another machine) after the packages have been compiled, even if during compilation the prefix path is hardcoded in the compiled files.
It works under Linux and OSX.
python-llvm-jit: An implementation of a JIT compiler on top of Python 2.5.2. It works by translating the Python bytecode into LLVM bitcode which calls back the CPython runtime, basically unrolling the interpreter loop.
No more work was done on this because of not very promising results (in particular the compilation times are very high with LLVM 2.5, but the situation may have improved with newer versions of the framework), and because Unladen Swallow was announced shortly after, which uses roughly the same approach. A short report on my experiments was posted in this thread in the Unladen Swallow mailing list.
inpytex: A Python-based pre-processor for (La)TeX files. It executes Python snippets in comments and inserts their output under the comment. This script was written while working on my master's thesis to automate the creation of some TiKz figures and tables. It has not been used or maintained since then (except minor cosmetic changes).

Contacts

e-mail (personal): giuott@gmail.com
e-mail (work): ott@fb.com
Twitter: ot_y