I am currently a research scientist at Facebook. I am mainly interested in the storage and indexing of large amounts of data, and applications to information retrieval and search engines. I am also interested in machine learning and computer vision.
I received my Ph.D. from the University of Pisa, where I also did a postdoc. After that I did another postdoc at ISTI-CNR. During my studies I worked for Ask.com and Bing, and did two internships at Microsoft Research Cambridge. You can find my CV here, or you can have a look at my LinkedIn page.
- Optimal Space-Time Tradeoffs for Inverted Indexes. To appear in Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), 2015
- Fast and Space-efficient Entity Linking in Queries. To appear in Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), 2015
- Fast Compressed Tries through Path Decompositions. In ACM Journal of Experimental Algorithmics, October 2014. An earlier version appeared in Proceedings of the Meeting on Algorithm Engineering and Experiments (ALENEX), 2012
- Partitioned Elias-Fano Indexes. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2014. Best Paper Award!
- Cache-Oblivious Peeling of Random Hypergraphs. In Proceedings of the Data Compression Conference (DCC), 2014
- Space-Efficient Data Structures for Collections of Textual Data. Ph.D. thesis, June 2013
- Design of Practical Succinct Data Structures for Large Data Collections. In Proceedings of the Symposium on Experimental Algorithms (SEA), 2013
- Compressible Motion Fields. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2013
- Space-Efficient Data Structures for Top-k Completion. In Proceedings of the 22nd World Wide Web Conference (WWW), 2013
- The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space. In Proceedings of the Symposium on Principles of Database Systems (PODS)
- Semi-Indexing Semi-Structured Data in Tiny Space. In Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM)
- emphf: A minimal perfect hashing library for large-scale key sets, focused on speed and low memory usage. This implementation is significantly faster than similar libraries such as cmph. The algorithms implemented in the library are described in the paper Cache-Oblivious Peeling of Random Hypergraphs.
- path decomposed tries: An implementation of the data structures described in the paper Fast Compressed Tries through Path Decompositions.
- semi-index: An implementation of the algorithm described in the paper Semi-Indexing Semi-Structured Data in Tiny Space. The datasets used in the experiments can be downloaded here.
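The hypergraph-peeling idea behind this kind of minimal perfect hash function can be illustrated with a small Python sketch. Each key is hashed to an edge of a random 3-uniform hypergraph; the hypergraph is peeled; then a value in {0, 1, 2} is stored per vertex so that the values of a key's three vertices select a distinct vertex, and ranking the used vertices makes the function minimal. This is a simplified illustration of the classic construction, not emphf's code or API. It uses a plain in-memory queue rather than the cache-oblivious peeling algorithm from the paper, and keeps everything in Python lists. The helper names, the BLAKE2b-based hashing and the default `gamma=1.23` (vertices per key, just above the peeling threshold of roughly 1.222 for random 3-hypergraphs) are assumptions. Keys are assumed to be distinct.

```python
import hashlib
from collections import deque


def _edge(key: str, seed: int, n: int) -> tuple:
    # Hypothetical hashing: map a key to 3 vertices, one in each of three
    # disjoint ranges of size n, so the vertices of an edge are distinct.
    digest = hashlib.blake2b(key.encode(), digest_size=24,
                             salt=seed.to_bytes(16, "little")).digest()
    return tuple(i * n + int.from_bytes(digest[8 * i:8 * i + 8], "little") % n
                 for i in range(3))


def _peel(edges, num_vertices):
    # Repeatedly remove an edge that has a vertex of degree 1. Returns the
    # (edge, freeing vertex) pairs in peeling order, or None if a 2-core remains.
    degree = [0] * num_vertices
    xor_edges = [0] * num_vertices  # XOR of incident edge ids; equals the
                                    # single remaining edge when degree is 1
    for e, vs in enumerate(edges):
        for v in vs:
            degree[v] += 1
            xor_edges[v] ^= e
    queue = deque(v for v in range(num_vertices) if degree[v] == 1)
    order = []
    while queue:
        v = queue.popleft()
        if degree[v] != 1:  # stale entry: the vertex was emptied meanwhile
            continue
        e = xor_edges[v]
        order.append((e, v))
        for u in edges[e]:
            degree[u] -= 1
            xor_edges[u] ^= e
            if degree[u] == 1:
                queue.append(u)
    return order if len(order) == len(edges) else None


def build_mphf(keys, gamma=1.23, max_tries=100):
    n = int(gamma * len(keys) / 3) + 1  # vertices per range
    for seed in range(max_tries):
        edges = [_edge(k, seed, n) for k in keys]
        order = _peel(edges, 3 * n)
        if order is None:
            continue  # non-empty 2-core (e.g. duplicate edges): new seed
        g = [3] * (3 * n)  # 3 marks an unused vertex and counts as 0 mod 3
        # Assign in reverse peeling order, so each key's g values sum (mod 3)
        # to the position of the vertex that freed its edge.
        for e, v in reversed(order):
            vs = edges[e]
            g[v] = (vs.index(v) - sum(g[u] for u in vs if u != v)) % 3
        # rank[v] = number of used vertices before v; a compact version would
        # pack g in 2 bits per vertex and use a succinct rank directory.
        rank, used = [], 0
        for x in g:
            rank.append(used)
            used += x != 3
        return seed, n, g, rank
    raise RuntimeError("peeling failed; try a larger gamma or more seeds")


def lookup(mphf, key: str) -> int:
    seed, n, g, rank = mphf
    vs = _edge(key, seed, n)
    return rank[vs[sum(g[u] for u in vs) % 3]]
```

For every `k` in `keys`, `lookup(build_mphf(keys), k)` returns a distinct integer in `range(len(keys))`; for keys outside the set it returns an arbitrary value, as with any minimal perfect hash function.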
A collection of C++ succinct data structures. Many other implementations of the same structures already exist; this library however has some rare or unique features: all the code is 64-bit clean, so it can support very big datasets. Furthermore, the serialization scheme is designed to allow the binary data to be memory-mapped instead of loaded into memory.
It also performs very well compared to other implementations; in particular, its balanced parentheses data structure is very fast.
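The memory-mapping idea can be illustrated with a toy rank-enabled bitvector, written here in Python for brevity although the library itself is C++. The file layout (a 16-byte header with the number of bits and of 64-bit words, then the words, then one cumulative popcount sample per 512-bit block) and all names are hypothetical choices for this example, not the library's actual format or API. Queries read fixed-width little-endian integers straight from the mapped file, so opening the structure requires no parsing or copying.

```python
import mmap
import struct

WORDS_PER_BLOCK = 8  # hypothetical: one rank sample every 512 bits


def _popcount(x: int) -> int:
    return bin(x).count("1")


def serialize_bitvector(bits, path):
    """Write bits as little-endian 64-bit words plus sampled ranks (toy format)."""
    words = [0] * ((len(bits) + 63) // 64)
    for i, b in enumerate(bits):
        if b:
            words[i // 64] |= 1 << (i % 64)
    prefix = [0]  # prefix[i] = number of ones in words[:i]
    for w in words:
        prefix.append(prefix[-1] + _popcount(w))
    samples = prefix[::WORDS_PER_BLOCK]  # len(words) // 8 + 1 samples
    with open(path, "wb") as f:
        f.write(struct.pack("<QQ", len(bits), len(words)))
        f.write(struct.pack(f"<{len(words)}Q", *words))
        f.write(struct.pack(f"<{len(samples)}Q", *samples))


class MappedBitVector:
    """Answers rank queries directly from a memory-mapped file."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.size, nwords = struct.unpack_from("<QQ", self._mm, 0)
        self._words_off = 16
        self._samples_off = 16 + 8 * nwords

    def _u64(self, offset):
        return struct.unpack_from("<Q", self._mm, offset)[0]

    def rank(self, pos):
        """Number of 1 bits in positions [0, pos), for 0 <= pos <= size."""
        if not 0 <= pos <= self.size:
            raise IndexError(pos)
        word, bit = divmod(pos, 64)
        block = word // WORDS_PER_BLOCK
        r = self._u64(self._samples_off + 8 * block)
        for i in range(block * WORDS_PER_BLOCK, word):  # full words in block
            r += _popcount(self._u64(self._words_off + 8 * i))
        if bit:  # partial last word: keep only the low `bit` bits
            r += _popcount(self._u64(self._words_off + 8 * word) & ((1 << bit) - 1))
        return r

    def close(self):
        self._mm.close()
        self._file.close()
```

With one 64-bit sample per eight data words the rank directory adds 12.5% space; real succinct implementations typically use two-level directories and hardware popcount instead.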
A tool to compile and install packages in a sandboxed directory. Like virtualenv, but not restricted to Python packages.
It is very useful during development, when specific versions of libraries must be compiled and installed. Once the packages have been compiled, the sandbox directory can be moved, even to another machine, even if the build hardcoded the installation prefix into the compiled files.
It works under Linux and OS X.
An implementation of a JIT compiler on top of Python 2.5.2. It works by translating the Python bytecode into LLVM bitcode that calls back into the CPython runtime, essentially unrolling the interpreter loop.
I stopped working on it because the results were not very promising (in particular, compilation times were very high with LLVM 2.5, although the situation may have improved with newer versions of the framework), and because Unladen Swallow, which takes roughly the same approach, was announced shortly afterwards. A short report on my experiments was posted in this thread on the Unladen Swallow mailing list.
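What "unrolling the interpreter loop" means can be made concrete with a toy stack machine in Python. The bytecode, opcode names and `rt_*` helpers are hypothetical stand-ins: the actual project translated CPython bytecode into LLVM bitcode, whereas this sketch emits Python source and only handles straight-line code (no jumps). The compiled function makes the same helper calls as the interpreter; only the per-instruction fetch and dispatch are resolved once, at compile time.

```python
# Toy bytecode for a stack machine: (opcode, argument) pairs.
# This program computes 2 * args[0] + 1.
PROGRAM = [
    ("LOAD_CONST", 2),
    ("LOAD_ARG", 0),
    ("BINARY_MUL", None),
    ("LOAD_CONST", 1),
    ("BINARY_ADD", None),
    ("RETURN", None),
]


# Hypothetical "runtime" helpers, standing in for the interpreter's C functions.
def rt_mul(a, b):
    return a * b


def rt_add(a, b):
    return a + b


def interpret(code, args):
    stack = []
    for op, arg in code:  # the dispatch loop, executed on every call
        if op == "LOAD_CONST":
            stack.append(arg)
        elif op == "LOAD_ARG":
            stack.append(args[arg])
        elif op == "BINARY_MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(rt_mul(a, b))
        elif op == "BINARY_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(rt_add(a, b))
        elif op == "RETURN":
            return stack.pop()
    raise ValueError("program ended without RETURN")


# One source template per opcode: the body of the matching interpreter branch.
TEMPLATES = {
    "LOAD_CONST": "stack.append({arg!r})",
    "LOAD_ARG": "stack.append(args[{arg!r}])",
    "BINARY_MUL": "b, a = stack.pop(), stack.pop(); stack.append(rt_mul(a, b))",
    "BINARY_ADD": "b, a = stack.pop(), stack.pop(); stack.append(rt_add(a, b))",
    "RETURN": "return stack.pop()",
}


def compile_unrolled(code):
    # Emit the branch taken for each instruction, in order, so the dispatch
    # loop disappears; the generated code still calls the same helpers.
    lines = ["def compiled(args):", "    stack = []"]
    for op, arg in code:
        lines.append("    " + TEMPLATES[op].format(arg=arg))
    namespace = {"rt_mul": rt_mul, "rt_add": rt_add}
    exec("\n".join(lines), namespace)
    return namespace["compiled"]
```

Both `interpret(PROGRAM, [5])` and `compile_unrolled(PROGRAM)([5])` compute 2 * 5 + 1.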
- inpytex: A Python-based pre-processor for (La)TeX files. It executes Python snippets in comments and inserts their output under the comment. This script was written while working on my master's thesis to automate the creation of some TikZ figures and tables. It has not been used or maintained since then (apart from minor cosmetic changes).
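The pre-processing step can be sketched in a few lines of Python. The `%py ` comment prefix and the `%pyout` / `%endpyout` marker lines are hypothetical syntax chosen for this example, not necessarily what inpytex uses; the markers let a second run replace previously generated output instead of duplicating it. All snippets share one namespace, so later snippets can use definitions from earlier ones.

```python
import contextlib
import io

CODE_PREFIX = "%py "    # hypothetical marker for a line of Python code
BEGIN_OUT = "%pyout"    # hypothetical markers delimiting generated output
END_OUT = "%endpyout"


def preprocess(text: str) -> str:
    lines = text.splitlines()
    out = []
    env = {}  # shared namespace across all snippets in the file
    i = 0
    while i < len(lines):
        if not lines[i].startswith(CODE_PREFIX):
            out.append(lines[i])
            i += 1
            continue
        # Collect consecutive code lines into one snippet, keeping the comments.
        snippet = []
        while i < len(lines) and lines[i].startswith(CODE_PREFIX):
            snippet.append(lines[i][len(CODE_PREFIX):])
            out.append(lines[i])
            i += 1
        # Skip output generated by a previous run, so reruns are idempotent.
        if i < len(lines) and lines[i] == BEGIN_OUT:
            while i < len(lines) and lines[i] != END_OUT:
                i += 1
            i += 1  # skip the END_OUT line itself
        # Run the snippet, capturing what it prints, and insert it below.
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec("\n".join(snippet), env)
        out.append(BEGIN_OUT)
        out.extend(buf.getvalue().splitlines())
        out.append(END_OUT)
    return "\n".join(out) + "\n"
```

For instance, a comment line `%py print(1 + 1)` is kept as-is and followed by the lines `%pyout`, `2` and `%endpyout`.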