About me
I am a Ph.D. student in compressed data structures, under the supervision of Roberto Grossi.
I am mainly interested in the storage and indexing of large amounts of data, and applications to search engines. I am also interested in geometric indexing of high-dimensional point sets, in particular for applications to media indexing and computer vision.
In the past I have worked for Ask.com and Bing, and I am currently doing an internship at Microsoft Research Cambridge. You can find my CV here, or have a look at my LinkedIn page.
I currently live in Cambridge, UK.
Publications
- The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space
  In Proceedings of the Symposium on Principles of Database Systems (PODS), Scottsdale, 2012
  [arXiv] [slides]
- Fast Compressed Tries through Path Decompositions
  In Proceedings of the Meeting on Algorithm Engineering and Experiments (ALENEX), Kyoto, 2012
  [arXiv] [slides] [code]
- Semi-Indexing Semi-Structured Data in Tiny Space
  In Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM), Glasgow, 2011
  [pdf] [slides] [code]
Code
- path decomposed tries: An implementation of the data structures described in the paper Fast Compressed Tries through Path Decompositions.
- semi-index: An implementation of the algorithm described in the paper Semi-Indexing Semi-Structured Data in Tiny Space. The datasets used in the experiments can be downloaded here.
- succinct: A collection of C++ succinct data structures. Many other implementations of the same structures already exist; this library, however, has some rare or unique features: all the code is 64-bit clean, so it can support very large datasets. Furthermore, the serialization scheme is designed to allow the binary data to be memory-mapped instead of loaded into memory.
  It also fares extremely well compared to other implementations. In particular, the data structure used for balanced parentheses is very fast.
  It is used in semi-index and in some other yet-unpublished projects.
- bpt: A tool to compile and install packages in a sandboxed directory. Like virtualenv, but not restricted to Python packages.
  It is very useful during development, when specific versions of libraries must be compiled and installed. The directory containing the sandbox can be moved (even to another machine) after the packages have been compiled, even if the prefix path was hardcoded into the compiled files during compilation.
  Works on Linux and OS X.
Contacts
- e-mail (university): ottavian@di.unipi.it
- e-mail (personal): giuott@gmail.com
- Twitter: ot_y