About me
I am a Ph.D. student at the Computer Science Department of the University of Pisa, and I hold a research fellowship at the same department in collaboration with ISTI-CNR and Tiscali.
I am mainly interested in the storage and indexing of large amounts of data, and in their applications to information retrieval and search engines. I am also interested in machine learning and computer vision.
In the past I have worked for Ask.com and Bing, and done two internships at Microsoft Research Cambridge. You can find my CV here, or you can have a look at my LinkedIn page.
Publications
- Design of Practical Succinct Data Structures for Large Data Collections
  To appear in Proceedings of the International Symposium on Experimental Algorithms (SEA), 2013 [Invited Paper]
  [pdf]
- Compressible Motion Fields
  To appear in Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2013
  [pdf] [supplementary material]
- Space-Efficient Data Structures for Top-k Completion
  To appear in Proceedings of the 22nd World Wide Web Conference (WWW), 2013
  [pdf] [slides]
- The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space
  In Proceedings of the Symposium on Principles of Database Systems (PODS), 2012
  [arXiv] [slides] [poster]
- Fast Compressed Tries through Path Decompositions
  In Proceedings of the Meeting on Algorithm Engineering and Experiments (ALENEX), 2012
  [arXiv] [slides] [code]
- Semi-Indexing Semi-Structured Data in Tiny Space
  In Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM), 2011
  [pdf] [slides] [code]
Code
- path decomposed tries: An implementation of the data structures described in the paper Fast Compressed Tries through Path Decompositions.
- semi-index: An implementation of the algorithm described in the paper Semi-Indexing Semi-Structured Data in Tiny Space. The datasets used in the experiments can be downloaded here.
- succinct: A collection of C++ succinct data structures. Many other implementations of the same structures already exist; this library, however, has some rare or unique features: all the code is 64-bit clean, so it can support very large datasets. Furthermore, the serialization scheme is designed to allow the binary data to be memory-mapped instead of loaded into memory.
  It also fares extremely well compared to other implementations. In particular, the data structure used for balanced parentheses is very fast.
- bpt: A tool to compile and install packages in a sandboxed directory. Like virtualenv, but not restricted to Python packages.
  It is very useful during development, when specific versions of the libraries must be compiled and installed. The directory containing the sandbox can be moved (even to another machine) after the packages have been compiled, even if the prefix path was hardcoded in the compiled files during compilation.
  It works under Linux and OS X.
- python-llvm-jit: An implementation of a JIT compiler on top of Python 2.5.2. It works by translating the Python bytecode into LLVM bitcode which calls back into the CPython runtime, basically unrolling the interpreter loop.
  Work on it stopped because the results were not very promising (in particular, compilation times are very high with LLVM 2.5, though the situation may have improved with newer versions of the framework) and because Unladen Swallow, which uses roughly the same approach, was announced shortly afterwards. A short report on my experiments was posted in this thread on the Unladen Swallow mailing list.
- inpytex: A Python-based pre-processor for (La)TeX files. It executes Python snippets in comments and inserts their output under the comment. This script was written while working on my master's thesis to automate the creation of some TikZ figures and tables. It has not been used or maintained since then (except for minor cosmetic changes).
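As a rough illustration of the idea behind inpytex (executing Python snippets found in comments and inserting their output below the comment), here is a minimal sketch. It is not the code of inpytex: the `%py ` marker, the function name `expand_snippets`, and the one-snippet-per-line convention are all assumptions made for this example.

```python
import contextlib
import io


def expand_snippets(tex_source, marker="%py "):
    """Run Python code found in marked comment lines and insert its output below.

    The "%py " marker is hypothetical, chosen for this sketch only.
    """
    namespace = {}  # shared across snippets, so later ones can reuse earlier names
    out_lines = []
    for line in tex_source.splitlines():
        out_lines.append(line)  # the comment line itself is always kept
        if line.startswith(marker):
            buffer = io.StringIO()
            # Capture whatever the snippet prints to standard output.
            with contextlib.redirect_stdout(buffer):
                exec(line[len(marker):], namespace)
            out_lines.extend(buffer.getvalue().splitlines())
    return "\n".join(out_lines)
```

For instance, a line such as `%py print(1 + 1)` would be followed in the result by a line containing the printed output. The sketch does not recognize output inserted by a previous run, so applying it twice to the same file duplicates that output.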
Contacts
- e-mail (university): ottavian@di.unipi.it
- e-mail (personal): giuott@gmail.com
- Twitter: ot_y