We will explore using text documents as a source of knowledge, extracting that knowledge through document analysis techniques that combine statistical and semantic analysis, including Natural Language Processing (language is still the most effective means humans have of expressing and communicating knowledge).
The general approach is to analyse text by identifying entities and the relations between them, exploiting the dependency trees produced by our parser.
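As a minimal illustration of reading relations off a dependency tree, the sketch below extracts (subject, predicate, object) triples from a parsed sentence. It assumes a CoNLL-style token representation (id, form, head, deprel) and the hypothetical labels "SBJ" and "OBJ"; the actual label set and relation types produced by our parser may differ, and real relation extraction would need to handle coordination, passives and multi-word arguments.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    id: int      # 1-based position in the sentence, as in CoNLL format
    form: str    # surface word
    head: int    # id of the governing token; 0 denotes the artificial root
    deprel: str  # dependency label (assumed label set, e.g. "SBJ", "OBJ")


def extract_triples(sentence: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (subject, predicate, object) triples read off the dependency tree."""
    triples = []
    for pred in sentence:
        # Direct dependents of this token, filtered by the assumed relation labels.
        subjects = [t.form for t in sentence if t.head == pred.id and t.deprel == "SBJ"]
        objects = [t.form for t in sentence if t.head == pred.id and t.deprel == "OBJ"]
        for s in subjects:
            for o in objects:
                triples.append((s, pred.form, o))
    return triples


# Hand-built parse of "Marconi invented the radio" (illustrative, not parser output).
sentence = [
    Token(1, "Marconi", 2, "SBJ"),
    Token(2, "invented", 0, "ROOT"),
    Token(3, "the", 4, "NMOD"),
    Token(4, "radio", 2, "OBJ"),
]
print(extract_triples(sentence))  # [('Marconi', 'invented', 'radio')]
```

Triples of this kind could then be stored as entity-relation facts and matched against questions or other documents.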
Question Answering is a challenging benchmark application for several of these techniques, from Information Retrieval (the IXE search library), to machine learning (e.g. the Maximum Entropy models and the Named Entity Recognizer within the PiQASso QA system), to Natural Language parsing (our multilanguage dependency parser).
We will apply document analysis to tasks such as opinion mining or, more generally, intent classification: determining the purpose of a sentence or document, for example whether it expresses a problem (description, solution), agreement (assent, dissent), preference (likes, dislikes), or a statement (claim, denial). This analysis could be applied, for instance, to mining blogs.
We plan to improve and refine the various text analysis tools we have developed. In particular, we plan to increase the accuracy of the parser by combining the results of multiple learned classifiers.
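One simple way to combine several classifiers is weighted voting over the dependency arc each one assigns to every token. The sketch below is only an illustration of that idea, not necessarily the combination scheme we will adopt: the `vote` function, the (head, label) arc encoding and the per-parser weights (which could, for instance, reflect each parser's accuracy on held-out data) are all assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Arc = Tuple[int, str]  # (head id, dependency label) assigned to one token


def vote(parses: Sequence[Sequence[Arc]], weights: Sequence[float] = ()) -> List[Arc]:
    """Combine several parses of the same sentence by weighted per-token voting."""
    if not weights:
        weights = [1.0] * len(parses)  # unweighted majority vote by default
    combined = []
    for i in range(len(parses[0])):
        scores = Counter()
        for parse, w in zip(parses, weights):
            scores[parse[i]] += w
        # most_common breaks ties by first occurrence, i.e. by parser order.
        combined.append(scores.most_common(1)[0][0])
    return combined


# Three hypothetical parses of a four-token sentence that disagree on tokens 3 and 4.
parser_a = [(2, "SBJ"), (0, "ROOT"), (4, "NMOD"), (2, "OBJ")]
parser_b = [(2, "SBJ"), (0, "ROOT"), (4, "NMOD"), (3, "NMOD")]
parser_c = [(2, "SBJ"), (0, "ROOT"), (2, "NMOD"), (2, "OBJ")]
print(vote([parser_a, parser_b, parser_c]))  # same arcs as parser_a
```

Note that per-token voting does not guarantee that the combined arcs form a well-formed tree (it may introduce cycles), so a complete combination step would also need to repair or re-parse the result.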
We will also work on creating a TreeBank for Italian on which to train the parser. The TreeBank will be made available for the next edition of CoNLL, so that parsers built by other teams can also be tested on Italian.
Specific aspects of a Question Answering system that can also be studied in isolation include:
We will continue to participate in challenges such as the TREC tasks and the CoNLL Shared Task, which provide an effective comparative measure of the quality and performance achieved by real systems on realistic benchmarks.