Call for applications - Ph.D. program in Computer Science (2008-2010)
The Department of Computer Science (Dipartimento di Informatica) of the University Ca' Foscari of Venice has run a Ph.D. Program in Computer Science (Dottorato di Ricerca in Informatica) since the academic year 2000-2001. Four full scholarships are available in this call for applications for the next three-year program, starting on January 1st, 2008. One of the four scholarships will fund research in the field of Web Search, in collaboration with Ricardo Baeza-Yates (Yahoo! Research - Barcelona - Spain, and ICREA Professor at the Dept. of Technology of Univ. Pompeu Fabra of Barcelona - Spain) and the HPC-Lab of ISTI-CNR of Pisa.
PhD student in the field of Web Search
Title of the research: Improving efficiency and efficacy of Web Search Engines via query log analysis
Web Search Engines (WSEs), which have allowed users to cope with Web information overload, are perhaps the most important novelty in the information and communication technology (ICT) field in the last decade. Although current WSEs are based on techniques of distributed and parallel Information Retrieval (IR), their design has created new challenges for researchers working in the field. One of the most important concerns the very large and growing amount of information present on the Web. This volume of information has called for novel IR approaches that improve not only the scalability and efficiency of WSEs, but also their efficacy, that is, the quality and relevance of the contents they discover.
Moreover, despite the growing influence of current WSEs, whose distributed services are managed in a centralized way by large and important ICT companies, several researchers envision the future development of peer-to-peer (P2P) WSEs. The features of the P2P approach potentially offer enormous benefits for search capabilities in terms of scalability and efficiency, resilience to failures, and dynamic management. Additionally, such P2P WSEs can potentially benefit from the intellectual contributions of large user communities.
The purpose of this research is to apply Web Mining techniques to the optimization of WSE design, both centralized and P2P, where the knowledge exploited concerns how WSEs are used and is obtained by analyzing, in various ways, the logs of user queries. Note that these logs can be collected in several places: on clients, WSE servers, Web servers, and proxies. The logs constitute a sort of secondary data derived from user interactions with the Web and a WSE, while the primary data are the Web contents or the associated WSE indexes. Since we aim to focus on the analysis of logs, we are interested in a particular research subject in the area of Web Mining, called Web Usage Mining.
Among the possible applications of this research, we envision the use of query log analyses to modify the replacement policies of a cache that stores WSE query results. In this case we aim to improve the cache hit ratio, and thus to enhance the responsiveness and throughput of the underlying WSE. Other optimizations made possible by these analyses concern the partitioning of data (or indices) among the parallel servers of a WSE, aiming to reduce the number of servers involved in resolving a WSE query, and thus to increase the throughput of the distributed system. This intelligent data partitioning technique can also find applications in P2P WSEs. Finally, query log analyses could be exploited to improve the efficacy of WSEs, for example by improving the relevance of the results supplied to a user. In particular, an on-line analysis of logs could be used to discover topics that suddenly become interesting for specific communities of users. On the basis of such topics, we could modify the algorithms that rank the relevant results, or focus the search/crawling on important data to index.
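As a purely illustrative sketch of the first application, the Python code below shows one way a past query log could shape a cache replacement policy: the most frequent queries in the log are pinned in a static part of the cache, while all other queries compete for an ordinary LRU part. The class name LogAwareCache, its parameters, the fake_backend function, and the toy query streams are all hypothetical and are not part of the research proposal; this is one possible policy under those assumptions, not the method to be developed.

```python
from collections import Counter, OrderedDict


class LogAwareCache:
    """Hypothetical query-result cache: log-derived static part plus an LRU part."""

    def __init__(self, past_queries, static_size, dynamic_size):
        # Queries that were most frequent in the past log are pinned:
        # once their results are stored, they are never evicted.
        frequent = Counter(past_queries).most_common(static_size)
        self.pinned = {query for query, _ in frequent}
        self.static_results = {}
        # All other queries share a plain LRU cache.
        self.dynamic = OrderedDict()
        self.dynamic_size = dynamic_size
        self.hits = 0
        self.misses = 0

    def get(self, query, compute_results):
        if query in self.static_results:
            self.hits += 1
            return self.static_results[query]
        if query in self.dynamic:
            self.hits += 1
            self.dynamic.move_to_end(query)  # mark as most recently used
            return self.dynamic[query]
        self.misses += 1
        results = compute_results(query)
        if query in self.pinned:
            self.static_results[query] = results
        elif self.dynamic_size > 0:
            self.dynamic[query] = results
            if len(self.dynamic) > self.dynamic_size:
                self.dynamic.popitem(last=False)  # evict least recently used
        return results

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def fake_backend(query):
    # Stand-in for the WSE back end that actually resolves a query.
    return [f"result for {query}"]


# Toy data: "venice" dominates the past log, so it is the pinned query.
past_log = ["venice", "venice", "venice", "weather", "weather", "news"]
stream = ["venice", "a", "b", "venice", "a", "b", "venice"]

# Same total capacity (2 entries), with and without log knowledge.
log_aware = LogAwareCache(past_log, static_size=1, dynamic_size=1)
plain_lru = LogAwareCache(past_log, static_size=0, dynamic_size=2)
for q in stream:
    log_aware.get(q, fake_backend)
    plain_lru.get(q, fake_backend)

print("log-aware hit ratio:", log_aware.hit_ratio())
print("plain LRU hit ratio:", plain_lru.hit_ratio())
```

On this toy stream the log-aware cache serves the two repeated requests for the pinned query from its static part, whereas the plain LRU cache of the same total size evicts every entry before it is requested again. Real query streams are far less adversarial, so the gain would have to be measured on actual logs.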
For further information about this scholarship, contact Salvatore Orlando by email (firstname.lastname@example.org).