Pilot Projects Year 3

EXPANSION MODELS FOR BIOMEDICAL DATA SEARCH

PI: Craig Willis, University of Illinois Urbana-Champaign

Searching for biomedical information often requires specialized language or vocabularies that users' queries may not reflect. Feedback-based query expansion models have been shown to alleviate this problem and improve search engine performance across many domains by automatically expanding the user's query with additional terms related to their information need. In dataset retrieval, however, the metadata describing datasets is often sparse, incomplete, or otherwise inadequate for query expansion.

The goal of this project is to prototype and evaluate expansion models for integration into the DataMed system. Preliminary work based on our participation in the 2016 BioCADDIE Dataset Retrieval Challenge suggests that expanding the query based on information from external collections such as PubMed can improve overall retrieval effectiveness. We will evaluate several expansion models with both general and domain-specific collections, with the ultimate goal of improving DataMed search effectiveness.
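
As a rough illustration of feedback-based expansion from an external collection, the sketch below ranks a small stand-in collection against the query, treats the top-scoring documents as pseudo-relevant, and appends their highest-weighted terms to the query. It is a minimal sketch under stated assumptions, not the project's models: the function names, the TF-IDF weighting, and the toy documents are all hypothetical, and a real system would draw feedback documents from a collection such as PubMed and apply stopword filtering.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps alphabetic tokens only."""
    return [t for t in text.lower().split() if t.isalpha()]


def expand_query(query, external_docs, top_docs=2, top_terms=2):
    """Append terms from the top-ranked external documents to the query.

    Hypothetical helper: the scoring scheme is an assumption made for
    illustration, not the weighting used in the project.
    """
    q_terms = tokenize(query)
    doc_tokens = [tokenize(d) for d in external_docs]
    n_docs = len(doc_tokens)

    # Document frequency of each term in the external collection.
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))

    def idf(term):
        # Smoothed inverse document frequency.
        return math.log((n_docs + 1) / (df[term] + 1)) + 1.0

    # Score each external document by TF-IDF overlap with the query.
    scored = []
    for i, tokens in enumerate(doc_tokens):
        tf = Counter(tokens)
        scored.append((sum(tf[t] * idf(t) for t in q_terms), i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    # Keep only top-ranked documents that actually match the query.
    feedback = [i for score, i in scored[:top_docs] if score > 0]

    # Weight candidate expansion terms found in the feedback documents.
    weights = Counter()
    for i in feedback:
        for term, count in Counter(doc_tokens[i]).items():
            if term not in q_terms:
                weights[term] += count * idf(term)

    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return q_terms + [term for term, _ in ranked[:top_terms]]


# Toy stand-in for an external collection (invented example text).
EXTERNAL_DOCS = [
    "melanoma skin cancer braf mutation",
    "braf mutation melanoma therapy",
    "diabetes insulin glucose",
]

expanded = expand_query("melanoma", EXTERNAL_DOCS)
print(expanded)
```

The expanded query can then be run against the sparse dataset metadata, so that a dataset described only with related vocabulary still has a chance to match.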


IMPROVING SEARCH RANKING FOR BIOMEDICAL DATASET RETRIEVAL WITH MACHINE LEARNING AND RELEVANCE FEEDBACK

Leaders: Eugene Agichtein, PhD, and Payam Karisani, Emory University

This project will develop and integrate a Learning-to-Rank (LTR) module into the BioCADDIE retrieval framework. Because many possible signals can capture the usefulness of a dataset for a user's query, it is difficult to manually design a ranking function that performs well in all cases. Instead, this project will automatically learn a ranking function, or multiple functions, that is most effective for the available features, the expected queries, and the supported datasets. The learning-to-rank code, as well as the ranking methods that use the learned models, will be incorporated into the BioCADDIE framework. Additionally, if time permits, this project will investigate the use of searchers' feedback, explicit or implicit, as additional relevance signals, and will incorporate this feedback as features in the ranking model so that BioCADDIE search can improve over time.
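
To make the idea of learning a ranking function concrete, the sketch below trains a linear ranker with a pairwise perceptron-style update: whenever a more relevant dataset is not scored above a less relevant one for the same query, the weights move toward the difference of their feature vectors. This is only one simple learning-to-rank technique, chosen for illustration; the source does not say which algorithm the project uses. The two features (a title match score and a description match score), the relevance labels, and all function names are assumptions.

```python
def score(weights, features):
    """Linear ranking score: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise_ranker(queries, epochs=20, learning_rate=0.1):
    """Learn linear weights from per-query lists of (features, label).

    Pairwise perceptron-style sketch (an assumption for illustration):
    for every pair in which the first item has the higher label, update
    the weights if that item is not already scored strictly higher.
    """
    n_features = len(queries[0][0][0])
    weights = [0.0] * n_features
    for _ in range(epochs):
        for candidates in queries:
            for feats_hi, label_hi in candidates:
                for feats_lo, label_lo in candidates:
                    if label_hi <= label_lo:
                        continue
                    diff = [a - b for a, b in zip(feats_hi, feats_lo)]
                    if score(weights, diff) <= 0:
                        weights = [w + learning_rate * d
                                   for w, d in zip(weights, diff)]
    return weights


def rank(weights, candidates):
    """Return candidate indices ordered from highest to lowest score."""
    return sorted(range(len(candidates)),
                  key=lambda i: -score(weights, candidates[i]))


# Invented training data: each inner list is one query's candidate
# datasets as ([title_match, description_match], relevance_label).
TRAINING_QUERIES = [
    [([0.9, 0.2], 2), ([0.4, 0.8], 1), ([0.1, 0.9], 0)],
    [([0.8, 0.1], 1), ([0.2, 0.7], 0)],
]

learned_weights = train_pairwise_ranker(TRAINING_QUERIES)
print(rank(learned_weights, [[0.2, 0.9], [0.7, 0.3]]))
```

Under this setup, searcher feedback such as click counts could be added as extra entries in each feature vector, which is one way the learned model could take such signals into account.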