The QSGisBingRank_DS is a dataset designed for evaluating the relatedness between queries and web snippets, i.e., (query, snippet) pairs.

It consists of 38 text queries selected from the 27 categories of Google Insights for Search webpage trends (closed on September 27, 2012), after removing duplicates, atemporal queries and queries with multiple meanings.

Each query was issued to the Bing search engine in August 2012 through the Bing Web Search API, with the market parameter set to en-US, and the top 50 web results were collected for each query.

The final set consists of 1900 distinct web snippets (38 queries × 50 results), of which 543 contain at least one candidate year.
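This description does not say how candidate years were identified. As a rough illustration only, the sketch below assumes a candidate year is any standalone four-digit number between 1000 and 2099. The names `YEAR_PATTERN`, `candidate_years` and `has_candidate_year` are hypothetical. A pattern this naive also matches non-year numbers (e.g., quantities), so it would likely overcount relative to the original annotation.

```python
import re

# Hypothetical detector: a standalone four-digit number from 1000 to 2099.
# This is an assumption, not the extraction rule used to build the dataset.
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})\b")


def candidate_years(snippet: str) -> list[int]:
    """Return the distinct candidate years in a snippet, in order of appearance."""
    years: list[int] = []
    for match in YEAR_PATTERN.finditer(snippet):
        year = int(match.group(1))
        if year not in years:
            years.append(year)
    return years


def has_candidate_year(snippet: str) -> bool:
    """True if the snippet contains at least one candidate year."""
    return bool(candidate_years(snippet))


print(candidate_years("©2011 EA Fragrances Co. Britney Spears™ is a trademark"))
```

Counting the snippets for which `has_candidate_year` is true would give the analogue of the 543 figure under this assumed rule, not necessarily the same number.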

Based on this, we formed two distinct datasets. The first, denoted Temp_DS, comprises, for each query, only the retrieved snippets that have temporal features. The second, TempTopic_DS, includes all 50 web snippets retrieved for each query.
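A minimal sketch of how the two datasets relate. It assumes each snippet is held in memory together with its query, and that "temporal features" can be approximated by a hypothetical year pattern. `Snippet` and `build_datasets` are illustrative names, not part of the released dataset. The input uses only the two snippets quoted later in this description.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for "temporal features": a standalone year 1000-2099.
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})\b")


@dataclass(frozen=True)
class Snippet:
    query: str
    text: str

    @property
    def has_temporal_features(self) -> bool:
        return YEAR_PATTERN.search(self.text) is not None


def build_datasets(results: dict[str, list[Snippet]]):
    """Return (Temp_DS, TempTopic_DS) as dicts keyed by query."""
    # TempTopic_DS keeps every retrieved snippet for each query.
    temp_topic_ds = {q: list(snips) for q, snips in results.items()}
    # Temp_DS keeps only the snippets that carry temporal features.
    temp_ds = {
        q: [s for s in snips if s.has_temporal_features]
        for q, snips in results.items()
    }
    return temp_ds, temp_topic_ds


# The two snippets quoted in this description, one per query.
results = {
    "Britney Spears": [Snippet(
        "Britney Spears",
        "©2011 EA Fragrances Co. Britney Spears™ is a trademark licensed "
        "to Elizabeth Arden, Inc. by Britney Brands, Inc.")],
    "Amy Winehouse": [Snippet(
        "Amy Winehouse",
        "Amy Winehouse consumed a very large quantity of alcohol before "
        "dying at her London home, a pathologist said Wednesday")],
}
temp_ds, temp_topic_ds = build_datasets(results)
print({q: len(s) for q, s in temp_ds.items()})
```

Under this assumed rule, Temp_DS keeps the Britney Spears snippet (it mentions 2011) and drops the Amy Winehouse one, while TempTopic_DS keeps both.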

The ground truth was then obtained by assigning a relevance label to each (q, Si) pair, where q is a query and Si is one of its retrieved snippets, on a 4-level scale:

  (0) Not Relevant;

  (1) Fair;

  (2) Good;

  (3) Excellent.
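For programmatic use, the scale maps naturally onto an integer enumeration. The sketch below is one possible representation. `Relevance` and `parse_label` are hypothetical names, and the sketch assumes labels are stored as the integers 0 to 3, which this description does not confirm.

```python
from enum import IntEnum


class Relevance(IntEnum):
    """The 4-level relevance scale used for each (q, Si) judgment."""
    NOT_RELEVANT = 0
    FAIR = 1
    GOOD = 2
    EXCELLENT = 3


def parse_label(value: str) -> Relevance:
    """Convert a raw label such as "3" into a Relevance member.

    Assumes labels are stored as the integers 0-3; any other value
    raises ValueError.
    """
    return Relevance(int(value.strip()))


print(parse_label("3").name)
print(parse_label(" 0 ") < Relevance.FAIR)
```

Using `IntEnum` keeps the labels ordered and comparable as integers, which suits graded relevance judgments.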

Under this scale, a snippet containing both temporal and conceptual information that matches the query's information need is considered highly relevant and is labeled with a score of 3. Note that relevant snippets without any year reference may also receive a score of 3 (e.g., “Amy Winehouse consumed a very large quantity of alcohol before dying at her London home, a pathologist said Wednesday as she declared Winehouse's demise...” for the query “Amy Winehouse”).

Conversely, a snippet that is neither conceptually nor temporally relevant receives a score of 0. Note also that snippets containing a year reference may still receive a score of 0 (e.g., “©2011 EA Fragrances Co. Britney Spears™ is a trademark licensed to Elizabeth Arden, Inc. by Britney Brands, Inc.” for the query “Britney Spears”).
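The two examples above show that the presence of a year neither guarantees nor is required for relevance. A small sketch that tabulates year presence against the assigned label makes this easy to check across judged snippets. It uses only the two quoted examples as input, and its year detector is a hypothetical four-digit pattern rather than the dataset's own extraction rule. `year_label_table` is an illustrative name.

```python
import re
from collections import Counter

# Hypothetical year detector: a standalone four-digit number 1000-2099.
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})\b")

# The two judged examples quoted above: (query, snippet, label).
examples = [
    ("Amy Winehouse",
     "Amy Winehouse consumed a very large quantity of alcohol before dying "
     "at her London home, a pathologist said Wednesday as she declared "
     "Winehouse's demise...",
     3),
    ("Britney Spears",
     "©2011 EA Fragrances Co. Britney Spears™ is a trademark licensed to "
     "Elizabeth Arden, Inc. by Britney Brands, Inc.",
     0),
]


def year_label_table(judgments):
    """Count (has_year, label) combinations across judged snippets."""
    return Counter(
        (YEAR_PATTERN.search(text) is not None, label)
        for _query, text, label in judgments
    )


print(year_label_table(examples))
```

Run over the full ground truth, such a table would show how many year-bearing snippets were judged 0 and how many year-free snippets were judged 3.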


The QSGisBingRank_DS dataset consists of two folders described below:



