--- Analysis of Dates and Query Classification ---

http://www.ccc.ipt.pt/~ricardo/experiments/AnalysisOfDates.html

http://www.ccc.ipt.pt/~ricardo/experiments/AnalysisOfDates.zip (for downloading data)


DATASET REFERENCE

This dataset may be used for any research purpose, provided that the following reference is cited:

Campos, R., Dias, G. & Jorge, A. (2011). What is the Temporal Value of Web Snippets? In Proceedings of the 1st International Temporal Web Analytics Workshop (TWAW2011) associated to the 20th International World Wide Web Conference (WWW2011), pp 9 – 16, Hyderabad, India, 28th March, ISSN 1613 - 0073.

 

SUMMARY

In our research, we are particularly interested in using information retrieval techniques to exploit the temporal information present in web content, particularly in web snippets. To this end, we conducted a set of experiments to evaluate the feasibility of such an approach.

We rely on a dataset consisting of 465 queries manually extracted from Google Insights for Search, which records the hottest and rising searches performed worldwide in a given period of time. We collected queries from the period January 2010 to October 2010.

Each query is classified with regard to its conceptual and temporal type. For a more comprehensive analysis, please refer to the GISQC_DS dataset.

Using these queries, we conducted three experiments in December 2010 with our meta-search engine VipAccess, configured to run over the Yahoo and Bing search engines. We are especially interested in running queries without any explicit temporal information, since any date associated with the query would influence the temporal value of the results. Consequently, in addition to the initial set of 465 queries, we considered a second dataset consisting of 450 implicit temporal queries (the 465 queries minus the 15 explicit ones).

Depending on the experiment, we retrieved either 20 or 100 results for each query, so that we can observe any variation due to the amount of retrieved data. In practice, this corresponds to approximately 40 and 200 results respectively, given that we run each search over both Bing and Yahoo.

We are particularly interested in studying the existence of temporal information in web documents, specifically within web snippets. We therefore extracted dates, specifically years in the range [1000 - 2090], from each retrieved snippet, title and URL.
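
The year extraction described above can be illustrated with a minimal Python sketch. It is not the VipAccess implementation: the pattern, the name extract_years, and the rule that a year must be a standalone four-digit number (not part of a longer digit run) are assumptions made for illustration only.

```python
import re

# Assumption: a "year" is a standalone four-digit number in [1000, 2090],
# i.e. not preceded or followed by another digit.
YEAR_PATTERN = re.compile(r"(?<!\d)(1\d{3}|20[0-8]\d|2090)(?!\d)")


def extract_years(text):
    """Return every year in [1000, 2090] found in text, in order of appearance."""
    return [int(year) for year in YEAR_PATTERN.findall(text)]


# Hypothetical snippet, title and URL, used only to exercise the function.
snippet_years = extract_years("World Cup 2010 results and history since 1930")
title_years = extract_years("Latest football news")
url_years = extract_years("http://example.com/2010/10/news.html")
```

The same function can be applied to a snippet, a title or a URL, since all three are treated as plain strings here.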

In order to better understand and determine the temporal value of each query, we assessed the percentage of web snippets, titles and URLs containing temporal features. We record this information in three basic measures: TSnippets(q), TTitle(q) and TUrl(q), where q is the query.

Specifically, TSnippets(q) is the ratio of the number of returned snippets containing dates to the total number of snippets returned for q. TTitle(q) and TUrl(q) are computed in the same way.
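
As a rough sketch of this ratio, the following Python code counts the items that contain at least one year in [1000 - 2090] and divides by the number of items returned. The names contains_year and temporal_ratio, the sample snippets, and the choice to return 0.0 when no results are returned are assumptions for illustration, not part of the original experiments.

```python
import re

# Assumption: an item "has a date" if it contains a standalone
# four-digit year in [1000, 2090].
YEAR_PATTERN = re.compile(r"(?<!\d)(?:1\d{3}|20[0-8]\d|2090)(?!\d)")


def contains_year(text):
    """Return True if text contains at least one year in [1000, 2090]."""
    return YEAR_PATTERN.search(text) is not None


def temporal_ratio(items):
    """Ratio of items containing a year to the total number of items.

    Assumption: an empty result list yields 0.0 rather than an error.
    """
    if not items:
        return 0.0
    dated = sum(1 for item in items if contains_year(item))
    return dated / len(items)


# Hypothetical snippets returned for one query q.
snippets = [
    "Results of the 2010 FIFA World Cup",
    "Latest football news and scores",
    "History of the tournament since 1930",
    "Buy tickets online",
]
t_snippets = temporal_ratio(snippets)
```

Passing the list of titles or the list of URLs for the same query to temporal_ratio gives the corresponding TTitle(q) and TUrl(q) values.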

The GISQC_Experiment is an Excel file consisting of twenty-six spreadsheets, described below:

We also conducted a user survey on the temporal classification of clear concept queries. The results are described in the purple spreadsheets.

A further analysis compared TTitle, TSnippets and TUrl with TLogYahoo and TLogGoogle. The results of this analysis are in the brown spreadsheets.

 

OTHER REFERENCES

More details on this dataset can be found in the following papers:

Campos, R., Dias, G. & Jorge, A. (2011). What is the Temporal Value of Web Snippets? In Proceedings of the 1st International Temporal Web Analytics Workshop (TWAW2011) associated to the 20th International World Wide Web Conference (WWW2011), pp 9 – 16, Hyderabad, India, 28th March, ISSN 1613 - 0073.

Campos, R., Jorge, A. & Dias, G. (2011). Using Web Snippets and Query-logs to Measure Implicit Temporal Intents in Queries. In Proceedings of the Query Representation and Understanding Workshop (QRU 2011) associated to the 34th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR2011), pp 13 - 16, Beijing, China, 28th July.

 

DOWNLOAD

http://www.ccc.ipt.pt/~ricardo/experiments/AnalysisOfDates.zip

 

MORE INFO

If you have any further questions, please contact Ricardo Campos (ricardo.campos@ipt.pt).