--- AOL_DS ---

http://www.ccc.ipt.pt/~ricardo/datasets/AOL_DS.html

http://www.ccc.ipt.pt/~ricardo/datasets/AOL_DS.zip (for downloading data)


DATASET REFERENCE

This dataset may be used for any research purpose, provided that the following reference is cited:

Campos, R., Dias, G. & Jorge, A. (2011). What is the Temporal Value of Web Snippets? In Proceedings of the 1st International Temporal Web Analytics Workshop (TWAW2011) associated to the 20th International World Wide Web Conference (WWW2011), pp 9 – 16, Hyderabad, India, 28th March, ISSN 1613 - 0073.

 

SUMMARY

The AOL_DS is a dataset designed for evaluating the temporal value of web query logs.

It consists of 21,011,240 queries extracted from a previously released AOL search engine log, collected from 650,000 users over three months (01 March, 2006 - 31 May, 2006).

Of the initial collection of 21,011,240 queries, 10,154,742 remained after removing duplicate entries. Over this collection, we ran a rule-based model to detect only those queries that contain years, specifically years in the range [1000 - 2090].
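The filtering step above can be illustrated with a minimal sketch. The actual rules of the model are not described in this file, so the regular expression below (a standalone four-digit number between 1000 and 2090), the exact-match notion of a duplicate, and the names YEAR_PATTERN, extract_years and explicit_temporal_queries are all assumptions made for illustration only.

```python
import re

# Assumed rule: a standalone four-digit number in [1000, 2090].
# The real rule-based model may use different or additional rules.
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20[0-8]\d|2090)\b")


def extract_years(query: str) -> list[int]:
    """Return every candidate year found in the query, in order."""
    return [int(match) for match in YEAR_PATTERN.findall(query)]


def explicit_temporal_queries(queries):
    """Drop exact duplicate queries (assumed definition of a duplicate),
    then keep only the queries that contain at least one candidate year."""
    unique = dict.fromkeys(queries)  # preserves first-seen order
    return [q for q in unique if extract_years(q)]


sample = [
    "world cup 2006",
    "world cup 2006",           # exact duplicate, removed
    "cheap flights",            # no year
    "route 66",                 # too short to be a year
    "olympics 2096",            # outside [1000, 2090]
    "battle of hastings 1066",
]
print(explicit_temporal_queries(sample))
```

Note that a pattern this simple also accepts four-digit numbers that are not dates (for example a price or a model number), so it should be read as a starting point and not as the model used to build the dataset.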

We ended up with 143,590 explicit temporal queries, from which a representative sample of 601 queries, denoted Q601, was selected.

To reach this value, we rely on the work of Barbetta et al. and define a maximum tolerated average sampling error of E = 4% for a confidence level of 95%, where Zp, which in this case equals 1.96, is the p-th quantile of the normal distribution and n is the determined number of queries.
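The sample-size formula itself is not reproduced in this file. As a hedged illustration, the standard worst-case formula for estimating a proportion, n = Zp^2 * 0.25 / E^2 (assuming a proportion of 0.5 and no finite-population correction), gives a value consistent with the 601 queries reported above. The function name sample_size and the choice of this particular formula are assumptions, not a statement of the authors' exact procedure.

```python
import math


def sample_size(z: float, error: float, proportion: float = 0.5) -> int:
    """Smallest integer sample size for the given quantile and error.

    Assumes the standard formula n = z^2 * p * (1 - p) / error^2 with the
    worst-case proportion p = 0.5 and no finite-population correction.
    """
    n = (z ** 2) * proportion * (1 - proportion) / (error ** 2)
    return math.ceil(n)


# Zp = 1.96 (95% confidence) and E = 4%, as stated in the summary.
print(sample_size(z=1.96, error=0.04))
```

With these inputs the unrounded value is about 600.25, which rounds up to 601.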

The AOL_DS dataset is an Excel file consisting of four spreadsheets described below:

 

OTHER REFERENCES

More details on this dataset can be found in the following paper:

Campos, R., Dias, G. & Jorge, A. (2011). What is the Temporal Value of Web Snippets? In Proceedings of the 1st International Temporal Web Analytics Workshop (TWAW2011) associated to the 20th International World Wide Web Conference (WWW2011), pp 9 – 16, Hyderabad, India, 28th March, ISSN 1613 - 0073.

 

DOWNLOAD

http://www.ccc.ipt.pt/~ricardo/datasets/AOL_DS.zip

 

MORE INFO

If you have any further questions, please contact Ricardo Campos (ricardo.campos@ipt.pt).