Ricardo Campos
  • Demos
  • Python Packages
  • APIs
  • APPs
  • Datasets
  • Experiments

Conta-me Histórias / Tell me Stories


Conta-me Histórias (Tell me Stories) is a tool that automatically generates temporal summaries of news collections through a friendly user interface that enables anyone to explore and revisit past events. To select relevant stories and temporal periods, we rely on YAKE!, a keyphrase extraction algorithm developed by our research team, and on event detection methods made available by the research community. Additionally, we offer the engine as an open source package that can be extended to support different datasets or languages. This work stems from our participation in the Arquivo.pt 2018 competition, where we were awarded first prize.

References

Interactive System for Automatically Generating Temporal Narratives [Article Download]

 

YAKE! - Yet Another Keyword Extractor


YAKE! (Best Short Paper of ECIR'18) is a lightweight unsupervised automatic keyword extraction method that relies on statistical text features extracted from single documents to select the most important keywords of a text. Our system does not need to be trained on a particular set of documents, nor does it depend on dictionaries, external corpora, text size, language or domain. To demonstrate the merits and significance of our proposal, we compare it against ten state-of-the-art unsupervised approaches (TF.IDF, KP-Miner, RAKE, TextRank, SingleRank, ExpandRank, TopicRank, TopicalPageRank, PositionRank and MultipartiteRank) and one supervised method (KEA). Experimental results on twenty datasets show that our method significantly outperforms state-of-the-art methods across collections of different sizes, languages and domains. YAKE! is available through a demo, a Python package and an API.
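To give a feel for what "statistical features extracted from a single document" can mean, here is a toy sketch in Python. It is not YAKE!'s scoring function: it only combines two simple local statistics (term frequency and position of first occurrence), and the stopword list, the function name and the scoring formula are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not YAKE!'s list).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on",
             "for", "by", "it", "that", "with", "no"}


def toy_keywords(text, top=3):
    """Rank single words using only statistics local to `text`.

    Toy score = frequency / (1 + relative position of first occurrence),
    so frequent words that appear early rank higher.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)

    # Record where each candidate word first appears.
    first_pos = {}
    for i, w in enumerate(words):
        if w in counts and w not in first_pos:
            first_pos[w] = i

    scored = {w: counts[w] / (1 + first_pos[w] / len(words)) for w in counts}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scored, key=lambda w: (-scored[w], w))[:top]
```

No training corpus or dictionary beyond the small stopword set is involved, which is the property the description above emphasizes. The actual method uses a richer set of features and also handles multi-word keyphrases.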

References

YAKE! Keyword Extraction from Single Documents using Multiple Local Features [Article]

A Text Feature Based Automatic Keyword Extraction Method for Single Documents [Article Download]

YAKE! Collection-independent Automatic Keyword Extractor [Article Download]

 

GTE-Cluster and GTE-Rank


Here we provide two user interfaces so that the research community can test the GTE-Cluster and GTE-Rank temporal search engine applications. To retrieve the query results, we rely on the recently launched Bing Search API (5000 transactions/month allowed), configured with the en-US market parameter to retrieve 50 results per query. The proposed solutions are computationally efficient and can easily be tested online. Although our work focuses mainly on queries of a temporal nature, the implemented prototypes can run any query, including non-temporal ones. Below is a detailed description of both user interfaces.

GTE-Cluster



GTE-Cluster is a temporal clustering search engine that offers the user two options: return all the clusters (including the non-relevant ones) or return only the relevant ones. The value shown in front of each cluster is the similarity value computed by the GTE similarity measure. Note that clusters with a similarity value < 0.35 are considered non-relevant and are marked in red. In contrast, relevant clusters are marked in blue.
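The relevance rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the similarity values have already been computed elsewhere. The function name and the input dictionary are hypothetical and are not part of the GTE-Cluster code; only the 0.35 threshold and the red/blue marking come from the description above.

```python
RELEVANCE_THRESHOLD = 0.35  # clusters below this value are non-relevant


def label_clusters(similarities, only_relevant=False):
    """Label each cluster as relevant ("blue") or non-relevant ("red").

    `similarities` maps a cluster label (for example a year) to its
    precomputed similarity value. With `only_relevant=True`, the
    non-relevant clusters are dropped instead of being marked in red.
    """
    labelled = []
    for cluster, similarity in similarities.items():
        relevant = similarity >= RELEVANCE_THRESHOLD
        if only_relevant and not relevant:
            continue
        labelled.append((cluster, similarity, "blue" if relevant else "red"))
    return labelled
```

Calling `label_clusters(scores)` corresponds to the "all clusters" option, and `label_clusters(scores, only_relevant=True)` to the "relevant only" option.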

 

 

 

GTE-Rank



GTE-Rank is a temporal re-ranking search engine that offers the user two options: return all the web snippets (including those without dates) or return only the web snippets with relevant dates. The number in red is the ranking position originally assigned by the Bing search engine. The value shown in front of the snippet ID is the ranking value computed by the GTE-Rank methodology.
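The re-ranking step shown in the interface can be sketched as follows. This is only an illustration of sorting by a precomputed value while keeping the original Bing position. The `Snippet` fields, the function name and the tie-breaking rule are assumptions, and the ranking value itself is taken as given rather than computed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snippet:
    snippet_id: int
    bing_position: int                   # position originally returned by Bing
    rank_value: float                    # ranking value, computed elsewhere
    relevant_date: Optional[str] = None  # None when no relevant date was found


def rerank(snippets: List[Snippet], only_dated: bool = False) -> List[Snippet]:
    """Order snippets by descending ranking value.

    With `only_dated=True`, snippets without a relevant date are removed.
    Ties are broken by the original Bing position (an assumption).
    """
    kept = [s for s in snippets if not only_dated or s.relevant_date is not None]
    return sorted(kept, key=lambda s: (-s.rank_value, s.bing_position))
```

Each returned snippet still carries `bing_position`, so the original position can be displayed next to the new one.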

 

 

Below you can find a number of Python packages made available by our research team.

 

Conta-me Histórias / Tell me Stories


Conta-me Histórias (Tell me Stories) is a tool that automatically generates temporal summaries of news collections through a friendly user interface that enables anyone to explore and revisit past events. Conta-me Histórias is available as an open source Python package that can be extended to support different datasets or languages. This work stems from our participation in the Arquivo.pt 2018 competition, where we were awarded first prize.

References

Interactive System for Automatically Generating Temporal Narratives [Article Download]

 

YAKE! - Yet Another Keyword Extractor


YAKE! (Best Short Paper of ECIR'18) is a lightweight unsupervised automatic keyword extraction method that relies on statistical text features extracted from single documents to select the most important keywords of a text. YAKE! is available as an open source Python package.

References

YAKE! Keyword Extraction from Single Documents using Multiple Local Features [Article]

A Text Feature Based Automatic Keyword Extraction Method for Single Documents [Article Download]

YAKE! Collection-independent Automatic Keyword Extractor [Article Download]

 

Time-Matters


Time-Matters (winner of the Fraunhofer Portugal Challenge 2013 PhD Contest) is an algorithm that extracts relevant dates from a set of documents. Time-Matters is available as an open source Python package.

References

Campos, R., Dias, G., Jorge, A. and Nunes, C. (2017). Identifying Top Relevant Dates for Implicit Time Sensitive Queries. In Information Retrieval Journal. Springer, Vol 20(4), pp 363-398 [Article Download]

Campos, R., Dias, G., Jorge, A. and Nunes, C. (2016). GTE-Rank: a Time-Aware Search Engine to Answer Time-Sensitive Queries. In Information Processing & Management an International Journal. Elsevier, Vol 52(2), pp. 273-298 [Article Download]

Campos, R., Dias, G., Jorge, A. and Nunes, C. (2014). GTE-Cluster: A Temporal Search Interface for Implicit Temporal Queries. In M. de Rijke et al. (Eds.), Lecture Notes in Computer Science - Advances in Information Retrieval - 36th European Conference on Information Retrieval (ECIR2014). Amsterdam, Netherlands, 13 - 16 April. (Vol. 8416-2014, pp. 775 - 779) [Article Download]

Campos, R., Jorge, A., Dias, G. and Nunes, C. (2012). Disambiguating Implicit Temporal Queries by Clustering Top Relevant Dates in Web Snippets. In Proceedings of The 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies Macau, China, 04 - 07 December, Vol. 1, pp 1 - 8. IEEE Computer Society Press. [Article Download]

 

 

Python Wrapper for Heideltime Temporal Tagger


py_heideltime is a Python wrapper for the well-known HeidelTime temporal tagger. py_heideltime is available as an open source Python package.

References

Strötgen, Gertz: Multilingual and Cross-domain Temporal Tagging. Language Resources and Evaluation, 2013. [Article Download]

 

 

Rule-Based Temporal Expression Detection


py_rule_based is a simple temporal expression detector (mostly year-based) built on regex rules. py_rule_based is available as an open source Python package.
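As a rough idea of what a year-based regex rule can look like, here is a minimal sketch. It does not reproduce the rules shipped in py_rule_based. The pattern, the accepted year range (1000 to 2099) and the function name are assumptions chosen for illustration.

```python
import re

# Standalone four-digit years between 1000 and 2099 (assumed range).
YEAR_PATTERN = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")


def find_years(text):
    """Return every standalone year found in `text`, in order of appearance."""
    return YEAR_PATTERN.findall(text)
```

The word boundaries (`\b`) keep the rule from matching digits inside longer numbers, such as `12345`.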

 

Here we make available a number of APIs, so that each software can be easily tested by the research community.

 

Conta-me Histórias / Tell me Stories


Conta-me Histórias (Tell me Stories) is a tool that automatically generates temporal summaries of news collections through a friendly user interface that enables anyone to explore and revisit past events. This work stems from our participation in the Arquivo.pt 2018 competition, where we were awarded first prize. Conta-me Histórias is available as an API that can be invoked through an interface or programmatically (through its endpoint). In either case, it returns its result as JSON.
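Invoking a JSON API programmatically usually comes down to building a URL, sending the request and decoding the response. The sketch below uses only the Python standard library. The endpoint address and the parameter names in the usage note are placeholders, not the real ones, which should be taken from the API's own documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(endpoint, **params):
    """Append URL-encoded query parameters to an endpoint address."""
    return f"{endpoint}?{urlencode(params)}"


def fetch_json(url, opener=urlopen, timeout=30):
    """Send a GET request and decode the JSON body into Python objects.

    `opener` defaults to urllib's `urlopen`; it can be swapped for a
    stub when testing without network access.
    """
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_json(build_url("https://example.org/api", query="brexit"))` would return the decoded JSON, where both the address and the `query` parameter are hypothetical.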

References

Interactive System for Automatically Generating Temporal Narratives [Article Download]

 

 

YAKE! - Yet Another Keyword Extractor


YAKE! (Best Short Paper of ECIR'18) is a lightweight unsupervised automatic keyword extraction method that relies on statistical text features extracted from single documents to select the most important keywords of a text. YAKE! is available as an API that can be invoked through an interface or programmatically (through its endpoint). In either case, it returns its result as JSON.

References

YAKE! Keyword Extraction from Single Documents using Multiple Local Features [Article]

A Text Feature Based Automatic Keyword Extraction Method for Single Documents [Article Download]

YAKE! Collection-independent Automatic Keyword Extractor [Article Download]

 

 

Time-Matters


Time-Matters (winner of the Fraunhofer Portugal Challenge 2013 PhD Contest) is an algorithm that extracts relevant dates from a set of documents. Time-Matters is available as an API that can be invoked through an interface or programmatically (through its endpoint). In either case, it returns its result as JSON.

References

Campos, R., Dias, G., Jorge, A. and Nunes, C. (2017). Identifying Top Relevant Dates for Implicit Time Sensitive Queries. In Information Retrieval Journal. Springer, Vol 20(4), pp 363-398 [Article Download]

Campos, R., Dias, G., Jorge, A. and Nunes, C. (2016). GTE-Rank: a Time-Aware Search Engine to Answer Time-Sensitive Queries. In Information Processing & Management an International Journal. Elsevier, Vol 52(2), pp. 273-298 [Article Download]

Campos, R., Dias, G., Jorge, A. and Nunes, C. (2014). GTE-Cluster: A Temporal Search Interface for Implicit Temporal Queries. In M. de Rijke et al. (Eds.), Lecture Notes in Computer Science - Advances in Information Retrieval - 36th European Conference on Information Retrieval (ECIR2014). Amsterdam, Netherlands, 13 - 16 April. (Vol. 8416-2014, pp. 775 - 779) [Article Download]

Campos, R., Jorge, A., Dias, G. and Nunes, C. (2012). Disambiguating Implicit Temporal Queries by Clustering Top Relevant Dates in Web Snippets. In Proceedings of The 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies Macau, China, 04 - 07 December, Vol. 1, pp 1 - 8. IEEE Computer Society Press. [Article Download]

 

Below you can find a number of APPs developed by our team.

 

Conta-me Histórias / Tell me Stories


Conta-me Histórias (Tell me stories) is now available on Google Play

References

Interactive System for Automatically Generating Temporal Narratives [Article Download]

 

YAKE! - Yet Another Keyword Extractor


YAKE! is now available on Google Play

References

YAKE! Keyword Extraction from Single Documents using Multiple Local Features [Article]

A Text Feature Based Automatic Keyword Extraction Method for Single Documents [Article Download]

YAKE! Collection-independent Automatic Keyword Extractor [Article Download]

 


Query-Snippet Portuguese Google Trend Bing Ranking Dataset (QSPTGtBingRank_DS)


[QSPTGtBingRank_DS Webpage]

 

 

 

Query-Snippet Google Insights for Search Bing Ranking Dataset (QSGisBingRank_DS)


[QSGisBingRank_DS Webpage]

 

Web Content TREC Dataset (WC_TREC_DS)


[WC_TREC_DS Webpage]

 

 

Web Content Dataset (WC_DS)


[WC_DS Webpage]

 

Campos, R., Dias, G., Jorge, A. and Nunes, C. (2012). GTE: A Distributional Second-Order Co-Occurrence Approach to Improve the Identification of Top Relevant Dates in Web Snippets. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) Maui, Hawaii, October 29 - November 02, ISBN 978-1-4503-1156-4, pp 2035 - 2039. ACM Press

 

Query Logs Dataset (QLog_DS)


[QLog_DS Webpage]

 

Campos, R., Jorge, A. and Dias, G. (2011). Using Web Snippets and Query-logs to Measure Implicit Temporal Intents in Queries. In Proceedings of the Query Representation and Understanding Workshop (QRU 2011) associated to 34th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR2011) Beijing, China, 28 July, pp 13 - 16.

 

Query Logs Dataset (AOL_DS)


[AOL_DS Webpage]

 

Campos, R., Dias, G. & Jorge, A. (2011). What is the Temporal Value of Web Snippets? In Proceedings of the 1st International Temporal Web Analytics Workshop (TWAW2011) associated to the 20th International World Wide Web Conference (WWW2011), pp 9 – 16, Hyderabad, India, 28th March, ISSN 1613 - 0073.

 

Google Insights for Search Future Dates Dataset (GISFD_DS)


[GISFD_DS Webpage]

 

Campos, R., Dias, G. & Jorge, A. (2011). An Exploratory Study on the impact of Temporal Features on the Classification and Clustering of Future-Related Web Documents. In L. Antunes and H.S. Pinto (Eds.), Lecture Notes in Artificial Intelligence - Progress in Artificial Intelligence, - 15th Portuguese Conference on Artificial Intelligence (EPIA2011) associated to APPIA: Portuguese Association for Artificial Intelligence Lisbon, Portugal, 10 - 13 October. (Vol. 7026-2011, pp. 581 - 596). ISBN: 978-3-642-24768-2. DBLP. Springer. Thomson ISI Web of Knowledge. ACM Press.

 

 

Google Insights for Search Query Classification Dataset (GISQC_DS)


[GISQC_DS Webpage]

 

Campos, R., Dias, G. & Jorge, A. (2011). What is the Temporal Value of Web Snippets? In Proceedings of the 1st International Temporal Web Analytics Workshop (TWAW2011) associated to the 20th International World Wide Web Conference (WWW2011), pp 9 – 16, Hyderabad, India, 28th March, ISSN 1613 - 0073.

 

 

GTE-Rank: Evaluating GRank under a set of Portuguese queries by means of a crowdsourcing experiment


[GTE-Rank Crowdsourcing Experiment Webpage]

 

Campos, R., Dias, G., Jorge, A. and Nunes, C. (2016). GTE-Rank: a Time-Aware Search Engine to Answer Time-Sensitive Queries. In Information Processing & Management an International Journal. Elsevier, Vol 52(2), pp 273-298, ISSN 0306-4573.

 

GTE-Rank: Temporal Re-Ranking


[GTE-Rank Temporal Re-Ranking Experiment Webpage]

Campos, R., Dias, G., Jorge, A. and Nunes, C. (2016). GTE-Rank: a Time-Aware Search Engine to Answer Time-Sensitive Queries. In Information Processing & Management an International Journal. Elsevier, Vol 52(2), pp 273-298, ISSN 0306-4573.

 

GTE-Cluster: Flat Temporal Clustering


[GTE-Cluster Flat Temporal Clustering Experiment Webpage]

 

Campos, R., Dias, G., Jorge, A. and Nunes, C. (2012). Disambiguating Implicit Temporal Queries by Clustering Top Relevant Dates in Web Snippets. In Proceedings of the 2012 IEEE/WIC/ACM International Conference on Web Intelligence, Macau, China, December 04 – 07.

 

Comparing a Web Content approach (WC_DS) against a Query Log one (QLog_DS)


[WC_DS vs. QLog_DS Experiment Webpage]

 

Campos, R., Dias, G., Jorge, A. and Nunes, C. (2012). Enriching Temporal Query Understanding through Date Identification: How to Tag Implicit Temporal Queries? In Proceedings of the 2nd International Temporal Web Analytics Workshop (TWAW 2012) associated to the 21st International World Wide Web Conference (WWW2012), Lyon, France, 17 April. ISBN 978-1-4503-1188-5, pp 41 – 48. ACM Press.

 

GTE: Comparing GTE against a number of different association measures


[GTE Experiment Webpage]

 

Campos, R., Dias, G., Jorge, A. and Nunes, C. (2012). GTE: A Distributional Second-Order Co-Occurrence Approach to Improve the Identification of Top Relevant Dates in Web Snippets. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012), Maui, Hawaii, October 29 - November 02, ISBN 978-1-4503-1156-4, pp 2035 - 2039. ACM Press.

 

Classification and clustering of future-related texts


[Future Temporal Data Experiment Webpage]

 

Campos, R., Dias, G. & Jorge, A. (2011). An Exploratory Study on the impact of Temporal Features on the Classification and Clustering of Future-Related Web Documents. In L. Antunes and H.S. Pinto (Eds.), Lecture Notes in Artificial Intelligence - Progress in Artificial Intelligence, - 15th Portuguese Conference on Artificial Intelligence (EPIA2011) associated to APPIA: Portuguese Association for Artificial Intelligence Lisbon, Portugal, 10 - 13 October. (Vol. 7026-2011, pp. 581 - 596). ISBN: 978-3-642-24768-2. DBLP. Springer. Thomson ISI Web of Knowledge. ACM Press.

 

Temporal data analysis of web snippets and classification of queries with regards to the topical and temporal dimension


[Temporal Query Classification Experiment Webpage]

 

Campos, R., Dias, G. & Jorge, A. (2011). What is the Temporal Value of Web Snippets? In Proceedings of the 1st International Temporal Web Analytics Workshop (TWAW2011) associated to the 20th International World Wide Web Conference (WWW2011), pp 9 – 16, Hyderabad, India, 28th March, ISSN 1613 - 0073.

 

Temporal data analysis of explicit temporal queries. AOL dataset (AOL_DS)


[AOL_DS Experiment Webpage]

 

Campos, R., Dias, G. & Jorge, A. (2011). What is the Temporal Value of Web Snippets? In Proceedings of the 1st International Temporal Web Analytics Workshop (TWAW2011) associated to the 20th International World Wide Web Conference (WWW2011), pp 9 – 16, Hyderabad, India, 28th March, ISSN 1613 - 0073.