What is the TriviaQA dataset?

Chris Staff asked 11 months ago
Chris Staff answered 10 months ago

As the name suggests, TriviaQA is a question-answering dataset built from trivia questions, and it is used for reading comprehension tasks in machine learning. Here are some facts from Joshi et al. (2017), the paper that introduced the dataset:

1. It is built around trivia questions: it contains more than 650K question-answer-evidence triples, derived from 95K question-answer pairs written by trivia enthusiasts, each paired with several independently gathered evidence documents.
2. It is a challenging dataset, and that difficulty is what makes it useful for driving progress in NLP. It is large-scale, its answers are free-form and well-formed, its questions were collected independently of the evidence that supports the answers, and its evidence is varied (drawn from both Wikipedia articles and web pages).
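3. Beyond the large, noisy dataset, the authors also offer a smaller subset checked by human annotators, in which the evidence documents are confirmed to contain the facts needed to answer each question. This subset contains 1,975 question-document-answer triples.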
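To make the difference between question-answer pairs and question-answer-evidence triples concrete, here is a minimal Python sketch. The class and function names (`EvidenceDoc`, `QAPair`, `Triple`, `to_triples`) and the source labels are hypothetical, and the records are simplified; they do not reproduce the schema of the released TriviaQA files.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified record types for illustration only.
# The released TriviaQA files use their own schema, not reproduced here.

@dataclass
class EvidenceDoc:
    source: str  # assumed label, e.g. "wikipedia" or "web"
    text: str

@dataclass
class QAPair:
    question: str
    answer: str
    # Evidence is gathered independently of the question itself.
    evidence: List[EvidenceDoc] = field(default_factory=list)

@dataclass(frozen=True)
class Triple:
    question: str
    answer: str
    evidence_text: str

def to_triples(pairs: List[QAPair]) -> List[Triple]:
    """Expand each question-answer pair into one triple per evidence document."""
    return [
        Triple(p.question, p.answer, doc.text)
        for p in pairs
        for doc in p.evidence
    ]

pairs = [
    QAPair("Q1", "A1", [EvidenceDoc("wikipedia", "doc a"), EvidenceDoc("web", "doc b")]),
    QAPair("Q2", "A2", [EvidenceDoc("web", "doc c")]),
]
triples = to_triples(pairs)
print(len(pairs), len(triples))  # 2 3
```

Because each pair comes with several evidence documents, expanding pairs this way is how roughly 95K pairs can correspond to more than 650K triples.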

An example from the dataset (source: Joshi et al., 2017):
Question: The Dodecanese Campaign of WWII that was an attempt by the Allied forces to capture islands in the Aegean Sea was the inspiration for which acclaimed 1961 commando film?
Answer: The Guns of Navarone
Excerpt: The Dodecanese Campaign of World War II was an attempt by Allied forces to capture the Italian-held Dodecanese islands in the Aegean Sea following the surrender of Italy in September 1943, and use them as bases against the German-controlled Balkans. The failed campaign, and in particular the Battle of Leros, inspired the 1957 novel The Guns of Navarone and the successful 1961 movie of the same name.
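Because the evidence was collected independently of the questions (the "distant supervision" in the paper's title), a document is not guaranteed to mention the answer. The sketch below shows one simple way to check this for the excerpt above: normalize both strings and look for the answer as a whole-word run. The normalization rules (lowercasing, dropping ASCII punctuation and the articles a/an/the) and the function names `normalize` and `evidence_mentions_answer` are assumptions for illustration, not the paper's exact procedure.

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def evidence_mentions_answer(answer: str, evidence: str) -> bool:
    """Return True if the normalized answer appears as a whole-word run in the evidence."""
    # Pad with spaces so a short answer cannot match inside a longer token.
    return f" {normalize(answer)} " in f" {normalize(evidence)} "

excerpt = (
    "The failed campaign, and in particular the Battle of Leros, "
    "inspired the 1957 novel The Guns of Navarone and the successful "
    "1961 movie of the same name."
)
print(evidence_mentions_answer("The Guns of Navarone", excerpt))  # True
print(evidence_mentions_answer("The Longest Day", excerpt))       # False
```

A plain string match like this ignores alternative spellings and paraphrases, so it can miss evidence that supports the answer in other words.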
 
Joshi, M., Choi, E., Weld, D. S., & Zettlemoyer, L. (2017). TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. arXiv preprint arXiv:1705.03551.
