PlotQA Dataset Download Links

Images

The png directory contains RGBA images of the scientific plots in .png format. The images are named 0.png, 1.png, and so on.

The plot images for each data split can be downloaded from the following links:

Train, Validation, Test

Annotations

annotations.json is a list of dictionaries, where each dictionary holds the ground-truth annotations of one plot. Each dictionary has the following fields:

models: A list of dictionaries. Depending on the type of the plot (single or 2,3,4-multi), 
	the length of the list varies from 1 to 4. Each dictionary contains the following keys:
		name: Label corresponding to the datapoint.
		color: Color corresponding to the `name` datapoint.
		bboxes: Bounding boxes corresponding to the `name` datapoints in the plot.
		label: Label corresponding to the datapoint as it appears in the legend (same as the `name` field).
		x: x-value of the datapoints.
		y: y-value of the datapoints.

type: Type of the plot (vbar_categorical, hbar_categorical, dot_line, line).

general_figure_info: A dictionary containing the following keys:
		title: Bounding box and the text corresponding to the title of the plot.
		x_axis: Bounding boxes, axis labels, and tick labels corresponding to the x-axis of the plot.
		y_axis: Bounding boxes, axis labels, and tick labels corresponding to the y-axis of the plot.
		legend: Bounding boxes, axis labels, and tick labels corresponding to the legend of the plot.
		plot_info: Bounding box corresponding to the plot.
		figure_info: Bounding box corresponding to the figure.
	
image_index: Index of the plot image that these annotations belong to.
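
The field layout above can be explored with a few lines of Python. The following is a minimal sketch, not part of the dataset tooling: `load_annotations` and `summarize_plot` are hypothetical helper names, the sample record is synthetic, and the list form of `x` and `y` is an assumption, since the exact nested layout is not spelled out here.

```python
import json

# Synthetic record shaped like the fields described above. All values are
# made up; storing x and y as lists is an assumption, and the nested parts
# (bboxes, general_figure_info) are left empty because their exact layout
# is not specified in this document.
sample_annotations = [
    {
        "image_index": 0,
        "type": "vbar_categorical",
        "models": [
            {"name": "Series A", "label": "Series A", "color": "#1f77b4",
             "x": ["2001", "2002"], "y": [3.5, 4.0], "bboxes": []},
            {"name": "Series B", "label": "Series B", "color": "#ff7f0e",
             "x": ["2001", "2002"], "y": [1.0, 2.5], "bboxes": []},
        ],
        "general_figure_info": {},
    }
]


def load_annotations(path):
    """Read annotations.json, assuming the top-level JSON value is a list."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_plot(annotation):
    """Collect the plot type and the series names of one annotation."""
    models = annotation["models"]
    return {
        "image_index": annotation["image_index"],
        "type": annotation["type"],
        "num_series": len(models),  # 1 to 4, depending on the plot
        "series_names": [m["name"] for m in models],
    }


summary = summarize_plot(sample_annotations[0])
print(summary)
```

With a downloaded file, `load_annotations("annotations.json")` would take the place of `sample_annotations`.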

The annotations for the plot images for different data splits can be downloaded from the following links:

Train, Validation, Test

Question-Answer pairs

qa_pairs.json is a list of dictionaries, where each dictionary represents one question-answer pair with the following fields:

image_index: Image-index corresponding to the image about which the question is asked.

question_string: Text of the question.

answer: Answer corresponding to the question `question_string`.

answer_bbox: Bounding box of the answer if the answer comes from the plot itself.

template: Template from which `question_string` was instantiated.

type: Type of the plot (vbar_categorical, hbar_categorical, dot_line, line).
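
Because every question carries an `image_index`, the questions can be grouped per plot and matched to an image file. The sketch below illustrates one way to do this. `group_by_image` and `image_path` are hypothetical helpers, the sample records are synthetic, and two things are assumptions: that `image_index` n refers to n.png in the png directory, and that `answer_bbox` is empty (here `None`) when the answer does not come from the plot.

```python
from collections import defaultdict
from pathlib import Path

# Synthetic question-answer records using the fields listed above.
# The question texts, answers, and template names are invented for illustration.
sample_qa_pairs = [
    {"image_index": 0, "question_string": "How many bars are there?",
     "answer": 4, "answer_bbox": None, "template": "example_template_a",
     "type": "vbar_categorical"},
    {"image_index": 0, "question_string": "What is the title of the graph?",
     "answer": "Example title", "answer_bbox": None,
     "template": "example_template_b", "type": "vbar_categorical"},
    {"image_index": 1, "question_string": "How many lines are there?",
     "answer": 2, "answer_bbox": None, "template": "example_template_a",
     "type": "line"},
]


def group_by_image(qa_pairs):
    """Map each image_index to the list of questions asked about that plot."""
    grouped = defaultdict(list)
    for qa in qa_pairs:
        grouped[qa["image_index"]].append(qa)
    return dict(grouped)


def image_path(png_dir, image_index):
    """Build the image path, assuming image_index n is stored as n.png."""
    return Path(png_dir) / f"{image_index}.png"


grouped = group_by_image(sample_qa_pairs)
for index, questions in grouped.items():
    print(image_path("png", index), len(questions))
```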

qa_pairs_v1.json: This .json file contains 8,190,674 question-answer pairs.

The question-answer pairs for each data split can be downloaded from the following links:

Train, Validation, Test

qa_pairs_v2.json: This .json file is an extended version of qa_pairs_v1.json and contains 28,952,641 question-answer pairs.

The extended question-answer pairs for each data split can be downloaded from the following links:

Train, Validation, Test