What is WebSRC?

WebSRC is a novel Web-based Structural Reading Comprehension dataset. It consists of 0.44M question-answer pairs, which are collected from 6.5K web pages with corresponding HTML source code, screenshots and metadata. Each question in WebSRC requires a certain structural understanding of a web page to answer, and the answer is either a text span on the web page or yes/no.


Mar 21, 2021: The full dataset and baseline are available for download (extraction code: s4js). The dataset can also be downloaded from Amazon.com, and the baseline is available on WebSRC-Baseline.

Contact Us

If you have any questions about this dataset, please contact chenlusz@sjtu.edu.cn or galaxychen@sjtu.edu.cn.


Three metrics are used to evaluate models on the WebSRC test set: Exact Match (EM), F1 score, and Path Overlap Score (POS). Please refer to the paper for details on these evaluation metrics.
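As a rough illustration of what the three metrics measure, the sketch below computes EM and token-level F1 in the common SQuAD style, and a path overlap as the ratio of shared to total nodes on two root-to-answer DOM paths. This is not the official evaluation script: the normalization rules, the function names, and the representation of a path as a list of node identifiers are all assumptions made for this example, and the paper remains the authoritative definition.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the official script may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and ground truth."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    if not pred_tokens or not true_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == true_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


def path_overlap(pred_path, true_path):
    """Shared nodes divided by all nodes on two root-to-answer DOM paths.

    Assumption: each path is a list of unique node identifiers (hypothetical
    format), ordered from the root element down to the answer's element.
    """
    pred, true = set(pred_path), set(true_path)
    if not pred and not true:
        return 1.0
    return len(pred & true) / len(pred | true)
```

For example, a prediction located in a sibling element of the correct one still shares most of its path with the ground truth, so it earns partial path-overlap credit even when EM is 0. The leaderboard reports all three scores as percentages.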

Rank  Date          Model            Institution                    EM     F1     POS
1     Jan 01, 2021  V-PLM (ELECTRA)  Shanghai Jiao Tong University  69.12  75.96  84.98
2     Jan 01, 2021  V-PLM (BERT)     Shanghai Jiao Tong University  54.71  62.01  77.04