The Web Data Commons Schema.Org Data Set Series

Brinkmann, Alexander ; Primpeli, Anna ; Bizer, Christian

Document Type: Conference or workshop publication
Year of publication: 2023
Book title: The ACM Web Conference : Companion of the World Wide Web Conference WWW 2023
Page range: 136-139
Conference title: ACM Web Conference 2023
Location of the conference venue: Austin, TX
Date of the conference: 30 April to 4 May 2023
Editors: Ding, Ying ; Tang, Jie ; Sequeda, Juan
Place of publication: New York, NY
Publishing house: Association for Computing Machinery
ISBN: 978-1-4503-9416-1
Publication language: English
Institution: School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Subject: 004 Computer science, internet
Keywords (English): web science, semantic annotations, schema.org, information extraction
Abstract: Millions of websites have started to annotate structured data within their HTML pages using the schema.org vocabulary. Popular entity types annotated with schema.org terms are products, local businesses, events, and job postings. The Web Data Commons project has been extracting schema.org data from the Common Crawl every year since 2013 and offers the extracted data for public download in the form of the schema.org data set series. The latest release in the series consists of 106 billion RDF quads describing 3.1 billion entities. The entity descriptions originate from 12.8 million different websites. From a Web Science perspective, the data set series lays the foundation for analyzing the adoption process of schema.org annotations on the Web over the past decade. From a machine learning perspective, the annotations provide a large pool of training data for tasks such as product matching, product or job categorization, information extraction, or question answering. This poster gives an overview of the content of the Web Data Commons schema.org data set series. It highlights trends in the adoption of schema.org annotations on the Web and discusses how the annotations are being used as training data for machine learning applications.

This entry is part of the university bibliography.
