The Web Data Commons Schema.Org Data Set Series
Brinkmann, Alexander; Primpeli, Anna; Bizer, Christian
DOI:
|
https://doi.org/10.1145/3543873.3587331
|
URL:
|
https://dl.acm.org/doi/10.1145/3543873.3587331
|
Document Type:
|
Conference or workshop publication
|
Year of publication:
|
2023
|
Book title:
|
The ACM Web Conference : Companion of the World Wide Web Conference WWW 2023
|
Page range:
|
136-139
|
Conference title:
|
ACM Web Conference 2023
|
Location of the conference venue:
|
Austin, TX
|
Date of the conference:
|
April 30 to May 4, 2023
|
Editors:
|
Ding, Ying; Tang, Jie; Sequeda, Juan
|
Place of publication:
|
New York, NY
|
Publishing house:
|
Association for Computing Machinery
|
ISBN:
|
978-1-4503-9416-1
|
Publication language:
|
English
|
Institution:
|
School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
|
Subject:
|
004 Computer science, internet
|
Keywords (English):
|
web science, semantic annotations, schema.org, information extraction
|
Abstract:
|
Millions of websites have started to annotate structured data within their HTML pages using the schema.org vocabulary. Popular entity types annotated with schema.org terms are products, local businesses, events, and job postings. The Web Data Commons project has been extracting schema.org data from the Common Crawl every year since 2013 and offers the extracted data for public download in the form of the schema.org data set series. The latest release in the series consists of 106 billion RDF quads describing 3.1 billion entities. The entity descriptions originate from 12.8 million different websites. From a Web Science perspective, the data set series lays the foundation for analyzing the adoption process of schema.org annotations on the Web over the past decade. From a machine learning perspective, the annotations provide a large pool of training data for tasks such as product matching, product or job categorization, information extraction, or question answering. This poster gives an overview of the content of the Web Data Commons schema.org data set series. It highlights trends in the adoption of schema.org annotations on the Web and discusses how the annotations are being used as training data for machine learning applications.
|
| This entry is part of the university bibliography. |