The Web Data Commons Schema.Org Data Set Series
Brinkmann, Alexander; Primpeli, Anna; Bizer, Christian
DOI: https://doi.org/10.1145/3543873.3587331
URL: https://dl.acm.org/doi/10.1145/3543873.3587331
Document type: Conference publication
Year of publication: 2023
Book title: The ACM Web Conference: Companion of the World Wide Web Conference WWW 2023
Page range: 136-139
Event title: ACM Web Conference 2023
Event location: Austin, TX
Event date: 30 April to 4 May 2023
Editors: Ding, Ying; Tang, Jie; Sequeda, Juan
Place of publication: New York, NY
Publisher: Association for Computing Machinery
ISBN: 978-1-4503-9416-1
Language of publication: English
Institution: Faculty of Business Informatics and Business Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Subject: 004 Computer science
Keywords (English): web science, semantic annotations, schema.org, information extraction
Abstract:
Millions of websites have started to annotate structured data within their HTML pages using the schema.org vocabulary. Popular entity types annotated with schema.org terms are products, local businesses, events, and job postings. The Web Data Commons project has been extracting schema.org data from the Common Crawl every year since 2013 and offers the extracted data for public download in the form of the schema.org data set series. The latest release in the series consists of 106 billion RDF quads describing 3.1 billion entities. The entity descriptions originate from 12.8 million different websites. From a Web Science perspective, the data set series lays the foundation for analyzing the adoption process of schema.org annotations on the Web over the past decade. From a machine learning perspective, the annotations provide a large pool of training data for tasks such as product matching, product or job categorization, information extraction, or question answering. This poster gives an overview of the content of the Web Data Commons schema.org data set series. It highlights trends in the adoption of schema.org annotations on the Web and discusses how the annotations are being used as training data for machine learning applications.