The Web Data Commons structured data extraction

Primpeli, Anna ; Meusel, Robert ; Bizer, Christian ; Stuckenschmidt, Heiner

est_poster_vice-uc_17-03-2017.pdf - Published


URN: urn:nbn:de:bsz:180-madoc-478707
Document Type: Conference or workshop publication
Year of publication: 2017
Book title: E-Science-Tage 2017: Forschungsdaten managen
Page range: 1
Conference title: E-Science-Tage 2017
Location of the conference venue: Heidelberg, Germany
Date of the conference: 16-17 March 2017
Place of publication: Heidelberg
Publishing house: Heidelberg University
Publication language: English
Institution: School of Business Informatics and Mathematics > Wirtschaftsinformatik V (Bizer)
School of Business Informatics and Mathematics > Praktische Informatik II (Stuckenschmidt 2009-)
Subject: 004 Computer science, internet
Abstract: A growing number of websites annotate their content using different markup formats. These annotations cover a wide range of topics, such as persons, events, products, hotels, organizations and cities. The purpose of embedding structured data in HTML pages is to make the content of those pages understandable to web applications, which greatly facilitates the retrieval and integration of data from different web pages. The presented poster gives an overview of the Web Data Commons structured data project for the year 2016. The Web Data Commons project extracts structured data from the web corpus provided by Common Crawl, the largest public web corpus, and offers the extracted data for public download. To process these large amounts of data, Web Data Commons builds upon its Extraction Framework and Amazon Web Services.
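
To illustrate what "structured data embedded in HTML pages" can look like in practice, the following is a minimal sketch that pulls JSON-LD annotations out of a single page using only the Python standard library. It is not the Web Data Commons Extraction Framework and does not reflect how that framework works internally; the class name `JsonLdExtractor`, the function `extract_jsonld`, and the sample page are hypothetical and chosen only for illustration. JSON-LD is assumed here as one example of a markup format; other formats are not handled.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed annotations instead of failing the page


def extract_jsonld(html):
    """Return a list of JSON-LD objects found in one HTML document."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical sample page annotating a hotel, one of the topics named in the abstract.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Hotel", "name": "Example Hotel"}
</script>
</head><body><p>Welcome</p></body></html>
"""

items = extract_jsonld(SAMPLE_HTML)
for item in items:
    print(item["@type"], "-", item["name"])
```

Running the same function over every page of a large crawl is what makes distributed processing necessary at web scale; this sketch handles one document at a time and silently skips annotations that are not valid JSON.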

This entry is part of the university bibliography.

The document is provided by the publication server of the Mannheim University Library.
