Graph structure in the Web - aggregated by Pay-Level Domain


Lehmberg, Oliver ; Meusel, Robert ; Bizer, Christian



DOI: https://doi.org/10.1145/2615569.2615674
URL: http://www.planet-data.eu/sites/default/files/publ...
Additional URL: http://de.slideshare.net/oli-unima/webgraph-websci...
Document Type: Conference or workshop publication
Year of publication: 2014
Book title: WebSci 2014 : Proceedings of the 6th ACM Conference on Web Science, Bloomington, IND, USA, June 23 - 26, 2014
Page range: 119-128
Date of the conference: 23.06.2014
Editor: Menczer, Filippo
Place of publication: New York, NY
Publishing house: ACM
ISBN: 978-1-4503-2622-3
Publication language: English
Institution: School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Subject: 004 Computer science, internet
Abstract: Previous research on the overall graph structure of the World Wide Web mostly focused on the page level, meaning that the graph that directly results from hyperlinks between individual web pages was analyzed. This paper aims to provide additional insights about the macroscopic structure of the World Wide Web by analyzing an aggregated version of a recent web graph. The graph covers over 3.5 billion web pages and 128 billion hyperlinks between pages. It was crawled in the first half of 2012. We aggregate this graph by pay-level domain (PLD), meaning that all pages that belong to the same pay-level domain are represented by a single node and that an arc exists between two nodes if there is at least one hyperlink between pages of the corresponding pay-level domains. The resulting PLD graph covers 43 million PLDs and contains 623 million arcs between PLDs. Analyzing this aggregated graph allows us to present findings about linkage patterns between complete websites and not only individual HTML pages. In this paper, we present basic statistics about the PLD graph, such as degree distributions, top-ranked PLDs, distances and diameter. We analyze whether the bow-tie structure introduced by Broder et al. can also be identified in our PLD graph and reveal a backbone of highly interlinked websites within the graph. We group the websites by top-level domain and report findings about the overall linkage within and between different top-level domains. In a final experiment, we use data from the Open Directory Project (DMOZ) to categorize websites by topic and report findings about linkage patterns between websites belonging to different topical categories.
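The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper `naive_pld` is hypothetical and approximates a pay-level domain by the last two host labels, which is wrong for suffixes such as `co.uk` (a real pipeline would presumably consult the Public Suffix List). The abstract does not say whether links within the same PLD are kept, so this sketch assumes they are dropped.

```python
from urllib.parse import urlsplit


def naive_pld(url):
    """Hypothetical helper: approximate the PLD as the last two host labels.

    Assumption for illustration only; multi-part suffixes such as
    "co.uk" are handled incorrectly.
    """
    host = urlsplit(url).hostname or ""
    labels = host.lower().split(".")
    return ".".join(labels[-2:])


def aggregate_by_pld(page_links, pld_of=naive_pld):
    """Collapse page-level hyperlinks into a set of PLD-level arcs.

    An arc (a, b) is present if at least one page of PLD a links to a
    page of PLD b. Links within the same PLD are dropped (assumption).
    """
    arcs = set()
    for src, dst in page_links:
        a, b = pld_of(src), pld_of(dst)
        if a and b and a != b:
            arcs.add((a, b))
    return arcs


# Toy page-level graph with made-up URLs.
page_links = [
    ("http://www.example.com/a", "http://blog.example.org/x"),
    ("http://shop.example.com/b", "http://example.org/y"),
    ("http://example.com/c", "http://www.example.com/d"),
]
pld_arcs = aggregate_by_pld(page_links)
```

Using a set means that many page-level hyperlinks between the same pair of PLDs collapse into a single arc, which matches the "at least one hyperlink" condition in the abstract.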




This entry is part of the university bibliography.



