Graph structure in the Web - aggregated by Pay-Level Domain
Lehmberg, Oliver
;
Meusel, Robert
;
Bizer, Christian
DOI:
|
https://doi.org/10.1145/2615569.2615674
|
URL:
|
http://www.planet-data.eu/sites/default/files/publ...
|
Additional URL:
|
http://de.slideshare.net/oli-unima/webgraph-websci...
|
Document Type:
|
Conference or workshop publication
|
Year of publication:
|
2014
|
Book title:
|
WebSci 2014 : Proceedings of the 6th ACM Conference on Web Science, Bloomington, IND, USA, June 23 - 26, 2014
|
Page range:
|
119-128
|
Date of the conference:
|
23.06.2014
|
Editor:
|
Menczer, Filippo
|
Place of publication:
|
New York, NY
|
Publishing house:
|
ACM
|
ISBN:
|
978-1-4503-2622-3
|
Publication language:
|
English
|
Institution:
|
School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
|
Subject:
|
004 Computer science, internet
|
Abstract:
|
Previous research on the overall graph structure of the World Wide Web mostly focused on the page level, meaning that the graph that directly results from hyperlinks between individual web pages was analyzed. This paper aims to provide additional insights about the macroscopic structure of the World Wide Web by analyzing an aggregated version of a recent web graph. The graph covers over 3.5 billion web pages and 128 billion hyperlinks between pages. It was crawled in the first half of 2012. We aggregate this graph by pay-level domain (PLD), meaning that all pages that belong to the same pay-level domain are represented by a single node and that an arc exists between two nodes if there is at least one hyperlink between pages of the corresponding pay-level domains. The resulting PLD graph covers 43 million PLDs and contains 623 million arcs between PLDs. Analyzing this aggregated graph allows us to present findings about linkage patterns between complete websites and not only individual HTML pages. In this paper, we present basic statistics about the PLD graph, such as degree distributions, top-ranked PLDs, distances and diameter. We analyze whether the bow-tie structure introduced by Broder et al. can also be identified in our PLD graph and reveal a backbone of highly interlinked websites within the graph. We group the websites by top-level domain and report findings about the overall linkage within and between different top-level domains. In a last experiment, we use data from the Open Directory Project (DMOZ) to categorize websites by topic and report findings about linkage patterns between websites belonging to different topical categories.
|
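The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tiny `PUBLIC_SUFFIXES` set (a hypothetical stand-in for the full Public Suffix List), and the choice to drop links within a single PLD are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical, tiny stand-in for the Public Suffix List; a real analysis
# would need the full list to identify pay-level domains correctly.
PUBLIC_SUFFIXES = {"com", "org", "de", "co.uk"}


def pay_level_domain(url):
    """Return the public suffix plus one label, e.g. 'example.co.uk'."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Scan from the longest candidate suffix to the shortest, so that
    # 'co.uk' is matched before any shorter suffix could be.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:])
    return host  # fall back to the full host if no known suffix matches


def aggregate_by_pld(page_links):
    """Collapse page-level hyperlinks into a set of PLD-level arcs."""
    arcs = set()
    for source_url, target_url in page_links:
        src = pay_level_domain(source_url)
        dst = pay_level_domain(target_url)
        if src != dst:  # assumption: links within one PLD are ignored
            arcs.add((src, dst))
    return arcs


links = [
    ("http://www.example.com/a", "http://blog.example.org/x"),
    ("http://shop.example.com/b", "http://example.org/y"),
    ("http://www.example.com/a", "http://example.com/c"),
]
pld_arcs = aggregate_by_pld(links)
```

Because the arcs are stored in a set, any number of page-level hyperlinks between the same two PLDs collapses into a single arc, which matches the "at least one hyperlink" condition in the abstract.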
| This entry is part of the university bibliography. |
ORCID:
Lehmberg, Oliver, Meusel, Robert and Bizer, Christian ORCID: https://orcid.org/0000-0003-2367-0237