Graph structure in the Web - aggregated by Pay-Level Domain
Lehmberg, Oliver; Meusel, Robert; Bizer, Christian
DOI: https://doi.org/10.1145/2615569.2615674
URL: http://www.planet-data.eu/sites/default/files/publ...
Further URL: http://de.slideshare.net/oli-unima/webgraph-websci...
Document type: Conference publication
Year of publication: 2014
Book title: WebSci 2014 : Proceedings of the 6th ACM Conference on Web Science, Bloomington, IND, USA, June 23 - 26, 2014
Pages: 119-128
Event date: June 23, 2014
Editor: Menczer, Filippo
Place of publication: New York, NY
Publisher: ACM
ISBN: 978-1-4503-2622-3
Language of publication: English
Institution: School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Subject: 004 Computer science
Abstract:
Previous research on the overall graph structure of the World Wide Web has mostly focused on the page level, that is, on the graph that directly results from hyperlinks between individual web pages. This paper aims to provide additional insights into the macroscopic structure of the World Wide Web by analyzing an aggregated version of a recent web graph. The graph covers over 3.5 billion web pages and 128 billion hyperlinks between pages, and was crawled in the first half of 2012. We aggregate this graph by pay-level domain (PLD): all pages that belong to the same pay-level domain are represented by a single node, and an arc exists between two nodes if there is at least one hyperlink between pages of the corresponding pay-level domains. The resulting PLD graph covers 43 million PLDs and contains 623 million arcs between PLDs. Analyzing this aggregated graph allows us to present findings about linkage patterns between complete websites rather than only between individual HTML pages. In this paper, we present basic statistics about the PLD graph, such as degree distributions, top-ranked PLDs, distances, and diameter. We analyze whether the bow-tie structure introduced by Broder et al. can also be identified in our PLD graph, and we reveal a backbone of highly interlinked websites within the graph. We group the websites by top-level domain and report findings about the overall linkage within and between different top-level domains. In a final experiment, we use data from the Open Directory Project (DMOZ) to categorize websites by topic and report findings about linkage patterns between websites belonging to different topical categories.
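The PLD aggregation described in the abstract can be illustrated with a short sketch. This is not the authors' implementation; it assumes the page graph is given as (source URL, target URL) pairs, approximates PLD extraction with a tiny hypothetical suffix set (MULTI_LABEL_SUFFIXES) instead of the full Public Suffix List, and drops links between pages of the same PLD, which the abstract does not specify. The helper names pay_level_domain, aggregate_by_pld, and out_degrees are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical, tiny stand-in for the Public Suffix List; a real
# implementation would consult the full list.
MULTI_LABEL_SUFFIXES = {"co.uk", "ac.uk", "com.au"}


def pay_level_domain(url: str) -> str:
    """Return the PLD of a URL under the simplified suffix rule above."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    # Keep one label below a known multi-label suffix (e.g. bbc.co.uk),
    # otherwise keep the last two labels (e.g. example.com).
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def aggregate_by_pld(page_links):
    """Collapse page-level hyperlinks into a set of PLD-level arcs.

    An arc (a, b) exists if at least one hyperlink connects a page of
    PLD a to a page of PLD b; duplicate links collapse into one arc.
    Links within a single PLD are dropped (an assumption of this sketch).
    """
    arcs = set()
    for source_url, target_url in page_links:
        a, b = pay_level_domain(source_url), pay_level_domain(target_url)
        if a != b:
            arcs.add((a, b))
    return arcs


def out_degrees(arcs):
    """Count outgoing PLD arcs per node, the input to a degree distribution."""
    return Counter(source for source, _ in arcs)
```

For example, links from www.example.com/a and shop.example.com/b to two different example.org pages yield the single arc ("example.com", "example.org"), and a link from news.bbc.co.uk to www.example.com yields ("bbc.co.uk", "example.com"). Using a set makes the "at least one hyperlink" condition explicit: multiplicity is discarded during aggregation.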
This entry is part of the University Bibliography.
ORCID: Bizer, Christian: https://orcid.org/0000-0003-2367-0237