Graph structure in the Web - aggregated by Pay-Level Domain


Lehmberg, Oliver; Meusel, Robert; Bizer, Christian



DOI: https://doi.org/10.1145/2615569.2615674
URL: http://www.planet-data.eu/sites/default/files/publ...
Additional URL: http://de.slideshare.net/oli-unima/webgraph-websci...
Document type: Conference publication
Year of publication: 2014
Book title: WebSci 2014 : Proceedings of the 6th ACM Conference on Web Science, Bloomington, IND, USA, June 23 - 26, 2014
Page range: 119-128
Event date: 23 June 2014
Editor: Menczer, Filippo
Place of publication: New York, NY
Publisher: ACM
ISBN: 978-1-4503-2622-3
Language of publication: English
Institution: Faculty of Business Informatics and Business Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Subject: 004 Computer science
Abstract: Previous research on the overall graph structure of the World Wide Web has mostly focused on the page level, that is, on the graph that results directly from hyperlinks between individual web pages. This paper aims to provide additional insights into the macroscopic structure of the World Wide Web by analyzing an aggregated version of a recent web graph. The graph covers over 3.5 billion web pages and 128 billion hyperlinks between pages; it was crawled in the first half of 2012. We aggregate this graph by pay-level domain (PLD): all pages that belong to the same pay-level domain are represented by a single node, and an arc exists between two nodes if there is at least one hyperlink between pages of the corresponding pay-level domains. The resulting PLD graph covers 43 million PLDs and contains 623 million arcs between PLDs. Analyzing this aggregated graph allows us to present findings about linkage patterns between complete websites rather than only individual HTML pages. We present basic statistics about the PLD graph, such as degree distributions, top-ranked PLDs, distances, and diameter. We analyze whether the bow-tie structure introduced by Broder et al. can also be identified in our PLD graph, and we reveal a backbone of highly interlinked websites within the graph. We group the websites by top-level domain and report findings about linkage within and between different top-level domains. In a final experiment, we use data from the Open Directory Project (DMOZ) to categorize websites by topic and report findings about linkage patterns between websites belonging to different topical categories.
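The PLD aggregation described in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the tiny `PUBLIC_SUFFIXES` set is a hypothetical stand-in for a full public suffix list, the function names `pay_level_domain`, `aggregate_by_pld` and `degrees` are illustrative, and dropping links between pages of the same PLD (rather than keeping them as self-loops) is an assumption the abstract does not settle.

```python
from __future__ import annotations

from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical stand-in for a full public suffix list (assumption, not from the paper).
PUBLIC_SUFFIXES = {"com", "org", "de", "uk", "co.uk", "ac.uk"}


def pay_level_domain(url: str, suffixes: set[str] = PUBLIC_SUFFIXES) -> str | None:
    """Return the public suffix plus one label, e.g. 'news.bbc.co.uk' -> 'bbc.co.uk'."""
    host = urlsplit(url).hostname  # lowercased, port removed
    if host is None:
        return None
    labels = host.split(".")
    # Try candidate suffixes from longest to shortest; the first hit is the longest match.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[i - 1:])
    return None  # unknown suffix, or the host is itself a bare suffix


def aggregate_by_pld(page_links):
    """Collapse page-level hyperlinks (src_url, dst_url) into a set of PLD arcs."""
    arcs = set()
    for src_url, dst_url in page_links:
        src, dst = pay_level_domain(src_url), pay_level_domain(dst_url)
        # Assumption: links within one PLD are dropped instead of kept as self-loops.
        if src and dst and src != dst:
            arcs.add((src, dst))  # one arc, however many page links connect the pair
    return arcs


def degrees(arcs):
    """In-degree and out-degree of each PLD node in the aggregated graph."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, dst in arcs:
        outdeg[src] += 1
        indeg[dst] += 1
    return indeg, outdeg


links = [
    ("http://www.example.com/a", "https://news.bbc.co.uk/x"),
    ("http://example.com/b", "http://bbc.co.uk/y"),    # same PLD pair: no new arc
    ("http://example.com/c", "http://example.com/d"),  # intra-PLD link: dropped
    ("https://blog.example.org/", "http://www.example.com/"),
]
arcs = aggregate_by_pld(links)
indeg, outdeg = degrees(arcs)
print(sorted(arcs))  # [('example.com', 'bbc.co.uk'), ('example.org', 'example.com')]
```

Storing arcs in a set is what makes many page-level hyperlinks between two sites count as a single PLD arc; at the scale reported above (128 billion hyperlinks) this step would in practice run as a distributed deduplication rather than in memory.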




This entry is part of the university bibliography.