FinMatcher at FinSim-2: Hypernym Detection in the Financial Services Domain using Knowledge Graphs

This paper presents the FinMatcher system and its results for the FinSim 2021 shared task, which is co-located with the Workshop on Financial Technology on the Web (FinWeb) in conjunction with The Web Conference. The FinSim-2 shared task provides a set of concept labels from the financial services domain; the goal is to find, for each label, the most relevant top-level concept from a given set of concepts. The FinMatcher system exploits three publicly available knowledge graphs, namely WordNet, Wikidata, and WebIsALOD. The graphs are used to generate explicit features as well as latent features, which are fed into a neural classifier to predict the closest hypernym.


INTRODUCTION
A hypernym or hyperonym is a concept which is superordinate to another one. In computer science, it is often represented as an IS-A relationship. For example, animal is a hypernym of cat and equity index is a hypernym of S&P 500 Index. A hyponym, on the other hand, is a concept which is subordinate to another one. For example, cat is a hyponym of animal and S&P 500 Index is a hyponym of equity index [15]. Hypernymy detection can be broadly applied in real-world applications. The detection of hypernyms in the financial services domain is particularly interesting due to a domain-specific vocabulary and a lack of publicly available domain-specific resources and concept representations.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. WWW '21 Companion, April 19-23, 2021
The FinSim task models hypernym detection as a multi-class classification problem: Given a concept label (i.e., the hyponym), the correct hypernym is to be found from a set of 10 mutually exclusive classes (i.e., hypernyms). A system participating in this task can return a sorted list of classes. The task is evaluated with two performance metrics: mean rank and accuracy.
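The two metrics can be illustrated with a minimal sketch. The function name, the data layout, and the penalty rank for a gold class that is missing from a ranked list are assumptions made for this illustration; they are not taken from the official FinSim scorer.

```python
from typing import Dict, List, Tuple


def accuracy_and_mean_rank(
    predictions: Dict[str, List[str]], gold: Dict[str, str]
) -> Tuple[float, float]:
    """Score ranked class predictions against gold hypernyms.

    predictions maps each term to its class labels sorted from most to
    least likely; gold maps each term to the correct class label.
    """
    hits = 0
    rank_sum = 0
    for term, ranked in predictions.items():
        target = gold[term]
        # Accuracy only looks at the top-ranked class.
        if ranked and ranked[0] == target:
            hits += 1
        # Rank is the 1-based position of the gold class in the list.
        # Assumption: a missing gold class gets rank len(ranked) + 1.
        if target in ranked:
            rank_sum += ranked.index(target) + 1
        else:
            rank_sum += len(ranked) + 1
    n = len(predictions)
    return hits / n, rank_sum / n


# Toy example with two terms and two candidate classes.
toy_predictions = {
    "green bonds": ["bonds", "funds"],
    "supranational bond": ["funds", "bonds"],
}
toy_gold = {"green bonds": "bonds", "supranational bond": "bonds"}
accuracy, mean_rank = accuracy_and_mean_rank(toy_predictions, toy_gold)
```

A perfect system reaches an accuracy of 1.0 and a mean rank of 1.0; lower mean rank is better.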
The FinMatcher system uses two very broad publicly available knowledge graphs (Wikidata and WebIsALOD) as well as a small linguistic graph resource (WordNet). A knowledge graph contains real-world entities from various domains and the relationships that hold between them in a graph format [16]. The system presented in this paper calculates multiple explicit features and uses RDF2vec embeddings obtained from WebIsALOD. The features are concatenated into a feature vector that is passed to a neural classifier trained on the provided FinSim training data.
In the following section, related work is introduced. Afterwards, the provided dataset is briefly described. In Section 4, the FinMatcher system is presented. The results of the FinSim task are given in Section 5 together with an ablation study. The paper is concluded in Section 6, where future research directions are also presented.

RELATED WORK

Shared Tasks for Hypernym Detection
Hypernym discovery has been addressed before as a challenge, for example at SemEval-2018 [1]. Unique to the FinSim task is the focus on the financial services industry. The evaluation campaign premiered in 2020 [4] and has been extended for the 2021 campaign, also referred to as FinSim-2 [11]: Two additional tags have been introduced and the training and evaluation datasets have been extended.

Knowledge Graphs
FinMatcher uses three external knowledge graphs as background knowledge for the task of hypernym detection.
WordNet [5] is a well-known lexical resource. It is a database of English words grouped in sets which represent a particular meaning, called synsets; further semantic relations such as hypernymy also exist in the database. The resource is publicly available.
Wikidata is a knowledge graph hosted by the Wikimedia Foundation which is publicly available and maintained by an open community. The graph contains class-like entities, such as "stock market index", and also instance-like entities, such as "MSCI World". An example of a Wikidata statement would be "MSCI World" instance of "stock market index". The graph can be queried using SPARQL.
A frequent problem that occurs when working with external background knowledge in the financial services domain is that less common entities (so-called long tail entities) are not contained within a knowledge base. The WebIsA [22] database is an attempt to tackle this problem by providing a dataset which is not based on a single source of knowledge, like DBpedia [10], but instead on the whole Web: The dataset consists of hypernymy relations extracted from the Common Crawl, a freely downloadable crawl of a significant portion of the Web. For the automated extraction, lexico-syntactic patterns similar to those presented by Hearst [6] were used. Like Wikidata, the graph contains class-like and instance-like concepts. A sample triple from the dataset is "zero-coupon bond" skos:broader "bond". The dataset is also available via a Linked Open Data (LOD) endpoint under the name WebIsALOD [7]; hence, it can be queried like Wikidata using SPARQL.

Knowledge Graph Embeddings
In recent years, latent representations have gained traction not only in natural language processing but also in other data science communities. RDF2vec [21] is a knowledge graph embedding approach which makes it possible to obtain a latent representation, i.e., a vector, for each node and each edge in a knowledge graph. It applies the word2vec [12,13] model to RDF data: Random walks are performed for each node and are interpreted as sentences. After the walk generation, the sentences are used as input for the word2vec algorithm. As a result, one obtains a vector for each word, i.e., a concept in the RDF graph. Multiple flavors of RDF2vec have been developed in the past, such as biased walks [3] or RDF2vec Light [20].
The calculation of knowledge graph embeddings on large graphs can require a significant amount of resources. Therefore, KGvec2go [19] provides pre-trained RDF2vec knowledge graph embeddings through a Web API as well as via download. For the system presented in this paper, a pre-trained embedding of WebIsALOD has been downloaded from KGvec2go.
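The walk-generation step of RDF2vec can be sketched as follows. This is a minimal illustration under stated assumptions: the toy graph, the function name, and the walk parameters are hypothetical, and the sketch is not the implementation behind the KGvec2go embeddings. Each walk alternates node and edge labels and would subsequently be treated as one sentence for a word2vec implementation.

```python
import random
from typing import Dict, List, Tuple

# Adjacency list: node -> list of (predicate, object) pairs.
Graph = Dict[str, List[Tuple[str, str]]]


def random_walks(
    graph: Graph, start: str, depth: int, n_walks: int, rng: random.Random
) -> List[List[str]]:
    """Generate n_walks random walks of at most `depth` hops from `start`."""
    walks = []
    for _ in range(n_walks):
        walk = [start]
        node = start
        for _ in range(depth):
            edges = graph.get(node, [])
            if not edges:
                break  # dead end: the walk stops early
            predicate, obj = rng.choice(edges)
            walk.extend([predicate, obj])
            node = obj
        walks.append(walk)
    return walks


# Toy graph; the second edge is invented for illustration only.
toy_graph: Graph = {
    "MSCI World": [("instance of", "stock market index")],
    "stock market index": [("subclass of", "index")],
}
sentences = random_walks(toy_graph, "MSCI World", depth=2, n_walks=3,
                         rng=random.Random(42))
```

Because every node and edge label becomes a token in these sentences, the subsequent word2vec training yields one vector per node and per edge.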
Both RDF2vec and WebIsALOD have been used for integration tasks in the financial services domain before [14,17].

FINSIM DATASET DESCRIPTION
The FinSim dataset consists of 614 hyponym-hypernym pairs. There are 10 class labels, i.e., hypernyms. The class labels classify concepts not according to their features but instead according to their prototypical kind. The distribution of class labels is not balanced. As shown in Figure 1, the distribution of labels follows a power law with 286 entries for "equity index" and only 9 entries for "forward". This is a challenging setting for multiple reasons: (i) the training dataset is comparatively small, (ii) the hypernyms are semantically closely related, (iii) industry abbreviations are used, and (iv) there are textual overlaps. The FinSim-2 test dataset consists of 212 entries; the distribution of its class labels is not known.
Compared to other evaluation campaigns where participants have to submit their implementations, such as the Ontology Alignment Evaluation Initiative (OAEI), participants of the FinSim task run their system on their own premises and submit the predictions made by their system.

SYSTEM DESCRIPTION
The FinMatcher system combines explicit and latent features. In total, there are five groups of features which will be presented in the following. The overall architecture is shown in Figure 2.

Features
Word Overlap. The overlap between hyponym and class label is a strong signal for a match. An example would be "Supranational Bond", which is a "Bond". As such constellations are relatively frequent in the provided dataset, the first feature vector encodes whether the hyponym label contains the class label. For this feature, minimal text pre-processing is applied, including lower-casing and removal of the plural suffix "s". As this step is performed for each class label, a vector of length 10 is obtained. The overlap feature vector is displayed in green in Figure 2.
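A minimal sketch of such an overlap feature is given below. The function names are hypothetical, and stripping a trailing "s" from every token is an assumed reading of the pre-processing described above; the example uses only a subset of class labels for brevity.

```python
from typing import List


def normalize(text: str) -> str:
    """Lower-case the text and strip a trailing plural 's' from each token."""
    tokens = text.lower().split()
    # Assumption: the suffix is removed token by token.
    stripped = [t[:-1] if t.endswith("s") and len(t) > 1 else t for t in tokens]
    return " ".join(stripped)


def word_overlap_vector(term: str, class_labels: List[str]) -> List[float]:
    """Return 1.0 for each class label contained in the term, else 0.0."""
    norm_term = normalize(term)
    return [1.0 if normalize(label) in norm_term else 0.0
            for label in class_labels]


# Subset of class labels for illustration; the task uses 10 labels.
labels = ["Bonds", "Equity Index", "Credit Index", "Forward"]
overlap = word_overlap_vector("Supranational Bond", labels)
```

With all 10 class labels, the same call yields the length-10 overlap vector described above.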
Wikidata Hypernym Lookup. Wikidata is a large general-purpose knowledge graph which is not tailored to the financial domain. Nonetheless, the data source contains many financial concepts and relations between them. For example, the concept "UCITS" can be linked to "Undertakings for Collective Investment in Transferable Securities" via the also known as label; due to the annotated relation subclass of, it is easily recognizable that "UCITS" is an "investment fund" (see https://www.wikidata.org/wiki/Q25323628). This notion is exploited in this set of features: A comprehensive linking mechanism from the MELT framework [8,9] is used to link classes (the hypernyms) as well as labels (the hyponyms) to Wikidata concepts; the relations P31 (instance of) and P279 (subclass of) are then followed up to two hops to evaluate whether the class label appears. The Matching EvaLuation Toolkit (MELT) is a framework for ontology and instance matching (development, evaluation, visualization [18]); however, its components can also be exploited for other tasks (for a better overview, see https://github.com/dwslab/melt/). Distant matches receive a lower signal strength, which is calculated through the inverse hop distance: A direct hypernym annotation (as in the UCITS example stated earlier) receives the value $\frac{1}{1} = 1$, whereas a two-hop match would receive the value $\frac{1}{2} = 0.5$.
WordNet Hypernym Lookup. The same exploitation approach chosen for Wikidata is applied on the WordNet graph: Hypernyms and hyponyms are linked into WordNet and then the inverse hop distance is used as feature value. This is done for each class label that could be linked. The WordNet lookup feature vector is displayed in yellow in Figure 2.
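The inverse hop-distance signal used by the graph lookups can be sketched with a breadth-first search over hypernym edges. The toy graph, the function name, and the value 0.0 for "no match within the hop limit" are assumptions for this illustration; the actual system resolves concepts through the MELT linker and queries the respective graphs.

```python
from collections import deque
from typing import Dict, List


def inverse_hop_distance(
    graph: Dict[str, List[str]], start: str, target: str, max_hops: int = 2
) -> float:
    """Return 1 / (number of upward hops from start to target).

    graph maps a concept to its direct hypernyms. If target is not
    reached within max_hops, 0.0 is returned (assumed default).
    """
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for parent in graph.get(node, []):
            if parent == target:
                # Breadth-first order guarantees the shortest distance.
                return 1.0 / (dist + 1)
            if parent not in seen:
                seen.add(parent)
                frontier.append((parent, dist + 1))
    return 0.0


# Toy hypernym graph; the second edge is invented for illustration.
toy_hypernyms = {
    "UCITS": ["investment fund"],
    "investment fund": ["fund"],
}
direct = inverse_hop_distance(toy_hypernyms, "UCITS", "investment fund")
two_hop = inverse_hop_distance(toy_hypernyms, "UCITS", "fund")
```

Setting `max_hops=1` restricts the lookup to direct hypernyms only.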
WebIsALOD Hypernym Lookup. In a similar fashion to the Wikidata hypernym lookup, class labels as well as hyponym labels are linked to the WebIsALOD graph using a linker from the MELT framework. In this graph, there exists only one significant relation: skos:broader. For each hyponym, the broader concepts are obtained and it is checked whether the hypernym appears. Due to a high level of noise, the number of upwards hops is limited to 1. As this step is performed for each class label, a vector of length 10 is obtained. The WebIsALOD lookup feature vector is displayed in purple in Figure 2.
WebIsALOD RDF2vec Similarity. For the embedding feature, each class label as well as each hyponym label is linked again into the WebIsALOD knowledge graph. Each concept in WebIsALOD has an associated embedding vector $v \in \mathbb{R}^{200}$. For comparisons, the cosine similarity between the hyponym and the class label is calculated.
If the whole concept cannot be linked, multiple sub-concepts are detected and linked. Within this linking process, longer sub-concepts are favored. For example, the string "CDX Emerging Markets" cannot be directly linked; however, the longest substring that can be linked here is "Emerging Markets", and in addition, "CDX" can also be linked. Comparisons in such cases are performed over the set of links of the hyponym and the set of links of the hypernym, using the embedding vectors of the links and a similarity function. In this case, the cosine is used as the similarity function. As this step is performed for each class label, a vector of length 10 is obtained. The WebIsALOD RDF2vec similarity feature vector is displayed in salmon in Figure 2.
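The sub-concept linking and the pairwise comparison can be sketched as follows. This is an illustration under stated assumptions: the left-to-right greedy strategy, the function names, the toy vocabulary, and the two-dimensional toy vectors are hypothetical. The sketch only computes all pairwise cosine similarities between the two link sets and deliberately leaves open how they are aggregated into a single feature value.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def link_longest_first(label: str, vocabulary: Set[str]) -> List[str]:
    """Greedily link the longest token spans of `label` found in `vocabulary`."""
    tokens = label.lower().split()
    links = []
    i = 0
    while i < len(tokens):
        # Try the longest span starting at position i first.
        for j in range(len(tokens), i, -1):
            candidate = " ".join(tokens[i:j])
            if candidate in vocabulary:
                links.append(candidate)
                i = j
                break
        else:
            i += 1  # no span starting here could be linked
    return links


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pairwise_similarities(
    hyponym_links: List[str],
    hypernym_links: List[str],
    vectors: Dict[str, Sequence[float]],
) -> List[Tuple[str, str, float]]:
    """Cosine similarity for every (hyponym link, hypernym link) pair."""
    return [(a, b, cosine(vectors[a], vectors[b]))
            for a in hyponym_links for b in hypernym_links]


toy_vocabulary = {"cdx", "emerging markets", "emerging", "markets", "credit index"}
linked = link_longest_first("CDX Emerging Markets", toy_vocabulary)
```

In the real system, the vocabulary corresponds to the concepts available in WebIsALOD and the vectors to the 200-dimensional RDF2vec embeddings.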
Feature Composition. Each of the features returns a signal vector $s_i \in \mathbb{R}^{10}$. All vectors are concatenated to form the final signal vector $s = \Vert_{i=1}^{5} s_i$, which is used as input for the classifier.

Classifier
Due to the small total number of training examples, a very simple artificial neural network architecture has been chosen. It is configured with one fully connected layer of size 10 and mean squared error as loss. The network was trained for 100 epochs with a batch size of 25 on a consumer PC. The vector that is to be predicted is of size 10 and represents the one-hot-encoded class label. The neural network classifier performed best among the classifiers evaluated; the alternatives were Naïve Bayes, J48 decision trees, random forests, and a regression. As the distribution of class labels is skewed (see Figure 1), we applied the synthetic minority oversampling technique (SMOTE) [2] to upsample underrepresented class labels. We experimentally chose 33% of the majority class total as the upsampling barrier; this means that if the majority class in the training split totals 229 records, upsampling for class labels with fewer than $\frac{1}{3} \cdot 229 \approx 76$ records will be performed so that there are 76 records for each underrepresented class label.
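The upsampling barrier can be made concrete with a small calculation. The sketch below only derives the target record count per class; it does not generate synthetic samples. The function name, the rounding down to an integer, and all counts other than 229 are assumptions for this illustration.

```python
import math
from typing import Dict


def upsampling_targets(
    class_counts: Dict[str, int], ratio: float = 1 / 3
) -> Dict[str, int]:
    """Return the desired number of records per class after upsampling.

    Classes below ratio * (majority class size) are raised to that
    barrier; all other classes keep their original size.
    """
    majority = max(class_counts.values())
    barrier = math.floor(ratio * majority)  # assumption: round down
    return {label: max(count, barrier)
            for label, count in class_counts.items()}


# Toy training split: only the majority count of 229 is from the text.
toy_counts = {"equity index": 229, "credit index": 100, "forward": 7}
targets = upsampling_targets(toy_counts)
```

Such a dictionary of target counts could then be handed to an oversampling implementation, for example as the sampling strategy of a SMOTE library; which implementation the system used is not stated here.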

Results on the Training Data and Ablation Study
We evaluated our matching system by performing a stratified fivefold cross validation on the training data. We trained each ANN configuration 10 times and report the average results for accuracy and mean rank. We further performed an ablation study by training and evaluating the performance when leaving out each of the five feature groups. The results can be found in Table 1.
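The evaluation protocol can be sketched in two small helpers: one that assigns record indices to stratified folds and one that removes a feature group for the ablation runs. Both function names, the round-robin fold assignment, and the assumed layout of the feature vector (five consecutive groups of 10 values) are hypothetical and serve only as an illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[str], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split record indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the records of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def drop_feature_group(vector: Sequence[float], group: int, group_size: int = 10) -> List[float]:
    """Remove one feature group (assumed to be consecutive) from a vector."""
    start = group * group_size
    return list(vector[:start]) + list(vector[start + group_size:])


toy_labels = ["equity index"] * 10 + ["forward"] * 5
folds = stratified_folds(toy_labels)
```

In each of the five rounds, one fold serves as the test split and the remaining four as the training split; for an ablation run, the same group is dropped from every feature vector before training.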
It is visible that the most important feature group in terms of accuracy is word overlap. This is not surprising given the high number of labels that contain the hypernym within their name (for example "green bonds" → "bonds") and shows that it is sensible for the task at hand to combine explicit and latent features. The observation that the inclusion of the target label in the term is a significant signal has also been made in the last FinSim campaign [4]. The negligible role of WordNet in terms of accuracy is also plausible since this particular external background knowledge dataset contains merely general-purpose class knowledge (such as "call option") but no knowledge about instances (such as "MSCI EMU Index"). For the FinSim dataset, very large knowledge graphs that contain class as well as instance knowledge are more beneficial due to their higher concept coverage. However, the information in the knowledge graphs used also contains some redundancy, as can be observed in Table 1: leaving out a single knowledge graph does not significantly change the results.
To further analyze the contribution of the different signals, we plotted the weights of the input features. As each weight connects one input neuron to one class label, we can directly observe which features the trained model considers relevant for identifying which label. Table 2 shows the summed absolute weight per feature group. This allows analyzing the overall contribution of each individual feature group. Here, it is visible that the latent RDF2vec feature group receives the highest weight, higher even than the word overlap group. While the word overlap feature is important for the majority labels (equity index, credit index), it is not equally important for all labels and does not have the overall highest weight: Figure 3 shows the summed absolute weight per feature group and class label. The class labels are sorted in descending order by frequency. Here, it is visible that the word overlap has the highest contribution for the equity index as well as a high contribution for the credit index but low weights for the remaining minority classes.
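The aggregation behind the summed absolute weights can be sketched as follows. The sketch assumes a weight matrix with one row per input feature and one column per class label, with the feature groups occupying consecutive rows; the function name and the toy numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def summed_abs_weight_per_group(
    weights: Sequence[Sequence[float]],
    group_names: List[str],
    group_size: int = 10,
) -> Dict[str, float]:
    """Sum |w| over all weights belonging to each feature group.

    weights[i][j] is the weight from input feature i to class label j.
    """
    totals = {}
    for g, name in enumerate(group_names):
        rows = weights[g * group_size:(g + 1) * group_size]
        totals[name] = sum(abs(w) for row in rows for w in row)
    return totals


# Toy matrix: two groups of two input features, two class labels.
toy_weights = [
    [1.0, -1.0],
    [0.5, 0.0],
    [2.0, 2.0],
    [-3.0, 0.0],
]
group_totals = summed_abs_weight_per_group(
    toy_weights, ["word overlap", "rdf2vec"], group_size=2
)
```

Restricting the inner sum to a single column instead of the whole row gives the per-label breakdown of the kind shown in Figure 3.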

Results on the Reference Data
FinMatcher participated with only one configuration and achieved an accuracy of 81.1% and a mean rank of 1.415 on the reference data, which is below the scores expected from the training data shown in Table 1.

CONCLUSION
In this paper, we presented FinMatcher, a hypernym detection system for the financial services domain which exploits multiple knowledge graphs by combining explicit and latent features. We could show that the task can be addressed by including external knowledge in the form of knowledge graphs and that the combination of multiple graphs is overall beneficial.
In the future, we strive to improve the results through the inclusion of more advanced embedding techniques as well as the exploration of additional external datasets.

ACKNOWLEDGMENTS
We would like to thank the FinSim-2 organizers (Youness Mansar, Ismaïl El Maarouf, and Juyeon Kang) for compiling the data, conducting the evaluation campaign, and for promptly answering all questions.