Perplexity-inspired metasearch-based alternatives to FAIR GPT: Open-source AI consultants for RDM
Schmidt, Thomas; Shigapov, Renat; Schumm, Irene; Kamlah, Jan
PDF: 2025_Open-Source-AI-consultants-for-RDM.pdf (published version, 1 MB)
DOI: https://doi.org/10.5281/zenodo.15038486
URN: urn:nbn:de:bsz:180-madoc-694515
Document type: Conference presentation
Year of publication: 2025
Event title: E-Science-Tage 2025
Event location: Heidelberg, Germany
Event dates: 12-14 March 2025
Publisher: Zenodo
Language of publication: English
Institution: Central Institutions > UB University Library
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Subject: 020 Library and information science
Keywords (English): Research Data Management, RDM, LLMs, Large Language Models, AI Assistants, FAIR data, FAIR GPT
Abstract:
FAIR GPT was recently proposed as a virtual consultant for research data management (RDM), designed to help researchers and institutions make their data FAIR (Findable, Accessible, Interoperable, Reusable). To reduce hallucinations and improve accuracy on certain tasks, FAIR GPT uses external APIs (FAIR-Checker, FAIR Enough, TIB Terminology, and re3data) and uploaded RDM resources (the Horizon 2020 guidelines and the awesome-RDM GitHub repository). Its functionalities include metadata enhancement, dataset organization, repository selection, FAIRness assessment, license recommendations, and the generation of documentation such as data management plans, README files, and codebooks.
However, FAIR GPT has limitations. It does not cite sources for its answers, which reduces transparency and trust in its outputs. As one of OpenAI's "Custom GPTs", FAIR GPT is not open source, which limits customization, and it lacks an API for integration into existing RDM workflows. Its reliance on external cloud-based services also raises privacy concerns when dealing with sensitive (meta)data. These issues led us to explore open-source alternatives.
We specifically searched for open-source alternatives to Perplexity AI, a system known for providing citations for the information it retrieves. We identified three candidates on GitHub: Perplexica, sensei, and farfalle. These tools use local instances of the metasearch engine SearXNG to perform internet searches and pass the results to large language models (LLMs) as contextual input. We modified each tool to focus specifically on RDM tasks and released the new versions openly on GitHub as FAIR-Perplexica, FAIR-sensei, and FAIR-farfalle.
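The retrieval-then-generate pattern described above can be sketched roughly as follows. This is a minimal illustration, not code taken from FAIR-Perplexica, FAIR-sensei, or FAIR-farfalle. It assumes a local SearXNG instance at `http://localhost:8080` with the `json` output format enabled in its `settings.yml`; the function names `search_searxng` and `build_prompt` and the prompt wording are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local SearXNG instance with "json" listed under search.formats in settings.yml.
SEARXNG_URL = "http://localhost:8080/search"


def search_searxng(query: str, max_results: int = 5) -> list[dict]:
    """Query SearXNG and return up to max_results hits with title, url and content."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    with urllib.request.urlopen(f"{SEARXNG_URL}?{params}", timeout=10) as response:
        payload = json.load(response)
    hits = payload.get("results", [])[:max_results]
    return [
        {"title": h.get("title", ""), "url": h.get("url", ""), "content": h.get("content", "")}
        for h in hits
    ]


def build_prompt(question: str, results: list[dict]) -> str:
    """Number each search hit so the LLM can cite it as [1], [2], ... in its answer."""
    sources = "\n".join(
        f"[{i}] {r['title']} ({r['url']})\n{r['content']}"
        for i, r in enumerate(results, start=1)
    )
    # Hypothetical instruction text; real tools use their own, more elaborate prompts.
    return (
        "You are a research data management (RDM) consultant. "
        "Answer the question using only the numbered sources below "
        "and cite them as [n].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```

In use, `build_prompt(q, search_searxng(q))` would yield the context-augmented prompt that is then sent to the LLM. Numbering the sources in the prompt is what later allows each statement in the answer to be linked back to a URL.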
We conducted a comparative analysis of these open-source candidates against each other and FAIR GPT, including (but not limited to) the following criteria:
1. Provenance. Unlike FAIR GPT, all three tools provide clear links to the sources of their search results, which improves transparency and trust in their outputs.
2. Privacy. Although these tools are designed to run locally, they still send requests to the internet for information retrieval, which raises privacy concerns.
3. Up-to-dateness. FAIR GPT partly relies on a static knowledge base, which may become outdated. The three open-source tools instead retrieve information through live internet searches, which can return more up-to-date, RDM-specific information.
4. Customizability. The open-source nature of the new tools allows users to customize them according to their specific RDM needs, which contrasts with FAIR GPT.
5. Ease of installation and use. All tools are straightforward to install using Docker Compose, and they offer intuitive, user-friendly graphical interfaces.
6. Community support. Open-source tools benefit from upstream development and a community of contributors.
7. Accuracy and completeness. Each tool's responses were evaluated for missing information and potential errors.
8. Performance. Due to the varying pre- and post-processing steps involved in each tool, their overall performance differs.
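Criterion 1 (provenance) depends on each statement in an answer being traceable to a retrieved source. A minimal sketch of how numbered citation markers in an LLM answer could be resolved back to search-result URLs is shown below; it assumes the answer uses `[n]` markers that index a 1-based list of sources, and the function names `resolve_citations` and `unresolved_citations` are hypothetical rather than taken from any of the three tools.

```python
import re

# Matches bracketed numeric citation markers such as [1] or [12] in the model's answer.
CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def resolve_citations(answer: str, source_urls: list[str]) -> dict[int, str]:
    """Map each citation number used in the answer to its source URL.

    Numbers outside the range of available sources are skipped, since they
    point to no retrieved document and cannot be traced.
    """
    resolved: dict[int, str] = {}
    for match in CITATION_PATTERN.finditer(answer):
        n = int(match.group(1))
        if 1 <= n <= len(source_urls) and n not in resolved:
            resolved[n] = source_urls[n - 1]
    return resolved


def unresolved_citations(answer: str, source_urls: list[str]) -> list[int]:
    """Return citation numbers that do not match any retrieved source."""
    cited = {int(m) for m in CITATION_PATTERN.findall(answer)}
    return sorted(n for n in cited if not 1 <= n <= len(source_urls))
```

Flagging unresolved numbers is one simple way to catch citations the model produced without a corresponding retrieved document.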
In this work, we introduce and compare the open-source solutions FAIR-Perplexica, FAIR-sensei, and FAIR-farfalle as alternatives to FAIR GPT. These tools are designed for users who prioritize transparency, customization, and control over their (meta)data workflows. However, they send requests to the internet via the SearXNG metasearch engine, which may pose privacy challenges.
This entry is part of the University Bibliography.
The document is provided by the publication server of the Mannheim University Library.
ORCID: Schmidt, Thomas (0000-0003-3620-3355); Shigapov, Renat (0000-0002-0331-2558); Schumm, Irene (0000-0002-0167-3683); Kamlah, Jan (0000-0002-0417-7562)