
Research Article

ALGORITHMIC LITERACY: Generative Artificial Intelligence Technologies for Data Librarians

  • @ARTICLE{10.4108/eetsis.4067,
        author={Alexandre Semeler and Adilson Pinto and Tibor Koltay and Thiago Dias and Arthur Oliveira and Jos\'{e} Gonz\'{a}lez and Helen Beatriz Frota Rozados},
        title={ALGORITHMIC LITERACY: Generative Artificial Intelligence Technologies for Data Librarians},
        journal={EAI Endorsed Transactions on Scalable Information Systems},
        volume={11},
        number={2},
        publisher={EAI},
        journal_a={SIS},
        year={2024},
        month={1},
        keywords={Generative Pre-trained Transformer, Algorithmic Literacy, Python, Open AI, Data Librarian},
        doi={10.4108/eetsis.4067}
    }
    
  • Alexandre Semeler
    Adilson Pinto
    Tibor Koltay
    Thiago Dias
    Arthur Oliveira
    José González
    Helen Beatriz Frota Rozados
    Year: 2024
    ALGORITHMIC LITERACY: Generative Artificial Intelligence Technologies for Data Librarians
    SIS
    EAI
    DOI: 10.4108/eetsis.4067
Alexandre Semeler1,*, Adilson Pinto2, Tibor Koltay3, Thiago Dias4, Arthur Oliveira1, José González5, Helen Beatriz Frota Rozados1
  • 1: Universidade Federal do Rio Grande do Sul
  • 2: Universidade Federal de Santa Catarina
  • 3: Eszterházy Károly Catholic University
  • 4: CEFET-MG
  • 5: Carlos III University of Madrid
*Contact email: alexandre.semeler@ufrgs.br

Abstract

INTRODUCTION: Artificial intelligence (AI) is a novel type of library technology. AI technologies and the needs of data librarians are hybrid and symbiotic, because academic libraries must integrate AI technologies into their information and data services. Library services need AI to interpret the context of big data.

OBJECTIVES: In this context, we explore the use of the OpenAI Codex, a deep learning model trained on Python code from repositories, to generate code scripts for data librarians. This investigation examines the practices, models, and methodologies for obtaining code script insights from complex code environments linked to GPT-based AI technologies.

METHODS: The proposed AI-powered method aims to assist data librarians in creating code scripts using Python libraries in the PyCharm integrated development environment, with additional support from the Machinet AI and Bito AI plugins. The process involves collaboration between the data librarian and the AI agent: the librarian provides a natural language description of the programming problem, and the OpenAI Codex generates the solution code in Python.

RESULTS: Five specific web-scraping problems are presented. The scripts demonstrate how to extract data, calculate metrics, and write the results to files.

CONCLUSION: Overall, this study highlights the application of AI in assisting data librarians with code script creation for web-scraping tasks. AI may be a valuable resource for data librarians dealing with big data challenges on the Web. The possibility of creating Python code with AI is of great value, as AI technologies can help data librarians work with various types of data sources. The Python code in data science web-scraping projects uses a machine-learning model that can generate human-like code to help create and improve the library service for extracting data from a web collection.
The ability of non-programming data librarians to use AI technologies facilitates their interactions with all types of data sources. The Python programming language has artificial intelligence modules, packages, and plugins such as the OpenAI Codex, which serialises automation and navigation in web browsers to simulate human behaviour on pages by entering passwords, selecting captcha options, collecting data, and creating different collections of datasets to be viewed.
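
As an illustration of the kind of script the abstract describes (extract data, calculate a metric, write the result to a file), the following is a minimal sketch using only the Python standard library. It is not one of the five scripts from the article and was not generated by the OpenAI Codex. The sample HTML, the function names, and the choice of "links per domain" as the metric are all assumptions made for this example.

```python
import csv
import sys
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Extract step: return all link targets found in the HTML text."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def links_per_domain(links):
    """Metric step: count how many links point to each domain."""
    return Counter(urlparse(link).netloc for link in links)


def write_counts(counts, handle):
    """Output step: write the counts as CSV rows to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["domain", "links"])
    for domain, n in sorted(counts.items()):
        writer.writerow([domain, n])


# Hypothetical sample page (assumption); a real script would fetch the
# HTML from a web collection over HTTP instead of using a literal string.
SAMPLE_HTML = """
<ul>
  <li><a href="https://example.org/dataset/1">Dataset 1</a></li>
  <li><a href="https://example.org/dataset/2">Dataset 2</a></li>
  <li><a href="https://example.com/about">About</a></li>
</ul>
"""

if __name__ == "__main__":
    counts = links_per_domain(extract_links(SAMPLE_HTML))
    write_counts(counts, sys.stdout)
```

To write to a file rather than the console, pass a handle opened with `open("link_counts.csv", "w", newline="", encoding="utf-8")` to `write_counts`. Keeping the three steps in separate functions makes each one easy to check by hand, which matters when the code comes from an AI assistant and is reviewed by a librarian without a programming background.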