ct 21(29): e5

Research Article

On Trusting a Cyber Librarian: How rethinking underlying data storage infrastructure can mitigate risks of automation

  • @ARTICLE{10.4108/eai.1-12-2021.172359,
        author={Maria Joseph Israel and Mark Graves and Ahmed Amer},
        title={On Trusting a Cyber Librarian: How rethinking underlying data storage infrastructure can mitigate risks of automation},
        journal={EAI Endorsed Transactions on Creative Technologies},
        volume={8},
        number={29},
        publisher={EAI},
        journal_a={CT},
        year={2021},
        month={12},
        keywords={Intelligent systems, AI-Human problem, semantic sentiment analysis, artificial intelligence, ethics of AI, cyber curation of scholarship},
        doi={10.4108/eai.1-12-2021.172359}
    }
    
Maria Joseph Israel1,*, Mark Graves2, Ahmed Amer1
  • 1: Santa Clara University, Santa Clara, CA 95053, USA
  • 2: University of Notre Dame, Notre Dame, IN 46556, USA
*Contact email: misrael@scu.edu

Abstract

INTRODUCTION: The increased ability of Artificial Intelligence (AI) technologies to generate and parse texts will inevitably lead to more proposals for AI’s use in the semantic sentiment analysis (SSA) of textual sources. We argue that instead of focusing solely on debating the merits of automated versus manual processing and analysis of texts, it is critical to also rethink our underlying storage and representation formats. Specifically, we argue that accommodating multivariate metadata is an example of how underlying data storage infrastructure can reshape the ethical debate surrounding the use of such algorithms. In other words, a system that employs automated analysis may typically require manual intervention to assess the quality of its output, or demand that we select between multiple competing natural language processing (NLP) algorithms; with storage that accommodates multiple variants, settling on whichever algorithm or ensemble produces the best results is a decision that need not be made a priori at all.

OBJECTIVES: An underlying storage and representation system that allows for the existence and evaluation of multiple variants of the same source data, while maintaining attribution to the individual sources of each variant, would be an example of a much-needed enhancement to existing storage technologies, especially in anticipation of the proliferation of AI semantic analysis technologies.

METHODS: To this end, we view AI in SSA as a sociotechnical system, and describe a possible novel solution that would allow for safer cyber curation. This can be done by allowing multiple different annotations to coexist within a single publishing ecosystem, whether those annotations result from competing algorithmic models or from varying degrees of human intervention.

RESULTS: We discuss the feasibility of such a scheme, using our own infrastructure model (MultiVerse) as an illustration of such a system, and analyse the ethical implications.

CONCLUSION: Considering an underlying storage and representation system that allows for the existence and evaluation of multiple variants of the same source data, while maintaining attribution to the individual sources of each variant within a single publishing ecosystem, helps mitigate risks of automation and enhances AI (semantic) explainability.
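
The idea of storing multiple attributed variants of the same source data can be illustrated with a minimal sketch. This is not the MultiVerse implementation, which the abstract does not describe in detail; the names `Annotation`, `VariantStore`, `passage_id`, and the example sources are hypothetical and chosen only for illustration. The sketch assumes that each annotation is a sentiment label tied to one passage and one attributed source.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Annotation:
    """One variant of an analysis result, attributed to whoever produced it."""
    source: str  # hypothetical attribution label: a model name or a reviewer ID
    kind: str    # "algorithm" or "human" (an illustrative distinction)
    label: str   # the sentiment this source assigned to the passage


@dataclass
class VariantStore:
    """Keeps every annotation for a passage instead of overwriting earlier ones."""
    _variants: Dict[str, List[Annotation]] = field(default_factory=dict)

    def add(self, passage_id: str, annotation: Annotation) -> None:
        # Append rather than replace, so competing variants coexist.
        self._variants.setdefault(passage_id, []).append(annotation)

    def variants(self, passage_id: str) -> List[Annotation]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._variants.get(passage_id, []))

    def by_source(self, passage_id: str, source: str) -> List[Annotation]:
        # Attribution lookup: which variants did a given source contribute?
        return [a for a in self.variants(passage_id) if a.source == source]


store = VariantStore()
store.add("p1", Annotation("model-a", "algorithm", "positive"))
store.add("p1", Annotation("model-b", "algorithm", "negative"))
store.add("p1", Annotation("reviewer-1", "human", "positive"))

for a in store.variants("p1"):
    print(f"{a.source} ({a.kind}): {a.label}")
```

Because no variant is discarded on write, the choice between competing algorithms or human judgments can be deferred to read time, and each result remains traceable to its source.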