Gesture Sonification. An Interaction Design Approach to an Artistic Research Case

INTRODUCTION: The subject of this paper is a phenomenological study of cognitively plausible relationships between gesture and sound mediated by technologies. In this discussion, the term ‘gesture’ indicates a movement of the body consciously performed and able to express or communicate something. The expression ‘cognitively plausible’ refers to an interactive sound response that is immediate, continuously varying and enactively coherent with the generating (sonified) gesture. OBJECTIVES: This paper has two objectives. The first is an artistic exploration of gesture sonifications in various contexts. The second is methodological, i.e., to provide a possible general paradigm for Artistic Research (AR). METHODS: In more detail, the AR phenomenological approach is modeled according to an Interaction Design (IxD) research paradigm. RESULTS: A number of case studies of gesture sonification are presented and discussed according to the above methodological framework. CONCLUSION: We claim that the introduction of such a methodological framework was successful in terms of providing robust guidelines for our research and for a clear and structured presentation of its results.


Introduction
In general, we view art as a means to explore and propose new perspectives for comprehending the world. We also think that artistic investigation can parallel scientific and humanities research in terms of construction of evidence, case studies, counterexamples and critique of questioned ideas. In our case, we take into consideration the field of Sonic Interaction Design (SID) [9], [10], adopting the concept of interaction with sound as a possible reference point for cooperation between art, science and technology. We also believe that embracing design methodologies in art could be an effective way to foster a dialogue and a mutual exchange between artistic and scientific researchers.

HCI Epistemological Revolution
In order to face the new challenges posed by an Information and Communication Technology (ICT) society, the necessity of promoting a wide-ranging interdisciplinary research effort from the humanities to the natural sciences has clearly emerged over the past decades [11], [12]. The epistemological revolution introduced by Human-Computer Interaction (HCI) [13] and Artificial Intelligence (AI) [14] provides examples of this. When technology was employed only within the scope of the industrial revolution, in the epistemological framework of a positivist philosophy of science [15], the complexity of human psychology, social behavior and culture was oversimplified, if not neglected. By contrast, with fast-evolving digital means and the computer undergoing what Brey calls the passage from an epistemic role to an ontic role [13], it becomes necessary to conceive technological development according to culturally, psychologically and anthropologically ergonomic criteria. Although AI is gaining importance in interaction with sound and music as well [16], it is not involved in our work. The focus of this paper is on HCI body/perception aspects rather than on brain/intelligence issues.
In this sense, it is also remarkable that the name of a discipline born in a computer science framework, such as HCI, is gradually changing into Interaction Design (IxD), dropping the term machine and introducing the concept of design with all its ambiguities [17], [18]. In particular, IxD can be thought of as a discipline that lies in the middle of natural sciences, humanities, technology and art. As Stolterman says, "interaction design research has for some decades developed theoretical approaches, methods, tools, and techniques aimed at supporting interaction designers in their practice" and "many of them have intellectual roots in other academic areas, such as science, engineering, social science, humanities, and in the traditional art and design disciplines" [19].
In the debate within the HCI/IxD community, one of the questions is whether a quantitative validation of the results could even be omitted, since in some cases it is rather meaningless due to the kind of studies and subjects involved in the discipline [20]. By pursuing the definition of general, simple models and the reproducibility of results as a guarantee of objectivity, reductionism shows its relativity as soon as the object of study and its environment involve human factors, whether psychological, social or cultural.
An example of a qualitative research approach in SID can be found in a comprehensive work by Frauenberger and Stockman [21]. In that paper, the authors propose to consider design pattern analysis as a point of reference for the discipline, introducing a method based on the definition of context spaces explored by pattern mining. They are aware of the pros and cons of a pattern-based approach, and the context is meant as an organizing substrate. This would provide designers facing new problems with useful references to already existing IxD cases and promote the growth of the discipline by building upon previous knowledge in the form of qualitative descriptions and analysis.

Artistic Research Practices for a Sustainable Relationship with Technology
HCI investigates technology at the point of contact with human complexity, i.e., with the non-univocal nature of human thoughts, emotions and behaviors. As a consequence, HCI becomes a natural laboratory for an epistemological revolution and, at the same time, provides the challenging playground for the development of a technology devoted to humans (and not the opposite). The goal is to go beyond the "computer metaphor and the related Cartesian mind-body dualism [that] have resulted in a fairly mechanical comprehension of the human being using a technical device" [22], in order to develop a technology that is meaningful, ergonomic, and sustainable from many points of view: physical, psychological, social, cultural, ethical, and environmental.
As already claimed, we believe that art can play the role of a laboratory for developing compelling examples and new perspectives of comprehension of the world: a kind of research shortcut, following the path of intuition and creativity instead of that of strictly logical thinking. In particular, we believe that Interactive Art (IA) is an important actor in the development of a sustainable relationship with technology. IA can constitute the workbench for a free experimentation with conceiving, employing, analyzing, interpreting, and critiquing technology in a complementary and synergic way with respect to IxD and HCI.
Given the relevance of AR and IA, it is then useful to pose a methodological question: could design theory act as a point of reference for an artist who works with digital technologies? We believe that the interconnection between art and design can be reciprocal, i.e., a designerly approach can be methodologically fruitful if adopted in IA (and art in general), and facilitate a dialogue between the two fields.

A Design-Oriented Artistic Research Paradigm
As a first step, we have to reconsider the value and, in some cases, the necessity of a reductionist strategy. Some sort of simplification has to be taken into account in order to face questions and problems that are complex and multifaceted. We take as a point of reference the methodological approach of basic design as developed in post-Bauhaus design schools (Ulm, Yale, Chicago). The main principle of basic design is to identify fundamental categories of problems that can be investigated by proposing design exercises with well-defined objectives and constraints around specific themes [23]. Basic design methods consist of analyzing actions, extracting what Findeli [24] calls interaction gestalts, i.e., elementary interactions and general primitives, and designing exercises around each of them (see also [25], [26]).
Our first claim is that a basic design approach can be a fruitful strategy for art as well. Fundamental problems can be dealt with by numerous alternative solutions, which in turn can be compared to reveal and to combine multiple critical points of view. Second, the principle of cyclic iteration of i) proposed solutions, ii) evaluation and iii) redefinition of the solutions according to the evaluation results can provide a strong reference point to establish an artistic practice structured and biased towards a systematic investigation of a problem. Third, the fundamental design principle of going through rapid sketching and/or realization of mock-ups offers a powerful operating paradigm to face the unavoidable rapidity of technological evolution. Since technology is ever changing, it is problematic to base on it an interactive system that is durable and repeatable. However, an interactive artwork can be considered independent from any particular technology if sketches and mock-ups can play the role of the 'score' of the artwork. Fourth, a design approach involves teamwork as an inherent practice: the artwork becomes the product of a team used to sharing ideas, plans and goals. Working in a team on an artistic project in a design fashion can be likened to the activity of a scientific research group, contrary to the model of an artist as an isolated creator. Nowadays, interactive artists are increasingly trained to work with other people. This means a shared capacity among interactive artists to work synergistically with people having disciplinary skills, methodologies and goals different from theirs.
The framework of an artistic practice inspired by a design methodology is shown in the diagram in Figure 1. In the artistic case, the starting idea is not the outcome of brainstorming around consumer/stakeholder needs and/or requirements, but a free artistic investigation. In the same way, the validation/evaluation phase by means of user-tests is substituted by the rehearsals and discussions with the performers or by the observation of the reactions of the visitors to a system exhibited in public spaces (a sort of qualitative user test), without any quantitative evaluation as usually required in product design.

Gesture Sonification from an Artistic Research Point of View
In this section, we discuss the application of the previously introduced methodological framework to an artistic research project on gesture sonification. In general, we talk about gesture rather than body movement, since we are concerned with the expressiveness and the meaning that a gesture conveys. As already mentioned, we embrace a definition of gesture as a body movement able to consciously or unconsciously express or communicate something. Gesture is a concept that one can extend also to the sonic world and to the musical domain: musical events considered in their parametric dimensions (pitch, duration, dynamics and timbre) can be described and characterized according to the temporal directionality of those parameters. One of the first artworks on human gesture controlling digitally generated sounds in a continuous and expressive way was A Very Nervous System, the pioneering interactive artwork by David Rokeby (1986-1990).

The Design-Oriented AR Methodological Framework Applied to Gesture Sonification
We look at design in two different ways, as argued in this and in the next subsection. The first reference to design regards the application of the methodology (Figure 1) to the investigation of interactive systems used for creating embodied and enactively coherent relationships between gesture and ICT-generated sound. As depicted in Figure 2, we tackled the problem by defining four scenarios ranging from an artistic and professional context to an everyday one that can be experienced by anyone (see the alternatives in Figure 2). The four rectangles in Figure 2 correspond to the four iteratively cycling steps of Figure 1.
Each scenario produced multiple prototypes. The whole structure of the research is illustrated and documented on a website (http://www.visualsonic.eu/production.html, consulted 20.02.2021). The first scenario was developed into two subscenarios, one related to contemporary dance and the other to contemporary circus, denoted as Cirque Nouveau and based almost solely on human skills and cross-fertilized by other performing arts [27]. The cyclic iterations leading to the refinement of the artistic prototypes were mainly carried out through discussions among the authors and the performers during the development phase, as well as by considering the response of the audience, the critique by colleagues after the public performances and the analysis of the audiovisual documentation.
The second scenario gave birth to two versions of the same prototype, and the refinements were due to a) discussions between the authors and the collaborating professional dancer or occasional non-professional users, and b) critique by colleagues after public performances at international conferences. The prototypes of the third scenario were in the form of a public artwork, and they were the most complex, since they also involved visual aspects. The first version was iteratively refined according to the users' comments during many exposures of the system to small audiences in a portable form, using a laptop and a graphic tablet as interface. Finally, a second version was presented in the form of a public artwork with a large video projection in an open space, allowing people to move freely and expressively while performing their task.
The two prototypes of the fourth scenario, with the second being an evolution of the first, were based on the discussions between the authors, the comments of the users during the exhibitions and the analysis of the audiovisual documentation after the exhibitions. The collection and reorganization of video documentation provided a fundamental tool for qualitative data analysis. Even at a rudimentary level, the video documentation was the basis for building up a corpus of compelling examples and evidence. Each prototype will be discussed in detail in Section 4.

Use of Elementary Gestalts: Klee's and Bauhaus Legacy
The second reference to design is given by Paul Klee's work as an educator during the period he was part of the Staatliches Bauhaus project in the Weimar Republic of Germany. A common idea of Bauhaus was the existence of a universal, non-figurative visual language, often compared to the universal language of music [28]. Being mainly a painter, Klee represented a paradigm of collaboration between design and artistic practices. In 1927, Klee wrote the Pedagogical Sketchbook that was adopted at the Bauhaus as a fundamental reference for the module Design Theory [29]. In the Sketchbook, Klee traced a didactic route and, in parallel to this, he introduced the general tenets of his own artistic investigation. For instance, Klee illustrates the conversion of a dot from a "static element" into "linear dynamics". Sibyl Moholy-Nagy, a historian of art and architecture, wrote in her preface to the Sketchbook how the line, being a progression of dots, "walks, circumscribes, creates passive-blank and active filled planes" (see Figures 3, 4 and 5).
In the same way as Klee started from a dot which, by moving, creates lines and planes, we started thinking of gestures as generated by sequences of punctual positions, which formed configurations of increasing complexity. Initially, the approach was essentially abstract. The aim was to create an interface that produced sounds controlled by means of gesture analysis and recognition, and where gestures were disaggregated into elementary trajectory segments. Elementary sounds were coherently defined and employed for the sonification of different elementary gesture gestalts. We were interested in the emergence of basic gestalts, i.e., elementary perceptual/expressive units, composed of gestures and sounds.
Acknowledging that any connection between gesture and technologically mediated sound is arbitrary, the aim was to explore ways to build new cognitively effective and meaningful relationships between them by means of appropriate mappings. Our initial idea was to determine a few elementary gesture trajectories and map each of them onto a well-defined set of sounds.
From a practical point of view, the first mobility agent we used, corresponding to Klee's dot, was a hand-held light. Indeed, the movements of a light (hand-held by the performer and optically sampled by a camera) were the dot source forming lines and curves in a 2D space and controlling the sound generation. The beginning, thus, was a punctual position in space, e.g., a hand able to generate gestures that were kinetically analyzed in terms of trajectory elements, in order to produce convenient sound responses.
EAI Endorsed Transactions on Creative Technologies 03 2021 - 05 2021 | Volume 8 | Issue 27 | e1
As a first step, we segmented gestures taking into account a limited number of components, according to linear and curvilinear distinctions as in Klee's examples of Figures 3, 4, and 5. For the gesture sonification, we used a series of sounds matching the segment categories. The selection of the different sounds, in fact, was controlled by means of these basic geometric principles. Such a setup was employed in two preliminary artworks for a dancer and a computer presented in Trieste, Italy, at Sala Tripcovich in 2008 and in Graz, Austria, at Theater im Palais in 2009.

Everyday and Choreographic Gesture Sonification. The EGGS System
At a certain point in the development of the new system for gesture sonification, we decided to call it Elementary Gestalts for Gesture Sonification (EGGS) [30]. EGGS provides a system for gesture sonification based on configurable and redefinable elementary mappings between gesture and sound that can be combined and articulated with a high degree of freedom. The system is versatile and has proven useful in performing arts, interactive dance and public interactive installations. Unlike other gesture-sound mapping strategies [31], [32], [33], we have chosen an elementary kinematic gesture analysis, which is abstract and suitable for dealing with general categories. In this sense, our approach also differs from the affective analysis of gesture adopted in other works on gesture sonification [34], [35], [36]. Furthermore, our work does not concern gestures in musical interactions [37], [38], since sound outputs are considered as a sonic representation of nonmusical gestures.
As explained before, we initially regarded gestures as kinematic trajectories generated by moving points, with the point being a marked hand, elbow or knee, and we established some segmentation rules based on simple geometric criteria. The EGGS system, in fact, processes the visual data of a moving punctual source with a trajectory tracking routine, which returns different indexes matched to different trajectory categories. We expanded the system from the two initial basic categories, straight and circular, to five categories: i) circular clockwise, ii) circular counter-clockwise, iii) straight, iv) direction inversion and v) still. Additional motion parameters were taken into account for the purpose of making the sound feedback more perceptually coherent with gesture evolution. These parameters are vector and scalar velocity, vector and scalar acceleration, and absolute position. Such attributes were necessary to go beyond an abstract geometric approach and gain expressivity and dynamic coherence between gestures and sonic feedback. The very first simple (but effective) ideas were to map velocity and acceleration onto sound level, and the absolute position onto the pitch (high position meaning high pitch and, vice versa, low position meaning low pitch). Sometimes, the absolute position was mapped onto some audio effects by varying the timbre, mainly along the vertical axis. Furthermore, vector quantities allowed us to discriminate among different inclinations with respect to the vertical axis. In some of the latest versions of the system, we also considered angular velocity and acceleration, spanning all of the Dynamic Interaction Primitives (DIP) introduced in Virtual Reality during the last decade [39].
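As an illustration of the first mappings described above, the following minimal Python sketch derives velocity and acceleration from three consecutive tracked positions by finite differences, then maps scalar velocity onto sound level and vertical position onto pitch. It is not the EGGS implementation: the function names, the normalized coordinates (y in 0..1, with 1 at the top), the `max_speed` scaling and the 110-880 Hz pitch range are all assumptions chosen for the example.

```python
import math

def motion_parameters(p0, p1, p2, dt):
    """Finite-difference motion parameters from three consecutive 2D samples."""
    v_prev = ((p1[0] - p0[0]) / dt, (p1[1] - p0[1]) / dt)
    v_curr = ((p2[0] - p1[0]) / dt, (p2[1] - p1[1]) / dt)
    accel = ((v_curr[0] - v_prev[0]) / dt, (v_curr[1] - v_prev[1]) / dt)
    return {
        "velocity": v_curr,                    # vector velocity
        "speed": math.hypot(*v_curr),          # scalar velocity
        "acceleration": accel,                 # vector acceleration
        "accel_magnitude": math.hypot(*accel), # scalar acceleration
        "position": p2,                        # absolute position
    }

def clamp01(x):
    return max(0.0, min(1.0, x))

def map_to_sound(params, max_speed=2.0, low_hz=110.0, high_hz=880.0):
    """Hypothetical mapping: speed -> level (0..1), height -> pitch in Hz."""
    level = clamp01(params["speed"] / max_speed)
    height = clamp01(params["position"][1])  # assumed normalized, 1 = top
    # Exponential interpolation: equal height steps give equal musical intervals.
    pitch = low_hz * (high_hz / low_hz) ** height
    return level, pitch
```

With these assumptions, a motionless marker yields a level of zero, which is consistent with still corresponding to silence; a real system would also smooth the tracked positions before differentiating them.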
A challenge we soon faced during the early experiments with professional performers was that EGGS entails learnability issues (on apprenticeship in New Interfaces for Musical Expression, NIME, research, see [40]; NIME website: https://www.nime.org/, consulted 20.02.2021). However, after many iterative refinements, the system was developed into a more immediate version, easily usable by anybody, since it could be set up so as to output meaningful sonifications of any arbitrary and ordinary gesture. User-friendliness was fundamental in public installations that foresaw the involvement of an audience. Even visitors with no specific training could have a stimulating, enjoyable and satisfying experience during the sonic-embodied interaction with the public installations. We avoided any complex and non-immediate mappings, such as those obtainable by storing and recalling past gestures, i.e., by introducing memory in the system [41].
Given these founding criteria, the investigation on sonification as well as the system development followed the design principle of realizing multiple alternative versions of the same idea through different artworks (the prototypes). Also, as in design practice, where new prototypes of a certain version depend cyclically on a critical analysis of the previous results, we produced successive artworks within the same scenario. The two strategies allow us, respectively, to broaden and to refine our knowledge about the main theme: the effectiveness of sound as representation and continuous feedback of expressive gestures in a multimodal sense. Here, multimodality has to be considered both from the audience's point of view (watching the dancer while listening to the sound produced by the gestures) and from the performer/user's point of view (listening to the sound in a proprioceptive way).

Case Studies
In this section, we analyze in detail the four scenarios and the related cases corresponding to the prototypes of Figure 2. In the following, we designate the exercises as cases.

Scenario 1: Professional Performance
In an artistic context, when working with professional performers, we first introduce them to the system by saying that sound is treated as a consequence and a representation of the choreographic gesture and its expressive content. We could conceive of the EGGS system as a 'choreophone', in that the performers do not follow a musical piece, do not conduct a musical piece, and do not even create any music. Rather, they 'listen' to their gestures, enactively managing and adjusting their choreographic actions according to the sonic responses of the system. Thus, sound is an auditory after-effect and an embodied continuous feedback, able to augment the proprioception of the performer, and in no way separable from the gesture itself.
In the form of an exercise in a basic design fashion, the work with professional performers can be formulated as follows:

Case 1
Theme: Continuous sonic feedback for two independent arms moving in a kinesthetic sphere.

Objectives:
• Overcome the separation of music and dance (sound and movement) in professional dance.
• Develop the idea of 'listening to gestures'.

Constraints:
The sonic feedback should be continuous, ecological sound-based [42], intuitive and trajectory-dependent in order to provide a variety of gesture-sound gestalts.
From this case, we created an interactive performance for solo dancer entitled Swish 'n' Break (SnB) (see the first video at http://visualsonic.eu/performance.html, consulted 20.02.2021). In SnB, the performer held one light per hand. Each light worked as a marker and defined the gesture. A gesture was thus processed by a light-tracking routine able to discriminate between the different categories of trajectories previously mentioned: straight, inversion, circular clockwise (CW) and circular counter-clockwise (CCW), as well as different inclinations with respect to the horizontal plane, and different curvature angles. In all of the artworks, still corresponded to silence. Technically, the distinction between different trajectories (gesture gestalts) was achieved by evaluating the angle variations of the segments connecting three consecutive pairs of detected points, which corresponded to the centripetal acceleration (see Figures 6 and 7). A variation close to zero mapped to a linear trajectory, while a bigger variation mapped to one of the curvilinear categories according to the sign of the variation. The coupling of the gesture dynamics with the sound dynamics and other sound parameters added a further and fundamental expressive layer. In Tables 1 and 2, we show some examples of mapping strategies. According to performer feedback after a number of rehearsals, the outcomes were deemed immediately expressive as well as potentially elaborate, and the system offered and elicited unexpected solutions in terms of sound-gesture relationships.
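The angle-variation criterion can be sketched as follows. This is a simplified illustration, not the tracking routine used in SnB: it looks at only two consecutive segments (three points), and the thresholds, the function name and the convention that the y axis points upwards (so that a positive turning angle means counter-clockwise) are assumptions made for the example.

```python
import math

def classify_segment(p0, p1, p2, still_eps=1e-3, straight_eps=0.1, inversion_eps=0.5):
    """Classify the latest movement from three consecutive 2D points.

    Returns one of: "still", "straight", "inversion",
    "circular_cw", "circular_ccw". Thresholds are illustrative.
    """
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]  # previous segment
    bx, by = p2[0] - p1[0], p2[1] - p1[1]  # latest segment
    if math.hypot(bx, by) < still_eps:
        return "still"
    # Signed angle between the two segments, in (-pi, pi].
    turn = math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    if abs(turn) < straight_eps:
        return "straight"
    if abs(turn) > math.pi - inversion_eps:
        return "inversion"
    # With y pointing up, a positive turn is counter-clockwise.
    return "circular_ccw" if turn > 0 else "circular_cw"
```

For camera images, where the y axis usually points downwards, the two circular labels would be swapped. A practical routine would also average the angle variation over several segments to reduce the effect of tracking noise.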

SnB was presented for the first time at the SMC 2010 conference [43]. The performance was conceived as a controlled improvisation on a predefined score of sounds and gestures. All of the sounds were drawn from the Freesound project (www.freesound.org, consulted 20.02.2021). A set of keywords was established in advance, in order to retrieve and define a number of sound families. In more detail, following a programmatic compositional approach, the keywords determined the three sections: 1) Swish, 2) Air - Water - Fire - Earth [the basic elements of nature] and 3) Break. The central section 2), the most elaborate from a sonic point of view, was developed in terms of a gradual change from an outdoor natural soundscape to an artificial indoor one. In each section, we opted for different sound-trajectory mappings (see Tables 1 and 2, illustrating the mapping of the first and the second sections, respectively). Moreover, within fixed constraints, the laptop performers could modulate the sound timbre as well as other parameters of the system responsiveness, thus creating a dialogue with the dancer.
The final output was a mostly predefined sonic-choreographic score, agreed upon by the three authors after many rehearsals. During the rehearsals, the dancer explored the potentialities of the system's sonic feedback and selected, adjusted and refined the gesture accordingly. Reciprocally, the dancer learned how to manage and exploit the sound materials through the gesture. Every decision about the development and refinement of both the performance and the system was agreed upon by all three through discussions and experimentation during the rehearsals. Typically, different selections, combinations and concatenations of gesture-sound mappings were tested, aiming at an optimization of the final result, in accordance with the creative methodology previously discussed and illustrated in Figure 2.
According to basic design practices, the same theme with a different set of constraints may make perfect sense as a different case. This happens in Case 2, where we changed the constraints concerning the trajectory dependence.

Case 2
Constraints: With respect to Case 1, we waived the distinction between straight, CW and CCW trajectories (while maintaining the discrimination among the inclinations) in order to explore a less geometrically constrained gesture and a simplified gesture-sound mapping.
Case 2 resulted in a performance entitled un-pLugged pLoden, a Ligament Lento. The performance was in the same style as SnB and was presented at the International Computer Music Conference ICMC-2012 in Ljubljana.
The sounds utilized in un-pLugged pLoden were retrieved from the above-mentioned Freesound project using the keywords slowdown, decreasing and braking. As in SnB, the keywords defined the global structure of the artwork. Its form, in fact, was characterized by a constant and gradual deceleration, beginning with explosive sounds produced by large decelerating engines, followed by decelerating train rumbles, then car braking sounds, and finally faint bike brake whistles.
The next two cases were the outcome of experimentation in a contemporary circus performing context.

Case 3
Theme: Discrete sonic feedback for four independent limb movements in a free space.

Objectives:
• Overcome the separation of sound and gesture in acrobatic exercises in a professional contemporary circus context.
• Develop the idea of 'listening to gestures'.

Constraints:
The sonic feedback should be discrete, speech-based, intuitive and (occasionally) limb-dependent.
We started to work with professional contemporary circus artists for the first time in 2013. The research was funded by a grant of the Lerici Foundation of Stockholm, within the Gynoïdes Project and in the context of a collaboration between the KTH Royal Institute of Technology and the DOCH University of Dance and Circus in Stockholm [44]. After the first test with circus artists of different disciplines, we decided to move to Wireless Inertial Measurement Units (WIMU) for motion tracking purposes, relinquishing the optical system previously employed. In fact, it was clear that light bulbs used as optical markers, as in the dance performances (Cases 1 and 2), were not convenient in the circus environment. Glass fragility is a severe limitation. In circus practice, movements can be very broad and powerful, and include most body parts touching the ground or tools. Bulbs break easily, becoming useless as optical markers and potential health hazards. Another good reason for using a WIMU-based motion tracking system instead of a single camera-based optical one was the complete freedom of movement around the stage for the circus artist, and the independence from lighting conditions. Also, the experimentation with WIMUs led to the introduction of a new category of elementary gesture in the EGGS repertoire, rotation around one axis, a movement whose velocity value was given directly by the gyroscope of the WIMU.
The first WIMU-based circus performance was called CyborgAcrobat. In this performance, the artist, a free body acrobat, performed stylized and mechanical gestures sonified by recorded excerpts of her own voice reading an ironic text. Four WIMUs were placed on the performer, one on each limb (see Figure 8), and an audio fragment (elementary sonic gestalt) of a pre-recorded text was triggered each time the rotation velocity value on one axis of one of the WIMUs exceeded a certain threshold. The text was composed and recorded by the performer herself, and then segmented into single words or longer phrases, the constraint being that every single audio text unit had to be understandable. During the performance, these elementary audio fragments could be selected in three different modalities: normal order, random order, and one that considered only four predefined words triggered by four predefined gestures, one for each limb. During the performance, the three modalities were changed several times by the laptop performer, following a choreographic score. Tracked gestures were forearm extensions, and leg flexions and rotations. Compared to Cases 1 and 2, where the 2D projections of straight and curvilinear trajectories in the camera visual field were considered, gestures in this case were invariant in the 3D space, because their rotation axes were integral to the WIMUs.
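The triggering logic described above can be sketched as a small Python class. This is an illustrative reconstruction under stated assumptions, not the code used in CyborgAcrobat: the class and parameter names are hypothetical, fragments are represented by plain strings rather than audio files, and firing only on the upward crossing of the threshold (so that a sustained rotation does not retrigger) is our assumption.

```python
import random

class FragmentTrigger:
    """Threshold-based triggering of speech fragments (hypothetical sketch)."""

    MODES = ("normal", "random", "limb")

    def __init__(self, fragments, limb_words, threshold, rng=None):
        self.fragments = list(fragments)   # text units in their original order
        self.limb_words = dict(limb_words) # one predefined word per limb
        self.threshold = threshold         # rotation velocity threshold
        self.rng = rng or random.Random()
        self.mode = "normal"               # changed by the laptop performer
        self._next = 0
        self._above = {}                   # per-limb state for edge detection

    def update(self, limb, rotation_speed):
        """Return the fragment to play for this gyroscope reading, or None."""
        if self.mode not in self.MODES:
            raise ValueError(f"unknown mode: {self.mode}")
        was_above = self._above.get(limb, False)
        is_above = abs(rotation_speed) > self.threshold
        self._above[limb] = is_above
        if not is_above or was_above:
            return None                    # no upward crossing, no trigger
        if self.mode == "normal":
            fragment = self.fragments[self._next % len(self.fragments)]
            self._next += 1
            return fragment
        if self.mode == "random":
            return self.rng.choice(self.fragments)
        return self.limb_words[limb]       # "limb" mode
```

In use, each incoming gyroscope value would be passed to `update` together with the name of the limb carrying the sensor, and any returned fragment would be handed to the audio player.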
After the performance, we collected the impressions of the audience and had discussions with the performer in order to evaluate the outcome. Our conclusion was that using a discrete sonic feedback was a good solution specifically for audio text. According to the performer, this kind of gesture-controlled triggering of speech fragments was effective, simple, clear and successful in generating sonically complex and interesting solutions. Several comments from the audience confirmed that this kind of sonification was effective, because the gesture-sound relationships were understandable and enjoyable. Additionally, the speech fragments enhanced the expressivity by adding a semantic layer to the multimodal perception and to the cognitive aspects involved in the performance.

Case 4
Theme: Continuous sonic feedback for tool-motion sonifications.

Objectives:
• Overcome the separation of sound and gesture in acrobatic exercises, in a professional contemporary circus context.
• Enhance the acrobat's proprioception and responsive listening.

Constraints: The sonic feedback should be continuous, ecological sound-based, intuitive.
Although the Case 3 outcome was positive, after initial experimentation with Case 4, the following basic design exercise, it soon became clear that a discrete, speech-based sonic feedback would not be the optimal solution for every circus discipline. Case 4, titled Sonified Wheel 14 , was different in various aspects: the artist, wearing no sensors, used a tool, a Cyr Wheel. The Cyr Wheel is a large metal wheel, in which performers can stand forming a single unit with the wheel and roll around the stage [44]. Two WIMUs were fixed on the internal side of the Cyr Wheel (see Figure 9). The sonic feedback was chosen to be continuous and based on everyday sounds. In correspondence with the wheel shape and motion, we decided to use a continuous and cyclic sonic feedback, such as that of sound loops. The loops were triggered by the lateral and longitudinal orientation of the wheel, and the playback velocity was modified by the rotation velocity at the triggering times. During the performance, different types of sounds were used by the laptop performer following a predetermined score. In a range that went from human to mechanical sounds, the sound sequence was: human breath, wind, old scanner, techno loop and industrial loop. All the sounds were taken from the already mentioned Freesound project. 14 See the fourth video at http://visualsonic.eu/performance.html (consulted 20.02.2021).
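As an illustration only, the relation between wheel orientation, loop triggering and playback velocity might be sketched in Python as follows. The trigger angle, the reference speed, the clamping range and all names are hypothetical; the text does not specify the mapping actually used in Sonified Wheel.

```python
from dataclasses import dataclass, field
from typing import Optional


def playback_rate(angular_speed: float, reference_speed: float = 180.0,
                  min_rate: float = 0.25, max_rate: float = 4.0) -> float:
    """Map rotation speed (deg/s) to a loop playback rate (1.0 = original speed).

    Assumption: a linear mapping, clamped to a range that keeps loops audible.
    """
    if reference_speed <= 0:
        raise ValueError("reference_speed must be positive")
    rate = abs(angular_speed) / reference_speed
    return max(min_rate, min(max_rate, rate))


@dataclass
class LoopTrigger:
    """Starts a loop when an orientation angle crosses a chosen trigger angle."""

    trigger_angle: float  # degrees, hypothetical value chosen by the designer
    _previous: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, angle: float, angular_speed: float) -> Optional[float]:
        """Return the playback rate if the loop is triggered now, else None."""
        previous = self._previous
        self._previous = angle
        if previous is None:
            return None
        # A crossing happens when consecutive samples lie on opposite sides.
        crossed = (previous < self.trigger_angle) != (angle < self.trigger_angle)
        # The rate is sampled once, at the triggering time, as in the text.
        return playback_rate(angular_speed) if crossed else None
```

One such trigger per orientation axis (lateral and longitudinal) would be needed to follow the description above.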
Differently from the other cases treated in this article, sounds were controlled by the motion of a tool, and not directly by body movements. Indeed, this was an object motion sonification and not a gesture sonification. However, every movement of the Cyr Wheel depended on the artist's gestures, when held, and on the Cyr Wheel's inertia, when released, so that this can be regarded as a gesture-mediated sonification.
Based on the public performances, discussions with the performer and several comments from the audience, we can conclude that the sonification adopted in Case 4 was expressively effective, but the gesture-sound relationships were less clear to the audience than in the other cases. This was probably due to the complexity of some of the sonic loops, which made the sonic feedback in some parts less adherent to the motion. For the same reason, very careful listening is required of the performer in order to control the sonic feedback responsively. Hence, Case 4 can be considered a good exercise for sound exploration and a success in enhancing the performer's enactive listening.

Scenario 2: Professional and Non Professional Disco-Club Dance
The second scenario considered a different kind of performance/dance, which could also involve nonprofessional dancers, for example, people having fun in clubs.

Case 5
Theme: Continuous and discrete sonic feedback for two independent arms moving in a kinesthetic sphere.

Objectives:
• Overcome the separation of music and dance (sound and movement) for trained club dancers.
• Introduce a flavor of electroacoustic music in disco club culture.

Constraints:
The feedback should be discrete, musical, intuitive, and multimodal (sonic and visual).
From this case, we created a performance, entitled Body Jockey 15 that was presented in a concert at the international conference NIME 2011 in Oslo [45]. In Body Jockey, the idea was to introduce embodiment in Electronic Dance Music (EDM). The hardware setup was a readaptation of that of SnB. The software for the sound control was completely rewritten. The Freesound project was once more the source of the major part of the sounds. During the performance, the dancer triggered sounds through movements (see Figure 10), while the two electronic performers acted as a DJ and a VJ, changing sound and video mappings, respectively. In Oslo, the visual part consisted of a graphical representation of the dancer's gestures projected on a big screen at the back of the stage. The aim was to obtain a multimodal experience both for the dancer and the audience. As in the previous cases, the purpose was also to generate a tight relationship between the laptop performers and the dancer's actions.
The overall structure of the performance was fixed. On the other hand, within each of its sections, the musical and visual discourse was developed by means of controlled improvisation. Sounds, graphics, mappings and beat changed in every section. The first section was based on a regular beat and percussive sounds. Trajectory inversion was the elementary gesture gestalt employed to trigger sounds. The main rhythmic patterns were based on the usual EDM even meters. The volume of each sound was constant, while the dynamic changes were obtained by varying the sound event density. The central section was more free-style and based on sustained sounds that were not beat constrained, somewhat in a SnB fashion. In fact, the elementary gesture gestalts illustrated in Case 1, such as the straight and circular trajectories, were employed to modulate the sound parameters. The final section was a reprise of the first one in an EDM style.
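The notion of trajectory inversion as a trigger can be made concrete with a small Python sketch. It assumes that the tracked gesture arrives as a sequence of 2D points and treats an inversion as a sharp reversal of direction between consecutive displacements; the function name and the two thresholds are illustrative assumptions, not the detection method of Body Jockey.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def inversion_indices(points: Sequence[Point], min_step: float = 1e-6,
                      cos_limit: float = -0.5) -> List[int]:
    """Return indices of points where the trajectory reverses direction.

    An inversion is reported at point i when the cosine of the angle between
    the incoming and outgoing displacement falls below cos_limit (assumed
    value: -0.5, i.e. a turn sharper than 120 degrees).
    """
    hits: List[int] = []
    for i in range(1, len(points) - 1):
        ax, ay = points[i][0] - points[i - 1][0], points[i][1] - points[i - 1][1]
        bx, by = points[i + 1][0] - points[i][0], points[i + 1][1] - points[i][1]
        na, nb = math.hypot(ax, ay), math.hypot(bx, by)
        if na < min_step or nb < min_step:
            continue  # ignore near-stationary samples, which have no direction
        if (ax * bx + ay * by) / (na * nb) < cos_limit:
            hits.append(i)
    return hits
```

Each reported index would correspond to one triggered sound event, so denser inversions yield a denser sound texture, consistent with dynamics being shaped by event density rather than volume.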
The ultimate goal was to produce an enhanced disco-club event, where body, music and video provided a multimodal yet indivisible experience. By means of the gesture sonification and visualization, the dancer seemed to embody the music and, according to the comments gathered after the performance, this was clearly perceived also by the audience. 15 See the first video at http://visualsonic.eu/discoclub.html (consulted 20.02.2021).

Case 6
Theme: Discrete sonic feedback for two independent arm movements in a kinesthetic sphere.

Objectives:
• Overcome the separation of music and dance (sound and movement) in contemporary untrained club dancing.

Constraints:
The feedback should be discrete, musical sound-based, intuitive.
Case 6 was mainly the product of the critique of various colleagues after the performance at NIME 2011. The resulting performance was Body Jockey-2, a completely re-composed version of that presented in Oslo. The video feedback was dropped, since it was judged redundant and a distraction from the musical contribution of the choreographic action. Also, the hybrid music style, mixing EDM and Electroacoustic music, that characterized the first version of Body Jockey was critiqued as unsuitable for untrained club dancing. Thus, Body Jockey-2 was conceived in a pure EDM style and it was presented at the international conference ICMC-SMC 2014 in Athens 16 . We also tested the system with non-professional users, who unanimously gave very positive feedback. Even if the tests were not performed in a real club, everybody judged the experience as exciting and involving. 16 See the second video at http://visualsonic.eu/discoclub.html (consulted 20.02.2021).

Scenario 3: Public, Pseudo-Artistic and Multimodal

Case 7
Theme: Continuous sonic and visual feedback for hand-drawing movements on a surface.

Objectives:
• Fuse proprioception, sonic and visual enactive feedback into a unique self-representation.
• Develop the idea of 'listening to gestures'.

Constraints:
The sonic and visual feedback should be continuous, intuitive and trajectory dependent; the visual feedback should also be abstract.
The outcome of this scenario was Visual Sonic Enaction (VSE) 17 , an interactive and multimodal installation aimed at creating audiovisual representations of the gestural expressivity of interacting visitors. VSE was initially presented as a system for painting on a computer screen guided by sounds. In fact, sound stimulated and drove the movements of the visitor's hand using a wireless pencil on an external tablet interface, thus producing an embodied, multimodal and continuous feedback to the hand gesture. Indeed, sound worked as the pivot component of the visual, auditory and proprioceptive elements of VSE.
Three groups of basic sounds and three families of elementary graphic icons were selected and used for the sonification and visualization of two fundamental categories of gesture gestalts: circular and straight. For each visual-sonic drawing, the user could employ only one sonic group. On the other hand, within a single drawing, the user could change graphic families at any moment and any number of times. The three graphic families were schematically depicted in the bottom-right corner of the VSE canvas shown in Figure 11, while the icons in the top-right corner represented the three sound groups: i) metallic and tinkling samples, ii) low pitch Frequency Modulation (FM) synthesis-generated sounds, and iii) crystal tinning sounds synthesized by means of the Sound Design Toolkit (SDT) physics-based sound models [46].
The three groups of sounds were experimented with individually in three separate visual-sonic drawings. Only when the user saved an existing drawing and started the next one from scratch was it possible to proceed to the next sound group. In VSE, we implemented various types of mappings, more or less modulated in timbre and other parameters in correspondence to the gesture classification. Some mappings were based on a distinction between straight and circular trajectories, while others depended continuously on the trajectory curvature angle. Similar mappings were applied to the graphic parameters. This resulted in a significant differentiation among the nine possible combinations of sonic and graphic families. When VSE was presented, users were told that the aim was not to paint. Rather, what appeared on the computer screen was a visualization of their hand gesture expressivity. In this way, the visual feedback enactively guided the visitors' gestures differently according to the different graphic families. Employing distinct graphic families was a crucial strategy for allowing people to distinguish between achieving self-awareness of a gesture and the mere act of painting. In addition, it uncoupled a graphic family from a particular sonic group. The same gesture, in fact, could produce completely different yet coherent graphic feedback, given a certain sonic feedback. Indeed, according to Michel Chion's definition of audio-vision [47], any association of audio and moving images creates a complex and independent object belonging to a third multimodal dimension, i.e., a kind of perceptual and cognitive 'vector product'. Moreover, in VSE, the concurrent production of sounds and graphics as a consequence of a single gesture analysis gave an impression of coherence in the arbitrary juxtaposition of basic sonic and graphic families unified by the same body action.
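The two kinds of mapping mentioned above, one based on a straight/circular classification and one depending continuously on the curvature angle, can be sketched in Python as follows. The function names, the five-degree classification limit and the normalization range are assumptions for illustration and do not reproduce the VSE algorithms.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def turning_angle(p0: Point, p1: Point, p2: Point) -> float:
    """Unsigned angle (radians, 0..pi) between segments p0->p1 and p1->p2.

    Zero-length segments yield 0.0, i.e. they are treated as straight.
    """
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    return abs(math.atan2(cross, dot))


def classify(angle: float, straight_limit: float = math.radians(5.0)) -> str:
    """Discrete mapping: label a local trajectory as straight or circular."""
    return "straight" if angle < straight_limit else "circular"


def to_parameter(angle: float, max_angle: float = math.pi / 2) -> float:
    """Continuous mapping: normalize the angle to a 0..1 control value."""
    return min(angle / max_angle, 1.0)
```

The discrete label could select a sound or graphic variant, while the continuous value could modulate a timbre or graphic parameter; in a real system the angle would typically be smoothed over several samples before use.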
VSE was proposed to different users in a studio environment and it was refined by means of many cyclic iterations of observations of the users' behavior, discussions with the users after the experience as well as discussions among the authors.

Case 8
Theme: Continuous sonic and visual feedback for one arm movements in a kinesthetic sphere.

Objectives:
• Fuse proprioception, sonic and visual enactive feedback into a unique self-representation.
• Develop the idea of 'listening to gestures'.

Constraints:
The sonic and visual feedback should be continuous, intuitive and trajectory dependent; the visual feedback should also be abstract.
Case 8 represents an extension of Case 7. The system was installed in a public location, where a video art exhibition was taking place. It was introduced to the audience by means of a graffiti painting metaphor: we let the visitors freely 'paint' on a wide wall using an electric torch, which simulated a spray can. The same graphic and sound materials and algorithms were adopted from Case 7. The torch light was detected by a camera as in Cases 1 and 2. The pictures on the wall were video projections from the computer. Moreover, the users could freely choose the different graphics by shaking three colored interactive bottles available on a small table beside the painting area. The bottles were augmented by means of wireless accelerometers. The interacting user wore wireless headphones, which provided a more intimate and immersive experience. At the end of the experience, the visitors could take away their abstract visual-sonic 'self-portrait' in the form of an audio-video file containing their gestural expressivity. Furthermore, bystanders could watch and listen to the bodily expressivity of the interactive visitor both during the event and afterwards, on the website, where the visual-sonic self-portraits were uploaded. Many, though not all, members of the audience were able to understand the aim of VSE very quickly and strove to listen to and watch their gestures as represented by the various sonic/graphic combinations. Some were able to reach a good coherence among all of the aspects involved (see, for instance, the outcomes by Serena 18 ). Figure 11 reports the graphical part of one of the visual-sonic self-portraits. The video documentation is fundamental for the analysis of the visitors' responses. On the website, we uploaded short excerpts of some of the users' performances.
For example, it is interesting to notice the case of Fabiola, a professional painter, who first hesitated and painted with small gestures, and then, when she discovered the sonification effect, her gestures became broader and more embodied.

Scenario 4: Everyday Life
We explored the main theme of this study also in an everyday context. The particular subject we investigated was the sonification of gait expressiveness. As Marcel Mauss [48] has already observed, gestures such as standing, sitting, or walking are important vehicles of communication. With this in mind, we formulated the following two case studies.

Case 9
Theme: Continuous sonic feedback for two independent legs walking or running along a straight path.

This scenario was first explored by means of a public interactive installation titled Sonic Walking (SW) 19 and presented at the European Night of Researchers in Trieste, Italy, in 2010. With respect to VSE, we shifted the attention from a creative movement to an everyday action and from the upper limbs to the lower ones. In fact, visitors walked along a linear path in an ordinary space. The objective was the gait sonification through ecological sounds. More precisely, the sounds were related to the basic elements of nature, i.e., air, fire, earth and water. In particular, we chose the sound of a strong wind, the sound of a roaring fire, and, for the earth, the sound of a rain stick containing grains. For the water, we chose two different sounds: gentle waves on the seaside and an underwater sound. The audience experimented with the five sounds in a pre-established sequence: water, earth, fire, air, underwater. When the system was introduced, the visitors were told that they would listen to their walking and that their walking would first patter in water, then crunch in the sand, crackle in the fire, blow like the wind and, finally, submerge deep underwater. The interacting visitors could move along an approximately nine-meter-long track, wearing two small lamps fixed to the exterior sides of their knees. In this way, each of the two cameras positioned on the two sides of the track could separately detect one of the lights. Furthermore, the interacting visitors used wireless headphones, which provided a more intimate and immersive experience of their body movements, represented by a continuous sonic feedback. The sounds were also amplified by four loudspeakers positioned at the far ends of the track, so that the bystanders could listen to the gait sonification of the interacting visitor.
Differently from the previous scenarios, in Sonic Walking there was no trajectory analysis, and the sonic feedback was driven only by the dynamic characteristics of the gait.
The emotional response of the visitors is well represented in the video documentation: one adult woman was almost dancing, while a child was almost scared by the fire sound. On the other hand, a second child was clearly engaged in the ear-driven action of swimming underwater. It is significant that we were not able to engage any adult man among the hundreds of visitors to the event: only women and children of both sexes. 19 See the first video at http://visualsonic.eu/everydaylife.html (consulted 20.02.2021).

Case 10
Theme: Continuous sonic feedback for two independent legs of two people moving in a free area.

Objectives:
• Fuse proprioception and sonic feedback into a unique gait self-representation.
• Enhance ecological sound listening awareness.
• Elicit social behaviors and non-verbal communication through sound and gesture expressiveness.
• Develop the idea of 'listening to everyday gestures'.

Constraints:
The feedback should be continuous, ecological sound-based, intuitive.
Case 10 was presented at the international FKL symposium on soundscape, in Florence in 2011. It represents a further development of SW after discussion between the authors. This time, the public installation involved two visitors at a time, engaging in sonic dialogues through walking. The socializing potential of the tool is well illustrated by the short video 20 that shows two users employing the system for the first time.

Discussion
In recent years, AR has become an important topic at different levels throughout academia. Different stakeholders in politics and academia are enhancing the role of creativity in research by means of funding strategies and policy objectives. In the last two decades, many universities of the arts have opened throughout Europe. Often, this was a direct outcome of political strategies that encouraged aggregations and collaboration among partners/faculties/stakeholders from different artistic fields, ranging from visual arts to music and extending in some cases to design. One of the main challenges was, and still is, to define research paradigms that involve practitioners, and to develop PhD programs endowed with proper assessment criteria, procedures and standards 21 .
One of the main points of reference for AR is the widely discussed concept of Practice-Based Research (PBR). PBR refers not only to the arts but also to design and other disciplines. In the creative arts, PBR has been present in academic contexts for over 35 years [49]. Many questions and issues emerged from scholarly debates, leading to different definitions for 'practice' used in research, e.g., PBR, Practice-led research, Practice as research (PaR), and creative arts PhD [50]. According to Candy and Edmonds [49], PBR concerns the production of creative artifacts that contribute to general knowledge, while Practice-led research "leads primarily to new understanding about practice".
The research methodology introduced in this paper can apply to research in artistic practice, where the main goal is the development of art for art's sake. However, as argued in the introduction of this paper, our model aims at artistic production as a means of investigating the world, in parallel with scientific research: art as a means of producing evidence and questioning facts. The aim of AR is not to establish facts or to test and confirm theories, as in scientific research. Rather, it is intended to 'generate' new 'facts' that can help to interpret, to change perspectives and to understand facts that are objects of systematic scientific investigation. We think that imagination is the common ground/asset that allows the expansion of human awareness and knowledge both in the sciences and the arts. In this sense, we argue that a cross-fertilization is possible if the different 'imaginaries'/imaginations are able to communicate and dialogue. A common language, or at least clear translation parameters between the different languages/disciplines and methodologies, needs to be agreed upon. This paper goes in this direction. Furthermore, our model, based on an IxD paradigm, provides a general framework for this common language. This methodological framework is not a particular product of a specific project (as in [49]) but a conceptual paradigm that paraphrases the epistemic structure of an interdisciplinary field such as design. Additionally, it represents a research paradigm that goes beyond the circumscribed scope of PhD program development and proposes to artists a conceptual framework for planning and reflecting on their artworks beyond academic contexts. As demonstrated in the ten cases discussed in this paper, this paradigm provided a fundamental guideline for the development of the research path that we followed in our artistic investigation of gesture sonification.
For this reason, we argue that the ten cases generated, in Candy and Edmonds' words, "new knowledge that can be shared and scrutinized" [49].
Another important point is the idea of artistic production as teamwork, in opposition to the paradigm of a single or main author. We believe that the role of artistic production can be redefined as a shared activity, involving common authorship, i.e., art as an outcome of multiple minds.
Finally, we admit that one of the limitations of this paper is that it lacks effective and structured assessment procedures. In the ten cases, the assessment was embryonic and not systematic. No structured collection and interpretation of output data according to some predefined experimental protocol was performed. Nonetheless, we argue that the discussions with peers during conferences and the public exposure of the cases were of extreme importance in stimulating debate and reflection within our team, and in directing each following step of the research. Furthermore, the artistic research frameworks discussed by Candy and Edmonds [49], though single project-related, presented detailed assessment procedures and results. The definition of assessment procedures could be a further development and refinement of the methodological framework introduced in this paper. A recent interesting example of evaluation procedures in the context of AR was proposed in ARCAA (Actors, Roles, Contexts, Activities, and Artefacts) [51], based on thematic analysis borrowed from the qualitative research methodologies adopted in psychology. However, we should keep in mind what Vanlee and Ysebaert said about "the importance of allowing an assessment culture to emerge from practitioners themselves, instead of imposing ill-suited methods borrowed from established scientific evaluation models" [52].

Conclusions
In this paper, we presented the results of several years of work on gesture sonification. Gesture sonification was defined as expressive body movement in space, interactively mediated and conditioned by ICT-generated sound. The subject was investigated from many points of view and in many contexts ranging from professional artistic performance to everyday scenarios in the form of public art installations. In some cases, we added a multimodal dimension, taking into consideration gesture visualization.
The resulting ten case studies were organized in an AR methodological framework that takes as its point of reference the theory and methodologies of IxD. We claim that the introduction of such a methodological framework was successful in terms of providing solid guidelines for our AR work and for the organization and presentation of its results. The AR framework is considered the main result of this paper.