Bridging Sensorial Realms in Multimedia Composition: Towards an Artistic Intelligence in Cross-Modal Interaction

Exposé by Francisco Uberto

Abstract
This research explores cross-modal methodologies linking video, sound, and artificial intelligence (AI), focusing on mapping and machine learning to create dynamic, immersive multimedia compositions. Building on my work at ICST’s EARS pre-PhD program at ARTILACS, I will investigate how AI-driven latent space navigation can enhance interactive artistic intelligence and redefine human-machine collaboration.
By incorporating biofeedback and real-time adaptive audiovisual systems, I aim to develop new models for artistic organicity in multimedia creation.
Research Context and Objectives
The hybridization of human creativity and AI demands critical artistic engagement: not merely adopting new tools, but reshaping and questioning them. Situated in a post-digital paradigm (Cascone, 2000), my research seeks to build new bridges between sensory modalities through real-time algorithmic interaction, examining whether meaning in multimedia composition can emerge autonomously from a dense network of mapped parameters.
Key research questions include:  
– How can live audio input alter AI latent space mapping to create self-organizing audiovisual compositions that transcend traditional synchrony?
– What are the cognitive and perceptual implications of such modifications in an immersive multimedia experience?  
– How can biofeedback and physiological data enhance multimodal coherence in artistic systems?
Methodology
This research follows an iterative process combining:  
– Theoretical investigation: Reviewing literature on cross-modal perception (Spence & Deroy, 2013), AI in the arts (Manovich, 2018), and post-digital aesthetics.
– Algorithmic development: Implementing machine learning models for feature extraction and latent space navigation, alongside real-time biofeedback mechanisms.
– Experimental multimedia composition: Developing original artistic works where video, sound, and AI-driven processes dynamically interact.
– Audience studies: Evaluating perceptual responses through empirical testing, informed by cognitive psychology (Spence & Deroy, 2013; Ramachandran & Hubbard, 2001) and media theory (Manovich, 2018; Hayles, 1999; McLuhan, 1964).
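The feature-extraction step can be illustrated with a minimal sketch. The Python code below computes two common per-frame descriptors, loudness (RMS) and brightness (spectral centroid), and rescales them to a 0 to 1 control range that a mapping layer could consume. The choice of these two features, the function names `frame_features` and `to_unit`, and the normalization bounds are illustrative assumptions, not the project's actual analysis chain. In a real-time system the function would run on each incoming audio buffer.

```python
import numpy as np


def frame_features(frame: np.ndarray, sample_rate: int) -> dict:
    """Loudness (RMS) and brightness (spectral centroid in Hz) of one audio frame."""
    rms = float(np.sqrt(np.mean(frame ** 2)))
    # A Hann window reduces spectral leakage before the FFT.
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = spectrum.sum()
    # Magnitude-weighted mean frequency; silent frames get a centroid of 0.
    centroid = float((freqs * spectrum).sum() / total) if total > 0 else 0.0
    return {"rms": rms, "centroid": centroid}


def to_unit(value: float, lo: float, hi: float) -> float:
    """Linearly rescale value from [lo, hi] to [0, 1], clipping values outside the range."""
    return float(np.clip((value - lo) / (hi - lo), 0.0, 1.0))


if __name__ == "__main__":
    sr = 44100
    t = np.arange(1024) / sr
    tone = 0.5 * np.sin(2 * np.pi * 1000 * t)  # test tone standing in for live input
    feats = frame_features(tone, sr)
    # Hypothetical mapping bounds; to be tuned per instrument and room.
    print(to_unit(feats["rms"], 0.0, 0.5), to_unit(feats["centroid"], 200.0, 5000.0))
```

The bounds passed to `to_unit` are placeholders. Rescaling every descriptor to the same range is what allows any feature to be routed to any visual or sonic parameter.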
Preliminary Experiments and Findings  
During my time at ICST (ZHdK), I developed INFUSED 1.02e+6ms, a multimedia work integrating live performers, electronic music, and video manipulation. Using OpenCV-based camera movement detection, I mapped compositional changes across sound and image, showing how a dense network of mappings can produce an organic aesthetic within glitch art.
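Under stated assumptions, simple frame differencing can approximate a movement-to-parameter mapping of this kind. The method counts how many pixels changed between two grayscale frames and routes that single value to several destinations at once. This sketch uses plain NumPy in place of OpenCV, where `cv2.absdiff` computes the same absolute difference. The threshold, the hypothetical parameter names `filter_cutoff_hz` and `glitch_intensity`, and their ranges are illustrative assumptions, not the settings used in INFUSED.

```python
import numpy as np


def motion_amount(prev_gray: np.ndarray, curr_gray: np.ndarray, threshold: int = 25) -> float:
    """Fraction (0..1) of pixels whose brightness changed by more than `threshold`."""
    # Cast to a signed type so the subtraction cannot wrap around for uint8 input.
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return float(np.mean(diff > threshold))


def map_motion(amount: float, cutoff_range=(200.0, 8000.0), glitch_range=(0.0, 1.0)) -> dict:
    """Hypothetical one-to-many mapping: one motion value drives a sound and an image parameter."""
    lo, hi = cutoff_range
    # Exponential curve for frequency, so equal motion steps give roughly equal pitch steps.
    cutoff = lo * (hi / lo) ** amount
    g_lo, g_hi = glitch_range
    glitch = g_lo + (g_hi - g_lo) * amount
    return {"filter_cutoff_hz": cutoff, "glitch_intensity": glitch}


if __name__ == "__main__":
    prev = np.zeros((120, 160), dtype=np.uint8)  # placeholder grayscale frames
    curr = prev.copy()
    curr[40:80, 60:100] = 200  # simulated moving object
    print(map_motion(motion_amount(prev, curr)))
```

The exponential cutoff curve is a design choice. Perceived pitch is roughly logarithmic in frequency, so a linear mapping would crowd most of the audible change into the lowest part of the motion range.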
A second work, All My Neurons Have Clits, explored glitch aesthetics in audiovisual performance. By applying transient detection to an electric guitar signal, I triggered real-time video distortions, echoing Helmholtz’s principles of temporal proximity in multimodal perception. Integrating AI-driven decision-making into this context remains an open frontier.
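A transient-triggered glitch of this kind can be sketched in two parts. The first is an energy-based onset detector that flags a frame when its energy exceeds a multiple of the recent average, with a refractory gap so that one attack fires only once. The second is a row-displacement effect applied to a video frame whenever an onset occurs. This is a generic textbook approach chosen for illustration. The frame size, ratio, history length, refractory gap, and the row-shift effect itself are assumptions, not the detector or distortion used in the piece.

```python
from collections import deque

import numpy as np


def detect_transients(signal, frame_size=512, ratio=4.0, history=8, min_gap=8):
    """Start-sample indices of frames whose energy exceeds `ratio` times the recent mean.

    `min_gap` is a refractory period in frames, so a single attack fires only once.
    """
    onsets = []
    past = deque(maxlen=history)  # energies of the most recent frames
    last_onset = -min_gap
    for i, start in enumerate(range(0, len(signal) - frame_size + 1, frame_size)):
        frame = np.asarray(signal[start:start + frame_size], dtype=float)
        energy = float(np.mean(frame ** 2))
        if past and i - last_onset >= min_gap:
            mean_energy = sum(past) / len(past)
            if energy > ratio * mean_energy + 1e-12:  # tiny floor: silence never triggers
                onsets.append(start)
                last_onset = i
        past.append(energy)
    return onsets


def row_shift_glitch(image, intensity, rng):
    """Hypothetical glitch: displace a random share of rows sideways; `intensity` in [0, 1]."""
    out = image.copy()
    h, w = image.shape[:2]
    max_shift = max(1, w // 4)
    n_rows = int(round(intensity * h))
    for r in rng.choice(h, size=n_rows, replace=False):
        # Roll along the width axis only, so color channels stay intact.
        out[r] = np.roll(image[r], int(rng.integers(-max_shift, max_shift + 1)), axis=0)
    return out


if __name__ == "__main__":
    sr = 44100
    t = np.arange(2048) / sr
    pluck = 0.5 * np.sin(2 * np.pi * 110 * t)  # stand-in for a guitar attack
    guitar = np.concatenate([np.zeros(4096), pluck, np.zeros(4096), pluck])
    video = np.zeros((120, 160, 3), dtype=np.uint8)  # placeholder video frame
    rng = np.random.default_rng(0)
    for onset in detect_transients(guitar):
        video = row_shift_glitch(video, intensity=0.3, rng=rng)
        print("transient at sample", onset)
```

At 44.1 kHz, the defaults `frame_size=512` and `min_gap=8` give a refractory window of roughly 93 ms. Shortening the gap makes the visuals follow faster playing, at the risk of double triggers on a single pluck.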
Next Steps: AI, Biofeedback, and Virtual Organicity
Building on these foundations, my PhD research will integrate machine learning for generative audiovisual structures, focusing on:  
– AI-based latent space exploration for multimedia composition.  
– Biofeedback integration (e.g., EMG, EEG) to enhance performer interactivity.  
– Dynamic real-time parameter mapping, investigating emergent aesthetics through stochastic models and AI-driven mediation.  
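The three directions above could be combined in a stochastic walk through a generative model's latent space. In this scheme, audio loudness controls how far the position wanders, and a performer's EMG envelope pulls it toward a chosen anchor point. The sketch assumes a hypothetical 16-dimensional latent space and omits the decoder that would turn `z` into image or sound. `LatentWalker`, `emg_envelope`, and the parameters `loudness`, `tension`, and `anchor` are illustrative names, and both control inputs are assumed to be pre-normalized to [0, 1].

```python
import numpy as np


def emg_envelope(emg, alpha=0.01):
    """Full-wave rectify raw EMG and smooth it with a one-pole low-pass filter."""
    env = np.empty(len(emg))
    y = 0.0
    for n, x in enumerate(np.abs(emg)):
        y += alpha * (x - y)  # exponential moving average of the rectified signal
        env[n] = y
    return env


class LatentWalker:
    """Hypothetical navigator for a generative model's latent space (decoder not included)."""

    def __init__(self, dim=16, seed=0):
        self.rng = np.random.default_rng(seed)
        self.z = np.zeros(dim)

    def step(self, loudness, tension, anchor, max_step=0.2):
        """Advance one control tick and return the new latent position.

        loudness: audio RMS mapped to [0, 1]; scales the random (stochastic) component.
        tension: EMG envelope mapped to [0, 1]; fraction of the way to move toward `anchor`.
        """
        drift = max_step * loudness * self.rng.standard_normal(self.z.shape)
        pull = tension * (anchor - self.z)
        self.z = self.z + drift + pull
        return self.z


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    raw_emg = 0.2 * rng.standard_normal(4410)  # placeholder for 0.1 s of raw EMG
    # Hypothetical normalization constant for this performer and sensor.
    tension = float(np.clip(emg_envelope(raw_emg)[-1] / 0.2, 0.0, 1.0))
    walker = LatentWalker(dim=16, seed=0)
    anchor = np.ones(16)  # hypothetical target region of the latent space
    z = walker.step(loudness=0.5, tension=tension, anchor=anchor)
    print(z.shape)
```

Keeping `tension` below 1 makes the pull a partial interpolation toward the anchor, and a value of 1 jumps there in a single tick. Under this design, greater muscular effort draws the system toward a known region, while louder playing reintroduces variation.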
Expected Contributions
– A conceptual framework for virtual organicity in multimedia composition.  
– Original works demonstrating AI-driven audiovisual interaction.  
– Theoretical insights into cross-modal perception and real-time adaptation.  
– Practical applications in interactive multimedia systems, performance, and installation contexts.  
Conclusion
This project contributes to the discourse on AI and artistic intelligence, challenging the boundaries between human creativity and machine agency. By leveraging AI’s latent spaces, biofeedback, and multimodal integration, it aims to offer both theoretical and practical insights into artistic intelligence in multimedia composition.
REFERENCES

BAUDRILLARD, J. (1981). Simulacra and Simulation. University of Michigan Press.  

CASCONE, K. (2000). The Aesthetics of Failure: “Post-Digital” Tendencies in Contemporary Computer Music. Computer Music Journal, 24(4), 12-18.

MANOVICH, L. (2018). AI Aesthetics. Strelka Press.  

SPENCE, C., & DEROY, O. (2013). How Automatic Are Cross-Modal Correspondences? Consciousness and Cognition, 22(1), 245-260.