IJCAI.2023 - Others

Total: 212

#1 Evaluating Human-AI Interaction via Usability, User Experience and Acceptance Measures for MMM-C: A Creative AI System for Music Composition

Authors: Renaud Bougueng Tchemeube, Jeffrey Ens, Cale Plut, Philippe Pasquier, Maryam Safi, Yvan Grabit, Jean-Baptiste Rolland

With the rise of artificial intelligence (AI), there has been increasing interest in human-AI co-creation in a variety of artistic domains including music as AI-driven systems are frequently able to generate human-competitive artifacts. Now, the implications of such systems for the musical practice are being investigated. This paper reports on a thorough evaluation of the user adoption of the Multi-Track Music Machine (MMM) as a minimal co-creative AI tool for music composers. To do this, we integrate MMM into Cubase, a popular Digital Audio Workstation (DAW), by producing a "1-parameter" plugin interface named MMM-Cubase, which enables human-AI co-composition. We conduct a 3-part mixed method study measuring usability, user experience and technology acceptance of the system across two groups of expert-level composers: hobbyists and professionals. Results show positive usability and acceptance scores. Users report experiences of novelty, surprise and ease of use from using the system, and limitations on controllability and predictability of the interface when generating music. Findings indicate no significant difference between the two user groups.


#2 The ACCompanion: Combining Reactivity, Robustness, and Musical Expressivity in an Automatic Piano Accompanist

Authors: Carlos Cancino-Chacón, Silvan Peter, Patricia Hu, Emmanouil Karystinaios, Florian Henkel, Francesco Foscarin, Gerhard Widmer

This paper introduces the ACCompanion, an expressive accompaniment system. Similarly to a musician who accompanies a soloist playing a given musical piece, our system can produce a human-like rendition of the accompaniment part that follows the soloist's choices in terms of tempo, dynamics, and articulation. The ACCompanion works in the symbolic domain, i.e., it needs a musical instrument capable of producing and playing MIDI data, with explicitly encoded onset, offset, and pitch for each played note. We describe the components that go into such a system, from real-time score following and prediction to expressive performance generation and online adaptation to the expressive choices of the human player. Based on our experience with repeated live demonstrations in front of various audiences, we offer an analysis of the challenges of combining these components into a system that is highly reactive and precise, while still a reliable musical partner, robust to possible performance errors and responsive to expressive variations.


#3 TeSTNeRF: Text-Driven 3D Style Transfer via Cross-Modal Learning

Authors: Jiafu Chen, Boyan Ji, Zhanjie Zhang, Tianyi Chu, Zhiwen Zuo, Lei Zhao, Wei Xing, Dongming Lu

Text-driven 3D style transfer aims at stylizing a scene according to the text and generating arbitrary novel views with consistency. Simply combining image/video style transfer methods and novel view synthesis methods results in flickering when changing viewpoints, while existing 3D style transfer methods learn styles from images instead of texts. To address this problem, we for the first time design an efficient text-driven model for 3D style transfer, named TeSTNeRF, to stylize the scene using texts via cross-modal learning: we leverage an advanced text encoder to embed the texts in order to control 3D style transfer and align the input text and output stylized images in latent space. Furthermore, to obtain better visual results, we introduce style supervision, learning feature statistics from style images and utilizing 2D stylization results to rectify abrupt color spill. Extensive experiments demonstrate that TeSTNeRF significantly outperforms existing methods and provides a new way to guide 3D style transfer.


#4 Graph-based Polyphonic Multitrack Music Generation

Authors: Emanuele Cosenza, Andrea Valenti, Davide Bacciu

Graphs can be leveraged to model polyphonic multitrack symbolic music, where notes, chords and entire sections may be linked at different levels of the musical hierarchy by tonal and rhythmic relationships. Nonetheless, there is a lack of works that consider graph representations in the context of deep learning systems for music generation. This paper bridges this gap by introducing a novel graph representation for music and a deep Variational Autoencoder that generates the structure and the content of musical graphs separately, one after the other, with a hierarchical architecture that matches the structural priors of music. By separating the structure and content of musical graphs, it is possible to condition generation by specifying which instruments are played at certain times. This opens the door to a new form of human-computer interaction in the context of music co-creation. After training the model on existing MIDI datasets, the experiments show that the model is able to generate appealing short and long musical sequences and to realistically interpolate between them, producing music that is tonally and rhythmically consistent. Finally, the visualization of the embeddings shows that the model is able to organize its latent space in accordance with known musical concepts.


#5 Towards Symbiotic Creativity: A Methodological Approach to Compare Human and AI Robotic Dance Creations

Authors: Allegra De Filippo, Luca Giuliani, Eleonora Mancini, Andrea Borghesi, Paola Mello, Michela Milano

Artificial Intelligence (AI) has gradually attracted attention in the field of artistic creation, resulting in a debate on the evaluation of AI artistic outputs. However, there is a lack of common criteria for objective artistic evaluation both of human and AI creations. This is a frequent issue in the field of dance, where different performance metrics focus either on evaluating human or computational skills separately. This work proposes a methodological approach for the artistic evaluation of both AI and human artistic creations in the field of robotic dance. First, we define a series of common initial constraints to create robotic dance choreographies in a balanced initial setting, in collaboration with a group of human dancers and choreographer. Then, we compare both creation processes through a human audience evaluation. Finally, we investigate which choreography aspects (e.g., the music genre) have the largest impact on the evaluation, and we provide useful guidelines and future research directions for the analysis of interconnections between AI and human dance creation.


#6 Automating Rigid Origami Design

Authors: Jeremia Geiger, Karolis Martinkus, Oliver Richter, Roger Wattenhofer

Rigid origami has shown potential in a large diversity of practical applications. However, current rigid origami crease pattern design mostly relies on known tessellations. This strongly limits the diversity and novelty of patterns that can be created. In this work, we build upon the recently developed principle of three units method to formulate rigid origami design as a discrete optimization problem, the rigid origami game. Our implementation allows for a simple definition of diverse objectives and thereby expands the potential of rigid origami further to optimized, application-specific crease patterns. We showcase the flexibility of our formulation through use of a diverse set of search methods in several illustrative case studies. We are not only able to construct various patterns that approximate given target shapes, but also to specify abstract, function-based rewards which result in novel, foldable and functional designs for everyday objects.


#7 Collaborative Neural Rendering Using Anime Character Sheets

Authors: Zuzeng Lin, Ailin Huang, Zhewei Huang

Drawing images of characters with desired poses is an essential but laborious task in anime production. Assisting artists in this creative work has become a research hotspot in recent years. In this paper, we present the Collaborative Neural Rendering (CoNR) method, which creates new images for specified poses from a few reference images (a.k.a. character sheets). In general, the diverse hairstyles and garments of anime characters defy the employment of universal body models like SMPL, which fits most nude human shapes. To overcome this, CoNR uses a compact and easy-to-obtain landmark encoding to avoid creating a unified UV mapping in the pipeline. In addition, the performance of CoNR can be significantly improved when referring to multiple reference images, thanks to feature space cross-view warping in a carefully designed neural network. Moreover, we have collected a character sheet dataset containing over 700,000 hand-drawn and synthesized images of diverse poses to facilitate research in this area. The code and dataset are available at https://github.com/megvii-research/IJCAI2023-CoNR.


#8 IberianVoxel: Automatic Completion of Iberian Ceramics for Cultural Heritage Studies

Authors: Pablo Navarro, Celia Cintas, Manuel Lucena, José Manuel Fuertes, Antonio Rueda, Rafael Segura, Carlos Ogayar-Anguita, Rolando González-José, Claudio Delrieux

Accurate completion of archaeological artifacts is a critical aspect in several archaeological studies, including documentation of variations in style, inference of chronological and ethnic groups, and trading routes trends, among many others. However, most available pottery is fragmented, leading to missing textural and morphological cues. Currently, the reassembly and completion of fragmented ceramics is a daunting and time-consuming task, done almost exclusively by hand, which requires the physical manipulation of the fragments. To overcome the challenges of manual reconstruction, reduce the materials' exposure and deterioration, and improve the quality of reconstructed samples, we present IberianVoxel, a novel 3D Autoencoder Generative Adversarial Network (3D AE-GAN) framework tested on an extensive database with complete and fragmented references. We generated a collection of 1001 3D voxelized samples and their fragmented references from Iberian wheel-made pottery profiles. The fragments generated are stratified into different size groups and across multiple pottery classes. Lastly, we provide quantitative and qualitative assessments to measure the quality of the reconstructed voxelized samples by our proposed method and archaeologists' evaluation.


#9 Discrete Diffusion Probabilistic Models for Symbolic Music Generation

Authors: Matthias Plasser, Silvan Peter, Gerhard Widmer

Denoising Diffusion Probabilistic Models (DDPMs) have made great strides in generating high-quality samples in both discrete and continuous domains. However, Discrete DDPMs (D3PMs) have yet to be applied to the domain of Symbolic Music. This work presents the direct generation of Polyphonic Symbolic Music using D3PMs. Our model exhibits state-of-the-art sample quality, according to current quantitative evaluation metrics, and allows for flexible infilling at the note level. We further show that our models are accessible to post-hoc classifier guidance, widening the scope of possible applications. However, we also cast a critical view on quantitative evaluation of music sample quality via statistical metrics, and present a simple algorithm that can confound our metrics with completely spurious, non-musical samples.


#10 Learn and Sample Together: Collaborative Generation for Graphic Design Layout

Authors: Haohan Weng, Danqing Huang, Tong Zhang, Chin-Yew Lin

In the process of graphic layout generation, user specifications including element attributes and their relationships are commonly used to constrain the layouts (e.g., "put the image above the button"). It is natural to encode spatial constraints between elements using a graph. This paper presents a two-stage generation framework: a spatial graph generator and a subsequent layout decoder which is conditioned on the previous output graph. When training the two highly dependent networks separately as in previous work, we observe that the graph generator frequently generates out-of-distribution graphs, which are unseen by the layout decoder during training and thus lead to a huge performance drop at inference. To coordinate the two networks more effectively, we propose a novel collaborative generation strategy to perform round-way knowledge transfer between the networks in both training and inference. Experimental results on three public datasets show that our model greatly benefits from the collaborative generation and achieves state-of-the-art performance. Furthermore, we conduct an in-depth analysis to better understand the effectiveness of graph condition modeling.


#11 DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models

Authors: Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Ming Cheng, Long Xiao

Beyond speech, gestures are part of the art of communication. Automatic co-speech gesture generation has drawn much attention in computer animation. It is a challenging task due to the diversity of gestures and the difficulty of matching the rhythm and semantics of the gesture to the corresponding speech. To address these problems, we present DiffuseStyleGesture, a diffusion-model-based, speech-driven gesture generation approach. It generates high-quality, speech-matched, stylized, and diverse co-speech gestures based on given speeches of arbitrary length. Specifically, we introduce cross-local attention and self-attention to the gesture diffusion pipeline to generate better speech-matched and more realistic gestures. We then train our model with classifier-free guidance to control the gesture style by interpolation or extrapolation. Additionally, we improve the diversity of generated gestures with different initial gestures and noise. Extensive experiments show that our method outperforms recent approaches on speech-driven gesture generation. Our code, pre-trained models, and demos are available at https://github.com/YoungSeng/DiffuseStyleGesture.
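As an illustration of how classifier-free guidance can interpolate or extrapolate a condition, here is a minimal sketch of the standard guidance combination rule. It is not taken from the DiffuseStyleGesture code: the function name `guided_prediction`, the plain-list representation of a model output, and the toy numbers are all assumptions made for this example.

```python
def guided_prediction(uncond, cond, scale):
    """Combine unconditional and conditional model outputs.

    Standard classifier-free guidance rule: uncond + scale * (cond - uncond).
    scale = 0 ignores the condition, 0 < scale < 1 interpolates towards it,
    scale = 1 is the plain conditional output, and scale > 1 extrapolates.
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Hypothetical two-dimensional model outputs, for illustration only.
uncond_out = [0.0, 1.0]
cond_out = [1.0, 3.0]

halfway = guided_prediction(uncond_out, cond_out, 0.5)       # interpolation
exaggerated = guided_prediction(uncond_out, cond_out, 2.0)   # extrapolation
```

With a scale between 0 and 1 the result lies between the two outputs, while a scale above 1 pushes the result past the conditional output, which is the sense in which a style can be exaggerated.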


#12 NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis Based on Frequency Modulation

Authors: Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo

Developing digital sound synthesizers is crucial to the music industry as it provides a low-cost way to produce high-quality sounds with rich timbres. Existing traditional synthesizers often require substantial expertise to determine the overall framework of a synthesizer and the parameters of submodules. Since expert knowledge is hard to acquire, it hinders the flexibility to quickly design and tune digital synthesizers for diverse sounds. In this paper, we propose "NAS-FM", which adopts neural architecture search (NAS) to build a differentiable frequency modulation (FM) synthesizer. Tunable synthesizers with interpretable controls can be developed automatically from sounds without any prior expert knowledge and manual operating costs. In detail, we train a supernet with a specifically designed search space, including predicting the envelopes of carriers and modulators with different frequency ratios. An evolutionary search algorithm with adaptive oscillator size is then developed to find the optimal relationship between oscillators and the frequency ratio of FM. Extensive experiments on recordings of different instrument sounds show that our algorithm can build a synthesizer fully automatically, achieving better results than handcrafted synthesizers. Audio samples are available at https://nas-fm.github.io/
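For readers unfamiliar with FM synthesis, the carrier/modulator relationship mentioned above can be sketched with the textbook two-operator formula y(t) = A sin(2π f_c t + I sin(2π f_m t)), where f_m = ratio × f_c. This is only a minimal illustration, not the NAS-FM architecture: the function names, the constant (envelope-free) amplitude, and the parameter values are assumptions chosen for the example.

```python
import math


def fm_sample(t, carrier_hz, ratio, index, amplitude=1.0):
    """Return one sample of a two-operator FM voice at time t (seconds).

    The modulator runs at carrier_hz * ratio and its output, scaled by the
    modulation index, is added to the carrier phase.
    """
    modulator_hz = carrier_hz * ratio
    modulation = index * math.sin(2.0 * math.pi * modulator_hz * t)
    return amplitude * math.sin(2.0 * math.pi * carrier_hz * t + modulation)


def render(duration_s, sample_rate, carrier_hz, ratio, index):
    """Render a list of samples with constant amplitude (no envelope)."""
    n = round(duration_s * sample_rate)
    return [fm_sample(i / sample_rate, carrier_hz, ratio, index) for i in range(n)]


# Hypothetical settings: 10 ms at 16 kHz, 440 Hz carrier, 2:1 frequency ratio.
samples = render(0.01, 16000, 440.0, 2.0, 1.5)
```

Setting `index` to 0 reduces the voice to a pure sine at the carrier frequency; raising it adds sidebands and brightens the timbre, which is why the frequency ratio and index act as interpretable controls.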


#13 Q&A: Query-Based Representation Learning for Multi-Track Symbolic Music re-Arrangement

Authors: Jingwei Zhao, Gus Xia, Ye Wang

Music rearrangement is a common music practice of reconstructing and reconceptualizing a piece using new composition or instrumentation styles, which is also an important task of automatic music generation. Existing studies typically model the mapping from a source piece to a target piece via supervised learning. In this paper, we tackle rearrangement problems via self-supervised learning, in which the mapping styles can be regarded as conditions and controlled in a flexible way. Specifically, we are inspired by the representation disentanglement idea and propose Q&A, a query-based algorithm for multi-track music rearrangement under an encoder-decoder framework. Q&A learns both a content representation from the mixture and function (style) representations from each individual track, while the latter queries the former in order to rearrange a new piece. Our current model focuses on popular music and provides a controllable pathway to four scenarios: 1) re-instrumentation, 2) piano cover generation, 3) orchestration, and 4) voice separation. Experiments show that our query system achieves high-quality rearrangement results with delicate multi-track structures, significantly outperforming the baselines.


#14 Fairness and Representation in Satellite-Based Poverty Maps: Evidence of Urban-Rural Disparities and Their Impacts on Downstream Policy

Authors: Emily Aiken, Esther Rolf, Joshua Blumenstock

Poverty maps derived from satellite imagery are increasingly used to inform high-stakes policy decisions, such as the allocation of humanitarian aid and the distribution of government resources. Such poverty maps are typically constructed by training machine learning algorithms on a relatively modest amount of "ground truth" data from surveys, and then predicting poverty levels in areas where imagery exists but surveys do not. Using survey and satellite data from ten countries, this paper investigates disparities in representation, systematic biases in prediction errors, and fairness concerns in satellite-based poverty mapping across urban and rural lines, and shows how these phenomena affect the validity of policies based on predicted maps. Our findings highlight the importance of careful error and bias analysis before using satellite-based poverty maps in real-world policy decisions.


#15 Forecasting Soil Moisture Using Domain Inspired Temporal Graph Convolution Neural Networks To Guide Sustainable Crop Management

Authors: Muneeza Azmat, Malvern Madondo, Arun Bawa, Kelsey Dipietro, Raya Horesh, Michael Jacobs, Raghavan Srinivasan, Fearghal O'Donncha

Agriculture faces unprecedented challenges due to climate change, population growth, and water scarcity. These challenges highlight the need for efficient resource usage to optimize crop production. Conventional techniques for forecasting hydrological response features, such as soil moisture, rely on physics-based and empirical hydrological models, which necessitate significant time and domain expertise. Drawing inspiration from traditional hydrological modeling, a novel temporal graph convolution neural network has been constructed. This involves grouping units based on their time-varying hydrological properties, constructing graph topologies for each cluster based on similarity using dynamic time warping, and utilizing graph convolutions and a gated recurrent neural network to forecast soil moisture. The method has been trained, validated, and tested on field-scale time series data spanning 40 years in the northeastern United States. Results show that using domain-inspired clustering with time series graph neural networks is more effective in forecasting soil moisture than existing models. This framework is being deployed as part of a pro bono social impact program that leverages hybrid cloud and AI technologies to enhance and scale non-profit and government organizations. The trained models are currently being deployed on a series of small-holding farms in central Texas.
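The graph-construction step described above relies on dynamic time warping (DTW) as a similarity measure between time series. The sketch below shows the classic dynamic-programming form of DTW and one simple way to turn pairwise distances into graph edges. It is an illustration under assumptions, not the authors' pipeline: the threshold rule, the function `similarity_edges`, the unit names, and the toy readings are all hypothetical.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of a[:i] and b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def similarity_edges(series, threshold):
    """Hypothetical rule: connect two units when their DTW distance is at most threshold."""
    names = sorted(series)
    return [(u, v) for u, v in combinations(names, 2)
            if dtw_distance(series[u], series[v]) <= threshold]


# Toy readings invented for this example; unit_b is a time-stretched copy of unit_a.
readings = {
    "unit_a": [1.0, 2.0, 3.0],
    "unit_b": [1.0, 2.0, 2.0, 3.0],
    "unit_c": [10.0, 11.0, 12.0],
}
edges = similarity_edges(readings, threshold=1.0)
```

Because DTW allows one sample to align with several, `unit_a` and `unit_b` come out at distance zero despite their different lengths, which is the property that makes DTW attractive for comparing series that evolve at different speeds.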


#16 Toward Job Recommendation for All

Authors: Guillaume Bied, Solal Nathan, Elia Perennes, Morgane Hoffmann, Philippe Caillou, Bruno Crépon, Christophe Gaillac, Michèle Sebag

This paper presents a job recommendation algorithm designed and validated in the context of the French Public Employment Service. The challenges, owing to the confidential data policy, are related to the extreme sparsity of the interaction matrix and the mandatory scalability of the algorithm, which aims to deliver recommendations to millions of job seekers in quasi real-time, considering hundreds of thousands of job ads. The experimental validation of the approach shows similar or better performance than the state of the art in terms of recall, with a gain in inference time of 2 orders of magnitude. The study includes some fairness analysis of the recommendation algorithm. The gender-related gap is shown to be statistically similar in the true data and in the counter-factual data built from the recommendations.


#17 Fast and Differentially Private Fair Clustering

Authors: Junyoung Byun, Jaewook Lee

This study presents the first differentially private and fair clustering method, built on the recently proposed density-based fair clustering approach. The method addresses the limitations of fair clustering algorithms that necessitate the use of sensitive personal information during training or inference phases. Two novel solutions, the Gaussian mixture density function and Voronoi cell, are proposed to enhance the method's performance in terms of privacy, fairness, and utility compared to previous methods. The experimental results on both synthetic and real-world data confirm the compatibility of the proposed method with differential privacy, achieving a better fairness-utility trade-off than existing methods when privacy is not considered. Moreover, the proposed method requires significantly less computation time, being at least 3.7 times faster than the state-of-the-art.


#18 Supporting Sustainable Agroecological Initiatives for Small Farmers through Constraint Programming

Authors: Margot Challand, Philippe Vismara, Dimitri Justeau-Allaire, Stéphane de Tourdonnet

Meeting the UN's objective of developing sustainable agriculture requires, in particular, accompanying small farms in their agroecological transition. This transition often requires making the agrosystem more complex and increasing the number of crops to increase biodiversity and ecosystem services. This paper introduces a flexible model based on Constraint Programming (CP) to address the crop allocation problem. This problem takes a cropping calendar as input and aims at allocating crops to respect several constraints. We have shown that it is possible to model both agroecological and operational constraints at the level of a small farm. Experiments on an organic micro-farm have shown that it is possible to combine these constraints to design very different cropping scenarios and that our approach can apply to real situations. Our promising results in this case study also demonstrate the potential of AI-based tools to address small farmers' challenges in the context of the sustainable agriculture transition.


#19 Towards Gender Fairness for Mental Health Prediction

Authors: Jiaee Cheong, Selim Kuzucu, Sinan Kalkan, Hatice Gunes

Mental health is becoming an increasingly prominent health challenge. Despite a plethora of studies analysing and mitigating bias for a variety of tasks such as face recognition and credit scoring, research on machine learning (ML) fairness for mental health has been sparse to date. In this work, we focus on gender bias in mental health and make the following contributions. First, we examine whether bias exists in existing mental health datasets and algorithms. Our experiments were conducted using Depresjon, Psykose and D-Vlog. We identify that both data and algorithmic bias exist. Second, we analyse strategies that can be deployed at the pre-processing, in-processing and post-processing stages to mitigate bias and evaluate their effectiveness. Third, we investigate factors that impact the efficacy of existing bias mitigation strategies and outline recommendations to achieve greater gender fairness for mental health. Upon obtaining counter-intuitive results on the D-Vlog dataset, we undertake further experiments and analyses, and provide practical suggestions to avoid hampering bias mitigation efforts in ML for mental health.


#20 Addressing Weak Decision Boundaries in Image Classification by Leveraging Web Search and Generative Models

Authors: Preetam Prabhu Srikar Dammu, Yunhe Feng, Chirag Shah

Machine learning (ML) technologies are known to be riddled with ethical and operational problems; however, we are witnessing an increasing thrust by businesses to deploy them in sensitive applications. One major issue among many is that ML models do not perform equally well for underrepresented groups. This puts vulnerable populations in an even more disadvantaged and unfavorable position. We propose an approach that leverages the power of web search and generative models to alleviate some of the shortcomings of discriminative models. We demonstrate our method on an image classification problem using ImageNet's People Subtree subset, and show that it is effective in enhancing robustness and mitigating bias in certain classes that represent vulnerable populations (e.g., female doctor of color). Our new method is able to (1) identify weak decision boundaries for such classes; (2) construct search queries for Google as well as text for generating images through DALL-E 2 and Stable Diffusion; and (3) show how these newly captured training samples could alleviate the population bias issue. While still improving the model's overall performance considerably, we achieve a significant reduction (77.30%) in the model's gender accuracy disparity. In addition to these improvements, we observed a notable enhancement in the classifier's decision boundary, as it is characterized by fewer weak spots and an increased separation between classes. Although we showcase our method on vulnerable populations in this study, the proposed technique is extendable to a wide range of problems and domains.


#21 Limited Resource Allocation in a Non-Markovian World: The Case of Maternal and Child Healthcare

Authors: Panayiotis Danassis, Shresth Verma, Jackson A. Killian, Aparna Taneja, Milind Tambe

The success of many healthcare programs depends on participants' adherence. We consider the problem of scheduling interventions in low resource settings (e.g., placing timely support calls from health workers) to increase adherence and/or engagement. Past works have successfully developed several classes of Restless Multi-armed Bandit (RMAB) based solutions for this problem. Nevertheless, all past RMAB approaches assume that the participants' behaviour follows the Markov property. We demonstrate significant deviations from the Markov assumption on real-world data on a maternal health awareness program from our partner NGO, ARMMAN. Moreover, we extend RMABs to continuous state spaces, a previously understudied area. To tackle the generalised non-Markovian RMAB setting we (i) model each participant's trajectory as a time-series, (ii) leverage the power of time-series forecasting models to learn complex patterns and dynamics to predict future states, and (iii) propose the Time-series Arm Ranking Index (TARI) policy, a novel algorithm that selects the RMAB arms that will benefit the most from an intervention, given our future state predictions. We evaluate our approach on both synthetic data, and a secondary analysis on real data from ARMMAN, and demonstrate significant increase in engagement compared to the SOTA, deployed Whittle index solution. This translates to 16.3 hours of additional content listened, 90.8% more engagement drops prevented, and reaching more than twice as many high dropout-risk beneficiaries.


#22 Disentangling Societal Inequality from Model Biases: Gender Inequality in Divorce Court Proceedings

Authors: Sujan Dutta, Parth Srivastava, Vaishnavi Solunke, Swaprava Nath, Ashiqur R. KhudaBukhsh

Divorce is the legal dissolution of a marriage by a court. Since this is usually an unpleasant outcome of a marital union, each party may have reasons for deciding to end the marriage, which are generally documented in detail in the court proceedings. Via a substantial corpus of 17,306 court proceedings, this paper investigates gender inequality through the lens of divorce court proceedings. To our knowledge, this is the first-ever large-scale computational analysis of gender inequality in Indian divorce, a taboo topic for ages. While emerging data sources (e.g., public court records made available on the web) on sensitive societal issues hold promise in aiding social science research, biases present in cutting-edge natural language processing (NLP) methods may interfere with or affect such studies. A thorough analysis of potential gaps and limitations present in extant NLP resources is thus of paramount importance. In this paper, on the methodological side, we demonstrate that existing NLP resources required several non-trivial modifications to quantify societal inequalities. On the substantive side, we find that while a large number of court cases perhaps suggest changing norms in India where women are increasingly challenging patriarchy, AI-powered analyses of these court proceedings indicate striking gender inequality with women often subjected to domestic violence.


#23 Sign Language-to-Text Dictionary with Lightweight Transformer Models

Authors: Jérôme Fink, Pierre Poitier, Maxime André, Loup Meurice, Benoît Frénay, Anthony Cleve, Bruno Dumas, Laurence Meurant

The recent advances in deep learning have been beneficial to automatic sign language recognition (SLR). However, free-to-access, usable, and accessible tools are still not widely available to the deaf community. The need for a sign language-to-text dictionary was raised by a bilingual deaf school in Belgium and linguist experts in sign languages (SL) in order to improve the autonomy of students. To meet that need, an efficient SLR system was built based on a specific transformer model. The proposed system is able to recognize 700 different signs, with a top-10 accuracy of 83%. Those results are competitive with other systems in the literature while using 10 times fewer parameters than existing solutions. The integration of this model into a usable and accessible web application for the dictionary is also introduced. A user-centered human-computer interaction (HCI) methodology was followed to design and implement the user interface. To the best of our knowledge, this is the first publicly released sign language-to-text dictionary using video captured by a standard camera.
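The top-10 accuracy quoted above counts a prediction as correct when the true sign appears among the ten highest-scoring classes, which suits a dictionary that shows the user a short list of candidates. The following is a minimal sketch of that metric. The function name, the list-of-scores input format, and the three-class toy data are assumptions for illustration and do not come from the paper.

```python
def top_k_accuracy(scores, labels, k=10):
    """Fraction of examples whose true label is among the k highest-scoring classes.

    scores: one list of per-class scores per example.
    labels: the true class index of each example.
    """
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy scores for three examples over three classes, invented for this example.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [2, 1, 0]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

In this toy data none of the true labels is ranked first, but two of the three are ranked second, so widening the candidate list from one to two raises the score from zero to two thirds.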


#24 Find Rhinos without Finding Rhinos: Active Learning with Multimodal Imagery of South African Rhino Habitats

Authors: Lucia Gordon, Nikhil Behari, Samuel Collier, Elizabeth Bondi-Kelly, Jackson A. Killian, Catherine Ressijac, Peter Boucher, Andrew Davies, Milind Tambe

Much of Earth's charismatic megafauna is endangered by human activities, particularly the rhino, which is at risk of extinction due to the poaching crisis in Africa. Monitoring rhinos' movement is crucial to their protection but has unfortunately proven difficult because rhinos are elusive. Therefore, instead of tracking rhinos, we propose the novel approach of mapping communal defecation sites, called middens, which give information about rhinos' spatial behavior valuable to anti-poaching, management, and reintroduction efforts. This paper provides the first-ever mapping of rhino midden locations by building classifiers to detect them using remotely sensed thermal, RGB, and LiDAR imagery in passive and active learning settings. As existing active learning methods perform poorly due to the extreme class imbalance in our dataset, we design MultimodAL, an active learning system employing a ranking technique and multimodality to achieve competitive performance with passive learning models with 94% fewer labels. Our methods could therefore save over 76 hours in labeling time when used on a similarly-sized dataset. Unexpectedly, our midden map reveals that rhino middens are not randomly distributed throughout the landscape; rather, they are clustered. Consequently, rangers should be targeted at areas with high midden densities to strengthen anti-poaching efforts, in line with UN Target 15.7.


#25 CGS: Coupled Growth and Survival Model with Cohort Fairness

Authors: Erhu He, Yue Wan, Benjamin H. Letcher, Jennifer H. Fair, Yiqun Xie, Xiaowei Jia

Fish modeling in complex environments is critical for understanding drivers of population dynamics in aquatic systems. This paper proposes a Bayesian network method for modeling fish survival and growth over multiple connected rivers. Traditional fish survival models capture the effect of multiple environmental drivers (e.g., stream temperature, stream flow) by adding different variables, which increases model complexity and results in very long and impractical run times (i.e., weeks). We propose a coupled survival-growth model that leverages the observations from both sources simultaneously. It also integrates the Bayesian process into the neural network model to efficiently capture complex variable relationships in the system while also conforming to known survival processes used in existing fish models. To further reduce the performance disparity of fish body length across cohorts, we propose two approaches for enforcing fairness by the adjustment of training priorities and data augmentation. The results based on a real-world fish dataset collected in Massachusetts, US demonstrate that the proposed method can greatly improve prediction accuracy in modeling survival and body length compared to independent models on survival and growth, and effectively reduce the performance disparity across cohorts. The fish growth and movement patterns discovered by the proposed model are also consistent with prior studies in the same region, while vastly reducing run times and memory requirements.