Date: Fri, 19 Jul 2024 | Total: 5

#1 Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion

Authors: Boyang Deng ; Richard Tucker ; Zhengqi Li ; Leonidas Guibas ; Noah Snavely ; Gordon Wetzstein

We present a method for generating Streetscapes: long sequences of views through an on-the-fly synthesized city-scale scene. Our generation is conditioned on language input (e.g., city name, weather), as well as an underlying map/layout hosting the desired trajectory. Compared to recent models for video generation or 3D view synthesis, our method can scale to much longer-range camera trajectories, spanning several city blocks, while maintaining visual quality and consistency. To achieve this goal, we build on recent work on video diffusion, used within an autoregressive framework that can easily scale to long sequences. In particular, we introduce a new temporal imputation method that prevents our autoregressive approach from drifting from the distribution of realistic city imagery. We train our Streetscapes system on a compelling source of data (posed imagery from Google Street View, along with contextual map data), which allows users to generate city views conditioned on any desired city layout, with controllable camera poses. Please see more results at our project page at

Subjects: Computer Vision and Pattern Recognition ; Graphics

Publish: 2024-07-18 17:56:30 UTC
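The autoregressive framing in the Streetscapes abstract above can be pictured as generating a long trajectory in overlapping windows, where each new window is conditioned on the last few frames of the previous one. The sketch below shows only that bookkeeping. The `window` and `overlap` parameters and the `window_schedule` helper are hypothetical, since the abstract gives no such values, and neither the video diffusion model nor the paper's temporal imputation method is reproduced here.

```python
def window_schedule(total_frames, window, overlap):
    """Plan overlapping generation windows for a long frame sequence.

    Returns a list of (start, end, n_context) tuples, where frames
    [start, end) are produced in one pass and the first n_context of them
    are already known from the previous window (used as conditioning).
    All parameters are illustrative assumptions, not values from the paper.
    """
    if total_frames < 1:
        raise ValueError("total_frames must be at least 1")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    schedule = []
    start = 0
    while True:
        end = min(start + window, total_frames)
        # The first window has no previously generated frames to condition on.
        n_context = 0 if start == 0 else overlap
        schedule.append((start, end, n_context))
        if end >= total_frames:
            break
        # Step back by `overlap` so the next window re-uses the latest frames.
        start = end - overlap
    return schedule


if __name__ == "__main__":
    for start, end, n_context in window_schedule(10, window=4, overlap=1):
        print(f"frames {start}..{end - 1}: {n_context} context, "
              f"{end - start - n_context} new")
```

In a real system each tuple would drive one sampling pass of the generator, and errors can accumulate from window to window, which is the drift the abstract says its imputation method is designed to prevent.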

#2 PASTA: Controllable Part-Aware Shape Generation with Autoregressive Transformers

Authors: Songlin Li ; Despoina Paschalidou ; Leonidas Guibas

The increased demand for tools that automate the 3D content creation process has led to tremendous progress in deep generative models that can generate diverse 3D objects of high fidelity. In this paper, we present PASTA, an autoregressive transformer architecture for generating high-quality 3D shapes. PASTA comprises two main components: an autoregressive transformer that generates objects as a sequence of cuboidal primitives, and a blending network, implemented with a transformer decoder, that composes the sequences of cuboids and synthesizes high-quality meshes for each object. Our model is trained in two stages: first, we train our autoregressive generative model using only annotated cuboidal parts as supervision; next, we train our blending network using explicit 3D supervision in the form of watertight meshes. Evaluations on various ShapeNet objects showcase the ability of our model to perform shape generation from diverse inputs, e.g., from scratch, from a partial object, and from text and images, as well as size-guided generation, by explicitly conditioning on a bounding box that defines the object's boundaries. Moreover, as our model considers the underlying part-based structure of a 3D object, we are able to select a specific part and produce shapes with meaningful variations of this part. As evidenced by our experiments, our model generates 3D shapes that are both more realistic and more diverse than those of existing part-based and non-part-based methods, while being simpler to implement and train.

Subjects: Computer Vision and Pattern Recognition ; Artificial Intelligence ; Graphics ; Machine Learning

Publish: 2024-07-18 16:52:45 UTC

#3 SurroFlow: A Flow-Based Surrogate Model for Parameter Space Exploration and Uncertainty Quantification

Authors: Jingyi Shen ; Yuhan Duan ; Han-Wei Shen

Existing deep learning-based surrogate models facilitate efficient data generation, but fall short in uncertainty quantification, efficient parameter space exploration, and reverse prediction. In our work, we introduce SurroFlow, a novel normalizing flow-based surrogate model, to learn the invertible transformation between simulation parameters and simulation outputs. The model not only allows accurate predictions of simulation outcomes for a given simulation parameter but also supports uncertainty quantification in the data generation process. Additionally, it enables efficient simulation parameter recommendation and exploration. We integrate SurroFlow and a genetic algorithm as the backend of a visual interface to support effective user-guided ensemble simulation exploration and visualization. Our framework significantly reduces the computational costs while enhancing the reliability and exploration capabilities of scientific surrogate models.

Subjects: Machine Learning ; Artificial Intelligence ; Computer Vision and Pattern Recognition ; Graphics ; Human-Computer Interaction

Publish: 2024-07-16 19:08:49 UTC
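The SurroFlow abstract above relies on a normalizing flow being invertible, which is what allows both forward prediction and reverse prediction. As a generic illustration of that property (not SurroFlow's architecture, which the abstract does not specify), here is a minimal 2-D affine coupling layer, one common building block of normalizing flows. The `scale_and_shift` function is a hypothetical fixed stand-in for a learned conditioner network.

```python
import math


def scale_and_shift(x1):
    """Toy stand-in for a learned conditioner network (hypothetical)."""
    log_s = math.tanh(0.5 * x1)  # bounded log-scale keeps exp() well behaved
    t = 0.3 * x1                 # shift
    return log_s, t


def forward(x1, x2):
    """Map (x1, x2) -> (y1, y2); x1 passes through unchanged."""
    log_s, t = scale_and_shift(x1)
    y2 = x2 * math.exp(log_s) + t
    # For this layer, log|det Jacobian| is simply log_s.
    return x1, y2, log_s


def inverse(y1, y2):
    """Exactly undo forward(): recompute scale/shift from y1 == x1."""
    log_s, t = scale_and_shift(y1)
    x2 = (y2 - t) * math.exp(-log_s)
    return y1, x2


if __name__ == "__main__":
    y1, y2, log_det = forward(0.8, -1.2)
    x1, x2 = inverse(y1, y2)
    print("round trip:", (x1, x2), "log|det J|:", log_det)
```

Because half of the input passes through untouched, the inverse can recompute the same scale and shift without inverting the conditioner itself, and the log-determinant needed for likelihood-based training is cheap to obtain.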

#4 Digital Storytelling for Competence Development in Games

Authors: Edgar Santos ; Claudia Ribeiro ; Manuel Fradinho ; João Pereira

The acquisition of complex knowledge and competences raises difficult challenges for the supporting tools within the corporate environment, for which digital storytelling presents a potential solution. Traditionally, a driving goal of digital storytelling is the generation of dramatic stories with human significance, but for learning purposes, the need for drama is complemented by the requirement of achieving particular learning outcomes. This paper presents a narrative engine that uses emergent storytelling to support the development of complex competences in the learning domains of project management and innovation. The approach is based on an adaptation of the Fabula model combined with cases representing situated contexts associated with particular competences. These cases are then triggered to influence the unfolding of the story such that a learner encounters dramatic points in the narrative where the associated competences need to be used. In addition to the description of the approach and corresponding narrative engine, an illustration is presented of how the competence 'conflict management' influences a story.

Subjects: Human-Computer Interaction ; Graphics

Publish: 2024-06-21 17:06:58 UTC

#5 Cube2Pipes: Investigating Hybrid Gameplay Using AR and a Tangible 3D Puzzle

Authors: Sukanya Bhattacharjee ; Parag Chaudhuri

We present our game, Cube2Pipes, as an attempt to investigate a unique gameplay design where we use a tangible 3D spatial puzzle, in the form of a 2x2 Rubik's Cube, as an interface to a tabletop mobile augmented reality (AR) game. The game interface adapts to user movement and interaction with both virtual and tangible elements via computer-vision-based tracking. This game can be seen as an instance of generic interactive hybrid systems, as it involves interaction with both virtual and real, tangible elements. We present a thorough user evaluation of various aspects of the gameplay in order to answer the question of whether hybrid gameplay involving both real and virtual interfaces and elements is more captivating to, and preferred by, users than standard (baseline) gameplay with only virtual elements. We use multiple industry-standard user study questionnaires to try to answer this question. We also try to determine whether the game facilitates understanding of the spatial moves required to solve a Rubik's Cube, and the efficacy of a tangible puzzle interface to a tabletop AR game.

Subjects: Human-Computer Interaction ; Graphics

Publish: 2024-06-15 20:00:56 UTC