Graphics

2024-10-22 | Total: 10

#1 TexPro: Text-guided PBR Texturing with Procedural Material Modeling

Authors: Ziqiang Dang ; Wenqi Dong ; Zesong Yang ; Bangbang Yang ; Liang Li ; Yuewen Ma ; Zhaopeng Cui

In this paper, we present TexPro, a novel method for high-fidelity material generation for input 3D meshes given text prompts. Unlike existing text-conditioned texture generation methods, which typically generate RGB textures with baked lighting, TexPro produces diverse texture maps via procedural material modeling, enabling physically based rendering, relighting, and the other benefits inherent to procedural materials. Specifically, we first generate multi-view reference images from the input text prompt using the latest text-to-image model. We then derive texture maps through a rendering-based optimization with recent differentiable procedural materials. To this end, we design several techniques to handle the misalignment between the generated multi-view images and the 3D meshes, and introduce a novel material agent that improves material classification and matching by exploiting both part-level understanding and object-aware material reasoning. Experiments demonstrate the superiority of the proposed method over existing state-of-the-art methods, as well as its relighting capability.

Subjects: Graphics ; Computer Vision and Pattern Recognition

Publish: 2024-10-21 11:10:07 UTC

#2 Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos

Authors: Gengshan Yang ; Andrea Bajcsy ; Shunsuke Saito ; Angjoo Kanazawa

We present Agent-to-Sim (ATS), a framework for learning interactive behavior models of 3D agents from casual longitudinal video collections. Unlike prior works that rely on marker-based tracking and multiview cameras, ATS learns the natural behaviors of animal and human agents non-invasively through video observations recorded over a long time span (e.g., a month) in a single environment. Modeling the 3D behavior of an agent requires persistent 3D tracking (e.g., knowing which point corresponds to which) over a long time period. To obtain such data, we develop a coarse-to-fine registration method that tracks the agent and the camera over time through a canonical 3D space, resulting in a complete and persistent spacetime 4D representation. We then train a generative model of agent behaviors using paired perception and motion data of the agent, queried from the 4D reconstruction. ATS enables real-to-sim transfer from video recordings of an agent to an interactive behavior simulator. We demonstrate results on pets (e.g., cat, dog, bunny) and humans, given monocular RGBD videos captured by a smartphone.

Subjects: Computer Vision and Pattern Recognition ; Graphics ; Robotics

Publish: 2024-10-21 17:57:50 UTC

#3 Learning to Synthesize Graphics Programs for Geometric Artworks

Authors: Qi Bing ; Chaoyi Zhang ; Weidong Cai

Creating and understanding art has long been a hallmark of human ability. When presented with a finished digital artwork, professional graphic artists can intuitively deconstruct and replicate it using various drawing tools, such as the line tool, the paint bucket, and layer features including opacity and blending modes. Most recent research in this field has focused on art generation, and the proposed methods typically treat an artwork as a final image. To bridge the gap between pixel-level results and the actual drawing process, we present an approach that treats a set of drawing tools as executable programs. This method predicts a sequence of steps that achieves the final image, allowing understandable and resolution-independent reproductions using a set of drawing commands. Our experiments demonstrate that our program synthesizer, Art2Prog, can comprehensively understand complex input images and reproduce them using high-quality executable programs. These results show the potential of machines to grasp higher-level information from images and to generate compact program-level descriptions.

Subjects: Computer Vision and Pattern Recognition ; Artificial Intelligence ; Graphics

Publish: 2024-10-21 08:28:11 UTC
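To make "drawing tools as executable programs" concrete, here is a minimal sketch of a tiny drawing-program interpreter. The command names (`layer`, `line`, `fill`), the grayscale canvas, and the 8×8 resolution are hypothetical choices for illustration; the abstract does not specify Art2Prog's actual program representation. Lines are rasterized with Bresenham's algorithm, the paint bucket is a 4-connected flood fill, and layers are composited with a normal blend scaled by layer opacity.

```python
from collections import deque

W, H = 8, 8  # hypothetical canvas size


def new_layer():
    # Each pixel holds (value, alpha); alpha 0.0 means transparent.
    return [[(0.0, 0.0) for _ in range(W)] for _ in range(H)]


def draw_line(layer, x0, y0, x1, y1, value):
    # Bresenham's line algorithm on the integer pixel grid.
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        layer[y0][x0] = (value, 1.0)
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def paint_bucket(layer, x, y, value):
    # 4-connected flood fill over pixels equal to the seed pixel's (value, alpha).
    target = layer[y][x]
    if target == (value, 1.0):
        return  # nothing to change; also prevents an endless loop
    queue = deque([(x, y)])
    while queue:
        cx, cy = queue.popleft()
        if 0 <= cx < W and 0 <= cy < H and layer[cy][cx] == target:
            layer[cy][cx] = (value, 1.0)
            queue.extend([(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)])


def composite(layers, background=1.0):
    # Normal blend: each layer goes over the result with alpha scaled by opacity.
    out = [[background] * W for _ in range(H)]
    for layer, opacity in layers:
        for y in range(H):
            for x in range(W):
                v, a = layer[y][x]
                a *= opacity
                out[y][x] = a * v + (1.0 - a) * out[y][x]
    return out


def run(program):
    # Execute commands in order; drawing commands act on the topmost layer.
    layers = []
    for cmd, *args in program:
        if cmd == "layer":
            layers.append((new_layer(), args[0]))
        elif cmd == "line":
            draw_line(layers[-1][0], *args)
        elif cmd == "fill":
            paint_bucket(layers[-1][0], *args)
    return composite(layers)
```

For example, a program that draws a closed black square outline, fills its interior with mid-gray, and adds a half-opacity black line on a second layer is just a list of tuples such as `[("layer", 1.0), ("line", 1, 1, 6, 1, 0.0), ..., ("fill", 3, 3, 0.5), ("layer", 0.5), ("line", 0, 7, 7, 7, 0.0)]`. Because the program, not the raster, is the artwork's description, it can be re-executed at any resolution by rescaling the coordinates.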

#4 Fully Explicit Dynamic Gaussian Splatting

Authors: Junoh Lee ; Chang-Yeon Won ; Hyunjun Jung ; Inhwan Bae ; Hae-Gon Jeon

3D Gaussian Splatting has shown fast and high-quality rendering results in static scenes by leveraging a dense 3D prior and explicit representations. Unfortunately, these benefits do not carry over to novel view synthesis of dynamic motions. Ironically, the main barrier is the very reliance on this prior and representation, which increases training and rendering times when dynamic motions must be modeled. In this paper, we design Explicit 4D Gaussian Splatting (Ex4DGS). Our key idea is to first separate static and dynamic Gaussians during training, and to explicitly sample the positions and rotations of the dynamic Gaussians at sparse timestamps. The sampled positions and rotations are then interpolated to represent spatially and temporally continuous motions of objects in dynamic scenes while reducing computational cost. Additionally, we introduce a progressive training scheme and a point-backtracking technique that improve Ex4DGS's convergence. We initially train Ex4DGS on short timestamps and progressively extend them, which makes it work well even with a few point clouds. Point-backtracking quantifies the cumulative error of each Gaussian over time, enabling the detection and removal of erroneous Gaussians in dynamic scenes. Comprehensive experiments on various scenes demonstrate that our method achieves state-of-the-art rendering quality with fast rendering of 62 fps on a single 2080Ti GPU.

Subjects: Computer Vision and Pattern Recognition ; Graphics

Publish: 2024-10-21 04:25:43 UTC
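The Ex4DGS abstract stores the positions and rotations of dynamic Gaussians only at sparse timestamps and interpolates between them. The abstract does not say which interpolation scheme is used, so the sketch below assumes the simplest common choice: linear interpolation for positions and spherical linear interpolation (slerp) for unit-quaternion rotations. `DynamicGaussianTrack` is a hypothetical container for illustration, not the paper's data structure.

```python
import bisect
import math


def lerp(a, b, t):
    # Component-wise linear interpolation between two tuples.
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))


def slerp(q0, q1, t):
    # Spherical linear interpolation between unit quaternions (w, x, y, z).
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # q and -q are the same rotation; take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:  # nearly parallel: normalized lerp avoids dividing by ~0
        q = lerp(q0, q1, t)
        n = math.sqrt(sum(c * c for c in q))
        return tuple(c / n for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


class DynamicGaussianTrack:
    """Hypothetical keyframe track for one dynamic Gaussian (assumed layout)."""

    def __init__(self, times, positions, rotations):
        self.times = times          # sorted keyframe timestamps
        self.positions = positions  # one (x, y, z) per keyframe
        self.rotations = rotations  # one unit quaternion (w, x, y, z) per keyframe

    def at(self, t):
        # Clamp queries outside the keyframe range to the end keyframes.
        if t <= self.times[0]:
            return self.positions[0], self.rotations[0]
        if t >= self.times[-1]:
            return self.positions[-1], self.rotations[-1]
        # Find the keyframe interval [t0, t1) containing t.
        i = bisect.bisect_right(self.times, t) - 1
        t0, t1 = self.times[i], self.times[i + 1]
        u = (t - t0) / (t1 - t0)
        return (lerp(self.positions[i], self.positions[i + 1], u),
                slerp(self.rotations[i], self.rotations[i + 1], u))
```

A Gaussian moving from (0, 0, 0) to (2, 0, 0) while rotating 90° about z between t = 0 and t = 1 is queried at t = 0.5 as `track.at(0.5)`, which returns the midpoint and a 45° rotation. Only the keyframes are stored, which is where the memory and compute savings over per-frame parameters come from; static Gaussians would need no track at all.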

#5 Deep Learning and Machine Learning -- Object Detection and Semantic Segmentation: From Theory to Applications

Authors: Jintao Ren ; Ziqian Bi ; Qian Niu ; Junyu Liu ; Benji Peng ; Sen Zhang ; Xuanhe Pan ; Jinlang Wang ; Keyu Chen ; Caitlyn Heqi Yin ; Pohsun Feng ; Yizhu Wen ; Tianyang Wang ; Silin Chen ; Ming Li ; Jiawei Xu ; Ming Liu

This book offers an in-depth exploration of object detection and semantic segmentation, combining theoretical foundations with practical applications. It covers state-of-the-art advancements in machine learning and deep learning, with a focus on convolutional neural networks (CNNs), YOLO architectures, and transformer-based approaches like DETR. The book also delves into the integration of artificial intelligence (AI) techniques and large language models for enhanced object detection in complex environments. A thorough discussion of big data analysis is presented, highlighting the importance of data processing, model optimization, and performance evaluation metrics. By bridging the gap between traditional methods and modern deep learning frameworks, this book serves as a comprehensive guide for researchers, data scientists, and engineers aiming to leverage AI-driven methodologies in large-scale object detection tasks.

Subjects: Computer Vision and Pattern Recognition ; Graphics

Publish: 2024-10-21 02:10:49 UTC
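The abstract highlights performance evaluation metrics without naming them. As a general illustration (not taken from the book), intersection over union (IoU) is a standard building block for both tasks: box IoU scores detections, mask IoU scores segmentations, and greedy non-maximum suppression (NMS), a common post-processing step in YOLO-style detectors, uses box IoU to discard duplicate boxes. The 0.5 threshold below is a conventional default, assumed here.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(m1, m2):
    """IoU of two equal-sized binary masks given as nested lists of 0/1."""
    inter = sum(a & b for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))
    union = sum(a | b for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))
    return inter / union if union else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep boxes in descending score order, dropping overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For instance, two 2×2 boxes offset by one unit in each direction overlap in a 1×1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.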

#6 CLIPtortionist: Zero-shot Text-driven Deformation for Manufactured 3D Shapes

Authors: Xianghao Xu ; Srinath Sridhar ; Daniel Ritchie

We propose a zero-shot text-driven 3D shape deformation system that deforms an input 3D mesh of a manufactured object to fit an input text description. To do this, our system optimizes the parameters of a deformation model to maximize an objective function based on the widely used pre-trained vision-language model CLIP. We find that CLIP-based objective functions exhibit many spurious local optima; to circumvent them, we parameterize deformations using a novel deformation model called BoxDefGraph, which our system automatically computes from an input mesh. BoxDefGraph is designed to capture the object-aligned rectangular/circular geometric features of most manufactured objects. We then use the CMA-ES global optimization algorithm to maximize our objective, which we find works better than popular gradient-based optimizers. We demonstrate that our approach produces appealing results and outperforms several baselines.

Subjects: Computer Vision and Pattern Recognition ; Graphics

Publish: 2024-10-19 20:11:11 UTC
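The abstract relies on CMA-ES, a gradient-free method that only needs objective values, which is what lets it cope with a CLIP score full of spurious local optima. Full CMA-ES also adapts a covariance matrix and its step size; the sketch below is deliberately simpler: a plain (μ, λ) evolution strategy with an isotropic Gaussian and a fixed step-size decay. It shows only the sample, rank, and recombine loop that CMA-ES shares. The toy quadratic objective stands in for the CLIP-based score, and all parameter values are assumptions.

```python
import random


def evolution_strategy(objective, x0, sigma=0.5, mu=5, lam=20, iters=100, seed=0):
    """Simplified (mu, lambda) evolution strategy that MAXIMIZES `objective`.

    Not CMA-ES: no covariance adaptation, and sigma decays on a fixed schedule.
    """
    rng = random.Random(seed)
    mean = list(x0)
    best_x, best_f = list(x0), objective(x0)
    for _ in range(iters):
        # Sample lam candidates from an isotropic Gaussian around the mean.
        pop = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean] for _ in range(lam)]
        # Rank by objective value only; no gradients are needed.
        elite = sorted(pop, key=objective, reverse=True)[:mu]
        # Recombine: the new mean is the average of the mu best candidates.
        mean = [sum(c[i] for c in elite) / mu for i in range(len(mean))]
        f = objective(elite[0])
        if f > best_f:
            best_x, best_f = elite[0], f
        sigma *= 0.97  # fixed decay; CMA-ES adapts the step size instead
    return best_x, best_f
```

With a toy objective such as `lambda p: -sum((v - 1.0) ** 2 for v in p)` and a start at `[0.0, 0.0]`, the returned point moves close to (1, 1). In the deformation setting, the vector `p` would hold the deformation-model parameters and the objective would be the CLIP-based score.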

#7 The discrete charm of iterated function systems. A computer scientist's perspective on approximation of IFS invariant sets and measures

Author: Tomasz Martyn

We study invariant sets and measures generated by iterated function systems defined on countable discrete spaces that are uniform grids of finite dimension. Discrete spaces of this type can be considered models of the spaces in which actual numerical computation takes place. In this context, we investigate whether the random iteration algorithm can be applied to approximate these discrete IFS invariant sets and measures. Problems concerning the discretization of hyperbolic IFSs are considered as special cases of this more general setting.

Subjects: Dynamical Systems ; Discrete Mathematics ; Graphics

Publish: 2024-10-19 15:48:17 UTC
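The random iteration algorithm (the "chaos game") referred to in this abstract repeatedly applies a randomly chosen map of the IFS to a single point; the visited points approximate the invariant set, and visit frequencies approximate the invariant measure. The sketch below runs it on a 256×256 integer grid using the three Sierpinski-triangle contractions w_i(p) = (p + v_i) / 2, discretized by rounding back onto the grid. The grid size, vertices, burn-in, and rounding rule (Python's `round`, which rounds halves to even) are all assumptions for illustration, not the paper's construction.

```python
import random
from collections import Counter

N = 256  # hypothetical grid resolution
VERTICES = [(0, 0), (N - 1, 0), ((N - 1) // 2, N - 1)]


def discrete_map(point, vertex):
    # Contraction w_i(p) = (p + v_i) / 2, rounded back onto the integer grid.
    return (round((point[0] + vertex[0]) / 2), round((point[1] + vertex[1]) / 2))


def random_iteration(n_steps, probs=(1 / 3, 1 / 3, 1 / 3), burn_in=100, seed=0):
    rng = random.Random(seed)
    p = (0, 0)
    visits = Counter()
    for step in range(n_steps + burn_in):
        # Pick map i with probability probs[i] and apply it to the current point.
        v = rng.choices(VERTICES, weights=probs)[0]
        p = discrete_map(p, v)
        if step >= burn_in:  # discard the transient before recording visits
            visits[p] += 1
    # The keys approximate the invariant set; counts / n_steps approximate the measure.
    return visits
```

Because every map sends the grid into itself, each orbit is confined to a finite set, which is exactly why the discrete setting differs from the classical one: the rounded system's behavior need not match the real-valued attractor.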

#8 A Cycle Ride to HDR: Semantics Aware Self-Supervised Framework for Unpaired LDR-to-HDR Image Translation

Authors: Hrishav Bakul Barua ; Stefanov Kalin ; Lemuel Lai En Che ; Dhall Abhinav ; Wong KokSheik ; Krishnasamy Ganesh

Low Dynamic Range (LDR) to High Dynamic Range (HDR) image translation is an important computer vision problem. A significant amount of research uses both conventional non-learning methods and modern data-driven approaches, reconstructing HDR images from either single-exposure or multi-exposure LDR inputs. However, most current state-of-the-art methods require high-quality paired {LDR,HDR} datasets for model training. In addition, there is limited literature on using unpaired datasets for this task, in which the model learns a mapping between domains, i.e., LDR to HDR. To address the limitations of current methods, such as the paired-data constraint as well as unwanted blurring and visual artifacts in the reconstructed HDR, we propose a method that uses a modified cycle-consistent adversarial architecture and utilizes unpaired {LDR,HDR} datasets for training. The method introduces novel generators to address visual artifact removal, and an encoder and loss to address semantic consistency, another under-explored topic. The method achieves state-of-the-art results across several benchmark datasets and reconstructs high-quality HDR images.

Subjects: Computer Vision and Pattern Recognition ; Artificial Intelligence ; Graphics ; Machine Learning ; Robotics

Publish: 2024-10-19 11:11:58 UTC
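The key ingredient that lets a cycle-consistent architecture train on unpaired {LDR,HDR} data is the cycle-consistency loss: an LDR image mapped to HDR and back should reproduce itself, and likewise for HDR images. The sketch below shows only this standard CycleGAN-style L1 term; the paper's modified generators, semantic encoder, and additional losses are not represented. Images are flattened lists of intensities, `G` and `F` are hypothetical stand-ins for the two generators, and the weight λ = 10 is the common CycleGAN default, assumed here.

```python
def l1(a, b):
    # Mean absolute difference between two flattened images.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(G, F, ldr_batch, hdr_batch, lam=10.0):
    """G: LDR -> HDR generator, F: HDR -> LDR generator (both hypothetical)."""
    # Forward cycle: LDR -> HDR -> LDR should reproduce the input LDR image.
    forward = sum(l1(F(G(x)), x) for x in ldr_batch) / len(ldr_batch)
    # Backward cycle: HDR -> LDR -> HDR should reproduce the input HDR image.
    backward = sum(l1(G(F(y)), y) for y in hdr_batch) / len(hdr_batch)
    return lam * (forward + backward)
```

Note that `ldr_batch` and `hdr_batch` need not correspond image by image: each cycle compares an image only with its own reconstruction, which is what removes the paired-data requirement. In a full model this term is added to the adversarial losses on each domain.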

#9 SYNOSIS: Image synthesis pipeline for machine vision in metal surface inspection

Authors: Juraj Fulir ; Natascha Jeziorski ; Lovro Bosnar ; Hans Hagen ; Claudia Redenbach ; Petra Gospodnetić ; Tobias Herrfurth ; Marcus Trost ; Thomas Gischkat

The use of machine learning (ML) methods for developing robust and flexible visual inspection systems has shown promise. However, their performance is highly dependent on the amount and diversity of training data. Such data is often limited, not only because of cost but also because of the wide variety of defects and product surfaces, which occur with varying frequency. As such, one cannot guarantee that an acquired dataset contains enough occurrences of defects and product surfaces to develop a robust model. Parametric synthetic dataset generation makes it possible to avoid these issues. In this work, we introduce a complete pipeline that describes in detail how to approach image synthesis for surface inspection: from first acquisition, to texture and defect modeling, data generation, comparison to real data, and finally the use of the synthetic data to train a defect segmentation model. The pipeline is evaluated in detail for milled and sandblasted aluminum surfaces. In addition to providing an in-depth view of each step, a discussion of the chosen methods, and a presentation of ML results, we provide a comprehensive dual dataset containing both real and synthetic images.

Subjects: Computer Vision and Pattern Recognition ; Computational Engineering, Finance, and Science ; Graphics

Publish: 2024-10-18 19:46:12 UTC

#10 A Survey on Computational Solutions for Reconstructing Complete Objects by Reassembling Their Fractured Parts

Authors: Jiaxin Lu ; Yongqing Liang ; Huijun Han ; Jiacheng Hua ; Junfeng Jiang ; Xin Li ; Qixing Huang

Reconstructing a complete object from its parts is a fundamental problem in many scientific domains. The purpose of this article is to provide a systematic survey of this topic. The reassembly problem requires understanding the attributes of individual pieces and establishing matches between different pieces. Many approaches also model priors of the underlying complete object. Existing approaches are tightly connected to the problems of shape segmentation, shape matching, and learning shape priors. We review existing algorithms in this context and emphasize their similarities to and differences from general-purpose approaches. We also survey the trend from early non-deep-learning approaches to more recent deep learning approaches. In addition to algorithms, this survey describes existing datasets, open-source software packages, and applications. To the best of our knowledge, this is the first comprehensive survey on this topic in computer graphics.

Subjects: Computer Vision and Pattern Recognition ; Graphics

Publish: 2024-10-18 17:53:07 UTC