Channel: cs.CV updates on arXiv.org
Browsing latest articles
Browse All 137 View Live

Contrastive Multiple Instance Learning for Weakly Supervised Person ReID

The acquisition of large-scale, precisely labeled datasets for person re-identification (ReID) poses a significant challenge. Weakly supervised ReID has begun to address this issue, although its...

AYDIV: Adaptable Yielding 3D Object Detection via Integrated Contextual...

Combining LiDAR and camera data has shown potential in enhancing short-distance object detection in autonomous driving systems. Yet, the fusion encounters difficulties with extended distance detection...

GBOT: Graph-Based 3D Object Tracking for Augmented Reality-Assisted Assembly...

Guidance for assemblable parts is a promising field for augmented reality. Augmented reality assembly guidance requires 6D object poses of target objects in real time. Especially in time-critical...

A Flow-based Credibility Metric for Safety-critical Pedestrian Detection

Safety is of utmost importance for perception in automated driving (AD). However, a prime safety concern in state-of-the-art object detection is that standard evaluation schemes utilize safety-agnostic...

Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in...

Collaborative perception in automated vehicles leverages the exchange of information between agents, aiming to elevate perception results. Previous camera-based collaborative 3D perception methods...

Complete Instances Mining for Weakly Supervised Instance Segmentation

Weakly supervised instance segmentation (WSIS) using only image-level labels is a challenging task due to the difficulty of aligning coarse annotations with the finer task. However, with the...

Sheet Music Transformer: End-To-End Optical Music Recognition Beyond...

State-of-the-art end-to-end Optical Music Recognition (OMR) has, to date, primarily been carried out using monophonic transcription techniques to handle complex score layouts, such as polyphony, often...

Morse sequences

We introduce the notion of a Morse sequence, which provides a simple and effective approach to discrete Morse theory. A Morse sequence is a sequence composed solely of two elementary operations, that...

TriAug: Out-of-Distribution Detection for Robust Classification of Imbalanced...

Different diseases, such as histological subtypes of breast lesions, have severely varying incidence rates. Even when trained with a substantial amount of in-distribution (ID) data, models often encounter...

An Empirical Study Into What Matters for Calibrating Vision-Language Models

Vision-Language Models (VLMs) have emerged as the dominant approach for zero-shot recognition, adept at handling diverse scenarios and significant distribution changes. However, their deployment in...

A Closer Look at the Robustness of Contrastive Language-Image Pre-Training...

Contrastive Language-Image Pre-training (CLIP) models have demonstrated remarkable generalization capabilities across multiple challenging distribution shifts. However, there is still much to be...

Make it more specific: A novel uncertainty based airway segmentation...

Each medical segmentation task should be considered with a specific AI algorithm based on its scenario so that the most accurate prediction model can be obtained. The most popular algorithms in medical...

Exploring Perceptual Limitation of Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) have recently shown remarkable perceptual capability in answering visual questions; however, little is known about the limits of their perception. In...

Unsupervised Discovery of Object-Centric Neural Fields

We study inferring 3D object-centric scene representations from a single image. While recent methods have shown potential in unsupervised 3D object discovery from simple synthetic images, they fail to...

Real-World Atmospheric Turbulence Correction via Domain Adaptation

Atmospheric turbulence, a common phenomenon in daily life, is primarily caused by the uneven heating of the Earth's surface. This phenomenon results in distorted and blurred acquired images or videos...

SelfSwapper: Self-Supervised Face Swapping via Shape Agnostic Masked AutoEncoder

Face swapping has gained significant attention for its varied applications. The majority of previous face swapping approaches have relied on the seesaw game training scheme, which often leads to the...

Exploring Saliency Bias in Manipulation Detection

The social media-fuelled explosion of fake news and misinformation supported by tampered images has led to growth in the development of models and datasets for image manipulation detection. However,...

Deep Learning for Medical Image Segmentation with Imprecise Annotation

Medical image segmentation (MIS) plays an instrumental role in medical image analysis, where considerable efforts have been devoted to automating the process. Currently, mainstream MIS approaches are...

The Bias of Harmful Label Associations in Vision-Language Models

Despite the remarkable performance of foundation vision-language models, the shared representation space for text and vision can also encode harmful label associations detrimental to fairness. While...

Towards Explainable, Safe Autonomous Driving with Language Embeddings for...

This research explores the integration of language embeddings for active learning in autonomous driving datasets, with a focus on novelty detection. Novelty arises from unexpected scenarios that...

BioNeRF: Biologically Plausible Neural Radiance Fields for View Synthesis

This paper presents BioNeRF, a biologically plausible architecture that models scenes in a 3D representation and synthesizes new views through radiance fields. Since NeRF relies on the network weights...

LISR: Learning Linear 3D Implicit Surface Representation Using Compactly...

Implicit 3D surface reconstruction of an object from its partial and noisy 3D point cloud scan is a classical geometry processing and 3D computer vision problem. In the literature, various 3D shape...

Open-ended VQA benchmarking of Vision-Language models by exploiting...

The evaluation of text-generative vision-language models is a challenging yet crucial endeavor. By addressing the limitations of existing Visual Question Answering (VQA) benchmarks and proposing...

Trade-off Between Spatial and Angular Resolution in Facial Recognition

Ensuring robustness in face recognition systems across various challenging conditions is crucial for their versatility. State-of-the-art methods often incorporate additional information, such as depth,...

Data Quality Aware Approaches for Addressing Model Drift of Semantic...

In the midst of the rapid integration of artificial intelligence (AI) into real world applications, one pressing challenge we confront is the phenomenon of model drift, wherein the performance of AI...

PIVOT-Net: Heterogeneous Point-Voxel-Tree-based Framework for Point Cloud...

The universality of the point cloud format enables many 3D applications, making the compression of point clouds a critical phase in practice. Sampled as discrete 3D points, a point cloud approximates...

A novel spatial-frequency domain network for zero-shot incremental learning

Zero-shot incremental learning aims to enable the model to generalize to new classes without forgetting previously learned classes. However, the semantic gap between old and new sample classes can lead...

GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided...

We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation. We first utilize large language models (LLMs) to generate the initial layout...

Outlier-Aware Training for Low-Bit Quantization of Structural...

Lightweight design of Convolutional Neural Networks (CNNs) requires co-design efforts in the model architectures and compression techniques. As a novel design paradigm that separates training and...

3D Gaussian as a New Vision Era: A Survey

3D Gaussian Splatting (3D-GS) has emerged as a significant advancement in the field of Computer Graphics, offering explicit scene representation and novel view synthesis without the reliance on neural...

INSITE: labelling medical images using submodular functions and...

The necessity of large amounts of labeled data to train deep models, especially in medical imaging, creates an implementation bottleneck in resource-constrained settings. In Insite (labelINg medical...

Two-Stage Multi-task Self-Supervised Learning for Medical Image Segmentation

Medical image segmentation has been significantly advanced by deep learning (DL) techniques, though the data scarcity inherent in medical applications poses a great challenge to DL-based segmentation...

A Benchmark for Multi-modal Foundation Models on Low-level Vision: from...

The rapid development of Multi-modality Large Language Models (MLLMs) has driven a paradigm shift in computer vision, moving towards versatile foundational models. However, evaluating MLLMs in...

A Highlight Removal Method for Capsule Endoscopy Images

The images captured by Wireless Capsule Endoscopy (WCE) always exhibit specular reflections, and removing highlights while preserving the color and texture in the region remains a challenge. To address...

Domain Adaptable Fine-Tune Distillation Framework For Advancing Farm...

In this study, we propose an automated framework for camel farm monitoring, introducing two key contributions: the Unified Auto-Annotation framework and the Fine-Tune Distillation framework. The...

A Change Detection Reality Check

In recent years, there has been an explosion of proposed change detection deep learning architectures in the remote sensing literature. These approaches claim to offer state-of-the-art performance on...

Reciprocal Visibility

We propose a guidance strategy to optimize real-time synthetic aperture sampling for occlusion removal with drones by pre-scanned point-cloud data. Depth information can be used to compute visibility...

OSSAR: Towards Open-Set Surgical Activity Recognition in Robot-assisted Surgery

In the realm of automated robotic surgery and computer-assisted interventions, understanding robotic surgical activities stands paramount. Existing algorithms dedicated to surgical activity recognition...

Treatment-wise Glioblastoma Survival Inference with Multi-parametric...

In this work, we aim to predict the survival time (ST) of glioblastoma (GBM) patients undergoing different treatments based on preoperative magnetic resonance (MR) scans. The personalized and precise...

Synthesizing CTA Image Data for Type-B Aortic Dissection using Stable...

Stable Diffusion (SD) has gained a lot of attention in recent years in the field of generative AI, helping to synthesize medical imaging data with distinct features. The aim is to contribute to...

Semantic Object-level Modeling for Robust Visual Camera Relocalization

Visual relocalization is crucial for autonomous visual localization and navigation of mobile robotics. Due to the improvement of CNN-based object detection algorithms, the robustness of visual...

Latent Enhancing AutoEncoder for Occluded Image Classification

Large occlusions result in a significant decline in image classification accuracy. During inference, diverse types of unseen occlusions introduce out-of-distribution data to the classification model,...

Gyroscope-Assisted Motion Deblurring Network

Image research has devoted substantial attention to deblurring networks in recent years. Yet, their practical usage in real-world deblurring, especially motion blur, remains limited due to the lack of...

Neural Rendering based Urban Scene Reconstruction for Autonomous Driving

Dense 3D reconstruction has many applications in automated driving including automated annotation validation, multimodal data augmentation, providing ground truth annotations for systems lacking LiDAR,...

Domain Adaptation Using Pseudo Labels

In the absence of labeled target data, unsupervised domain adaptation approaches seek to align the marginal distributions of the source and target domains in order to train a classifier for the target....

Event-to-Video Conversion for Overhead Object Detection

Collecting overhead imagery using an event camera is desirable due to the energy efficiency of the image sensor compared to standard cameras. However, event cameras complicate downstream image...

Fingerprinting New York City's Scaffolding Problem with Longitudinal Dashcam...

Scaffolds, also called sidewalk sheds, are intended to be temporary structures to protect pedestrians from construction and repair hazards. However, some sidewalk sheds are left up for years. Long-term...

Is it safe to cross? Interpretable Risk Assessment with GPT-4V for...

Safely navigating street intersections is a complex challenge for blind and low-vision individuals, as it requires a nuanced understanding of the surrounding context - a task heavily reliant on visual...

Transfer learning with generative models for object detection on limited...

The availability of data is limited in some fields, especially for object detection tasks, where it is necessary to have correctly labeled bounding boxes around each object. A notable example of such...

Oriented-grid Encoder for 3D Implicit Representations

Encoding 3D points is one of the primary steps in learning-based implicit scene representation. Using features that gather information from neighbors with multi-resolution grids has proven to be the...
