Yuki M. Asano | VIS Lab

Latest

Scaling Backwards: Minimal Synthetic Pretraining?
GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features
Object-Centric Diffusion for Efficient Video Editing
SIGMA: Sinkhorn-Guided Masked Video Modeling
Learning to Count without Annotations
Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video
VeRA: Vector-based Random Matrix Adaptation
Self-Ordering Point Clouds
Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations
Self-Guided Diffusion Models
BISCUIT: Causal Representation Learning from Binary Interactions
The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image
Less than Few: Self-Shot Video Instance Segmentation
VTC: Improving Video-Text Retrieval with User Comments
CITRIS: Causal Identifiability from Temporal Intervened Sequences
Self-Supervised Learning of Object Parts for Semantic Segmentation
Self-supervised object detection from audio-visual correspondence
Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing
Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models
Keeping Your Eye On the Ball: Trajectory Attention in Video Transformers
PASS: An ImageNet replacement for self-supervised pretraining without humans
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
On Compositions of Transformations in Contrastive Self-Supervised Learning
Emergent inequality and business cycles in a simple behavioral macroeconomic model
Support-set bottlenecks for video-text representation learning