Video analysis and understanding

Towards Open-Vocabulary Video Instance Segmentation

Video Instance Segmentation (VIS) aims to segment and categorize objects in videos from a closed set of training categories, and thus lacks the ability to generalize to novel categories in real-world videos. To address this limitation, we make …
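The open-vocabulary setting is commonly realized by matching instance features against text embeddings of category names, so novel categories can be added at test time without retraining. The sketch below illustrates that general idea with toy embeddings; it is not the paper's method, and all names and values are hypothetical.

```python
import numpy as np

def open_vocab_classify(mask_features, text_features, category_names):
    """Assign each segmented instance the category whose text embedding
    is most similar (cosine similarity). Novel categories are handled
    simply by extending the text-embedding list at test time."""
    m = mask_features / np.linalg.norm(mask_features, axis=1, keepdims=True)
    t = text_features / np.linalg.norm(text_features, axis=1, keepdims=True)
    sims = m @ t.T                      # (instances, categories)
    return [category_names[i] for i in sims.argmax(axis=1)]

# Toy embeddings: two instances, three (possibly novel) categories.
masks = np.array([[1.0, 0.1, 0.0], [0.0, 0.9, 0.4]])
texts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(open_vocab_classify(masks, texts, ["dog", "skateboard", "drone"]))
```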

Multi-Task Edge Prediction in Temporally-Dynamic Video Graphs

Graph neural networks have been shown to learn effective node representations, enabling node-, link-, and graph-level inference. Conventional graph networks assume static relations between nodes, while relations between entities in a video often evolve …
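Edge prediction over a temporally-dynamic graph can be pictured as scoring every node pair at every timestep from time-varying node features, so that the same pair of entities may be linked in one frame and not the next. A minimal sketch of that framing (not the paper's model; shapes and values are illustrative):

```python
import numpy as np

def edge_scores(node_feats):
    """Score every node pair at every timestep from time-varying node
    features. node_feats has shape (T, N, D); returns (T, N, N) edge
    probabilities via a sigmoid over pairwise inner products."""
    logits = np.einsum('tnd,tmd->tnm', node_feats, node_feats)
    return 1.0 / (1.0 + np.exp(-logits))

# Two frames, two nodes: the nodes' features align in frame 0 and
# become orthogonal in frame 1, so their edge score drops over time.
feats = np.array([
    [[1.0, 0.0], [1.0, 0.0]],   # frame 0: similar features
    [[1.0, 0.0], [0.0, 1.0]],   # frame 1: dissimilar features
])
s = edge_scores(feats)
print(s[0, 0, 1], s[1, 0, 1])   # edge (0, 1) weakens from frame 0 to 1
```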

Teaching a New Dog Old Tricks: Contrastive Random Walks in Videos with Unsupervised Priors

This work proposes codebook encodings for graph networks that operate on hyperbolic manifolds. Where graph networks commonly learn node representations in Euclidean space, recent work has provided a generalization to Riemannian manifolds, with a …
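A codebook on a hyperbolic manifold amounts to assigning each node embedding to its nearest code under the hyperbolic rather than the Euclidean metric. The sketch below uses the standard Poincaré-ball distance for a hard nearest-code assignment; it illustrates the general construction, not the paper's specific encoding.

```python
import numpy as np

def poincare_dist(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball model of hyperbolic
    space (all points must have Euclidean norm < 1)."""
    sq = np.sum((u - v) ** 2, axis=-1)
    du = 1.0 - np.sum(u ** 2, axis=-1)
    dv = 1.0 - np.sum(v ** 2, axis=-1)
    return np.arccosh(1.0 + 2.0 * sq / (du * dv + eps))

def assign_codes(nodes, codebook):
    """Hard-assign each node embedding to its nearest codebook entry
    under the hyperbolic (not Euclidean) metric."""
    d = np.stack([poincare_dist(nodes, c) for c in codebook], axis=1)
    return d.argmin(axis=1)

# Two node embeddings, two codebook entries inside the unit ball.
nodes = np.array([[0.1, 0.0], [0.0, 0.8]])
codebook = np.array([[0.2, 0.0], [0.0, 0.7]])
print(assign_codes(nodes, codebook))
```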

Less than Few: Self-Shot Video Instance Segmentation

The goal of this paper is to bypass the need for labelled examples in few-shot video understanding at run time. While few-shot learning has proven effective, even labelling a few examples appears unrealistic in many practical video settings. This is especially true as the …

How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs

We aim to understand how actions are performed and identify subtle differences, such as ‘fold firmly’ vs. ‘fold gently’. To this end, we propose a method which recognizes adverbs across different actions. However, such fine-grained annotations are …

Variational Abnormal Behavior Detection with Motion Consistency

Abnormal crowd behavior detection has recently attracted increasing attention due to its wide applicability in computer vision. However, it remains an extremely challenging task due to the great variability of abnormal behavior coupled …

Learning Hierarchical Embedding for Video Instance Segmentation

In this paper, we address video instance segmentation using a new generative model that learns effective representations of the target and background appearance. We propose to exploit hierarchical structural embedding over spatio-temporal space, …

Motion-Augmented Self-Training for Video Recognition at Smaller Scale

The goal of this paper is to self-train a 3D convolutional neural network on an unlabeled video collection for deployment on small-scale video collections. As smaller video datasets benefit more from motion than appearance, we strive to train our …
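Self-training of this kind typically follows a pseudo-labeling loop: a teacher scores unlabeled clips, only sufficiently confident predictions are kept, and the argmax class becomes the student's training target. A generic sketch of that selection step (the threshold and outputs are illustrative, not the paper's configuration):

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.8):
    """Generic self-training step: keep only unlabeled clips whose
    teacher prediction is confident enough, and use the argmax class
    as the pseudo-label for the student."""
    conf = probs.max(axis=1)
    keep = conf >= threshold
    return np.flatnonzero(keep), probs.argmax(axis=1)[keep]

# Teacher softmax outputs for four unlabeled clips.
probs = np.array([
    [0.95, 0.03, 0.02],   # confident  -> kept
    [0.40, 0.35, 0.25],   # uncertain  -> discarded
    [0.10, 0.85, 0.05],   # confident  -> kept
    [0.33, 0.33, 0.34],   # uncertain  -> discarded
])
idx, labels = select_pseudo_labels(probs)
print(idx, labels)   # clips 0 and 2 survive, with classes 0 and 1
```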

Few-Shot Transformation of Common Actions into Time and Space

This paper introduces the task of few-shot common action localization in time and space. Given a few trimmed support videos containing the same but unknown action, we strive for spatio-temporal localization of that action in a long untrimmed query …

On Semantic Similarity in Video Retrieval

Current video retrieval efforts all base their evaluation on an instance-based assumption: that only a single caption is relevant to a query video and vice versa. We demonstrate that this assumption results in performance comparisons often not …
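The contrast between the instance-based assumption and a semantic-similarity view can be made concrete with two standard metrics: recall@1, which credits only the single paired caption, versus nDCG with graded relevance, which also credits near-paraphrases. A toy sketch under assumed relevance scores (not the paper's proposed evaluation):

```python
import numpy as np

def recall_at_1(ranking, true_idx):
    """Instance-based view: only the single paired caption is relevant."""
    return float(ranking[0] == true_idx)

def dcg(rels):
    return float(np.sum((2.0 ** rels - 1) / np.log2(np.arange(2, len(rels) + 2))))

def ndcg(ranking, relevance):
    """Graded view: every caption carries a semantic relevance score."""
    ideal = np.sort(relevance)[::-1]
    return dcg(relevance[ranking]) / dcg(ideal)

# One query video, four candidate captions. Caption 0 is the "true"
# paired caption, but caption 1 is a near-paraphrase (relevance 0.9).
relevance = np.array([1.0, 0.9, 0.1, 0.0])
ranking = np.array([1, 0, 2, 3])   # system ranks the paraphrase first

print(recall_at_1(ranking, true_idx=0))    # scored as a complete miss
print(round(ndcg(ranking, relevance), 3))  # graded view: near-perfect
```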