Diversely-Supervised Visual Product Search

Abstract

This paper strives for diversely-supervised visual product search, where queries specify a diverse set of labels to search for. Where previous works have focused on representing attribute, instance, or category labels individually, we consider them together to create a diverse set of labels for visually describing products. We learn an embedding from the supervisory signal provided by every label to encode their interrelationships. Once trained, every label has a corresponding visual representation in the embedding space, obtained by aggregating selected items from the training set. At search time, composite query representations retrieve images that match a specific set of diverse labels. We form a composite query representation by averaging the aggregated representations of each label in the set. For evaluation, we extend existing product datasets of cars and clothes with a diverse set of labels. Experiments show the benefits of our embedding for diversely-supervised visual product search on seen and unseen product combinations, and for discovering product design styles.
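The composite-query step lends itself to a short sketch. Assuming each label already has an aggregated visual representation in the shared embedding space and each image a representation in that same space, the snippet below forms a query by averaging label vectors and ranks images by cosine similarity. All names (`label_embeddings`, `image_embeddings`, the example labels) are illustrative placeholders, not taken from the paper's code.

```python
# Minimal sketch of composite-query retrieval: average the aggregated
# embeddings of the queried labels, then rank images by cosine similarity.
import numpy as np

def composite_query(label_embeddings: dict, query_labels: list) -> np.ndarray:
    """Average the aggregated embeddings of the queried labels."""
    vectors = np.stack([label_embeddings[l] for l in query_labels])
    return vectors.mean(axis=0)

def search(query: np.ndarray, image_embeddings: np.ndarray, top_k: int = 10):
    """Rank images by cosine similarity to the composite query."""
    q = query / np.linalg.norm(query)
    imgs = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    scores = imgs @ q
    return np.argsort(-scores)[:top_k]

# Example usage with random stand-in embeddings and hypothetical labels.
rng = np.random.default_rng(0)
label_embeddings = {l: rng.normal(size=64) for l in ["suv", "red", "bmw"]}
image_embeddings = rng.normal(size=(1000, 64))
top = search(composite_query(label_embeddings, ["suv", "red", "bmw"]),
             image_embeddings)
```

Averaging keeps the query inside the same embedding space as the individual labels, which is what allows unseen label combinations to be queried without retraining.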

Publication
ACM Transactions on Multimedia Computing, Communications, and Applications, 2022