How humans perceive and understand real-world scenes is a long-standing question in neuroscience, cognitive psychology, and artificial intelligence. Initially, it was thought that scenes are constructed and represented by their component objects. An …