Poster at VSS 2022

VSS is the annual conference of the Vision Sciences Society, held in St. Pete Beach, Florida, USA. This poster is an opportunity to present my personal database, the associated deep learning model, and preliminary experimental results on the perception of compositional similarity.

Abstract

Pictorial composition (the structural organization of graphical elements) is typically characterized by qualitative rules and heuristics; although informative, these tools do not support quantitative measures of global similarity or interaction among its constituent elements. The sequential, non-stationary nature of the compositional process, together with the complex and evolving definition of its underlying functional units, coalesces into a perceptual phenomenon that cannot be readily modeled through pixel-based approaches such as CNNs.

We adopt a different strategy, built around a parametric definition of stroke execution and two hierarchically nested RNN-VAEs, which lets our network tackle art material by aligning its behavior with the artistic gesture. More specifically, this network architecture extracts compositional regularities by compressing inputs to a reduced number of independent dimensions; within this framework, visual stimuli project to a continuous space that permits quantitative investigation of the relevant perceptual mechanisms. Our neural network is trained on >5k small abstract vector compositions created by the first author over years of compositional efforts. Although this dataset is large for a single artist, its scale remains relatively small for training large networks. We address this issue by introducing constraints that support a compact representation that is both cohesive and expressive.
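
As a generic illustration of how a VAE compresses inputs to a small number of independent latent dimensions, the sketch below shows the standard Gaussian reparameterization step and the KL term that pulls each dimension toward a unit normal prior. The abstract does not state the training objective or the latent size, so the function names, the three-dimensional latent vector, and the example values are all assumptions made for illustration, not the poster's actual model.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

rng = random.Random(0)
mu = [0.5, -1.0, 0.0]        # hypothetical encoder means
log_var = [0.0, -2.0, 0.0]   # hypothetical encoder log-variances
z = reparameterize(mu, log_var, rng)     # one latent sample
kl = kl_to_standard_normal(mu, log_var)  # regularization penalty for this input
```

The KL term is zero only when the encoder output matches the prior exactly, which is one common way to encourage a compact, well-behaved latent space when data are scarce.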

We then study the smoothness and continuity of the resulting latent space by measuring the perceptual scale of sample similarities generated by human participants. To avoid the curse of dimensionality, we restricted exploration to circular slices of a hypersphere by extending MLDS methods to cyclic 'physical' spaces. To make the most of the number of trials per participant, we demonstrate that adequate empirical exploration can be restricted to local pairs of normal triplet comparisons. Our approach serves to validate a novel modeling framework for pictorial composition, alongside psychophysical tools for measuring the quality of its representation and associated metrics.
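
One way to picture a circular slice of a hypersphere is to pick two orthonormal directions in latent space and sample evenly spaced angles in the plane they span. The sketch below does this and then lists triplets of neighbouring samples that wrap around the circle. The latent dimension, radius, number of samples, and the reading of "local" as "adjacent on the circle" are assumptions made for illustration; the sampling scheme actually used for the poster may differ.

```python
import math
import random

def orthonormal_pair(dim, rng):
    """Two random orthonormal directions in R^dim (Gram-Schmidt on Gaussian draws)."""
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm_u = math.sqrt(sum(x * x for x in u))
    u = [x / norm_u for x in u]
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dot = sum(a * b for a, b in zip(u, v))
    v = [b - dot * a for a, b in zip(u, v)]  # remove the component along u
    norm_v = math.sqrt(sum(x * x for x in v))
    v = [x / norm_v for x in v]
    return u, v

def circle_slice(u, v, radius, n_points):
    """Latent vectors evenly spaced on the circle radius * (cos(t) * u + sin(t) * v)."""
    points = []
    for k in range(n_points):
        t = 2.0 * math.pi * k / n_points
        points.append([radius * (math.cos(t) * a + math.sin(t) * b) for a, b in zip(u, v)])
    return points

def local_triplets(n_points):
    """Index triplets (k, k+1, k+2) of neighbouring samples, wrapping around the circle."""
    return [(k, (k + 1) % n_points, (k + 2) % n_points) for k in range(n_points)]

rng = random.Random(0)
u, v = orthonormal_pair(dim=8, rng=rng)               # hypothetical 8-D latent space
points = circle_slice(u, v, radius=1.0, n_points=12)  # 12 stimuli on one circular slice
triplets = local_triplets(len(points))                # 12 local triplets instead of all 220
```

Restricting trials to neighbouring triplets makes the trial count grow linearly with the number of samples, whereas exhaustive triplets grow cubically.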

Resampled Animations

[Animations: Circle a, Circle b, Circle c; Norm a, Norm b, Norm c]