CLEVR’s power lies not only in its scenes and questions but also in its functional program language. This symbolic language encodes each question as an executable program, giving every natural-language question a precise, controllable counterpart.


A compositional grammar with formal semantics

Every CLEVR question is paired with a functional program: a composition of operators evaluated over a structured scene representation. These operators behave like pure functions: they take inputs (object sets, single objects, attribute values, relations) and return object sets, attribute values, integers, or truth values.

For example, filter_color(red, scene()) returns all red objects in the scene. Wrapping it in count() gives count(filter_color(red, scene())), which corresponds to the question: “How many red objects are there?”
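The composition above can be sketched in a few lines of Python. This is a minimal illustration, not CLEVR's own code: the three-object SCENE and the representation of objects as dictionaries are assumptions made for the example.

```python
# Toy scene: each object is a dict of attributes.
# This is an assumed simplification, not CLEVR's actual scene format.
SCENE = [
    {"color": "red", "shape": "cube", "size": "large", "material": "metal"},
    {"color": "blue", "shape": "sphere", "size": "small", "material": "rubber"},
    {"color": "red", "shape": "cylinder", "size": "small", "material": "rubber"},
]

def scene():
    """Return every object in the scene."""
    return list(SCENE)

def filter_color(color, objects):
    """Keep only the objects whose color matches; the input is not modified."""
    return [obj for obj in objects if obj["color"] == color]

def count(objects):
    """Return the number of objects in the set."""
    return len(objects)

# "How many red objects are there?"
answer = count(filter_color("red", scene()))
print(answer)  # prints 2
```

Because each function returns a new value and never mutates its input, any sub-expression can be evaluated and inspected on its own.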

Structural parallel to natural language

CLEVR’s language was designed to map directly and transparently to English syntax. Consider the following program:

Functional program:
equal_size(
    query_size(unique(filter_shape(cube, scene()))),
    query_size(unique(filter_shape(sphere, scene()))))

Generated question:
“Is the cube the same size as the sphere?”

This mapping gives each sentence a precise symbolic trace. Each content phrase corresponds to a step in the program, so an incorrect answer can be traced to the specific operation that failed (for example, a filter, a relation, or a comparison).
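To make the idea of a symbolic trace concrete, here is a small sketch that executes the size-comparison question step by step. It writes the program out in full, with CLEVR's unique step (narrow a set to one object) and query_size step (read that object's size) before the comparison. The two-object SCENE and the traced decorator are hypothetical helpers introduced only for this illustration.

```python
# Toy scene (an assumption for this example).
SCENE = [
    {"shape": "cube", "size": "large"},
    {"shape": "sphere", "size": "small"},
]

trace = []  # (operator name, result) pairs, in execution order

def traced(fn):
    """Hypothetical helper: record every operator call and its result."""
    def wrapper(*args):
        result = fn(*args)
        trace.append((fn.__name__, result))
        return result
    return wrapper

@traced
def scene():
    return list(SCENE)

@traced
def filter_shape(shape, objects):
    return [o for o in objects if o["shape"] == shape]

@traced
def unique(objects):
    # Narrow a set to its single member; an ambiguous reference is an error.
    if len(objects) != 1:
        raise ValueError(f"expected exactly one object, got {len(objects)}")
    return objects[0]

@traced
def query_size(obj):
    return obj["size"]

@traced
def equal_size(a, b):
    return a == b

# "Is the cube the same size as the sphere?"
answer = equal_size(
    query_size(unique(filter_shape("cube", scene()))),
    query_size(unique(filter_shape("sphere", scene()))),
)

for name, result in trace:
    print(f"{name:>12} -> {result}")
```

Reading the trace from top to bottom shows which intermediate value first diverges from the expected one, which is what makes step-level error analysis possible.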

Core CLEVR operators
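  • filter_color, filter_shape, filter_size, filter_material: filter an object set by attribute
  • relate: select objects by spatial relation (left, right, front, behind) to a reference object
  • query_*: extract a property of a single object (color, size, shape, material)
  • count: count the objects remaining after filtering
  • equal_*, compare_integer: compare attributes or quantities (the integer comparisons are equal_integer, less_than, and greater_than)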
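Of these operators, relate is the least obvious, so a rough sketch may help. CLEVR derives spatial relations from the rendered camera viewpoint; the version below instead compares raw coordinates, which is a simplifying assumption. The SCENE data, the axis convention, and the RELATIONS table are all hypothetical.

```python
# Toy scene. Assumed convention: x grows to the right, y grows toward the back.
SCENE = [
    {"id": 0, "shape": "cube", "x": -2.0, "y": 0.0},
    {"id": 1, "shape": "sphere", "x": 0.0, "y": 1.5},
    {"id": 2, "shape": "cylinder", "x": 2.0, "y": -1.0},
]

# One predicate per relation: is `other` in this direction from `ref`?
RELATIONS = {
    "left": lambda ref, other: other["x"] < ref["x"],
    "right": lambda ref, other: other["x"] > ref["x"],
    "front": lambda ref, other: other["y"] < ref["y"],
    "behind": lambda ref, other: other["y"] > ref["y"],
}

def relate(relation, ref, objects):
    """Return the objects standing in `relation` to the reference object."""
    holds = RELATIONS[relation]
    return [o for o in objects if o is not ref and holds(ref, o)]

def query_shape(obj):
    """Return the shape of a single object."""
    return obj["shape"]

sphere = SCENE[1]

# "What shape is the object left of the sphere?"
left_of_sphere = relate("left", sphere, SCENE)
print([query_shape(o) for o in left_of_sphere])  # prints ['cube']
```

The output of relate is again an object set, so it can be passed on to any filter, to count, or (once narrowed to one object) to a query_* operator.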

Cognitive benefits

This symbolic approach eliminates ambiguity and makes semantic complexity explicit. It supports exact supervision, structured evaluation, and controlled linguistic variation for generalization testing.

Accepted values for CLEVR operators

The following table summarizes valid inputs for CLEVR’s most frequent operators. These define the full semantic universe of the synthetic scenes.

Operator | Accepted values | Description
---------|-----------------|------------
filter_color | red, blue, green, gray, brown, purple, cyan, yellow | Filters objects by color. Eight possible values.
filter_shape | cube, sphere, cylinder | Filters by geometric shape.
filter_size | small, large | Filters by relative object scale.
filter_material | rubber, metal | Filters by material (matte rubber vs. shiny metal).
relate | left, right, front, behind | Selects objects in spatial relation to a reference.
query_color, query_size, etc. | Inherits from attribute domain | Returns a property (e.g., color) of a single filtered object.
equal_*, compare_integer | Two values of the same type (binary comparison) | Returns true/false based on equality or magnitude comparison.
count | An object set | Returns an integer: the number of objects matching a filter chain.
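The value domains listed above are small enough to encode directly. The following sketch stores them as sets and checks operator arguments against them; the DOMAINS dictionary and the validate function are illustrative names, not part of CLEVR's tooling.

```python
from math import prod

# Accepted values per operator, as listed above.
DOMAINS = {
    "filter_color": {"red", "blue", "green", "gray", "brown", "purple", "cyan", "yellow"},
    "filter_shape": {"cube", "sphere", "cylinder"},
    "filter_size": {"small", "large"},
    "filter_material": {"rubber", "metal"},
    "relate": {"left", "right", "front", "behind"},
}

def validate(operator, value):
    """Return `value` unchanged, or raise if it is outside the operator's domain."""
    if operator not in DOMAINS:
        raise KeyError(f"no value domain recorded for {operator!r}")
    if value not in DOMAINS[operator]:
        allowed = ", ".join(sorted(DOMAINS[operator]))
        raise ValueError(f"{operator} does not accept {value!r}; allowed: {allowed}")
    return value

validate("filter_color", "cyan")   # accepted
validate("relate", "behind")       # accepted

# Number of distinct attribute combinations an object can have:
# 8 colors * 3 shapes * 2 sizes * 2 materials.
attribute_ops = ["filter_color", "filter_shape", "filter_size", "filter_material"]
combinations = prod(len(DOMAINS[op]) for op in attribute_ops)
print(combinations)  # prints 96
```

A call such as validate("filter_color", "pink") raises a ValueError, which is one simple way to reject malformed programs before they are executed.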

CLEVR’s functional language acts as an intermediate formalism: structured like code, expressive like language.