CLEVR’s power lies not only in its scenes and questions but also in its functional language. This symbolic language encodes each question as an executable logic program, giving every natural-language question a precise, controllable counterpart.
A compositional grammar with formal semantics
Every CLEVR question is generated from a sequence of functional operators acting over a structured scene graph. These operators behave like pure functions: they take inputs (object sets, attribute values, relation names) and return object sets, attribute values, integers, or truth values.
For example, filter_color(red, scene()) returns all red objects. When combined with count(), it produces a question like:
“How many red objects are there?”
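The composition above can be sketched in a few lines of Python. This is a minimal illustration, not CLEVR's own implementation: the toy `SCENE` list and its dictionary layout are assumptions made for the example, and only the operator names come from the text.

```python
# Hypothetical toy scene: each object is a dict of CLEVR-style attributes.
SCENE = [
    {"color": "red", "shape": "cube", "size": "large", "material": "metal"},
    {"color": "red", "shape": "sphere", "size": "small", "material": "rubber"},
    {"color": "blue", "shape": "cylinder", "size": "small", "material": "rubber"},
]


def scene():
    """Return every object in the scene as a fresh list."""
    return list(SCENE)


def filter_color(color, objects):
    """Keep only the objects whose color matches."""
    return [obj for obj in objects if obj["color"] == color]


def count(objects):
    """Return the number of objects in the set."""
    return len(objects)


# "How many red objects are there?"
answer = count(filter_color("red", scene()))
print(answer)  # 2
```

Because each operator is a pure function of its inputs, the same building blocks can be recombined into other questions without changing any of them.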
Structural parallel to natural language
CLEVR’s language was designed to map directly and transparently to English syntax. Consider the following program:
equal_size(query_size(unique(filter_shape(cube, scene()))), query_size(unique(filter_shape(sphere, scene()))))
Generated question:
“Is the cube the same size as the sphere?”
This mapping enables each sentence to have a precise, symbolic trace. Every word and phrase corresponds to a logic step, and errors can be analyzed with fine semantic granularity.
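To make the idea of a symbolic trace concrete, here is a rough sketch of an executor that records the output of every step for the cube/sphere comparison, with the intermediate `unique` and `query_size` steps written out explicitly. The step format (`function`, `inputs`, `value`), the `run_with_trace` helper, and the two-object scene are hypothetical choices for this example rather than the dataset's actual schema.

```python
def run_with_trace(program, scene_objects):
    """Run a linear program and return (final result, [(function, output), ...])."""
    results, trace = [], []
    for step in program:
        fn = step["function"]
        # Each input is the index of an earlier step whose output is consumed.
        args = [results[i] for i in step.get("inputs", [])]
        value = step.get("value")
        if fn == "scene":
            out = list(scene_objects)
        elif fn == "filter_shape":
            out = [obj for obj in args[0] if obj["shape"] == value]
        elif fn == "unique":
            if len(args[0]) != 1:
                raise ValueError(f"unique expected 1 object, got {len(args[0])}")
            out = args[0][0]
        elif fn == "query_size":
            out = args[0]["size"]
        elif fn == "equal_size":
            out = args[0] == args[1]
        else:
            raise ValueError(f"unsupported function: {fn}")
        results.append(out)
        trace.append((fn, out))
    return results[-1], trace


# Hypothetical scene with one cube and one sphere.
toy_scene = [
    {"shape": "cube", "size": "large"},
    {"shape": "sphere", "size": "small"},
]

# "Is the cube the same size as the sphere?"
program = [
    {"function": "scene"},                                        # step 0
    {"function": "filter_shape", "inputs": [0], "value": "cube"},   # step 1
    {"function": "unique", "inputs": [1]},                        # step 2
    {"function": "query_size", "inputs": [2]},                    # step 3
    {"function": "filter_shape", "inputs": [0], "value": "sphere"}, # step 4
    {"function": "unique", "inputs": [4]},                        # step 5
    {"function": "query_size", "inputs": [5]},                    # step 6
    {"function": "equal_size", "inputs": [3, 6]},                 # step 7
]

answer, trace = run_with_trace(program, toy_scene)
for index, (fn, out) in enumerate(trace):
    print(index, fn, out)
```

If a model answers this question incorrectly, comparing its intermediate outputs against such a trace shows which step went wrong, for instance a failed shape filter versus a wrong size lookup.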
Core CLEVR operators
- filter_color, filter_shape, filter_size: filter by attributes
- relate: spatial relations (left, right, front, behind)
- query_*: extract properties (color, size, shape, material)
- count: count objects after filtering
- equal_*, compare_integer: compare quantities or attributes
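Of the operators listed above, relate is the least obvious, so a small sketch may help. It assumes each object carries hypothetical `x` and `y` coordinates, with smaller `x` meaning further left and smaller `y` meaning closer to the camera. That convention, the coordinates, and the helper signatures are assumptions for illustration, not how CLEVR scenes store spatial relations.

```python
def relate(reference, relation, objects):
    """Return the objects that stand in `relation` to `reference`."""
    checks = {
        "left": lambda obj: obj["x"] < reference["x"],
        "right": lambda obj: obj["x"] > reference["x"],
        "front": lambda obj: obj["y"] < reference["y"],
        "behind": lambda obj: obj["y"] > reference["y"],
    }
    if relation not in checks:
        raise ValueError(f"unknown relation: {relation}")
    test = checks[relation]
    # The reference object is never related to itself.
    return [obj for obj in objects if obj is not reference and test(obj)]


def unique(objects):
    """Return the single object in a set, or fail if the set is ambiguous."""
    if len(objects) != 1:
        raise ValueError(f"expected exactly one object, got {len(objects)}")
    return objects[0]


def query_color(obj):
    """Return the color of a single object."""
    return obj["color"]


# Hypothetical scene with made-up coordinates.
objects = [
    {"color": "red", "shape": "cube", "x": 0.0, "y": 0.0},
    {"color": "blue", "shape": "sphere", "x": -2.0, "y": 1.0},
    {"color": "green", "shape": "cylinder", "x": 3.0, "y": -1.0},
]

# "What color is the object to the left of the red cube?"
red_cube = unique([o for o in objects if o["color"] == "red" and o["shape"] == "cube"])
left_color = query_color(unique(relate(red_cube, "left", objects)))
print(left_color)  # blue
```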
Cognitive benefits
This symbolic approach eliminates ambiguity and makes semantic complexity explicit. It supports exact supervision, structured evaluation, and controlled linguistic variation for generalization testing.
Accepted values for CLEVR operators
The following table summarizes valid inputs for CLEVR’s most frequent operators. These define the full semantic universe of the synthetic scenes.
| Operator | Accepted values | Description |
|---|---|---|
| filter_color | red, blue, green, gray, brown, purple, cyan, yellow | Filters objects by color. Eight possible values. |
| filter_shape | cube, sphere, cylinder | Filters by geometric shape. |
| filter_size | small, large | Filters by relative object scale. |
| filter_material | rubber, metal | Differentiates soft vs. hard materials. |
| relate | left, right, front, behind | Selects objects in spatial relation to a reference. |
| query_color, query_size, etc. | Inherits from attribute domain | Returns a property (e.g., color) of a single filtered object. |
| equal_*, compare_integer | Binary comparison | Returns true/false based on equality or magnitude comparison. |
| count | Returns integer | Counts the number of objects matching a filter chain. |
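Because the value domains are small and closed, they can be written down directly and used to check a program step before executing it. The snippet below is a sketch along those lines: the `ACCEPTED_VALUES` mapping copies the table, while the `is_valid` helper is a hypothetical name introduced only for this example.

```python
# Value domains copied from the table above.
ACCEPTED_VALUES = {
    "filter_color": {"red", "blue", "green", "gray", "brown", "purple", "cyan", "yellow"},
    "filter_shape": {"cube", "sphere", "cylinder"},
    "filter_size": {"small", "large"},
    "filter_material": {"rubber", "metal"},
    "relate": {"left", "right", "front", "behind"},
}


def is_valid(function, value):
    """Return True if `value` is an accepted argument for `function`.

    Operators without an entry in the table (count, query_*, equal_*) take
    no value argument here, so any value passed for them is rejected.
    """
    return value in ACCEPTED_VALUES.get(function, set())


print(is_valid("filter_color", "red"))   # True
print(is_valid("filter_color", "pink"))  # False
print(is_valid("relate", "above"))       # False
```

A check like this catches malformed programs early, for example a generated step that asks for a color or relation outside the scene's vocabulary.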
CLEVR’s functional language acts as an intermediate formalism: structured like code, expressive like language.