CausalCircuit is a dataset designed to guide research into causal representation learning – the problem of identifying the high-level causal variables in an image together with the causal structure between them.

The dataset consists of images that show a robotic arm interacting with a system of buttons and lights. In this system, there are four causal variables: the position of the robot arm along an arc and the intensities of the red, green, and blue lights. The data are rendered as 512x512 images with MuJoCo, an open-source physics engine. For the robotic arm, we use a model of the TriFinger platform, an open-source robot for learning dexterity.

The data consist of before/after pairs: each sample contains the state of the system immediately before and immediately after an intervention.

Dataset format details

The training, validation, and test files (train.npz, val.npz, and test.npz) contain N = 100,000, 10,000, and 10,000 pairs of images, respectively. The files are NumPy archives and can be read with `numpy.load`. Each archive contains the following arrays, listed with their shapes, data types, and meaning.

  • `original_latents` (N, 2, 4), float64. The causal variables of the structural causal model (SCM) before and after intervention. The variables are the red, green, and blue light state and the arm position, respectively.
  • `epsilon` (N, 2, 4), float64. Before intervention, these are noise variables of the SCM. After intervention, these are the noise encodings, which are the inverse of the data generating process given the intervened causal variables.
  • `imgs` (N, 2), bytes. JPEG-encoded renderings of the scene before and after intervention.
  • `pressed` (N, 2, 3), float64. A number between 0 and 1 indicating whether the arm pressed the red, green, and blue button, before and after intervention. 0 means the arm does not touch the button and 1 that the arm presses the button in the center. Other values are given by linearly interpolating the distance to the center.
  • `intervention_labels` (N,) int64. Takes values between 0 and 4: the intervention was either trivial (no variables affected), on the red, green, or blue button, or on the position of the arm.
  • `intervention_masks` (N, 4) bool. Indicates with a binary vector whether an intervention has been performed on each of the 4 variables.
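As a sketch of how these archives can be read, the snippet below first writes a tiny synthetic archive with the documented shapes and dtypes (the file name and sample count N=2 are illustrative), then loads it with `numpy.load`. With the real train.npz, val.npz, or test.npz files, only the loading part is needed.

```python
import numpy as np

# Build a tiny synthetic archive mimicking the documented format, so the
# example is self-contained. With the real dataset, skip straight to np.load.
N = 2
np.savez(
    "toy_causalcircuit.npz",  # stand-in for train.npz / val.npz / test.npz
    original_latents=np.zeros((N, 2, 4), dtype=np.float64),
    epsilon=np.zeros((N, 2, 4), dtype=np.float64),
    imgs=np.array([[b"", b""]] * N, dtype=object),  # JPEG bytes in the real data
    pressed=np.zeros((N, 2, 3), dtype=np.float64),
    intervention_labels=np.zeros((N,), dtype=np.int64),
    intervention_masks=np.zeros((N, 4), dtype=bool),
)

# Object arrays (the raw image bytes) require allow_pickle=True.
data = np.load("toy_causalcircuit.npz", allow_pickle=True)
for name in data.files:
    print(name, data[name].shape, data[name].dtype)
```

With real data, the pre-intervention image of sample `i` can then be decoded from `data["imgs"][i, 0]` with any JPEG decoder, e.g. Pillow's `Image.open` on a `BytesIO` wrapper.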

Dataset generation details

The unintervened data is generated according to the following generative model:

  • The arm position is sampled uniformly between 0 and 1, corresponding to an arc over the buttons.
  • The MuJoCo simulator is executed and computes how much each button is pressed.
  • The light intensities are sampled from Beta distributions whose parameters depend on the arm position z_A and the button press states b_R, b_G, and b_B. For each light, this is represented in the structural causal model as a monotonic transformation of a noise variable, distributed uniformly between 0 and 1, into a sample of the Beta distribution.

For the intervened data, the noise variables of all unintervened variables are kept fixed, while the noise of the intervened causal variable is sampled uniformly between 0 and 1. For all non-intervened variables, the causal variables are computed as before via the SCM.
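The noise handling for a before/after pair can be sketched as follows. Here `mechanism` is a hypothetical identity stand-in for the SCM's true monotonic noise-to-variable maps (Beta quantile functions whose parameters are not reproduced here); only the resampling logic mirrors the description above.

```python
import numpy as np

def mechanism(noise):
    # Identity stand-in for the SCM's monotonic map from uniform noise to a
    # causal variable. In the real dataset this is a Beta quantile function
    # whose parameters depend on the arm position and button press states.
    return noise

def sample_pair(rng, intervention_mask):
    """Sample pre/post-intervention causal variables as described above."""
    noise_pre = rng.uniform(0.0, 1.0, size=4)  # one noise variable per causal variable
    noise_post = noise_pre.copy()
    # Only the intervened variable's noise is resampled uniformly on [0, 1]...
    noise_post[intervention_mask] = rng.uniform(0.0, 1.0, size=int(intervention_mask.sum()))
    # ...while all variables are recomputed through the same SCM mechanisms,
    # so non-intervened variables keep their noise fixed.
    return mechanism(noise_pre), mechanism(noise_post)

rng = np.random.default_rng(0)
mask = np.array([False, False, True, False])  # e.g. intervene on the blue light
z_pre, z_post = sample_pair(rng, mask)
print(z_pre, z_post)
```

Because the non-intervened noise variables are shared between the two samples, the pre- and post-intervention values of all non-intervened variables coincide under this identity mechanism.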

Sample images

Before intervention on the arm position.

After intervention on the arm. The blue button is pressed, so the green and red lights turn on.

Dataset license

The CausalCircuit dataset is available for research purposes.

Dataset download

You will need to be logged in with your Qualcomm OneID account in order to download. If you do not have an active account, please click the "Register" button at the top of the page to get started.

Please download ALL files, including the download instructions.



If you use this dataset, please cite:

  @inproceedings{brehmer2022weakly,
    title = {Weakly supervised causal representation learning},
    author = {Brehmer, Johann and De Haan, Pim and Lippe, Phillip and Cohen, Taco},
    booktitle = {Advances in Neural Information Processing Systems},
    year = {2022},
    volume = {35},
    eprint = {2203.16437},
  }

Qualcomm AI Research

At Qualcomm AI Research, we are advancing AI to make its core capabilities – perception, reasoning, and action – ubiquitous across devices. Our mission is to make breakthroughs in fundamental AI research and scale them across industries. By bringing together some of the best minds in the field, we’re pushing the boundaries of what’s possible and shaping the future of AI.

Qualcomm AI Research continues to invest in and support deep-learning research in computer vision. The publication of the CausalCircuit dataset for use by the AI research community is one of our many initiatives.

Find out more about Qualcomm AI Research.

For any questions or technical support, please contact us at [email protected]

Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.