CausalPlayground

Overview

CausalPlayground is a Python library for causality research that focuses on the interactive exploration of structural causal models (SCMs). It provides functionality for creating, manipulating, and sampling SCMs, and integrates them with the Gymnasium framework. Users have full control over an SCM's causal mechanisms and can intervene on them directly. CausalPlayground also offers helper functions for generating diverse instances of SCMs and DAGs, which facilitates quantitative experimentation and evaluation. The library is optimized for (but not limited to) integration with reinforcement learning methods, which makes it suitable for active inference and learning settings. This documentation contains the complete API reference and a quickstart guide. The GitHub repository is available at https://github.com/sa-and/CausalPlayground.

Installation guide

In your Python environment, run pip install causal-playground.

Structural Causal Models (SCM)

SCMs are a powerful model for expressing a data-generating process that is governed by clear causal relations represented as functions. More formally, an SCM models the causal relationships between endogenous variables, which describe the main random variables of interest, and exogenous variables, which, roughly speaking, model the context of the process. The causal effects are described as functional relations between variables, and the exogenous variables are modeled through distributions. Taken together, an SCM defines the joint probability distribution over its variables.
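To make the definition concrete, the following is a minimal plain-Python sketch of how one joint sample arises from an SCM: the exogenous variables are drawn from their distributions first, and each endogenous variable is then computed from its parents. It deliberately does not use CausalPlayground; the names `sample_scm`, `exogenous`, and `endogenous` are illustrative assumptions, and the sketch assumes the endogenous variables are listed in a valid causal order.

```python
import random


def sample_scm(exogenous, endogenous, rng):
    """Draw one joint sample from a toy SCM.

    exogenous:  dict mapping a name to a function(rng) that draws a value
    endogenous: dict mapping a name to (mechanism, list of parent names),
                assumed to be listed in causal order
    """
    # Step 1: sample the context (exogenous variables).
    values = {name: draw(rng) for name, draw in exogenous.items()}
    # Step 2: evaluate each mechanism on the values of its parents.
    for name, (mechanism, parents) in endogenous.items():
        values[name] = mechanism(*(values[p] for p in parents))
    return values


rng = random.Random(0)
exogenous = {"U": lambda r: r.randint(3, 8)}
endogenous = {
    "A": (lambda u: u + 5, ["U"]),
    "EFFECT": (lambda a: a * 2, ["A"]),
}
sample = sample_scm(exogenous, endogenous, rng)
print(sample)
```

Repeating the call yields independent draws from the joint distribution that the functions and the exogenous distribution define together.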

Defining an SCM

Creating a Structural Causal Model (SCM) with CausalPlayground is straightforward. First, instantiate an object of the StructuralCausalModel class. Then, add endogenous and exogenous variables to the SCM using the add_endogenous_var() and add_exogenous_var() methods, respectively. For each endogenous variable, specify its name, a function that determines its value based on its dependencies, and a dictionary mapping the function's arguments to the names of the variables it depends on. Similarly, for each exogenous variable, provide its name, a function/distribution for generating its values, and a dictionary of keyword arguments to pass to the function. The functions can be defined as lambda functions; any other callable is admissible as well. Note that variable names are always converted to uppercase, so a variable added as 'Effect' is subsequently referred to as 'EFFECT'.

The following example creates the SCM $\mathcal{M} = (\mathcal{X}, \mathcal{U}, \mathcal{F}, \mathcal{P})$ with $\mathcal{X}= \{ A, Effect \}$, $\mathcal{U}=\{U\}$, $\mathcal{P}=\{Uniform(3, 8)\}$ (here a discrete uniform distribution over the integers 3 to 8), and $\mathcal{F} = \{A\leftarrow 5+U, Effect \leftarrow A*2\}$:

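>>> import random
>>> from CausalPlayground import StructuralCausalModel
>>> scm = StructuralCausalModel()
>>> # A <- 5 + U: the dictionary maps the argument 'noise' to the variable 'U'
>>> scm.add_endogenous_var('A', lambda noise: noise+5, {'noise': 'U'})
>>> # U is drawn uniformly from the integers 3, ..., 8
>>> scm.add_exogenous_var('U', random.randint, {'a': 3, 'b': 8})
>>> scm.add_endogenous_var('Effect', lambda x: x*2, {'x': 'A'})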

Sampling the SCM

To sample from this SCM, use the get_next_sample() method of the StructuralCausalModel object. This method returns a tuple $(x, u)$, where $x$ is a dictionary containing the sampled values of the endogenous variables, and $u$ is a dictionary containing the sampled values of the exogenous variables. For example:

>>> x, u = scm.get_next_sample()
>>> x
{'A': 10, 'EFFECT': 20}
>>> u
{'U': 5}
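
As a quick check of the sample above: `random.randint(3, 8)` returns integers from 3 to 8 with both endpoints included, so the printed values follow directly from the structural equations, and the same equations give the expected values of the endogenous variables:

$$U = 5 \;\Rightarrow\; A = 5 + U = 10 \;\Rightarrow\; Effect = 2A = 20$$

$$\mathbb{E}[U] = \frac{3 + 8}{2} = 5.5, \qquad \mathbb{E}[A] = 5 + \mathbb{E}[U] = 10.5, \qquad \mathbb{E}[Effect] = 2\,\mathbb{E}[A] = 21$$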

Intervening in an SCM

Interventions on an SCM can be performed using the do_interventions() method of the StructuralCausalModel object. This method takes a list of tuples, where each tuple represents an intervention on a specific variable. The first element of the tuple is the name of the variable to intervene on, and the second element is a tuple containing the intervention function and a dictionary mapping the function's arguments to the names of the variables it depends on. To perform an intervention $do(X_0 \leftarrow 5, X_1 \leftarrow X_0+1)$, the following can be implemented:

>>> scm.do_interventions([("X0", (lambda: 5, {})),
...                       ("X1", (lambda x0: x0+1, {'x0': 'X0'}))])

The do_interventions() method is called with a list of two interventions. The first intervention sets the value of the variable X0 to a constant value of 5, using a lambda function with no arguments. The second intervention sets the value of the variable X1 to the value of X0+1, using a lambda function that takes x0 as an argument and a dictionary that maps x0 to the variable name X0. After applying these interventions, the SCM will use the new causal mechanisms for the intervened variables when sampling. To undo all interventions that are currently applied to the SCM, you can call scm.undo_interventions(), which will return the SCM to its original state with $do(\emptyset)$.
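
Conceptually, an intervention replaces the mechanism of a variable while leaving every other mechanism untouched, and undoing it restores the original mechanism. The plain-Python sketch below illustrates that idea on a toy two-variable model. `ToySCM` and its methods are hypothetical; they only mimic the behaviour described above and say nothing about the library's internals.

```python
import random


class ToySCM:
    """Toy model with mechanisms X0 <- U and X1 <- 2 * X0 (illustrative only)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        # name -> (function, mapping from argument name to variable name)
        self.original = {
            "X0": (lambda u: u, {"u": "U"}),
            "X1": (lambda x0: 2 * x0, {"x0": "X0"}),
        }
        self.current = dict(self.original)

    def do_interventions(self, interventions):
        # Replace the mechanism of every targeted variable.
        for name, mechanism in interventions:
            self.current[name] = mechanism

    def undo_interventions(self):
        # Restore the unintervened mechanisms (the empty intervention).
        self.current = dict(self.original)

    def sample(self):
        values = {"U": self.rng.randint(3, 8)}
        for name in ("X0", "X1"):  # fixed causal order of the toy model
            function, arg_map = self.current[name]
            values[name] = function(**{arg: values[var] for arg, var in arg_map.items()})
        return values


toy = ToySCM()
toy.do_interventions([("X0", (lambda: 5, {})),
                      ("X1", (lambda x0: x0 + 1, {"x0": "X0"}))])
intervened = toy.sample()      # X0 is fixed to 5, so X1 is 6
toy.undo_interventions()
observational = toy.sample()   # original mechanisms: X0 = U, X1 = 2 * X0
```

Note that the exogenous variable is still sampled under the intervention; only the mechanisms of the targeted variables change.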

SCM Environments

Creating Interactive Environments

To create an interactive environment for working with an SCM using the Gymnasium framework, use the SCMEnvironment class from the CausalPlayground module. Instantiate an SCMEnvironment object by passing the StructuralCausalModel object and a list of possible interventions. These interventions correspond to the actions in this environment. The list of possible interventions follows the same format as the argument of do_interventions(): each intervention is a tuple of the variable name and a second tuple that holds the intervention function and a dictionary mapping the function's arguments to variable names. For example:

>>> env = SCMEnvironment(scm, possible_interventions=[("X0", (lambda: 5, {})),
...                                                   ("X1", (lambda x0: x0+1, {'x0': 'X0'}))])

The first intervention sets the value of X0 to a constant value of 5, while the second intervention sets the value of X1 to the value of X0+1. The resulting env object can be used to interact with the SCM using the standard Gymnasium environment interface, allowing for interactive exploration and experimentation with the causal model.

Interacting with SCMs

Calling the step(action) function applies the interventions defined in action, samples the intervened SCM, determines the new observation, termination flag, truncated flag, and reward, and finally undoes the interventions.

The action is a list of indices into the possible_interventions list passed when the environment was initialized, so several interventions can be applied simultaneously. Index 0 denotes the empty intervention, and the supplied interventions are addressed starting from index 1. For example, both possible interventions are performed in one step with env.step([1, 2]). A step without any intervention can be taken either by selecting the empty intervention, env.step([0]), or by passing an empty list, env.step([]).
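
The step cycle and the action encoding can be summarised in a few lines of plain Python. This is a sketch under assumptions, not the library's code: `decode_action`, `toy_step`, and `sample_fn` are hypothetical names, and the convention that index 0 is the empty intervention while index i selects the i-th supplied intervention is inferred from the examples above. The five returned values follow the usual Gymnasium `step` convention.

```python
def decode_action(action, possible_interventions):
    """Map a list of action indices to the interventions they select.

    Index 0 is treated as the empty intervention; index i >= 1 selects
    possible_interventions[i - 1]. This indexing is an assumption.
    """
    return [possible_interventions[i - 1] for i in action if i != 0]


def toy_step(action, possible_interventions, sample_fn):
    """Apply the selected interventions, sample, and return a Gymnasium-style tuple."""
    # 1. Apply: the interventions are local to this call, which plays the
    #    role of undoing them at the end of the step.
    active = decode_action(action, possible_interventions)
    # 2. Sample the intervened model.
    observation = sample_fn(active)
    # 3. Defaults of the basic environment: zero reward, never-ending episode.
    reward, terminated, truncated, info = 0, False, False, {}
    return observation, reward, terminated, truncated, info


possible = [("X0", (lambda: 5, {})),
            ("X1", (lambda x0: x0 + 1, {"x0": "X0"}))]


def intervened_targets(interventions):
    # Stand-in for sampling: report which variables were intervened on.
    return [name for name, _ in interventions]


both, *_ = toy_step([1, 2], possible, intervened_targets)
empty_via_zero, *_ = toy_step([0], possible, intervened_targets)
empty_via_list, *_ = toy_step([], possible, intervened_targets)
```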

Generating data from the SCM

In its basic implementation, SCMEnvironment always returns the current values of the variables as the observation and a reward of 0, regardless of the action. Furthermore, the terminated and truncated flags are always False, so an episode never terminates. The basic implementation can be used to generate samples (e.g. 1000) from an SCM either with no interventions (observational data):

>>> for _ in range(1000):
...     env.step([0])

with some fixed interventions:

>>> for _ in range(1000):
...     env.step([1, 2])

or even with random interventions:

>>> for _ in range(1000):
...     env.step(env.action_space.sample())

The collected data can be found in env.samples_so_far and the corresponding intervention targets in env.targets_so_far. These buffers can be flushed on demand with clear_samples() and clear_intervention_targets(), respectively.

Extending SCMEnvironment

While the basic implementation is useful for collecting data generated by an SCM, more sophisticated reinforcement learning applications can be built by inheriting from SCMEnvironment. To adapt the environment to a specific task, override the necessary functions that determine the observation, reward, truncated flag, and terminated flag. Note that the interventions in a step are undone after these quantities are determined. Below we provide an example of an environment that rewards interventions:

>>> from CausalPlayground import SCMEnvironment
>>>
>>> class NoIntervEnv(SCMEnvironment):
...     def __init__(self, scm, possible_interventions):
...         super().__init__(scm, possible_interventions)
...
...     def determine_reward(self):
...         # Reward 1 if at least one variable is currently intervened on, else 0.
...         if len(self.scm.get_intervention_targets()) > 0:
...             return 1
...         return 0
...
>>> env = NoIntervEnv(scm, possible_interventions=[("A", (lambda: 5, {})),
...                                                ("EFFECT", (lambda x0: x0+1, {'x0': 'A'}))])

Generators

In some scenarios it might be useful to automatically generate graphs and SCMs. To this end, we implemented some helper classes.

Generating Graphs

The CausalGraphGenerator class allows you to quickly generate random DAGs with a specified number of endogenous and exogenous variables. To create a random graph with 7 endogenous and 5 exogenous variables, you can use the following code:

>>> gen = CausalGraphGenerator(7, 5)
>>> graph = gen.generate_random_graph()

By default, the generated graphs do not include exogenous confounders. However, you can allow for the presence of exogenous confounders by setting the allow_exo_confounders parameter to True when instantiating the CausalGraphGenerator:

>>> gen = CausalGraphGenerator(7, 5, allow_exo_confounders=True)

If you need to generate a large set of distinct DAGs, you can use the CausalGraphSetGenerator class. This class ensures that each generated graph is unique within the set. To generate 1000 distinct DAGs with 7 endogenous nodes and 5 exogenous nodes, you can use the following code:

>>> gen = CausalGraphSetGenerator(7, 5)
>>> gen.generate(1000)

The generated DAGs can be accessed through the gen.graphs attribute.
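
One common way to obtain random DAGs, shown here only as an illustration, is to fix a random ordering of the nodes and add edges exclusively from earlier to later nodes, which rules out cycles by construction. Distinctness can then be enforced by rejecting repeated edge sets. The sketch below uses only the standard library; `random_dag`, `distinct_dags`, `is_acyclic`, and the edge probability `p` are hypothetical, and this is not necessarily how `CausalGraphGenerator` or `CausalGraphSetGenerator` work internally.

```python
import random


def random_dag(n_nodes, p, rng):
    """Return a random DAG over X0..X{n-1} as a frozenset of (parent, child) edges."""
    order = [f"X{i}" for i in range(n_nodes)]
    rng.shuffle(order)
    edges = set()
    for i, parent in enumerate(order):
        # Only later nodes in the ordering may become children.
        for child in order[i + 1:]:
            if rng.random() < p:
                edges.add((parent, child))
    return frozenset(edges)


def distinct_dags(n_graphs, n_nodes, p, rng):
    """Collect n_graphs pairwise different edge sets by rejecting repeats."""
    seen = set()
    while len(seen) < n_graphs:
        seen.add(random_dag(n_nodes, p, rng))
    return list(seen)


def is_acyclic(edges):
    """Check acyclicity by repeatedly removing nodes without incoming edges."""
    remaining = set(edges)
    nodes = {node for edge in edges for node in edge}
    while nodes:
        sources = {n for n in nodes if all(child != n for _, child in remaining)}
        if not sources:
            return False  # every remaining node has a parent, so there is a cycle
        nodes -= sources
        remaining = {(a, b) for a, b in remaining if a not in sources}
    return True


rng = random.Random(42)
graphs = distinct_dags(100, 7, 0.3, rng)
```

Rejection sampling of this kind is cheap as long as the requested number of graphs is small compared with the number of DAGs on the given nodes.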

Generating SCMs

The CausalPlayground.SCMGenerator class provides a convenient way to generate random SCMs with specified properties, such as the number of endogenous and exogenous variables, causal relationships, and exogenous variable distributions. This feature allows you to quickly create datasets of SCMs for testing and experimentation. This is exemplified by the following code:

>>> import random
>>> from CausalPlayground import SCMGenerator
>>> from tests.functions import f_linear
>>>
>>> gen = SCMGenerator(all_functions={'linear': f_linear})
>>> scm_unconfounded = gen.create_random(possible_functions=["linear"], n_endo=5, n_exo=4,
...                                      exo_distribution=random.random,
...                                      exo_distribution_kwargs={},
...                                      allow_exo_confounders=False)[0]

In this example, the SCMGenerator is instantiated with a dictionary of functions, where the key 'linear' is associated with the f_linear function that returns a linear combination of inputs with random weights. The create_random method is then used to generate an SCM with 5 endogenous variables and 4 exogenous variables, using the f_linear function for the causal relationships and random.random for the exogenous variable distribution. The allow_exo_confounders parameter determines whether the generated SCM allows for exogenous confounders.

You can also generate a random SCM based on a given causal structure by using the create_scm_from_graph method. The code snippet below shows an example of generating an SCM based on a given GRAPH structure, the 'linear' function defined above for endogenous variables, and a uniform distribution over the integers from 2 to 5 for the exogenous variables:

>>> gen.create_scm_from_graph(graph=GRAPH, possible_functions=['linear'],
...                           exo_distribution=random.randint,
...                           exo_distribution_kwargs={'a': 2, 'b': 5})

Custom Causal Relationships

To define your own data-generating function, you need to create an outer function that takes a list of strings (the parent variable names) as input and returns an inner function that determines the value of the associated variable. The inner function should take **kwargs as parameters to access the parent variable values. An example of a custom data-generating function, f_linear, is provided in the code snippet below.

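import random
from typing import List

def f_linear(parents: List[str]):
    # Draw one fixed random weight per parent when the mechanism is created.
    weights = {p: random.uniform(0.5, 2.0) for p in parents}
    default_value = 0.0  # returned when the variable has no parents

    def f(**kwargs):
        # kwargs maps each parent name to its current value.
        mu = default_value
        for p in parents:
            mu += weights[p] * kwargs[p]
        return mu
    return f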
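
The same outer/inner pattern carries over to other functional forms. As a further illustration, the hypothetical factory `f_threshold` below (not part of CausalPlayground) returns 1.0 when a randomly weighted sum of the parents exceeds a random threshold and 0.0 otherwise. The weight and threshold ranges are arbitrary choices made for this sketch.

```python
import random
from typing import Callable, List


def f_threshold(parents: List[str]) -> Callable[..., float]:
    # Weights and threshold are drawn once, when the mechanism is created.
    weights = {p: random.uniform(0.5, 2.0) for p in parents}
    threshold = random.uniform(0.0, 1.0)

    def f(**kwargs) -> float:
        # kwargs maps each parent name to its current value.
        total = sum(weights[p] * kwargs[p] for p in parents)
        return 1.0 if total > threshold else 0.0
    return f


mechanism = f_threshold(["A", "B"])
high = mechanism(A=10.0, B=10.0)   # weighted sum is at least 10, above any threshold in [0, 1]
low = mechanism(A=0.0, B=0.0)      # weighted sum is 0, never strictly above the threshold
```

Because the randomness is confined to the outer function, every call to the returned mechanism is deterministic given its inputs. Presumably such a factory could be registered in the `all_functions` dictionary under a key such as `'threshold'` and referenced from `possible_functions` in the same way as `'linear'`.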

By using the SCMGenerator class and custom data-generating functions, you can easily create datasets of SCMs with desired properties, enabling you to test and evaluate various causal inference algorithms and techniques.
