CausalPlayground
Overview
The CausalPlayground library is a tool for causality research that focuses on the interactive exploration of structural causal models (SCMs). It provides functionality for creating, manipulating, and sampling SCMs and integrates them with the Gymnasium framework. Users have complete control over the SCMs, which enables precise manipulation of and interaction with the causal mechanisms. In addition, CausalPlayground offers helper functions for generating diverse instances of SCMs and DAGs to support quantitative experimentation and evaluation. The library is optimized for (but not limited to) easy integration with reinforcement learning methods, which makes it useful in active inference and learning settings. This documentation contains the complete API documentation and a quickstart guide. The GitHub repository can be found at https://github.com/sa-and/CausalPlayground.
Installation guide
Install the package in your Python environment with: pip install causal-playground
Structural Causal Models (SCM)
An SCM is a model for expressing a data-generating process that is governed by clear causal relations, which are represented as functions. More formally, an SCM models the causal relationships between endogenous variables, which are the main random variables of interest, and exogenous variables, which, roughly speaking, model the context of the process. The causal effects are described as functional relations between variables, and the exogenous variables are modeled through distributions. Taken together, an SCM defines the joint probability distribution over its variables.
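In symbols, and using the tuple notation $\mathcal{M} = (\mathcal{X}, \mathcal{U}, \mathcal{F}, \mathcal{P})$ from the example in the next section, this description can be sketched as follows. The indices $n$ and $m$, the parent sets $\mathrm{PA}_i$, and the assumption that the structure is acyclic are introduced here for illustration only:

$$
X_i \leftarrow f_i(\mathrm{PA}_i), \quad f_i \in \mathcal{F}, \quad \mathrm{PA}_i \subseteq (\mathcal{X} \cup \mathcal{U}) \setminus \{X_i\}, \quad i = 1, \dots, n, \qquad U_j \sim P_j, \quad P_j \in \mathcal{P}, \quad j = 1, \dots, m
$$

Under these assumptions, drawing every $U_j$ from $P_j$ and then evaluating the $f_i$ in an order compatible with the causal structure yields one sample from the joint distribution.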
Defining an SCM
Creating a Structural Causal Model (SCM) with CausalPlayground is straightforward. First, instantiate an object of the StructuralCausalModel class. Then, add endogenous and exogenous variables to the SCM using the add_endogenous_var() and add_exogenous_var() methods, respectively. For each endogenous variable, specify its name, a function that determines its value based on its dependencies, and a dictionary mapping the function's arguments to the names of the variables it depends on. Similarly, for each exogenous variable, provide its name, a function/distribution for generating its values, and a dictionary of keyword arguments to pass to the function. The functions can be defined as lambda functions, but any callable is admissible. Note that variable names are always converted to upper case.
An example for creating the SCM $\mathcal{M} = (\mathcal{X}, \mathcal{U}, \mathcal{F}, \mathcal{P})$ with $\mathcal{X}= \{ A, Effect \}$, $\mathcal{U}=\{U\}$, $\mathcal{P}=\{Uniform(3, 8)\}$, and $\mathcal{F} = \{A\leftarrow 5+U, Effect \leftarrow A*2\}$, where $Uniform(3, 8)$ is drawn with random.randint and therefore ranges over the integers 3 to 8 (both included):
>>> import random
>>> from CausalPlayground import StructuralCausalModel
>>> scm = StructuralCausalModel()
>>> scm.add_endogenous_var('A', lambda noise: noise+5, {'noise': 'U'})
>>> scm.add_exogenous_var('U', random.randint, {'a': 3, 'b': 8})
>>> scm.add_endogenous_var('Effect', lambda x: x*2, {'x': 'A'})
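The role of the argument-mapping dictionary can be illustrated without the library. The sketch below is a plain-Python reading of the description above, not CausalPlayground's implementation; the helper name evaluate_mechanism and the hand-picked value for U are assumptions made for illustration.

```python
def evaluate_mechanism(function, argument_map, values):
    """Call `function` with each argument bound to the value of its mapped variable."""
    kwargs = {arg: values[var_name] for arg, var_name in argument_map.items()}
    return function(**kwargs)


# Values of already-evaluated variables; U is fixed by hand instead of sampled.
values = {"U": 4}
values["A"] = evaluate_mechanism(lambda noise: noise + 5, {"noise": "U"}, values)
values["EFFECT"] = evaluate_mechanism(lambda x: x * 2, {"x": "A"}, values)
print(values)  # {'U': 4, 'A': 9, 'EFFECT': 18}
```

Each key of the mapping is a parameter name of the function, and each value is the upper-case name of the variable whose current value is passed in.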
Sampling the SCM
To sample from this SCM, use the get_next_sample() method of the StructuralCausalModel object. This method returns a tuple $(x, u)$, where $x$ is a dictionary containing the sampled values of the endogenous variables and $u$ is a dictionary containing the sampled values of the exogenous variables. For example:
>>> x, u = scm.get_next_sample()
>>> x
{'A': 10, 'EFFECT': 20}
>>> u
{'U': 5}
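Conceptually, one sample is obtained by drawing the exogenous variables first and then evaluating the endogenous variables in causal order. The following library-independent sketch reproduces this for the example SCM; the function draw_sample and the two dictionaries are hypothetical names, and the library's internals may differ.

```python
import random
from graphlib import TopologicalSorter  # available from Python 3.9

# Exogenous variables: name -> (distribution, keyword arguments).
exogenous = {"U": (random.randint, {"a": 3, "b": 8})}
# Endogenous variables: name -> (function, argument-to-variable mapping).
endogenous = {
    "A": (lambda noise: noise + 5, {"noise": "U"}),
    "EFFECT": (lambda x: x * 2, {"x": "A"}),
}


def draw_sample(exogenous, endogenous):
    """Draw one sample (x, u): noise first, then mechanisms in causal order."""
    u = {name: dist(**kwargs) for name, (dist, kwargs) in exogenous.items()}
    values = dict(u)
    # Map each endogenous variable to the set of variables it depends on.
    parents = {name: set(arg_map.values()) for name, (_, arg_map) in endogenous.items()}
    for name in TopologicalSorter(parents).static_order():
        if name in endogenous:
            function, arg_map = endogenous[name]
            values[name] = function(**{arg: values[var] for arg, var in arg_map.items()})
    x = {name: values[name] for name in endogenous}
    return x, u


x, u = draw_sample(exogenous, endogenous)
print(x, u)
```

Whatever value U takes, the sample satisfies A = U + 5 and EFFECT = 2 * A, which matches the example output above (U = 5, A = 10, EFFECT = 20).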
Intervening in an SCM
Interventions on an SCM can be performed using the do_interventions() method of the StructuralCausalModel object. This method takes a list of tuples, where each tuple represents an intervention on a specific variable. The first element of the tuple is the name of the variable to intervene on, and the second element is a tuple containing the intervention function and a dictionary mapping the function's arguments to the names of the variables it depends on. For an SCM with endogenous variables X0 and X1, the intervention $do(X_0 \leftarrow 5, X_1 \leftarrow X_0+1)$ can be implemented as follows:
>>> scm.do_interventions([("X0", (lambda: 5, {})),
...                       ("X1", (lambda x0: x0+1, {'x0':'X0'}))])
The do_interventions() method is called with a list of two interventions. The first intervention sets the variable X0 to a constant value of 5, using a lambda function with no arguments. The second intervention sets the variable X1 to the value of X0+1, using a lambda function that takes x0 as an argument and a dictionary that maps x0 to the variable name X0. After these interventions are applied, the SCM uses the new causal mechanisms for the intervened variables when sampling. To undo all interventions that are currently applied to the SCM, call scm.undo_interventions(), which returns the SCM to its original state with $do(\emptyset)$.
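An intervention can be thought of as temporarily replacing a variable's mechanism while keeping the original available for undoing. The toy class below sketches that idea in plain Python. TinySCM and its initial mechanisms are hypothetical and deliberately simplified (no exogenous variables); it is not the library's StructuralCausalModel.

```python
class TinySCM:
    """Toy stand-in, not the library class: mechanisms can be swapped and restored."""

    def __init__(self, mechanisms):
        # name -> (function, argument-to-variable mapping), listed in causal order
        self._original = dict(mechanisms)
        self.mechanisms = dict(mechanisms)

    def do_interventions(self, interventions):
        # Replace the mechanism of every intervened variable.
        for name, mechanism in interventions:
            self.mechanisms[name] = mechanism

    def undo_interventions(self):
        # Restore all original mechanisms, i.e. return to do(empty set).
        self.mechanisms = dict(self._original)

    def evaluate(self):
        values = {}
        for name, (function, arg_map) in self.mechanisms.items():
            values[name] = function(**{arg: values[var] for arg, var in arg_map.items()})
        return values


toy = TinySCM({"X0": (lambda: 1, {}), "X1": (lambda x0: 2 * x0, {"x0": "X0"})})
before = toy.evaluate()  # {'X0': 1, 'X1': 2}
toy.do_interventions([("X0", (lambda: 5, {})),
                      ("X1", (lambda x0: x0 + 1, {"x0": "X0"}))])
during = toy.evaluate()  # {'X0': 5, 'X1': 6}
toy.undo_interventions()
after = toy.evaluate()   # {'X0': 1, 'X1': 2}
```

The intervention list uses the same (name, (function, mapping)) format as the do_interventions() call above.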
SCM Environments
Creating Interactive Environments
To create an interactive environment for working with an SCM using the Gymnasium framework, use the SCMEnvironment class from the CausalPlayground module. Instantiate an SCMEnvironment object by passing the StructuralCausalModel object and a list of possible interventions. These interventions correspond to the actions in this environment. The list of possible interventions follows the same format as the argument of the do_interventions() method: each intervention is a tuple containing the variable name and a second tuple that holds the intervention function and a dictionary mapping the function's arguments to variable names. For example:
>>> env = SCMEnvironment(scm, possible_interventions=[("X0", (lambda: 5, {})),
...                                                   ("X1", (lambda x0: x0+1, {'x0':'X0'}))])
The first intervention sets the value of X0 to a constant value of 5, while the second intervention sets the value of X1 to the value of X0+1. The resulting env object can be used to interact with the SCM through the standard Gymnasium environment interface, allowing for interactive exploration and experimentation with the causal model.
Interacting with SCMs
Calling the step(action) function applies the interventions defined in action, samples the intervened SCM, determines the new observation, the terminated flag, the truncated flag, and the reward, and finally undoes the interventions.
The action is a list of indices into the possible_interventions defined when the environment was initialized, so multiple interventions can be applied simultaneously. Index 0 denotes the empty intervention, and the entries of possible_interventions are addressed starting from index 1. For example, performing both possible interventions in one step can be done with env.step([1, 2]). A step without intervention can be performed either by invoking the first (empty) intervention with env.step([0]) or by passing no intervention with env.step([]).
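The index convention can be made concrete with a small helper. This is a sketch of one reading of the paragraph above, in which index 0 is the empty intervention and index i (for i of at least 1) selects the i-th entry of possible_interventions; decode_action is a hypothetical function and not part of the library.

```python
def decode_action(action, possible_interventions):
    """Translate action indices into interventions; index 0 means 'no intervention'."""
    return [possible_interventions[i - 1] for i in action if i != 0]


possible_interventions = [("X0", (lambda: 5, {})),
                          ("X1", (lambda x0: x0 + 1, {"x0": "X0"}))]

both = decode_action([1, 2], possible_interventions)
print([name for name, _ in both])                 # ['X0', 'X1']
print(decode_action([0], possible_interventions))  # []
print(decode_action([], possible_interventions))   # []
```

Under this reading, the actions [0] and [] are equivalent: neither selects an intervention, so the step samples observational data.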
Generating data from the SCM
In its basic implementation, SCMEnvironment always returns the current values of the variables as the observation and a reward of 0, regardless of the action. Furthermore, the terminated and truncated flags are always False, so an episode never terminates. The basic implementation can be used to generate samples (e.g. 1000) from an SCM, either with no interventions (observational data):
>>> for i in range(1000):
...     env.step([0])
with some fixed interventions:
>>> for i in range(1000):
...     env.step([1, 2])
or even with random interventions:
>>> for i in range(1000):
...     env.step(env.action_space.sample())
The collected data can be found in env.samples_so_far and the corresponding intervention targets in env.targets_so_far. These buffers can be flushed on demand with clear_samples() and clear_intervention_targets(), respectively.
Extending SCMEnvironment
While the basic implementation is useful for collecting data generated by an SCM, more sophisticated reinforcement learning applications can be built by inheriting from SCMEnvironment. To adapt the environment to a specific task, override the necessary functions that determine the observation, reward, truncated flag, and terminated flag. (Note that the interventions in a step are undone after these quantities are determined.) Below we provide an example of an environment that rewards interventions:
>>> from CausalPlayground import SCMEnvironment
>>> class NoIntervEnv(SCMEnvironment):
...     def __init__(self, scm, possible_interventions):
...         super(NoIntervEnv, self).__init__(scm, possible_interventions)
...
...     def determine_reward(self):
...         if len(self.scm.get_intervention_targets()) > 0:
...             return 1
...         else:
...             return 0
...
>>> env = NoIntervEnv(scm, possible_interventions=[("A", (lambda: 5, {})),
...                                                ("EFFECT", (lambda x0: x0+1, {'x0': 'A'}))])
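The order of operations inside a step matters for such overrides: the reward hook runs while the interventions are still applied, and they are undone only afterwards. The toy classes below mimic this ordering in plain Python. ToyEnvironment, RewardInterventions, and the active_targets attribute are hypothetical stand-ins and do not reproduce SCMEnvironment.

```python
class ToyEnvironment:
    """Toy stand-in for an environment; not the library's SCMEnvironment."""

    def __init__(self):
        self.active_targets = []

    def determine_reward(self):
        return 0  # the base behaviour: no reward, whatever the action

    def step(self, targets):
        self.active_targets = list(targets)  # 1. apply the interventions
        reward = self.determine_reward()     # 2. hooks see the intervened state
        self.active_targets = []             # 3. undo the interventions last
        return reward


class RewardInterventions(ToyEnvironment):
    def determine_reward(self):
        # Reward 1 whenever at least one variable is currently intervened on.
        return 1 if len(self.active_targets) > 0 else 0


toy_env = RewardInterventions()
print(toy_env.step(["A"]))     # 1
print(toy_env.step([]))        # 0
print(toy_env.active_targets)  # []
```

If the interventions were undone before the hook ran, the subclass would always see an empty target list and return 0.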
Generators
In some scenarios it might be useful to automatically generate graphs and SCMs. To this end, we implemented some helper classes.
Generating Graphs
The CausalGraphGenerator class allows you to quickly generate random DAGs with a specified number of endogenous and exogenous variables. To create a random graph with 7 endogenous and 5 exogenous variables, you can use the following code:
>>> from CausalPlayground import CausalGraphGenerator
>>> gen = CausalGraphGenerator(7, 5)
>>> graph = gen.generate_random_graph()
By default, the generated graphs do not include exogenous confounders. However, you can allow for the presence of exogenous confounders by setting the allow_exo_confounders parameter to True when instantiating the CausalGraphGenerator:
>>> gen = CausalGraphGenerator(7, 5, allow_exo_confounders=True)
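One common way to obtain a random DAG is to shuffle the nodes and allow edges only from earlier to later nodes in that order. The sketch below uses this generic technique and is not a description of how CausalGraphGenerator works. The function random_dag, the node names, the edge_probability parameter, and the reading of "no exogenous confounders" as "each exogenous variable feeds exactly one endogenous variable" are all assumptions.

```python
import random


def random_dag(n_endo, n_exo, edge_probability=0.5, seed=None):
    """Return (causal order of endogenous nodes, list of directed edges)."""
    rng = random.Random(seed)
    endo = [f"X{i}" for i in range(n_endo)]
    exo = [f"U{j}" for j in range(n_exo)]
    order = endo[:]
    rng.shuffle(order)
    edges = []
    # Edges only point from earlier to later nodes in the order, so no cycle can form.
    for i, source in enumerate(order):
        for target in order[i + 1:]:
            if rng.random() < edge_probability:
                edges.append((source, target))
    # Unconfounded case: every exogenous variable has exactly one endogenous child.
    for noise in exo:
        edges.append((noise, rng.choice(endo)))
    return order, edges


order, edges = random_dag(7, 5, seed=1)
print(len(order), len(edges))
```

Allowing confounders in this sketch would amount to letting an exogenous variable point to more than one endogenous node.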
If you need to generate a large set of distinct DAGs, you can use the CausalGraphSetGenerator class. This class ensures that each generated graph is unique within the set. To generate 1000 distinct DAGs with 7 endogenous nodes and 5 exogenous nodes, you can use the following code:
>>> from CausalPlayground import CausalGraphSetGenerator
>>> gen = CausalGraphSetGenerator(7, 5)
>>> gen.generate(1000)
The generated DAGs can be accessed through the gen.graphs attribute.
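Uniqueness within a set can be enforced by storing each graph in a hashable canonical form and discarding repeats. The sketch below does this with frozensets of labeled edges. It compares graphs by their labeled edge sets (not up to isomorphism), and the names random_edge_set, distinct_dags, and max_tries are hypothetical; the library's own uniqueness check may work differently.

```python
import random


def random_edge_set(nodes, rng, edge_probability=0.5):
    """Random DAG over `nodes`, returned as a frozenset of (source, target) edges."""
    order = list(nodes)
    rng.shuffle(order)
    return frozenset(
        (order[i], order[j])
        for i in range(len(order))
        for j in range(i + 1, len(order))
        if rng.random() < edge_probability
    )


def distinct_dags(nodes, how_many, seed=0, max_tries=10_000):
    """Collect up to `how_many` distinct DAGs, giving up after `max_tries` draws."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(max_tries):
        if len(seen) == how_many:
            break
        seen.add(random_edge_set(nodes, rng))  # duplicates are ignored by the set
    return list(seen)


graphs = distinct_dags(["X0", "X1", "X2", "X3"], how_many=20)
print(len(graphs))
```

The max_tries bound keeps the loop finite when more graphs are requested than exist for the given number of nodes.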
Generating SCMs
The CausalPlayground.SCMGenerator class provides a convenient way to generate random SCMs with specified properties, such as the number of endogenous and exogenous variables, causal relationships, and exogenous variable distributions. This feature allows you to quickly create datasets of SCMs for testing and experimentation, as the following code shows:
>>> import random
>>> from CausalPlayground import SCMGenerator
>>> from tests.functions import f_linear
>>> gen = SCMGenerator(all_functions={'linear': f_linear})
>>> scm_unconfounded = gen.create_random(possible_functions=["linear"], n_endo=5, n_exo=4,
...                                      exo_distribution=random.random,
...                                      exo_distribution_kwargs={},
...                                      allow_exo_confounders=False)[0]
In this example, the SCMGenerator is instantiated with a dictionary of functions, where the key 'linear' is associated with the f_linear function that returns a linear combination of inputs with random weights. The create_random method is then used to generate an SCM with 5 endogenous variables and 4 exogenous variables, using the f_linear function for the causal relationships and random.random for the exogenous variable distribution. The allow_exo_confounders parameter determines whether the generated SCM allows for exogenous confounders.
You can also generate a random SCM based on a given causal structure by using the create_scm_from_graph method. The code snippet below shows an example of generating an SCM based on a given GRAPH structure, the 'linear' function defined above for the endogenous variables, and a uniform distribution over the integers from 2 to 5 for the exogenous variables:
>>> gen.create_scm_from_graph(graph=GRAPH, possible_functions=['linear'],
...                           exo_distribution=random.randint,
...                           exo_distribution_kwargs={'a': 2, 'b': 5})
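The idea of turning a fixed structure into mechanisms can be sketched independently of the library: for every endogenous node, pick a function factory and instantiate it with the node's parents. Everything in the block below (f_sum, mechanisms_from_graph, the parents_of dictionary, and the hand-set exogenous values) is hypothetical and only illustrates the principle, not the behaviour of create_scm_from_graph.

```python
import random


def f_sum(parents):
    """Hypothetical factory: the returned function adds up its parents' values."""
    def f(**kwargs):
        return sum(kwargs[p] for p in parents)
    return f


def mechanisms_from_graph(parents_of, factories, rng):
    """Pick one factory per endogenous node and instantiate it with the node's parents."""
    mechanisms = {}
    for node, parents in parents_of.items():
        factory = rng.choice(factories)
        mechanisms[node] = factory(parents)
    return mechanisms


# Hypothetical structure U0 -> X0 -> X1 <- U1; keys are endogenous nodes in causal order.
parents_of = {"X0": ["U0"], "X1": ["X0", "U1"]}
mechanisms = mechanisms_from_graph(parents_of, [f_sum], random.Random(0))

values = {"U0": 2, "U1": 3}  # exogenous values, set by hand instead of sampled
for node, parents in parents_of.items():
    values[node] = mechanisms[node](**{p: values[p] for p in parents})
print(values)  # {'U0': 2, 'U1': 3, 'X0': 2, 'X1': 5}
```

With several factories in the list, each node would receive a randomly chosen function family, which is one way to read the possible_functions argument above.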
Custom Causal Relationships
To define your own data-generating function, create an outer function that takes a list of strings (the parent variable names) as input and returns an inner function that determines the value of the associated variable. The inner function should take **kwargs as parameters to access the parent variable values. An example of a custom data-generating function, f_linear, is provided in the code snippet below.
import random
from typing import List

def f_linear(parents: List[str]):
    # One random weight per parent, fixed when the function is created.
    weights = {p: random.uniform(0.5, 2.0) for p in parents}
    default_value = 0.0

    def f(**kwargs):
        if len(kwargs) == 0:
            mu = default_value
        else:
            mu = 0.0

        for p in parents:
            mu += weights[p] * kwargs[p]
        return mu
    return f
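The same outer/inner pattern carries over to other function families. As a hedged illustration, the hypothetical factory f_quadratic below (not part of the library) returns a weighted sum of squared parent values and is called directly to show how the inner function receives its parents as keyword arguments.

```python
import random
from typing import List


def f_quadratic(parents: List[str]):
    """Hypothetical factory: weighted sum of squared parent values."""
    # Weights are drawn once, when the mechanism is created.
    weights = {p: random.uniform(0.5, 2.0) for p in parents}

    def f(**kwargs):
        # With no parents the sum is empty and the value is 0.
        return sum(weights[p] * kwargs[p] ** 2 for p in parents)
    return f


mechanism = f_quadratic(["A", "B"])
value = mechanism(A=1.0, B=2.0)
print(value)  # lies between 2.5 and 10.0, depending on the random weights
```

Because the weights are sampled in the outer function, repeated calls to the same inner function are deterministic, while each call to the factory yields a new random mechanism.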
By using the SCMGenerator class and custom data-generating functions, you can easily create datasets of SCMs with desired properties, enabling you to test and evaluate various causal inference algorithms and techniques.
The package's top-level module exports the following classes:
from .scm import StructuralCausalModel
from .generators import SCMGenerator, CausalGraphGenerator, CausalGraphSetGenerator
from .scm_environment import SCMEnvironment