Probabilistic networks for biological complex systems

Presented by: Debora Slanzi, Michele Forlin, Davide De March, Irene Poli

To understand the function of a cell or of higher units of biological organization, it is often useful to conceptualize them as systems of interacting elements. This requires identifying (1) the components, or factors, that constitute the biological system, (2) the behaviour of these components, i.e. how their level or activity changes under various conditions, and (3) the interactions among these components.

Biochemical experiments are generally characterized by a large number of factors and little prior knowledge of the factorial effects. We address the problem of choosing a set of factors and their interactions to achieve a particular functionality or response of the system, and of formulating accurate predictions for compositions that have not yet been tested. Probabilistic graphical models offer a conceptual architecture in which biological and mathematical objects can be expressed in a common, intuitive formalism. We adopt Bayesian networks (Cowell et al., 1999; Jensen, 2001), a class of probabilistic graphical models that defines a family of probability distributions representable in terms of a graph.

Nodes in the graph correspond to random variables, the factors of the system; the structure of the graph translates into statistical dependencies among the variables, which drive the computation of the joint, conditional and marginal probabilities of interest. In applications, most of the random variables are chosen to express the variability of an observed quantity, such as the level of a particular molecule in the biochemical system. The (directed or undirected) arcs of the graph specify the biological hypothesis about how the variables influence one another. While the presence of an arc describes a direct dependence between two variables, the absence of an arc represents their conditional independence.
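As an illustration of how a graph structure drives the computation of joint and marginal probabilities, the following is a minimal sketch for a directed network. The three binary variables A, B, C, the structure A -> C <- B and all probability values are hypothetical and are not taken from the vesicle experiments.

```python
from itertools import product

# Hypothetical structure: A -> C <- B, all variables binary (0 or 1).
parents = {"A": (), "B": (), "C": ("A", "B")}

# Conditional probability tables (made-up numbers):
# cpt[node][parent_values] = P(node = 1 | parents = parent_values)
cpt = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.9},
}


def joint(assignment):
    """Joint probability as the product of each node's probability given its parents."""
    p = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[x] for x in pa)]
        p *= p_one if assignment[node] == 1 else 1.0 - p_one
    return p


def marginal(node, value):
    """Marginal probability obtained by summing the joint over all other variables."""
    names = list(parents)
    total = 0.0
    for values in product((0, 1), repeat=len(names)):
        assignment = dict(zip(names, values))
        if assignment[node] == value:
            total += joint(assignment)
    return total
```

Because there is no arc between A and B in this toy structure, the joint factorizes as P(A) P(B) P(C | A, B), so only the table for C needs to be indexed by more than one variable.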

Graphical models combine probability theory and graph theory. They provide a well-structured tool for dealing with uncertainty and complexity. Fundamental to the theory of graphical models is the notion of modularity, i.e. a complex system is built by combining simpler parts. Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is consistent, and providing ways to interface models to data. The graph-theoretic side of graphical models provides an intuitively appealing interface by which humans can model highly interacting sets of variables.

We analyse real data from biochemical experiments conducted according to an evolutionary approach and designed to study the formation of vesicles (Forlin et al., 2008). We evaluate data from 540 experiments, covering 180 different mixtures with 3 replications each, by learning the multivariate probability distribution described by the Bayesian network.

The analysis shows that the structure of the network can be divided into subgraphs, and that the variables belonging to the subgraph containing the response variable are those that affect it most.

We measure the dependence relations among the variables by their strength of influence, i.e. a Euclidean measure of the difference between the probability distributions with and without the arc that represents the dependency.
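The abstract does not give the exact formula for the strength of influence, so the following is only one plausible reading, sketched under stated assumptions: for an arc X -> Y, the distribution "with the arc" is taken to be P(Y | X = x), the distribution "without the arc" is taken to be the marginal P(Y), and the Euclidean distances are averaged with weights P(X = x). The function names and the weighting scheme are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two discrete distributions over the same states."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def arc_strength(p_x, p_y_given_x):
    """Assumed strength of the arc X -> Y.

    p_x[i]          : P(X = i)
    p_y_given_x[i]  : list of P(Y = j | X = i) over the states j of Y
    """
    n_y = len(p_y_given_x[0])
    # Distribution of Y when the arc is removed: the marginal P(Y).
    p_y = [sum(px * row[j] for px, row in zip(p_x, p_y_given_x)) for j in range(n_y)]
    # Weighted average distance between each conditional and the marginal.
    return sum(px * euclidean(row, p_y) for px, row in zip(p_x, p_y_given_x))
```

Under this reading, an arc whose conditional distributions are all identical has strength zero, which matches the intuition that removing it would not change the model.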

Through a sensitivity analysis, we identify the most relevant variable interactions. We also use an entropy measure, calculated as the ratio between the entropy reductions of the selected variables and the response, to assess the informative power that each factor has on the response. The results highlight the dominant role of a restricted set of variables, both as main effects and in interactions with the other factors.
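The entropy measure is described only briefly, so the sketch below shows one possible interpretation rather than the measure actually used: the reduction in the entropy of the response Y obtained by observing a factor X, divided by the entropy of Y. The function names and the use of empirical frequencies from discretized samples are assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(y, x):
    """Empirical H(Y | X): entropy of y within each level of x, weighted by level frequency."""
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def entropy_reduction_ratio(y, x):
    """Assumed measure: (H(Y) - H(Y | X)) / H(Y), between 0 and 1."""
    h_y = entropy(y)
    if h_y == 0.0:
        return 0.0  # a constant response carries no uncertainty to reduce
    return (h_y - conditional_entropy(y, x)) / h_y
```

A ratio close to 1 would indicate that the factor almost fully determines the response, while a ratio close to 0 would indicate that it carries little information about it.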

We compare the accuracy of reconstructing biochemical networks with different modelling and inference paradigms, in order to investigate whether applying a more complex score-based approach is of any practical benefit for extracting new biological insights and predictions from the results. In particular, we use relevance networks, i.e. graphical models identified by pairwise associations between variables, and undirected graphical models with constraint-based inference.
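A relevance network can be sketched as follows: compute a pairwise association score for every pair of variables and keep an undirected edge when the score exceeds a threshold. The choice of the absolute Pearson correlation as the association measure, the threshold value and the variable names are assumptions made for illustration; the abstract does not state which measure was used.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable is treated as unassociated
    return sxy / math.sqrt(sxx * syy)


def relevance_network(data, threshold):
    """Return the set of undirected edges (u, v) whose |correlation| >= threshold.

    data: dict mapping variable name -> list of observations.
    """
    edges = set()
    for u, v in combinations(sorted(data), 2):
        if abs(pearson(data[u], data[v])) >= threshold:
            edges.add((u, v))
    return edges
```

Unlike a score-based Bayesian network search, this procedure looks at each pair in isolation, so it cannot distinguish direct dependencies from those mediated by a third variable.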

References

R. G. Cowell, A. P. Dawid, S. L. Lauritzen and D. J. Spiegelhalter (1999) Probabilistic Networks and Expert Systems. Springer-Verlag.

M. Forlin, I. Poli, D. De March, N. Packard, G. Gazzola and R. Serra (2008) Evolutionary experiments for self-assembling amphiphilic systems. Chemometrics and Intelligent Laboratory Systems, 90 (2), 153-160.

F. V. Jensen (2001) Bayesian Networks and Decision Graphs. Springer-Verlag.
