A Test Case of Correlation Metric Construction of a Reaction Pathway from Measurements

See allHide authors and affiliations

Science  29 Aug 1997:
Vol. 277, Issue 5330, pp. 1275-1279
DOI: 10.1126/science.277.5330.1275


A method for the prediction of the interactions within complex reaction networks from experimentally measured time series of the concentration of the species composing the system has been tested experimentally on the first few steps of the glycolytic pathway. The reconstituted reaction system, containing eight enzymes and 14 metabolic intermediates, was kept away from equilibrium in a continuous-flow, stirred-tank reactor. Input concentrations of adenosine monophosphate and citrate were externally varied over time, and their concentrations in the reactor and the response of eight other species were measured. Multidimensional scaling analysis and heuristic algorithms applied to two-species time-lagged correlation functions derived from the time series yielded a diagram from which the interactions among all of the species could be deduced. The diagram predicts essential features of the known reaction network in regard to chemical reactions and interactions among the measured species. The approach is applicable to many complex reaction systems.

Traditionally, chemical kinetics reveals the mechanism of a chemical reaction by establishing the sequence and rates of individual elementary steps. In multistep chemical reactions, such as combustion or metabolic pathways, complex networks of reactants and products may be set up that involve numerous elementary steps and multiple feedbacks whose significance varies at different overall concentrations. Thus, the reductionist approach of identifying and isolating individual reaction steps can be experimentally daunting in that it requires purifying the individual components and finding all of the interactions that constitute the complete mechanism. These problems are especially prevalent in enzymatic and genetic reaction systems in which the catalyst for a single reaction can have tens of interacting effectors.

We recently presented a method, called correlation metric construction (CMC) (1), that takes a different approach to the analysis of complex chemical kinetics. A chemical reaction system operating near a steady state (but far from equilibrium) is subjected to random perturbations in the concentrations of a set of input species. Given that we can measure chemical concentrations of species in situ, we analyze the response of these concentrations to the input variation. From these time series data, the method is used to deduce the reaction pathway underlying the response dynamics. It has been tested successfully on several numerical model reaction systems, but several issues need to be addressed to prove experimental feasibility. These issues include the determination of (i) the number of data points and time resolution necessary for analysis, both in regard to the number of time points per chemical concentration series and the number of chemical species that can be measured simultaneously, (ii) the effects of measurement error, and (iii) the presence of unmeasured species.

Although it is possible to address the issues theoretically, the most convincing argument for feasibility is substantiation by experiment. We present an analysis of data for an enzymatic reaction network, the initial steps of glycolysis (Fig. 1). Glycolysis is central in intermediary metabolism and has a high degree of regulation (2). The reaction pathway has been well studied and thus is a good test for the theory. Further, the reaction mechanism of this part of glycolysis has been modeled extensively (3).

Figure 1

The first few reaction steps of glycolysis (7). Lines ending in circles denote regulatory interaction; minus signs indicate a negative effector; plus signs, a positive effector. Creatine-P and CK keep the concentrations of ATP and ADP constant.

The quantity and precision of the measurements reported here are sufficient to determine the matrix of correlation functions and, from this, a reaction pathway that is qualitatively consistent with the reaction mechanism established previously. The existence of unmeasured species did not compromise the analysis. The quantity and precision of the data were not excessive, and thus, we expect the method to be generally applicable.

This CMC experiment was carried out in a continuous-flow, stirred-tank reactor (CSTR). The reaction network considered consists of eight enzymes, which catalyze the conversion of glucose into dihydroxyacetone phosphate and glyceraldehyde phosphate. Enzymes were confined to the reactor by an ultrafiltration membrane at the top of the reactor (4). The membrane was permeable to all low–molecular weight species (5).

The inputs are (i) a reaction buffer, which provides starting material for the reaction network to process, maintains pH and pMg, and contains any other species that act as constant constraints on the system dynamics, and (ii) a set of “control species” (at least one), whose input concentrations are changed randomly every sampling period over the course of the experiment. The sampling period is chosen such that the system almost, but not quite, relaxes to its nonequilibrium steady state. The system is kept near enough to its steady state to minimize trending (caused by the relaxation) in the time series, but far enough from the steady state that the time-lagged autocorrelation functions for each species decays to zero over three to five sampling periods. This long decay is necessary if temporal ordering in the network is to be analyzed.

We used capillary zone electrophoresis (CZE) (6) to measure accurately the concentrations of most of the small molecular species that are passed through the membrane. Because the enzymes are not present, the concentrations in aliquots taken from the reactor outflow just before the end of each sampling period reflect the concentrations inside the reactor. The set of perturbing input species concentrations and system responses form an ordered time series suitable for analysis by CMC.

We measured the concentrations of Pi, G6P, F6P, F16BP, F26BP, and DHAP, as well as the input and reactor concentrations of citrate and AMP (7). We developed a CZE detection method specifically for the separation and quantification of the metabolites in glycolysis (8). Using 5.5 mM 4-hydroxybenzoate (pH 11.8) as background electrolyte and indirect ultraviolet detection, we analyzed simultaneously the components mentioned above and several others in an enzyme buffer (9, 10). The analysis of a single sample took less than 6 min to complete. The detection limit for each species was defined as a signal-to-noise ratio of 3:1 and was determined to be 0.003 mM for Pi, 0.001 mM for F16BP and F26BP, and 0.002 mM for G6P, F6P, DHAP, citrate, and AMP. The error in repeated measurements of standard concentrations of these species was 2% for Pi, 1% for F16BP, 7% for F26BP, 2% for G6P, 3% for F6P, 1% for DHAP, 2% for citrate, and 3% for AMP.

To begin a CMC experiment, the CSTR was primed by flowing in the glycolytic enzymes to the following total activities: 4.0 units of HK, 10 units of PHI, 0.96 units of PFK1, 0.25 units of F16BPase, 0.01 units of PFK2::F26BPase, 0.86 units of aldolase, 10 units of TPI, and 75 units of CK (9). These activities were chosen to yield physiologically relevant concentrations of the intermediates, with the exception of F26BP, the concentration of which was set high enough to be measurable by our CZE method.

The input flow into the reactor (held constant at 0.2 ml/min) contained enzyme buffer (9), which includes, among other things, 0.6 mM glucose and 2.5 mM creatine-P. Glucose is the feed-reactant from which the process of glycolysis is initiated. The combination of CK and creatine-P was included to fix the ATP:ADP ratio during the course of the experiment. The reactor steady state reached when the inflow contains only these species is defined as the reference state; the intermediate concentrations were 1.48 mM Pi, 0.0086 mM F26BP, 0.20 mM F16BP, 0.072 mM F6P, 0.31 mM G6P, and 0.23 mM DHAP.

As input species, we chose two well-known indicators of cell nutrition level, AMP and citrate. AMP is an activator of PFK1 and promotes glycolysis (11, 12), whereas citrate is an inhibitor of both PFK1 and PFK2 and inhibits glycolysis (11, 13,14). Two 55-point, uniformly random, uncorrelated time series of concentration were generated by computer: one series for AMP over the range 0 to 0.1 mM, and one for citrate over the range 0 to 1.2 mM. These ranges represent the extreme “physiological” concentrations attained by these species.

At the start of the experiment, the system was placed in the reference state. While conserving the volume of the inflow, we then flowed the first set of input concentrations into the reactor. For each set of inputs in the time series, the system was allowed to relax for 10 min, and an aliquot (0.1 ml) was collected from the outflow. Hydrolysis of the sugars, including F26BP, was minimal as long as the samples were either immediately analyzed or kept frozen. The system was allowed to relax for an additional 3 min before the input concentrations were changed and the process repeated (Fig.2). This 13-min sampling period was chosen to be much longer than the residence time of the reactor (∼6 min) but, as determined empirically, not enough time for the system to relax completely to steady state after an average perturbation (15).

Figure 2

The time courses of measured concentration of the inputs, AMP and citrate, in the experiments, with the responses of the concentrations of Pi and of the species F16BP, DHAP, F6P, G6P, and F26BP. The dotted curves are the known input concentrations for AMP and citrate.

The measurements were analyzed according to the theory presented in (1). Briefly, the time-lagged correlation matrixR(τ) = [r ij(τ)] is first calculated with the equationsEmbedded Image(1) Embedded Image(2)here the angle brackets denote a time average over all of the measurements,x i(t) is the tth time point of the time series generated for species i, andx i is the time average of the ith time series. The indices i andj range over all species. Other methods of kinetic analysis have also used a correlation measure of relatedness between species (16). Next, R(τ) is converted into a Euclidean distance matrix with the canonical transformationEmbedded Image(3) Embedded Image(4)where Eq. 4 defines c ij to be the absolute value of the maximum correlation between the time series for species i and that for species j, regardless of the value of τ. We define D = (d ij) to be the distance matrix. Finally, a classical multidimensional scaling (MDS) method is applied toD to find a consistent configuration of points representing each of the species as well as the dimension Δ of the system. This calculation is accomplished by finding the eigenvalues λi and eigenvectorsz i of the centered inner product matrixB defined byEmbedded Image(5) Embedded Image(6)where H is the centering matrix, Iis the M × M identity matrix, and 11′is the M × M unit matrix. Each eigenvector ofB corresponds to a principal coordinate axis for the points implied by the distance matrix. The eigenvectors are normalized such that z i z i = λi. A point represents a particular time series generated for a given chemical species. The eigenvalue for an eigenvector is a measure of the amount of the total object (the set of all of the points) projected onto the particular eigenvector. The dimension of the system Δ is defined to be the minimal number of eigenvectors such that greater than 99% of the object is projected onto those eigenvectors. For graphical purposes, it is often most useful to plot the projection of the object on the first two (largest eigenvalue) principal axes so that the configuration of points is easy to visualize. The analysis outlined here is merely the simplest; many improvements are possible.

A representative three-dimensional (3D) section of the entire correlation function calculated from Eq. 2 is shown in Fig.3A. This section shows the lagged correlations of the time series of all species in the system with G6P. It shows strong positive and negative dependencies of G6P variation on a number of other species and the inputs. The smooth decline, with increasing absolute lag, of the autocorrelation of G6P from its maximum at τ = 0 implies that the system is not completely at steady state. Thus, some temporal ordering of variation in each of the variables may be assigned (Fig. 3B). If the time series for a given species has a maximum correlation at negative lags compared to a reference time series, then that species receives the input signals after the reference series, and vice versa. Similarly, if the two series are maximally correlated at zero lag but correlation tails to negative (positive) lags, variation in the given species closely follows (precedes) variation in the reference species. For example, the tailing to positive lags of the correlation of G6P with AMP-I implies that variations in AMP-I precede variation in G6P, that is, variations in G6P are responses to variations in AMP.

Figure 3

(A) Plot of the time-lagged correlation function R of G6P with all other species. Each lag represents a time of 13 min. (B) Graphs of the lagged correlation functions in (A) offset to distinguish individual correlation functions. Each successive correlation function, starting with F26BP, is offset by 0.2 unit on the correlation axis. The graph clarifies the temporal ordering data inherent in the correlation functions. For example, the maximum correlation with F26BP occurs at significantly positive lags, implying that variation in F26BP precedes variation in G6P.

A graphical matrix summary of the distance, connection, and temporal data derived from the correlation functions is shown in Fig.4. A network hypothesis for the system is derived simply by drawing arrows, in the directions indicated by the + and − symbols, between all connected species. The matrix also gives a rough idea of how “central” each species is to the dynamics of the network. For example, as indicated by the number of dark matrix elements in their rows and columns, AMP is a much stronger effector of network dynamics than is citrate.

Figure 4

Matrix summary of distance matrix, connection algorithm, and temporal ordering algorithm. The shade of each matrix element represents the distance between two species, calculated from the correlation function matrix by means of Eqs. 3 and 4. The darker the shade, the smaller the distance (white →d ij = 1.3, black →d ij = 0.2; linear gray scale). A + or − within a matrix element denotes that the connection algorithm has assigned a significant connection between these two species: a + (−) indicates that response of the row species follows (precedes) variation in the column species.

The annotated MDS diagram is automatically constructed from the data graphically displayed in Fig. 4. The diagram summarizes the strength of interaction and predicts mechanistic connections between species. To derive the positions of the species on the diagram, one performs MDS on the distance matrix. The distance between points on the diagram represents the strength of correlation between the two species. About 85% of the MDS object is contained in two dimensions. The 2D projection of the object (Fig. 5A) is thus a fairly good representation of the correlation distances among all the species in the network.

Figure 5

(A) The 2D projection of the MDS diagram for the time series shown in Fig. 2. Each point represents the time series of a given species. The closer two points are, the higher the correlation between the respective time series. Lines were drawn with the connection algorithm described in (1). Black (gray) lines indicate negative (positive) correlation between the respective species. Arrows indicate temporal ordering among species based on the lagged correlations between the their time series. (B) The predicted reaction pathway derived from the CMC diagram. Its correspondence to the known mechanism (Fig. 1) is high.

We can analyze the MDS diagram in Fig. 5A as if we had no knowledge of the glycolytic pathway and then compare the results to the known network. First, as expected, the measured concentrations of AMP and citrate in the reactor are tightly correlated to their own input concentrations (AMP-I and citrate-I; see Fig. 2), and variation in the concentrations of these species is preceded by variation in their inputs (Fig. 4). Citrate evidently strongly affects the concentration of F26BP: Variation in citrate precedes variation in F26BP, and increase of citrate in the system leads to decrease of F26BP. Citrate is positively correlated with F6P and is unlikely to react directly with F26BP; it is therefore predicted that citrate either inhibits the conversion of F6P to F26BP or activates the reverse reaction, or both.

The species F6P and G6P are almost too tightly correlated to be distinguished. Given their strong correlation, and because F6P is a simple isomer of G6P, they probably interconvert. The positive correlation between them implies that when material flows into either species, it is quickly repartitioned between them. Therefore, they are likely connected by a quasi-equilibrium reaction. According to temporal ordering, however, fluctuation in F6P closely precedes response in G6P, and thus, G6P is mechanistically further from the input species than F6P; that is, variation in the inputs must travel through more reaction steps to get to G6P than to get to F6P. Thus, although G6P is connected to AMP by the connection algorithm, temporal ordering, the strength of the correlation between G6P and F6P, and the chemical argument below imply that that connection from AMP may be to F6P.

The F6P/G6P cluster has a positive correlation with Pi and negative correlations with F26BP and AMP. This relation implies that F6P and G6P are either converted to these species or are involved in reactions controlled by them. Thus, because it is easy to imagine that F6P is converted to F26BP (and back again), and because phosphate is produced when F6P/G6P is produced, the prediction that F6P is interconverted with F26BP is supported. The time ordering of the F26BP-F6P/G6P-Pi part of the diagram may indicate that this reaction pathway favors formation of F6P/G6P under the present conditions.

Chemically, there is no clear route between F6P/G6P and AMP. However, an obvious route exists between F6P and F16BP. The F6P/G6P cluster is negatively correlated with F16BP, implying that the predicted reaction is largely unidirectional; that is, F16BP is formed at the expense of F6P or vice versa. On average, a fluctuation in AMP temporally precedes a response in F16BP and F6P/G6P, but examination of the correlation function reveals that fluctuations in F16BP closely precede (although are nearly contemporaneous with) fluctuations in F6P. Because AMP is positively correlated with F16BP and negatively correlated with F6P/G6P, the prediction of the analysis is that AMP activates the conversion of F6P to F16BP (or that AMP inhibits conversion of F16BP to F6P).

Finally, F16BP is strongly positively correlated with DHAP. The positive correlation implies that if the two are chemically interconverted, then the reaction is most likely near a point of quasi-equilibrium. If chemical conversion of F16BP to DHAP is assumed, then there is a three-carbon fragment we have not measured in this system (this is glyceraldehyde-3-phosphate). Variation in F16BP closely precedes variation in DHAP, implying that DHAP is further, mechanistically, from the inputs than from F16BP.

The pathway as predicted from this analysis (Fig. 5B) can be compared to the network in Fig. 1. Both pathways contain the rapid interconversion of F6P and G6P, interconversion of F6P with F26BP and F16BP, and the rapid interconversion of F16BP with DHAP, all of which form the backbone of the partial glycolytic pathway. Citrate is correctly predicted to effect negatively the conversion of F6P to F26BP, and AMP is correctly predicted to activate the conversion of F6P to F16BP.

The choice of specific connections into the F6P/G6P cluster made by the connection algorithm was not expected. One might expect G6P to be less strongly correlated to AMP and F16BP than is F6P, and there is no particular reason that Pi should be the most highly correlated with G6P. There are a number of reasons why these assignments were made, but the most important is that the high correlation of G6P with F6P makes distinguishing between them statistically nearly impossible. Therefore, relatively small amounts of measurement error in the time series data may lead to reassignment of the connections to F6P and G6P. Inorganic phosphate is most closely associated with the F6P/G6P group because F6P is a product in both reactions that produce the phosphate.

The inhibition of the conversion of F6P to F16BP by citrate (Fig.1) was not predicted. In our experimental system, this inhibition is not present. Most early studies (11, 13) of the inhibitory effect of citrate on PFK1 were performed in the absence of F26BP. Later studies have shown that F26BP combined with AMP decreases the inhibitory effect of citrate on PFK1 dramatically (12, 17). In our system, we used an excess amount of PFK2 so that we could measure F26BP by CZE. The concentration of F26BP in the system is about five times higher than its physiological level. This high level of F26BP may be responsible for diminishing the inhibition of PFK1 by citrate.

The CMC theory shows a good prediction of the reaction pathway from the measurements in this much-studied biochemical system. Both the MDS diagram itself (Fig. 5A) and the predicted reaction pathway (Fig.5B) resemble the classically determined reaction pathway (Fig. 1). In addition, CMC measurements yield information about the underlying kinetics of the network. For example, species connected by small numbers of fast reactions will have smaller distances between them than species connected by a slow reaction. We therefore conclude that the CMC method is useful and predict that it will be applicable to many other complex reaction systems.

There are a number of caveats. The predictions about the reaction pathway provide many essentials of the sequence of steps and types of interactions that exist among the measured species of the network, but do not provide all of the details of the chemical mechanism. Furthermore, because not all species in the network were measured, we do not predict the entire pathway but only the part specifically involving the measured species. The fact that the calculation yields a reasonable pathway in the absence of full information is a strength of the analysis. Analyses currently under development are concerned with better estimates of the dependence of variation in one variable on variation in the others and detecting the presence of members of the network that are missing from the time series.

Finally, the interactions predicted by the CMC analysis are always in reference to the reaction conditions, including pH, temperature, and inflow species concentrations. In our experiment, for example, the inhibitory effect of citrate for conversion of F6P to F16BP was missed, probably because of the nonphysiological concentration of F26BP. If we had seeded the reactor with a different concentration of PFK2::F26BPase, although we would not have been able to measure F26BP, we might have seen this interaction.

Even in the face of these difficulties and constraints, CMC provides a powerful method for analyzing multivariate chemical and biochemical time series in order to produce pathway hypotheses and predict likely points of control. Such analyses are becoming increasingly important with the advent of analytical methods for multivariate time-resolved measurements of chemical and biochemical concentrations such as CZE and parallel gene expression monitoring from DNA chips.


View Abstract

Navigate This Article