OPTIMIZATION AND MONITORING OF GEOLOGICAL CARBON STORAGE OPERATIONS

A DISSERTATION SUBMITTED TO THE PROGRAM IN EARTH, ENERGY AND ENVIRONMENTAL ...
2.2.2 Particle Swarm Optimization Particle Swarm Optimization (PSO) is a heuristic, global stochastic search method originally developed by Kennedy & Eberhart (1995). The method entails a cooperative search that is modeled on the behavior of animal swarms. PSO is population based (in this regard it is similar to GA and other stochastic evolution procedures).
At every iteration of the algorithm a function evaluation must be performed for each 'particle' (a particle is an individual representing a potential solution) within the swarm. PSO has been used previously for well location and well control optimizations in oil field problems. See Onwunalu (2010) and Isebor et al. (2013) for detailed formulations and results using PSO, and Fernandez Martinez et al. (2008) for a theoretical assessment of the algorithm.
As specified previously, our optimizations involve n = 24 to 47 optimization variables. In PSO runs, the number of particles used (Np) is given by the heuristic (from Fernandez Martinez et al., 2008) Np = 10 + 2⌈√n⌉, which gives Np = 20 to 24.
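The swarm-size heuristic above is simple enough to verify directly. A minimal sketch (the function name is illustrative, not from the dissertation):

```python
import math

def swarm_size(n):
    """Swarm-size heuristic of Fernandez Martinez et al. (2008):
    Np = 10 + 2 * ceil(sqrt(n)), where n is the number of optimization variables."""
    return 10 + 2 * math.ceil(math.sqrt(n))

# For the problem sizes considered here:
# swarm_size(24) -> 20 and swarm_size(47) -> 24
```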
Clearly this number of particles cannot provide an exhaustive global search, but the method does enable some degree of global exploration.
At each iteration of PSO, particles in the swarm move through the search space according to a set of particle 'velocities.' There are several PSO variants and the detailed velocity calculation depends on the method used. Here we apply the so-called 'local best PSO' (Clerc, 2010), described below.
At iteration k, the position of particle i in the search space is denoted by ui(k) ∈ Rⁿ. The particle position is updated according to

ui(k + 1) = ui(k) + vi(k + 1) Δt,    (2.9)

where vi(k + 1) is the particle velocity at iteration k + 1 and Δt is a (conceptual) time step. Here we take Δt = 1, but other choices are possible and may be beneficial in some cases, as explained by Fernandez Martinez et al. (2008).
The particle velocity is determined using a randomized linear combination of velocity components (an inertia component, a cognitive component and a social component):

vi(k + 1) = ω vi(k) + c1 r1 [ui,pbest(k) − ui(k)] + c2 r2 [ui,nbest(k) − ui(k)],    (2.10)

where ω, c1 and c2 are tunable constants, r1 and r2 are random variables distributed uniformly on [0, 1], ui,pbest(k) is the best solution found by particle i through iteration k, and ui,nbest(k) is the best solution found by any particle in the neighborhood of particle i. Here we use ω = 0.721 and c1 = c2 = 1.193, which were found to perform well on a suite of test problems (Clerc, 2006). Figure 2.3 depicts the velocity components for a single particle in a two-dimensional search (n = 2).
The inertia component provides a degree of continuity in particle motion from one iteration to the next, while the cognitive component moves the particle toward its previous best solution and the social component moves it toward the best solution achieved by any particle in its neighborhood. In ‘global best PSO,’ all particles are considered to be in the same neighborhood. This procedure can lead to rapid convergence, but it is susceptible to getting trapped in a local minimum. In the ‘local best PSO’ method used here, the neighborhood is varied randomly at each iteration.
Onwunalu & Durlofsky (2010) found that this approach appropriately balanced the global search characteristics of PSO with reasonable convergence rates in oil reservoir optimizations. We expect PSO to perform similarly in the context of carbon storage operations.
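The position and velocity updates described above can be sketched compactly. The following is a minimal illustration of one update step, not the dissertation's implementation; the array layout (one row per particle) and the helper name are assumptions:

```python
import numpy as np

def pso_step(u, v, pbest, nbest, omega=0.721, c1=1.193, c2=1.193, dt=1.0, rng=None):
    """One local-best PSO update (position and velocity equations).

    u, v  : (Np, n) arrays of particle positions and velocities at iteration k
    pbest : (Np, n) array, best position found by each particle so far
    nbest : (Np, n) array, best position in each particle's neighborhood
    """
    if rng is None:
        rng = np.random.default_rng()
    r1 = rng.random(u.shape)  # uniform on [0, 1], drawn per component
    r2 = rng.random(u.shape)
    # inertia + cognitive + social components
    v_new = omega * v + c1 * r1 * (pbest - u) + c2 * r2 * (nbest - u)
    return u + v_new * dt, v_new
```

Note that when a particle sits at both its personal best and its neighborhood best with zero velocity, the update leaves it in place, as expected.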
Figure 2.3: PSO particle velocity components in a two-dimensional search space.

When the updated particle position computed in Equation 2.9 falls outside the feasible domain (i.e., ui(k + 1) ∉ Ω), we apply the 'absorb' method described by Clerc (2010). With this method, infeasible particles are projected back into the feasible region and the corresponding velocity component is adjusted to be consistent with the particle position, i.e., vi(k + 1) = ui(k + 1) − ui(k). The PSO algorithm proceeds until one or more termination criteria are met. These could include the norm of particle velocities decreasing to below a given tolerance, or reaching a specified maximum number of iterations. In this work, we simply run PSO for a specified number of iterations (100 or 200), which generally results in particles 'converging' to a particular region of the search space.
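For bound constraints, the 'absorb' treatment amounts to clipping the position to the feasible box and resetting the velocity so it remains consistent with the position update (with Δt = 1). A minimal sketch, with function and argument names as illustrative assumptions:

```python
import numpy as np

def absorb(u_new, u_old, lb, ub):
    """'Absorb' handling of bound violations (in the spirit of Clerc, 2010):
    project the particle onto the feasible box [lb, ub] and reset the velocity
    so that v(k+1) = u(k+1) - u(k) holds for the projected position."""
    u_proj = np.clip(u_new, lb, ub)
    v_adj = u_proj - u_old  # consistent with the position update when dt = 1
    return u_proj, v_adj
```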
The PSO algorithm parallelizes very naturally, and this represents an important advantage of this approach. Speciﬁcally, if we have as many processors and software licenses as particles (Np ), then all particles can be evaluated (simulated) simultaneously. In Chapter 4, we will use PSO to optimize over multiple (R) geological models. In that case, each PSO iteration requires Np × R simulations to evaluate the objective function J for each particle and geological model. Again, these runs can be performed in parallel, though now Np × R processors are required to achieve optimal parallel eﬃciency. We note ﬁnally that, as discussed earlier in the context of HJDS, PSO can be hybridized through linkage to a local optimization algorithm.
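The parallel structure is straightforward to express with a worker pool. The sketch below is illustrative only: `evaluate_swarm` is a hypothetical helper, and threads are used on the assumption that each objective evaluation launches an independent external simulation run:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_swarm(objective, positions, max_workers=None):
    """Evaluate all particle positions concurrently. Each call to `objective`
    stands in for one flow simulation; with Np workers available, all Np
    particles are evaluated simultaneously."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(objective, positions))
```

For optimization over R geological models, the same pattern applies with Np × R independent evaluations per iteration.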
Along these lines, Isebor et al. (2013) presented a general approach for optimizing oil well placement and control problems that combines PSO and Mesh Adaptive Direct Search (MADS). This approach, which includes global search as well as convergence to a local optimum, was shown to provide benefits over standalone PSO and could be similarly applied for the optimizations considered here.
2.2.3 Use of Generalized Pattern Search for data assimilation

The data assimilation procedure applied in this work involves finding an estimate of the uncertain model parameters. Determining this estimate requires us to solve an optimization problem to minimize a misfit function. Before the optimization is performed, a Karhunen-Loève (K-L) parameterization of the uncertain model parameters is applied to reduce the number of optimization variables and guarantee that realizations honor a specified prior model (details on the K-L procedure will be provided in Chapter 4). The optimization problem is of the form

min over ξ of Ŝ = [f(m(ξ)) − dobs]^T W [f(m(ξ)) − dobs],    (2.11)

where Ŝ is the misfit function, dobs is a vector of observed injection and monitoring well data, f is the forward model, which maps the input model parameters to the output model data via flow simulation, m is the vector of unknown model parameters (e.g., porosity and permeability in every grid block), ξ ∈ R^ℓ is the vector of K-L parameters with ℓ ≪ Nb, where Nb is the number of grid blocks in the model, and W is a weighting matrix (in a Bayesian setting, W = C_D^{-1}, where C_D is the data covariance matrix).
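The misfit in Equation 2.11 is a weighted quadratic form in the data residual. A minimal sketch of its evaluation (the function name is illustrative; in practice the residual comes from a flow simulation):

```python
import numpy as np

def misfit(f_of_m, d_obs, W):
    """Misfit of Equation 2.11: S = r^T W r, with residual r = f(m(xi)) - d_obs.
    In a Bayesian setting W is the inverse of the data covariance matrix."""
    r = f_of_m - d_obs
    return float(r @ W @ r)
```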
Because the K-L parameterization provides a diﬀerentiable representation of the geological model m, the optimization problem in Equation 2.11 is well suited to gradient-based and direct search optimization methods that converge to local minima.
See, e.g., Sarma et al. (2006) for the application of the K-L representation with gradient-based optimization. Since we are using a commercial simulator and do not have access to gradients, in this study we apply a local search procedure, Generalized Pattern Search (GPS). We also investigated the use of PSO for the data assimilation problem, but we found GPS to consistently provide better matches than PSO for a given computational effort. In addition, the more systematic local search performed by GPS (as opposed to the global stochastic search using PSO) may be beneficial in closed-loop applications. Specifically, GPS can use the history-matched model from the previous update step as the initial guess, and the new model will then be in some sense 'close' to the previous model. We note, however, that PSO with the K-L parameterization was successfully used for history matching by Suman (2013).
GPS is a derivative-free optimization method that uses a stencil-based search (similar to HJDS) to ﬁnd the minimum in the parameter space. While GPS is not as opportunistic as HJDS (meaning it takes more function evaluations to converge), it has the advantage of easy parallelization. Thus, when multiple processors and licenses are available, the elapsed time using GPS will be considerably less than for HJDS. Refer to Kolda et al. (2003), Conn et al. (2009) and Audet & Dennis Jr (2002) for details on GPS (and related techniques), including convergence results and descriptions of constraint-handling procedures.
The basic GPS procedure, illustrated in Figure 2.4 for a search in two dimensions, is quite intuitive. At each GPS iteration, a stencil centered at the current best solution (dark gray circle) deﬁnes the points to be evaluated. The deﬁnition and evaluation of all of the points in the stencil (black circles) is referred to as ‘polling.’ For a search in ℓ dimensions, as is required for our data assimilation problem, this entails 2ℓ function evaluations at each iteration, if all points in the stencil are considered.
If the polling results in an improvement to the objective function, the stencil is shifted to the best solution, as illustrated in Figure 2.4(a)-(d). Otherwise, the size of the stencil is reduced and polling proceeds with the new stencil (Figure 2.4(e)). This process is continued until the stencil size reaches a prescribed tolerance, at which point the optimization has converged (Figure 2.4(f)). If a polling point lies outside the feasible region, we simply project it back to the value at the boundary of the feasible region. Because the simulations associated with each of the poll points are independent, the algorithm naturally parallelizes.
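The polling loop described above can be sketched as follows. This is a simplified illustration under stated assumptions (a 2ℓ compass stencil, opportunistic acceptance of the first improving point, bound constraints handled by projection); it is not the dissertation's implementation:

```python
import numpy as np

def gps_iteration(objective, x_best, f_best, step, lb, ub):
    """One GPS polling step with a compass stencil of 2*l points.
    Returns the updated incumbent and step size (step is halved on failure)."""
    l = len(x_best)
    for i in range(l):
        for sign in (+1.0, -1.0):
            x = x_best.copy()
            x[i] += sign * step
            x = np.clip(x, lb, ub)  # project infeasible poll points to the bound
            f = objective(x)
            if f < f_best:          # opportunistic: accept the first improvement
                return x, f, step
    return x_best, f_best, step / 2.0  # unsuccessful poll: contract the stencil
```

Repeating this step until `step` falls below a tolerance reproduces the behavior shown in Figure 2.4: the stencil shifts toward the minimum while polling succeeds, then contracts until convergence.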
For GPS to satisfy the convergence criteria given in Kolda et al. (2003), the stencil must be a generating set, meaning that it must be guaranteed to contain at least one descent direction. For optimizations with simple bound constraints, a stencil requires at least ℓ + 1 polling points to be a generating set, though the use of more points may lead to faster convergence.

Figure 2.4: GPS for a two-dimensional search. Contours depict the objective function, and the star indicates the local minimum (from Kolda et al., 2003).

In this work, we use a stencil consisting of points in the positive and negative directions in each dimension, giving 2ℓ polling points. However, rather than performing all 2ℓ function evaluations at each iteration, the polling can be terminated if one or more stencil points (at the current GPS iteration) improves the objective function. This approach is beneficial when the number of available computational nodes is less than 2ℓ, which is the case in our problem (we typically access at most 50 nodes, and 2ℓ = 200). Otherwise, all 2ℓ function evaluations could be performed simultaneously, and results for all stencil points would then be available.
In addition, we can prescribe the sequence used to evaluate the stencil points such that we increase the chances of a successful polling with only one use of the available nodes in the cluster (i.e., at a given iteration, our goal is to call the cluster only once).
We define the order in which the polling points are evaluated using a stack. Once a point has been evaluated, it is usually sent to the bottom of the stack. If a polling direction is successful, however, we have the option of keeping it at the top of the stack for the next iteration. We must, however, ensure that a large portion of polling directions move to the bottom of the stack, or the search may become biased and convergence slowed. Thus there is a balance between maintaining successful search directions and incorporating new directions.
Through numerical experimentation, we found that the fastest convergence in our history matching computations was achieved by maintaining a maximum of 20% of the points polled in iteration k at the top of the stack for iteration k + 1. This generally enabled us to call the cluster only once at each GPS iteration. Note that this 20% value will in general depend on the speciﬁc problem, the dimension of the search space, and the number of cores available.
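One way to realize this bookkeeping is sketched below. The exact rule used in the dissertation is not specified beyond the 20% retention figure, so the function and its tie-breaking behavior are hypothetical:

```python
def reorder_stack(stack, successful, keep_frac=0.2):
    """Reorder polling directions for the next GPS iteration: up to keep_frac
    of the stack's successful directions stay at the top; all other directions
    move toward the bottom. `stack` is an ordered list of direction labels and
    `successful` is the set of directions that improved the objective."""
    n_keep = max(1, int(keep_frac * len(stack))) if successful else 0
    kept = [d for d in stack if d in successful][:n_keep]
    rest = [d for d in stack if d not in kept]
    return kept + rest
```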
2.3 Aquifer model

The subsurface model used in this work is intended to represent a sandstone aquifer initially containing only brine, at a temperature of 40 °C and initial pressure of 17 MPa. A total of 5 MT of CO2 (1 MT = 10^9 kg) is injected into the aquifer every year for 30 years. This is approximately equivalent to the amount of CO2 produced by a 700 MW coal-fired power plant. The central portion of the model, which is intended to store all of the injected CO2, is 10.9 km × 10.9 km areally and 97.5 m (320 ft) thick. The total injected CO2 corresponds to about 2.5% of the pore volume of this central region (this percentage is in the middle of the 1–4% range specified by the Intergovernmental Panel on Climate Change; Solomon, 2007). Essentially all of the injected CO2 remains in the central region of the model, and any CO2 that leaves this zone is counted as mobile CO2 regardless of how it is stored. Thus the optimization penalizes CO2 leaving the central region.
The overall model, shown in Figure 2.5, contains the central region described above and a large enclosing area (of total area 230 km × 230 km, though our results are not sensitive to the precise size of this model as long as it is large relative to the central region) which provides pressure support. The overall model is represented on a 39 × 39 × 8 grid. The central portion of the model, shown in Figure 2.6, contains 25 × 25 × 8 grid blocks. Wells are constrained to lie in the so-called core region, which contains 15 × 15 × 8 grid blocks. Note that the grid spacing increases away from the central region to facilitate faster simulation runs.