The Principle of Energetic Consistency:
Application to the Shallow-Water Equations
Stephen E. Cohn
Global Modeling and Assimilation Office
NASA Goddard Space Flight Center
Greenbelt, Maryland 20771
Popular Summary
It has often been said that the reason we cannot predict tomorrow’s
weather is that we don’t know today’s weather. More precisely, if the com-
plete state of the earth’s atmosphere (e.g., pressure, temperature, winds and
humidity, everywhere throughout the atmosphere) were known at any par-
ticular initial time, then solving the equations that govern the dynamical
behavior of the atmosphere would give the complete state at all subsequent
times. Part of the difficulty of weather prediction is that the governing
equations can only be solved approximately, which is what weather prediction
models do. But weather forecasts would still be far from perfect even if
the equations could be solved exactly, because the atmospheric state is not
and cannot be known completely at any initial forecast time. Rather, the
initial state for a weather forecast can only be estimated from incomplete
observations taken near the initial time, through a process known as data
assimilation.
Weather prediction models carry out their computations on a grid of
points covering the earth’s atmosphere. The formulation of these models is
guided by a mathematical convergence theory which guarantees that, given
the exact initial state, the model solution approaches the exact solution of
the governing equations as the computational grid is made finer. For
the data assimilation process, however, there does not yet exist a convergence
theory. For instance, it is not yet known how to formulate a data assimilation
method in such a way that increasing the number of observations available
to estimate the initial state is guaranteed to improve the accuracy of the
estimated state, even in a statistical sense. Instead, the development of
data assimilation methods has proceeded on the basis of a number of ad hoc
assumptions and approximations.
This book chapter represents an effort to begin establishing a convergence
theory for data assimilation methods. The main result, which is called
the principle of energetic consistency, provides a necessary condition that
a convergent method must satisfy. Current methods violate this principle,
as shown in earlier work of the author, and therefore are not convergent.
The principle is illustrated by showing how to apply it as a simple test of
convergence for proposed methods.
The Principle of Energetic Consistency:
Application to the Shallow-Water Equations^1
Stephen E. Cohn
Global Modeling and Assimilation Office
NASA Goddard Space Flight Center
Greenbelt, Maryland 20771
Draft of February 26, 2009
^1 Prepared for Data Assimilation: Making Sense of Observations, W. Lahoz,
B. Khattatov and R. Ménard (eds.), Springer
1 Introduction
The statement of conservation of total energy for nonlinear stochastic dynamical
systems, when expressed in the natural energy variables of the system, provides
an exact dynamical link between just the first two moments of the state of
the system. This statement is what will be called here the principle of energetic
consistency. This principle should be useful to the data assimilation community,
because most current four-dimensional data assimilation methods are in fact
based on an approximate evolution of the first two moments, conditioned on
the observations. In particular, the principle provides one simple test of how
well current methods approximate the actual evolution.
Suppose that the system state $s = s(t)$ is a vector governed by a nonlinear
conservative system of ordinary differential equations
\[ \frac{ds}{dt} + f(s, t) = 0, \]
where $t$ is time, and suppose that the state variables have been chosen in such
a way that the conserved quantity is $E = s^T s$, where the superscript $T$ denotes
transposition. Thus the statement of energy conservation is $E(t) = E(t_0)$. Now
suppose that the initial state $s(t_0)$ is a vector-valued random variable, with mean
$\bar{s}(t_0)$ and covariance matrix $P(t_0)$. Then the principle of energetic consistency
says that, under certain hypotheses,
\[ \bar{s}^T(t)\,\bar{s}(t) + \mathrm{tr}\,P(t) = \bar{s}^T(t_0)\,\bar{s}(t_0) + \mathrm{tr}\,P(t_0), \]
where $\mathrm{tr}\,A$ is the trace, or sum of the diagonal elements, of a matrix $A$. The
trace of the covariance matrix $P(t)$ is sometimes called the total variance of
the system state. Thus the principle of energetic consistency says that any
increase (decrease) in the uncertainty in the state of the system, as measured
by the total variance, is compensated for exactly by a corresponding decrease
(increase) in the energy of the mean state. Some ramifications of the principle of
energetic consistency for ordinary differential equations, in the context of both
data assimilation schemes and predictability theory, were given in a recent paper
of Cohn (2008). The principle holds also for nonlinear, conservative discrete-time
systems.
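As a numerical illustration of the finite-dimensional statement above, the
following minimal sketch propagates an ensemble through a norm-preserving
nonlinear discrete-time map on R^2 and compares the two sides of the identity.
The map `nonlinear_rotation`, its parameter `alpha`, the ensemble size and the
Gaussian initial ensemble are assumptions made here for illustration only; they
do not come from the chapter. Moments are computed with 1/n normalization, so
the identity should hold for the empirical distribution up to rounding error.

```python
import numpy as np

def nonlinear_rotation(s, alpha=1.0):
    """Hypothetical norm-preserving nonlinear map: rotate each state by an
    angle that depends on the state itself (theta = alpha * first component)."""
    theta = alpha * s[..., 0]
    c, sn = np.cos(theta), np.sin(theta)
    x, y = s[..., 0], s[..., 1]
    return np.stack([c * x - sn * y, sn * x + c * y], axis=-1)

def energy_budget(ensemble):
    """Return (energy of the ensemble mean, total variance tr P), 1/n moments."""
    mean = ensemble.mean(axis=0)
    anomalies = ensemble - mean
    total_variance = np.mean(np.sum(anomalies**2, axis=1))  # equals tr P
    return mean @ mean, total_variance

rng = np.random.default_rng(0)
s0 = np.array([1.0, 0.5]) + 0.3 * rng.standard_normal((5000, 2))

s1 = s0
for _ in range(10):          # ten applications of the conservative map
    s1 = nonlinear_rotation(s1)

m0, v0 = energy_budget(s0)   # initial mean energy and total variance
m1, v1 = energy_budget(s1)   # final mean energy and total variance
print("initial:", m0 + v0, " final:", m1 + v1)
```

Each ensemble member keeps its own energy exactly, so any change in the energy
of the mean must be matched by an opposite change in the total variance.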
The purpose of this chapter is to establish the principle of energetic
consistency for a class of hyperbolic partial differential equations, and in particular,
to determine precise conditions under which it holds. Ultimately a rigorous
convergence theory for data assimilation methods will be needed. Given the role
that energy considerations play in the convergence theory for discretizations of
partial differential equations, the principle of energetic consistency is likely to
play a role in a convergence theory for data assimilation methods.
The principle of energetic consistency for infinite-dimensional spaces is given
as Theorem 1 in Section 2.3. It states that under appropriate hypotheses,
\[ \|\bar{s}_t\|^2 + \mathrm{tr}\,\mathcal{P}_t = \|\bar{s}_{t_0}\|^2 + \mathrm{tr}\,\mathcal{P}_{t_0}, \]
where the norm is a Hilbert space norm and $\mathcal{P}_t$ is the covariance operator of
the system state. Hypotheses needed for the principle of energetic consistency
to hold, for ordinary differential equations and for classical solutions of
symmetric hyperbolic partial differential equations, are examined in Section 3. A
main result is that for the latter case, the system state cannot be
Gaussian-distributed, and the class of probability distributions that the system state can
have is identified. The global nonlinear shallow-water equations are treated as
an example in Section 4. There hypotheses are given so that the principle of
energetic consistency holds when the state variables are taken to be the natural
energy variables. It is shown that $\mathrm{tr}\,\mathcal{P}_t$ takes the simple form
\[ \mathrm{tr}\,\mathcal{P}_t = \int \mathrm{tr}\,P_t(x, x)\; a \cos\phi \, d\phi \, d\lambda, \]
where $P_t(x, y)$ is the covariance matrix of the shallow-water system state.
The rigorous theory established in this chapter requires some mathematical
machinery. The natural framework for the principle of energetic consistency
is the theory of Hilbert space-valued random variables, which is covered in
Appendix A. Appendix B covers the theory of families of Hilbert spaces which
is needed to handle spherical geometry conveniently. Appendix C summarizes
mathematical basics needed throughout the text.
2 The principle of energetic consistency
2.1 Problem setting
Let $\mathcal{H}$ be a real, separable Hilbert space, with inner product and corresponding
norm denoted by $(\cdot,\cdot)$ and $\|\cdot\|$, respectively. Recall that every separable Hilbert
space has a countable orthonormal basis, and that every orthonormal basis of a
separable Hilbert space has the same number of elements $N \le \infty$, the dimension
of the space. Let $\{h_i\}_{i=1}^{N}$ be an orthonormal basis for $\mathcal{H}$, where
$N = \dim \mathcal{H} \le \infty$ is the dimension of $\mathcal{H}$.
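Let $\mathcal{S}$ be any nonempty set in $\mathcal{B}(\mathcal{H})$, where $\mathcal{B}(\mathcal{H})$ denotes the Borel field
generated by the open sets in $\mathcal{H}$, i.e., $\mathcal{B}(\mathcal{H})$ is the smallest $\sigma$-algebra of subsets
of $\mathcal{H}$ containing all the sets that are open in $\mathcal{H}$. In particular, $\mathcal{S} \subset \mathcal{H}$, $\mathcal{S}$ can be
all of $\mathcal{H}$, and $\mathcal{S}$ can be any open or closed set in $\mathcal{H}$.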
Let $t_0$ and $T$ be two times with $-\infty < t_0 < T < \infty$, and let $\mathcal{T}$ be a time
set bounded by and including $t_0$ and $T$. For instance, $\mathcal{T} = [t_0, T]$ in the case
of continuous-time dynamics, and $\mathcal{T} = [t_0, t_1, \ldots, t_K = T]$ in the discrete-time
case. The set $\mathcal{T}$ is allowed to depend on the set $\mathcal{S}$, $\mathcal{T} = \mathcal{T}(\mathcal{S})$.
Let $N_{t,t_0}$ be a map from $\mathcal{S}$ into $\mathcal{H}$ (written $N_{t,t_0} : \mathcal{S} \to \mathcal{H}$) for all times
$t \in \mathcal{T}$, i.e., for all $s_{t_0} \in \mathcal{S}$ and $t \in \mathcal{T}$, $N_{t,t_0}(s_{t_0})$ is defined and
\[ s_t = N_{t,t_0}(s_{t_0}) \tag{1} \]
is in $\mathcal{H}$, $\|s_t\| < \infty$. Assume that $N_{t,t_0}$ is continuous and bounded for all $t \in \mathcal{T}$.
Continuity means that for every $t \in \mathcal{T}$, $s_{t_0} \in \mathcal{S}$ and $\epsilon > 0$, there is a $\delta > 0$
such that if $\|s_{t_0} - s'_{t_0}\| < \delta$ and $s'_{t_0} \in \mathcal{S}$, then
$\|N_{t,t_0}(s_{t_0}) - N_{t,t_0}(s'_{t_0})\| < \epsilon$.
Boundedness means that there is a constant $M = M_{t,t_0}$ such that
\[ \|N_{t,t_0}(s_{t_0})\| \le M_{t,t_0} \|s_{t_0}\| \]
for all $s_{t_0} \in \mathcal{S}$ and $t \in \mathcal{T}$. Continuity and boundedness are equivalent if $N_{t,t_0}$ is
a linear operator.
In most applications, $N_{t,t_0}$ will be a nonlinear operator. Typically it will
be the solution operator of a well-posed initial-value problem, for the state
vector $s$ of a nonlinear, deterministic system of partial ($\dim \mathcal{H} = \infty$) or ordinary
($\dim \mathcal{H} < \infty$) differential equations ($\mathcal{T} = [t_0, T]$).^1 Recall that continuity of the
solution operator is part of the (Hadamard) definition of well-posedness of the
initial-value problem for continuous-time or discrete-time dynamical systems:
not only must there exist sets $\mathcal{S}$ and $\mathcal{T} = \mathcal{T}(\mathcal{S})$, taken here to be defined
as above, and a unique solution $s_t \in \mathcal{H}$ for all $s_{t_0} \in \mathcal{S}$ and $t \in \mathcal{T}$, which
taken together define the solution operator, but the solution must also depend
continuously on the initial data.
The operator $N_{t,t_0}$ is called isometric or conservative (in the norm $\|\cdot\|$ on
$\mathcal{H}$) if
\[ \|N_{t,t_0}(s_{t_0})\| = \|s_{t_0}\| \]
for all $s_{t_0} \in \mathcal{S}$ and $t \in \mathcal{T}$, and the differential (or difference) equations that
express the dynamics of a well-posed initial-value problem are called conservative
if the solution operator of the problem is conservative. With $s_t \in \mathcal{H}$ defined for
all $s_{t_0} \in \mathcal{S}$ and $t \in \mathcal{T}$ by Eq. (1), the quantity
\[ E_t = \|s_t\|^2 = (s_t, s_t) < \infty \tag{2} \]
satisfies $E_t \le M_{t,t_0}^2 E_{t_0}$ for all $t \in \mathcal{T}$ under the assumption of boundedness, and
is constant in time, $E_t = E_{t_0}$ for all $t \in \mathcal{T}$, in the conservative case.
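The distinction between the conservative and the merely bounded case can be
seen in a small finite-dimensional sketch. The operator `damped_rotation` below
is a hypothetical linear example chosen for illustration (it is not taken from
the chapter): with `damping=1.0` it is an isometry, and with `damping=0.9` it is
bounded with constant M = 0.9.

```python
import numpy as np

def damped_rotation(s, theta=0.7, damping=0.9):
    """Rotate s by theta, then scale by damping; damping=1 gives an isometry."""
    c, sn = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -sn], [sn, c]])
    return damping * (rotation @ s)

def energy(s):
    """Quadratic energy E = ||s||^2 = (s, s)."""
    return float(s @ s)

s0 = np.array([2.0, -1.0])
E0 = energy(s0)

E_conservative = energy(damped_rotation(s0, damping=1.0))  # E_t = E_{t0}
E_bounded = energy(damped_rotation(s0, damping=0.9))       # E_t <= M^2 E_{t0}
M = 0.9
print(E0, E_conservative, E_bounded, M**2 * E0)
```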
In essence, the principle of energetic consistency is a statement about
continuous transformations of Hilbert space which are conservative. Applied to
solution operators, it becomes a statement about well-posed initial-value
problems for conservative dynamics. It is important to recognize that the quantity
$E_t$ defined in Eq. (2) is quadratic in $s_t$. For nonlinear systems of differential
equations that express physical laws, there is usually a choice of dependent
(state) variables such that $E_t$ is the physical total energy. Then the dynamics
are conservative in the norm on $\mathcal{H}$ if the physical system is closed, and the
principle of energetic consistency applies.
2.2 Scalar and Hilbert space-valued random variables
Before stating the principle of energetic consistency, some probability concepts
will first be summarized. For details, see Appendices A.1–A.3 and C.3.
^1 As discussed further in Section 3, for partial differential equations $\mathcal{H}$ will usually be the
space $L^2(D)$ of square-integrable vectors on the spatial domain $D$ of the problem and $\mathcal{S}$ will
be an appropriate Sobolev or Sobolev-like space, while for ordinary differential equations $\mathcal{H}$
will usually be Euclidean space $\mathbb{R}^N$ and $\mathcal{S}$ will be an appropriate open set in $\mathbb{R}^N$.
Let $(\Omega, \mathcal{F}, P)$ be a complete probability space, with $\Omega$ the sample space,
$\mathcal{F}$ the event space and $P$ the probability measure. The event space consists of
subsets of the set $\Omega$, called events or measurable sets, which are those subsets on
which the probability measure is defined. Denote by $\mathcal{E}$ the expectation operator.
A (scalar) random variable is a map $r : \Omega \to \mathbb{R}^e$ that is measurable, i.e., an
extended real-valued function $r$, defined for all $\omega \in \Omega$, that satisfies
\[ \{\omega \in \Omega : r(\omega) \le x\} \in \mathcal{F} \]
for all $x \in \mathbb{R}$. Thus, if $r$ is a random variable then its probability distribution
function
\[ F_r(x) = P(\{\omega \in \Omega : r(\omega) \le x\}) \]
is defined for all $x \in \mathbb{R}$. If $r$ is a random variable then $r^2$ is a random variable.
Suppose that $r$ is a random variable. Then the expectation $\mathcal{E}|r|$ is defined
and $\mathcal{E}|r| \le \infty$. If $\mathcal{E}|r| < \infty$, then the expectation $\mathcal{E}r$ is defined and called the
mean of $r$, and $|\mathcal{E}r| \le \mathcal{E}|r| < \infty$. If $\mathcal{E}r^2 < \infty$, then $r$ is called second-order, the
mean $\bar{r} = \mathcal{E}r$ and variance $\sigma^2 = \mathcal{E}(r - \bar{r})^2$ of $r$ are defined, and
\[ \mathcal{E}r^2 = \bar{r}^2 + \sigma^2. \tag{3} \]
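Equation (3) can be checked directly on sample moments. The sketch below is
only an illustration: the exponential distribution, its scale and the sample
size are arbitrary assumptions, and the mean and variance are the 1/n sample
versions, for which the decomposition holds up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(1)
r = rng.exponential(scale=2.0, size=100_000)  # any second-order sample works

mean = r.mean()                        # sample analogue of r-bar
variance = np.mean((r - mean) ** 2)    # sample analogue of sigma^2 (1/n form)
second_moment = np.mean(r ** 2)        # sample analogue of E r^2

# Eq. (3): E r^2 = (r-bar)^2 + sigma^2
print(second_moment, mean**2 + variance)
```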
An $\mathcal{H}$-valued random variable is a map $r : \Omega \to \mathcal{H}$ such that
\[ \{\omega \in \Omega : r(\omega) \in B\} \in \mathcal{F} \]
for every set $B \in \mathcal{B}(\mathcal{H})$. A map $r : \Omega \to \mathcal{H}$ is an $\mathcal{H}$-valued random variable if,
and only if, $(h, r)$ is a scalar random variable for every $h \in \mathcal{H}$, that is, if and
only if
\[ \{\omega \in \Omega : (h, r(\omega)) \le x\} \in \mathcal{F} \]
for all $h \in \mathcal{H}$ and $x \in \mathbb{R}$. If $r$ is an $\mathcal{H}$-valued random variable then $\|r\|$ is a scalar
random variable. An $\mathcal{H}$-valued random variable $r$ is called second-order if $\|r\|$ is
a second-order scalar random variable, i.e., if $\mathcal{E}\|r\|^2 < \infty$. If $r$ is a second-order
$\mathcal{H}$-valued random variable then $(h, r)$ is a second-order scalar random variable,
i.e., $\mathcal{E}(h, r)^2 < \infty$, for all $h \in \mathcal{H}$.
Suppose that $r$ is a second-order $\mathcal{H}$-valued random variable. Then there
exists a unique element $\bar{r} \in \mathcal{H}$, called the mean of $r$, such that $\mathcal{E}(h, r) = (h, \bar{r})$
for all $h \in \mathcal{H}$. Also, $r' = r - \bar{r}$ is a second-order $\mathcal{H}$-valued random variable with
mean $0 \in \mathcal{H}$, and
\[ \mathcal{E}\|r\|^2 = \|\bar{r}\|^2 + \mathcal{E}\|r'\|^2. \]
Furthermore, there exists a unique bounded linear operator $\mathcal{P} : \mathcal{H} \to \mathcal{H}$, called
the covariance operator of $r$, such that
\[ \mathcal{E}(g, r')(h, r') = (g, \mathcal{P}h) \]
for all $g, h \in \mathcal{H}$. The covariance operator $\mathcal{P}$ is self-adjoint and positive
semidefinite, i.e., $(g, \mathcal{P}h) = (\mathcal{P}g, h)$ and $(h, \mathcal{P}h) \ge 0$ for all $g, h \in \mathcal{H}$. It is also trace
class, i.e., the sum $\sum_{i=1}^{N} (h_i, \mathcal{P}h_i)$ is finite and independent of the orthonormal
basis $\{h_i\}_{i=1}^{N}$, $N = \dim \mathcal{H} \le \infty$, chosen for $\mathcal{H}$. This sum is called the trace of
$\mathcal{P}$:
\[ \mathrm{tr}\,\mathcal{P} = \sum_{i=1}^{N} (h_i, \mathcal{P}h_i) < \infty. \]
In addition, there exists an orthonormal basis for $\mathcal{H}$ which consists of
eigenvectors $\{\tilde{h}_i\}_{i=1}^{N}$ of $\mathcal{P}$,
\[ \mathcal{P}\tilde{h}_i = \lambda_i \tilde{h}_i \]
for $i = 1, 2, \ldots, N$, and the corresponding eigenvalues $\{\lambda_i\}_{i=1}^{N}$ are all
nonnegative. It follows that
\[ \lambda_i = (\tilde{h}_i, \mathcal{P}\tilde{h}_i) = \mathcal{E}(\tilde{h}_i, r')^2 = \sigma_i^2, \]
where $\sigma_i^2$ is the variance of the second-order scalar random variable $(\tilde{h}_i, r)$, for
$i = 1, 2, \ldots, N$, and that
\[ \mathrm{tr}\,\mathcal{P} = \sum_{i=1}^{N} \sigma_i^2 = \mathcal{E}\|r'\|^2. \]
Thus the trace of $\mathcal{P}$ is also called the total variance of the second-order $\mathcal{H}$-valued
random variable $r$, and
\[ \mathcal{E}\|r\|^2 = \|\bar{r}\|^2 + \mathcal{E}\|r'\|^2 = \|\bar{r}\|^2 + \sum_{i=1}^{N} \sigma_i^2 = \|\bar{r}\|^2 + \mathrm{tr}\,\mathcal{P}. \tag{4} \]
Equation (4) generalizes Eq. (3), which holds for second-order scalar random
variables, to the case of second-order $\mathcal{H}$-valued random variables.
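In the finite-dimensional case, where the covariance operator is a covariance
matrix, the basis independence of the trace and Eq. (4) can be verified
numerically. The following sketch uses an arbitrary correlated sample in R^4;
the dimension, sample size, mixing matrix and mean vector are illustrative
assumptions, and the covariance is the 1/n sample covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
n, dim = 20_000, 4
mixing = rng.standard_normal((dim, dim))
offset = np.array([1.0, -2.0, 0.5, 0.0])
r = rng.standard_normal((n, dim)) @ mixing.T + offset  # correlated sample

mean = r.mean(axis=0)
anomalies = r - mean                      # sample analogue of r' = r - r-bar
P = anomalies.T @ anomalies / n           # sample covariance matrix (1/n form)

# Trace in the standard basis and in a random orthonormal basis (columns of Q).
trace_standard = np.trace(P)
Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
trace_rotated = sum(Q[:, i] @ P @ Q[:, i] for i in range(dim))

# Eigenvalues of P are the variances along the eigenvector basis.
eigenvalues = np.linalg.eigvalsh(P)

mean_sq_anomaly = np.mean(np.sum(anomalies**2, axis=1))  # E ||r'||^2
mean_sq_norm = np.mean(np.sum(r**2, axis=1))             # E ||r||^2
print(trace_standard, trace_rotated, eigenvalues.sum(), mean_sq_anomaly)
print(mean_sq_norm, mean @ mean + trace_standard)        # Eq. (4)
```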
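Suppose that $\mathcal{R} \in \mathcal{B}(\mathcal{H})$. An $\mathcal{R}$-valued random variable is a map $r : \Omega \to \mathcal{R}$
such that
\[ \{\omega \in \Omega : r(\omega) \in C\} \in \mathcal{F} \]
for every set $C \in \mathcal{B}_{\mathcal{R}}(\mathcal{H})$, where
\[ \mathcal{B}_{\mathcal{R}}(\mathcal{H}) = \{B \in \mathcal{B}(\mathcal{H}) : B \subset \mathcal{R}\}. \]
Every $\mathcal{R}$-valued random variable is an $\mathcal{H}$-valued random variable, and every
$\mathcal{H}$-valued random variable $r$ with $r(\omega) \in \mathcal{R}$ for all $\omega \in \Omega$ is an $\mathcal{R}$-valued random
variable. An $\mathcal{R}$-valued random variable $r$ is called second-order if $\|r\|$ is a
second-order scalar random variable. Thus every second-order $\mathcal{R}$-valued random
variable is a second-order $\mathcal{H}$-valued random variable, and every second-order
$\mathcal{H}$-valued random variable $r$ with $r(\omega) \in \mathcal{R}$ for all $\omega \in \Omega$ is a second-order
$\mathcal{R}$-valued random variable. Finally, if $r$ is an $\mathcal{R}$-valued random variable and $N$
is a continuous map from $\mathcal{R}$ into $\mathcal{H}$, then $N(r)$ is an $\mathcal{H}$-valued random variable.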
2.3 The principle of energetic consistency in Hilbert space
Referring now back to Section 2.1, consider for $s_{t_0}$ not just a single element of
$\mathcal{S}$, but rather a whole collection of elements $s_{t_0}(\omega)$ indexed by the probability
variable $\omega \in \Omega$. Suppose at first that $s_{t_0}$ is simply a map $s_{t_0} : \Omega \to \mathcal{S}$, i.e.,
that $s_{t_0}(\omega)$ is defined for all $\omega \in \Omega$ and $s_{t_0}(\omega) \in \mathcal{S}$ for all $\omega \in \Omega$. Then since
$N_{t,t_0} : \mathcal{S} \to \mathcal{H}$ for all $t \in \mathcal{T}$, it follows that $s_t = N_{t,t_0}(s_{t_0}) : \Omega \to \mathcal{H}$ for all $t \in \mathcal{T}$,
with
\[ s_t(\omega) = N_{t,t_0}(s_{t_0}(\omega)) \]
and $\|s_t(\omega)\| < \infty$, for all $\omega \in \Omega$ and $t \in \mathcal{T}$.
Suppose further that $s_{t_0}$ is an $\mathcal{S}$-valued random variable. Then it follows
from the continuity assumption on $N_{t,t_0}$ that $s_t$ is an $\mathcal{H}$-valued random variable,
and therefore that $E_t = \|s_t\|^2$ is a scalar random variable, for all $t \in \mathcal{T}$.
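Suppose still further that $s_{t_0}$ is a second-order $\mathcal{S}$-valued random variable,
$\mathcal{E}E_{t_0} = \mathcal{E}\|s_{t_0}\|^2 < \infty$. Then from the boundedness assumption on $N_{t,t_0}$,
\[ \|s_t(\omega)\|^2 \le M_{t,t_0}^2 \|s_{t_0}(\omega)\|^2 \]
for all $\omega \in \Omega$ and $t \in \mathcal{T}$, it follows that
\[ \mathcal{E}E_t = \mathcal{E}\|s_t\|^2 \le M_{t,t_0}^2\, \mathcal{E}\|s_{t_0}\|^2 < \infty \]
for all $t \in \mathcal{T}$. Therefore, $s_t$ is a second-order $\mathcal{H}$-valued random variable, with
mean $\bar{s}_t \in \mathcal{H}$, covariance operator $\mathcal{P}_t : \mathcal{H} \to \mathcal{H}$, and
\[ \mathcal{E}\|s_t\|^2 = \|\bar{s}_t\|^2 + \mathrm{tr}\,\mathcal{P}_t, \]
for all $t \in \mathcal{T}$. Thus the principle of energetic consistency has been established: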
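Theorem 1 Let $\mathcal{H}$, $\mathcal{S}$, $\mathcal{T}$ and $N_{t,t_0}$ be as stated in Section 2.1, with $N_{t,t_0}$
continuous and bounded for all $t \in \mathcal{T}$, and let $\mathcal{E}$ be the expectation operator on
a complete probability space $(\Omega, \mathcal{F}, P)$. If $s_{t_0}$ is a second-order $\mathcal{S}$-valued random
variable, then for all $t \in \mathcal{T}$, (i) $s_t = N_{t,t_0}(s_{t_0})$ is a second-order $\mathcal{H}$-valued
random variable, (ii) $E_t = \|s_t\|^2$ is a scalar random variable, (iii) $s_t$ has mean
$\bar{s}_t \in \mathcal{H}$ and covariance operator $\mathcal{P}_t : \mathcal{H} \to \mathcal{H}$, (iv)
\[ \mathcal{E}E_t = \|\bar{s}_t\|^2 + \mathrm{tr}\,\mathcal{P}_t, \]
and (v)
\[ \|\bar{s}_t\|^2 + \mathrm{tr}\,\mathcal{P}_t \le M_{t,t_0}^2 \left( \|\bar{s}_{t_0}\|^2 + \mathrm{tr}\,\mathcal{P}_{t_0} \right). \tag{5} \]
If, in addition, $N_{t,t_0}$ is conservative, then (vi)
\[ \|\bar{s}_t\|^2 + \mathrm{tr}\,\mathcal{P}_t = \|\bar{s}_{t_0}\|^2 + \mathrm{tr}\,\mathcal{P}_{t_0} \tag{6} \]
for all $t \in \mathcal{T}$.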
It is in the conservative case that the principle of energetic consistency is
most useful, because in that case, Eq. (6) provides an equality against which,
for instance, approximate moment evolution schemes can be compared. In case
$N_{t,t_0}$ is only bounded, for example in the presence of dissipation, or for
initial-boundary value problems with a net flux of energy across the boundaries, Eq. (5)
still provides an upper bound on the total variance $\mathrm{tr}\,\mathcal{P}_t$.
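To make the idea of comparing an approximate moment evolution scheme against
Eq. (6) concrete, the sketch below applies a first-order (tangent-linear)
closure to a norm-preserving nonlinear map on R^2: the mean is propagated
through the map and the covariance through a finite-difference Jacobian. Both
the map `nonlinear_rotation` and this particular closure are assumptions chosen
here for illustration; they are not the schemes analyzed in the chapter or in
Cohn (2008). The quantity `imbalance` measures by how much the approximate
moments fail to satisfy Eq. (6).

```python
import numpy as np

def nonlinear_rotation(s, alpha=1.0):
    """Hypothetical conservative map: rotate s by the angle alpha * s[0]."""
    theta = alpha * s[0]
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([c * s[0] - sn * s[1], sn * s[0] + c * s[1]])

def jacobian(f, s, eps=1e-6):
    """Central finite-difference Jacobian of f at s (columns = d f / d s_k)."""
    cols = []
    for k in range(s.size):
        step = np.zeros_like(s)
        step[k] = eps
        cols.append((f(s + step) - f(s - step)) / (2.0 * eps))
    return np.column_stack(cols)

mean0 = np.array([1.0, 0.5])
P0 = 0.04 * np.eye(2)
lhs0 = mean0 @ mean0 + np.trace(P0)      # right-hand side of Eq. (6)

# Tangent-linear closure: mean through the map, covariance through J.
mean1 = nonlinear_rotation(mean0)
J = jacobian(nonlinear_rotation, mean0)
P1 = J @ P0 @ J.T
lhs1 = mean1 @ mean1 + np.trace(P1)      # left-hand side of Eq. (6), approximated

imbalance = lhs1 - lhs0                  # zero only if Eq. (6) is respected
print(lhs0, lhs1, imbalance)
```

Because the map is an isometry, the energy of the propagated mean is unchanged,
so the whole imbalance comes from the propagated total variance. For these
particular numbers it works out to about 0.01.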
2.4 A natural restriction on $\mathcal{S}$
Suppose for the moment that $s_{t_0}$ is an $\mathcal{S}$-valued random variable, not necessarily
second-order. When the squared norm on $\mathcal{H}$ represents a physical total energy,
it is natural to impose the restriction that every possible initial state $s_{t_0}(\omega)$,
$\omega \in \Omega$, has total energy less than some finite maximum amount, say $E_* < \infty$,
i.e. that $\mathcal{S} \subset \mathcal{H}_{E_*}$, where $\mathcal{H}_E$ is defined for all $E > 0$ as the open set
\[ \mathcal{H}_E = \{ s \in \mathcal{H} : \|s\|^2 < E \}. \tag{7} \]
Otherwise, given any total energy $E$, no matter how large, there would be a
nonzero probability that $s_{t_0}$ has total energy greater than or equal to $E$:
\[ P(\{\omega \in \Omega : \|s_{t_0}(\omega)\|^2 \ge E\}) > 0. \]
Of course, it can be argued that since this probability would be very small for
$E$ very large, it may be acceptable as an approximation not to impose this
restriction. On the other hand, as discussed in Section 3.2 and illustrated in
Section 4, for classical solutions of hyperbolic systems of partial differential
equations, it is necessary to require that $\mathcal{S} \subset \mathcal{H}_{E_*}$ for some $E_* < \infty$ just
to ensure well-posedness. Thus the restriction is often not only natural, but
also necessary. It also simplifies matters, as discussed next, for it makes $s_{t_0}$
second-order automatically and gives $s_t = N_{t,t_0}(s_{t_0})$ some additional desirable
properties, and it also yields a convenient characterization of $s_{t_0}$ and $s_t$.
Suppose that $s_{t_0}$ is an $\mathcal{S}$-valued random variable, and that $\mathcal{S} \subset \mathcal{H}_{E_*}$ for some
$E_* < \infty$. Thus $\|s_{t_0}(\omega)\|^2 < E_*$ for all $\omega \in \Omega$, and therefore $\mathcal{E}\|s_{t_0}\|^2 < E_*$, i.e.,
$s_{t_0}$ is a second-order $\mathcal{S}$-valued random variable. Therefore, for all $t \in \mathcal{T}$,
$s_t = N_{t,t_0}(s_{t_0})$ is a second-order $\mathcal{H}$-valued random variable, in fact with $s_t(\omega) \in \mathcal{H}_E$
for all $\omega \in \Omega$, where $E = E_*$ in the conservative case and $E = M_{t,t_0}^2 E_*$ in the
merely bounded case. Since $\mathcal{H}_E$ is an open set in $\mathcal{H}$, $\mathcal{H}_E \in \mathcal{B}(\mathcal{H})$. Therefore,
for all $t \in \mathcal{T}$, $s_t$ is an $\mathcal{H}_E$-valued random variable. Further, for all $p > 0$ and
$t \in \mathcal{T}$, $\mathcal{E}\|s_t\|^p < E^{p/2}$. Thus $\|s_t\|$ has finite moments of all orders, for all $t \in \mathcal{T}$.
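The moment bound above is easy to see on a sample whose members all lie in the
open ball of squared radius E. The sketch below builds such a sample by
truncating a Gaussian sample in R^3, which also illustrates that an untruncated
Gaussian would violate the energy restriction. The value `E_star = 4.0`, the
dimension and the rejection-sampling construction are assumptions made here for
illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
E_star = 4.0  # assumed maximum total energy, i.e. ||s||^2 < E_star

# Rejection sampling: keep only Gaussian draws inside the open ball H_{E_star}.
candidates = rng.standard_normal((20_000, 3))
inside = np.sum(candidates**2, axis=1) < E_star
s0 = candidates[inside]

norms = np.linalg.norm(s0, axis=1)
moments = {p: np.mean(norms**p) for p in (1, 2, 4, 8)}

for p, value in moments.items():
    # Each sample moment of order p stays below E_star^(p/2).
    print(p, value, E_star ** (p / 2))
```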
Now suppose that $s$ is an $\mathcal{H}_E$-valued random variable, for some $E < \infty$.
Then since $s$ is also an $\mathcal{H}$-valued random variable, $(h_i, s)$ is a scalar random
variable for $i = 1, \ldots, N$, where $\{h_i\}_{i=1}^{N}$ is any orthonormal basis for $\mathcal{H}$ and
$N = \dim \mathcal{H} \le \infty$. Since $s(\omega) \in \mathcal{H}$ for all $\omega \in \Omega$, $s(\omega)$ has the representation
\[ s(\omega) = \sum_{i=1}^{N} (h_i, s(\omega))\, h_i \]
for each $\omega \in \Omega$, and by Parseval's relation,
\[ \|s(\omega)\|^2 = \sum_{i=1}^{N} (h_i, s(\omega))^2 < E \]
for each $\omega \in \Omega$.
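In finite dimensions the basis expansion and Parseval's relation can be checked
in a few lines. The sketch below uses a random orthonormal basis of R^5 obtained
from a QR factorization; the dimension and the random vector are arbitrary
illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 5

# Columns of `basis` form an orthonormal basis {h_i} of R^dim.
basis, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
s = rng.standard_normal(dim)

coeffs = basis.T @ s            # coefficients (h_i, s)
reconstructed = basis @ coeffs  # sum_i (h_i, s) h_i

# Parseval: ||s||^2 equals the sum of squared coefficients.
print(s @ s, coeffs @ coeffs)
```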