Electronic Journal of Statistics
(2008)
ISSN: 1935-7524
arXiv:0801.3319v1 [math.ST] 22 Jan 2008

Warped Wavelet and Vertical Thresholding

Pierpaolo Brutti

Dipartimento di Scienze Economiche e Aziendali
LUISS Guido Carli
Viale Romania 32, 00197 Roma, Italy.
e-mail: [email protected]
url: http://docenti.luiss.it/brutti
Abstract: Let $\{(X_i,Y_i)\}_{i\in\{1,\dots,n\}}$ be an i.i.d. sample from the random design regression model $Y=f(X)+\varepsilon$ with $(X,Y)\in[0,1]\times[-M,M]$. In dealing with such a model, adaptation is naturally to be intended in terms of the $L^2([0,1],G_X)$ norm, where $G_X(\cdot)$ denotes the (known) marginal distribution of the design variable $X$. Recently much work has been devoted to the construction of estimators that adapt in this setting (see, for example, [5, 24, 25, 32]), but only a few of them come along with an easy-to-implement computational scheme. Here we propose a family of estimators based on the warped wavelet basis recently introduced by Picard and Kerkyacharian [36] and a tree-like thresholding rule that takes into account the hierarchical (across-scale) structure of the wavelet coefficients. We show that, if the regression function belongs to a certain class of approximation spaces defined in terms of $G_X(\cdot)$, then our procedure is adaptive and converges to the true regression function at an optimal rate. The results are stated in terms of excess probabilities as in [19].
AMS 2000 subject classifications: Primary 62G07, 60K35; secondary 62G20.
Keywords and phrases: Regression with random design, Wavelets, Block thresholding, Warped Wavelets, Adaptive Approximation, Universal Algorithms, Muckenhoupt weights.
1. Introduction

Wavelet bases are ubiquitous in modern nonparametric statistics, starting from the 1994 seminal paper by Donoho and Johnstone [27]. What makes them so appealing to statisticians is their ability to capture the relevant features of smooth signals in a few "big" coefficients at high scales (low frequencies), so that zero-thresholding the small ones results in an effective denoising scheme (see [47]).
Although these well known results about thresholding techniques were usually obtained assuming a fixed (and possibly equispaced) design [27, 28], it was quite reassuring to see how they carry over almost unchanged to the irregular design case. As a matter of fact, in the case of irregular design, various attempts to solve this problem have been made: see, for instance, the interpolation methods of Hall and Turlach [34] and Kovac and Silverman [38]; the binning method of Antoniadis et al. [3]; the transformation method of Cai and Brown [14],
or its recent refinements by Maxim [40] for a random design; the weighted wavelet transform of Foster [29]; the isometric method of Sardy et al. [44]; the penalization method of Antoniadis and Fan [2]; and the specific construction of wavelets adapted to the design of Delouille et al. [21, 22] and Jansen et al. [46]. See also Pensky and Vidakovic [41], and the monograph [32].
The main drawback common to most of the methods just mentioned can be found, with no surprise, on the computational side: compared, for instance, with the usual thresholding technique, the calculations are, in general, less direct. To fix this problem, Kerkyacharian and Picard [36] propose the warped wavelet basis. The idea is as follows. For a signal observed at some design points, $Y(t_i)$, $i\in\{1,\dots,2^J\}$, if the design is regular ($t_k=k/2^J$), the standard wavelet decomposition algorithm starts with $s_{J,k}=2^{J/2}\,Y(k/2^J)$, which approximates the scaling coefficient $\int Y(x)\,\phi_{J,k}(x)\,dx$, with $\phi_{J,k}(x)=2^{J/2}\phi(2^J x-k)$ and $\phi(\cdot)$ the so-called scaling function or father wavelet (see [39] for further information). Then the cascade algorithm is employed to obtain the wavelet coefficients $d_{j,k}$ for $j\leq J$, which in turn are thresholded. If the design is not regular, and we still employ the same algorithm, then for a function $H(\cdot)$ such that $H(k/2^J)=t_k$, we have $s_{J,k}=2^{J/2}\,Y(H(k/2^J))$. Essentially what we are doing is to decompose, with respect to a standard wavelet basis, the function $Y(H(x))$ or, if $G\circ H(x)\equiv x$, the original function $Y(x)$ itself but with respect to a new warped basis $\{\psi_{j,k}(G(\cdot))\}_{(j,k)}$. In the regression setting, this means replacing the standard wavelet expansion of the function $f(\cdot)$ by its expansion on the new basis $\{\psi_{j,k}(G(\cdot))\}_{(j,k)}$, where $G(\cdot)$ is adapted to the design: it may be the distribution function of the design, or its estimate when it is unknown (not our case). An appealing feature of this method is that it does not need a new algorithm to be implemented: just standard and widespread tools. Of course the properties of this basis depend on the warping factor $G(\cdot)$. In [36] the authors provide the conditions under which this new basis behaves, at least for statistical purposes, as well as ordinary wavelet bases with respect to $L^p([0,1],dx)$ norms with $p\in(0,+\infty)$. This condition properly quantifies the departure from the uniform distribution and happens to be associated with the notion of Muckenhoupt weights (see [31, 45]).
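To make the recipe concrete, the following is a minimal sketch of warped wavelet smoothing, assuming $G_X(\cdot)$ is known and using PyWavelets for the standard cascade algorithm. The grid size, wavelet family, nearest-point interpolation onto the warped grid and plain hard threshold are illustrative choices of ours, not prescriptions of [36] (in particular, the thresholding here is the ordinary term-by-term rule, not the vertical rule developed below).

    import numpy as np
    import pywt

    def warped_wavelet_estimate(x, y, G, J=7, wavelet="db4", lam=0.1):
        # Warp the design: u_i = G(x_i) is uniform on [0, 1] by construction.
        u = G(np.asarray(x, dtype=float))
        order = np.argsort(u)
        # Crude surrogate for s_{J,k}: nearest observed point on the warped grid.
        grid = (np.arange(2**J) + 0.5) / 2**J
        idx = np.clip(np.searchsorted(u[order], grid), 0, len(u) - 1)
        y_grid = np.asarray(y, dtype=float)[order][idx]
        # Standard cascade algorithm, then hard thresholding of the d_{j,k}.
        coeffs = pywt.wavedec(y_grid, wavelet, level=J - 3)
        coeffs = [coeffs[0]] + [pywt.threshold(c, lam, mode="hard") for c in coeffs[1:]]
        f_warped = pywt.waverec(coeffs, wavelet)
        return grid, f_warped   # estimate of f(G^{-1}(u)) on the warped grid

The estimate of $f$ at a point $x$ is then read off at the grid position closest to $G(x)$; no algorithm beyond the standard discrete wavelet transform is needed, which is precisely the appeal of the warped construction.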
Now the problem is that we do not need good estimators in $L^p([0,1],dx)$. What we need are (easy to compute) estimators that adapt in $L^2([0,1],G_X)$. As a matter of fact it is possible to prove that the main results contained in [36] can be extended to this new setting once we assume $G_X(\cdot)$ to be known as in [15], the case of an unknown $G_X(\cdot)$ being beyond the scope of this work (see [35]).
Fig 1. Examples of thresholding rules: [A] Original wavelet coefficients; [B] Linear thresholding; [C] Nonlinear (hard) thresholding; [D] Vertical (hard) thresholding.

Here we propose a particular variation on the basic thresholding procedure advanced in [36], which can be motivated as follows. In a variety of real-life signals, significant wavelet coefficients often occur in clusters at adjacent scales and locations. Irregularities, like a discontinuity for example, in general tend to affect the whole block of coefficients corresponding to wavelet functions whose "support" contains them. For this reason it is reasonable to expect that the risk of "blocked" thresholding rules might compare quite favorably with other classical estimators based on level-wise or global thresholds. The literature is
filled with successful examples of "horizontally" (within scales) blocked rules derived both from purely frequentist arguments [11, 13, 15, 33] and from Bayesian reasonings of some flavor [1, 16, 48]. Recently, an increasing amount of work has been devoted to the study of a new class of "vertically" (across scales, see Figure 1) blocked or treed rules [4, 10, 17, 30, 43], which have proved to be of invaluable help in at least two settings of great importance: the construction of adaptive pointwise confidence intervals [42] and the derivation of pointwise estimators of a regression function that adapt rate-optimally under what we could call a focused performance measure [12].
For this reason, adapting some techniques developed in [5] to the current (simplified) setting, in Section 2 we show how vertically zero-thresholding the warped wavelet coefficients actually results in a universal smoother with good properties in $L^2([0,1],G_X)$ over reasonably large approximation spaces.
2. Tree–Structured Warped Approximations
We shall now discuss in greater detail nonlinear approximation processes based on warped wavelet bases where a tree structure is pre-imposed on the preserved coefficients. We will start, following closely [5], by reviewing some basic facts about partitions and how they are related to adaptive approximation. Then we present the universal algorithm based on adaptive partitions coming from a warped wavelet decomposition, together with its theoretical properties.
In the spirit of the recent paper by Cucker and Smale [20], we will measure the performance of our estimator by studying its convergence both in probability and in expectation. More specifically, let $P$ be a (generally unknown or partially unknown) Borel measure defined on $\mathcal{Z}=\mathcal{X}\times\mathcal{Y}\subset\mathbb{R}^d\times\mathbb{R}$, and consider again a nonparametric regression problem where we want to estimate the conditional mean $f(x)=\mathbb{E}(Y\,|\,X=x)$ from an i.i.d. sample of size $n$, $\mathbf{z}=\{z_i=(x_i,y_i)\}_{i\in\{1,\dots,n\}}$, drawn from $P$. Assume further that, having chosen a hypothesis space $\mathcal{H}$ from which our candidate estimator $f_{\mathbf{z}}(\cdot)$ comes, we shall measure the approximation error of $f_{\mathbf{z}}(\cdot)$ in the $L^2(\mathcal{X},G_X)$ norm, where $G_X(\cdot)$ is the (marginal) distribution of the design variable $X$. Here, as in the previous section, we will assume $G_X(\cdot)$ to be known. So, given $f_{\mathbf{z}}\in\mathcal{H}$, the quality of its performance is measured by
$$\|f-f_{\mathbf{z}}\| = \|f-f_{\mathbf{z}}\|_{L^2(\mathcal{X},G_X)}.$$
Clearly this quantity is stochastic in nature and, consequently, it is generally not possible to say anything about it for a fixed $\mathbf{z}$. Instead we look at the behavior in probability as measured by
$$P^{\otimes n}\{\mathbf{z}: \|f-f_{\mathbf{z}}\|>\eta\}, \qquad \eta>0,$$
or the expected error
$$\mathbb{E}^{\otimes n}\bigl(\|f-f_{\mathbf{z}}\|\bigr)=\int \|f-f_{\mathbf{z}}\|\,dP^{\otimes n},$$
where $P^{\otimes n}$ denotes the $n$-fold tensor product of $P$. Clearly, given a bound for $P^{\otimes n}\{\mathbf{z}: \|f-f_{\mathbf{z}}\|>\eta\}$, we can immediately obtain another bound for the expected error since
$$\mathbb{E}^{\otimes n}\bigl(\|f-f_{\mathbf{z}}\|\bigr)=\int_0^{+\infty} P^{\otimes n}\{\mathbf{z}: \|f-f_{\mathbf{z}}\|>\eta\}\,d\eta. \qquad (1)$$
As we will see in Section 4, bounding probabilities like $P^{\otimes n}\{\cdot\}$ usually requires some kind of concentration of measure inequalities (see [8]).
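For completeness, Equation (1) is just the tail-integral formula for the expectation of the nonnegative random variable $\|f-f_{\mathbf{z}}\|$; a one-line check, added here for the reader's convenience, via Fubini's theorem:
$$\mathbb{E}^{\otimes n}\bigl(\|f-f_{\mathbf{z}}\|\bigr)=\int\!\!\int_0^{+\infty} \mathbf{1}\bigl\{\|f-f_{\mathbf{z}}\|>\eta\bigr\}\,d\eta\,dP^{\otimes n}=\int_0^{+\infty} P^{\otimes n}\{\mathbf{z}:\|f-f_{\mathbf{z}}\|>\eta\}\,d\eta.$$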
Now, suppose that we have chosen a reasonable hypothesis space $\mathcal{H}$. We still need to address the problem of how to find an estimator $f_{\mathbf{z}}(\cdot)$ for the regression function $f(\cdot)$. One of the most widespread criteria (see [20, 24, 32], and references therein) is the so-called empirical risk minimization (least-squares data fitting). Empirical risk minimization is motivated by the fact that the regression function $f(\cdot)$ is the minimizer of
$$\mathcal{E}(w)=\int [w(x)-y]^2\,dP.$$
That is,
$$\mathcal{E}(f)=\inf_{w\in L^2(\mathcal{X},G_X)} \mathcal{E}(w).$$
This suggests considering the problem of minimizing the empirical loss
$$\mathcal{E}_{\mathbf{z}}(w)=\frac{1}{n}\sum_{i=1}^{n} [w(x_i)-y_i]^2,$$
over all $w\in\mathcal{H}$. So, in the end, we find an implementable form for our candidate estimator
$$f_{\mathbf{z}}=f_{\mathbf{z},\mathcal{H}}=\mathop{\mathrm{argmin}}_{w\in\mathcal{H}}\ \mathcal{E}_{\mathbf{z}}(w),$$
the so-called empirical minimizer. Notice that given a finite ball in a linear or nonlinear finite dimensional space, the problem of finding $f_{\mathbf{z}}(\cdot)$ is numerically solvable.
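Indeed, when $\mathcal{H}$ is (a ball in) a finite dimensional linear space, "numerically solvable" simply means ordinary least squares. A minimal sketch, assuming a user-supplied list of vectorized basis functions (our own illustrative interface, not a construction from the paper):

    import numpy as np

    def empirical_minimizer(x, y, basis):
        # Least-squares fit: coefficients of argmin_w (1/n) sum_i [w(x_i) - y_i]^2
        # over w in span{basis}; the factor 1/n does not change the minimizer.
        Phi = np.column_stack([phi(x) for phi in basis])   # n x m design matrix
        coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        return lambda t: np.column_stack([phi(t) for phi in basis]) @ coef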
In the following we will see how to build the hypothesis space $\mathcal{H}$ from refinable partitions of the design space $\mathcal{X}$ and then how this is related to (warped) wavelet bases. Typically $\mathcal{H}=\mathcal{H}_n$ depends on a finite number $J(n)$ of parameters as, for example, the dimension of a linear space or, equivalently, the number of basis functions we use to generate it. In many cases, this number $J$ is chosen using some a priori assumption on the regression function. In other procedures, the number $J$ avoids any a priori assumptions by adapting to the data. We shall be interested in estimators of the latter type.
2.1. Partitions, Adaptive Approximation and Least–Squares Fitting
We will now review some basic facts about partitions and how they are related to adaptive approximation. The treatment follows closely [5]. A partition $\Lambda$ of $\mathcal{X}\subset[0,1]^d$ is usually built through a refinement strategy. We first describe the prototypical example of dyadic partitions and then, in the following section, we will make the link with orthonormal expansions through a wavelet basis. So let $\mathcal{X}=[0,1]^d$, and denote by $\mathcal{D}_j=\mathcal{D}_j(\mathcal{X})$ the collection of dyadic subcubes of $\mathcal{X}$ of side length $2^{-j}$, and $\mathcal{D}=\bigcup_{j=0}^{\infty}\mathcal{D}_j$. These cubes are naturally aligned on a tree $\mathcal{T}=\mathcal{T}(\mathcal{D})$. Each node of the tree $\mathcal{T}$ is a cube $I\in\mathcal{D}$. If $I\in\mathcal{D}_j$, then its children are the $2^d$ dyadic cubes $J\in\mathcal{D}_{j+1}$ with $J\subset I$. We denote the set of children of $I$ by $\mathcal{C}(I)$. We call $I$ the parent of each such child $J$ and write $I=\mathcal{P}(J)$. The cubes in $\mathcal{D}_j(\mathcal{X})$ form a uniform partition in which every cube has the same measure $2^{-jd}$.
More in general, we say that a collection of nodes $\widetilde{\mathcal{T}}$ is a proper subtree of $\mathcal{T}$ if:

• the root node $I\equiv\mathcal{X}$ is in $\widetilde{\mathcal{T}}$;
• if $I\neq\mathcal{X}$ is in $\widetilde{\mathcal{T}}$, then its parent $\mathcal{P}(I)$ is also in $\widetilde{\mathcal{T}}$.

Any finite proper subtree $\widetilde{\mathcal{T}}$ is associated to a unique partition $\Lambda=\Lambda(\widetilde{\mathcal{T}})$ which consists of its outer leaves, by which we mean those $J\in\mathcal{T}$ such that $J\notin\widetilde{\mathcal{T}}$ but $\mathcal{P}(J)$ is in $\widetilde{\mathcal{T}}$. One way of generating adaptive partitions is through some refinement strategy. One begins at the root $\mathcal{X}$ and decides whether to refine $\mathcal{X}$ (i.e. subdivide $\mathcal{X}$) based on some refinement criterion. If $\mathcal{X}$ is subdivided, then
one examines each child and decides whether or not to refine such a child based on the refinement strategy.

We could also consider more general refinements. Assume, for instance, that $a\geq 2$ is a fixed integer. We assume that if $\mathcal{X}$ is to be refined, then its children consist of $a$ subsets of $\mathcal{X}$ which form a partition of $\mathcal{X}$. Similarly, for each such child there is a rule which spells out how this child is refined. We assume that the child is also refined into $a$ sets which form a partition of the child. Such a refinement strategy also results in a tree $\mathcal{T}$ (called the master tree), and children, parents, proper trees and partitions are defined as above for the special case of dyadic partitions. The refinement level $j$ of a node is the smallest number of refinements (starting at the root) needed to create this node. Note that to describe these more general refinements in terms of basis functions, we would need to introduce the concept of warped multi-wavelets and wavelet packets, but this is beyond the scope of the present work.
We denote by $\mathcal{T}_j$ the proper subtree consisting of all nodes with level $<j$, and we denote by $\Lambda_j$ the partition associated to $\mathcal{T}_j$, which coincides with $\mathcal{D}_j(\mathcal{X})$ in the above described dyadic partition case. Note that, in contrast to this case, the $a$ children may not be similar, in which case the partitions $\Lambda_j$ are not spatially uniform (we could also work in even more generality and allow the number of children to depend on the cell to be refined, while remaining globally bounded by some fixed $a$). It is important to note that the cardinalities of a proper tree $\widetilde{\mathcal{T}}$ and of its associated partition $\Lambda(\widetilde{\mathcal{T}})$ are equivalent. In fact one easily checks that
$$\mathrm{card}\bigl(\Lambda(\widetilde{\mathcal{T}})\bigr)=(a-1)\,\mathrm{card}\bigl(\widetilde{\mathcal{T}}\bigr)+1,$$
by remarking that each time a new node gets refined in the process of building an adaptive partition, $\mathrm{card}(\widetilde{\mathcal{T}})$ is incremented by $1$ and $\mathrm{card}(\Lambda)$ by $a-1$.
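The bookkeeping behind this identity is easy to mechanize; the following toy sketch (our own illustration, with arbitrary node labels) grows a tree by refining nodes into a fixed number $a$ of children and checks the cardinality relation at each step:

    def refine(tree, leaves, node, a):
        # Refine `node`: it joins the proper tree; its a children become outer leaves.
        leaves.remove(node)
        tree.add(node)
        leaves.update((node, i) for i in range(a))

    a = 4                                            # e.g. dyadic splits in d = 2
    tree, leaves = set(), {"root"}
    refine(tree, leaves, "root", a)                  # refine the root ...
    refine(tree, leaves, ("root", 0), a)             # ... and then one of its children
    assert len(leaves) == (a - 1) * len(tree) + 1    # card(Lambda) = (a-1) card(T) + 1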
Given a partition $\Lambda$, we can easily use it to approximate functions supported on $\mathcal{X}$. More specifically, let us denote by $\mathcal{S}_\Lambda$ the space of piecewise constant functions (normalized in $L^2(\mathcal{X},G_X)$) subordinate to $\Lambda$. Each $f\in\mathcal{S}_\Lambda$ can then be written as
$$f(\cdot)=\sum_{I\in\Lambda} c_I\,\frac{1}{\sqrt{G_X(I)}}\,\mathbf{1}_I(\cdot),$$
where $\mathbf{1}_I(\cdot)$ denotes the indicator function of any set $I\subset\mathcal{X}$. The best approximation of a given function $f\in L^2(\mathcal{X},G_X)$ by the elements of $\mathcal{S}_\Lambda$ is given by
$$\Pi_\Lambda(f)(\cdot)=\sum_{I\in\Lambda} s_I\,\frac{1}{\sqrt{G_X(I)}}\,\mathbf{1}_I(\cdot),$$
where
$$s_I=\Bigl\langle f,\ \frac{1}{\sqrt{G_X(I)}}\,\mathbf{1}_I\Bigr\rangle_{L^2(G_X)}, \qquad (2)$$
and $s_I\equiv 0$ in case $G_X(I)\equiv 0$.
In practice, we can consider two types of approximations corresponding to uniform refinement and adaptive refinement. We first discuss uniform refinement. Let
$$\mathcal{E}_J(f)=\|f-\Pi_{\Lambda_J}(f)\|_{L^2(G_X)}, \qquad J\in\mathbb{N}_0,$$
which is the error for uniform refinement. The decay of this error to zero is connected with the smoothness of $f(\cdot)$ as measured in $L^2(\mathcal{X},G_X)$. We shall denote by $\mathcal{A}^s$ the approximation space (see the review in [23]) consisting of all functions $f\in L^2(\mathcal{X},G_X)$ such that
$$\mathcal{E}_J(f)\leq M_0\,a^{-Js}, \qquad J\in\mathbb{N}_0. \qquad (3)$$
Notice that $\mathrm{card}(\Lambda_J)=a^J$, so that the decay in Equation (3) is like $N^{-s}$ with $N$ the number of elements in the partition. The smallest $M_0$ for which Equation (3) holds serves to define the semi-norm $|f|_{\mathcal{A}^s}$ on $\mathcal{A}^s$. The space $\mathcal{A}^s$ can be viewed as a smoothness space of order $s>0$ with smoothness measured with respect to $G_X(\cdot)$. For example, if $G_X(\cdot)$ is the Lebesgue measure and we use dyadic partitioning, then $\mathcal{A}^{s/d}=\mathcal{B}^{2,s}_{\infty}$, $s\in(0,1]$, with equivalent norms. Here $\mathcal{B}^{2,s}_{\infty}$ is the Besov space which can be described in terms of the differences as
$$\|w(\cdot+h)-w(\cdot)\|_{L^2(dx)}\leq M_0\,|h|^s, \qquad h\in\mathcal{X}.$$
Instead of working with a priori fixed partitions, there is a second kind of approximation where the partition is generated adaptively and will vary with $f(\cdot)$. Adaptive partitions are typically generated by using some refinement criterion that determines whether or not to subdivide a given cell. We shall consider a refinement criterion that was introduced to build adaptive wavelet constructions such as those given by Cohen et al. in [17] for image compression. This criterion is analogous to thresholding wavelet coefficients. Indeed, it would be exactly this criterion if we were to construct a wavelet (Haar like) basis for $L^2(\mathcal{X},G_X)$. For each cell $I$ in the master tree $\mathcal{T}$ and any $w\in L^2(\mathcal{X},G_X)$ we define
$$\nu_I=\nu_I(w)=\sqrt{\sum_{J\in\mathcal{C}(I)} s_J^2 - s_I^2}, \qquad (4)$$
which describes the amount of $L^2(\mathcal{X},G_X)$ energy which is gained in the projection of $w(\cdot)$ onto $\mathcal{S}_\Lambda$ when the element $I$ is refined. Equivalently, it accounts for the decrease in projection error when $I$ is refined. If we were in a classical situation of Lebesgue measure and dyadic refinement, then $\nu_I^2(w)$ would be exactly the sum of squares of the (scaling) Haar coefficients of $w(\cdot)$ corresponding to $I$.
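To see why $\nu_I$ has this energy interpretation, note that if $\Lambda'$ is the partition obtained from $\Lambda$ by refining the single cell $I$, then $\mathcal{S}_\Lambda\subset\mathcal{S}_{\Lambda'}$ and $\Pi_\Lambda=\Pi_\Lambda\circ\Pi_{\Lambda'}$, so Pythagoras' theorem gives (a short check we add for the reader's convenience):
$$\|\Pi_{\Lambda'}(w)-\Pi_{\Lambda}(w)\|^2_{L^2(G_X)}=\|\Pi_{\Lambda'}(w)\|^2_{L^2(G_X)}-\|\Pi_{\Lambda}(w)\|^2_{L^2(G_X)}=\sum_{J\in\mathcal{C}(I)} s_J^2-s_I^2=\nu_I^2(w).$$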
We can use $\nu_I(w)$ to generate an adaptive partition. Given any $\lambda>0$, let $\mathcal{T}(w,\lambda)$ be the smallest proper tree that contains all $I\in\mathcal{T}$ for which $\nu_I(w)>\lambda$. This tree can also be described as the set of all $J\in\mathcal{T}$ such that there exists $I\subset J$ which verifies $\nu_I(w)>\lambda$. Note that since $w\in L^2(\mathcal{X},G_X)$, the set of nodes such that $\nu_I(w)>\lambda$ is always finite and so is $\mathcal{T}(w,\lambda)$. Corresponding to this tree we have the partition $\Lambda(w,\lambda)$ consisting of the outer leaves of $\mathcal{T}(w,\lambda)$. We shall define some new approximation spaces $\mathcal{B}^s$ which measure the regularity of a given function $w(\cdot)$ by the size of the tree $\mathcal{T}(w,\lambda)$.

Given $s>0$, we let $\mathcal{B}^s$ be the collection of all $w\in L^2(\mathcal{X},G_X)$ such that the following is finite
$$|w|^p_{\mathcal{B}^s}=\sup_{\lambda>0}\ \bigl\{\lambda^p\,\mathrm{card}\bigl(\mathcal{T}(w,\lambda)\bigr)\bigr\}, \qquad \text{with } p=\bigl(s+\tfrac{1}{2}\bigr)^{-1}. \qquad (5)$$
We obtain the norm for $\mathcal{B}^s$ by adding $\|w\|_{L^2(G_X)}$ to $|w|_{\mathcal{B}^s}$. One can show that
$$\|w-\Pi_{\Lambda(w,\lambda)}(w)\|_{L^2(G_X)}\ \leq\ C(s)\,|w|^{\frac{1}{2s+1}}_{\mathcal{B}^s}\,\lambda^{\frac{2s}{2s+1}}\ \leq\ C(s)\,|w|_{\mathcal{B}^s}\,N^{-s}, \qquad (6)$$
where $N=\mathrm{card}\bigl(\mathcal{T}(w,\lambda)\bigr)$ and the constant $C(s)$ depends only on $s$ (see Cohen et al. [17]). It follows that every function $w\in\mathcal{B}^s$ can be approximated to order $\mathcal{O}(N^{-s})$ by $\Pi_\Lambda(w)(\cdot)$ for some partition $\Lambda$ with $\mathrm{card}(\Lambda)=N$. This should be contrasted with $\mathcal{A}^s$ which has the same approximation order for the uniform partition. It is easy to see that $\mathcal{B}^s$ is larger than $\mathcal{A}^s$. In classical settings, the class $\mathcal{B}^s$ is well understood. For example, in the case of Lebesgue measure and dyadic partitions we know that each Besov space $\mathcal{B}^{\tau,s}_q$ with $\tau>(s/d+1/2)^{-1}$ and $q\in(0,\infty]$ arbitrary is contained in $\mathcal{B}^{s/d}$ (see [17]). This should be compared with $\mathcal{A}^s$, where we know that $\mathcal{A}^{s/d}=\mathcal{B}^{2,s}_{\infty}$ as we have noted earlier. In the next section we will see how to "visualize" these approximation spaces when we use warped wavelet bases to build our partitions.
Until now, we have only considered the problem of approximating elements of some smoothness class by approximators associated to (adaptive) partitions of their domain $\mathcal{X}$: no data, no noise; just functions. Here, instead, we assume that $f(\cdot)$ denotes, as before, the regression function and we return to the problem of estimating it from a given data set. Clearly, we can use the functions in $\mathcal{H}=\mathcal{S}_\Lambda$ for this purpose, so that the "incarnation" in this context of what we called the empirical minimizer is given by
$$f_{\mathbf{z},\Lambda}=\mathop{\mathrm{argmin}}_{w\in\mathcal{S}_\Lambda}\ \frac{1}{n}\sum_{i=1}^{n}[w(x_i)-y_i]^2,$$
the orthogonal projection of $y=y(x)$ onto $\mathcal{S}_\Lambda$ with respect to the empirical norm
$$\|y\|^2_{L^2(\mathcal{X},\delta_X)}=\frac{1}{n}\sum_{i=1}^{n}|y(x_i)|^2,$$
with $y(x_i)=y_i$, and we can compute it by solving $\mathrm{card}(\Lambda)$ independent problems, one for each element $I\in\Lambda$. The resulting estimator can then be written as
$$f_{\mathbf{z},\Lambda}(\cdot)=\sum_{I\in\Lambda} s_I(\mathbf{z})\,\frac{1}{\sqrt{G_{X,n}(I)}}\,\mathbf{1}_I(\cdot),$$
where, for each $I\in\Lambda$,
$$s_I(\mathbf{z})=\frac{1}{n}\sum_{i=1}^{n} y_i\,\frac{1}{\sqrt{G_{X,n}(I)}}\,\mathbf{1}_I(x_i) \qquad \text{and} \qquad G_{X,n}(I)=\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}_I(x_i),$$
are the empirical counterparts of the theoretical coefficients defined in Equation (2). With the coefficients $\{s_I(\mathbf{z})\}_{I\in\Lambda}$ at hand, we can build linear estimators $f_{\mathbf{z}}(\cdot)$ corresponding to uniform partitions with cardinality suitably chosen to balance the bias and variance of $f_{\mathbf{z}}(\cdot)$ when the true regression function $f(\cdot)$ belongs to some specific smoothness class. Alternatively, defining the empirical versions of the residuals introduced in Equation (4) as
$$\nu_I(\mathbf{z})=\sqrt{\sum_{J\in\mathcal{C}(I)} s_J^2(\mathbf{z})-s_I^2(\mathbf{z})}, \qquad (7)$$
we can mimic the adaptive procedure introduced in the previous section (see Table 1) to get universal¹ estimators based on adaptive partitions. These partitions have the same tree structure as those used in the CART algorithm [9], yet the selection of the right partition is quite different since it is not based on an optimization problem but on a thresholding technique applied to empirical quantities computed at each node of the tree, which play a role similar to wavelet coefficients as we will see in the following (see [26] for a connection between CART and thresholding in one or several orthonormal bases).

Algorithm: Least-Squares on Adaptive Partitions
Require: Sample $\mathbf{z}=\{(x_i,y_i)\}_{i\in\{1,\dots,n\}}$; threshold $\lambda_n$; smoothness index $\gamma>0$
Output: An estimator $f_{\mathbf{z}}(\cdot)$ for the regression function $f(\cdot)$
Setup:
1: Define $J^\star=\min\{j\in\mathbb{N}: 2^{j}\geq\lambda_n^{-1/\gamma}\}$
Generator:
2: Compute $\nu_I(\mathbf{z})$ for the nodes $I$ at refinement level $j<J^\star$
3: Threshold $\{\nu_I(\mathbf{z})\}_I$ at level $\lambda_n$, obtaining the set $\Sigma(\mathbf{z},n)=\{I\in\mathcal{T}_{J^\star}: \nu_I(\mathbf{z})>\lambda_n\}$
4: Complete $\Sigma(\mathbf{z},n)$ to a tree $\mathcal{T}(\mathbf{z},n)$ by adding the nodes $J\supset I\in\Sigma(\mathbf{z},n)$
5: Return the estimator $f_{\mathbf{z}}(\cdot)$ that minimizes the empirical risk on $\Lambda(\mathbf{z},n)$

Table 1. Least-squares on adaptive partitions driven by the empirical residuals $\nu_I(\mathbf{z})$ defined in Equation (7). Adapted from [5].
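A compact sketch of the Table 1 generator for $\mathcal{X}=[0,1]$ with dyadic splits ($a=2$), under the convention $I_{j,k}=[k2^{-j},(k+1)2^{-j})$; the function names are our own, and the final least-squares fit on the outer leaves of the returned tree is left to the reader:

    import numpy as np

    def s_hat(x, y, j, k):
        # Empirical coefficient s_I(z) for I = I_{j,k}, as in the display above.
        inI = (x >= k / 2**j) & (x < (k + 1) / 2**j)
        Gn = inI.mean()                                   # G_{X,n}(I)
        return 0.0 if Gn == 0 else y[inI].sum() / (len(x) * np.sqrt(Gn))

    def nu_hat(x, y, j, k):
        # Empirical residual nu_I(z) of Equation (7); clipped at 0 for safety.
        kids = s_hat(x, y, j + 1, 2 * k)**2 + s_hat(x, y, j + 1, 2 * k + 1)**2
        return np.sqrt(max(kids - s_hat(x, y, j, k)**2, 0.0))

    def adaptive_tree(x, y, lam, gamma):
        x, y = np.asarray(x, float), np.asarray(y, float)
        J = int(np.ceil(np.log2(lam ** (-1.0 / gamma))))  # finest level J*
        sigma = {(j, k) for j in range(J) for k in range(2**j)
                 if nu_hat(x, y, j, k) > lam}             # step 3: thresholding
        tree = set()
        for (j, k) in sigma:                              # step 4: complete to a
            while j >= 0 and (j, k) not in tree:          # tree by adding ancestors
                tree.add((j, k))
                j, k = j - 1, k // 2
        return tree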
2.2. A Universal Algorithm Based on Warped Wavelets
The choice we made in the previous section of adopting piecewise constant functions as approximators severely limits the optimal convergence rate to approximation spaces corresponding to smoothness classes of low or no pointwise regularity (see [6] for an interesting extension based on piecewise polynomial approximations). A possible way to fix this problem would be to use the complexity regularization approach, for which optimal convergence results can be obtained in the piecewise polynomial context (see for instance Theorem 12.1 in [32], and the paper by Kohler [37]).
In the present context, where the marginal design distribution $G_X(\cdot)$ is assumed to be known, we have another option based on the warped systems introduced in Section 1.
It is worth mentioning that in this section we will concentrate on the case $\mathcal{X}\equiv[0,1]$. The present setting could be generalized to the case where $G_X(\cdot)$ is a $d$-dimensional tensor product. However, the full generalization to dimension $d>1$ is more involved and will not be discussed here.

¹A synonym of "adaptive": the estimator does not require any prior knowledge of the smoothness of the regression function $f(\cdot)$.
To "translate" the concepts highlighted in the previous two sections in terms of warped systems, consider a compactly supported wavelet basis $\{\psi_{j,k}(\cdot),\ j\geq -1,\ k\in\mathbb{Z}\}$, where $\psi_{-1,k}(\cdot)=\phi_{0,k}(\cdot)$ denotes the scaling function, and its warped version $\{\psi_{j,k}(G_X(\cdot)),\ j\geq -1,\ k\in\mathbb{Z}\}$. Then, for each $f\in L^2([0,1],G_X)$, consider its expansion in this basis
$$f(x)=\sum_{j,k} d_{j,k}\,\psi_{j,k}(G_X(x)).$$
In this context, a tree $\mathcal{T}$ is a finite set of indexes $(j,k)$, $j\in\mathbb{N}_0$ and $k\in\{0,\dots,2^j-1\}$, such that $(j,k)\in\mathcal{T}$ implies $(j-1,\lfloor k/2\rfloor)\in\mathcal{T}$, i.e., all "ancestors" of the point $(j,k)$ in the dyadic grid also belong to the tree.
One can then consider the best tree-structured approximation to $f(\cdot)$ by trying to minimize
$$\Bigl\| f-\sum_{(j,k)\in\mathcal{T}} d_{j,k}\,\psi_{j,k}(G)\Bigr\|^2_{L^2(G_X)},$$
over all trees $\mathcal{T}$ having the same cardinality $N$, and all choices of $d_{j,k}$. However, the procedure of selecting the optimal tree is costly in computational time, compared to the simple reordering that characterizes the classical thresholding procedure described in the previous section. A more reasonable approach is to use suboptimal tree selection algorithms inspired by the adaptive procedure introduced before. In detail, we start from an initial tree $\mathcal{T}_0=\{(0,0)\}$ and let it "grow" as follows:
1. Given a tree $\mathcal{T}_N$, define its "leaves" $\mathcal{L}(\mathcal{T}_N)$ as the indexes $(j,k)\notin\mathcal{T}_N$ such that $(j-1,\lfloor k/2\rfloor)\in\mathcal{T}_N$.
2. For $(j,k)\in\mathcal{L}(\mathcal{T}_N)$ define the residual
$$\nu_{j,k}=\sqrt{\sum_{I_{\ell,m}\subset I_{j,k}} |d_{\ell,m}|^2},$$
with $I_{j,k}=[2^{-j}k,\,2^{-j}(k+1)]$.
3. Choose $(j_0,k_0)\in\mathcal{L}(\mathcal{T}_N)$ such that
$$\nu_{j_0,k_0}=\max_{(j,k)\in\mathcal{L}(\mathcal{T}_N)}\ \nu_{j,k}.$$
4. Define $\mathcal{T}_{N+1}=\mathcal{T}_N\cup\{(j_0,k_0)\}$.
Note that this algorithm can either be controlled by the cardinality $N$ of the tree, or by the size of the residuals as in Table 1; a sketch of the greedy loop follows.
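For concreteness, here is the greedy growth in the same style as before, assuming the warped wavelet coefficients $d_{j,k}$ have already been computed and stored (up to some finite depth) in a dictionary keyed by $(j,k)$; all names are our own illustrative choices:

    import numpy as np

    def residual(d, j, k):
        # nu_{j,k}: the ell^2 mass of all coefficients d_{l,m} with I_{l,m} in I_{j,k}.
        return np.sqrt(sum(v**2 for (l, m), v in d.items()
                           if l >= j and m // 2**(l - j) == k))

    def grow_tree(d, N):
        tree = {(0, 0)}                                    # the initial tree T_0
        for _ in range(N):
            leaves = {(j + 1, c) for (j, k) in tree
                      for c in (2 * k, 2 * k + 1)} - tree  # step 1: the leaves
            tree.add(max(leaves, key=lambda jk: residual(d, *jk)))  # steps 2-4
        return tree

Stopping the loop when the largest residual falls below $\lambda_n$, instead of after $N$ iterations, recovers the residual-controlled variant mentioned above.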
Now, let $\Lambda$ be the dyadic partition associated to any such tree, and define
$$\Pi_\Lambda(f)(x)=\sum_{I\in\Lambda} d_I\,\psi_I(G(x)), \qquad \text{with } d_I=\bigl\langle f,\psi_I(G)\bigr\rangle_{L^2(G_X)},$$