
Nonparametric estimation of a periodic sequence in the presence of a smooth trend

BY MICHAEL VOGT AND OLIVER LINTON
Faculty of Economics, University of Cambridge, Austin Robinson Building, Sidgwick Avenue, Cambridge, CB3 9DD, U.K.
mv346@cam.ac.uk  obl20@cam.ac.uk

SUMMARY
We investigate a nonparametric regression model including a periodic component, a smooth trend function, and a stochastic error term. We propose a procedure to estimate the unknown period and the function values of the periodic component as well as the nonparametric trend function. The theoretical part of the paper establishes the asymptotic properties of our estimators. In particular, we show that our estimator of the period is consistent. In addition, we derive the convergence rates and the limiting distributions of our estimators of the periodic component and the trend function. The asymptotic results are complemented with a simulation study and an application to global temperature anomaly data.


Some key words: Nonparametric estimation; Penalized least squares; Periodic sequence; Temperature anomaly data.

1. INTRODUCTION
Many time series exhibit a periodic as well as a trending behaviour. Examples come from fields as diverse as astronomy, climatology, population biology and economics. A common way to model such time series is to write them as the sum of a periodic component, a time trend and a noise process. When the periodic model part and the time trend have a parametric form, they can be estimated by a variety of standard methods; see e.g. Brockwell & Davis (1991) and Quinn & Hannan (2001) for overviews. However, usually there is not much known about the structure of the periodic and the trend component. It is thus important to have flexible semi- and nonparametric methods at hand to estimate them.
In this paper, we develop estimation theory for the periodic and the trend component in the following framework. Let {Yt,T : t = 1, . . . , T} be the time series under investigation. The observations are assumed to follow the model

Yt,T = g(t/T) + m(t) + εt,T   (t = 1, . . . , T),   (1)

where E[εt,T] = 0, the function g is a smooth deterministic trend and m is a deterministic periodic component with unknown period θ0. We do not impose any parametric restrictions on m and g. Moreover, we allow for nonstationarities and short-range dependence in the noise process {εt,T}. As usual in nonparametric regression, the time argument of the trend function g is rescaled to the unit interval. We comment on this in more detail in Section 2 where we discuss the various model components.
The m-component in model (1) is assumed to be periodic in the following sense: the values {m(t)}t∈Z form a periodic sequence with some unknown period θ0, i.e., m(t) = m(t + θ0) for


some integer θ0 ≥ 1 and all t ∈ Z. Here and in what follows, θ0 is implicitly assumed to be the smallest period of the sequence. As can be seen from this definition, we think of the periodic component in model (1) as a sequence rather than a function defined on the real line. The reason for taking this point of view is that there is an infinite number of functions on R which take the values m(t) at the points t ∈ Z. The function which generates these values is thus not identified in our framework. Moreover, if this function is periodic, θ0 need not be its smallest period. It could also have θ0 /n for some n ∈ N as its period. Hence, in our design with equidistant observation points, we are in general neither able to identify the function which underlies the sequence values {m(t)}t∈Z nor its smallest period. The best we can do is to work with the sequence {m(t)}t∈Z and extract its periodic behaviour from the data. So far, only a strongly simplified version of model (1) without a trend function has been considered in the literature. The setting is given by the equation Yt = m(t) + εt , where the error terms εt are restricted to be stationary. Indeed, in many studies the errors are even assumed to be independent. The traditional way to estimate the periodic component m in this setup is a trigonometric regression approach. This approach imposes a parametric structure on m. In particular, m is parametrized by a finite number of sinusoids. The underlying function which generates the sequence {m(t)}t∈Z is thus known up to a finite number of coefficients which in particular include the period of the function. The vector of parameters can be estimated by frequency domain methods based on the periodogram. Classical articles proceeding along these lines include Walker (1971), Rice & Rosenblatt (1988) and Quinn & Thomson (1991). Only recently, there has been some work on estimating the periodic component in the setting Yt = m(t) + εt nonparametrically. Sun, Hart & Genton (2012) consider the model with independent and identically distributed residuals εt and investigate the estimation of the unknown period of the sequence {m(t)}t∈Z . They view the issue of estimating the period as a model selection problem and construct a cross-validation based procedure to solve it. Similar to the Akaike information criterion, their method is not consistent. Nevertheless, it enjoys a weakened version of consistency: roughly speaking, its asymptotic probability of selecting the true period is close to one given that the period is not too small. This property is termed virtual consistency. A related strand of the literature is concerned with nonparametrically estimating a periodic function with an unknown period when the observation points are not equally spaced in time. In this case, the model is given by Yt = m(Xt ) + εt , where m now denotes a periodic function defined on the real line, X1 < . . . < XT are the time points of observation and the residuals εt are independent and identically distributed. The design points Xt may for example form a jittered grid, i.e., Xt = t + Ut with variables Ut that are independent and uniformly distributed on (−1/2, 1/2). Even though an equidistant design is the most common situation, such a random design is for example suitable for applications in astronomy, as described in Hall et al. (2000). The random design case is very different from that of equidistant observation points, because the function m can be identified without imposing any parametric restrictions on it. 
Roughly speaking, this is because the random design points are scattered all over the cycle of m as the sample size increases. Estimating the periodic function m in such a random design can be achieved by kernel-based least squares methods, as shown in Hall et al. (2000). Hall & Yin (2003) and Genton & Hall (2007) investigate some variants and extensions of this method. A periodogram-based approach is presented in Hall & Li (2006). Estimation theory for another possible sampling scheme is developed in Gassiat & Lévy-Leduc (2006). In the models discussed so far, both the trend and the periodic component are deterministic functions of time. A markedly different approach is provided by unobserved components models from the state-space literature; see Harvey (1989) for a comprehensive overview. In these models, both the trend and the periodic component are stochastic in nature. It is hard to compare



this approach with ours in theoretical terms, since they are indeed non-nested. From a practical point of view, the two methods offer alternative ways to flexibly estimate the periodic and trend behaviour of a time series. In the unobserved components model, the flexibility comes through small stochastic innovations in the trend and the cycle. Our model in contrast owes its flexibility to the nonparametric nature of the deterministic component functions. An empirical comparison of the two approaches is provided in Section 8. In the following sections, we develop theory for estimating the unknown period θ0 , the sequence values {m(t)}t∈Z and the trend function g in the general framework (1). Our estimation procedure is introduced in Section 3 and splits into three steps. In the first step, we introduce a time domain approach to estimate the period θ0 . In particular, we combine least squares methods with an l0 -type penalization to construct an estimator of the period. Given our estimator of θ0 , we set up a least squares type estimator of the sequence values {m(t)}t∈Z in the second step. The first two steps of our estimation procedure are complicated by the fact that the model includes a trend component. Importantly, our method is completely robust to the presence of a trend. As explained later on, the trend component g gets ‘smoothed out’ by our procedure. We thus do not have to correct for the trend but can completely ignore it when estimating the periodic model part. In the third step of our procedure, we finally set up a kernel-based estimator of the nonparametric trend function g. The asymptotic properties of our estimators are described in Section 4. Our estimator of the period θ0 is shown to be consistent. Moreover, we derive the convergence rates and asymptotic normality for the estimators of the periodic sequence values and the trend function. As will turn out, our estimator of the periodic sequence values has the same limiting distribution as the estimator in the oracle case where the true period θ0 is known. A similar oracle property is derived for the estimator of the nonparametric trend function g. In Section 6, we investigate the small sample behaviour of our estimators in a simulation study. In addition, we apply our method to a sample of yearly global temperature anomalies from 1850 to 2011 in Section 7. These data exhibit a strong warming trend. As suggested by various articles in climatology, they also contain a cyclical component with a period in the region of 60–70 years. We use our procedure to investigate whether there is in fact evidence for a cyclical component in the data. In addition, we provide estimates of the periodic sequence values and the trend function.

2. MODEL
Before we turn to our estimation procedure, we have a closer look at model (1) and comment on some of its features. As already seen in the introduction, the model equation is

Yt,T = g(t/T) + m(t) + εt,T   (t = 1, . . . , T),

where E[εt,T] = 0, the function g is a deterministic trend and {m(t)}t∈Z is a periodic sequence with unknown integer-valued period θ0. In order to identify the function g and the sequence {m(t)}t∈Z, we normalize g to satisfy ∫_0^1 g(u)du = 0. As shown in Lemma A2 in the appendix, this uniquely pins down g and {m(t)}t∈Z.
The trend function g in model (1) depends on rescaled time t/T rather than on real time t. This rescaling device is quite common in the literature. It is for example used in nonparametric regression and in the analysis of locally stationary processes; see Robinson (1989) and Dahlhaus (1997) among others. The main reason for rescaling time to the unit interval is to obtain a framework for a reasonable asymptotic theory. If we defined g in terms of real time, we would not get additional information on the shape of g locally around a fixed time point t as the sample size


increases. Within the framework of rescaled time, in contrast, the function g is observed on a finer and finer grid of rescaled time points on the unit interval as T grows. Thus, we obtain more and more information on the local structure of g around each point in rescaled time.
In contrast to g, we let the periodic component m depend on real time t. This allows us to exploit its periodic character when doing asymptotics: let s be a time point in {1, . . . , θ0}. As m is periodic, it has the same value at s, s + θ0, s + 2θ0, s + 3θ0, and so on. Hence, the number of time points in our sample at which m has the value m(s) increases as the sample size grows. This gives us more and more information about the value m(s) and thus allows us to do asymptotics.
Rescaling the time argument of the trend component while letting the periodic part depend on real time is a rather natural way to formulate the model. It captures the fact that the trend function is much smoother and varies more slowly than the periodic component. An analogous model formulation has for example been used in Atak et al. (2011) who apply a model with a rescaled time trend and a seasonal component to a panel of temperature data from the UK. A discussion of different time series models which feature a nonparametric trend and a seasonal component can be found in Chapter 6 of Fan & Yao (2005).
Even though we do not impose any parametric restrictions on the sequence {m(t)}t∈Z, it can be represented by a vector of θ0 parameters due to its periodic character. In particular, it is fully determined by the tuple of values β = (β1, . . . , βθ0) = (m(1), . . . , m(θ0)). As a consequence, we can rewrite model (1) as

Yt,T = g(t/T) + Σ_{s=1}^{θ0} βs Is(t) + εt,T,   (2)

where Is(t) = I(t = kθ0 + s for some k) and I(·) is the indicator function. Model (1) can thus be regarded as a semiparametric regression model with indicator functions as regressors and the parameter vector β. In matrix notation, (2) becomes

Y = g + Xθ0 β + ε,   (3)

where, slightly abusing notation, Y = (Y1,T, . . . , YT,T)^T is the vector of observations, g = (g(1/T), . . . , g(T/T))^T is the trend component, Xθ0 = (Iθ0, Iθ0, . . .)^T is the design matrix with Iθ0 being the θ0 × θ0 identity matrix, and ε = (ε1,T, . . . , εT,T)^T is the vector of residuals.
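To make the matrix notation concrete, the following minimal numpy sketch builds the stacked-identity design matrix Xθ appearing in (3). The function name and the zero-based phase convention are ours and purely illustrative.

import numpy as np

def design_matrix(T, theta):
    # X_theta stacks theta x theta identity matrices; observation t
    # (1-based) loads on column (t - 1) mod theta, i.e. on its phase.
    X = np.zeros((T, theta))
    X[np.arange(T), np.arange(T) % theta] = 1.0
    return X

Because Xθ^T Xθ is diagonal, the least squares estimate introduced in the next section reduces to phase-wise sample means, a fact the later sketches exploit.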

3. ESTIMATION PROCEDURE
3·1. Estimation of the period θ0
Roughly speaking, the period θ0 is estimated as follows: to start with, we construct an estimator of the periodic sequence {m(t)}t∈Z for each candidate period θ with 1 ≤ θ ≤ ΘT. Here, the upper bound ΘT is allowed to grow with the sample size at a rate to be specified later on. Based on a penalized residual sum of squares criterion, we then compare the resulting estimators in terms of how well they fit the data. Finally, the true period θ0 is estimated by the period corresponding to the estimator with the best fit.
More formally, for each candidate period θ, define the least squares estimate β̂θ as

β̂θ = (Xθ^T Xθ)^{-1} Xθ^T Y,

where Xθ = (Iθ, Iθ, . . .)^T is the design matrix with Iθ denoting the θ × θ identity matrix. In addition, let the residual sum of squares for the model with period θ be given by

RSS(θ) = ‖Y − Xθ β̂θ‖²,


where ‖x‖ = (Σ_{t=1}^{T} x_t²)^{1/2} denotes the usual l2-norm for vectors x ∈ R^T. At first glance, it may appear to be a good idea to take the minimizer of the residual sum of squares RSS(θ) as an estimate of the period θ0. However, this approach is too naive. In particular, it does not yield a consistent estimate of θ0. The main reason is that each multiple of θ0 is a period of the sequence m as well. Thus, model (2) may be represented by using a multiple of θ0 parameters and a corresponding number of indicator functions. Intuitively, employing a larger number of regressors to explain the data yields a better fit, thus resulting in a smaller residual sum of squares than that obtained for the estimator based on the true period θ0. This indicates that minimizing the residual sum of squares will usually overestimate the true period. In particular, it will notoriously tend to select multiples of θ0 rather than θ0 itself.
To overcome this problem, we add a regularization term to the residual sum of squares which penalizes choosing large periods. In particular, we base our estimation procedure on the penalized residual sum of squares

Q(θ, λT) = RSS(θ) + λT θ,

where the regularization parameter λT diverges to infinity at an appropriate rate to be specified later on. Our estimator θ̂ of the true period θ0 is now defined as the minimizer

θ̂ = argmin_{1≤θ≤ΘT} Q(θ, λT).
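A minimal Python sketch of this first step is given below. It uses the fact that, since Xθ^T Xθ is diagonal, β̂θ is simply the vector of phase-wise sample means; the function names and the grid of candidate periods are illustrative, and the choice of λT is discussed in Section 5.

import numpy as np

def rss(y, theta):
    # Residual sum of squares when a periodic sequence with period theta is
    # fitted by least squares: the fitted value is the mean within each phase.
    T = len(y)
    phase = np.arange(T) % theta
    fitted = np.empty(T)
    for s in range(theta):
        idx = (phase == s)
        fitted[idx] = y[idx].mean()
    return float(np.sum((y - fitted) ** 2))

def estimate_period(y, theta_max, lam):
    # Minimise Q(theta, lam) = RSS(theta) + lam * theta over 1, ..., theta_max.
    crit = [rss(y, theta) + lam * theta for theta in range(1, theta_max + 1)]
    return int(np.argmin(crit)) + 1

For instance, estimate_period(y, T // 2, lam) carries out the search over the candidate periods 1 ≤ θ ≤ T/2 used in the simulations of Section 6.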

In this definition, the upper bound ΘT may tend to infinity as the sample size T increases. From a practical point of view, this means that we allow the period to be fairly large compared to the number of observations. In Section 4·2, we discuss the exact rates at which ΘT is allowed to diverge. The regularization term λT θ can be regarded as an l0-penalty: recalling the formulation (2) of our model, θ can be seen to equal the number of model parameters. In the literature, methods based on l0-penalties have been employed to deal with model selection problems such as variable or lag selection in linear regression; see e.g. Hannan & Quinn (1979), Nishii (1984) or Claeskens & Hjort (2008) for an overview. Indeed, the issue of estimating the period θ0 can also be regarded as a model selection problem: for each candidate period θ, we have a model of the form (2) with a different set of regressors and model parameters. The aim is to pick the correct model amongst these. Similar to Sun, Hart & Genton (2012), we thus look at estimating the period θ0 from the perspective of model selection. Nevertheless, our selection method strongly differs from their cross-validation approach. Importantly, our l0-penalized method is computationally not very costly, as we only have to calculate the criterion function Q(θ, λT) for ΘT different choices of θ with ΘT being of much smaller order than the sample size T. This contrasts with various problems in high-dimensional statistics, where an l0-penalty turns out to be computationally too burdensome. To obtain computationally feasible methods, convex regularizations have been employed in this context instead. In particular, the l1-regularization and the corresponding lasso approach have become very popular in recent years. See the original lasso article by Tibshirani (1996) and Bühlmann & van de Geer (2011) for a comprehensive overview.
When applying our penalized least squares procedure to estimate the period θ0, we do not correct for the presence of a trend but ignore the trend function g. This is possible because g is ‘smoothed out’ by our estimation procedure: for a given candidate period θ, the least squares estimator of the periodic component at the time point s can essentially be written as a sample average of the observations at the time points t ∈ {1, . . . , T} with t = s + (k − 1)θ for some k ∈ Z. The trend g shows up in averages of the form K^{-1} Σ_{k=1}^{K} g[{s + (k − 1)θ}/T] in this


estimator, where K is the number of time points t = s + (k − 1)θ in the sample. These averages approximate the integral ∫_0^1 g(u)du, which is equal to zero by our normalization of g. Hence, they converge to zero and can effectively be neglected. In this sense, g gets smoothed or integrated out.
3·2. Estimation of the periodic component m
Given the estimate θ̂ of the true period θ0, it is straightforward to come up with an estimator of the periodic sequence {m(t)}t∈Z. We simply define the estimator of the sequence values β as the least squares estimate β̂θ̂ that corresponds to the estimated period θ̂, i.e.,

β̂θ̂ = (Xθ̂^T Xθ̂)^{-1} Xθ̂^T Y.


The estimator m̂(t) of the sequence value m(t) at time point t is then defined by writing β̂θ̂ = (m̂(1), . . . , m̂(θ̂)) and letting m̂(s + kθ̂) = m̂(s) for all s = 1, . . . , θ̂ and all k. Hence, by construction, m̂ is a periodic sequence with period θ̂. As in the previous estimation step, we completely ignore the trend function g when estimating the periodic sequence values. This is possible for exactly the same reasons as outlined in the previous subsection.
In many applications, the periodic component may be expected to have a fairly smooth shape. For this reason, it may be useful to work with a smoothed version of the estimator m̂ in practice. In particular, we may define

m̂_smooth(s) = Σ_{t=1}^{T} Kh(t − s) m̂(t) / Σ_{t=1}^{T} Kh(t − s),

where h is the bandwidth, K is a kernel function and Kh(x) = K(x/h)/h. This estimator provides a smoothed out picture of the periodic component which is easier to interpret in many cases, in particular when the estimated period is large. However, even though smoothing is a useful tool in practice, it does not make much difference from a theoretical point of view. To see this, let K be a kernel with bounded support. For m̂_smooth(s) to be a consistent estimate of m(s), the bandwidth must shrink to zero. For small values of the bandwidth, however, the kernel weight Kh(t − s) only contains the point s itself. As the sample size grows large, we thus obtain that m̂_smooth(s) = m̂(s) at any time point s.
3·3. Estimation of the trend component g
We finally tackle the problem of estimating the trend function g. Let us first consider an infeasible estimator of g. If the periodic component m were known, we could observe the variables Zt,T = Yt,T − m(t). In this case, the trend component g could be estimated from the equation

Zt,T = g(t/T) + εt,T   (4)

by standard procedures. One could for example use a local linear estimator defined by the minimization problem

(g̃(u), ∂g̃(u)/∂u) = argmin_{(g0,g1)∈R²} Σ_{t=1}^{T} {Zt,T − g0 − g1(t/T − u)}² Kh(u − t/T),   (5)

where g̃(u) is the estimate of g at time point u and ∂g̃(u)/∂u is the estimate of the first derivative of g at u. As in the previous subsection, h denotes the bandwidth and K is a kernel function with ∫ K(v)dv = 1 and Kh(x) = K(x/h)/h.



Even though we do not observe the variables Zt,T, we can approximate them by Ẑt,T = Yt,T − m̂(t). This allows us to come up with a feasible estimator of the trend function g: simply replacing the variables Zt,T in (5) by the approximations Ẑt,T yields an estimator ĝ which can be computed from the data. Standard calculations show that ĝ(u) has the closed form solution

ĝ(u) = Σ_{t=1}^{T} wt,T(u) Ẑt,T / Σ_{t=1}^{T} wt,T(u)

with wt,T(u) = Kh(u − t/T)[ST,2(u) − (t/T − u)ST,1(u)] and ST,j(u) = Σ_{t=1}^{T} Kh(u − t/T)(t/T − u)^j for j = 1, 2. Alternatively to the above local linear estimator, we could have used a somewhat simpler Nadaraya–Watson smoother. However, as Nadaraya–Watson smoothing notoriously suffers from boundary problems, we work with a local linear estimator throughout.
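The second and third estimation steps can be sketched in a few lines of Python. The version below is a minimal illustration with function names of our own choosing, an Epanechnikov kernel and no attempt at numerical refinement.

import numpy as np

def estimate_periodic(y, theta_hat):
    # Step 2: least squares estimate of the periodic sequence, i.e. the
    # phase-wise means, expanded periodically over t = 1, ..., T.
    T = len(y)
    phase = np.arange(T) % theta_hat
    beta_hat = np.array([y[phase == s].mean() for s in range(theta_hat)])
    return beta_hat, beta_hat[phase]

def epanechnikov(v):
    return np.where(np.abs(v) <= 1.0, 0.75 * (1.0 - v ** 2), 0.0)

def smooth_periodic(m_hat, h_points):
    # Kernel-smoothed periodic component of Subsection 3.2; h_points is a
    # bandwidth measured in time points.
    T = len(m_hat)
    t = np.arange(T)
    out = np.empty(T)
    for s in range(T):
        w = epanechnikov((t - s) / h_points)
        out[s] = np.sum(w * m_hat) / np.sum(w)
    return out

def local_linear_trend(z_hat, h, grid):
    # Step 3: local linear smoother applied to Z-hat_t = Y_t - m-hat(t),
    # using the closed-form weights w_{t,T}(u) given above; h is the
    # bandwidth on the rescaled time scale [0, 1].
    T = len(z_hat)
    x = np.arange(1, T + 1) / T
    g_hat = np.empty(len(grid))
    for j, u in enumerate(grid):
        k = epanechnikov((u - x) / h) / h
        s1 = np.sum(k * (x - u))
        s2 = np.sum(k * (x - u) ** 2)
        w = k * (s2 - (x - u) * s1)
        g_hat[j] = np.sum(w * z_hat) / np.sum(w)
    return g_hat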

4. ASYMPTOTICS
4·1. Assumptions
To derive the asymptotic properties of our estimators, we impose the following conditions.
(C1) The error process {εt,T} is strongly mixing with mixing coefficients α(k) satisfying α(k) ≤ Ca^k for some positive constants C and a < 1.
(C2) It holds that E[|εt,T|^{4+δ}] ≤ C for some small δ > 0 and a positive constant C < ∞.
(C3) The function g is twice continuously differentiable on [0, 1].
(C4) The kernel K is bounded, symmetric about zero and has compact support. Moreover, it is Lipschitz, i.e., there exists a positive constant L with |K(u) − K(v)| ≤ L|u − v|.
We briefly give some remarks on the above conditions. Most importantly, we do not assume the error process {εt,T} to be stationary. We merely put some restrictions on its dependence structure. In particular, we assume the array {εt,T} to be strongly mixing. Note that we do not necessarily require exponentially decaying mixing rates, as assumed in (C1). These could be replaced by slower polynomial rates, at the cost of having somewhat stronger restrictions on the penalty parameter λT later on. To keep the notation and structure of the proofs as clear as possible, we stick to exponential mixing rates throughout. The smoothness condition (C3) is only needed for the third estimation step, i.e., for establishing the asymptotic properties of the trend g. If we restrict attention to the first two steps of our procedure, i.e., to estimating the periodic model component, it suffices to assume that g is of bounded variation.
4·2. Asymptotics for the period estimator θ̂
The next theorem characterizes the asymptotic behaviour of the estimator θ̂. To formulate the result in a neat way, we introduce the following notation: for any two sequences {vT} and {wT} of positive numbers, we write vT ≪ wT to mean that vT = o(wT).


THEOREM 1. Let (C1)–(C3) be fulfilled and assume that ΘT ≤ CT^{2/5−δ} for some small δ > 0 and a finite constant C. Moreover, choose the regularization parameter λT to satisfy (log T) ΘT^{3/2} ≪ λT ≪ T. Then θ̂ = θ0 + op(1).
The theorem shows that we get consistency under rather general conditions on the upper bound ΘT. In particular, ΘT is allowed to grow at a rate of almost T^{2/5}. Clearly, the faster ΘT goes off to infinity, the stronger restrictions have to be imposed on the regularization parameter λT. If ΘT


is a fixed number, then it suffices to choose λT of slightly larger order than log T. This contrasts with an order of almost T^{3/5} if ΘT diverges at the highest possible rate.

4·3. Asymptotics for the estimator m̂
We now provide the convergence rate and the limiting distribution of the estimator m̂ of the periodic model component. To simplify notation, define

Vt0,T = (θ0²/T) Σ_{k,k'=1}^{Kt0,T} cov(εt0+(k−1)θ0,T, εt0+(k'−1)θ0,T)

for each time point t with t0 = t − θ0⌊t/θ0⌋ and Kt0,T = 1 + ⌊(T − t0)/θ0⌋.
THEOREM 2. Let the conditions of Theorem 1 be satisfied. Then it holds that

max_{1≤t≤T} |m̂(t) − m(t)| = Op(T^{−1/2}).

In addition, assume that the limit Vt0 = lim_{T→∞} Vt0,T exists. Then for each time point t = 1, . . . , T,

T^{1/2}{m̂(t) − m(t)} → N(0, Vt0)

in distribution.


The limit Vt0 exists under quite general assumptions on the error process {εt,T}. If the latter is stationary, then Vt0 simplifies to Vt0 = θ0 Σ_{k=−∞}^{∞} cov(ε0,T, εkθ0,T) and can be estimated by classical methods discussed in Hannan (1957). Estimating the long-run variance Vt0 in a more general setting which allows for nonstationarities in the errors is studied in de Jong & Davidson (2000) among others.
Inspecting the proof of Theorem 2, one can see that the estimator m̂ has the same limiting distribution as the estimator in the oracle case where the true period θ0 is known. In particular, it has the same asymptotic variance Vt0. Hence, the error of estimating the period θ0 does not become visible in the limiting distribution of m̂.
4·4. Asymptotics for the estimator ĝ
We finally derive the asymptotic properties of the local linear smoother ĝ. To do so, define

Vu,T = (h/T) Σ_{s,t=1}^{T} Kh(u − s/T) Kh(u − t/T) E[εs,T εt,T].

The next theorem specifies the uniform convergence rate and the asymptotic distribution of the smoother ĝ.


THEOREM 3. Suppose that the conditions of Theorem 1 are satisfied and that the kernel K fulfills (C4).
(i) If the bandwidth h shrinks to zero and fulfills T^{1/2−δ} h → ∞ for some small δ > 0, then it holds that

sup_{u∈[0,1]} |ĝ(u) − g(u)| = Op({(log T)/(Th)}^{1/2} + h²).

(ii) Consider a fixed point u ∈ (0, 1) and assume that the limit Vu = lim_{T→∞} Vu,T exists. Moreover, let Th^5 → ch for some constant ch ≥ 0. Then it holds that

(Th)^{1/2}{ĝ(u) − g(u) − h²Bu} → N(0, Vu)

in distribution with Bu = (1/2){∫ v²K(v)dv} g''(u).



Similarly to Theorem 2, the limit Vu exists under rather general conditions on the process {εt,T}. If the process is stationary, then the asymptotic variance Vu simplifies to Vu = {∫ K²(v)dv} Σ_{l=−∞}^{∞} cov(ε0,T, εl,T). For methods to estimate Vu, we again refer to Hannan (1957) and de Jong & Davidson (2000).
Inspecting the proof of Theorem 3, one can see that the smoother ĝ asymptotically behaves in the same way as the oracle estimator g̃ which is constructed under the assumption that the periodic component m is known. In particular, replacing ĝ by g̃ results in an error of the order Op(T^{−1/2}) uniformly over u and h. As a consequence, ĝ has the same limiting distribution as g̃. Thus, the need to estimate the periodic sequence m is not reflected in the limit law of ĝ. As the difference between ĝ and the standard smoother g̃ is of the asymptotically negligible order Op(T^{−1/2}), the bandwidth of ĝ can be selected by the same techniques as used for the smoother g̃. In particular, standard methods like cross-validation or plug-in rules can be employed. However, these techniques may perform very poorly when the errors are correlated. To achieve reasonable results, they have to be adjusted as shown for example in Hart (1991).
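The paper refers to Hannan (1957) and de Jong & Davidson (2000) for estimators of these long-run variances. As one concrete illustration for the stationary case, the Bartlett-weighted sketch below estimates Σ_l cov(ε0, εl) from residuals; it is our own illustrative choice rather than the authors' prescription, and the truncation lag is left to the user.

import numpy as np

def long_run_variance(resid, max_lag):
    # Bartlett-weighted estimate of sum_l cov(eps_0, eps_l). Multiplying by
    # int K^2(v)dv gives an estimate of V_u; an analogous sum over lags that
    # are multiples of theta_0, scaled by theta_0, targets V_{t_0}.
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    T = len(e)
    v = float(np.dot(e, e)) / T
    for l in range(1, max_lag + 1):
        gamma = float(np.dot(e[l:], e[:-l])) / T
        v += 2.0 * (1.0 - l / (max_lag + 1.0)) * gamma
    return v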

5. SELECTING THE REGULARIZATION PARAMETER λT
As shown in Theorem 1, our procedure to estimate the period θ0 is asymptotically valid for all sequences of regularization parameters λT within a fairly wide range of rates. Hence, from an asymptotic perspective, we have a lot of freedom to choose the regularization parameter. In finite samples, a totally different picture arises. There, different choices of λT may result in completely different estimates of the period θ0. Selecting the regularization parameter λT in an appropriate way is thus a crucial issue in small samples.
In what follows, we provide a heuristic discussion of how to choose λT in a suitable way. The argument is similar to that for deriving the classical Final Prediction Error Criterion of Akaike (1969). To make the argument as clear as possible, we consider a simplified version of model (1). In particular, we analyze the setting Yt = m(t) + εt, where the errors εt are assumed to be independent and identically distributed with E[εt²] = σ². We thus drop the trend component from the model and assume that there is no serial dependence in the error terms. As can be seen from the proof of Theorem 1, the main role of the penalty term λTθ is to avoid selecting multiples of the true period θ0. For this reason, we focus attention on periods θ which are multiples of θ0, i.e., θ = rθ0 for some r. It is not difficult to show that

E{RSS(rθ0)} + σ²rθ0 = E{RSS(θ0)} + σ²θ0.   (6)

For completeness, the proof is provided in the Supplementary Material. Formula (6) suggests selecting the penalty parameter λT larger than σ² in order to avoid choosing multiples of the true period θ0 rather than θ0 itself. On the other hand, λT should not be picked too large. Otherwise we add a strong penalty to the residual sum of squares RSS(θ0) of the true period θ0, thus making the criterion function at θ0 rather large, in particular larger than the criterion function at 1. As a result, our procedure would yield the estimate θ̂ = 1, i.e., it would suggest a model without a periodic component.
To sum up, the above heuristics suggest selecting the penalty parameter λT slightly larger than σ². In particular, we propose to choose it as

λT = σ²κT   (7)

with some sequence {κT } that slowly diverges to infinity. More specifically, {κT } should grow slightly faster than {log T } to meet the conditions of the asymptotic theory from Theorem 1.


As the error variance σ² is unknown in general, we cannot take the formula (7) at face value but must replace σ² with an estimator. This can be achieved as follows: to start with, define θ̌ = argmin_{1≤θ≤ΘT} RSS(θ). As already noted in Subsection 3·1, minimizing the residual sum of squares without a penalty does not yield a consistent estimate of θ0. Inspecting the proof of Theorem 1, it can however be seen that pr(θ̌ = kθ0 for some k ∈ N) → 1 as T → ∞. Hence, with probability approaching one, θ̌ is equal to a multiple of the period θ0. Since multiples of θ0 are periods of m, the least squares estimate β̂θ̌ can be used as a preliminary estimator of the periodic sequence values. Let us denote the resulting estimator of m(t) at time point t by m̌(t). Given this estimator, we can repeat the third step of our procedure to obtain an estimator ǧ of the trend function g. Finally, subtracting the estimates m̌(t) and ǧ(t/T) from the observations Yt yields approximations ε̌t of the residuals εt. These can be used to construct the standard-type estimator σ̌² = T^{−1} Σ_{t=1}^{T} ε̌t² of the error variance σ².
So far, we have restricted attention to the case of independent error terms. Repeating our heuristic argument with serially correlated errors, the variance σ² in (6) is replaced by some type of long-run variance that incorporates covariance terms of the errors. Our selection rule for the penalty parameter λT does not take this effect of the dependence structure into account at all. Nevertheless, this does not mean that it becomes useless when the error terms are correlated. As long as the correlation is not too strong, σ² will be the dominant term in the long-run variance. Hence, our heuristic rule should still yield an appropriate penalty parameter λT. This consideration is confirmed by our simulations in the next section, where the error terms are assumed to follow an AR(1) process.
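The resulting rule can be written down in a few lines. The sketch below reuses the illustrative helper functions rss, estimate_periodic and local_linear_trend from Section 3; the default bandwidth is only a placeholder.

import numpy as np

def select_penalty(y, theta_max, h=0.15):
    # Heuristic penalty lambda_T = sigma-check^2 * log(T) of Section 5.
    T = len(y)
    theta_check = min(range(1, theta_max + 1), key=lambda th: rss(y, th))
    _, m_check = estimate_periodic(y, theta_check)
    grid = np.arange(1, T + 1) / T
    g_check = local_linear_trend(y - m_check, h, grid)
    eps_check = y - m_check - g_check
    sigma2_check = float(np.mean(eps_check ** 2))
    return sigma2_check * np.log(T)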

6. SIMULATION
In this section, we examine the finite sample behaviour of our procedure in a Monte Carlo experiment. To do so, we simulate the model (1) with a periodic sequence of the form

m(t) = sin(2πt/θ0 + 3π/2)

and a period θ0 = 60. The trend function g is given by g(u) = 2u². The error terms εt of the simulated model are drawn from the AR(1) process εt = 0.45εt−1 + ηt, where ηt are independent and identically distributed variables following a normal distribution with mean zero and variance ση². We will choose different values for ση² later on, thus altering the signal-to-noise ratio in the model. The simulation setup is chosen to mimic the situation in the real data example investigated in Section 7.
We simulate the model N = 1000 times for three different sample sizes T = 160, 250, 500 and three different values of the residual variance ση² = 0.2, 0.4, 0.6. The sample size T = 160 corresponds to the situation in the application later on where we have 162 data points. The values of ση² translate into an error variance σ² = E[εt²] of approximately 0.25, 0.5, and 0.75, respectively. To get a rough idea of the noise level in our setup, we consider the ratio ε²/Y² = (Σ_{t=1}^{T} εt²)/(Σ_{t=1}^{T} Yt²), which gives the fraction of variation in the data that is due to the variation in the error terms. More exactly, we report the values of the ratio ε̂²/Y² with ε̂t being the estimated residuals. This makes it easier to compare the noise level in the simulations to that in the real data example later on. For σ² = 0.25, 0.5, 0.75, we obtain ε̂²/Y² ≈ 0.12, 0.2, 0.26. These numbers are a bit higher than the value 0.07 obtained in the real data example, indicating that the noise level is somewhat higher in the simulations.
The regularization parameter is chosen as λT = σ̌²κT with κT = log T. Here, σ̌² is the estimator of the error variance σ² introduced in Section 5. We thus pick λT according to the heuristic idea described there. From a theoretical perspective, we should have chosen κT to diverge slightly faster than log T. However, as the rate of κT may become arbitrarily close to log T, we neglect this technicality and simply choose κT to equal log T.
In our simulation exercise, we focus on the estimation of the period θ0. This is the crucial step in our estimation scheme as the finite sample behaviour of the estimators m̂ and ĝ hinges on how well θ̂ approximates the true period θ0. If the period θ0 is known, m̂ simplifies to a standard least squares estimator. Moreover, if the periodic model component m as a whole is observed, then ĝ turns into an ordinary local linear smoother. The finite sample properties of these estimators have been extensively studied and are well-known. Given a good approximation of θ0, our estimators m̂ and ĝ can be expected to perform similarly to these standard estimators. For this reason, we concentrate on the properties of θ̂ in what follows.
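For reference, one draw from this design can be generated as follows; the function name, the seeding and the crude initialisation of the AR(1) recursion are illustrative choices of ours.

import numpy as np

def simulate(T, theta0=60, sigma_eta2=0.4, rho=0.45, seed=None):
    # One sample path from the Monte Carlo design of Section 6:
    # m(t) = sin(2*pi*t/theta0 + 3*pi/2), g(u) = 2*u^2, AR(1) errors.
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + 1)
    m = np.sin(2.0 * np.pi * t / theta0 + 1.5 * np.pi)
    g = 2.0 * (t / T) ** 2
    eta = rng.normal(scale=np.sqrt(sigma_eta2), size=T)
    eps = np.empty(T)
    eps[0] = eta[0]
    for i in range(1, T):
        eps[i] = rho * eps[i - 1] + eta[i]
    return g + m + eps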


Table 1: Empirical probabilities that θ̂ = 60 (first three columns) and that 55 ≤ θ̂ ≤ 65 (last three columns) for different choices of T and σ²

               pr(θ̂ = 60)                      pr(55 ≤ θ̂ ≤ 65)
               T = 160   T = 250   T = 500      T = 160   T = 250   T = 500
σ² = 0.25       0.102     0.247     0.587        0.994     0.932     1.000
σ² = 0.5        0.089     0.178     0.539        0.942     0.979     1.000
σ² = 0.75       0.087     0.143     0.472        0.854     0.950     1.000

Fig. 1: Histograms of the simulation results for T = 250 and different choices of σ². The bars give the number of simulations (out of a total of 1000) in which a certain value θ̂ is obtained.

The simulation results are presented in Table 1. For each choice of the sample size T and the error variance σ 2 , we have performed 1000 simulations, where periods θ with 1 ≤ θ ≤ T /2 have been taken into account for the estimation. The first three columns of the table give the probabilities with which the estimator θˆ hits the true value θ0 = 60; the last three columns are the probabilities of θˆ taking values between 55 and 65. Overall, Table 1 suggests that the estimator θˆ performs well in small samples. Even at a sample size of T = 160 where we only observe a bit less than three full cycles of the periodic component, the estimates strongly cluster around the true period θ0 . Clearly, at this small sample size, the estimator θˆ does not exactly hit the true period in many cases. Nevertheless, it gives a reasonable approximation to it most of the time. The performance of the estimator quickly improves as we observe more and more cycles of the periodic component. Moving to a sample size of T = 500, it already hits the true value θ0 in around 50–60% of the simulations and always takes values between 55 and 65. A graphical presentation of the simulation results for the sample size T = 250 is given in Figure 1. Each panel shows the distribution of θˆ for a specific choice of σ 2 . The figure makes

405

410


M. VOGT AND O. L INTON

12

420

visible some additional features of the simulation results: (a) In addition to the main cluster of estimates around the true period θ0, smaller clusters can be found around multiples of θ0. As can be seen from the proof of Theorem 1, this behaviour of θ̂ is suggested by the asymptotic theory. (b) For σ² = 0.75, θ̂ is equal to 1 in a non-negligible number of simulations. This is a finite sample effect which vanishes as the sample size increases. As will become clear below, this effect has to do with the choice of the penalty parameter λT. In particular, we could considerably lower the number of simulations with θ̂ = 1 by decreasing the penalty λT slightly.


Fig. 2: Plot of the criterion function for a typical simulation with T = 500, σ² = 0.5 and three choices of λT. In particular, λT is given by λT,a = σ̌² log T, λT,b = 4λT,a and λT,c = λT,a/4 in the three panels.


In what follows, we have a closer look at what happens when the regularization parameter λT is varied. Since the effect of varying λT is better visible for larger sample sizes, we consider a situation with T = 500. Figure 2 presents the criterion function Q(θ, λT) for a typical simulation with T = 500, σ² = 0.5 and three different choices of λT. In panel (a), we have chosen the regularization parameter as before, i.e., λT,a = σ̌² log T. In panel (b), we pick it a bit larger, λT,b = 4λT,a, and in panel (c), we choose it somewhat smaller, λT,c = λT,a/4. As can be seen from the plots, the main features of the criterion function are the downward spikes around the true period θ0 and multiples thereof. The parameter λT influences the overall upward or downward movement of the criterion function, because the penalty λTθ is linear in θ with slope parameter λT. If λT is too large, then the criterion function increases too quickly, and the global minimum does not lie at the first downward spike around θ0 but at θ = 1; see panel (b). If λT is chosen too small, then the criterion function decreases, taking its global minimum not at the first downward spike but at a subsequent one; see panel (c). Our heuristic rule for selecting λT can be regarded as a guideline to choose the right order of magnitude for the penalty term. Nevertheless, we may still pick λT a bit too large or small, thus ending up in a similar situation as in panels (b) or (c). When applying our method to real data, it is thus important to examine the criterion function. If it exhibits large downward spikes at a certain value and at multiples thereof, this is strong evidence for there being a periodic component in the data. In particular, the true period should lie in the region of the first downward spike. If our procedure yields a completely different estimate of the period, one should treat this result with caution; it may be due to an inappropriate choice of the penalty parameter.

7. APPLICATION
Global mean temperature records over the last 150 years suggest that there has been a significant upward trend in the temperatures; cp. Bloomfield (1992) or Hansen et al. (2002)



among others. This global warming trend is also visible in the time series presented in the left-hand panel of Figure 3. The depicted data are yearly global temperature anomalies from 1850 to 2011. By anomalies we mean the departure of the temperature from some reference value or a long-term average. In particular, the data at hand are temperature deviations from the average 1961–1990 measured in degrees Celsius. The data set is called HadCRUT3 and can be obtained from the Climatic Research Unit of the University of East Anglia, England (www.cru.uea.ac.uk/cru/data/temperature). A detailed description of the data is given in Brohan et al. (2006).


Fig. 3: The left-hand panel shows yearly global temperature anomalies from 1850 to 2011 (measured in °C), the right-hand panel is a plot of the criterion function Q(θ, λT).

The issue of global warming has received considerable attention over the last decades. From a statistical point of view, the challenge is to come up with methods to reliably estimate the warming trend. Providing such methods is complicated by the fact that the global mean temperatures may not only contain a trend but also a long-run oscillatory component. Various research articles in climatology suggest that the global temperature system possesses an oscillation with a period in the region between 60 and 70 years; see Schlesinger & Ramankutty (1994), Delworth & Mann (2000) and Mazzarella (2007) among others. The presence of such a periodic component obviously creates problems when estimating the trend function. In particular, an estimation procedure is required that is able to accurately separate the periodic and the trend component. Otherwise, an inaccurate picture of the global warming trend emerges. Moreover, a precise estimate of both components is required to reliably predict future temperature changes.
In what follows, we apply our three-step procedure to the temperature anomalies from Figure 3. We thus fit the model

Yt,T = g(t/T) + m(t) + εt,T

with E[εt,T] = 0 to the sample of global anomaly data {Yt,T} and estimate the unknown period θ0, the values of the periodic sequence {m(t)}t∈Z, and the nonparametric trend function g. To estimate the period θ0, we employ our penalized least squares method with the penalty parameter λT = σ̌² log T. As in the simulations, σ̌² is an estimate of the error variance which is constructed as described in Section 5. Selecting the penalty parameter in this way, the criterion function Q(θ, λT) is minimized at θ̂ = 60. We thus detect an oscillation in the temperature



data with a period in the same region as in the climatological studies cited above. The criterion function Q(θ, λT ) is plotted in the right-hand panel of Figure 3. Its most dominant feature is the enormous downward spike with a minimum at 60 years. As discussed in the simulations, this kind of spike is characteristic for the presence of a periodic component in the data. The spike being very pronounced, the shape of the criterion function provides strong evidence for there being an oscillation in the region of 60 years.


Fig. 4: The left-hand panel shows the estimator m̂ of the periodic component. The solid line in the right-hand panel depicts the estimator ĝ of the trend function, the dashed lines are pointwise 95% confidence bands.


We next turn to the estimation of the periodic component m and the trend function g. The estimator m̂ is presented in the left-hand panel of Figure 4 over a full cycle of 60 years. The solid curve in the right-hand panel shows the local linear smoother ĝ, the dashed lines are the corresponding 95% pointwise confidence bands. For the estimation, we have used an Epanechnikov kernel and have chosen the bandwidth to equal h = 0.15. To check the robustness of our results, we have additionally repeated the analysis for various choices of the bandwidth. As the results are fairly stable, we only report the findings for the bandwidth h = 0.15.


Fig. 5: Time series of the estimated residuals (left panel) and its sample autocorrelation function (right panel). The dashed lines show the Bartlett bounds ±1.96 T^{−1/2}.



Figure 5 depicts the time series of the estimated residuals ε̂t,T = Yt,T − ĝ(t/T) − m̂(t) together with its sample autocorrelation function. The residuals do not exhibit a strong periodic or trending behaviour. This suggests that our procedure has done a good job in extracting the trend and periodic component from the data. Moreover, inspecting the sample autocorrelations, the residuals do not appear to be strongly dependent over time. The sample autocorrelation at the first lag has the value 0.45 and equals the parameter estimate obtained from fitting an AR(1) process to the residuals. This value was used as a guideline in the design of the error terms in the simulations.
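The residual diagnostics shown in Figure 5 can be reproduced along the following lines; the helper name is ours, the residual series is assumed to have been computed with the sketches of Section 3, and the Bartlett bounds are the usual ±1.96 T^{−1/2}.

import numpy as np

def sample_acf(e, max_lag):
    # Sample autocorrelations of the estimated residuals up to max_lag.
    e = np.asarray(e, dtype=float)
    e = e - e.mean()
    denom = float(np.dot(e, e))
    return np.array([np.dot(e[l:], e[:-l]) / denom for l in range(1, max_lag + 1)])

# eps_hat = y - g_hat - m_hat            # estimated residuals (assumed available)
# acf = sample_acf(eps_hat, 20)
# bound = 1.96 / np.sqrt(len(eps_hat))   # Bartlett bounds of Figure 5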

8. COMPARISON WITH UNOBSERVED COMPONENTS MODELS
In this section, we analyze the temperature anomaly data by means of an unobserved components model and compare the empirical findings with those obtained by our method. We fit the following version of the unobserved components model to the data:


Yt = µt + ψt + εt,

where Yt are the observed data points, µt is the trend, ψt denotes the cyclical component and εt is the error term. The trend component is modelled by the equations

µt = µt−1 + βt−1 + ηt,
βt = βt−1 + ζt,


where ηt and ζt are independent white noise disturbances with variances ση² and σζ², respectively. The trend is thus a random walk with a drift βt. The dynamics of the cyclical component are described by

( ψt  )       (  cos λ   sin λ ) ( ψt−1  )   ( κt  )
( ψt* )  =  ρ ( −sin λ   cos λ ) ( ψt−1* ) + ( κt* ),

where κt and κt* are independent white noise disturbances with a common variance σκ² and ψt* is an auxiliary variable appearing by construction. The parameter ρ is a dampening factor satisfying 0 ≤ ρ ≤ 1 and λ is the frequency of the cycle. Most importantly, the parameter ϑ = 2π/λ is the counterpart of the period θ0 in our model. The amount of smoothness in the trend and the cyclical component depends on the values of the variances ση², σζ² and σκ², which are commonly called the hyperparameters of the model. To obtain a rather smooth estimate of the trend component, we work with an integrated random walk trend, i.e., we set the variance ση² of the disturbances ηt in the stochastic trend specification to zero. The hyperparameters and the model components can be estimated by maximum likelihood methods once the model has been brought into state space form; see Harvey (1989) or Durbin & Koopman (2001) for details.
Figure 6 compares the estimation results of our method with those obtained from the unobserved components approach. The latter have been produced by the STAMP software of Koopman, Harvey, Doornik & Shephard. The solid line is the sum of the estimated trend and periodic component in our model, the dashed line is the corresponding estimate in the unobserved components framework. In order to make it easier to compare the curves visually, we have applied a small amount of smoothing to our estimator of the periodic component as described at the end of Subsection 3·2. Specifically, we have used an Epanechnikov kernel along with a bandwidth of 5.5. As can be seen, the fits are fairly similar and indeed become almost indiscernible if we make our estimator of the periodic component even smoother.
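The unobserved components fit reported in the paper was produced with STAMP. As a rough, freely available substitute, a specification of this kind can be approximated with the UnobservedComponents class of statsmodels; the snippet below is a sketch under that assumption, y stands for the anomaly series (assumed already loaded), and keyword and attribute names may need adjusting to the installed version.

import statsmodels.api as sm

def fit_uc(y):
    # 'smooth trend' is the integrated random walk trend (sigma_eta^2 = 0);
    # the damped stochastic cycle supplies rho and the frequency lambda.
    model = sm.tsa.UnobservedComponents(y, level='smooth trend',
                                        cycle=True, damped_cycle=True,
                                        stochastic_cycle=True)
    return model.fit()

# res = fit_uc(y)
# trend = res.level.smoothed   # smoothed trend mu_t
# cycle = res.cycle.smoothed   # smoothed cycle psi_t; 2*pi over the estimated
#                              # cycle frequency corresponds to the period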



Fig. 6: Data fits produced by our method and the unobserved components approach. The solid line is the sum of the estimated trend and periodic component in our model, the dashed line is the corresponding estimate in the unobserved components model. The grey line in the background shows the time series of temperature anomalies.


Even though the two methods provide similar fits to the data, they produce two very different decompositions into a trend and a periodic component. Using our method, we find a period of 60 years and a great deal of the fluctuations in the data is assigned to the periodic component. As noted in the previous section, this is in accordance with a variety of results from the climatology literature. When fitting an unobserved components model to the data in contrast, everything gets absorbed in the trend component and the cycle is effectively dropped from the model. Thus, the estimated model only contains a trend which is depicted by the dashed line in Figure 6.


Fig. 7: Time series of the estimated slope parameters βt in the unobserved components model.


Although the fitted unobserved components model does not involve a cycle, the periodic character of the data remains visible on a more hidden level in the trend component. Figure 7 plots the time series of the estimated slope parameters βt . As can be seen, they exhibit a fairly cyclical behaviour with roughly three cycles over the observed time span. Thus, an oscillation of 60-70 years appears to be present in the slope coefficient. A similar observation has been made by Harvey in the context of US GDP data; see Section 2.7 in Harvey (2006). There, some long-term upward and downward swings of the economy are captured by the trend component rather than the cycle and are also visible in the cyclical shape of the slope coefficients.



ACKNOWLEDGEMENT
We thank Andrew Harvey for his advice on the empirical comparison of our method with unobserved components models, and an associate editor and two referees for their constructive comments and suggestions. Financial support by the ERC is gratefully acknowledged.

SUPPLEMENTARY MATERIAL
In the supplement, we provide the proofs and technical details that are omitted in the paper. In addition, we discuss some extensions of our approach and continue the empirical comparison with unobserved components models from Section 8.

APPENDIX
In this appendix, we prove the main theoretical results of the paper. We use the following notation: the symbol C denotes a generic constant which may take a different value on each occurrence. We implicitly suppose that C does not depend on any model parameters, in particular it is independent of the candidate period θ and the sample size T. Moreover, we let


Πθ = Xθ(Xθ^T Xθ)^{-1} Xθ^T

be the projection matrix onto the subspace {Xθ b : b ∈ R^θ}. As the design matrix Xθ is orthogonal, Πθ has a rather simple structure. In particular,

Πθ = Xθ Dθ Xθ^T, which is the T × T block matrix whose θ × θ blocks are all equal to Dθ, with Dθ = diag(1/K_{1,T}^{[θ]}, . . . , 1/K_{θ,T}^{[θ]}),   (A1)


where K_{s,T}^{[θ]} = 1 + ⌊(T − s)/θ⌋ for s = 1, . . . , θ. K_{s,T}^{[θ]} is the number of time points t in the sample that satisfy t = s + (k − 1)θ for some k ∈ N. It equals either ⌊T/θ⌋ or ⌊T/θ⌋ + 1; in particular it holds that K_{s,T}^{[θ]} = O(T/θ). Finally, rewriting the residual sum of squares in terms of Πθ yields RSS(θ) = Y^T(I − Πθ)Y. We first state an auxiliary lemma which is repeatedly used in the proofs later on.


LEMMA A1. Let θ be any natural number with 1 ≤ θ ≤ ΘT and let s ∈ {1, . . . , θ}. Then

| (1/K_{s,T}^{[θ]}) Σ_{k=1}^{K_{s,T}^{[θ]}} g[{s + (k − 1)θ}/T] − ∫_0^1 g(u)du | ≤ C / K_{s,T}^{[θ]}

for some constant C that is independent of s, θ, and T .


The proof is straightforward and thus omitted. We next provide a result on the identification of the model components g and m.
LEMMA A2. The sequence m and the function g in model (1) are uniquely identified if g is normalized to satisfy ∫_0^1 g(u)du = 0. More precisely, let ḡ be a smooth trend function with ∫_0^1 ḡ(u)du = 0 and m̄ a periodic sequence with (smallest) period θ̄0. If

ḡ(t/T) + m̄(t) = g(t/T) + m(t)

for all t = 1, . . . , T and all T = 1, 2, . . ., then ḡ = g and m̄ = m with θ̄0 = θ0.
The proof can be found in the Supplementary Material. We now turn to the proofs of the main theorems.




Proof of Theorem 1. Our arguments are based on the inequality

pr(θ̂ ≠ θ0) ≤ Σ_{1≤θ≤ΘT, θ≠θ0} pr{Q(θ, λT) ≤ Q(θ0, λT)}.   (A2)


In the sequel, we show that the right-hand side of (A2) converges to zero as T grows large. This can be achieved by bounding the probabilities pr{Q(θ, λT) ≤ Q(θ0, λT)} for each fixed θ ≠ θ0 in an appropriate way. To do so, write

pr{Q(θ, λT) ≤ Q(θ0, λT)} = pr{Vθ ≤ −Bθ − 2Sθ^ε − 2Sθ^g + 2Wθ^ε + Wθ^g + λT(θ0 − θ)}   (A3)

with


Vθ = ε^T(Πθ0 − Πθ)ε,      Bθ = (Xθ0 β)^T(I − Πθ)(Xθ0 β),
Sθ^ε = ε^T(I − Πθ)Xθ0 β,   Sθ^g = g^T(I − Πθ)Xθ0 β,
Wθ^ε = ε^T(Πθ − Πθ0)g,     Wθ^g = g^T(Πθ − Πθ0)g.

We now proceed in two steps. In the first, we analyze the asymptotic behaviour of the terms Vθ, Bθ, Sθ^ε, Sθ^g, Wθ^ε and Wθ^g one after the other. In the second, we combine the results on the various terms to obtain an appropriate bound on the probabilities pr{Q(θ, λT) ≤ Q(θ0, λT)}. The overall proof strategy outlined above is similar to that known from other problems based on l0-penalties such as variable or lag selection in a linear regression model; see for example the proofs in Nishii (1984), Zhang (1992) or Zheng (1995). Nevertheless, the specific arguments of our proof are rather different. The main reasons are as follows: first of all, we have to accommodate terms that result from incorporating a nonparametric trend function in the model. More importantly, the models corresponding to different candidate periods in our framework are not nested. Even worse, there are no two models that have a regressor in common: for any pair of models, the two sets of regressors, i.e., the two sets of indicator functions are disjoint. The problem of selecting the true period θ0 is thus rather different from the problem of selecting the true subset of variables in a regression model. In what follows, we give a brief summary of the two main steps of the proof. The technical details, in particular the proofs of Lemmas A3 and A4, can be found in the Supplementary Material.
To examine the asymptotic behaviour of the various terms in the first step, we distinguish between two cases:
Case A: θ ≠ θ0 and θ is not a multiple of θ0.
Case B: θ ≠ θ0 and θ is a multiple of θ0.


We first consider the terms Bθ, Sθ^ε, Sθ^g, Wθ^ε and Wθ^g. The following lemma characterizes their asymptotic properties, in particular their tail behaviour for large sample sizes. To formulate it, we let {νT} be an arbitrary sequence of positive numbers which diverges to infinity and c > 0 a fixed constant which is sufficiently small. Moreover, n = n(θ) is a natural number with n ≤ θ×, where θ× denotes the least common multiple of θ and θ0. More specifically, n = #S, where S is the subset of indices s ∈ {1, . . . , θ×} for which ζs = m(s) − θ0^{-1} Σ_{k=1}^{θ0} m{(k − 1)θ + sθ} ≠ 0. The motivation behind this fairly technical definition will become clearer in the proof. Whereas n varies with the period θ, the constants c, C, and T0 in the lemma depend neither on θ nor on the sample size T.
LEMMA A3. There exists a natural number T0 such that for all T ≥ T0, we have the following results:
Case A: Bθ ≥ c nT/θ, pr{|Sθ^ε| > νT (nT/θ)^{1/2}} ≤ CνT^{−2}, |Sθ^g| ≤ Cn,
Case B: Bθ = 0, Sθ^ε = 0, Sθ^g = 0.
Moreover, |Wθ^g| ≤ C and pr(|Wθ^ε| > νT) ≤ CνT^{−2} in both Cases A and B.


For the reasons mentioned above, we cannot simply appeal to standard results from variable selection in linear regression models to derive Lemma A3. However, we have the advantage of knowing the explicit structure of the projection matrix Πθ . Our proof strategy heavily draws on exploiting this specific structure.



We finally note that the expression Vθ can be written as Vθ = Vθ,1 − Vθ,2 with
$$V_{\theta,1} = \sum_{l=1}^T \frac{1}{K^{[\theta_0]}_{l_{\theta_0},T}}\sum_{k=1}^{K^{[\theta_0]}_{l_{\theta_0},T}} \varepsilon_{(k-1)\theta_0 + l_{\theta_0},T}\,\varepsilon_{l,T}, \qquad V_{\theta,2} = \sum_{l=1}^T \frac{1}{K^{[\theta]}_{l_\theta,T}}\sum_{k=1}^{K^{[\theta]}_{l_\theta,T}} \varepsilon_{(k-1)\theta + l_\theta,T}\,\varepsilon_{l,T}$$
and $l_\theta = l - \theta\lfloor l/\theta\rfloor$. The structure of Vθ is thus similar to that of a U-statistic. With the help of the above remarks on the terms Vθ, Bθ, Sθε, Sθg, Wθε and Wθg, we can now bound the probabilities pr{Q(θ, λT) ≤ Q(θ0, λT)}. Specifically, we can derive the following result.


LEMMA A4. There exists a natural number T0 such that for all T ≥ T0 and for all θ ≠ θ0 with 1 ≤ θ ≤ ΘT,
$$pr\{Q(\theta,\lambda_T) \le Q(\theta_0,\lambda_T)\} \le C(\rho_T\Theta_T)^{-1}, \qquad (A4)$$
where {ρT} is a sequence of positive numbers that slowly diverges to infinity (e.g. ρT = log log T).

Plugging the bound (A4) into (A2), it immediately follows that pr(θ̂ ≠ θ0) = o(1), which in turn yields that θ̂ = θ0 + op(1), thus completing the proof of Theorem 1. As already noted, the proof of Lemma A4 is given in the Supplementary Material. Here, we are content with sketching its idea. Using (A3) together with the tail properties summarized in Lemma A3, we can show that
$$pr\{Q(\theta,\lambda_T) \le Q(\theta_0,\lambda_T)\} \le pr(V_\theta \le -Cr_T) + C(\rho_T\Theta_T)^{-1}$$
with a certain sequence {rT} that diverges to infinity as T grows large. Thus, the tail behaviour of the terms Bθ, Sθε, Sθg, Wθg and Wθε allows us to replace them by the deterministic sequence {−CrT}, introducing an error term of the order (ρTΘT)−1. It now remains to bound the tail probability pr(Vθ ≤ −CrT). To do so, we exploit the U-statistic-like structure of Vθ. In particular, we first apply Chebychev's inequality and then use a covariance inequality for mixing variables to bound the resulting sum of higher moments. This completes the proof of Lemma A4.

Proof of Theorem 2. The overall idea of the proof is as follows: we first compare m̂ with an oracle estimator of m that 'knows' the true period θ0 and show that the difference between these two estimators is asymptotically negligible. This allows us to replace m̂ with the oracle estimator, whose asymptotic properties can be derived by standard arguments. The technical details are given in the Supplementary Material.

Proof of Theorem 3. Similarly to the proof of Theorem 2, we replace ĝ with an oracle estimator and then derive the properties of the latter. The details are again deferred to the Supplementary Material.


REFERENCES
Akaike, H. (1969). Fitting autoregressive models for prediction. Annals of the Institute of Statistical Mathematics 21, 243–247.
Atak, A., Linton, O. & Xiao, Z. (2011). A semiparametric panel model for unbalanced data with application to climate change in the United Kingdom. Journal of Econometrics 164, 92–115.
Bloomfield, P. (1992). Trends in global temperature. Climatic Change 21, 1–16.
Brockwell, P. J. & Davis, R. A. (1991). Time Series: Theory and Methods. Berlin: Springer, 2nd ed.
Brohan, P., Kennedy, J. J., Harris, I., Tett, S. F. B. & Jones, P. D. (2006). Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850. Journal of Geophysical Research 111.
Bühlmann, P. & van de Geer, S. (2011). Statistics for High-Dimensional Data. Berlin: Springer.
Claeskens, G. & Hjort, N. L. (2008). Model Selection and Model Averaging. Cambridge: Cambridge University Press.
Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. Annals of Statistics 25, 1–37.
Delworth, T. L. & Mann, M. E. (2000). Observed and simulated multidecadal variability in the northern hemisphere. Climate Dynamics 16, 661–676.
Durbin, J. & Koopman, S. J. (2001). Time Series Analysis by State Space Methods. Oxford: Oxford University Press.
Fan, J. & Yao, Q. (2005). Nonlinear Time Series. Berlin: Springer.
Gassiat, E. & Lévy-Leduc, C. (2006). Efficient semiparametric estimation of the periods in a superposition of periodic functions with unknown shape. Journal of Time Series Analysis 27, 877–910.
Genton, M. G. & Hall, P. (2007). Statistical inference for evolving periodic functions. Journal of the Royal Statistical Society B 69, 643–657.
Hall, P., Reimann, J. & Rice, J. (2000). Nonparametric estimation of a periodic function. Biometrika 87, 545–557.
Hall, P. & Yin, J. (2003). Nonparametric methods for deconvolving multiperiodic functions. Journal of the Royal Statistical Society B 65, 869–886.
Hall, P. & Li, M. (2006). Using the periodogram to estimate period in nonparametric regression. Biometrika 93, 411–424.
Hannan, E. J. (1957). The variance of the mean of a stationary process. Journal of the Royal Statistical Society B 19, 282–285.
Hannan, E. J. & Quinn, B. G. (1979). The determination of the order of an autoregression. Journal of the Royal Statistical Society B 41, 190–195.
Hansen, J., Ruedy, R., Sato, M. & Lo, K. (2002). Global warming continues. Science 295, 275.
Hart, J. D. (1991). Kernel regression estimation with time series errors. Journal of the Royal Statistical Society B 53, 173–187.
Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
Harvey, A. C. (2006). Forecasting with unobserved components time series models. In Handbook of Economic Forecasting, Vol. 1 (Eds. G. Elliott, C. W. J. Granger and A. Timmermann) 327–412.
de Jong, R. M. & Davidson, J. (2000). Consistency of kernel estimators of heteroscedastic and autocorrelated covariance matrices. Econometrica 68, 407–423.
Mazzarella, A. (2007). The 60-year solar modulation of global air temperature: the Earth's rotation and atmospheric circulation connection. Theoretical and Applied Climatology 88, 193–199.
Nishii, R. (1984). Asymptotic properties of criteria for selection of variables in multiple regression. Annals of Statistics 12, 758–765.
Quinn, B. G. & Hannan, E. J. (2001). The Estimation and Tracking of Frequency. Cambridge: Cambridge University Press.
Quinn, B. G. & Thomson, P. J. (1991). Estimating the frequency of a periodic function. Biometrika 78, 65–74.
Rice, J. A. & Rosenblatt, M. (1988). On frequency estimation. Biometrika 75, 477–484.
Robinson, P. M. (1989). Nonparametric estimation of time-varying parameters. In Statistical Analysis and Forecasting of Economic Structural Change (Ed. P. Hackl) 253–264. Berlin: Springer.
Schlesinger, M. E. & Ramankutty, N. (1994). An oscillation in the global climate system of period 65–70 years. Nature 367, 723–726.
Sun, Y., Hart, J. D. & Genton, M. G. (2012). Nonparametric inference for periodic sequences. Technometrics 54, 83–96.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58, 267–288.
Walker, A. M. (1971). On the estimation of a harmonic component in a time series with stationary independent residuals. Biometrika 58, 21–36.
Zhang, P. (1992). On the distributional properties of model selection criteria. Journal of the American Statistical Association 87, 732–737.
Zheng, X. & Loh, W.-Y. (1995). Consistent variable selection in linear models. Journal of the American Statistical Association 90, 151–156.



Supplementary material for "Nonparametric estimation of a periodic sequence in the presence of a smooth trend"

BY MICHAEL VOGT AND OLIVER LINTON
Faculty of Economics, University of Cambridge, Austin Robinson Building, Sidgwick Avenue, Cambridge, CB3 9DD, UK
mv346@cam.ac.uk  obl20@cam.ac.uk


SUMMARY
In this supplement, we provide the technical details that are omitted in the paper. In addition, we compare our method with unobserved components models and outline some extensions of it.

1. COMPARISON WITH UNOBSERVED COMPONENTS MODELS
In what follows, we continue the empirical comparison of our method with unobserved components models from Section 8 of the paper. To do so, we apply both approaches to a couple of different data sets. In particular, we examine a sample of energy consumption data, a set of rainfall data as well as the classical lynx trapping and sunspot number time series. We work with the same version of the unobserved components model as in Section 8 of the paper. However, we do not restrict the variance of the disturbance term ηt to zero but estimate it together with the other hyperparameters of the model. To compute the results, we have again used the STAMP software of Koopman, Harvey, Doornik & Shephard.
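For readers without access to STAMP, a broadly similar unobserved components model can be fitted in Python with the statsmodels library. The following sketch is ours, not the authors' code: the exact specification used in the paper (and in STAMP) may differ, and the simulated series is purely illustrative.

# Rough illustration (our sketch): an unobserved components model with a trend
# and a stochastic cycle fitted via statsmodels; not the specification of the paper.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T = 200
t = np.arange(T)
# illustrative series: linear trend + cycle of period 12 + noise
y = 0.01 * t + np.sin(2 * np.pi * t / 12) + 0.3 * rng.standard_normal(T)

model = sm.tsa.UnobservedComponents(
    y,
    level='local linear trend',   # smooth trend component
    cycle=True,                   # cyclical component
    stochastic_cycle=True,        # cycle evolves stochastically
    damped_cycle=True,
)
result = model.fit(disp=False)
print(result.summary())
# implied cycle period from the estimated frequency
# (parameter name as used in current statsmodels releases)
freq = result.params[model.param_names.index('frequency.cycle')]
print('implied cycle period:', 2 * np.pi / freq)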


1·1. Gas consumption data
To start with, we examine a series of quarterly gas consumption data that has been analyzed in the book by Harvey (1989). The data can be found in the appendix of Harvey (1989) and are depicted in Figure 1. As can be seen, they have a clear annual cycle, i.e., a period of length 4. We however treat this period as unknown and let it be estimated by the two methods. This shows us whether they are capable of picking up the period appropriately.


Fig. 1: The left-hand panel shows quarterly gas consumption in the UK from 1960 to 1986 (measured in millions of useful therms). The right-hand panel depicts the criterion function of our method.





The right-hand panel of Figure 1 displays the criterion function of our method. Both here and in the subsequent data examples, we choose the penalty term λT to equal σ̌² log T, where the estimate σ̌² is defined in Section 5 of the paper. Our criterion function takes its minimum at 4, suggesting an annual period in the data. Fitting the unobserved components model to the data provides an estimate of 3.99 for the period parameter ϑ. Hence, both methods appropriately pick up the yearly cycle in the data.
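To make the computation of such a criterion function concrete, the following minimal Python sketch evaluates a penalized least squares criterion over candidate periods and returns its minimiser. It is our own illustration rather than the authors' implementation: it assumes a criterion of the form RSS(θ) + λT θ with RSS(θ) computed from per-phase means, the penalty λT = σ̌² log T as above, and a trend-free or detrended input series; the exact definitions are those of Sections 3 and 5 of the paper.

# Minimal sketch of penalized least squares period selection (our own illustration).
import numpy as np

def rss_for_period(y, theta):
    """Residual sum of squares after fitting one mean per phase s = 0, ..., theta-1."""
    y = np.asarray(y, dtype=float)
    resid = np.empty_like(y)
    for s in range(theta):
        block = y[s::theta]                 # observations separated by theta time points
        resid[s::theta] = block - block.mean()
    return np.sum(resid ** 2)

def select_period(y, theta_max, sigma2_hat):
    """Return the candidate period minimising RSS(theta) + lambda_T * theta."""
    T = len(y)
    lam = sigma2_hat * np.log(T)            # penalty lambda_T = sigma^2 * log T
    crit = {th: rss_for_period(y, th) + lam * th for th in range(1, theta_max + 1)}
    return min(crit, key=crit.get), crit

# toy example with a quarterly cycle (period 4) plus noise
rng = np.random.default_rng(1)
y = np.tile([1.0, -0.5, 0.3, -0.8], 27) + 0.2 * rng.standard_normal(108)
theta_hat, crit = select_period(y, theta_max=20, sigma2_hat=0.2 ** 2)
print(theta_hat)                            # expected to be 4 for this toy series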



Fig. 2: The left-hand panel shows the estimate of the periodic component in our model (solid line) and the corresponding estimate in the unobserved components model (dashed line). Similarly, the right-hand panel depicts the trend estimate in our model (solid line) and the corresponding estimate in the unobserved components setting (dashed line). The grey line in the background is the time series of data points.


The estimates of the periodic component are presented in the left-hand panel of Figure 2. As the periodic part in our model is deterministic, it has the same amplitude throughout the sample. In the unobserved components model, in contrast, the amplitude of the estimated cycle gets a bit larger in the second half of the sample, picking up a similar effect in the data. The right-hand panel of Figure 2 shows the estimates of the trend component. Whereas our method suggests slight nonlinearities in the trend function, the unobserved components model favours a deterministic linear trend. To calculate the trend estimate in our model, we have employed an Epanechnikov kernel and have set the bandwidth to 0.15. The same kernel and bandwidth have been used in the other data examples.


Fig. 3: Data fits produced by the two methods. The solid line is the sum of the estimated trend and periodic component in our model, the dashed line is the corresponding sum in the unobserved components model. The grey line in the background shows the time series of data points.



Finally, Figure 3 depicts the sum of the estimated trend and periodic component in the two models. Overall, the fits are fairly similar and capture both the trend and the cyclical behaviour of the data in a satisfactory way.


1·2. Rainfall data
We next consider a series of annual rainfall data from Fortaleza in Brazil which has been examined in a variety of studies before; see Morettin et al. (1985), Kane & Trivedi (1986) and Harvey & Souza (1987) among others.


Fig. 4: The left-hand panel shows annual rainfall (measured in centimetres) in Fortaleza, Brazil, from 1849 to 1992. The right-hand panel is a plot of the criterion function of our method.


The data are plotted in Figure 4 and can again be found in the appendix of Harvey (1989). The criterion function of our model is minimized at 13. By applying our method, we thus find a period of 13 years in the data. This is in close accordance with the findings in the unobserved components model, where the estimate of the period parameter ϑ amounts to 13.18.


Fig. 5: Data fits produced by the two methods. The solid line is the sum of the estimated trend and periodic component in our model, the dashed line is the corresponding estimate in the unobserved components model. The grey line in the background shows the time series of data points.

The plot of the rainfall series suggests that there is not much trending behaviour in the data. Indeed, the unobserved components model yields a linear trend estimate with a rather small slope. The trend estimate in our model is also fairly smooth and flat. For this reason, we do not present the trend and the periodic component separately but only plot the sum of the two components. This is done in Figure 5, which shows that the two methods yield comparable fits of the data.

1·3. Lynx trapping data
We now turn to a rather classical data set, annual lynx trappings in Canada from 1821 to 1934. The data are plotted in the left-hand panel of Figure 6. As they do not appear to exhibit any noteworthy trending behaviour, we drop the trend component from both models and concentrate on the periodic part. In particular, we restrict attention to the problem of estimating the unknown period.


Fig. 6: The left-hand panel is the time series of annual lynx trappings in Canada from 1821 to 1934. The right-hand panel shows the criterion function of our method.


Our criterion function takes its minimum at 38, suggesting a period of 38 years. As reported in Sun, Hart & Genton (2012), the same estimate is obtained from applying their cross-validation approach. The unobserved components model, in contrast, produces an estimate of 9.98 for the period parameter ϑ. This discrepancy can be explained as follows: inspecting the plot of the lynx data in Figure 6, the peaks of the time series can be seen to follow a particular pattern. Specifically, a high peak is always followed by three smaller ones, and this pattern is repeated (almost) three times within the sample. If the periodic component is modelled as deterministic, it is not unreasonable to regard the data as having a 38-year cycle with four peaks that follow the described pattern. This is reflected in the form of our criterion function, which has a much stronger downward spike at 38 years than around 9–10 years. If the periodic component is allowed to be stochastic, in contrast, then it is more appropriate to consider the data as having a cycle of 9–10 years with the amplitude of the peaks changing over time.

1·4. Sunspot data
Another classical data set is the series of monthly sunspot numbers from 1749 to 1983, which is depicted in the left-hand panel of Figure 7. Again we take for granted that there is no trending behaviour in the data and focus attention on estimating the unknown period. Our method produces a period estimate of 133 months. Sun, Hart & Genton (2012) obtain exactly the same result by applying their cross-validation approach. The unobserved components model yields an estimate of approximately 157 months for the period parameter ϑ. These values correspond to periods of approximately 11 and 13 years, respectively, and are in close accordance with the results obtained from a periodogram-based approach; see for example the analysis in Brockwell & Davis (1991).


Fig. 7: The left-hand panel is the time series of monthly sunspot numbers from 1749 to 1983. The right-hand panel shows the criterion function of our method.

2. VARIANTS AND EXTENSIONS
Our estimation method may be extended in various directions. In what follows, we outline some of them.

2·1. Trend estimation
When applying our procedure, we remove the estimated cyclical component from the data before estimating the trend. It is however also possible to set up a direct estimation method for the trend function. In particular, we may naively estimate g by a standard local linear smoother of the form
$$\hat g_{\mathrm{naive}}(u) = \frac{\sum_{t=1}^T w_{t,T}(u)\,Y_{t,T}}{\sum_{t=1}^T w_{t,T}(u)},$$
the weights $w_{t,T}(u)$ being defined in Subsection 3·3 of the paper. The periodic component m enters the estimator $\hat g_{\mathrm{naive}}(u)$ via the term $\sum_{t=1}^T w_{t,T}(u)m(t)\big/\sum_{t=1}^T w_{t,T}(u)$, which is a weighted average of the values m(t). Renormalizing m and g to satisfy $\sum_{s=1}^{\theta_0} m(s) = 0$ for convenience, it is easily seen that $\big|\sum_{t=1}^T w_{t,T}(u)m(t)\big/\sum_{t=1}^T w_{t,T}(u)\big| \le C/(Th)$. Hence, the periodic component gets smoothed out in a similar way as the trend function in Subsections 3·1 and 3·2 of the paper. As a consequence, $\hat g_{\mathrm{naive}}$ can be shown to have the same limiting behaviour as the oracle estimator which is based on the deseasonalized observations $Z_{t,T} = Y_{t,T} - m(t)$. From an asymptotic perspective, it is thus possible to estimate the trend function g without taking the periodic model part into account at all.
Nevertheless, this naive way of estimating the trend function should be treated with caution. The reason is that it may produce very poor estimates of g in small samples. In particular, when the period θ0 is large relative to the sample size, the estimator $\hat g_{\mathrm{naive}}$ will tend to pick up the periodic component as part of the trend function. For example, if we estimate the warming trend in our application by $\hat g_{\mathrm{naive}}$, we will incorporate part of the 60-year cyclical component into it.
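The weights wt,T(u) are the local linear weights of Subsection 3·3 of the paper, which is not reproduced here. As an illustration only, the following sketch (ours) implements a standard local linear smoother with the Epanechnikov kernel used in the data examples above; it is a stand-in for ĝnaive rather than the exact estimator of the paper.

# Illustrative local linear trend smoother with an Epanechnikov kernel (our sketch).
import numpy as np

def epanechnikov(x):
    return 0.75 * (1.0 - x ** 2) * (np.abs(x) <= 1.0)

def local_linear(y, u, h):
    """Local linear fit of the trend at rescaled time u in [0, 1]."""
    T = len(y)
    times = np.arange(1, T + 1) / T             # rescaled observation times t/T
    sw = np.sqrt(epanechnikov((times - u) / h)) # square-root kernel weights for WLS
    X = np.column_stack([np.ones(T), times - u])
    beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
    return beta[0]                              # intercept = fitted trend value at u

# toy example: smooth trend + periodic component + noise
rng = np.random.default_rng(2)
T = 300
t = np.arange(1, T + 1)
y = np.sin(np.pi * t / T) + np.tile([0.5, -0.5, 0.2, -0.2], T // 4) \
    + 0.1 * rng.standard_normal(T)
grid = np.linspace(0.05, 0.95, 19)
trend = [local_linear(y, u, h=0.15) for u in grid]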

2·2. Reversing the estimation scheme
The previous subsection suggests that the steps of our estimation procedure may be reversed. Indeed, it is possible to start off with estimating the trend function and then proceed by estimating the periodic component. In what follows, we have a closer look at this modified estimation scheme. For convenience, we again normalize the components m and g to satisfy $\sum_{s=1}^{\theta_0} m(s) = 0$.

Step 1: Estimation of the trend function g. The trend function g can be estimated by the smoother ĝnaive from the previous subsection, which is relabelled as ĝre in what follows. When applying the estimator ĝre, one should however keep in mind its potential pitfalls. In particular, one should avoid using it when the period of the cyclical part is expected to be large relative to the sample size.

Step 2: Estimation of the period θ0. The period θ0 may be estimated by applying our penalized least squares procedure to the approximately detrended data Ŵt,T = Yt,T − ĝre(t/T), where we undersmooth ĝre by picking the bandwidth h to be of the order T^{−(1/4+δ)} for some small δ > 0. Let us denote the resulting estimator by θ̂re. Arguments similar to those for the proof of Theorem 1 show that θ̂re consistently estimates the period θ0.
An obvious drawback of the estimator θ̂re is that it depends on the bandwidth h. This contrasts with the estimator θ̂, which is fully independent of h. Given a good choice of the bandwidth, however, intuition suggests that the estimator θ̂re should be more precise than θ̂. The reasoning is as follows: θ̂re is based on preprocessed data from which the trend has been approximately removed. Since the trend plays the role of an additional noise component when it comes to estimating the periodic model part, θ̂re should perform better than θ̂.
Having a closer look at the proof of Theorem 1, this intuition turns out to be misguided. As noted in Subsection 3·1 of the paper, the trend function gets smoothed or integrated out in the proof. In particular, it shows up in sums of the form $S_T = K^{-1}\sum_{k=1}^K g\{(s + (k-1)\theta)/T\}$, where 1 ≤ s ≤ θ and K is the number of time points in the sample which can be written as t = s + (k − 1)θ for some k ∈ Z. For a fixed period θ, ST is of the order O(1/T). If we estimate the trend in a first step by ĝre, then ST gets replaced by $\hat S_T = K^{-1}\sum_{k=1}^K [g\{(s + (k-1)\theta)/T\} - \hat g_{\mathrm{re}}\{(s + (k-1)\theta)/T\}]$ in the proof. Since the error of estimating g by the smoother ĝre is of much larger order than O(1/T), the sum ŜT will in general be of larger order than ST as well. Thus, approximately eliminating the trend in a first step tends to introduce additional 'noise' in the estimation of θ0 rather than to reduce it.

Step 3: Estimation of the periodic sequence values. The values {m(t)}t∈Z can be estimated by applying the least squares procedure from Subsection 3·2 of the paper to the approximately detrended data Ŵt,T = Yt,T − ĝre(t/T), where as in the previous step we choose the bandwidth to be of the order T^{−(1/4+δ)}. Going along the lines of the proof for Theorem 2, the resulting estimator m̂re(t) can be shown to be asymptotically normal for each fixed time point t. However, the limiting distribution will in general differ from that of the oracle estimator which is based on the exactly detrended data Wt,T = Yt,T − g(t/T). Thus, the error of estimating the trend function gets reflected in the asymptotic distribution of m̂re(t).
This again indicates that approximately eliminating the trend in a first step tends to increase the ‘noise’ in the subsequent estimation steps rather than to decrease it. The above remarks show that our estimation scheme can in principle be reversed. One should however keep in mind that setting up the procedure in this way comes along with some disadvantages and potential pitfalls.


Supplementary material

7

2·3. Iterating the estimation scheme
It is also possible to iterate our procedure. In particular, we can set up a backfitting scheme of a similar type as described in Section 8.5 of Hastie & Tibshirani (1990):
(1) Perform the three steps of the estimation procedure described in Section 3 of the paper. This yields initial estimates θ̂(0) = θ̂, m̂(0) = m̂ and ĝ(0) = ĝ.
(2) Apply the first two estimation steps to the approximately detrended data Yt,T − ĝ(0)(t/T). This yields updated estimates θ̂(1) and m̂(1).
(3) Apply the third estimation step to the data Yt,T − m̂(1)(t). This yields an updated trend estimator ĝ(1).
(4) Steps (2) and (3) may be repeated to get further updates of the estimators.
The motivation behind such a scheme is to improve the quality of the estimators. From an asymptotic point of view, however, there is in general no gain from performing one or more backfitting steps. Moreover, backfitting comes along with the same disadvantages as reversing the estimation scheme. It is thus questionable whether backfitting pays off in any way when working with a specific sample of data.
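The following Python sketch shows how such a backfitting iteration can be organised. It is our own illustration: the three helpers are simplified stand-ins (per-phase means, a Nadaraya-Watson smoother and a penalized residual sum of squares) for the estimators defined in Section 3 of the paper, not the exact procedures used there.

# Schematic backfitting loop (our own sketch with simplified stand-in estimators).
import numpy as np

def fit_periodic(y, theta):
    """Per-phase means; returns the fitted periodic component of period theta."""
    m = np.zeros_like(y)
    for s in range(theta):
        m[s::theta] = y[s::theta].mean()
    return m - m.mean()          # centre the fitted values (cf. sum_s m(s) = 0)

def fit_trend(y, h):
    """Nadaraya-Watson smoother with an Epanechnikov kernel (stand-in for the
    local linear estimator of the paper)."""
    T = len(y)
    u = np.arange(1, T + 1) / T
    g = np.empty(T)
    for i, u0 in enumerate(u):
        w = 0.75 * np.maximum(1 - ((u - u0) / h) ** 2, 0)
        g[i] = np.sum(w * y) / np.sum(w)
    return g

def fit_period(y, theta_max, lam):
    """Penalized least squares choice of the period."""
    best, best_crit = 1, np.inf
    for theta in range(1, theta_max + 1):
        rss = np.sum((y - fit_periodic(y, theta)) ** 2)
        crit = rss + lam * theta
        if crit < best_crit:
            best, best_crit = theta, crit
    return best

def backfit(y, theta_max, lam, h, n_iter=3):
    g = np.zeros_like(y)                              # step (1) would supply initial fits
    for _ in range(n_iter):
        theta = fit_period(y - g, theta_max, lam)     # step (2): period on detrended data
        m = fit_periodic(y - g, theta)                # step (2): periodic component
        g = fit_trend(y - m, h)                       # step (3): trend on deseasonalised data
    return theta, m, g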

2·4. Multiple periods
In some applications, the periodic sequence m can be expected to be a superposition of multiple periodic components. Neglecting the trend function for simplicity, we may for example consider a model with two periods given by
$$Y_t = m_1(t) + m_2(t) + \varepsilon_t, \qquad (1)$$
where mi is a periodic sequence with unknown (smallest) period θi for i = 1, 2. The superposition m = m1 + m2 is periodic as well. As before, we denote its (smallest) period by θ0. In many situations, θ0 equals the least common multiple of θ1 and θ2. As shown in Restrepo & Chacón (1998), this is however not always the case.
Applying our penalized least squares method to model (1) yields a consistent estimator of θ0. Hence, if we ignore the multiperiodic structure of the model, our procedure results in estimating the period θ0 of the superposition m. Sometimes, however, we are not primarily interested in estimating the period of the superposition but want to find out about the periods of the individual cyclical components. Tackling this problem is complicated by the fact that the periods θ1 and θ2 are not uniquely identified in general. Even though the superposition m and its period θ0 are identified by Lemma A2, the superposition may be generated by different pairs of periodic sequences having different periods. More formally, let Θ be the set of pairs (θ1, θ2) such that there exist periodic sequences m1 and m2 with m = m1 + m2. In general, Θ contains more than one pair of periods.¹
One possible way to estimate the elements of Θ is to construct a two-dimensional (or more generally a multi-dimensional) version of our penalized least squares method. Informally, the procedure looks as follows: for each pair of candidate periods, we fit a model with two cyclical components to the data and calculate the corresponding residual sum of squares. Our estimator is then defined by minimizing a penalized version of the latter. Which elements of Θ are approximated by this procedure will be determined by the structure of the penalty. For example, if we choose the penalty to have the form λT(θ1 + θ2), then we will estimate the pair of periods in Θ with the smallest sum. As far as we can see, it is however not trivial at all to extend our theory to this multi-dimensional case. The main problem is that our proofs for the single-period case heavily draw on the rather simple structure of the design matrix Xθ. In the multiperiod case, this structure gets lost, making it hard to carry over some of the arguments. For the time being, we are thus content with estimating the period θ0 of the superposition m.

¹ As an example, consider the pair of periodic sequences {m1(1), ..., m1(5)} = {−1, 0, 0, 0, 0} and {m2(1), ..., m2(6)} = {1, 0, 2, −1, 2, 0} having the periods 5 and 6. The sum of these two sequences generates a periodic sequence of period 30. The same sequence is generated by the sequences {m1(1), ..., m1(3)} = {−1, 0, 0} and {m2(1), ..., m2(10)} = {1, 0, 2, 0, 2, −1, 2, 0, 2, 0} with the periods 3 and 10.
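The informal two-dimensional criterion described above can be sketched in code as follows. This is our own illustration of the idea, not a procedure developed in the paper: for each pair of candidate periods we fit phase indicators for both components by least squares and penalise the residual sum of squares by λT(θ1 + θ2), as suggested in the text.

# Our illustration of a two-dimensional penalized least squares grid search.
import numpy as np
from itertools import product

def rss_two_periods(y, theta1, theta2):
    """Least squares fit of Y_t = m1(t) + m2(t) using phase indicators for both periods."""
    T = len(y)
    t = np.arange(T)
    X1 = np.equal.outer(t % theta1, np.arange(theta1)).astype(float)
    X2 = np.equal.outer(t % theta2, np.arange(theta2)).astype(float)
    X = np.hstack([X1, X2])                  # rank-deficient design; lstsq handles this
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

def select_two_periods(y, theta_max, lam):
    """Minimise RSS(theta1, theta2) + lam * (theta1 + theta2) over a grid of pairs."""
    best, best_crit = None, np.inf
    for th1, th2 in product(range(1, theta_max + 1), repeat=2):
        crit = rss_two_periods(y, th1, th2) + lam * (th1 + th2)
        if crit < best_crit:
            best, best_crit = (th1, th2), crit
    return best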

3. TECHNICAL DETAILS
In this section, we provide the proofs that are omitted in the paper. The notation is the same as summarized at the beginning of the appendix.

Proof of (6). Suppose that θ is a multiple of θ0, i.e., θ = rθ0 for some r, and let $\hat\beta_\theta = (\hat\beta_{\theta,1},\ldots,\hat\beta_{\theta,\theta})$ be the least squares estimator based on the period θ. For ease of notation, we define the shorthand $I_s(t) = I(t = k\theta + s \text{ for some } k)$ and write $(\beta_1,\ldots,\beta_\theta)$ with $\beta_s = \beta_{s-\theta_0\lfloor s/\theta_0\rfloor}$ for s = 1, ..., θ. With this, it holds that
$$\frac{RSS(\theta)}{T} = \frac{1}{T}\sum_{t=1}^T \big\{Y_t - \hat\beta_{\theta,1} I_1(t) - \ldots - \hat\beta_{\theta,\theta} I_\theta(t)\big\}^2.$$
As $Y_t - \beta_1 I_1(t) - \ldots - \beta_\theta I_\theta(t) = \varepsilon_t$ for θ = rθ0, we further obtain
$$\frac{RSS(\theta)}{T} = \frac{1}{T}\sum_{t=1}^T \varepsilon_t^2 + \frac{2}{T}\sum_{t=1}^T \varepsilon_t\big\{(\beta_1 - \hat\beta_{\theta,1}) I_1(t) + \ldots + (\beta_\theta - \hat\beta_{\theta,\theta}) I_\theta(t)\big\} + \frac{1}{T}\sum_{t=1}^T \big\{(\beta_1 - \hat\beta_{\theta,1}) I_1(t) + \ldots + (\beta_\theta - \hat\beta_{\theta,\theta}) I_\theta(t)\big\}^2.$$
Inspecting the definition of the least squares estimator $\hat\beta_\theta$, it can be seen that $\hat\beta_{\theta,s} = (K^{[\theta]}_{s,T})^{-1}\sum_{t=1}^T I_s(t)\,Y_t$ with $K^{[\theta]}_{s,T} = 1 + \lfloor(T-s)/\theta\rfloor$. Thus
$$\beta_s - \hat\beta_{\theta,s} = -\frac{1}{K^{[\theta]}_{s,T}}\sum_{t=1}^T I_s(t)\,\varepsilon_t.$$
Using this, some straightforward calculations yield that
$$\frac{RSS(\theta)}{T} = \frac{1}{T}\sum_{t=1}^T \varepsilon_t^2 - \frac{1}{T}\sum_{s=1}^\theta \frac{1}{K^{[\theta]}_{s,T}}\sum_{t,t'=1}^T I_s(t)\,I_s(t')\,\varepsilon_t\,\varepsilon_{t'}$$
and hence
$$E\Big[\frac{RSS(\theta)}{T}\Big] = \sigma^2 - \frac{\sigma^2\theta}{T}.$$
As a result,
$$E\Big[\frac{RSS(r\theta_0)}{T}\Big] = \sigma^2 - \frac{\sigma^2 (r\theta_0)}{T} < \sigma^2 - \frac{\sigma^2\theta_0}{T} = E\Big[\frac{RSS(\theta_0)}{T}\Big].$$
Rearranging yields (6).
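The identity just derived is easy to check numerically. The following sketch (ours) simulates independent errors in a balanced design, where T is a multiple of the candidate periods, and compares the Monte Carlo average of RSS(θ)/T with σ² − σ²θ/T for θ = θ0 and θ = 2θ0.

# Numerical check (our sketch) of E[RSS(theta)/T] = sigma^2 - sigma^2*theta/T
# for periods theta that are multiples of the true period theta0.
import numpy as np

def rss_over_T(y, theta):
    resid = np.empty_like(y)
    for s in range(theta):
        resid[s::theta] = y[s::theta] - y[s::theta].mean()
    return np.mean(resid ** 2)

rng = np.random.default_rng(0)
T, theta0, sigma = 240, 4, 1.0
m = np.tile([1.0, -1.0, 0.5, -0.5], T // theta0)       # true periodic component
for theta in (theta0, 2 * theta0):
    sims = [rss_over_T(m + sigma * rng.standard_normal(T), theta) for _ in range(2000)]
    print(theta, np.mean(sims), sigma ** 2 - sigma ** 2 * theta / T)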


Proof of Lemma A2. By assumption, for all t = 1, ..., T and all T = 1, 2, ...,
$$\bar g\Big(\frac{t}{T}\Big) - g\Big(\frac{t}{T}\Big) = m(t) - \bar m(t).$$
Let θ× be the least common multiple of θ0 and θ̄0. As m and m̄ are periodic with (smallest) periods θ0 and θ̄0 respectively, they are both periodic with period θ×. We thus obtain that
$$\bar g\Big(\frac{s+(k-1)\theta^\times}{T}\Big) - g\Big(\frac{s+(k-1)\theta^\times}{T}\Big) = m(s) - \bar m(s)$$
for all s = 1, ..., θ× and k = 1, ..., $K^{[\theta^\times]}_{s,T}$ with $K^{[\theta^\times]}_{s,T} = 1 + \lfloor(T-s)/\theta^\times\rfloor$. If m̄ = m, then clearly ḡ = g follows, since the points (s + (k − 1)θ×)/T become dense in [0, 1] as T increases and the functions ḡ and g are smooth. We next assume that m̄(s) ≠ m(s) for some s ∈ {1, ..., θ×} and show that this leads to a contradiction: without loss of generality, let m(s) − m̄(s) = ds > 0 for some s ∈ {1, ..., θ×}. Then
$$\bar g\Big(\frac{s+(k-1)\theta^\times}{T}\Big) - g\Big(\frac{s+(k-1)\theta^\times}{T}\Big) = d_s > 0$$
as well as
$$\frac{1}{K^{[\theta^\times]}_{s,T}}\sum_{k=1}^{K^{[\theta^\times]}_{s,T}}\Big[\bar g\Big(\frac{s+(k-1)\theta^\times}{T}\Big) - g\Big(\frac{s+(k-1)\theta^\times}{T}\Big)\Big] = d_s > 0.$$
However, by Lemma A1,
$$\lim_{T\to\infty}\frac{1}{K^{[\theta^\times]}_{s,T}}\sum_{k=1}^{K^{[\theta^\times]}_{s,T}}\Big[\bar g\Big(\frac{s+(k-1)\theta^\times}{T}\Big) - g\Big(\frac{s+(k-1)\theta^\times}{T}\Big)\Big] = \int_0^1\bar g(u)\,du - \int_0^1 g(u)\,du = 0 \ne d_s,$$
which is a contradiction.

Proof of Lemma A3. We first derive the results on the terms Bθ, Sθε and Sθg. To do so, we have a closer look at the expression (I − Πθ)Xθ0β, which is the common component of these terms. It holds that
$$(I - \Pi_\theta)X_{\theta_0}\beta = (I - X_\theta D_\theta X_\theta^T)X_{\theta_0}\beta =: (\gamma_{1,T},\ldots,\gamma_{\theta^\times,T},\gamma_{1,T},\ldots,\gamma_{\theta^\times,T},\ldots)^T$$
with
$$\gamma_{s,T} = m(s) - \frac{1}{K^{[\theta]}_{s_\theta,T}}\sum_{k=1}^{K^{[\theta]}_{s_\theta,T}} m\{(k-1)\theta + s_\theta\}$$
for s = 1, ..., θ×, where $s_\theta = s - \theta\lfloor s/\theta\rfloor$ and θ× is the least common multiple of θ0 and θ. A representation of γs,T which will turn out to be useful in what follows is given by
$$\gamma_{s,T} = \zeta_s + R_{s,T} \qquad (2)$$
with $R_{s,T} = R_{1,s,T} + R_{2,s,T}$ and
$$\zeta_s = m(s) - \frac{1}{\theta_0}\sum_{k=1}^{\theta_0} m\{(k-1)\theta + s_\theta\},$$
$$R_{1,s,T} = \Big\{1 - \frac{\theta_0}{K^{[\theta]}_{s_\theta,T}}\Big\lfloor\frac{K^{[\theta]}_{s_\theta,T}}{\theta_0}\Big\rfloor\Big\}\frac{1}{\theta_0}\sum_{k=1}^{\theta_0} m\{(k-1)\theta + s_\theta\},$$
$$R_{2,s,T} = -\frac{1}{K^{[\theta]}_{s_\theta,T}}\sum_{k=\theta_0\lfloor K^{[\theta]}_{s_\theta,T}/\theta_0\rfloor + 1}^{K^{[\theta]}_{s_\theta,T}} m\{(k-1)\theta + s_\theta\}.$$
The components of the representation in (2) have the following properties: first of all, the remainder satisfies
$$|R_{s,T}| \le \frac{C\theta_0}{K^{[\theta]}_{s_\theta,T}}, \qquad (3)$$
since $|1 - (\theta_0/K^{[\theta]}_{s_\theta,T})\lfloor K^{[\theta]}_{s_\theta,T}/\theta_0\rfloor| \le \theta_0/K^{[\theta]}_{s_\theta,T}$ and $|R_{2,s,T}| \le C\theta_0/K^{[\theta]}_{s_\theta,T}$. Moreover, the terms ζs have the following properties:
(Aζ) Assume that Case A holds. Then there exists an index s ∈ {1, ..., θ×} with ζs ≠ 0. Moreover, there exists a small constant η > 0 such that |ζs| ≥ η whenever ζs ≠ 0.
(Bζ) Assume that Case B holds. Then ζs = 0 for all s.
Note that the constant η does not depend on any model parameters; in particular, it is independent of θ and s. The proof of (Aζ) and (Bζ) is postponed until the arguments for Lemma A3 are completed.
We now derive the properties of the terms Bθ, Sθε and Sθg in Case A. To do so, define S to be the subset of indices s ∈ {1, ..., θ×} for which ζs ≠ 0 and let #S = n. Moreover, write Sc = {1, ..., θ×} \ S. Using (Aζ) together with (2) and (3), it is easily seen that γs,T → ζs ≠ 0 with |γs,T − ζs| = |Rs,T| ≤ CΘT/T for all s ∈ S and |γs,T| = |Rs,T| ≤ CΘT/T for all s ∈ Sc. From this, it immediately follows that
$$B_\theta = (X_{\theta_0}\beta)^T(I - \Pi_\theta)X_{\theta_0}\beta = (X_{\theta_0}\beta)^T(I - \Pi_\theta)^T(I - \Pi_\theta)X_{\theta_0}\beta = (\gamma_{1,T},\ldots,\gamma_{\theta^\times,T},\ldots)(\gamma_{1,T},\ldots,\gamma_{\theta^\times,T},\ldots)^T \ge c\,\frac{nT}{\theta}$$
for some fixed constant c > 0 and all T ≥ T0 with T0 being sufficiently large. Next write
$$S_\theta^\varepsilon = \sum_{t=1}^T \gamma_{t,T}\,\varepsilon_{t,T} = \sum_{t\in I_S}\gamma_{t,T}\,\varepsilon_{t,T} + \sum_{t\in I_{S^c}}\gamma_{t,T}\,\varepsilon_{t,T}$$
with $I_S = \{t : t - \theta^\times\lfloor t/\theta^\times\rfloor \in S\}$ and $I_{S^c} = \{t : t - \theta^\times\lfloor t/\theta^\times\rfloor \in S^c\}$. Then
$$pr\Big\{|S_\theta^\varepsilon| > \nu_T\Big(\frac{nT}{\theta}\Big)^{1/2}\Big\} \le pr\Big\{\Big|\sum_{t\in I_S}\gamma_{t,T}\,\varepsilon_{t,T}\Big| > \frac{\nu_T}{2}\Big(\frac{nT}{\theta}\Big)^{1/2}\Big\} + pr\Big\{\Big|\sum_{t\in I_{S^c}}\gamma_{t,T}\,\varepsilon_{t,T}\Big| > \frac{\nu_T}{2}\Big(\frac{nT}{\theta}\Big)^{1/2}\Big\} =: Q_{\theta,1} + Q_{\theta,2}.$$
As |γs,T| ≤ C for all s and T for some sufficiently large constant C (which is evident from (2) and (3)), we can apply Chebychev's inequality and then exploit the mixing conditions on our model variables with the help of Davydov's inequality (see Corollary 1.1 in Bosq (1998)) to get that $Q_{\theta,1} \le C/\nu_T^2$. Using the same argument together with the fact that |γs,T| ≤ CΘT/T for all s ∈ Sc, we further obtain that $Q_{\theta,2} \le C\Theta_T^3/(T\nu_T)^2 \le C/\nu_T^2$. As a result,
$$pr\Big\{|S_\theta^\varepsilon| > \nu_T\Big(\frac{nT}{\theta}\Big)^{1/2}\Big\} \le \frac{C}{\nu_T^2}.$$
Finally,
$$|S_\theta^g| = |g^T(I - \Pi_\theta)X_{\theta_0}\beta| = \Big|\sum_{t=1}^T \gamma_{t,T}\,g\Big(\frac{t}{T}\Big)\Big| = \Big|\sum_{s=1}^{\theta^\times}\gamma_{s,T}\,K^{[\theta^\times]}_{s,T}\underbrace{\frac{1}{K^{[\theta^\times]}_{s,T}}\sum_{k=1}^{K^{[\theta^\times]}_{s,T}} g\Big(\frac{s+(k-1)\theta^\times}{T}\Big)}_{|\cdot|\,\le\, C/K^{[\theta^\times]}_{s,T}\ \text{by Lemma A1}}\Big| \le Cn$$

for some sufficiently large constant C.
Let us now turn to Case B. In this case, we do not only have that ζs = 0 but even γs,T = 0 for all s. Hence, (I − Πθ)Xθ0β = 0, which immediately implies that the terms Bθ, Sθε and Sθg are all equal to zero.
We finally turn to the expressions Wθε and Wθg. The l-th element of the vector (Πθ − Πθ0)g equals
$$\frac{1}{K^{[\theta]}_{l_\theta,T}}\sum_{k=1}^{K^{[\theta]}_{l_\theta,T}} g\Big(\frac{(k-1)\theta + l_\theta}{T}\Big) - \frac{1}{K^{[\theta_0]}_{l_{\theta_0},T}}\sum_{k=1}^{K^{[\theta_0]}_{l_{\theta_0},T}} g\Big(\frac{(k-1)\theta_0 + l_{\theta_0}}{T}\Big).$$
Hence,
$$W_\theta^g = g^T(\Pi_\theta - \Pi_{\theta_0})g = \sum_{l=1}^T \frac{1}{K^{[\theta]}_{l_\theta,T}}\sum_{k=1}^{K^{[\theta]}_{l_\theta,T}} g\Big(\frac{(k-1)\theta + l_\theta}{T}\Big)g\Big(\frac{l}{T}\Big) - \sum_{l=1}^T \frac{1}{K^{[\theta_0]}_{l_{\theta_0},T}}\sum_{k=1}^{K^{[\theta_0]}_{l_{\theta_0},T}} g\Big(\frac{(k-1)\theta_0 + l_{\theta_0}}{T}\Big)g\Big(\frac{l}{T}\Big)$$
with $l_\theta = l - \theta\lfloor l/\theta\rfloor$. Moreover,
$$\Big|\sum_{l=1}^T \frac{1}{K^{[\theta]}_{l_\theta,T}}\sum_{k=1}^{K^{[\theta]}_{l_\theta,T}} g\Big(\frac{(k-1)\theta + l_\theta}{T}\Big)g\Big(\frac{l}{T}\Big)\Big| = \Big|\sum_{s=1}^{\theta}\frac{1}{K^{[\theta]}_{s,T}}\Big\{\underbrace{\sum_{k=1}^{K^{[\theta]}_{s,T}} g\Big(\frac{(k-1)\theta + s}{T}\Big)}_{|\cdot|\,\le\, C\ \text{by Lemma A1}}\Big\}^2\Big| \le \frac{C\theta^2}{T} \le \frac{C\Theta_T^2}{T}$$
and thus |Wθg| ≤ C. Similarly,
$$W_\theta^\varepsilon = \varepsilon^T(\Pi_\theta - \Pi_{\theta_0})g = \sum_{l=1}^T \frac{1}{K^{[\theta]}_{l_\theta,T}}\sum_{k=1}^{K^{[\theta]}_{l_\theta,T}} g\Big(\frac{(k-1)\theta + l_\theta}{T}\Big)\varepsilon_{l,T} - \sum_{l=1}^T \frac{1}{K^{[\theta_0]}_{l_{\theta_0},T}}\sum_{k=1}^{K^{[\theta_0]}_{l_{\theta_0},T}} g\Big(\frac{(k-1)\theta_0 + l_{\theta_0}}{T}\Big)\varepsilon_{l,T}.$$
Rewriting Wθε in this way, we can apply Chebychev's inequality and subsequently exploit our mixing assumptions via Davydov's inequality to obtain
$$pr(|W_\theta^\varepsilon| > C\nu_T) \le \frac{C}{\nu_T^2}$$
for any diverging sequence {νT}.


Proof of (Aζ) and (Bζ). It is trivial to see that ζs = 0 for all s in Case B. We thus only have to prove (Aζ). To do so, we first show that there exists an index s ∈ {1, ..., θ×} with ζs ≠ 0. The proof proceeds by contradiction: suppose there exists some θ with ζs = 0 for all s ∈ N (or equivalently for all s ∈ {1, ..., θ×}). As sθ = (s + rθ)θ for all natural numbers s and r, it holds that
$$\frac{1}{\theta_0}\sum_{k=1}^{\theta_0} m\{(k-1)\theta + s_\theta\} = \frac{1}{\theta_0}\sum_{k=1}^{\theta_0} m\{(k-1)\theta + (s+r\theta)_\theta\}.$$
Moreover, as ζs = ζs+rθ (= 0) by assumption, we obtain that m(s) = m(s + rθ) for all s and r, which means that m has the period θ. If θ < θ0, this contradicts the assumption that θ0 is the smallest period of m. If θ > θ0, we run into the following contradiction: as θ is not a multiple of



θ0, it holds that
$$m(s) = m(s + \theta) = m\Big(s + \Big\lfloor\frac{\theta}{\theta_0}\Big\rfloor\theta_0 + k\Big) = m(s + k)$$
for some k with 1 ≤ k < θ0. However, m(s + k) ≠ m(s) for at least one s, as otherwise k < θ0 would be a period of m.
It remains to show that |ζs| ≥ η for some small constant η > 0 whenever ζs ≠ 0. To see this, first note that $\theta_0^{-1}\sum_{k=1}^{\theta_0} m\{(k-1)\theta + s_\theta\}$ is the average of θ0 different elements of the sequence {m(t)}t∈Z. The sequence being periodic, this average can only take a finite number of different values. More precisely, there are at most $\binom{2\theta_0-1}{\theta_0}$ different values (independently of s and θ). From this, it immediately follows that $\zeta_s = m(s) - \theta_0^{-1}\sum_{k=1}^{\theta_0} m\{(k-1)\theta + s_\theta\}$ can only take a finite number of values as well. In particular, there is only a finite number of possible non-zero values. We can thus find a constant η > 0 with |ζs| ≥ η whenever ζs ≠ 0.

Proof of Lemma A4. Set $\nu_T = (\rho_T\Theta_T)^{1/2}$. In Case A, we obtain
$$pr\{Q(\theta,\lambda_T) \le Q(\theta_0,\lambda_T)\} = pr\{V_\theta \le -B_\theta - 2S_\theta^\varepsilon - 2S_\theta^g + 2W_\theta^\varepsilon + W_\theta^g + \lambda_T(\theta_0 - \theta)\}$$
$$\le pr\Big\{V_\theta \le -B_\theta - 2S_\theta^\varepsilon - 2S_\theta^g + 2W_\theta^\varepsilon + W_\theta^g + \lambda_T(\theta_0 - \theta),\ |S_\theta^\varepsilon| \le \nu_T\Big(\frac{nT}{\theta}\Big)^{1/2},\ |W_\theta^\varepsilon| \le \nu_T\Big\} + pr\Big\{|S_\theta^\varepsilon| > \nu_T\Big(\frac{nT}{\theta}\Big)^{1/2}\Big\} + pr\{|W_\theta^\varepsilon| > \nu_T\}$$
$$\le pr\Big\{V_\theta \le -B_\theta + C\nu_T\Big(\frac{nT}{\theta}\Big)^{1/2} + \lambda_T(\theta_0 - \theta)\Big\} + C\nu_T^{-2},$$
where we have used the results from Lemma A3. Choosing λT to satisfy λT/T → 0 and noting that the regularization term λT(θ0 − θ) is negative for θ > θ0, it can be seen that $C\nu_T(nT/\theta)^{1/2} + \lambda_T(\theta_0 - \theta) \le \delta(nT/\theta)$ for some arbitrarily small δ > 0 and all T ≥ T0 with T0 being sufficiently large. Hence,
$$-B_\theta + C\nu_T\Big(\frac{nT}{\theta}\Big)^{1/2} + \lambda_T(\theta_0 - \theta) \le -(c-\delta)\frac{nT}{\theta} \le -C_1\frac{nT}{\theta}$$
for some constant C1 > 0. From this, it follows that
$$pr\{Q(\theta,\lambda_T) \le Q(\theta_0,\lambda_T)\} \le pr\Big\{V_\theta \le -C_1\frac{nT}{\theta}\Big\} + C\nu_T^{-2} \le pr\Big\{V_\theta \le -C_1\frac{T}{\Theta_T}\Big\} + C\nu_T^{-2}.$$
Moreover,
$$pr\Big\{V_\theta \le -C_1\frac{T}{\Theta_T}\Big\} = pr\Big\{V_{\theta,1} - V_{\theta,2} \le -C_1\frac{T}{\Theta_T}\Big\} \le pr\Big\{|V_{\theta,1}| + |V_{\theta,2}| \ge C_1\frac{T}{\Theta_T}\Big\} \le pr\Big\{|V_{\theta,1}| \ge \frac{C_1 T}{2\Theta_T}\Big\} + pr\Big\{|V_{\theta,2}| \ge \frac{C_1 T}{2\Theta_T}\Big\} =: P_{\theta,1} + P_{\theta,2}.$$
To deal with the probabilities Pθ,1 and Pθ,2, we introduce the following concept: we say that an index i1 is separated from the indices i2, ..., id if |i1 − ik| > C2 log T for a sufficiently large




constant C2 (to be specified later on) and all k = 2, ..., d. With this definition at hand, we can use Chebychev's inequality to get
$$P_{\theta,2} = pr\Big\{\Big|\sum_{l=1}^T \frac{1}{K^{[\theta]}_{l_\theta,T}}\sum_{k=1}^{K^{[\theta]}_{l_\theta,T}}\varepsilon_{(k-1)\theta+l_\theta,T}\,\varepsilon_{l,T}\Big| \ge \frac{C_1 T}{2\Theta_T}\Big\}$$
$$\le \frac{C\Theta_T^2}{T^2}\sum_{l,l'=1}^T\frac{1}{K^{[\theta]}_{l_\theta,T}K^{[\theta]}_{l'_\theta,T}}\sum_{k=1}^{K^{[\theta]}_{l_\theta,T}}\sum_{k'=1}^{K^{[\theta]}_{l'_\theta,T}}\Big|E\big[\varepsilon_{(k-1)\theta+l_\theta,T}\,\varepsilon_{l,T}\,\varepsilon_{(k'-1)\theta+l'_\theta,T}\,\varepsilon_{l',T}\big]\Big|$$
$$= \frac{C\Theta_T^2}{T^2}\sum_{(l,l',k,k')\in\Gamma}\frac{1}{K^{[\theta]}_{l_\theta,T}K^{[\theta]}_{l'_\theta,T}}\Big|E\big[\varepsilon_{(k-1)\theta+l_\theta,T}\,\varepsilon_{l,T}\,\varepsilon_{(k'-1)\theta+l'_\theta,T}\,\varepsilon_{l',T}\big]\Big| + \frac{C\Theta_T^2}{T^2}\sum_{(l,l',k,k')\in\Gamma^c}\frac{1}{K^{[\theta]}_{l_\theta,T}K^{[\theta]}_{l'_\theta,T}}\Big|E\big[\varepsilon_{(k-1)\theta+l_\theta,T}\,\varepsilon_{l,T}\,\varepsilon_{(k'-1)\theta+l'_\theta,T}\,\varepsilon_{l',T}\big]\Big|$$
$$=: P_{\theta,2,a} + P_{\theta,2,b},$$

330

335

where Γ is the set of tuples (l, l0 , k, k 0 ) such that none of the indices l, l0 , (k − 1)θ + lθ , (k 0 − 1)θ + lθ0 is separated from the others and Γc is its complement. Since E[ε4t,T ] ≤ C by assumption and the number of elements contained in Γ is smaller than C(T log T )2 for some sufficiently large constant C, it immediately follows that Pθ,2,a ≤ C(Θ2T log T /T )2 ≤ C(ρT ΘT )−1 , keeping in mind that ΘT = o(T 2/5 ). To cope with the term Pθ,2,b , we exploit the mixing conditions on the error variables: for any tuple (l, l0 , k, k 0 ) ∈ Γc , there exists an index, say l, which is separated from the others. We can thus apply Davydov’s inequality to obtain

E ε(k−1)θ+l ,T εl,T ε(k0 −1)θ+l0 ,T εl0 ,T = cov εl,T , ε(k−1)θ+l ,T ε(k0 −1)θ+l0 ,T εl0 ,T

θ

θ

θ

1−(1/q)−(1/r)

≤ Cα(C2 log T )

340

345

350

θ

≤ CT −C3

with some C3 > 0, where q and r are chosen slightly larger than 4/3 and 4, respectively. Note that C3 can be made arbitrarily large by choosing the constant C2 large enough. Bounding the moments contained in the expression Pθ,2,b in this way, it is easily seen that Pθ,2,b ≤ C(ρT ΘT )−1 . An analogous result holds for the term Pθ,1 . This shows that pr(Q(θ, λT ) ≤ Q(θ0 , λT )) ≤ C(ρT ΘT )−1 in Case A. Let us now turn to Case B. The regularization term λT (θ0 − θ) plays a crucial role in this case. In particular, it takes over the role of the term Bθ which is now equal to zero. Since Sθε and Sθg are equal to zero as well, we have pr Q(θ, λT ) ≤ Q(θ0 , λT ) = pr Vθ ≤ 2Wθε + Wθg + λT (θ0 − θ) g ε ε ≤ pr Vθ ≤ 2Wθ + Wθ + λT (θ0 − θ), |Wθ | ≤ νT + CνT−2 ≤ pr Vθ ≤ CνT + λT (θ0 − θ) + CνT−2 . Choosing λT such that νT /λT → 0 and noting that θ0 − θ < 0 in Case B, we obtain that CνT + λT (θ0 − θ) ≤ −C4 λT for some positive constant C4 and T large enough. Hence, pr Q(θ, λT ) ≤ Q(θ0 , λT ) ≤ pr Vθ ≤ −C4 λT + CνT−2



and by analogous arguments as for Case A,
$$pr\{V_\theta \le -C_4\lambda_T\} \le C\Big(\frac{\Theta_T\log T}{\lambda_T}\Big)^2.$$
Thus, choosing λT to satisfy $\lambda_T \ge \tau_T(\log T)\Theta_T^{3/2}$ with some sequence {τT} that slowly diverges to infinity (e.g. τT = log log T), we get that pr{Q(θ, λT) ≤ Q(θ0, λT)} ≤ C(ρTΘT)−1 in Case B as well.

Proof of Theorem 2. We first prove the result on asymptotic normality: let m̃ be the estimator of m in the oracle case where the true period θ0 is known, i.e., (m̃(1), ..., m̃(θ0)) = β̂θ0 and m̃(s + kθ0) = m̃(s) for all s = 1, ..., θ0 and all k ∈ N. Then we can write
$$\sqrt{T}\{\hat m(t) - m(t)\} = \sqrt{T}\{\hat m(t) - \tilde m(t)\} + \sqrt{T}\{\tilde m(t) - m(t)\}.$$
For any δ > 0, it holds that
$$pr\big\{\sqrt{T}\,|\hat m(t) - \tilde m(t)| > \delta\big\} \le pr\big\{\sqrt{T}\,|\hat m(t) - \tilde m(t)| > \delta,\ \hat\theta = \theta_0\big\} + pr(\hat\theta \ne \theta_0).$$

The right-hand side of the above inequality is o(1), as the first term is equal to zero (note that m̂(t) = m̃(t) for θ̂ = θ0) and pr(θ̂ ≠ θ0) = o(1) by Theorem 1. Hence,
$$\sqrt{T}\{\hat m(t) - m(t)\} = \sqrt{T}\{\tilde m(t) - m(t)\} + o_p(1).$$
Next, note that we can write
$$\tilde m(t) = \frac{1}{K_{t_0,T}}\sum_{k=1}^{K_{t_0,T}} Y_{t_0+(k-1)\theta_0,T}$$
with $t_0 = t - \theta_0\lfloor t/\theta_0\rfloor$ and $K_{t_0,T} = 1 + \lfloor(T - t_0)/\theta_0\rfloor$, i.e., the estimate m̃(t) can be expressed as the empirical mean of observations that are separated by a multiple of θ0 periods. This can be seen by inspecting the formula for the least squares estimate β̂θ0. We thus obtain that
$$\sqrt{T}\{\tilde m(t) - m(t)\} = \sqrt{T}\Big\{\frac{1}{K_{t_0,T}}\sum_{k=1}^{K_{t_0,T}} g\Big(\frac{t_0+(k-1)\theta_0}{T}\Big) + \frac{1}{K_{t_0,T}}\sum_{k=1}^{K_{t_0,T}}\varepsilon_{t_0+(k-1)\theta_0,T}\Big\} =: Q_1 + Q_2.$$
The term Q1 approximates the integral $\int_0^1 g(u)\,du$. Using Lemma A1, the convergence rate is seen to be O(T^{-1/2}). As $\int_0^1 g(u)\,du = 0$ by our normalization, we obtain that Q1 is of the order O(T^{-1/2}) and can thus be asymptotically neglected. Noting that {εt,T} is mixing by (C1) and has mean zero, we can now apply a central limit theorem for mixing variables to Q2 to get the normality result of Theorem 2.
We next turn to the uniform convergence result. We have to show that for each δ > 0 there exists a constant C such that

$$pr\Big\{\max_{1\le t\le T}|\hat m(t) - m(t)| > \frac{C}{\sqrt{T}}\Big\} < \delta \qquad (4)$$


for sufficiently large T. This can be seen as follows: for each fixed constant C > 0,
$$pr\Big\{\max_{1\le t\le T}|\hat m(t) - m(t)| > \frac{C}{\sqrt{T}}\Big\} \le pr\Big\{\max_{1\le t\le T}|\hat m(t) - m(t)| > \frac{C}{\sqrt{T}},\ \hat\theta = \theta_0\Big\} + pr(\hat\theta \ne \theta_0).$$
Moreover, pr(θ̂ ≠ θ0) = o(1) by Theorem 1 and
$$pr\Big\{\max_{1\le t\le T}|\hat m(t) - m(t)| > \frac{C}{\sqrt{T}},\ \hat\theta = \theta_0\Big\} = pr\Big\{\max_{1\le t\le \theta_0}|\hat m(t) - m(t)| > \frac{C}{\sqrt{T}},\ \hat\theta = \theta_0\Big\} \le \sum_{t=1}^{\theta_0} pr\Big\{|\hat m(t) - m(t)| > \frac{C}{\sqrt{T}}\Big\}.$$


By the above arguments for the asymptotic normality result, m̂(t) − m(t) = Op(T^{-1/2}) for each fixed time point t. Hence, we can make the probabilities pr{|m̂(t) − m(t)| > CT^{-1/2}} for t = 1, ..., θ0 arbitrarily small by choosing the constant C large enough. From this, (4) immediately follows.

Proof of Theorem 3. We start with the proof of the uniform convergence result. Letting g̃ be the infeasible estimator defined in Section 3·3 of the paper, we can write
$$\sup_{u\in[0,1]}|\hat g(u) - g(u)| \le \sup_{u\in[0,1]}|\hat g(u) - \tilde g(u)| + \sup_{u\in[0,1]}|\tilde g(u) - g(u)|.$$
Since $\max_{1\le t\le T}|m(t) - \hat m(t)| = O_p(T^{-1/2})$, it holds that
$$\sup_{u\in[0,1]}|\hat g(u) - \tilde g(u)| = \sup_{u\in[0,1]}\Big|\frac{\sum_{t=1}^T w_{t,T}(u)\{m(t) - \hat m(t)\}}{\sum_{t=1}^T w_{t,T}(u)}\Big| = O_p\Big(\frac{1}{\sqrt{T}}\Big).$$
It thus remains to show that

$$\sup_{u\in[0,1]}|\tilde g(u) - g(u)| = O_p\Big\{\Big(\frac{\log T}{Th}\Big)^{1/2} + h^2\Big\}.$$

To do so, we decompose the local linear smoother g̃ into the variance component g̃V(u) = g̃(u) − E[g̃(u)] and the bias component g̃B(u) = E[g̃(u)] − g(u). Using a simplified version of the proof for Theorem 4.1 in Vogt (2012), or alternatively applying Theorem 1 of Kristensen (2009), it can be shown that $\sup_{u\in[0,1]}|\tilde g^V(u)| = O_p\{(\log T/Th)^{1/2}\}$. In addition, straightforward calculations yield that $\sup_{u\in[0,1]}|\tilde g^B(u)| \le Ch^2$ for some sufficiently large constant C. This completes the proof of the uniform convergence result.
The result on asymptotic normality can be derived in an analogous way by first replacing ĝ with the smoother g̃ and then using the decomposition g̃ = g̃V + g̃B. Standard arguments show that the bias component g̃B(u) has the expansion g̃B(u) = h²Bu + o(h²) for any fixed u ∈ (0, 1). Moreover, applying a central limit theorem for mixing arrays yields that the term $\sqrt{Th}\,\tilde g^V(u)$ is asymptotically normal with mean zero and variance Vu.



REFERENCES
Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes: Estimation and Prediction. Berlin: Springer, 2nd ed.
Brockwell, P. J. & Davis, R. A. (1991). Time Series: Theory and Methods. Berlin: Springer, 2nd ed.
Fan, J. & Yao, Q. (2005). Nonlinear Time Series. Berlin: Springer.
Harvey, A. C. & Souza, R. C. (1987). Assessing and modeling the cyclical behavior of rainfall in northeast Brazil. Journal of Climate and Applied Meteorology 26, 1317–1322.
Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
Hastie, T. J. & Tibshirani, R. J. (1990). Generalized Additive Models. London: Chapman & Hall.
Kane, R. P. & Trivedi, N. B. (1986). Are droughts predictable? Climate Change 8, 208–223.
Kristensen, D. (2009). Uniform convergence rates of kernel estimators with heterogeneous, dependent data. Econometric Theory 25, 1433–1445.
Morettin, P. A., Mesquita, A. R. & Rocha, J. G. C. (1985). Rainfall at Fortaleza in Brazil revisited. In Time Series Analysis: Theory and Practice 6 (Eds. O. D. Anderson, E. A. Robinson and K. Ord) 67–86. Amsterdam: North Holland.
Restrepo, A. & Chacón, L. P. (1998). On the period of sums of discrete periodic signals. IEEE Signal Processing Letters 5, 164–166.
Sun, Y., Hart, J. D. & Genton, M. G. (2012). Nonparametric inference for periodic sequences. Technometrics 54, 83–96.
Vogt, M. (2012). Nonparametric regression for locally stationary time series. Annals of Statistics 40, 2601–2633.


