Counterfactual estimation on nonstationary data, be careful!!!

(This article was first published on R – insightR, and kindly contributed to R-bloggers)

By Gabriel Vasconcelos

In a recent paper that can be downloaded here, Carvalho, Masini and Medeiros show that estimating counterfactuals in a non-stationary framework (when I say non-stationary it means integrated) is a tricky task. It is intuitive that the models will not work properly in the absence of cointegration (spurious case), but what the authors show is that even with cointegration, the average treatment effect (ATE) converges to a non-standard distribution. As a result, standard tests on the ATE will identify treatment effects in several cases that there is no effect at all.

For those who are not familiar with counterfactual models: normally, these models have a treated unit (we want to know the effects of the treatment) and several untreated units that we call peers. There units may be cities, countries, companies, etc. Assuming that only one unit was treated and that there is no contamination, we can use information from the peers to project the behaviour of the treated unit as if it was not treated, which is the counterfactual. The models should also be able to identify if the treatment had no effects on the treated unit.

Simulation tests

Here I am going to generate cointegrated data with no treatment to show the behaviour of Synthetic Control and the ArCo (Other techniques will have the same features). The data is very simple and it is based on the mentioned article’s example. We are going to generate 100 observations of 10 time-series, 9 random walks and one that is the sum of these random walks plus an error, which is the treated unit. However, I will not include any treatment and I want the models to show that there was no treatment. The hypothesis to be tested is that at t_0=51 we had some treatment in the treated unit.

library(ArCo)
library(glmnet)
library(Synth)

T=100 # Number of Observations
n=10 # Number of Time Series
set.seed(1) # Seed for replication

# Generate random walk peers
peers=matrix(0,T,n-1)
for(i in 2:T){
  peers[i,]=peers[i-1,]+rnorm(n-1)
}

# Generate the treated unit
y1=rowSums(peers)+rnorm(T)

# Put all the TS together
Y=cbind(y1,peers)

# Plot all TS. The black line is the "treated unit"
matplot(Y,type="l")

plot of chunk unnamed-chunk-16

First, I am going to estimate the ArCo. The $delta show the average treatment effect with 95\% confidence interval. As you can see, the test indicated that the treatment was statistically significant (the zero is very far from the confidence interval). The plot also show a dislocated counterfactual (blue line).

# Estimate the ArCo in the non-stationary data
arco=fitArCo(list(Y),cv.glmnet,predict,treated.unity = 1,t0=51)
arco$delta
##                  LB     delta        UB
## Variable1 -6.157713 -5.236705 -4.315697
plot(arco)

plot of chunk unnamed-chunk-17

Now I am going to estimate the Synthetic Control. The Synth package does not have an implementation of a formal test. However, we can have a good idea from the counterfactual plot, which also indicates that the treatment was effective (even though there was no treatment).

# Estimate the Synthetic Control in the nonstationary data

# Put the data in the Synth package notation
sY=as.vector(Y)
timeid=rep(1:T,ncol(Y))
unitid=sort(rep(1:ncol(Y),T))
synthdata=data.frame(time=timeid,unit=unitid,Y=sY)
synthdataprep<-
  dataprep(
    foo = synthdata,
    predictors = c("Y"),
    predictors.op = "mean",
    dependent = "Y",
    unit.variable = "unit",
    time.variable = "time",
    treatment.identifier = 1,
    controls.identifier = c(2:10),
    time.predictors.prior = c(1:50),
    time.optimize.ssr = c(1:50),
    time.plot = 1:100
)
SC=synth(synthdataprep)
path.plot(SC,synthdataprep)
abline(v=50,col="blue",lty=2)

plot of chunk unnamed-chunk-18

As you can see, both the ArCo and the Synthetic Control wrongly rejected the null hypothesis of no treatment. The best way to solve the problem is to take the first difference of all non-stationary time-series. However, there is an important thing to point out here. Imagine the random walk:

\displaystyle y_t = y_{t-1} + \varepsilon_t

Suppose we have a treatment in t=t_0 that simply makes y_{t_0} = c + y_{t_0-1}+\varepsilon_{t_0} only for t=t_0. For any t\neq t_0 we will have \Delta y_t=\varepsilon_t and for t=t_0 we will have \Delta y_{t_0}=c+\varepsilon_{t_0}. Therefore, if we take the first difference of y_t the treatment will have an impact only at t=t_0, which makes it impossible for any counterfactual model to identify an intervention. In this example, we would need an intervention that changed the drift of the Random Walk in a way that for all t\geq t_0 we have:

\displaystyle y_t=d+y_{t-1}+\varepsilon_t

This will be better illustrated in an example. The plot below shows the difference between the two treatments. The red line is the case we can’t identify if we take the first difference. The Blue line changes the drift of the Random Walk, and therefore we can identify. It is impossible, with the tools I am using here, to identify if we are in the black line or in the red line.

y=rep(0,T)
yt1=yt2=y
d=1
c=10
for(i in 2:T){
 e=rnorm(1)
 y[i]=y[i-1]+e
 if(i==51){
   yt1[i]=c+yt1[i-1]+e
 }else{
   yt1[i]=yt1[i-1]+e
 }
 if(i>=51){
   yt2[i]=d+yt2[i-1]+e
 }else{
   yt2[i]=yt2[i-1]+e
 }
}
plot(yt2,type="l",col=4,ylab="")
lines(yt1,col=2)
lines(y,col=1)
legend("topleft",legend=c("untreated","treated by a constant in t0","treated in the drift"),col=c(1,2,4),bty="n",seg.len=0.5,cex=0.8,lty=1)

plot of chunk unnamed-chunk-19

Now let’s get back to the first example. The results clearly indicated an intervention where there was none. Now I will show what happens if we take the first difference.

# Take the first difference
dY=diff(Y,1)
# Estimate the ArCo
darco=fitArCo(list(dY),cv.glmnet,predict,treated.unity = 1,t0=50)
darco$delta
##                   LB      delta        UB
## Variable1 -0.6892162 0.02737802 0.7439722
plot(darco)

plot of chunk unnamed-chunk-20

# Adjust the data to the Synth and estimate the model
dsY=as.vector(dY)
timeid=rep(2:T,ncol(Y))
unitid=sort(rep(1:ncol(Y),T-1))
dsynthdata=data.frame(time=timeid,unit=unitid,Y=dsY)

dsynthdataprep<-
  dataprep(
    foo = dsynthdata,
    predictors = c("Y"),
    predictors.op = "mean",
    dependent = "Y",
    unit.variable = "unit",
    time.variable = "time",
    treatment.identifier = 1,
    controls.identifier = c(2:10),
    time.predictors.prior = c(2:50),
    time.optimize.ssr = c(2:50),
    time.plot = 2:100
)

dSC=synth(dsynthdataprep)
##
## X1, X0, Z1, Z0 all come directly from dataprep object.
##
##
## ****************
##  optimization over w weights: computing synthtic control unit
##
##
##
## ****************
## ****************
## ****************
##
## MSPE (LOSS V): 8.176593
##
## solution.v:
##  1
##
## solution.w:
##  2.9e-09 1.4e-08 3.7e-09 0.9999999 7.4e-09 2.9e-09 1.16e-08 6.9e-09 4e-09
path.plot(dSC,dsynthdataprep)
abline(v=50,col="blue",lty=2)

plot of chunk unnamed-chunk-21

As you can see, the $delta in the ArCo showed that the treatment effect was statistically zero, which is true. The Synthetic Control plot also showed that there was no difference cause by the intervention even though the model adjustment was not very good.

Finally, note that we have two scenarios. If there is no intervention and we do not take the first difference we will probably wrongly reject the null hypothesis of no intervention. However, if we are in the “red line” case of the plot and we take the first difference, we will not reject the null when it may be false depending on the type of variable we are dealing with.

References

Carvalho, Carlos, Ricardo Masini, and Marcelo C. Medeiros. “The perils of counterfactual analysis with integrated processes.” (2016).

To leave a comment for the author, please follow the link and comment on their blog: R – insightR.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...
Data http://ift.tt/2hiRD4m August 01, 2017 at 12:00AM

Comments

Popular posts from this blog

How Blockchain Will Redefine Social Media Marketing (And How to Prepare)

.rprofile: David Smith

CEO Daily: Wednesday, 12th July