*Bounty: 50*

I am working with industry-level data and trying to solve an omitted-variable bias problem by using an instrument. The problem with my instrument is that it varies relatively little: for most of my industries its value is the same. In other words, it only varies across groups that each encompass several industries.

One of my teachers told me that I can fix this problem by putting additional explanatory variables in the first stage and interacting them with the instrument, and that these variables do not have to be exogenous (with regard to my outcome variable).
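To make the suggestion concrete, here is a minimal sketch with made-up numbers. The 12 industries, the 3 groups and all values are hypothetical. It only shows where the additional variation comes from: a group-level $z_i$ takes a handful of distinct values, while $z_i w_i$ also varies across industries within a group. Whether that extra variation is exogenous is a separate question.

```python
import numpy as np

# Hypothetical setup: 12 industries nested in 3 groups of 4 industries each.
group = np.repeat([0, 1, 2], 4)          # group id of each industry
z_by_group = np.array([0.2, 0.5, 0.9])   # made-up group-level instrument values
z = z_by_group[group]                    # z is constant within each group

# Made-up industry-level covariate w (varies across all industries).
rng = np.random.default_rng(0)
w = rng.normal(size=group.size)

zw = z * w  # interaction term added to the first stage

print("distinct values of z:  ", np.unique(z).size)
print("distinct values of z*w:", np.unique(zw).size)

# Within group 0, z has no variation, but z*w does.
in_g0 = group == 0
print("range of z   in group 0:", z[in_g0].max() - z[in_g0].min())
print("range of z*w in group 0:", zw[in_g0].max() - zw[in_g0].min())
```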

**The following is what I have thought about this. If someone has a better idea of how to solve the problem described above, I would very much appreciate that too!**

So basically, first I was thinking of this:

**Equation of interest:** $$ y_i = \alpha_0 + \alpha_1 x_i + \alpha_2 \mathbf{X} + \epsilon_i $$

Here $y_i$ is the outcome, $\alpha_1$ is the coefficient of interest, $x_i$ is the endogenous variable, and $\mathbf{X}$ are covariates.

**First stage:** $$ x_i = \beta_0 + \beta_1 z_i + \beta_2 w_i + \beta_3 z_i \cdot w_i + \beta_4 \mathbf{X} + \eta_i $$

The fitted values from this regression give $\hat x_i$. Here $z_i$ is an instrument that affects $y_i$ only through $x_i$, and $w_i$ is a covariate that is correlated with $\epsilon_i$.

**Second stage:** $$ y_i = \gamma_1 + \alpha_1 \hat x_i + \gamma_2 w_i + \gamma_3 \mathbf{X} + e_i $$
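As a mechanical check of the two stages written above, here is a minimal NumPy sketch. The function names `ols` and `two_stage_alpha1` and the simulated data are hypothetical. With noise-free made-up data ($\epsilon_i = 0$) it should return the true $\alpha_1$ up to rounding, which only confirms that the code matches the algebra. It does not say anything about consistency when $w_i$ is correlated with $\epsilon_i$. Running the second stage by hand also gives point estimates only: the OLS standard errors from that second regression are not valid 2SLS standard errors.

```python
import numpy as np


def ols(y, regressors):
    """OLS of y on an intercept plus the given regressors.

    `regressors` is a list of 1-D (n,) or 2-D (n, k) arrays; returns the
    coefficient vector (intercept first) and the design matrix.
    """
    design = np.column_stack([np.ones(len(y))] + list(regressors))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design


def two_stage_alpha1(y, x, z, w, X):
    """Point estimate of alpha_1 from the two stages written above."""
    # First stage: x_i on z_i, w_i, z_i * w_i and the covariates X.
    first_coef, first_design = ols(x, [z, w, z * w, X])
    x_hat = first_design @ first_coef
    # Second stage: y_i on x_hat_i, w_i and X; alpha_1 is the x_hat coefficient.
    second_coef, _ = ols(y, [x_hat, w, X])
    return second_coef[1]


# Made-up, noise-free data (epsilon_i = 0) with true alpha_1 = 1.5.
rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)
w = rng.normal(size=n)
X = rng.normal(size=(n, 2))
eta = rng.normal(size=n)
x = 1 + z + 0.5 * w + 0.8 * z * w + X @ np.array([0.3, -0.2]) + eta
y = 2 + 1.5 * x + 0.7 * w + X @ np.array([1.0, -1.0])

alpha1_hat = two_stage_alpha1(y, x, z, w, X)
print(alpha1_hat)
```

The exact recovery works because $\hat x_i$, $w_i$ and $\mathbf{X}$ all lie in the span of the first-stage regressors, so the first-stage residual is orthogonal to every second-stage regressor.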

Then I thought I must have misunderstood something, because this seems rather odd. But I have now read a paper, Nizalova and Murtazashvili (2014) (https://www.degruyter.com/document/doi/10.1515/jem-2013-0012/html), that explains why the estimated interaction effect between one exogenous and one endogenous variable is still consistent.

Another paper, Bun and Harrison (2019) (https://www.tandfonline.com/doi/full/10.1080/07474938.2018.1427486), argues similarly. Specifically, they write:

> … we have that, even if we have an endogenous regressor x, the OLS estimator of the coefficient $\beta_{xw}$ is consistent and standard heteroskedasticity-robust OLS inference applies.

and

> … we show that endogeneity bias can be reduced to zero for the OLS estimator as far as the interaction term is concerned.
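A small Monte Carlo can illustrate the quoted claim in one simple setting: an endogenous $x$ interacted with a variable $w$ that is drawn independently of both $x$ and the error. The DGP and all numbers are made up, and the independence of $w$ is an assumption made for this sketch; the exact conditions under which the result holds should be checked in the paper itself. Note also that the roles differ from the setup above: here the variable interacted with $x$ is exogenous, whereas above it is $w_i$ that is correlated with $\epsilon_i$.

```python
import numpy as np

# Made-up DGP: x is endogenous (it loads on the error u), while w is drawn
# independently of (x, u) -- an assumption for this sketch.
# True coefficients: intercept 1.0, x 1.0, w 0.5, x*w 2.0.
rng = np.random.default_rng(2)
n = 200_000
u = rng.normal(size=n)
x = 1 + 0.8 * u + rng.normal(size=n)   # Cov(x, u) = 0.8, so x is endogenous
w = 0.5 + rng.normal(size=n)           # independent of x and u
y = 1 + 1.0 * x + 0.5 * w + 2.0 * x * w + u

# Plain OLS of y on an intercept, x, w and x*w.
design = np.column_stack([np.ones(n), x, w, x * w])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
b_const, b_x, b_w, b_xw = coef

print(f"coefficient on x:   {b_x:.3f} (true 1.0)")
print(f"coefficient on x*w: {b_xw:.3f} (true 2.0)")
```

Under this DGP the OLS coefficient on x has a probability limit of $1 + \operatorname{Cov}(x,u)/\operatorname{Var}(x) = 1 + 0.8/1.64 \approx 1.49$ rather than 1.0, while the coefficient on x*w stays close to 2.0.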

Does this mean that what I outlined above works? That is, is the interaction term not endogenous (with respect to $y_i$) even though $w_i$ is?