OLS (ordinary least squares) is a technique for estimating **linear** relations between a dependent variable on the one hand and a set of explanatory variables on the other. For example, you might be interested in estimating how a worker’s wage (W) depends on the worker’s job experience (X), age (A) and education level (E). Then you can run an OLS regression as follows:

W_{i} = b_{0} + b_{1}X_{i} + b_{2}A_{i} + b_{3}E_{i} + u_{i}.
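As a sketch of what "least squares" means for this model: the OLS estimates are the values of the parameters that minimize the sum of squared residuals. The sample size n is notation assumed here; it does not appear in the original text.

```latex
\min_{b_0,\, b_1,\, b_2,\, b_3} \; \sum_{i=1}^{n} \left( W_i - b_0 - b_1 X_i - b_2 A_i - b_3 E_i \right)^2
```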

Note that a regression model is linear in the OLS sense when it is linear in the parameters; it does not have to be linear in the explanatory variables.
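To illustrate the distinction, here are two hypothetical variations on the wage model (neither is taken from the original example). The first includes experience squared and is still linear in the parameters, so OLS applies. The second has a parameter in the exponent and is not linear in the parameters.

```latex
\begin{align*}
W_i &= b_0 + b_1 X_i + b_2 X_i^2 + u_i      && \text{(linear in the parameters)} \\
W_i &= b_0 + b_1 X_i^{\,b_2} + u_i          && \text{(not linear in the parameters)}
\end{align*}
```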

**STATA Command:** See here

Suppose we are interested in understanding the effect of a person’s education and on-the-job experience on that person’s wages. To do so, we regress *wage* on two explanatory variables: *educ* (education) and *exper* (experience). (The data can be found here.)

This can be easily done in STATA using the following command:

**reg** wage educ exper

- Alternatively, one can type **regress** instead of **reg**.
- STATA then estimates 3 parameters: the *intercept* term, the coefficient of *educ* and the coefficient of *exper*.
- The coefficient of *educ* means that one additional year of schooling increases the person’s wages by $2.95 on average, holding experience constant.
- The coefficient of *exper* implies that each extra year spent on the job increases the person’s wages by $0.38 on average, holding education constant.
- The constant shows the predicted wage of a person with no schooling and no experience on the job. Its value is -$24.38, which does not make sense: a wage cannot be negative, and since the minimum years of schooling in our data is 8, this prediction lies far outside the observed data. Thus, one needs to be careful when interpreting the constant, since depending on the regression it might or might not have a useful interpretation.
- The fourth and fifth columns show the t-statistic and p-value for the null hypothesis that the coefficient is equal to zero. For all the coefficients we can reject that hypothesis, since each p-value is less than 1%.
- The 95% confidence interval is constructed so that, across repeated samples, 95% of such intervals would contain the population parameter.
- *Prob > F* is the p-value of the joint test of the null hypothesis that all the slope coefficients are equal to zero, that is, that the independent variables have no power to explain the dependent variable. Since the reported p-value is 0.000, we can reject this hypothesis and conclude that the coefficients are jointly significantly different from zero and that the independent variables help predict the dependent variable.
- R^{2} is the share of the variance of the dependent variable that is explained by the independent variables.
- Adjusted R^{2} compensates for the number of variables in the model. Adding another variable to the model will increase the Adjusted R^{2} only when the new variable improves the model fit more than expected by chance alone. Thus, Adjusted R^{2} is never greater than R^{2}, and unlike R^{2} it can be negative.
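The quantities in the regression table can also be retrieved and recomputed after estimation. The sketch below is only an illustration: because the wage data are not reproduced here, it assumes the auto dataset that ships with STATA as a stand-in, and the choice of *weight*, *length* and *mpg* as variables is arbitrary. It relies on the results that **regress** stores in _b[], _se[] and e().

```stata
* Assumption: the auto data shipped with Stata stands in for the wage data
sysuse auto, clear

* Regress weight on two explanatory variables
regress weight length mpg

* Coefficients and standard errors are stored in _b[] and _se[]
display "coefficient of length = " _b[length]
display "t-statistic of length = " (_b[length] / _se[length])

* 95% confidence interval: coefficient +/- critical t value * standard error
scalar tcrit = invttail(e(df_r), 0.025)
display "95% CI: [" (_b[length] - tcrit*_se[length]) ", " (_b[length] + tcrit*_se[length]) "]"

* Joint test that both slope coefficients are zero
* (reproduces the F statistic and Prob > F shown in the regression header)
test length mpg

* Fit statistics are stored in e()
display "R-squared          = " e(r2)
display "Adjusted R-squared = " e(r2_a)

* Adjusted R-squared recomputed by hand: 1 - (1 - R2)(n - 1)/(n - k - 1),
* where e(df_r) = n - k - 1 is the residual degrees of freedom
display "Adjusted R-squared = " (1 - (1 - e(r2)) * (e(N) - 1) / e(df_r))
```

The last two display lines should agree, which shows how the adjustment penalizes additional regressors through the residual degrees of freedom.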

While estimating the parameters, it is customary to adjust the standard errors of the parameter estimates for heteroskedasticity. This is done by writing the following command:

**reg** wage educ exper, **r**

- Alternatively, one can type **robust** instead of **r** after the comma.
- Note that the option ‘robust’ in STATA changes only the standard errors of the parameter estimates, not the estimates themselves.
- In this example, correcting for heteroskedasticity increased the standard error of education but reduced the standard error of experience.
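One way to see that the robust option leaves the coefficients unchanged is to store both sets of results and display them side by side. This is a minimal sketch under assumptions: it uses the auto dataset that ships with STATA rather than the wage data, the model and the names *classical* and *robust_se* are chosen only for illustration, and vce(robust) is the longer spelling of the same option.

```stata
* Assumption: the auto data shipped with Stata stands in for the wage data
sysuse auto, clear

* Conventional (homoskedasticity-based) standard errors
regress price mpg weight
estimates store classical

* Heteroskedasticity-robust standard errors
regress price mpg weight, vce(robust)
estimates store robust_se

* Side-by-side table: coefficients are identical, standard errors differ
estimates table classical robust_se, b(%9.3f) se(%9.3f)
```

Whether the robust standard errors come out larger or smaller than the conventional ones depends on the data, so no particular direction should be expected.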

Sometimes the theoretical model dictates that the intercept term is zero. Suppose we want to understand the relationship between the weight and length of a car. When the length of a car is zero then its weight should also be zero.

To run a regression of the weight of a car on its length with this restriction imposed, one needs to write the following command in STATA (the data can be loaded by typing **webuse** auto, clear):

**reg** weight length, **noconstant**

- In this case, STATA estimates only 1 parameter: the coefficient of length.
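The effect of the restriction can be checked by running the regression with and without the constant and listing the estimated coefficient vector. The sketch below assumes the copy of the automobile data that ships with STATA, loaded with **sysuse** so that no internet connection is needed.

```stata
* Assumption: the copy of the auto data installed with Stata
sysuse auto, clear

* With a constant: two parameters (slope on length and the intercept)
regress weight length
matrix list e(b)

* Without a constant: only the slope on length is estimated
regress weight length, noconstant
matrix list e(b)
```

Note that when the constant is suppressed, STATA computes R^{2} from the uncentered total sum of squares, so the R^{2} values of the two regressions are not directly comparable.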