Fixed Effects

Stats
Author

Yiğit Aşık

Published

October 12, 2025

Entity & Time Fixed Effects

I mentioned fixed effects in the difference-in-differences post, but I wanted to elaborate a bit further on the topic and show where it's useful. I'll dive right into an example and explain along the way.

import pandas as pd
import numpy as np

from matplotlib import pyplot as plt
import seaborn as sns

import statsmodels.api as sm
import statsmodels.formula.api as smf
from linearmodels.panel import PanelOLS

import warnings

warnings.filterwarnings('ignore')

pd.set_option('display.float_format', lambda x: '%.3f' % x)
df = pd.read_csv('Grunfeld.csv', index_col=0)
df.head()
   invest    value  capital            firm  year
1 317.600 3078.500    2.800  General Motors  1935
2 391.800 4661.700   52.600  General Motors  1936
3 410.600 5387.100  156.900  General Motors  1937
4 257.700 2792.200  209.200  General Motors  1938
5 330.800 4313.200  203.400  General Motors  1939

I have data from 11 firms: their capital, market value, and investment for each year from 1935 to 1954. This is panel data, since I have multiple observations for each firm across different time periods.
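Before modeling, it can help to confirm that the table really is a panel with one row per firm-year and that every firm is observed in every year. The snippet below is a minimal sketch on a made-up toy table (the `toy` frame and its values are assumptions for illustration, not the Grunfeld data); the same checks should carry over to `df` as long as it has `firm` and `year` columns.

```python
import pandas as pd

# Toy stand-in for a firm-year table (values are made up for illustration).
toy = pd.DataFrame({
    'firm': ['A', 'A', 'A', 'B', 'B', 'B'],
    'year': [1935, 1936, 1937, 1935, 1936, 1937],
    'invest': [10.0, 12.0, 11.0, 5.0, 6.0, 7.0],
})

# Each (firm, year) pair should appear exactly once in a well-formed panel.
has_duplicates = toy.duplicated(subset=['firm', 'year']).any()

# Number of distinct years observed for each firm.
years_per_firm = toy.groupby('firm')['year'].nunique()

# Balanced means every firm is observed in every year of the sample.
is_balanced = (years_per_firm == toy['year'].nunique()).all()

print(years_per_firm)
print('duplicates:', has_duplicates, '| balanced:', is_balanced)
```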

Let's say that I am interested in the relationship between market value and investment. For simplicity, if we only had data for a single year, we could estimate the following cross-sectional regression, with one observation per firm i:

\(\displaystyle invest_i = \beta_0 + \beta_1 value_i + \beta_2 capital_i + \epsilon_i\)

However, there are things that we miss with this approach:

  1. There might be firm-level variables that we would like to have in the model but cannot observe. These are assumed to be constant over time for a given firm, so I denote them by \(\alpha_i\) below.

The idea is pretty neat actually. Think of having two years of data. Let’s say 1935 and 1936:

\(\displaystyle invest_{i \, 1936} = \beta_0 + \beta_{1}value_{i \, 1936} + \beta_{2}capital_{i \, 1936} + \beta_{3}\alpha_i + \epsilon_{i \, 1936}\)

\(\displaystyle invest_{i \, 1935} = \beta_0 + \beta_{1}value_{i \, 1935} + \beta_{2}capital_{i \, 1935} + \beta_{3}\alpha_i + \epsilon_{i \, 1935}\)

Now, if you take the difference, the \(\beta_0\) and \(\beta_{3}\alpha_i\) terms cancel out because they are the same in both years. What you're left with is:

\(\displaystyle invest_{i \, 1936} - invest_{i \, 1935} = \beta_{1}(value_{i\,1936} - value_{i\,1935}) + \beta_{2}(capital_{i\,1936} - capital_{i\,1935}) + (\epsilon_{i \, 1936} - \epsilon_{i \, 1935})\)

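I believe this is a very intuitive example: anything that is constant within a firm drops out, whether we observe it or not. With more than two years, we get the same result by adding firm as a dummy in the regression, since each firm dummy absorbs everything that is constant within that firm. So, accounting for unobserved firm-level characteristics is just adding firm as a dummy!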
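To make the link between differencing and firm dummies concrete, here is a minimal sketch on simulated data (not the Grunfeld data). All names and numbers in it are assumptions for illustration: a true slope of 2.0, a firm effect `alpha` that is correlated with the regressor, and two years per firm. With exactly two periods and no time effects, the first-difference slope and the firm-dummy slope should coincide, while pooled OLS that ignores `alpha` should be biased.

```python
import numpy as np

rng = np.random.default_rng(0)
n_firms = 50

# Unobserved, time-invariant firm effect (the alpha_i of the equations above).
alpha = rng.normal(0, 5, n_firms)

# Two years per firm; the regressor is correlated with alpha on purpose.
x = rng.normal(0, 1, (n_firms, 2)) + alpha[:, None]
y = 2.0 * x + alpha[:, None] + rng.normal(0, 0.1, (n_firms, 2))

# 1) Pooled OLS that ignores alpha: intercept + x.
X_pooled = np.column_stack([np.ones(n_firms * 2), x.ravel()])
beta_pooled = np.linalg.lstsq(X_pooled, y.ravel(), rcond=None)[0][1]

# 2) First differences: regress (y_2 - y_1) on (x_2 - x_1), no intercept.
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
beta_fd = (dx @ dy) / (dx @ dx)

# 3) Firm dummies: x plus one dummy column per firm (no separate intercept).
firm_ids = np.repeat(np.arange(n_firms), 2)   # matches the row order of x.ravel()
dummies = np.eye(n_firms)[firm_ids]
X_lsdv = np.column_stack([x.ravel(), dummies])
beta_lsdv = np.linalg.lstsq(X_lsdv, y.ravel(), rcond=None)[0][0]

print('pooled OLS       :', round(beta_pooled, 3))
print('first differences:', round(beta_fd, 3))
print('firm dummies     :', round(beta_lsdv, 3))
```

The exact equality between the last two estimates is specific to the two-period case. With more periods, both approaches still remove the firm effect but generally give slightly different numbers.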

  2. The other thing that I haven't mentioned above is effects that are constant within a time period but may differ between years. These are shared across firms. Think of things like inflation, market trends, etc.

Well, we've got the idea by now: let's add year as a dummy as well.

lm = smf.ols(
    'invest ~ value + capital + C(firm) + C(year)',
    data=df
)
res = lm.fit()

res.summary()
                         OLS Regression Results
Dep. Variable:             invest    R-squared:                 0.953
Model:                        OLS    Adj. R-squared:            0.945
Method:             Least Squares    F-statistic:               122.1
Date:            Sun, 12 Oct 2025    Prob (F-statistic):    5.20e-108
Time:                    01:14:10    Log-Likelihood:          -1153.0
No. Observations:             220    AIC:                       2370.
Df Residuals:                 188    BIC:                       2479.
Df Model:                      31
Covariance Type:        nonrobust

                                    coef   std err         t     P>|t|     [0.025     0.975]
Intercept                        18.0876    18.656     0.970     0.334    -18.715     54.890
C(firm)[T.Atlantic Refining]   -112.5008    17.752    -6.337     0.000   -147.520    -77.482
C(firm)[T.Chrysler]             -13.5993    17.540    -0.775     0.439    -48.199     21.001
C(firm)[T.Diamond Match]         16.4928    15.692     1.051     0.295    -14.462     47.448
C(firm)[T.General Electric]    -241.0850    28.000    -8.610     0.000   -296.319   -185.851
C(firm)[T.General Motors]      -101.7696    55.177    -1.844     0.067   -210.615      7.075
C(firm)[T.Goodyear]             -77.9628    16.435    -4.744     0.000   -110.383    -45.543
C(firm)[T.IBM]                   -6.4573    16.271    -0.397     0.692    -38.554     25.640
C(firm)[T.US Steel]             100.5492    28.438     3.536     0.001     44.450    156.648
C(firm)[T.Union Oil]            -56.7936    16.403    -3.462     0.001    -89.151    -24.436
C(firm)[T.Westinghouse]         -41.7165    17.483    -2.386     0.018    -76.204     -7.229
C(year)[T.1936]                 -16.9592    21.518    -0.788     0.432    -59.407     25.488
C(year)[T.1937]                 -36.3756    22.364    -1.627     0.106    -80.492      7.741
C(year)[T.1938]                 -35.6237    21.162    -1.683     0.094    -77.370      6.122
C(year)[T.1939]                 -63.0994    21.505    -2.934     0.004   -105.522    -20.677
C(year)[T.1940]                 -39.8248    21.626    -1.842     0.067    -82.486      2.836
C(year)[T.1941]                 -16.4878    21.529    -0.766     0.445    -58.957     25.982
C(year)[T.1942]                 -17.9993    21.275    -0.846     0.399    -59.967     23.968
C(year)[T.1943]                 -37.7724    21.415    -1.764     0.079    -80.016      4.471
C(year)[T.1944]                 -38.3201    21.459    -1.786     0.076    -80.652      4.012
C(year)[T.1945]                 -49.5395    21.687    -2.284     0.023    -92.322     -6.757
C(year)[T.1946]                 -27.7544    21.866    -1.269     0.206    -70.888     15.379
C(year)[T.1947]                 -34.8775    21.589    -1.616     0.108    -77.464      7.709
C(year)[T.1948]                 -38.3307    21.734    -1.764     0.079    -81.204      4.542
C(year)[T.1949]                 -65.2008    21.901    -2.977     0.003   -108.404    -21.998
C(year)[T.1950]                 -67.3877    22.028    -3.059     0.003   -110.841    -23.935
C(year)[T.1951]                 -54.8346    22.437    -2.444     0.015    -99.095    -10.574
C(year)[T.1952]                 -56.4890    22.819    -2.475     0.014   -101.504    -11.474
C(year)[T.1953]                 -58.5126    23.819    -2.457     0.015   -105.500    -11.525
C(year)[T.1954]                 -81.7939    24.204    -3.379     0.001   -129.540    -34.047
value                             0.1167     0.013     9.022     0.000      0.091      0.142
capital                           0.3514     0.021    16.696     0.000      0.310      0.393

Omnibus:                   32.466    Durbin-Watson:             0.988
Prob(Omnibus):              0.000    Jarque-Bera (JB):        180.276
Skew:                       0.311    Prob(JB):               7.14e-40
Kurtosis:                   7.391    Cond. No.               3.92e+04


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.92e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

You can fit the same model with PanelOLS, like below, and get a cleaner table: the firm and year effects are absorbed instead of being listed as coefficients.

# PanelOLS expects an (entity, time) MultiIndex, hence set_index(['firm', 'year'])
fe_model = PanelOLS.from_formula(
    'invest ~ value + capital + EntityEffects + TimeEffects',
    data=df.set_index(['firm', 'year'])
)
fe_res = fe_model.fit()

fe_res.summary
                     PanelOLS Estimation Summary
Dep. Variable:             invest    R-squared:                0.7253
Estimator:               PanelOLS    R-squared (Between):      0.7637
No. Observations:             220    R-squared (Within):       0.7566
Date:            Sun, Oct 12 2025    R-squared (Overall):      0.7625
Time:                    01:14:15    Log-likelihood           -1153.0
Cov. Estimator:        Unadjusted
                                     F-statistic:              248.15
Entities:                      11    P-value                   0.0000
Avg Obs:                   20.000    Distribution:           F(2,188)
Min Obs:                   20.000
Max Obs:                   20.000    F-statistic (robust):     248.15
                                     P-value                   0.0000
Time periods:                  20    Distribution:           F(2,188)
Avg Obs:                   11.000
Min Obs:                   11.000
Max Obs:                   11.000

                         Parameter Estimates
           Parameter  Std. Err.     T-stat    P-value   Lower CI   Upper CI
value         0.1167     0.0129     9.0219     0.0000     0.0912     0.1422
capital       0.3514     0.0210     16.696     0.0000     0.3099     0.3930


F-test for Poolability: 18.476
P-value: 0.0000
Distribution: F(29,188)

Included effects: Entity, Time
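The two tables agree on the `value` and `capital` coefficients because, in a balanced panel, absorbing entity and time effects is equivalent to including a dummy for every firm and every year. One way to see this is the two-way "within" transformation: subtract the firm mean and the year mean from each variable and add back the overall mean. The snippet below is a minimal sketch on simulated data with a single regressor (the sizes, the true slope of 1.5, and all variable names are assumptions for illustration, and it does not claim to reproduce how PanelOLS is implemented internally).

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, n_years = 8, 6

# Simulated balanced panel laid out as a (firm, year) grid.
firm_fx = rng.normal(0, 3, n_firms)[:, None]
year_fx = rng.normal(0, 2, n_years)[None, :]
x = rng.normal(0, 1, (n_firms, n_years)) + firm_fx + year_fx
y = 1.5 * x + firm_fx + year_fx + rng.normal(0, 0.5, (n_firms, n_years))

def two_way_demean(a):
    """Remove firm means and year means, then add back the grand mean."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

# Slope from the demeaned data (no intercept needed after demeaning).
xd, yd = two_way_demean(x), two_way_demean(y)
beta_within = (xd * yd).sum() / (xd ** 2).sum()

# Slope from the dummy-variable regression: x, intercept, firm and year dummies
# (first firm and first year dropped as reference categories).
firm_ids = np.repeat(np.arange(n_firms), n_years)   # row order of x.ravel()
year_ids = np.tile(np.arange(n_years), n_firms)
firm_dummies = np.eye(n_firms)[firm_ids][:, 1:]
year_dummies = np.eye(n_years)[year_ids][:, 1:]
X = np.column_stack([x.ravel(), np.ones(n_firms * n_years), firm_dummies, year_dummies])
beta_lsdv = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][0]

print('two-way demeaning:', round(beta_within, 4))
print('dummy regression :', round(beta_lsdv, 4))
```

The simple demeaning formula relies on the panel being balanced; with missing firm-years the two-way transformation needs an iterative or projection-based approach instead.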

One more thing though: check the covariance type on both tables (nonrobust, unadjusted). It means the errors are assumed to be homoskedastic and uncorrelated across observations, which might be violated here. Think about it: observations are grouped in the sense that they belong to the same firm, so they share some unobserved component. Hence, errors might be correlated within each firm (across years).

For the same reason, errors might be correlated within each year (e.g., firms are subject to the same inflation).

So, we should allow residuals to be correlated within groups.

It's possible to use a clustered covariance type with statsmodels as well, where the typical usage is to cluster along a single dimension: either the entity dimension (e.g., firm) or the time dimension (e.g., year). PanelOLS, on the other hand, allows for two-way clustering with two simple flags.

fe_model = PanelOLS.from_formula(
    'invest ~ value + capital + EntityEffects + TimeEffects',
    data=df.set_index(['firm', 'year'])
)
# Cluster the standard errors by both firm (entity) and year (time)
fe_res = fe_model.fit(cov_type='clustered', cluster_entity=True, cluster_time=True)

fe_res.summary
                     PanelOLS Estimation Summary
Dep. Variable:             invest    R-squared:                0.7253
Estimator:               PanelOLS    R-squared (Between):      0.7637
No. Observations:             220    R-squared (Within):       0.7566
Date:            Sun, Oct 12 2025    R-squared (Overall):      0.7625
Time:                    01:15:17    Log-likelihood           -1153.0
Cov. Estimator:         Clustered
                                     F-statistic:              248.15
Entities:                      11    P-value                   0.0000
Avg Obs:                   20.000    Distribution:           F(2,188)
Min Obs:                   20.000
Max Obs:                   20.000    F-statistic (robust):     84.060
                                     P-value                   0.0000
Time periods:                  20    Distribution:           F(2,188)
Avg Obs:                   11.000
Min Obs:                   11.000
Max Obs:                   11.000

                         Parameter Estimates
           Parameter  Std. Err.     T-stat    P-value   Lower CI   Upper CI
value         0.1167     0.0117     10.015     0.0000     0.0937     0.1397
capital       0.3514     0.0447     7.8622     0.0000     0.2633     0.4396


F-test for Poolability: 18.476
P-value: 0.0000
Distribution: F(29,188)

Included effects: Entity, Time
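Notice that the coefficients are unchanged and only the standard errors move. To get a feel for what two-way clustering computes, here is a minimal sketch of the textbook sandwich formula on simulated data: the covariance clustered by firm, plus the covariance clustered by year, minus the covariance clustered by the firm-year intersection. The helper names `cluster_cov` and `two_way_cluster_cov` are hypothetical, the example is a plain regression without fixed effects, and it applies no small-sample corrections, so it is not expected to reproduce the exact numbers that linearmodels reports.

```python
import numpy as np

def cluster_cov(X, resid, groups):
    """Cluster-robust sandwich covariance, without small-sample correction."""
    bread = np.linalg.inv(X.T @ X)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]   # sum of x_i * u_i within the cluster
        meat += np.outer(score, score)
    return bread @ meat @ bread

def two_way_cluster_cov(X, resid, firm_ids, year_ids):
    """V_firm + V_year - V_(firm and year intersection)."""
    pair_ids = firm_ids * (year_ids.max() + 1) + year_ids   # unique id per firm-year
    return (cluster_cov(X, resid, firm_ids)
            + cluster_cov(X, resid, year_ids)
            - cluster_cov(X, resid, pair_ids))

# Simulated balanced panel whose errors share a firm and a year component.
rng = np.random.default_rng(2)
n_firms, n_years = 11, 20
firm_ids = np.repeat(np.arange(n_firms), n_years)
year_ids = np.tile(np.arange(n_years), n_firms)
x = rng.normal(size=n_firms * n_years)
u = (rng.normal(size=n_firms)[firm_ids]
     + rng.normal(size=n_years)[year_ids]
     + rng.normal(size=n_firms * n_years))
y = 1.0 + 0.5 * x + u

X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

V = two_way_cluster_cov(X, resid, firm_ids, year_ids)
print('coefficients:', beta)
print('two-way clustered covariance:')
print(V)
```

With one observation per firm-year, the intersection term is just the usual heteroskedasticity-robust (White) covariance. In finite samples the combined matrix is not guaranteed to be positive semi-definite, which is one reason libraries add their own adjustments on top of this formula.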

I feel like this one is a very intuitive example but for more, you can check this.