Computing person and firm effects using linked longitudinal employer-employee data. Center for Economic Studies, US Census Bureau (2002)
BibTeX
@MISC{Abowd02computingperson,
author = {John M Abowd and Robert H Creecy and Francis Kramarz},
title = {Computing person and firm effects using linked longitudinal employer-employee data},
howpublished = {Center for Economic Studies, US Census Bureau},
year = {2002}
}
Abstract
In this paper we provide the exact formulas for the direct least squares estimation of statistical models that include both person and firm effects. We also provide an algorithm for determining the estimable functions of the person and firm effects (the identifiable effects). The computational techniques are also directly applicable to any linear two-factor analysis of covariance with two high-dimension non-orthogonal factors. We show that the application of the exact solution does not change the substantive conclusions about the relative importance of person and firm effects in the explanation of log real compensation; however, the correlation between person and firm effects is negative, not weakly positive, in the exact solution. We also provide guidance for using the methods developed in earlier work to obtain an accurate approximation.

Introduction
Two related articles, Abowd, Kramarz and Margolis (AKM, 1999) and Abowd, Finer and Kramarz (AFK, 1999), provided a basic statistical framework for decomposing wage rates into components due to individual heterogeneity (measured and unmeasured) and firm heterogeneity (measured and unmeasured). The first of these articles, AKM, analyzes French data. The second of these articles, AFK, analyzes data from the State of Washington. Both AKM and AFK used statistical approximations to estimate the decomposition of wage differentials into individual and employer components. In this article we show new methods that provide the exact solution to the estimation problem. We analyze the same French data as AKM and the same American data as AFK.

The exact results fully confirm the approximate results for the State of Washington but slightly change the explanation for wage differentials for France. The reason for the difference in the French results is that the computations for the approximation in AKM were limited by the capacity of the computers on which they were generated. The approximation was not sufficiently accurate.
The same approximation, using more terms in the conditioning set, worked well for the analysis of the State of Washington. Section 2 summarizes the basic statistical model. Section 3 provides the details for identification and estimation by fixed-effect methods. Section 4 presents the data analysis comparing the original approximate results with the exact results. Section 5 concludes.

Basic Statistical Model
The dependent variable is the natural logarithm of the rate of compensation per unit of time, $y_{it}$, observed for individual $i$ at date $t$, expressed as a function of individual heterogeneity, firm heterogeneity, and measured time-varying characteristics:

$$y_{it} = \theta_i + \psi_{J(i,t)} + x_{it}\beta + \varepsilon_{it} \qquad (1)$$

where $\theta_i$ is the person effect, $\psi_{J(i,t)}$ is the firm effect, $x_{it}\beta$ is the effect of the measured time-varying characteristics, $\varepsilon_{it}$ is the statistical residual, and there is no intercept included in $x_{it}$. The function $J(i,t)$ indicates the employer $j$ of $i$ at date $t$, where $j = 1, \ldots, J$. There are $T_i$ observations per individual and $N^* = \sum_{i=1}^{N} T_i$ total observations. The first component of equation (1) [...] In order to state the basic statistical relations more clearly we restate equation (1) [...]

Identification and Estimation by Fixed-effect Methods
The normal equations for least squares estimation of fixed person, firm, and characteristic effects are of very high dimension. Estimation of the full model by fixed-effect methods requires special algorithms to deal with the high dimensionality of the problem. After completing work on AFK and AKM, which use statistical approximations, we developed new algorithms that permit the exact least squares estimation of all the effects in equation (2). These algorithms, which are based on the iterative conjugate gradient method, deal with the high dimensionality of the data by using sparse matrices. Our methods have some similarity to those used in the animal and plant breeding literature. Because of the way these algorithms work, conventional methods for assuring that the effects are identified (estimable) do not work. Thus, we also developed appropriate new methods for computing the estimable functions of interest based on equation (3) below.
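The combination of conjugate gradients with a sparse representation can be illustrated with a minimal Python sketch. It is not the authors' implementation (the paper defers the exact algorithm to its Appendix 2). It assumes a simplified model with person and firm effects only (no $x_{it}$ terms), a single connected group of persons and firms, and the normalisation that the effect of firm 0 is zero. The function name `fit_person_firm_effects` and its arguments are hypothetical. The design matrix is never formed: each observation is kept only as its (person, firm) index pair.

```python
def fit_person_firm_effects(person, firm, y, n_persons, n_firms,
                            tol=1e-18, max_iter=1000):
    """Least squares for y = theta[person] + psi[firm] + error via conjugate gradients.

    Assumptions of this sketch: no covariates, one connected group, and
    psi[0] pinned at zero so that the remaining effects are identified.
    """
    n_par = n_persons + n_firms
    pinned = n_persons  # position of psi[0] in the stacked parameter vector

    def normal_matvec(v):
        # Product of the cross-product matrix Z'Z with v, accumulated one
        # observation at a time from the (person, firm) index pairs.
        out = [0.0] * n_par
        for i, j in zip(person, firm):
            fitted = v[i] + v[n_persons + j]
            out[i] += fitted
            out[n_persons + j] += fitted
        out[pinned] = 0.0  # keep the normalised effect out of the system
        return out

    def dot(a, b):
        return sum(u * w for u, w in zip(a, b))

    # Right-hand side Z'y of the normal equations.
    rhs = [0.0] * n_par
    for i, j, value in zip(person, firm, y):
        rhs[i] += value
        rhs[n_persons + j] += value
    rhs[pinned] = 0.0

    v = [0.0] * n_par          # current solution, started at zero
    r = rhs[:]                 # residual of the normal equations
    p = rhs[:]                 # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs <= tol:          # squared residual norm small enough
            break
        ap = normal_matvec(p)
        alpha = rs / dot(p, ap)
        v = [vi + alpha * pi for vi, pi in zip(v, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return v[:n_persons], v[n_persons:]
```

Calling `fit_person_firm_effects([0, 0, 1, 2], [0, 1, 0, 1], [1.0, 1.5, 2.0, 3.5], 3, 2)` fits three persons and two firms linked by one mover (person 0). Memory use grows with the number of observations and effects rather than with the square of the number of effects, which is the point of avoiding a dense cross-product matrix.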
Least Squares Normal Equations
The full least squares solution to the estimation problem for equation [...] In both of our estimation samples, the cross-product matrix on the left-hand side of equation [...]

Identification of Individual and Firm Effects
Many interesting economic applications of equation [...] The usual technique of sweeping out singular row/column combinations from the normal equations [...]

Standard statistical references, for example [...]

The identification problem for the person and firm effects can be solved by applying methods from graph theory to determine groups of connected individuals and firms. Within a connected group of persons/firms, identification can be determined using conventional methods from the analysis of covariance. Connecting persons and firms requires that some of the individuals in the sample be employed at multiple employers. When a group of persons and firms is connected, the group contains all the workers who ever worked for any of the firms in the group and all the firms at which any of the workers were ever employed. In contrast, when a group of persons and firms is not connected to a second group, no firm in the first group has ever employed a person in the second group, nor has any person in the first group ever been employed by a firm in the second group.

From an economic perspective, connected groups of workers and firms show the realized mobility network in the economy. From a statistical perspective, connected groups of workers and firms block-diagonalize the normal equations (see equation [...]).

The following algorithm constructs G mutually-exclusive groups of connected observations from the N workers in J firms observed over the sample period.

For g = 1, ..., repeat until no firms remain:
  The first firm not assigned to a group is in group g.
  Repeat until no more firms or persons are added to group g:
    Add all persons employed by a firm in group g to group g.
    Add all firms that have employed a person in group g to group g.
  End repeat.
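The grouping steps listed here can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function name `connected_groups` is hypothetical, the input is assumed to be a list of (person, firm) pairs with one pair per observed employment match, and "the first firm not assigned to a group" is taken to mean the first such firm in order of appearance in the input.

```python
from collections import defaultdict


def connected_groups(matches):
    """Assign every person and firm to a connected group numbered 1, 2, ..., G.

    matches: iterable of (person, firm) pairs, one per employment match.
    Returns two dicts: person -> group number and firm -> group number.
    """
    firms_of = defaultdict(set)    # person -> firms that employed the person
    persons_of = defaultdict(set)  # firm -> persons the firm has employed
    firm_order = []                # firms in order of first appearance
    for person, firm in matches:
        if firm not in persons_of:
            firm_order.append(firm)
        firms_of[person].add(firm)
        persons_of[firm].add(person)

    person_group, firm_group = {}, {}
    g = 0
    for start in firm_order:
        if start in firm_group:
            continue               # firm already belongs to an earlier group
        g += 1
        firm_group[start] = g      # first unassigned firm opens group g
        new_firms = [start]
        while new_firms:           # repeat until nothing more is added
            new_persons = []
            for f in new_firms:    # add all persons employed by these firms
                for p in persons_of[f]:
                    if p not in person_group:
                        person_group[p] = g
                        new_persons.append(p)
            new_firms = []
            for p in new_persons:  # add all firms that employed these persons
                for f in firms_of[p]:
                    if f not in firm_group:
                        firm_group[f] = g
                        new_firms.append(f)
    return person_group, firm_group
```

Only newly added firms and persons are revisited in each pass, so every employment match is examined a bounded number of times. The result is the same partition the repeat loop produces, since a group stops growing exactly when a pass adds no new firm.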
End for.

At the conclusion of the algorithm, the persons and firms in the sample have been divided into G groups. The number of individuals in each group is $N_g$. The number of employers in each group is $J_g$. Some groups contain a single employer and, possibly, only one individual. For groups that contain more than one employer, every employer in the group is connected (in the graph-theoretic sense) to at least one other employer in the group. This algorithm finds all of the maximally connected sub-graphs of a graph. The relevant graph has a set of vertices that is the union of the set of persons and the set of firms, and edges that are pairs of persons and firms. An edge (i,j) is in the graph if person i has worked for firm j.

Normal Equations after Group Blocking
Our identification argument can be clarified by considering the normal equations after reordering the person and firm effects so that those associated with each group are placed in the design matrix in ascending order. For simplicity, let the arbitrary equation determining the unidentified effect simply set that effect equal to zero, i.e., set one person or firm effect equal to zero in each group. Thus, the column associated with this effect can be removed from the reorganized design matrix and the column associated with the group mean is suppressed (recall that there is no constant in X). The resulting normal equations are: [...]

The normal equations have a sub-matrix with block diagonal components. This matrix is of full rank and the solution for the parameter vector is unique. We do not solve equation [...]

Characteristics of the Groups

Estimation by Direct Solution of the Least Squares Problem
Appendix 2 shows the exact algorithm used to solve equation [...]

4. Some Results Comparing AKM, AFK, and Direct Least Squares

Summary of Data Sources
The French data are based on a collection of employer payroll reports called the Déclaration annuelles des données sociales. These consist of a 1/25