Recently I had a paper rejected by the SAJE. This has happened before and probably will again. In practical terms it means that I will have to revise the paper and find another journal that may be interested. We all know how it works. In this case there are some aspects that I think may be of interest to a broader audience and are worth writing a blog post about.
The paper is the one on the thickness of the day labour market with Derick Blaauw – neither the paper nor this blog should be taken to reflect on Derick – he graciously let me borrow data from a country-wide research project on day labourers for this geography-of-the-labour-market story and it has not quite worked out yet.
Obviously I think that it is a good paper with a number of things going for it:
- We use the unique set of primary data collected by Derick, his co-researcher and fieldworkers: 3800 day labourers from all over the country were interviewed over a period of 10 months in 2007. Collecting this type of primary data presents numerous challenges for researchers and is no easy task.
- Not a lot has been written about the geographical economics aspects of the labour market in South Africa and nothing about day labourers. Given the context of large scale unemployment and a relatively small informal sector, the day labourers are an interesting part of the SA labour story.
- The key claims we wanted to test were whether large urban areas allow for a better match between workers and jobs, whether they allow day labourers to become more specialised, and whether these factors contribute to higher earnings.
- The geography is not just a rural-urban dummy: we have district council level population density, a measure of occupation density and interactions of other cool measures with a metro dummy.
The drawback is that we only have the one cross-section and the regression model can say something about the direction of relationships, but not about causation. The sub-editor writes:
I find the topic of this paper to be quite interesting but I am afraid that the paper fails to deal with the difference between causation and correlation. The authors find that living in larger metro area is correlated with higher earnings. However, if more able workers (even after controlling for education) tend to be in larger urban areas then the paper is overestimating the effect.
For the uninitiated, the problem is unobserved heterogeneity. There are characteristics of the day labourers that influence their wages, like ability or the fact that some are more honest or dependable than others. These end up in the error term of the regression. This is fine as long as these effects are random and uncorrelated with the explanatory variables that we do have. Unfortunately, this is unlikely to be the case, and the consequence is that the coefficient estimates are biased, because we violate the classical linear regression model (CLRM) assumption that the error term is uncorrelated with the regressors. We know this, the SAJE knows this, but until recently it was a limitation that people could live with as long as the results were carefully written up and the rest of the story was interesting. But in this case, and probably more frequently in the future, more will be required. This has serious implications for future research.
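For readers who like to see the bias in numbers, here is a minimal simulated sketch. It is not our data or our model: the sample size, the "true" metro premium of 0.10, the return to ability and the strength of sorting into metros are all made-up assumptions chosen only to illustrate the sub-editor's point.

```python
import numpy as np

def simulate_cross_section(n=50_000, true_effect=0.10, sorting=0.8, seed=0):
    """Simulate one cross-section of workers (all parameters are hypothetical)."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n)  # unobserved by the researcher
    # More able workers are more likely to be found in a metro area.
    metro = (sorting * ability + rng.normal(size=n) > 0).astype(float)
    # Log earnings depend on location AND on ability.
    log_wage = 1.0 + true_effect * metro + 0.5 * ability + rng.normal(scale=0.5, size=n)
    return metro, ability, log_wage

def ols(y, *regressors):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

metro, ability, log_wage = simulate_cross_section()

# What we can actually estimate: ability sits in the error term.
naive = ols(log_wage, metro)[1]
# What we could estimate if ability were observed (it is not, in real data).
oracle = ols(log_wage, metro, ability)[1]

print(f"true metro effect:            0.10")
print(f"OLS without ability (biased): {naive:.3f}")
print(f"OLS with ability (oracle):    {oracle:.3f}")
```

In this setup the regression that leaves ability in the error term attributes part of the return to ability to the metro dummy, so it overstates the metro premium, which is exactly the direction of bias the sub-editor describes.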
How much further out the goal posts are moving is the point that I want to get to. To solve the above problem you can go one of two routes:
- Use panel data. If you are able to observe the same people, households, firms or countries over time you can control for unobserved fixed effects and make bolder claims about causality.
- Undertake a randomized control trial (RCT). If you have a control group and you are able to administer some kind of treatment to an experimental group, you would be able to identify the causal effect.
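A rough sketch of why these two routes help, again on simulated data with made-up parameters (a hypothetical panel of 20,000 workers over 4 periods and a "true" effect of 0.10). The panel route is illustrated with a simple within (fixed effects) estimator, which is one common choice and an assumption on my part, and the RCT route with a plain difference in means under random assignment.

```python
import numpy as np

rng = np.random.default_rng(1)
n_workers, n_periods, true_effect = 20_000, 4, 0.10  # hypothetical values

ability = rng.normal(size=n_workers)  # fixed over time, unobserved

# Route 1: panel data. Location changes over time for some workers,
# and more able workers are more often found in a metro.
latent = 0.8 * ability[:, None] + rng.normal(size=(n_workers, n_periods))
metro = (latent > 0).astype(float)
log_wage = (1.0 + true_effect * metro + 0.5 * ability[:, None]
            + rng.normal(scale=0.5, size=(n_workers, n_periods)))

def slope(y, x):
    """Slope from a simple regression of y on x with an intercept."""
    x = x - x.mean()
    y = y - y.mean()
    return float((x * y).sum() / (x * x).sum())

# Pooled OLS ignores the panel structure and inherits the ability bias.
pooled = slope(log_wage.ravel(), metro.ravel())

# Within estimator: subtract each worker's own mean, which removes
# anything that is constant for that worker, including ability.
y_within = log_wage - log_wage.mean(axis=1, keepdims=True)
x_within = metro - metro.mean(axis=1, keepdims=True)
within = float((x_within * y_within).sum() / (x_within * x_within).sum())

# Route 2: an RCT. Treatment is assigned by coin flip, so it is
# uncorrelated with ability by construction.
treated = rng.integers(0, 2, size=n_workers).astype(float)
wage_rct = (1.0 + true_effect * treated + 0.5 * ability
            + rng.normal(scale=0.5, size=n_workers))
rct = float(wage_rct[treated == 1].mean() - wage_rct[treated == 0].mean())

print(f"pooled OLS (biased):      {pooled:.3f}")
print(f"within / fixed effects:   {within:.3f}")
print(f"RCT difference in means:  {rct:.3f}")
```

Note that the within estimator only removes unobserved characteristics that stay constant over time, and it relies entirely on workers who change location. If people move because of shocks that also affect their wages, the fixed effects route does not rescue you either.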
It is easy to see that these solutions are non-trivial, particularly if you are also teaching a few courses every year, have limited research funding and research management thinks that you should be averaging 3 or 4 publications a year.
So what is my advice to microeconometricians trying to get into SAJE? If you have been following a panel of day labourers across the country, or if you have an RCT where you have enabled random day labourers to move to Gauteng (and others not) and you have observed changes in their earnings, controlling for everything else, you probably would not be submitting to SAJE anyway. Shortly after walking on water you would be submitting to JEG or JDE.
Barring this there are a few options. You can trawl through existing data looking for exogenous variation. You will need data in which there has been some kind of change that allows you to distinguish a before and after, and then you can talk about RCT-type causality. Similarly you can sharpen your econometric skills and see if you can use techniques like propensity score matching to identify RCT-type causality in cross-sectional data. An alternative is to use existing cross-section data, pool it and build pseudo-panels. All this is possible and is being done by some, but I’m not sure how sustainable it is.
If you do want to collect your own primary data in panel or RCT format, know that it will take considerable time and resources and require some hard thinking during the conceptualization of the project. With a panel, reviewers will always have issues with the sample and attrition. With RCTs, there is a whole debate raging on the internet about reliability, validity and repeatability. All of this implies time and resources that most academics do not have. If you are able to do such good work, research management will have to make peace with smaller quantities of outputs in return for the better quality.
It may in fact be easier to try and join a team that is already doing this sort of work. There is NIDS at UCT, firm-level work at Wits, education work at Maties and gender and migration work at UKZN. If all else fails, learn DSGE models!