academic publication, cross-section data, panel data, RCT, SAJE

The SAJE has fired a shot across all our bows

Recently I had a paper rejected by the SAJE. This has happened before and probably will again. In practical terms it means that I will have to revise the paper and find another journal that may be interested. We all know how it works. In this case, though, there are some aspects that I think may inform a broader audience and are worth writing a blog post about.
The paper is the one on the thickness of the day labour market with Derick Blaauw – neither the paper nor this blog should be taken to reflect on Derick – he graciously let me borrow data from a country-wide research project on day labourers for this geography-of-the-labour-market story and it has not quite worked out yet.
Obviously I think that it is a good paper with a number of things going for it:
  • We use the unique set of primary data collected by Derick, his co-researcher and fieldworkers: 3800 day labourers from all over the country were interviewed over a period of 10 months in 2007. Collecting this type of primary data presents numerous challenges for researchers and is no easy task.
  • Not a lot has been written about the geographical economics aspects of the labour market in South Africa and nothing about day labourers. Given the context of large scale unemployment and a relatively small informal sector, the day labourers are an interesting part of the SA labour story.
  • The key claims we wanted to test were whether large urban areas allow for a better match between workers and jobs, whether they allow day labourers to become more specialised, and whether these factors contribute to higher earnings.
  • The geography is not just a rural-urban dummy: we have district council level population density, a measure of occupation density and interactions of other cool measures with a metro dummy.
The drawback is that we only have the one cross-section and the regression model can say something about the direction of relationships, but not about causation. The sub-editor writes:
I find the topic of this paper to be quite interesting but I am afraid that the paper fails to deal with the difference between causation and correlation. The authors find that living in larger metro area is correlated with higher earnings. However, if more able workers (even after controlling for education) tend to be in larger urban areas then the paper is overestimating the effect.
To the uninitiated, the problem is unobserved heterogeneity. There are characteristics of the day labourers that influence their wages, like ability or the fact that some are more honest or dependable than others. These end up in the error term of the regression. This is fine as long as these effects are random and uncorrelated with the explanatory variables that we do have. Unfortunately, this is unlikely to be the case, and the consequence is that the results are biased and we are in violation of the classical linear regression model (CLRM) assumptions. We know this, the SAJE knows this, but until recently it was a limitation that people could live with as long as the results were carefully written up and the rest of the story was interesting. But in this case, and probably more frequently in the future, more will be required. This has serious implications for future research.
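To make the sub-editor's point concrete, here is a minimal simulation sketch of what sorting on unobserved ability does to a cross-section estimate. Every number in it (a true metro premium of 0.10, a return to ability of 0.3, the strength of sorting) is invented for illustration and has nothing to do with the day labourer data; the function names are hypothetical too.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_cross_section(n=20000, true_metro_effect=0.10, sorting=0.5, seed=1):
    """Estimate the metro premium from one simulated cross-section.

    `sorting` controls how strongly unobserved ability pushes workers
    into the metro; sorting=0 means location is unrelated to ability.
    All parameter values are assumptions made for this illustration.
    """
    rng = random.Random(seed)
    metro, log_wage = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)  # seen by employers, not by the researcher
        in_metro = 1.0 if sorting * ability + rng.gauss(0, 1) > 0 else 0.0
        wage = 1.0 + true_metro_effect * in_metro + 0.3 * ability + rng.gauss(0, 0.5)
        metro.append(in_metro)
        log_wage.append(wage)
    # Ability is left out of the regression, so it sits in the error term.
    return ols_slope(metro, log_wage)


with_sorting = simulate_cross_section(sorting=0.5)
without_sorting = simulate_cross_section(sorting=0.0)
print(f"true effect 0.10 | with sorting {with_sorting:.3f} | without sorting {without_sorting:.3f}")
```

When ability is unrelated to location the simple regression recovers something close to the true premium; once the more able workers sort into the metro, the same regression overstates it, which is exactly the objection in the review.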
How much further out the goal posts are moving is the point that I want to get to. To solve the above problem you can go one of two routes:
  • Use panel data. If you are able to observe the same people, households, firms or countries over time you can control for unobserved fixed effects and make bolder claims about causality.
  • Undertake a randomized control trial. If you have a control group and you are able to administer some kind of treatment to an experimental group, you would be able to identify the causal effect.
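Both routes can be illustrated with the same kind of toy simulation. What follows is a sketch under invented assumptions (a true metro effect of 0.10, ability that is fixed over time, two periods, hypothetical function names), not a description of any real study. In the panel case, differencing each worker's wage over time removes the fixed ability term; in the experiment, the coin flip makes assignment independent of ability.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_panel(n=20000, effect=0.10, seed=2):
    """Two-period panel: nobody is in the metro in period 1, and the more
    able workers are more likely to have moved there by period 2."""
    rng = random.Random(seed)
    pooled_x, pooled_y, diff_x, diff_y = [], [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)  # fixed over time, unobserved
        moved = 1.0 if 0.5 * ability + rng.gauss(0, 1) > 0 else 0.0
        metro = [0.0, moved]
        wages = [1.0 + effect * m + 0.3 * ability + rng.gauss(0, 0.5) for m in metro]
        pooled_x.extend(metro)
        pooled_y.extend(wages)
        # Differencing within a worker cancels the fixed ability term.
        diff_x.append(metro[1] - metro[0])
        diff_y.append(wages[1] - wages[0])
    return ols_slope(pooled_x, pooled_y), ols_slope(diff_x, diff_y)


def simulate_rct(n=20000, effect=0.10, seed=3):
    """Randomised experiment: a coin flip decides who is moved to the metro."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        assigned = rng.random() < 0.5  # independent of ability by design
        wage = 1.0 + (effect if assigned else 0.0) + 0.3 * ability + rng.gauss(0, 0.5)
        (treated if assigned else control).append(wage)
    return sum(treated) / len(treated) - sum(control) / len(control)


pooled_estimate, fd_estimate = simulate_panel()
rct_estimate = simulate_rct()
print(f"pooled OLS {pooled_estimate:.3f} | first difference {fd_estimate:.3f} | RCT {rct_estimate:.3f}")
```

The pooled regression inherits the sorting bias, while the first-difference and experimental estimates land near the assumed 0.10. Note that differencing only helps with heterogeneity that is fixed over time, which is why the bullet above speaks of unobserved fixed effects.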
It is easy to see that these solutions are non-trivial, particularly if you are also teaching a few courses every year, have limited research funding and research management thinks that you should be averaging 3 or 4 publications a year.
So what is my advice to microeconometricians trying to get into the SAJE? If you have been following a panel of day labourers across the country, or if you have an RCT where you have enabled random day labourers to move to Gauteng (and others not) and you have observed changes in their earnings, controlling for everything else, you probably would not be submitting to the SAJE anyway. Shortly after walking on water you would be submitting to JEG or JDE.
Barring this, there are a few options. You can trawl through existing data looking for exogenous variation. You will need data in which there has been some kind of change that allows you to distinguish a before and an after, and then you can talk about RCT-type causality. Similarly, you can sharpen your econometric skills and see if you can use techniques like propensity score matching to identify RCT-type causality in cross-sectional data. An alternative is to pool existing cross-section data sets and build pseudo-panels. All this is possible and is being done by some, but I’m not sure how sustainable it is.
If you do want to collect your own primary data in panel or RCT format, know that it will take considerable time and resources and require some hard thinking during the conceptualization of the project. With a panel, reviewers will always have issues with the sample and attrition. With RCTs, there is a whole debate raging on the internet about reliability, validity and repeatability. All of this implies time and resources that most academics do not have. If you are able to do such good work, research management will have to make peace with a smaller quantity of outputs in return for the better quality.
It may in fact be easier to try and join a team that is already doing this sort of work. There is NIDS at UCT, firm-level work at Wits, education work at Maties and gender and migration work at UKZN. If all else fails, learn DSGE models!
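For anyone who has not met the pseudo-panel option mentioned above: the idea is that repeated cross-sections do not follow the same people, but they do sample from the same birth cohorts, so cohort-by-survey-year averages can be tracked over time as if they were panel units. Here is a minimal sketch, assuming records with hypothetical fields `survey_year`, `birth_year` and `log_wage`; the five sample records are made up purely to show the mechanics.

```python
from collections import defaultdict


def build_pseudo_panel(records, cohort_width=10):
    """Collapse pooled cross-sections into cohort-by-year cell means.

    `records` is an iterable of dicts with the (hypothetical) keys
    'survey_year', 'birth_year' and 'log_wage'. Cohorts are birth-year
    bands of `cohort_width` years, labelled by their first year.
    """
    cells = defaultdict(list)
    for r in records:
        cohort = (r["birth_year"] // cohort_width) * cohort_width
        cells[(cohort, r["survey_year"])].append(r["log_wage"])
    return {
        key: {"mean_log_wage": sum(values) / len(values), "n": len(values)}
        for key, values in sorted(cells.items())
    }


# Made-up records from two hypothetical survey years.
sample = [
    {"survey_year": 2005, "birth_year": 1972, "log_wage": 3.0},
    {"survey_year": 2005, "birth_year": 1978, "log_wage": 3.2},
    {"survey_year": 2005, "birth_year": 1985, "log_wage": 2.8},
    {"survey_year": 2007, "birth_year": 1971, "log_wage": 3.4},
    {"survey_year": 2007, "birth_year": 1983, "log_wage": 3.0},
]

panel = build_pseudo_panel(sample)
for (cohort, year), cell in panel.items():
    print(cohort, year, round(cell["mean_log_wage"], 2), cell["n"])
```

The cell size `n` is kept because in practice small cells give noisy means, and that is one of the things reviewers will ask about.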

2 thoughts on “The SAJE has fired a shot across all our bows”

  1. On a technical note, I can identify with this problem. Almost all of my work is based on case studies of some sort, whether related to a particular industry or a particular business. Of course, inferring broader patterns of behaviour from these case studies is challenging, but it is also quite interesting. Yet I encounter similar objections – it seems some scholars want experimental-type data from case studies, as they often ask: what about controlling for this, what about controlling for that? If the data is created by a particular setting, then, yes, it is bound to be affected by a plethora of factors. It is impossible to remove all the noise!

    What you highlight is a problem facing many South African economists. The country has serious policy challenges which require at least some input from economists. We do not have the luxury to sit around and wait for another wave of data before attempting to at least interrogate what little shreds of data we do have. Our willingness to adopt American or European trends in sophisticated econometric modelling may yet be self-destructive. What purpose do we serve in our society if Rome burns while we are trying to collect the data set? Besides, whoever claims that he or she can prove causality using an econometric model is taking himself or herself too seriously. Econometric models are human constructions – attempts to organize information in interesting ways. Nothing more. I will take a well-reasoned qualitative story anytime before believing a purely statistical analysis that has little feeling for what is going on beyond the numbers.


  2. Willem, thanks for the thoughtful comment. Since writing the post I have been thinking and talking about it quite a bit. I think there are a few points to take away and would love to hear more thoughts on this:
    1. International standards are coming to get me and I will have to do more to get published in the SAJE.
    2. Thus, the days of quick and dirty papers are numbered.
    3. This means that I have to change the way that I work: greater specialisation and more focus is required for one or two multiple-year projects.
    4. I will have to learn the latest techniques – there is too much time left in academia to hope to make a contribution with my current stock of methods knowledge.
    5. Working in a good team should make all this easier.
    6. The end result should be fewer, but hopefully better, publications.
    7. The process of generating outputs will also change. While waiting for the above I will have to make the work accessible through blogs, conference and working papers.

    Does this describe the US and EU model? Is it do-able?

