I was inspired by the post "Why you should stop worrying about deep learning and deepen your understanding of causality instead" to write up some of the resources I've used over the past year as I myself have tried to learn more about causality.
The field of causal inference has become much richer and more interesting over the past 20 years, as a number of new statistical tools were created to help reduce the bias inherent in model-dependent statistical inference. I find it's best to start by understanding the split between prediction and causal inference that has existed in the field for quite a while. Each of the following references goes into much more detail about how many of the same tools are used for both causal inference and prediction, but the meaning assigned to the model, and in particular how you evaluate the model for appropriateness, is very different depending on what you're trying to do.
Statistical Modeling: The Two Cultures: http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726
To Explain or Predict: http://www.galitshmueli.com/content/explain-or-predict
My team and I spent a lot of time dealing with observational data. Therefore much of my focus has been on how to make better decisions when dealing with observational data and quasi-experimental study design. There's been a lot of research in this area because so many medical studies are based on observational data. The Evidence-Based Medicine movement came out of a desire to improve clinical decision-making outcomes, and it provides many ideas that can be reused within my own field. One of the pieces that is fantastic for decision-making in general is the hierarchy of evidence. It provides a framework on which to base your decision-making and helps you understand how biased your study could possibly be.
One of the articles I really enjoyed coming across was by Rubin: "For objective causal inference, design trumps analysis". In it he briefly covers the counterfactual framework and reworks an observational study through the lens of experimental design, using the appropriate tools to approximate a true experiment to the best of his ability. It definitely gave me a much better understanding of the role of treatment assignment and how it participates in causal inference.
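To make the role of treatment assignment concrete, here is a minimal sketch of a toy potential-outcomes simulation. It is my own illustration, not something taken from Rubin's article: the function name `simulate`, the single confounder `x`, the outcome model, and the assignment probabilities are all assumptions chosen for the example. Every unit has the same true effect of 1.0, and the only thing that changes between the two runs is how treatment is assigned.

```python
import random


def simulate(n, randomized, seed=0):
    """Toy potential-outcomes model (hypothetical, for illustration only).

    Returns (naive_estimate, true_average_effect).
    """
    rng = random.Random(seed)
    treated, control, effects = [], [], []
    for _ in range(n):
        x = rng.random()                     # confounder, e.g. baseline health
        y0 = 2.0 * x + rng.gauss(0, 0.1)     # potential outcome without treatment
        y1 = y0 + 1.0                        # potential outcome with treatment
        # Randomized: a coin flip. Observational: units with high x are
        # more likely to be treated, so x confounds the comparison.
        p = 0.5 if randomized else 0.1 + 0.8 * x
        t = rng.random() < p
        # We only ever observe one of the two potential outcomes per unit.
        (treated if t else control).append(y1 if t else y0)
        effects.append(y1 - y0)
    naive = sum(treated) / len(treated) - sum(control) / len(control)
    return naive, sum(effects) / len(effects)


if __name__ == "__main__":
    for randomized in (True, False):
        naive, truth = simulate(20000, randomized)
        label = "randomized" if randomized else "observational"
        print(f"{label}: naive difference = {naive:.2f}, true effect = {truth:.2f}")
```

Under the coin-flip assignment the naive difference in means should land close to the true effect, while under the confounded assignment it should overstate it, because the treated group starts out with higher `x`. That gap is the motivation for designing an observational study so that it approximates an experiment before looking at outcomes.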
And now onto books!
The first book is particularly awesome and mathy. I find that it hops right in and covers the key concepts you need to understand about modern causal inference theory. That is both a strength and a weakness. If you're not comfortable reading mathematical notation, it can be a little challenging.
This was the first book I got. I actually had the first edition and upgraded to the second edition when it came out; it was definitely worth it. I found many of the topics more approachable in this book than in the previous book, but the authors restrict the set of tools they give you. Therefore I found it a great place to start and become comfortable with counterfactual theory and causal diagrams, but I eventually had to upgrade to the book out of the Harvard School of Public Health.
Many papers you encounter will refer back to the work in this book, which is largely a compendium of the research done by Rubin. I found it offered an additional perspective on many of the concepts covered in the previous two books. So it's probably not required, but it's nice for rounding things out.
This book showed me how little I really knew. It was the last one I purchased and I still haven't finished it. I really need to sit down and compare the contents of this textbook against the second half (Model Dependent Causal Inference) of the Causal Inference book out of Harvard.
OK. This book hasn't shipped, and I haven't read it. But I'm very excited by it. Judea Pearl's other book, "Causality: Models, Reasoning and Inference", is well-regarded, but also known to be very difficult, as it connects causal reasoning from several different fields into one overarching framework. He also has a blog where we can stay up to date on some of the latest books and research in this area: http://causality.cs.ucla.edu/blog/index.php/2016/02/12/winter-greeting-from-the-ucla-causality-blog-2/ .
Lastly, one of the early papers I encountered that I felt did a good job in this area: Sekhon, J. S. (2011). Multivariate and propensity score matching software with automated balance optimization: The Matching package for R. Journal of Statistical Software, 42(7). http://www.jstatsoft.org/v42/i07 . I found his package rather straightforward to use and performant enough to work with the large data sets I deal with on a regular basis.
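For readers who haven't seen matching before, here is a minimal sketch of the basic idea in plain Python. To be clear, this is not the Matching package's API and not its balance-optimization algorithm: it is a simplified 1-to-1 nearest-neighbor match, with replacement, on a single covariate, using simulated data. The function names (`make_data`, `naive_difference`, `match_att`) and the data-generating process are assumptions made up for the illustration, with a true treatment effect of 1.0.

```python
import bisect
import random


def make_data(n, seed=0):
    """Simulated observational data: rows of (x, treated, y). Hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()                       # single observed confounder
        t = rng.random() < 0.1 + 0.8 * x       # treatment more likely at high x
        y = 2.0 * x + (1.0 if t else 0.0) + rng.gauss(0, 0.1)
        rows.append((x, t, y))
    return rows


def naive_difference(rows):
    """Unadjusted difference in mean outcome, treated minus control."""
    yt = [y for _, t, y in rows if t]
    yc = [y for _, t, y in rows if not t]
    return sum(yt) / len(yt) - sum(yc) / len(yc)


def match_att(rows):
    """Effect on the treated via nearest-neighbor matching on x.

    Returns (att_estimate, mean_absolute_gap_in_x_between_matched_pairs).
    """
    controls = sorted((x, y) for x, t, y in rows if not t)
    xs = [x for x, _ in controls]
    diffs, gaps = [], []
    for x, t, y in rows:
        if not t:
            continue
        # The closest control is one of the two neighbors of the insertion point.
        i = bisect.bisect_left(xs, x)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(xs)]
        j = min(candidates, key=lambda k: abs(xs[k] - x))
        diffs.append(y - controls[j][1])
        gaps.append(abs(x - xs[j]))
    return sum(diffs) / len(diffs), sum(gaps) / len(gaps)


if __name__ == "__main__":
    rows = make_data(5000)
    att, gap = match_att(rows)
    print(f"naive difference: {naive_difference(rows):.2f}")
    print(f"matched estimate: {att:.2f} (mean covariate gap {gap:.4f})")
```

The mean covariate gap is a crude stand-in for a balance check: if matched pairs are close on `x`, the comparison within pairs is no longer driven by `x`. With many covariates you would typically match on an estimated propensity score or a multivariate distance and check balance on every covariate, which is the kind of work the package automates.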
If you're ever in the Seattle area and want to chat about these things, I would love to do coffee.