Roger D. Peng is a Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. He is also a co-founder of the Johns Hopkins Data Science Specialization.
Summary
I think my proposed definition of a successful data analysis is challenging (and perhaps unsettling) because it suggests that data analysts are responsible for things outside the data. In particular, they need to understand the context in which the data are collected and the audience to which results will be presented. I also think that’s why it took so long for me to come around to it. But I think this definition explains much more clearly why it is so difficult to be a good data analyst. When we judge data analyses using traditional criteria developed by statisticians, we struggle to explain why some people are better data analysts than others and why some analyses are better than others. However, when we consider that data analysts have to juggle a variety of factors both internal and external to the data in order to achieve success, we see more clearly why this is such a difficult job and why good people are hard to come by.
Another implication of this definition is that human nature plays a big role in data analysis success, and that much of successful data analysis is essentially a successful negotiation of human relations. Good communication with an audience can often play a much bigger role in success than whether you used a linear model or a quadratic model. Trust between an analyst and audience is critical when an analyst must make choices about what to present and what to omit. Admitting that human nature plays a role in data analysis success is difficult because humans are highly subjective, inconsistent, and difficult to quantify. However, I think doing so gives us a better understanding of how to judge the quality of data analyses and how to improve them in the future.