Chapter 3 Producing Data

Introduction

In Chapters 1 and 2, we learned many of the basic tools of exploratory data analysis (EDA). We used both graphs and numbers to describe sets of data and looked for patterns that suggest interesting conclusions or questions for further study. However, exploratory analysis alone rarely provides convincing evidence for its conclusions because striking patterns that we find in data can arise from many sources. Thus, an important step prior to EDA is ensuring that our data are of high quality.

In this chapter, we discuss methods of data collection and production that ensure quality data. Using these methods will allow us to answer our questions with greater confidence.

We will develop the skills needed to produce trustworthy data and to judge the quality of data produced by others. The validity of the conclusions that we draw from an analysis of data depends not only on the use of the best methods to perform the analysis but also on the quality of the data. Therefore, Section 3.1 begins this chapter with a short overview of the various sources of data. This is followed by an in-depth discussion of the two main sources of quality data: designed experiments and sample surveys. We study these two sources in Sections 3.2 and 3.3, respectively.

Should an experiment or a sample survey that could possibly provide interesting and important information always be performed? How can we safeguard the privacy of subjects in a sample survey? What constitutes the mistreatment of people or animals who are studied in an experiment? These are questions of ethics. In Section 3.4, we address ethical issues related to the design of studies and the analysis of data.