**Chapter 1: What is Statistics?**

People often confuse statistics with mathematics, which is understandable. Statistics involves mathematical calculations, and most statistical methods are written with math symbols. You could classify statistics as a form of applied mathematics. Statistical texts, this book included, are full of Greek symbols, graphs, and algebraic equations. At its core, statistics is the organization, analysis, and interpretation of data.

### Data

During the day, you may occasionally hear people mention something called *data*. Data is any source of organized information arranged as variables (columns) and observations (rows). Table 1-1 shows an example of data with three variables and three observations. A variable is more than a column in your data: it represents an entity whose values can be numbers, characters, or symbols. Each variable contains values or attributes that help with its classification. The rows indicate how many cases are in your data.

| Interval Variable (ID) | Weight (in pounds) | Gender |
|---|---|---|
| 100 | 150 | Male |
| 101 | 160 | Female |
| 102 | 165 | Male |

#### Table 1-1: Example of Data
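As a rough sketch, the data in Table 1-1 could be held in Python as a list of rows, where each row is a dictionary keyed by variable name. The names `table_1_1`, `n_observations`, `variables`, and `n_people` are made up for illustration and do not come from the book.

```python
# Table 1-1 as a list of observations (rows); each key is a variable (column).
# All Python names here are illustrative assumptions, not from the original text.
table_1_1 = [
    {"ID": 100, "Weight": 150, "Gender": "Male"},
    {"ID": 101, "Weight": 160, "Gender": "Female"},
    {"ID": 102, "Weight": 165, "Gender": "Male"},
]

# Number of observations = number of rows.
n_observations = len(table_1_1)

# Variables = the column names shared by every row.
variables = list(table_1_1[0].keys())

# Number of people = number of distinct values in the ID column.
n_people = len({row["ID"] for row in table_1_1})

print(n_observations, variables, n_people)
```

Counting distinct IDs rather than rows matters once a person can appear in more than one row; in this small table the two counts happen to agree.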

In Table 1-1, we have three individuals (100, 101, and 102) and three variables: ID, weight, and gender. We know how many people are in the data by counting the distinct values in the ID column. The first thing to notice in the data file is that each column has an assigned format. A variable's format, the appearance of its values, is determined by its type. There are four variable types: nominal, ordinal, ratio, and interval. Nominal variables contain categorical values. Ordinal variables contain ordered categorical values. Ratio variables are continuous measurements with a natural zero starting point. Interval variables are continuous measurements with equal intervals between values. In Table 1-1, we have one interval variable (ID), one ratio variable (Weight), and one nominal variable (Gender). Table 1-2 provides more examples of typical variable types and formats.

| Variable Type | Variable Format | Variable Name | Value |
|---|---|---|---|
| Nominal | Categorical | Gender | Male; Female |
| Ordinal | Ordered categorical | Severity (scale 1 to 3) | 1 = Mild; 2 = Moderate; 3 = Severe |
| Ratio | Continuous measurement based on a natural zero value; can contain decimals | Weight (in pounds) | 120; 130; 150; 165 |
| Interval | Ordered continuous measurement | Temperature in Fahrenheit | 32; 33; 31; 30 |

#### Table 1-2: Variables by Format and Type
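One practical reason to care about variable type is that it limits which summaries make sense. The sketch below is an illustration, not the book's method: it reuses the Gender values from Table 1-1 and the other values from Table 1-2, and the Python names are assumptions chosen for readability.

```python
from collections import Counter
from statistics import mean, median

# Values come from Tables 1-1 and 1-2; the Python names are illustrative.
gender = ["Male", "Female", "Male"]   # nominal: categories with no order
severity = [1, 2, 3]                  # ordinal: 1 = Mild, 2 = Moderate, 3 = Severe
weight = [120, 130, 150, 165]         # ratio: natural zero, in pounds
temperature_f = [32, 33, 31, 30]      # interval: equal steps, no natural zero

# Nominal: we can only count how often each category occurs.
gender_counts = Counter(gender)

# Ordinal: order is meaningful, so a median makes sense.
median_severity = median(severity)

# Ratio: means and ratios are both meaningful (150 lb is 1.25 times 120 lb).
mean_weight = mean(weight)
weight_ratio = weight[2] / weight[0]

# Interval: differences are meaningful, but ratios are not,
# because 0 degrees Fahrenheit is not "no temperature".
temperature_spread = max(temperature_f) - min(temperature_f)

print(gender_counts, median_severity, mean_weight, weight_ratio, temperature_spread)
```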

One could consider Table 1-2 the defined list of variable types and formats. We care about how the data is arranged in the file because the file reflects what information is available on our population. For instance, if we wanted to learn about the ages of people in a classroom, age would be a piece of information we collect in our data. Our interests drive what we collect, and what we collect ultimately helps us arrive at conclusions about the population under study. In other words, our data is a vital tool for doing statistics.

#### Data Collection: Do I Include Everybody?

Suppose I want to quantify how many people are wearing red shirts in a classroom. I would count the people wearing red shirts. Since the number of people wearing red shirts is countable, we refer to our data collection as the *true population*. The term *countable* refers to anything we can see and tally in our results. If our data does not represent all the members of the true population, then we have a *sample population*. There are instances where it is more feasible to use a sample than the true population. Table 1-3 shows examples where one may have the true population versus a sample.
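The red-shirt example can be sketched in a few lines of Python. Everything here is hypothetical: the `classroom` list is invented data, and the sample size of 5 and the seed are arbitrary choices made only so the example is repeatable.

```python
import random

# Hypothetical classroom: the shirt color of every person present (made-up data).
classroom = [
    "red", "blue", "red", "green", "blue",
    "red", "white", "blue", "red", "green",
]

# True population: every member of the classroom is counted.
red_in_population = classroom.count("red")

# Sample population: only some members are observed.
# The fixed seed is an assumption used to make the draw repeatable.
rng = random.Random(0)
sample = rng.sample(classroom, 5)
red_in_sample = sample.count("red")

print("Red shirts in the true population:", red_in_population)
print("Red shirts in a sample of 5:", red_in_sample)
```

The count from the sample depends on which five people happen to be drawn, which is why a sample count is an estimate while the full classroom count is exact.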

Here is a link to the book for purchase: https://moorestat.com/product/statistics-for-the-math-illiterate/

-Moore to Follow Amy