Book Excerpt: Learning the Language of Stata by Amy Moore


Some computer programs require a knowledge of coding syntax, and some statistical packages offer both coding and user-friendly options. Stata is a statistical software package that lets you code in its own language. It is used in finance, clinical research, epidemiology, and other quantitative fields, and it is built to handle multiple operations with very few lines of code. If you lack the initiative to learn all the nuances of SAS or R, then Stata may be your program of choice. If you want a less mainstream program than SPSS, you may also want to try Stata.


You need to know the dance of programming before you ‘tango’ with Stata or any other statistical language. Programming skills help you navigate a statistical analysis. You should know how to read a file into your software of choice. Once the file is read in and viewable, explore and manipulate the data until you are familiar with it. Certain data structures call for recoding or transformation before you analyze the data with a statistical method. In the end, you will present your results in a way the reader can interpret. So again, here is the dance:

1st Step: Read that file in (kick one and two)

2nd Step: Run some descriptives and frequencies (twirl)

3rd Step: Data Manipulation (kick and turn)

4th Step: Run a Statistical Analytic procedure (high kick)

5th Step: Illustrate Your Results for Interpretation (cha cha cha) 
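The five steps above can be sketched as a short do-file. This is only an illustration under stated assumptions: it uses cancer.dta, the example dataset that ships with Stata, as a stand-in for your own file, and the variable `older` and the choice of a linear regression are hypothetical picks made for the example, not the book's analysis.

```stata
* Step 1: read the data in
* (assumption: Stata's bundled example data stands in for your own file)
sysuse cancer, clear

* Step 2: run some descriptives and frequencies
summarize studytime age
tabulate drug

* Step 3: data manipulation
* (hypothetical indicator for patients aged 55 or older)
generate byte older = (age >= 55) if !missing(age)

* Step 4: run a statistical procedure (here, a simple linear regression)
regress studytime age i.drug

* Step 5: illustrate the results for interpretation
graph box studytime, over(drug)
```

Each step is one or two lines, which is the sense in which Stata handles multiple operations with very little code.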

Components of Stata

Stata has several main components: the command window, the data view, and the results window. Type and execute your syntax in the command window, which the red arrow points to in figure 1.1. The green arrow in figure 1.1 indicates the results window, where your output appears. If you want to view your data, the red arrow in figure 1.2 points to the data view. In figure 1.3 we can see exactly what we imported into our session.

Reading Files in Stata

There are two ways to read a file into Stata, and I will show you both. The first approach is to import the file using the ‘Import’ option under the File menu. In figure 1.4, we chose the csv import option because we have a csv file. Figure 1.5 shows the file we will work with in this chapter, cancer.csv. After you choose the file, another window pops up that shows a preview of its contents (see figure 1.6).
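The excerpt stops before the second approach, so the following is an assumption rather than the author's text: the usual command-line counterpart to the Import menu is `import delimited`. A minimal sketch, assuming cancer.csv sits in Stata's current working directory:

```stata
* Assumption: cancer.csv is in the current working directory
* (type pwd to check where that is)
import delimited "cancer.csv", clear

* Confirm what was read in: variable names, types, and observation count
describe
```

The `clear` option replaces any data already in memory, so save your work first if you need it.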

Checking the Content of Your File

Once you read the file into Stata, check its variable characteristics, especially if a third party created the file as a data deliverable. Any file you receive needs a quick validation of its contents to ensure it is complete. One check you can perform is a verification of the variable names, variable formats, and number of observations. A codebook is a standard document that lists the characteristics of every variable in the file.
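A quick validation pass of this kind might look like the sketch below. It again uses Stata's bundled cancer.dta as a stand-in, since the variables in the book's cancer.csv are not shown in this excerpt; the missing-value check is an extra step suggested here, not one the text names.

```stata
* Assumption: bundled example data stands in for the delivered file
sysuse cancer, clear

* Variable names, storage types, display formats, and observation count
describe

* Number of observations on its own
count

* Extra check (not from the text): which variables have missing values
misstable summarize
```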

You can generate a file’s codebook from the Data menu, as illustrated in figures 1.8a and 1.8b. The path is: Data -> Describe Data -> Describe Data Contents. The other method is to simply type the word ‘codebook’ followed by the variable names at the command prompt, as in figure 1.6. You can also use shorthand to avoid typing out each variable name. If you put a hyphen between two variable names, Stata generates the details for every variable from the first through the second in dataset order, so naming the first and last variables in the file covers all of them.
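The command-prompt route can be sketched as follows. The variable names come from Stata's bundled cancer.dta (studytime, died, drug, age, in that order), which is used here as a stand-in; substitute the names from your own file.

```stata
* Assumption: bundled example data stands in for cancer.csv
sysuse cancer, clear

* Codebook for specific variables, named one by one
codebook studytime age

* Hyphen shorthand: every variable from studytime through age
codebook studytime-age

* With no variable names, codebook covers the whole dataset
codebook

* A one-line-per-variable summary
codebook, compact
```

Because the hyphen range follows the order of variables in the dataset, running `describe` first is a simple way to see which names come first and last.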

Here is the link to purchase the book:

-Moore to follow Amy
