Statistical procedures are good to know, but what about the data structure? What type of data structure do you need for a linear regression? For an ANOVA? What about a chi-square test or a t-test? Linear mixed models? You need to be able to recognize how your data should be structured before you analyze it. For instance, a repeated measures design calls for repeated measures, that is, multiple records per case id, where the case id identifies each subject in your study. So if you do not have multiple observations per case id, you should not run a repeated measures analysis. If your independent variable has only two levels, a t-test is the usual choice rather than an ANOVA. If you are performing a t-test and some subjects have multiple records while others do not, the observations are no longer independent, and that redundancy has its price. Data mistakes and structural problems do not go away by themselves.
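One quick way to check whether a file really has a repeated measures structure is to count the records per case id. Here is a minimal sketch in Python with pandas. The data, the column names (`case_id`, `time`, `score`) and the helper `records_per_case` are all made up for illustration, so swap in your own.

```python
import pandas as pd

# Hypothetical long-format file: one row per measurement.
df = pd.DataFrame({
    "case_id": [1, 1, 1, 2, 2, 2, 3],
    "time":    [1, 2, 3, 1, 2, 3, 1],
    "score":   [5.0, 6.1, 6.8, 4.2, 4.9, 5.5, 7.0],
})

def records_per_case(data, id_col="case_id"):
    """Return the number of rows for each case id (illustrative helper)."""
    return data.groupby(id_col).size()

counts = records_per_case(df)

# A repeated measures analysis needs more than one record for every case.
has_repeats = bool((counts > 1).all())

# Cases with fewer records than the fullest case: worth a second look.
unbalanced = counts[counts != counts.max()]

print(counts)
print("Every case has repeated measures:", has_repeats)
print("Cases with fewer records than the others:", list(unbalanced.index))
```

The same count works in the other direction: if you are planning an independent-samples t-test and any case id shows up more than once, you have found the redundancy before it costs you.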
You might have to restructure your data. I highly recommend working from a copy of your work file so that you avoid overwriting the original. One popular data maneuver is the transpose: you look at your file and realize that the repeated measures are in rows when the analysis needs them in columns. Or you might need a log transformation of a variable so that it is closer to normally distributed in your analysis. You might need to recode a continuous variable as a binary one to run a logistic regression. It all depends on what you are trying to do. If the analysis calls for no missing data, then you have to account for missing values. Occasionally I help people structure their data for analysis, and it's very easy to walk someone through it. But the ideal situation is having a file structure that was planned out from the very beginning. Get that data in shape!
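The maneuvers above can be sketched in a few lines of Python with pandas. Treat this as one possible approach, not the only one: the data, the column names (`case_id`, `visit`, `income`) and the 30000 cutoff are all assumptions invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical original file: repeated measures stored as rows (long format).
original = pd.DataFrame({
    "case_id": [1, 1, 2, 2, 3, 3],
    "visit":   [1, 2, 1, 2, 1, 2],
    "income":  [20000.0, 22000.0, 55000.0, 60000.0, 31000.0, np.nan],
})

# Work on a copy so the original is never overwritten.
work = original.copy()

# "Transpose" the repeated measures: one row per case, one column per visit.
wide = work.pivot(index="case_id", columns="visit", values="income")
wide.columns = [f"income_v{v}" for v in wide.columns]

# Account for missing values before the analysis: count them, then drop them.
n_missing = int(work["income"].isna().sum())
complete = work.dropna(subset=["income"]).copy()

# Log transformation of a skewed variable.
complete["log_income"] = np.log(complete["income"])

# Recode a continuous variable as binary (the cutoff is an arbitrary assumption).
cutoff = 30000
complete["high_income"] = (complete["income"] > cutoff).astype(int)

print(wide)
print("Missing income values:", n_missing)
print(complete)
```

Dropping incomplete rows is only one way to account for missing values. Whether it is the right one depends on your analysis, so count the missings first and then decide.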
If you enjoyed this post, there is more to come. -Amy