When you begin a data analysis project, you typically start by analyzing each variable independently to describe the data you have and assess its quality. The next step is to explore the relationships among the variables. These relationships might lead you to draw inferences or conclusions about the population the data represents. The conclusions might in turn lead to a mathematical model that predicts results for data that is not currently in your data set. No data analysis, however, is effective until it leads to a decision or action.
The simplest form of data analysis is descriptive analysis. Descriptive analysis lists and summarizes the values of each variable in a data set. For example, if survey respondents provided a rating from 1 to 10 for a particular question, a descriptive analysis might show the number and percentage of respondents for each rating, measures of central tendency such as the average, the median and the mode (the most common rating), and some measure of spread such as the standard deviation. Descriptive analysis helps you become familiar with a data set and identify problems with the data, such as respondents who didn't provide any rating at all or records that show an out-of-range response of "99."
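As a rough illustration of the summary described above, the following sketch uses only Python's standard library. The response values, the `describe` function, and its `low` and `high` parameters are hypothetical and made up for this example; treating `None` as a skipped question and anything outside 1 to 10 as invalid is an assumption, not a rule from any particular survey tool.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Hypothetical survey responses on a 1-10 scale. None marks a skipped
# question and 99 is an out-of-range value that needs investigating.
responses = [7, 8, 8, 5, None, 9, 99, 8, 6, 7]


def describe(values, low=1, high=10):
    """Summarize one rating variable and flag basic quality problems."""
    missing = sum(1 for v in values if v is None)
    out_of_range = [v for v in values if v is not None and not low <= v <= high]
    valid = [v for v in values if v is not None and low <= v <= high]
    counts = dict(sorted(Counter(valid).items()))
    return {
        "n_valid": len(valid),
        "missing": missing,
        "out_of_range": out_of_range,
        "counts": counts,
        # Percentages are based on valid responses only.
        "percent": {k: round(100 * c / len(valid), 1) for k, c in counts.items()},
        "mean": mean(valid),
        "median": median(valid),
        "mode": mode(valid),
        "stdev": stdev(valid),  # sample standard deviation: a measure of spread
    }


summary = describe(responses)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Separating the missing and out-of-range values before computing the statistics matters here: leaving the 99 in would pull the average well above any rating a respondent could actually give.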
Once you understand the data you have, the next step is to start looking for relationships among data elements. This is called exploratory data analysis, and it typically focuses on correlations among variables. For example, one data set shows an extremely high correlation between the number of cavities a child has and the size of her vocabulary. This does not mean that letting your child get more cavities will make her vocabulary grow. Correlation does not establish causation: another factor, such as age, might be driving both results, and that factor might not even be in your data set.
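The cavities example can be sketched with a small synthetic data set. Everything below is invented for illustration: the ages, the formulas that generate cavities and vocabulary from age, and the `pearson` helper are assumptions, not real measurements. The data is built so that both variables depend on age and are otherwise unrelated.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Synthetic children: (age, cavities, vocabulary). Both cavities and
# vocabulary rise with age; within one age the offsets are unrelated.
offsets = [(0, -100), (0, 100), (1, -100), (1, 100)]
children = [
    (age, age // 2 + cavity_offset, 1000 * age + vocab_offset)
    for age in range(3, 11)
    for cavity_offset, vocab_offset in offsets
]

# Correlation across all children, ignoring age.
overall_r = pearson([c[1] for c in children], [c[2] for c in children])

# Correlation among children of the same age (here, age 6).
same_age = [c for c in children if c[0] == 6]
within_age_r = pearson([c[1] for c in same_age], [c[2] for c in same_age])

print(f"all ages: r = {overall_r:.2f}")
print(f"age 6 only: r = {within_age_r:.2f}")
```

Across all ages the correlation is strong, but among children of the same age it disappears, which is what you would expect if age drives both variables. In a real data set you could only run this check if age had been recorded.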
To develop the Consumer Confidence Index, the Conference Board doesn't ask every consumer about their confidence in the economy. It uses inferential analysis to draw conclusions about U.S. consumers based on data from a smaller sample of the population. It's important to understand the sampling method used in inferential analysis because you can often draw very different conclusions from the same data set by selecting different samples. Like many inferential analyses, the Consumer Confidence Index relies on a random sample so that the result is approximately the same, regardless of which particular sample is drawn.
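To see why the sampling method matters, consider the following sketch. It does not reproduce the Conference Board's methodology; the population, its 60 percent share of optimistic consumers, the sample size, and the function names are all hypothetical. It compares several simple random samples with a convenience sample that takes whichever records happen to come first.

```python
import random

# Hypothetical population of 100,000 consumers: 1 = optimistic, 0 = not.
# The records are stored with all optimistic consumers first, so the
# order of the data is itself a source of bias.
population = [1] * 60_000 + [0] * 40_000
true_share = sum(population) / len(population)


def share_optimistic(sample):
    """Estimate the share of optimistic consumers from a sample."""
    return sum(sample) / len(sample)


sample_size = 2_000

# Five independent simple random samples, seeded for repeatability.
random_estimates = []
for seed in range(5):
    rng = random.Random(seed)
    random_estimates.append(share_optimistic(rng.sample(population, sample_size)))

# A convenience sample: just the first records in the file.
convenience_estimate = share_optimistic(population[:sample_size])

print(f"true share: {true_share:.3f}")
print("random samples:", [f"{e:.3f}" for e in random_estimates])
print(f"convenience sample: {convenience_estimate:.3f}")
```

The random samples all land close to the true share and close to one another, while the convenience sample reports that every consumer is optimistic. Same data set, very different conclusion.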
Predictive analytics is very popular in business intelligence applications. The objective is to use data you have to predict an unknown outcome, and then to take action based on that prediction. For example, insurance companies use data such as gender, age, marital status and credit score to predict which customers are most likely to have an accident. Then they increase insurance rates for customers who fall into the high-risk groups. Analysts develop predictive models by training the model on a portion of the data set where the outcome is known, and then applying the model to the remaining data where the outcome is unknown.
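The train-then-apply workflow can be sketched with a deliberately simple model. This is not how an insurer would actually price risk: the records, the age bands, the 0.5 threshold, and the `train` and `predict` functions are hypothetical, and the "model" is just the observed accident rate per age band rather than a real statistical or machine learning method.

```python
from collections import defaultdict

# Hypothetical records where the outcome is known: (age_band, had_accident).
known = [
    ("16-25", True), ("16-25", True), ("16-25", False),
    ("26-60", False), ("26-60", False), ("26-60", True), ("26-60", False),
    ("61+", False), ("61+", True), ("61+", False),
]

# Hypothetical customers whose outcome is not yet known.
unknown = ["16-25", "26-60", "61+"]


def train(records):
    """Learn the observed accident rate for each age band."""
    totals = defaultdict(int)
    accidents = defaultdict(int)
    for band, had_accident in records:
        totals[band] += 1
        accidents[band] += int(had_accident)
    return {band: accidents[band] / totals[band] for band in totals}


def predict(model, band, threshold=0.5):
    """Flag a customer as high risk if the band's learned rate meets the threshold."""
    return model[band] >= threshold


model = train(known)
predictions = [predict(model, band) for band in unknown]

for band, high_risk in zip(unknown, predictions):
    print(f"{band}: rate={model[band]:.2f}, high_risk={high_risk}")
```

In practice an analyst would also hold back some of the known-outcome records to check how well the model predicts before applying it to customers whose outcome is unknown.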