A distribution is symmetric if one side of the histogram is a mirror image of the other side.

Histograms are one of the most common graphs used to display numeric data. Anyone who takes a statistics course is likely to learn about the histogram, and for good reason: histograms are easy to understand and can instantly tell you a lot about your data.

Here are three of the most important things you can learn by looking at a histogram. 

Shape—Mirror, Mirror, On the Wall…

If the left side of a histogram resembles a mirror image of the right side, then the data are said to be symmetric. In this case, the mean (or average) is a good approximation for the center of the data, and we can reasonably use statistical tools based on the mean, such as t-tests, to analyze the data.

If the data are not symmetric, they are often left-skewed or right-skewed. For skewed data, the mean may not be a good estimate of the center or of where most of the data fall. In this case, consider using the median rather than the mean to describe the center of the data.

Did you know...

If the data are left-skewed, then the mean is typically LESS THAN the median.    

If the data are right-skewed, then the mean is typically GREATER THAN the median.
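A quick way to see this rule in action is to compare the mean and median of small data sets. The two lists below are hypothetical, made up only so that one has a long right tail and the other a long left tail; Python's standard `statistics` module does the arithmetic.

```python
from statistics import mean, median

# Hypothetical data, invented for illustration only.
right_skewed = [40, 42, 45, 47, 50, 52, 55, 60, 150]  # one large value stretches the right tail
left_skewed = [10, 60, 65, 68, 70, 72, 74, 75, 78]    # one small value stretches the left tail

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    m, md = mean(data), median(data)
    if m > md:
        relation = "greater than"
    elif m < md:
        relation = "less than"
    else:
        relation = "equal to"
    print(f"{name}: mean = {m:.2f}, median = {md}, so the mean is {relation} the median")
```

The single extreme value pulls the mean toward the long tail while barely moving the median. The rule says "typically" for a reason: it is a tendency, not a guarantee for every skewed data set.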


Span—A Little or a Lot?

Suppose you have a data set that contains the salaries of people who work at your organization. It would be interesting to know where the minimum and maximum values fall, and where you are relative to those values. Because histograms use bins to display data (each bin represents a range of values), you can’t see the exact minimum and maximum values the way you can on an individual value plot. However, you can still approximate the range and see how spread out the data are. And you can answer questions such as "Is there a little bit of variability in my organization's salaries, or a lot?"
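Because a histogram records only how many values fall in each bin, the most it can tell you about the range is which bins are occupied. The sketch below assumes a hypothetical salary histogram stored as bin edges plus counts (in thousands of dollars); `approximate_range` is a made-up helper name, and its result brackets the data rather than giving the exact minimum and maximum.

```python
def approximate_range(edges, counts):
    """Return (low, high) bounds that bracket the data's minimum and maximum.

    edges has one more entry than counts: bin i covers edges[i] up to edges[i + 1].
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    occupied = [i for i, c in enumerate(counts) if c > 0]
    if not occupied:
        raise ValueError("histogram has no observations")
    first, last = occupied[0], occupied[-1]
    # The minimum lies somewhere in the first occupied bin, the maximum in the last one.
    return edges[first], edges[last + 1]


# Hypothetical salary histogram, in thousands of dollars (illustrative only).
edges = [20, 40, 60, 80, 100, 120, 140]
counts = [0, 12, 25, 9, 3, 1]

low, high = approximate_range(edges, counts)
print(f"Salaries lie between about {low} and {high} (thousands), so the range is at most {high - low}.")
```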

Outliers (and the ozone layer)

Outliers are extremely low or high values that do not fall near any other data points. Sometimes outliers represent unusual cases. Other times they represent data entry errors, or data that do not belong with the other data of interest. Whatever the case, outliers are easy to spot on a histogram and should be investigated, because they can reveal interesting information about your data.
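On a histogram, a possible outlier usually shows up as a lone bar separated from the rest by empty bins. The sketch below mimics that visual check; `isolated_values` is a hypothetical helper, the readings are invented, and this gap-based heuristic is not an objective outlier rule (its answer depends on the bin width you pick).

```python
from collections import Counter


def isolated_values(data, width):
    """Return values whose bin (of the given width) has no occupied neighbouring bin.

    A rough stand-in for spotting a lone histogram bar surrounded by gaps.
    Heuristic only: the result depends on the chosen bin width.
    """
    bins = Counter(int(x // width) for x in data)
    flagged = []
    for x in data:
        b = int(x // width)
        # Counter returns 0 for bins that were never filled, without adding them.
        if bins[b - 1] == 0 and bins[b + 1] == 0:
            flagged.append(x)
    return flagged


# Hypothetical measurements (illustrative only).
readings = [12.1, 12.4, 12.8, 13.0, 13.3, 13.5, 13.9, 14.2, 14.6, 21.7]
print(isolated_values(readings, width=1))
```

A flagged value is a lead to investigate, not a row to delete automatically; the ozone story in this section shows what can go wrong when unusual readings are discarded without a look.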


Rewind to the mid-1980s, when scientists reported declining ozone levels above Antarctica. NASA's Goddard Space Flight Center had been studying atmospheric ozone levels but, surprisingly, had not discovered the problem. Why? The analysis it used automatically discarded any readings below 180 Dobson units, because ozone levels that low were thought to be impossible.

Inspecting Distributions

Making a statistical graph is not an end in itself. After all, a computer or graphing calculator can make graphs faster than we can. The purpose of the graph is to help us understand the data. After you (or your calculator) make a graph, always ask, "What do I see?" Here is a general tactic for looking at graphs:

Look for an overall pattern and also for striking deviations from that pattern.

OVERALL PATTERN OF A DISTRIBUTION

To describe the overall pattern of a distribution:

Give the center and the spread.

See if the distribution has a simple shape that you can describe in a few words.

Figure 1.9

Section 6 will tell us in detail how to measure center and spread. For now, describe the center by finding a value that divides the observations so that about half take larger values and about half take smaller values. In Figure 1.9, the center is 1. That is, a typical team scored about 1 goal in its playoff soccer game. You can describe the spread by giving the smallest and largest values. The spread in Figure 1.9 is from 0 goals to 7 goals scored.

The dotplot in Figure 1.9 shows that in most of the playoff games, Division V soccer teams scored very few goals. Only four teams scored 4 or more goals. We can say that the distribution has a "long tail" to the right, or that its shape is "skewed right." You will learn more about describing shape shortly.

Is the one team that scored 7 goals an outlier? This value certainly differs from the overall pattern. To some extent, deciding whether an observation is an outlier is a matter of judgment. We will introduce an objective criterion for determining outliers in Section 6. Once you have spotted outliers, look for an explanation. Many outliers are due to mistakes, such as typing 4.0 as 40. Other outliers point to the special nature of some observations. Explaining outliers usually requires some background information. Perhaps the soccer team that scored 7 goals has some very talented offensive players. Or maybe their opponents played poor defense.

Sometimes the values of a variable are too spread out for us to make a reasonable dotplot.
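The "half above, half below" center and the smallest-to-largest spread are easy to compute. The goal counts below are hypothetical, invented only to resemble the description of Figure 1.9 (center 1, spread 0 to 7, four teams with 4 or more goals); they are not the data behind that figure. The loop prints a rough text dotplot.

```python
from collections import Counter
from statistics import median

# Hypothetical goal counts for 22 teams (illustrative only, not the Figure 1.9 data).
goals = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 7]

counts = Counter(goals)
# Text dotplot: one row per possible value, one "o" per team.
for value in range(min(goals), max(goals) + 1):
    print(f"{value:>2} | {'o ' * counts[value]}")

print("center (median):", median(goals))
print("spread: from", min(goals), "to", max(goals), "goals")
print("teams with 4 or more goals:", sum(1 for g in goals if g >= 4))
```

With an even number of teams, `statistics.median` averages the two middle values; here both are 1.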

OUTLIERS

An outlier in any graph of data is an individual observation that falls outside the overall pattern of the graph.

Let's revisit the histogram of the presidential inauguration ages.


Here is a good interpretation of the graph.

Center:

It appears that the typical age of a new president is about 55 years, because 55 is near the center of the histogram.

Spread:

As the histogram shows, there is a good deal of variation in the ages at which presidents take office. Teddy Roosevelt was the youngest, at age 42, and Ronald Reagan, at age 69, was the oldest.

Shape:

The distribution is roughly symmetric and has a single peak (unimodal).

Outliers:

There appear to be no outliers.
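The "about 55" center can also be checked against the frequency table given later in this section. With 43 presidents, the middle observation is the 22nd in order of age; this sketch (the helper name `median_class` is our own) walks the running totals to find the class that contains it.

```python
# Class counts of inauguration ages for 43 presidents, from the table in this section.
classes = ["40-44", "45-49", "50-54", "55-59", "60-64", "65-69"]
freqs = [2, 6, 13, 12, 7, 3]


def median_class(classes, freqs):
    """Return the class holding the ((n + 1) / 2)-th ordered observation.

    For an even n this is the class of the upper of the two middle values.
    """
    n = sum(freqs)
    middle = (n + 1) / 2          # 22nd observation when n = 43
    running = 0
    for label, f in zip(classes, freqs):
        running += f
        if running >= middle:
            return label
    raise ValueError("empty frequency table")


print(median_class(classes, freqs))
```

The running total reaches 21 at the end of the 50-54 class, so the 22nd president falls at the start of 55-59, which agrees with a center of about 55.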

More about shape

When you describe a distribution, concentrate on the main features. Look for major peaks, not for minor ups and downs in the bars of the histogram. Look for clear outliers, not just for the smallest and largest observations. Look for rough symmetry or clear skewness.

In mathematics, symmetry means that the two sides of a figure like a histogram are exact mirror images of each other. Data are almost never exactly symmetric, so we are willing to call the presidential inauguration ages histogram approximately symmetric as an overall description.

Here are more examples.

SYMMETRIC AND SKEWED DISTRIBUTIONS

A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

Symmetric

A distribution is skewed to the right if the right side of the histogram (containing the half of the observations with larger values) extends much farther out than the left side. This type of distribution is also called positively skewed.

Skewed right

It is skewed to the left if the left side of the histogram extends much farther out than the right side. This type of distribution is also called negatively skewed.

Skewed left

Remember these basic shapes as they will appear throughout the course.
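The definitions above compare how far each side of the distribution extends. A rough sketch of that idea measures the reach of each tail from the median; the function name and the cutoff `ratio=2.0` (our stand-in for "much farther") are assumptions, not part of the definitions.

```python
from statistics import median


def describe_shape(data, ratio=2.0):
    """Classify a distribution by how far each tail reaches from the median.

    Skewed right if the right side extends much farther out than the left,
    skewed left if the reverse. `ratio` is an arbitrary cutoff for "much farther".
    """
    md = median(data)
    right_reach = max(data) - md
    left_reach = md - min(data)
    if right_reach > ratio * left_reach:
        return "skewed right"
    if left_reach > ratio * right_reach:
        return "skewed left"
    return "roughly symmetric"


# Hypothetical examples (illustrative only).
print(describe_shape([40, 42, 45, 47, 50, 52, 55, 60, 150]))
print(describe_shape([10, 60, 65, 68, 70, 72, 74, 75, 78]))
print(describe_shape([1, 2, 3, 4, 5]))
```

Because it looks only at the extremes, a single outlier can decide the answer, which is exactly why the text advises looking at the whole histogram rather than just the smallest and largest values.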

Relative frequency, cumulative frequency, percentiles, and ogives

Sometimes we are interested in describing the relative position of an individual within a distribution. You may have received a standardized test score report that said you were in the 80th percentile. What does this mean? Put simply, 80% of the people who took the test earned scores that were less than or equal to your score. The other 20% of students taking the test earned higher scores than you did.

PERCENTILE

The pth percentile of a distribution is the value such that p percent of the observations fall at or below it.
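The definition translates directly into code: count the observations at or below a value and divide by the total. The scores below are hypothetical, and `percentile_rank` is an illustrative name.

```python
def percentile_rank(scores, your_score):
    """Percent of observations that fall at or below your_score."""
    at_or_below = sum(1 for s in scores if s <= your_score)
    return 100 * at_or_below / len(scores)


# Hypothetical test scores for 10 students (illustrative only).
scores = [55, 61, 64, 70, 72, 75, 78, 81, 88, 93]
print(percentile_rank(scores, 81))  # 8 of the 10 scores are at or below 81
```

Some books and software define percentiles slightly differently (for example, counting only scores strictly below); this sketch follows the "at or below" definition used here.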

A histogram does a good job of displaying the distribution of values of a variable. But it tells us little about the relative standing of an individual observation. If we want this type of information, we should construct a relative cumulative frequency graph, often called an ogive (pronounced O-JIVE).

Recall the histogram of the ages of U.S. presidents when they were inaugurated. Now we will examine where some specific presidents fall within the age distribution.

How to construct an ogive (relative cumulative frequency graph):

Step 1:

Decide on class intervals and make a frequency table, just as in making a histogram. Add three columns to your frequency table: relative frequency, cumulative frequency, and relative cumulative frequency.

To get the values in the relative frequency column, divide the count in each class interval by 43, the total number of presidents. Multiply by 100 to convert to a percentage.

To fill in the cumulative frequency column, add the counts in the frequency column that fall in or below the current class interval.

For the relative cumulative frequency column, divide the entries in the cumulative frequency column by 43, the total number of individuals.

Here is the frequency table from the presidential inauguration ages with the relative frequency, cumulative frequency, and relative cumulative frequency columns added.

Class    Frequency   Relative frequency   Cumulative frequency   Relative cumulative frequency
40-44    2           2/43 = 0.047         2                      2/43 = 0.047
45-49    6           6/43 = 0.140         8                      8/43 = 0.186
50-54    13          13/43 = 0.302        21                     21/43 = 0.488
55-59    12          12/43 = 0.279        33                     33/43 = 0.767
60-64    7           7/43 = 0.163         40                     40/43 = 0.930
65-69    3           3/43 = 0.070         43                     43/43 = 1.000
Total    43
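The arithmetic in Step 1 is easy to automate. This sketch recomputes the three added columns from the class counts in the table; the variable names are our own, and the columns are kept as proportions like the table (multiply by 100 for percentages).

```python
from itertools import accumulate

# Class counts from the presidential inauguration-age table.
classes = ["40-44", "45-49", "50-54", "55-59", "60-64", "65-69"]
freqs = [2, 6, 13, 12, 7, 3]
n = sum(freqs)  # 43 presidents

cum_freqs = list(accumulate(freqs))          # running totals: 2, 8, 21, ...
rel_freqs = [f / n for f in freqs]           # each count divided by 43
rel_cum_freqs = [c / n for c in cum_freqs]   # each running total divided by 43

print(f"{'Class':<7}{'Freq':>6}{'Rel freq':>10}{'Cum freq':>10}{'Rel cum freq':>14}")
for label, f, rf, cf, rcf in zip(classes, freqs, rel_freqs, cum_freqs, rel_cum_freqs):
    print(f"{label:<7}{f:>6}{rf:>10.3f}{cf:>10}{rcf:>14.3f}")
```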

Step 2:

Label and scale your axes and title your graph. Label the horizontal axis "Age at inauguration" and the vertical axis "Relative cumulative frequency." Scale the horizontal axis according to your choice of class intervals and the vertical axis from 0% to 100%.

Step 3:

Plot a point corresponding to the relative cumulative frequency in each class interval at the left endpoint of the next class interval. For example, for the 40-44 interval, plot a point at a height of 4.7% above the age value of 45. This means that 4.7% of presidents were inaugurated before they were 45 years old. Begin your ogive with a point at a height of 0% at the left endpoint of the lowest class interval. Connect consecutive points with a line segment to form the ogive. The last point you plot should be at a height of 100%. The complete ogive is plotted below.
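The points described in Step 3 can be generated from the class counts. This sketch pairs each class boundary with a relative cumulative frequency, starting at 0% over age 40 and ending at 100% over age 70 (the left endpoint of the class that would follow 65-69).

```python
from itertools import accumulate

# Left endpoints of the classes 40-44, ..., 65-69, plus 70 for the next (empty) class.
boundaries = [40, 45, 50, 55, 60, 65, 70]
freqs = [2, 6, 13, 12, 7, 3]
n = sum(freqs)

# 0% over the lowest boundary, then each class's relative cumulative
# frequency over the left endpoint of the next class.
heights = [0.0] + [c / n for c in accumulate(freqs)]
ogive = list(zip(boundaries, heights))

for age, h in ogive:
    print(f"age {age}: {100 * h:5.1f}%")
```

Passing these points to any line-plotting routine (for example, matplotlib's `plt.plot`) connects them with straight segments to form the ogive.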


How to locate an individual within the distribution:

What about Bill Clinton? He was age 46 when he took office. To find his relative standing, draw a vertical line up from his age (46) on the horizontal axis until it meets the ogive. Then draw a horizontal line from this point of intersection to the vertical axis. We would estimate that Bill Clinton's age places him at the 10% relative cumulative frequency mark. That tells us that about 10% of all U.S. presidents were the same age as or younger than Bill Clinton when they were inaugurated. Put another way, President Clinton was younger than about 90% of all U.S. presidents based on his inauguration age. His age places him at the 10th percentile of the distribution.
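Reading the graph by eye is the same as linearly interpolating between the plotted ogive points. This sketch (the function name `ogive_height` is our own) does that for age 46. Interpolation assumes ages are spread evenly within each class, so it gives about 7.4%, a little below the roughly 10% read off the graph; both are estimates, and the exact figure would require the individual ages.

```python
from bisect import bisect_right
from itertools import accumulate

boundaries = [40, 45, 50, 55, 60, 65, 70]
freqs = [2, 6, 13, 12, 7, 3]
heights = [0.0] + [c / sum(freqs) for c in accumulate(freqs)]


def ogive_height(x):
    """Relative cumulative frequency at x, interpolated between ogive points."""
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return 1.0
    i = bisect_right(boundaries, x) - 1  # x lies on segment boundaries[i]..boundaries[i + 1]
    x0, x1 = boundaries[i], boundaries[i + 1]
    y0, y1 = heights[i], heights[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


print(f"Age 46: relative cumulative frequency of about {100 * ogive_height(46):.1f}%")
```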

How to locate a value corresponding to a percentile:

What inauguration age corresponds to the 60th percentile? To answer this question, draw a horizontal line across from the vertical axis at a height of 60% until it meets the ogive. From the point of intersection, draw a vertical line down to the horizontal axis.

Find the center of the distribution.

Since we use the value that has half of the observations above it and half below it as our estimate of center, we simply need to find the 50th percentile of the distribution. Estimating as for the previous question, confirm that 55 is the center.
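Going from a percentile to an age runs the ogive backward: find the segment that crosses the target height and interpolate along it. In this sketch `age_at_percentile` is an illustrative name, and the same even-spread-within-class assumption applies. It gives about 57 for the 60th percentile and about 55.2 for the 50th, consistent with a center of 55.

```python
from bisect import bisect_left
from itertools import accumulate

boundaries = [40, 45, 50, 55, 60, 65, 70]
freqs = [2, 6, 13, 12, 7, 3]
heights = [0.0] + [c / sum(freqs) for c in accumulate(freqs)]


def age_at_percentile(p):
    """Invert the ogive: the age whose relative cumulative frequency is p percent."""
    target = p / 100
    if not 0 <= target <= 1:
        raise ValueError("p must be between 0 and 100")
    i = bisect_left(heights, target)  # first plotted point at or above the target
    if i == 0:
        return float(boundaries[0])
    x0, x1 = boundaries[i - 1], boundaries[i]
    y0, y1 = heights[i - 1], heights[i]
    return x0 + (x1 - x0) * (target - y0) / (y1 - y0)


print(f"60th percentile: about {age_at_percentile(60):.1f}")
print(f"50th percentile (center): about {age_at_percentile(50):.1f}")
```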


Practice Problem:

Here is an ogive of the amount spent by grocery shoppers.


(a)

Estimate the center of this distribution. Explain your method.

(b)

At what percentile would the shopper who spent $17.00 fall?

(c)

Draw the histogram that corresponds to the ogive.

Answers:

a. To find the center of the distribution, I would go to 50 on the y-axis (Relative Cumulative Frequency), since the 50th percentile is the center, and draw a horizontal line until it meets the ogive. From that point I would draw a vertical line down to the x-axis (Amount Spent ($)). The estimate at this point is $27.

b. 35th percentile


c.


What is the distribution of a histogram?

A histogram shows the distribution of the data so you can assess its central tendency, variability, and shape. A histogram for a quantitative variable divides the range of the values into classes (bins) and counts the number of observations that fall into each class interval.
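The binning step can be sketched in a few lines: assign each value to a class of fixed width, count, and print one bar per class. The data, starting point, and class width below are hypothetical choices, and `frequency_table` is an illustrative name.

```python
def frequency_table(data, start, width):
    """Count observations in classes [start, start + width), [start + width, start + 2*width), ..."""
    counts = {}
    for x in data:
        k = int((x - start) // width)  # index of the class that contains x
        if k < 0:
            raise ValueError(f"{x} is below the first class")
        counts[k] = counts.get(k, 0) + 1
    top = max(counts) if counts else -1
    # Include empty classes so gaps in the data show up as empty bars.
    return [(start + k * width, start + (k + 1) * width, counts.get(k, 0)) for k in range(top + 1)]


# Hypothetical values (illustrative only).
data = [42, 47, 48, 51, 51, 54, 55, 57, 61, 64, 68]
table = frequency_table(data, start=40, width=5)
for lo, hi, c in table:
    print(f"[{lo}, {hi}): {'#' * c}")
```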

When the left side of a distribution is a mirror image of the right side, what characteristic does the distribution have?

It is symmetric: in a symmetric distribution, the two sides are mirror images of each other. The normal distribution is a classic example of a perfectly symmetric distribution; distributions of observed data are at best approximately symmetric.

Can you tell the distribution from a histogram?

A frequency distribution shows how often each different value in a set of data occurs. A histogram is the most commonly used graph to show frequency distributions.

What is it called when the right half of a histogram is a mirror image of the left?

A histogram is symmetric if its right half is a mirror image of its left half. Very few histograms are perfectly symmetric, but many are approximately symmetric.