Value that describes a sample, usually derived from measurements of the individuals in the sample

Definitions

Statistics: Collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions.
Variable: Characteristic or attribute that can assume different values.
Random Variable: A variable whose values are determined by chance.
Population: All subjects possessing a common characteristic that is being studied.
Sample: A subgroup or subset of the population.
Parameter: Characteristic or measure obtained from a population.
Statistic (not to be confused with Statistics): Characteristic or measure obtained from a sample.
Descriptive Statistics: Collection, organization, summarization, and presentation of data.
Inferential Statistics: Generalizing from samples to populations using probabilities. Performing hypothesis testing, determining relationships between variables, and making predictions.
Qualitative Variables: Variables which assume non-numerical values.
Quantitative Variables: Variables which assume numerical values.
Discrete Variables: Variables which assume a finite or countable number of possible values. Usually obtained by counting.
Continuous Variables: Variables which assume an infinite number of possible values. Usually obtained by measurement.
Nominal Level: Level of measurement which classifies data into mutually exclusive, all-inclusive categories in which no order or ranking can be imposed on the data.
Ordinal Level: Level of measurement which classifies data into categories that can be ranked. Precise differences between the ranks do not exist.
Interval Level: Level of measurement which classifies data that can be ranked and differences are meaningful. However, there is no meaningful zero, so ratios are meaningless.
Ratio Level: Level of measurement which classifies data that can be ranked, differences are meaningful, and there is a true zero. True ratios exist between the different units of measure.
Random Sampling: Sampling in which the data is collected using chance methods or random numbers.
Systematic Sampling: Sampling in which data is obtained by selecting every kth object.
Convenience Sampling: Sampling in which data that is readily available is used.
Stratified Sampling: Sampling in which the population is divided into groups (called strata) according to some characteristic. Each of these strata is then sampled using one of the other sampling techniques.
Cluster Sampling: Sampling in which the population is divided into groups (usually geographically). Some of these groups are randomly selected, and then all of the elements in those groups are selected.
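
As a rough illustration of the sampling definitions above, the following Python sketch draws each kind of sample from a made-up population of 100 numbered subjects. The function names, the two strata, and the ten clusters are hypothetical and chosen only for this example.

```python
import random

def simple_random_sample(population, k, rng):
    """Random sampling: subjects are chosen by chance, without repeats."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Systematic sampling: random starting point, then every step-th object."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, k_per_stratum, rng):
    """Stratified sampling: draw a random sample from every stratum."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Cluster sampling: pick whole groups at random and keep all their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)                       # fixed seed so the run is repeatable
population = list(range(1, 101))             # hypothetical subject IDs 1..100
strata = {"low": population[:50], "high": population[50:]}
clusters = {f"group{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10, rng)
sys_s = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 2, rng)
print(srs, sys_s, strat, clus, sep="\n")
```

Note that the stratified sample takes a few subjects from every group, while the cluster sample takes every subject from a few groups.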


Researchers’ use of statistics - the term statistics refers to a set of methods and rules for organizing, summarizing, and interpreting information.

Two basic kinds of statistics

-       Descriptive statistics are statistical procedures used to summarize, organize, and simplify data.

-       Inferential statistics consist of techniques that allow us to study samples and then make generalizations about the populations from which they were selected.

-       A population is the set of all individuals of interest in a particular study.

-       A parameter is a value, usually a numerical value, that describes a population.  A parameter may be obtained from a single measurement, or it may be derived from a set of measurements from the population.

-       A sample is a set of individuals selected from a population, usually intended to represent the population in a study.

-       A statistic is a value, usually a numerical value, that describes a sample.  A statistic may be obtained from a single measurement, or it may be derived from a set of measurements from the sample.

Sampling error is the discrepancy, or amount of error, that exists between a sample statistic and the corresponding population parameter.

Three characteristics completely describe a distribution:  shape, central tendency, and variability.

Shape:   In a symmetrical distribution, it is possible to draw a vertical line through the middle so that one side of the distribution is an exact mirror image of the other.


In a skewed distribution, the scores tend to pile up toward one end of the scale and taper off gradually at the other end.

The section where the scores taper off towards one end of a distribution is called the tail of the distribution.

[Figure: a negatively skewed distribution (left) and a positively skewed distribution (right)]

A skewed distribution with the tail on the right-hand side is said to be positively skewed (because the tail points towards positive numbers).  If the tail points to the left, then the distribution is said to be negatively skewed.

Central tendency is a statistical measure that identifies a single score as representative of an entire distribution.  The goal of central tendency is to find the single score that is most typical or most representative of the entire group.

There are several measures of central tendency, but we’ll only focus on the mean.

The most commonly known measure of central tendency is the arithmetic average, or the mean (note: in everyday speech, the term average actually refers to all three measures of central tendency, for examples of this see gray box 3.4, pg 90).  We’ve already talked about how you would go about figuring this out from the data in a frequency distribution table.

The mean for a distribution is the sum of the scores divided by the number of scores. 

The formula for the population mean is:  $\mu = \frac{\sum X}{N}$

The formula for the sample mean is:  $\bar{X} = \frac{\sum X}{n}$
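
To make the arithmetic concrete, here is a minimal Python sketch of the mean, both from a plain list of scores and from a frequency distribution table. The scores, the table, and the function names are invented for illustration; the computation is the same whether the scores form a population (N) or a sample (n).

```python
def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

def mean_from_frequency_table(table):
    """table maps each score X to its frequency f; mean = sum(f * X) / sum(f)."""
    total = sum(x * f for x, f in table.items())
    count = sum(table.values())
    return total / count

scores = [3, 7, 4, 6, 10]                 # made-up scores
print(mean(scores))                       # 30 / 5 = 6.0

freq_table = {1: 1, 2: 3, 3: 4, 4: 2}     # made-up table: score -> frequency
print(mean_from_frequency_table(freq_table))  # 27 / 10 = 2.7
```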

Variability provides a quantitative measure of the degree to which scores in a distribution are spread out or clustered together.  In other words, variability refers to the degree of “differentness” of the scores in the distribution.  High variability means that the scores differ by a lot, while low variability means that the scores are all similar (“homogeneousness”).

There are several measures of variability, but we’ll concentrate on the standard deviation.

In essence, the standard deviation measures how far off all of the individuals in the distribution are from a standard, where that standard is the mean of the distribution.

            So to get a measure of the deviation we need to subtract the population mean from every individual in our distribution.

                                    $X - \mu$ = deviation score

            - if the score is a value above the mean the deviation score will be positive

            - if the score is a value below the mean the deviation score will be negative

Add up all the deviations and you get zero, because the positive and negative deviations cancel.  So what we have to do is get rid of the negative signs.  We do this by squaring the deviations, averaging the squared deviations, and then taking the square root of that average.

Sum of Squares = $SS = \sum (X - \mu)^2$

Population variance = $\sigma^2 = \frac{SS}{N}$

Population standard deviation = $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{SS}{N}}$
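
The steps above (deviation scores, sum of squares, variance, standard deviation) can be sketched in Python as follows. The five scores are made up, and the function name is only illustrative.

```python
import math

def population_sd(scores):
    """Return the deviation scores, SS, variance, and standard deviation
    of a complete population of scores."""
    N = len(scores)
    mu = sum(scores) / N                      # population mean
    deviations = [x - mu for x in scores]     # X - mu for every score
    ss = sum(d ** 2 for d in deviations)      # sum of squared deviations
    variance = ss / N                         # mean squared deviation
    return deviations, ss, variance, math.sqrt(variance)

scores = [1, 9, 5, 8, 7]                      # made-up population, mean = 6
deviations, ss, variance, sd = population_sd(scores)

print(deviations)       # [-5.0, 3.0, -1.0, 2.0, 1.0]
print(sum(deviations))  # 0.0, which is why the deviations must be squared
print(ss, variance)     # 40.0 8.0
print(round(sd, 3))     # 2.828
```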

The Standard Deviation of a Sample is nearly the same

            - the computations are pretty much the same here:

                        - different notation:

                                                 - s = sample standard deviation

                                                - use $\bar{X}$ instead of $\mu$ in the computation of SS

                        - need to adjust the computation to take into account that a sample will typically be less variable than the corresponding population.


- if you have a good, representative sample, then your sample and population means should be very similar, and the overall shape of the two distributions should be similar.  However, notice that the variability of the sample is smaller than the variability of the population.

- to account for this the sample variance is divided by n - 1 rather than just n

                                                sample variance = $s^2 = \frac{SS}{n - 1}$

                                    - and the same is true for sample standard deviation

                                                sample standard deviation = $s = \sqrt{\frac{SS}{n - 1}}$

So what we’re doing when we subtract 1 from n is using degrees of freedom (n - 1) to adjust our sample variability, so that the sample variance is an unbiased estimate of the population variance.
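
One way to see why dividing by n - 1 helps is to list every possible sample and average the results. The sketch below does this for a made-up population of five scores and samples of size n = 2 drawn with replacement; the population, the sample size, and the variable names are assumptions for illustration only.

```python
from itertools import product

population = [1, 9, 5, 8, 7]                  # made-up population
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

def sum_of_squares(sample):
    """SS computed around the sample's own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

n = 2
samples = list(product(population, repeat=n))  # all 25 samples, with replacement

# Average variance estimate over every possible sample, for each divisor.
avg_div_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)

print(sigma2)             # 8.0
print(avg_div_n)          # 4.0, too small on average
print(avg_div_n_minus_1)  # 8.0, matches the population variance on average
```

Dividing by n underestimates the population variance on average, while dividing by n - 1 hits it exactly when averaged over all possible samples.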

Recall that the goal of inferential statistics is to make claims about population parameters based on sample statistics.  So the logic will be something like this.  We can’t measure the whole population, so we take a sample.  Our best estimate for the mean of the population will be the mean of our sample.  (Remember that it is only an estimate because of sampling error, which we discussed before: the difference between a sample statistic and the corresponding population parameter.)  It sounds simple and straightforward, but consider the following:


Suppose that you take 3 different samples from the same population.  They are going to be different from one another.  They will have different shapes, different means, and different variability.  So how do you figure out what the best estimate of the population mean is?

How many possible samples can we take?  Effectively infinite (remember that we are sampling with replacement).  Luckily for us, the huge set of possible samples forms a simple, orderly, and predictable pattern (a sampling distribution).  Because of this, we are able to base our predictions about sample characteristics on the distribution of sample means.

The distribution of sample means is the collection of sample means for all the possible random samples of a particular size (n) that can be obtained from a population.

            mean:  the average of all of the sample means will equal the mean of the population.  The average of all of the sample means is called the expected value of $\bar{X}$.  It is “expected” because it should be a value near the population mean $\mu$.

            variability:  the standard deviation of the distribution of sample means is called the standard error of $\bar{X}$.

            standard error of $\bar{X}$ = $\sigma_{\bar{X}}$ = standard distance between $\bar{X}$ and $\mu$.

In other words, this statistic describes the standard (typical/average) distance from the mean.  In this case it is the distance between the sample mean $\bar{X}$ and the population mean $\mu$.  The major purpose of the standard error of $\bar{X}$ is that it tells us how well the sample mean estimates the population mean.  In other words, it tells us how big the sampling error is.

            the numerical value of the standard error is determined by two characteristics: the variability of the population and the size of the sample

                                    1)  the variability of the population - the bigger the variability of the population, the more variability you’ll have in the sample means.

                                    2)  the size of the sample - the larger your sample size (n), the more accurately the sample represents the population, so the smaller the standard error.

Central Limit Theorem:  For any population with mean $\mu$ and standard deviation $\sigma$, the distribution of sample means for sample size n will approach a normal distribution with a mean of $\mu$ and a standard deviation of $\frac{\sigma}{\sqrt{n}}$ as n approaches infinity.
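
A small simulation can make the Central Limit Theorem easier to believe. The sketch below draws many samples from a strongly skewed population (an exponential distribution, whose mean and standard deviation are both 1) and looks at the resulting sample means. The choice of population, the sample size of 25, the 5000 repetitions, and the seed are all assumptions made for illustration.

```python
import math
import random
import statistics

rng = random.Random(42)      # fixed seed so the run is repeatable
mu, sigma = 1.0, 1.0         # mean and sd of an exponential population with rate 1
n = 25                       # size of each sample
num_samples = 5000           # how many samples to draw

# Draw many samples and keep only the mean of each one.
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

expected_value = statistics.mean(sample_means)     # should be near mu
standard_error = statistics.pstdev(sample_means)   # should be near sigma / sqrt(n)
predicted_se = sigma / math.sqrt(n)                # 0.2

print(expected_value, standard_error, predicted_se)
```

Even though the individual scores are far from normal, a histogram of `sample_means` would look roughly bell-shaped, centered near the population mean, with a spread close to the predicted standard error.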

            Hypothesis testing is an inferential procedure that uses sample data to evaluate the credibility of a hypothesis about a population.

            step1:  Make a hypothesis and select a criterion for the decision

            step2:  Collect your data

                        - randomly select individuals from a population

                        - randomly assign selected individuals to specific treatment groups

            step3:  Compute a test statistic (more on this later in the lecture, and the course)

                                    - things like z-scores, t-tests, F-tests (ANOVA)

            step4:  Compare the test statistic to a distribution to make an inference about the parameter and hence draw a conclusion about the hypothesis.
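
The four steps can be sketched with the simplest test statistic, a z-score for a sample mean. This is only an illustration under stated assumptions: the population standard deviation is treated as known, the test is two-tailed, and the numbers (a hypothesized mean of 100, a standard deviation of 15, a sample of 25 with a mean of 106) and the function name are made up.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test of H0: population mean equals mu0 (sigma known)."""
    standard_error = sigma / math.sqrt(n)            # sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error         # step 3: test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # step 1: decision criterion
    reject_h0 = abs(z) > critical                    # step 4: compare and decide
    return z, critical, reject_h0

# Step 2 would be collecting the data; here the sample mean is simply given.
z, critical, reject_h0 = z_test(sample_mean=106, mu0=100, sigma=15, n=25)
print(z)                    # 2.0
print(round(critical, 2))   # 1.96
print(reject_h0)            # True
```

With the same setup, a sample mean of 104 gives z of about 1.33, which is inside the critical boundaries, so we would fail to reject H0.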

The decision criterion = the alpha level

                                         Actual situation
                                         H0 is correct          H0 is wrong
Experimenter’s    Reject H0              oops! Type I error     Yay! correct
Conclusions       Fail to reject H0      Yay! correct           oops! Type II error

The two kinds of error each have their own name, because they really are reflecting different things.

type I error ($\alpha$, alpha) - H0 is actually correct, but the experimenter rejected it

type II error ($\beta$, beta) - H0 is really wrong, but the experimenter failed to reject it

The courtroom/jury analogy

                                  Actual situation
                                  X is innocent          X is guilty
Jury’s          Guilty            oops! Type I error     Yay! correct
Conclusions     Not Guilty        Yay! correct           oops! Type II error

                        Type I error - sending an innocent person to jail

                        Type II error - letting a guilty person go free

In scientific research, we typically take a conservative approach, and set our criteria such that we try to minimize the chance of making a Type I error (concluding that there is an effect of something when there really isn’t).  In other words, scientists focus on setting an acceptable alpha level ($\alpha$), or level of significance.

            The alpha level ($\alpha$), or level of significance, is a probability value that defines the very unlikely sample outcomes when the null hypothesis is true.  Whenever an experiment produces very unlikely data (as defined by alpha), we will reject the null hypothesis.  Thus, the alpha level also defines the probability of a Type I error - that is, the probability of rejecting H0 when it is actually true.  Note: in psychology $\alpha$ is usually set at 0.05.
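
The link between alpha and the Type I error rate can be checked with a simulation in which H0 is true by construction. The sketch below repeats a two-tailed z-test many times on samples drawn from a normal population whose mean really is the hypothesized value, and counts how often H0 is (wrongly) rejected. The population values, the sample size, the number of experiments, and the seed are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(1)                       # fixed seed so the run is repeatable
mu0, sigma, n, alpha = 100, 15, 25, 0.05     # H0 is true: the population mean is mu0
critical = NormalDist().inv_cdf(1 - alpha / 2)
standard_error = sigma / math.sqrt(n)

def rejects_h0():
    """Run one experiment on a population where H0 is true; report the decision."""
    sample = [rng.gauss(mu0, sigma) for _ in range(n)]
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / standard_error
    return abs(z) > critical

experiments = 4000
type_i_rate = sum(rejects_h0() for _ in range(experiments)) / experiments
print(type_i_rate)   # should land close to alpha
```

Every rejection in this simulation is a Type I error, so the proportion of rejections should come out close to the alpha level that was chosen.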

Using tables and graphs to present your results.

What is a measurement from a sample called?

A statistic is a numerical measurement describing some characteristic of a sample.

What is a characteristic that describes the sample?

A statistic is a characteristic, usually numerical, that describes a sample.

What are the values used to summarize or describe a population called?

For example, tables or graphs are used to organize data, and descriptive values such as the average score are used to summarize data. A descriptive value for a population is called a parameter and a descriptive value for a sample is called a statistic.

What is a number that describes a population called?

A parameter is a number describing a whole population (e.g., population mean), while a statistic is a number describing a sample (e.g., sample mean).