Statistics and Probability
Statistics :
It is the branch of science in which we study the methods and principles of collecting, presenting, analyzing and interpreting numerical data related to any field of life.
Types of statistics :
There are two types of statistics:
1- Descriptive statistics
2- Inferential statistics
Descriptive statistics:
It consists of methods for organizing and summarizing information in a clear and effective way.
Inferential statistics :
It is the branch of statistics in which we use the information gained from sample observations to estimate the population parameters.
Observation :
A single numerical value related to any fact is called an observation.
Definition of data in statistics:
A collection of observations is called data.
Population:
The totality of all items or observations under study is called the population.
Its size is denoted by N.
Sample:
A small part of the population, that is, any subset of the units of the population, is called a sample. It is usually selected at random.
Its size is denoted by n.
Parameter:
The result which is obtained from all the units of the population is called a parameter. It is always a constant quantity.
Statistic:
The result which is obtained from the sample observations is called a statistic.
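As a minimal sketch of the difference between a parameter and a statistic, the Python snippet below treats a small made-up list as the whole population and draws one random sample from it. The data values, the seed and the names `population` and `sample` are illustrative assumptions only.

```python
import random
from statistics import mean

# Hypothetical population of N = 10 observations (made-up values).
population = [12, 15, 11, 18, 20, 14, 16, 13, 17, 19]
N = len(population)

# Parameter: computed from every unit of the population, so it is a fixed value.
population_mean = mean(population)

# Statistic: computed from a random sample of size n, so it can change
# from one sample to the next.
random.seed(1)  # fixed seed only so that the sketch is repeatable
n = 4
sample = random.sample(population, n)
sample_mean = mean(sample)

print("N =", N, "population mean =", population_mean)
print("n =", n, "sample =", sample, "sample mean =", sample_mean)
```

Running the sketch with a different seed changes the sample mean but never the population mean.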
Primary data:
The first collection of data by any person, without any arrangement, is called primary data. It is the most original or raw data. It is also called ungrouped data.
Secondary data:
When we arrange the primary data by any statistical rule, it becomes secondary data.
Qualitative data:
A variable which cannot be measured numerically is called qualitative data or a qualitative variable. For example, education, colour, honesty and grade.
Quantitative data:
A variable which can be measured numerically is called quantitative data or a quantitative variable. It is obtained either by counting or by measuring. For example, the height of a student and temperature.
Average :
A single value which represents the centre of the observations in a given set is called an average. It is also called a measure of central tendency.
Types of Average
- Arithmetic mean
- Median
- Mode
- Geometric mean
- Harmonic mean
Arithmetic Mean
The ratio of the sum of the observations to the number of observations is called the arithmetic mean. It is also called the average. It is denoted by x̄ (x bar), so x̄ = Σx / n.
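The definition can be sketched in a few lines of Python. The function name `arithmetic_mean` and the marks used are made up for illustration.

```python
def arithmetic_mean(observations):
    """Sum of the observations divided by the number of observations."""
    if not observations:
        raise ValueError("at least one observation is required")
    return sum(observations) / len(observations)


marks = [45, 50, 55, 60, 65]   # made-up data
print(arithmetic_mean(marks))  # 55.0
```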
Median
A value which divides the data into two equal parts, after arranging the values in ascending or descending order, is called the median.
For ungrouped data:
Median = value of ( (n+1) / 2 )th item
For grouped data:
Median = lb + (h/f) ( n/2 - c )
lb = lower class boundary of the median class
h = class interval (width) of the median class
f = frequency of the median class
c = cumulative frequency of the class preceding the median class
n = Σf
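The two median rules can be sketched in Python as follows. This is only an illustration: the function names, the class boundaries and the frequencies are made up, and for an even number of ungrouped observations the sketch assumes the usual convention of taking the value halfway between the two middle items.

```python
def median_ungrouped(observations):
    """Value of the ((n + 1) / 2)th item of the ordered data."""
    values = sorted(observations)
    n = len(values)
    if n == 0:
        raise ValueError("at least one observation is required")
    mid = n // 2
    if n % 2 == 1:
        return values[mid]                      # exact middle item
    return (values[mid - 1] + values[mid]) / 2  # halfway between the two middle items


def median_grouped(boundaries, frequencies):
    """Median = lb + (h / f) * (n / 2 - c) for a frequency distribution."""
    n = sum(frequencies)
    c = 0  # cumulative frequency of the classes before the current one
    for (lb, ub), f in zip(boundaries, frequencies):
        if f > 0 and c + f >= n / 2:  # first class whose cumulative frequency reaches n/2
            h = ub - lb               # class interval
            return lb + (h / f) * (n / 2 - c)
        c += f
    raise ValueError("the frequencies must contain at least one observation")


print(median_ungrouped([9, 3, 7, 1, 5]))  # 5
print(median_ungrouped([4, 1, 3, 2]))     # 2.5

boundaries = [(0, 10), (10, 20), (20, 30), (30, 40)]  # made-up class boundaries
frequencies = [5, 8, 12, 5]                           # n = 30, median class is 20-30
print(median_grouped(boundaries, frequencies))        # about 21.67
```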
Quartiles
The values which divide the data into four equal parts, after arranging the values in ascending order, are called quartiles.
They are denoted by Q1, Q2, Q3.
Q1 = lower quartile
Q2 = median (second quartile)
Q3 = upper quartile
For ungrouped data:
Q1 = value of ( (n+1) / 4 )th item
Q3 = value of ( 3(n+1) / 4 )th item
For grouped data:
Q1 = lb + (h/f) ( n/4 - c )
Q3 = lb + (h/f) ( 3n/4 - c )
lb = lower class boundary of the quartile class
h = class interval (width) of the quartile class
f = frequency of the quartile class
c = cumulative frequency of the class preceding the quartile class
n = Σf
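For ungrouped data, the "value of the kth item" rule can be sketched in Python as below. The helper name `item_at_position` and the data are made up. The notes do not say what to do when the position is not a whole number, so the sketch assumes linear interpolation between the two neighbouring items, which is one common convention.

```python
def item_at_position(sorted_values, position):
    """Value of the `position`-th item (1-based) of ordered data.

    Fractional positions are resolved by linear interpolation between
    the two neighbouring items (an assumed convention).
    """
    n = len(sorted_values)
    if not 1 <= position <= n:
        raise ValueError("position falls outside the data")
    whole = int(position)         # integer part of the position
    fraction = position - whole   # fractional part of the position
    value = sorted_values[whole - 1]
    if fraction > 0:
        value += fraction * (sorted_values[whole] - value)
    return value


data = sorted([7, 3, 9, 5, 11, 13, 1])        # made-up data, n = 7
n = len(data)
q1 = item_at_position(data, (n + 1) / 4)      # 2nd item
q2 = item_at_position(data, 2 * (n + 1) / 4)  # 4th item, the median
q3 = item_at_position(data, 3 * (n + 1) / 4)  # 6th item
print(q1, q2, q3)  # 3 7 11
```

The same helper gives deciles and percentiles by changing the position, for example `4 * (n + 1) / 10` for D4, provided the position lies between 1 and n.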
Deciles:
The values which divide the data into ten equal parts, after arranging the values in ascending order, are called deciles.
They are denoted by (D1, D2, D3, ..., D9).
For ungrouped data:
D4 = value of ( 4(n+1) / 10 )th item
For grouped data:
D4 = lb + (h/f) ( 4n/10 - c )
lb = lower class boundary of the decile class
h = class interval (width) of the decile class
f = frequency of the decile class
c = cumulative frequency of the class preceding the decile class
n = Σf
Percentiles:
The values which divide the data into hundred equal parts, after arranging the values in ascending order, are called percentiles.
They are denoted by (P1, P2, P3, ..., P99).
For ungrouped data:
P70 = value of ( 70(n+1) / 100 )th item
For grouped data:
P70 = lb + (h/f) ( 70n/100 - c )
lb = lower class boundary of the percentile class
h = class interval (width) of the percentile class
f = frequency of the percentile class
c = cumulative frequency of the class preceding the percentile class
n = Σf
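The grouped-data formulas for quartiles, deciles and percentiles all have the same shape, so they can be sketched with one Python function. The name `grouped_quantile`, its parameters `k` and `parts`, and the frequency distribution are assumptions made for illustration.

```python
def grouped_quantile(boundaries, frequencies, k, parts):
    """k-th value when the data are divided into `parts` equal parts.

    Uses lb + (h / f) * (k * n / parts - c), where c is the cumulative
    frequency of the classes before the class that contains the value.
    """
    n = sum(frequencies)
    target = k * n / parts
    c = 0
    for (lb, ub), f in zip(boundaries, frequencies):
        if f > 0 and c + f >= target:  # first class whose cumulative frequency reaches the target
            h = ub - lb                # class interval
            return lb + (h / f) * (target - c)
        c += f
    raise ValueError("value not found; check k, parts and the frequencies")


# Made-up frequency distribution with n = 40.
boundaries = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [4, 6, 10, 12, 8]

q1 = grouped_quantile(boundaries, frequencies, 1, 4)      # n/4 = 10, class 10-20
d4 = grouped_quantile(boundaries, frequencies, 4, 10)     # 4n/10 = 16, class 20-30
p70 = grouped_quantile(boundaries, frequencies, 70, 100)  # 70n/100 = 28, class 30-40
print(round(q1, 2), round(d4, 2), round(p70, 2))          # 20.0 26.0 36.67
```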
Mode:
The most frequently repeated value of the observations in a given set is called the mode.
It is denoted by X̂.
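As a small illustration, Python's standard `statistics.multimode` function (available from Python 3.8) returns the most repeated value or values. The data below are made up.

```python
from statistics import multimode

sizes = [6, 7, 7, 8, 8, 8, 9]  # made-up data
print(multimode(sizes))        # [8]

# When two values tie for the highest frequency, both are returned,
# so a data set can have more than one mode.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```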
Geometric Mean:
The positive nth root of the product of n observations is called the geometric mean.
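A minimal Python sketch of this definition is given below. The function name and data are made up, and the sketch assumes that every observation is positive, since otherwise the positive nth root may not exist.

```python
import math


def geometric_mean(observations):
    """Positive n-th root of the product of n positive observations."""
    if not observations or any(x <= 0 for x in observations):
        raise ValueError("all observations must be positive")
    n = len(observations)
    return math.prod(observations) ** (1 / n)


print(geometric_mean([2, 8]))  # 4.0, the square root of 2 * 8 = 16
```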
Harmonic Mean:
The ratio of the number of observations to the sum of the reciprocals of the observations is called the harmonic mean.
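The definition can be sketched in Python as follows. The function name and data are made up, and the sketch assumes positive observations so that every reciprocal exists.

```python
def harmonic_mean(observations):
    """Number of observations divided by the sum of their reciprocals."""
    if not observations or any(x <= 0 for x in observations):
        raise ValueError("all observations must be positive")
    return len(observations) / sum(1 / x for x in observations)


# Reciprocals: 1/2 + 1/4 + 1/4 = 1, so the harmonic mean is 3 / 1.
print(harmonic_mean([2, 4, 4]))  # 3.0
```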
Dispersion:
We need some measurements which tell us whether the dispersion is small or large and how the data are dispersed. Such measures are called measures of dispersion or measures of variation.
There are two types of measures of dispersion:
- Absolute measure of dispersion
- Relative measure of dispersion
Absolute Measure of Dispersion
This measure gives us an idea about the amount of dispersion in a set of observations. It gives an answer in the same unit as the original observations.
For example, when the observations are in kg, the absolute measure of dispersion is also in kg.
The most common types of absolute measures are:
- Range
- Quartile Deviation
- Mean Deviation
- Variance and Standard Deviation
Relative Measure of Dispersion:
These measures are calculated for the comparison of dispersion in two or more sets of observations. These measures are free of the unit in which the original data are measured.
The most common types of relative measures of dispersion are:
- Coefficient of Range
- Coefficient of Quartile Deviation
- Coefficient of Mean Deviation
- Coefficient of Variation
Range
The difference between the largest value and the smallest value of the observations in a given set is called the range.
It is denoted by R.
Coefficient of Range
It is a relative measure of dispersion which is based on the range.
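A short Python sketch of the range and its coefficient follows. The notes do not state the formula for the coefficient, so the sketch assumes the usual textbook form (largest - smallest) / (largest + smallest). The function names and weights are made up.

```python
def value_range(observations):
    """Largest observation minus smallest observation."""
    return max(observations) - min(observations)


def coefficient_of_range(observations):
    """(largest - smallest) / (largest + smallest); a unit-free ratio."""
    largest, smallest = max(observations), min(observations)
    return (largest - smallest) / (largest + smallest)


weights_kg = [52, 60, 48, 75, 65]        # made-up data in kg
print(value_range(weights_kg))           # 27, still in kg
print(coefficient_of_range(weights_kg))  # 27 / 123, no unit
```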
Quartile Deviation:
It is half of the difference between the upper quartile and the lower quartile, that is, Q.D. = (Q3 - Q1) / 2.
It is denoted by Q.D.
Coefficient of Quartile Deviation:
It is a relative measure of dispersion which is based on the quartile deviation.
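As a sketch, both quantities can be computed in Python once the lower and upper quartiles are known. The coefficient is assumed here to take the usual textbook form (Q3 - Q1) / (Q3 + Q1). The function names and quartile values are made up.

```python
def quartile_deviation(q1, q3):
    """Half of the difference between the upper and lower quartiles."""
    return (q3 - q1) / 2


def coefficient_of_quartile_deviation(q1, q3):
    """(Q3 - Q1) / (Q3 + Q1); a unit-free ratio."""
    return (q3 - q1) / (q3 + q1)


q1, q3 = 3, 11  # made-up quartiles
print(quartile_deviation(q1, q3))                 # 4.0
print(coefficient_of_quartile_deviation(q1, q3))  # 8 / 14
```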
Mean deviation / Average Deviation:
It is the average of the absolute deviations of the observations from their mean, median or mode.
Coefficient of Mean Deviation:
It is a relative measure of dispersion which is based on the mean deviation.
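The mean deviation can be sketched in Python with the centre passed in as an argument, so that the same function works for the mean, the median or the mode. The coefficient is assumed to be the mean deviation divided by the centre it was measured from, which is the usual textbook form. All names and data are made up.

```python
from statistics import mean, median


def mean_deviation(observations, centre):
    """Average of the absolute deviations of the observations from `centre`."""
    return sum(abs(x - centre) for x in observations) / len(observations)


def coefficient_of_mean_deviation(observations, centre):
    """Mean deviation divided by the centre it is measured from."""
    return mean_deviation(observations, centre) / centre


data = [2, 4, 6, 8, 10]  # made-up data; mean and median are both 6
print(mean_deviation(data, mean(data)))    # 2.4
print(mean_deviation(data, median(data)))  # 2.4
print(round(coefficient_of_mean_deviation(data, mean(data)), 4))  # 0.4
```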
Variance and Standard Deviation:
The average of the squared deviations of the observations from their mean is called the variance. It is denoted by S².
Standard deviation:
The positive square root of the average of the squared deviations of the observations from their mean is called the standard deviation. It is denoted by S.
Coefficient of variation:
It is a relative measure of dispersion which is used to compare the variability between two or more sets of data.
It is denoted by C.V.
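The three measures can be sketched together in Python. Following the definitions above, the variance here divides by n (the average of the squared deviations); dividing by n - 1 instead is a different convention that is not used in this sketch. The coefficient of variation is assumed to be the standard deviation expressed as a percentage of the mean. Function names and data are made up.

```python
import math


def variance(observations):
    """Average of the squared deviations from the mean (divisor n)."""
    n = len(observations)
    x_bar = sum(observations) / n
    return sum((x - x_bar) ** 2 for x in observations) / n


def standard_deviation(observations):
    """Positive square root of the variance."""
    return math.sqrt(variance(observations))


def coefficient_of_variation(observations):
    """Standard deviation as a percentage of the mean (assumed form)."""
    x_bar = sum(observations) / len(observations)
    return 100 * standard_deviation(observations) / x_bar


data = [2, 4, 4, 4, 5, 5, 7, 9]        # made-up data with mean 5
print(variance(data))                  # 4.0
print(standard_deviation(data))        # 2.0
print(coefficient_of_variation(data))  # 40.0
```

Because the coefficient of variation has no unit, it can be compared across data sets measured in different units, which the variance and standard deviation cannot.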