Lecture 3. NUMERICAL SUMMARY
Data Measurements
Locations
Variability Measures
Shape
[1] Chapter 3, pp. 99 - 162
[3] Chapter 2
PROBABILITY & STATISTICS– Bui Duong Hai – NEU – www.mfe.edu.vn/buiduonghai 1
Comparison
Profit of two projects, A and B (percentage of outcomes at each profit level)
Profit (million)   1     2     3     4     5     6
Project A          5%    10%   15%   20%   30%   20%
Project B          20%   30%   20%   15%   10%   5%
Comparison
Profit of projects C and D (percentage of outcomes at each profit level)
Profit (million)   1     2     3     4     5     6     7     8     9
Project C          2%    5%    8%    15%   20%   30%   20%   0%    0%
Project D          0%    0%    20%   30%   20%   15%   8%    5%    2%
Comparison
Profit of projects E and F (percentage of outcomes at each profit level)
Profit (million)   -1    0     1     2     3     4     5     6
Project E          5%    10%   15%   20%   20%   15%   10%   5%
Project F          0%    0%    10%   40%   40%   10%   0%    0%
Data Measurements
Location:
  Minimum, Maximum
  Central Tendency: Mean, Median, Mode
  Quantile: Quartile, Percentile
Variability:
  Range
  Variance (Var)
  Standard Deviation (SD)
  Coefficient of Variation (CV)
  Interquartile Range (IQR)
3.1. Mean (arithmetic mean)
Applies to scale variables only
Mean = (sum of all values) / (number of values)
Has the same unit as the original data
Population data $\{x_1, x_2, \dots, x_N\}$: $\mu = \frac{x_1 + x_2 + \dots + x_N}{N}$
Sample data $\{x_1, x_2, \dots, x_n\}$: $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$
Weighted mean
Prices ($) in Quarters 1, 2, 3, 4 are 10, 12, 18, 14, respectively.
$$\bar{x} = \frac{10 + 12 + 18 + 14}{4} = 13.5$$
Does the mean change if the sales volumes in Quarters 1, 2, 3, 4 are 70, 90, 110, 130?
                        Q1    Q2    Q3    Q4
Price (value $x_i$)     10    12    18    14
Volume (weight $w_i$)   70    90    110   130
Weighted Mean
In general, for grouped data:
$$\bar{x} = \frac{w_1 x_1 + w_2 x_2 + \dots + w_k x_k}{w_1 + w_2 + \dots + w_k} = \frac{\sum w_i x_i}{\sum w_i}$$
For the price example:
$$\bar{x} = \frac{70 \cdot 10 + 90 \cdot 12 + 110 \cdot 18 + 130 \cdot 14}{70 + 90 + 110 + 130} = \frac{5580}{400} = 13.95$$
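The difference between the simple and the weighted mean can be checked with a short Python sketch. The function names `simple_mean` and `weighted_mean` are illustrative choices, not part of the lecture; the data are the quarterly prices and volumes from the example.

```python
# Quarterly prices (values x_i) and sales volumes (weights w_i) from the example.
prices = [10, 12, 18, 14]
volumes = [70, 90, 110, 130]


def simple_mean(values):
    """Arithmetic mean: every value counts equally."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


print(simple_mean(prices))             # 13.5
print(weighted_mean(prices, volumes))  # 13.95
```

The weighted mean is higher because the quarters with larger volumes also have higher prices.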
Mean of Grouped data
Frequency, Proportion, Percent table
Wage ($)                          7     8     9
Number of workers (frequency)     4     10    6
Proportion (relative frequency)   0.2   0.5   0.3
Percent                           20%   50%   30%
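As a sketch of how the table is used, the mean wage can be computed either from the frequencies or from the proportions; both are weighted means and should agree. The variable names below are illustrative.

```python
import math

# Wage table from the slide: values, frequencies, and relative frequencies.
wages = [7, 8, 9]
frequencies = [4, 10, 6]
proportions = [0.2, 0.5, 0.3]

# Frequencies act as weights: sum(f_i * x_i) / sum(f_i).
mean_from_freq = sum(f * x for x, f in zip(wages, frequencies)) / sum(frequencies)

# Proportions already sum to 1, so no division is needed: sum(p_i * x_i).
mean_from_prop = sum(p * x for x, p in zip(wages, proportions))

print(mean_from_freq)  # 8.1
# The two results agree up to floating-point rounding.
print(math.isclose(mean_from_freq, mean_from_prop))  # True
```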
Compare the Mean
Compare the means of the following data:
Data 1: {10, 10, 11, 12, 12}
Data 2: {5, 5, 6, 6, 100}
The mean is easily affected by extreme values (outliers)
This may lead to biased comparisons
In such cases, use other measures
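A quick check in Python illustrates how strongly one extreme value moves the mean. Dropping the value 100 from Data 2 is done here only for illustration; it is not a step taken in the lecture.

```python
from statistics import mean

data_1 = [10, 10, 11, 12, 12]
data_2 = [5, 5, 6, 6, 100]

print(mean(data_1))       # 11
print(mean(data_2))       # 24.4
# Same data without the extreme value 100 (illustration only).
print(mean(data_2[:-1]))  # 5.5
```

Four of the five values in Data 2 are smaller than every value in Data 1, yet its mean is more than twice as large.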
3.2. Median
The median, denoted by $m_e$, is the midpoint of the ordered list of values
The median can also be applied to ordinal variables
Ex. Data: { 5, 6, 9, 5, 6 }
Ordered data: { 5, 5, 6, 6, 9 }: Median = 6
Ordered data: { 6, 6, 7, 8, 9, 11 }: Median = (7 + 8) / 2 = 7.5
Data: { XXS, XS, S, S, S, M, L, XL, XXL }: Median = S
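The three median examples can be reproduced with Python's standard `statistics` module. For the ordinal clothing sizes, the sketch below assumes the ordering XXS < XS < S < M < L < XL < XXL and works on the rank of each size; `size_order` and `ranks` are illustrative names.

```python
from statistics import median, median_low

# Odd number of values: the middle value of the ordered list.
print(median([5, 6, 9, 5, 6]))      # 6
# Even number of values: the average of the two middle values.
print(median([6, 6, 7, 8, 9, 11]))  # 7.5

# Ordinal data: take the median of the ranks, then map back to the label.
size_order = ["XXS", "XS", "S", "M", "L", "XL", "XXL"]  # assumed ordering
sizes = ["XXS", "XS", "S", "S", "S", "M", "L", "XL", "XXL"]
ranks = [size_order.index(s) for s in sizes]
median_size = size_order[median_low(ranks)]
print(median_size)                  # S
```

`median_low` is used for the ordinal case because averaging two labels is not meaningful; with an odd number of observations it simply returns the middle one.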
Median
The median is the cutoff point between the lower 50% and the upper 50% of the data
[Figure: the median splitting a distribution into a lower 50% and an upper 50%, shown for a discrete and a continuous variable]
3.3. Mode
The mode, denoted by $m_0$, is the value that occurs most often: the frequency of ($X = m_0$) is the largest.
There may be no mode or several modes.
The mode can also be applied to nominal variables
Example: What are the modes?
Data 1: { 5, 6, 6, 7, 7, 7, 9 }
Data 2: { 5, 6, 7, 8, 9 }
Data 3: { 5, 6, 9, 5, 6 }
Data 4: { Yellow, Yellow, Red, Blue, Green }
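One way to answer the example is a small helper built on `collections.Counter`. The function `modes` is a hypothetical name, and it assumes one common convention: when every distinct value occurs equally often, the data are reported as having no mode.

```python
from collections import Counter


def modes(data):
    """Return the most frequent value(s); an empty list means no mode.

    Assumed convention: if there are several distinct values and all of
    them occur equally often, the data have no mode.
    """
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == top]


print(modes([5, 6, 6, 7, 7, 7, 9]))                         # [7]
print(modes([5, 6, 7, 8, 9]))                               # []
print(modes([5, 6, 9, 5, 6]))                               # [5, 6]
print(modes(["Yellow", "Yellow", "Red", "Blue", "Green"]))  # ['Yellow']
```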
Mean, Median, Mode
[Figure: four dot plots on a 0 to 10 axis comparing the mean, median, and mode of small data sets, including one data set with no mode]
Mean, Median, Mode
[Figure: positions of the mean, median, and mode for three distribution shapes]
Symmetric: Mean = Median = Mode
Right skewed: Mode < Median < Mean
Left skewed: Mean < Median < Mode
Grouped data
Customers' waiting time
Waiting time   0 – 5   5 – 10   10 – 15   15 – 20   20 +
Frequency      15      20       8         5         2
The median is in the group [5 – 10)
Modal group: [5 – 10)
Mean: computed using the middle value of each group
Middle value   2.5     7.5      12.5      17.5      22.5
Frequency      15      20       8         5         2
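The grouped-data measures can be sketched in Python as follows. The open-ended group "20 +" is represented by the middle value 22.5, as in the table; the variable names are illustrative.

```python
# Waiting-time groups, their middle values, and frequencies from the table.
groups = ["0-5", "5-10", "10-15", "15-20", "20+"]
midpoints = [2.5, 7.5, 12.5, 17.5, 22.5]  # 22.5 for "20+" follows the slide
frequencies = [15, 20, 8, 5, 2]

n = sum(frequencies)  # total number of customers

# Mean: weighted mean of the middle values.
grouped_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Modal group: the group with the largest frequency.
modal_group = groups[frequencies.index(max(frequencies))]

# Median group: the first group whose cumulative frequency reaches n / 2.
cumulative = 0
median_group = None
for group, f in zip(groups, frequencies):
    cumulative += f
    if cumulative >= n / 2:
        median_group = group
        break

print(n, grouped_mean)            # 50 8.4
print(modal_group, median_group)  # 5-10 5-10
```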
3.4. Quartile
Divide the data into 4 equal parts by 3 cutoff points: the 3 quartiles $Q_1$, $Q_2$, $Q_3$
2nd quartile: $Q_2 = m_e$ (the median)
[Figure: ordered data split into four parts of 25% each by $Q_1$, $Q_2$, $Q_3$]
Quantile
Divide into 5 equal parts by 4 cutoff points: 4 quintiles
Divide into 10 equal parts by 9 cutoff points: 9 deciles
Divide into 100 equal parts by 99 cutoff points: 99 percentiles
10th percentile = 1st decile
20th percentile = 2nd decile = 1st quintile
25th percentile = 1st quartile
50th percentile = 2nd quartile = median
Microsoft Excel Functions
Measure                            Function
Mean                               = AVERAGE(data)
Median                             = MEDIAN(data)
Mode                               = MODE(data)
Quartile k (k = 1, 2, 3)           = QUARTILE(data, k)
Percentile k (k = 1, 2, ..., 99)   = PERCENTILE(data, k/100)
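For readers working outside Excel, the same cutoff points can be sketched with `statistics.quantiles` from the Python standard library. Quantile conventions differ between tools; the `method="inclusive"` option is used here on the assumption that it corresponds to the interpolation behind Excel's QUARTILE and PERCENTILE, so small differences from other software are possible.

```python
from statistics import quantiles

data = [6, 6, 7, 8, 9, 11]

# n=4 returns the 3 quartile cutoff points Q1, Q2, Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
print(q1, q2, q3)  # 6.25 7.5 8.75

# n=10 returns the 9 decile cutoff points; n=100 would return 99 percentiles.
deciles = quantiles(data, n=10, method="inclusive")
print(len(deciles))  # 9
```

The second quartile, 7.5, equals the median of the same data.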
Variability
Measures of central tendency alone may not describe the data sufficiently.
Data sets can have the same mean and median but differ in variability (dispersion, spread).
[Figure: four dot plots on a 0 to 9 axis, all with Mean = Median = 5 but with different spread]
3.5. Range
Range = largest value - smallest value = $x_{max} - x_{min}$
The simplest measure, but the least informative.
[Figure: two dot plots on a 0 to 10 axis, one with Range = 7 and one with Range = 6]
3.6. Variance & Standard Deviation
Sample data: $x_1, x_2, \dots, x_n$, with mean $\bar{x}$
Deviation: $x_i - \bar{x}$, which can be positive, negative, or zero
Sum of Squares: $SS = \sum (x_i - \bar{x})^2$
Variance:
$$S^2 = \frac{SS}{n - 1} = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
The unit of the variance is the squared unit of $X$
Standard Deviation
The standard deviation is the square root of the variance
$$S = \sqrt{S^2}$$
The standard deviation has the same unit as $X$
Variance & S.D. measure the "absolute" variability
If $S_X^2 > S_Y^2$ then:
$X$ is more variable (dispersed, widespread, fluctuating) than $Y$
$Y$ is more stable (concentrated) than $X$
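Population and Sample
Difference between Population and Sample
            Population                          Sample
Data        $\{x_1, x_2, \dots, x_N\}$          $\{x_1, x_2, \dots, x_n\}$
Mean        $\mu = \frac{\sum x_i}{N}$          $\bar{x} = \frac{\sum x_i}{n}$
SS          $SS = \sum (x_i - \mu)^2$           $SS = \sum (x_i - \bar{x})^2$
Variance    $\sigma^2 = \frac{SS}{N}$           $S^2 = \frac{SS}{n - 1}$
Std. Dev.   $\sigma = \sqrt{\sigma^2}$          $S = \sqrt{S^2}$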
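The only computational difference between the two columns is the divisor. A minimal Python sketch, reusing the small data set {10, 10, 11, 12, 12} from the earlier mean comparison and treating it first as a sample and then as a whole population (an assumption made only for illustration):

```python
import math
from statistics import pvariance, variance

data = [10, 10, 11, 12, 12]
n = len(data)
mean_value = sum(data) / n                    # 11.0

# Sum of squared deviations from the mean.
ss = sum((x - mean_value) ** 2 for x in data)  # 4.0

sample_variance = ss / (n - 1)    # divide by n - 1 for a sample
population_variance = ss / n      # divide by N for a population

print(sample_variance, population_variance)    # 1.0 0.8
print(math.sqrt(sample_variance))              # 1.0 (sample S.D.)

# The standard library gives the same results.
print(math.isclose(sample_variance, variance(data)))       # True
print(math.isclose(population_variance, pvariance(data)))  # True
```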
Compare variability
Compare 3 samples
Firm A: Profit ($ mil.): ( 5, 6, 7, 8, 9 )
Firm B: Profit ($ mil.): ( 51, 53, 55, 57, 59 )
Firm C: Price ($): ( 15, 16, 17, 18, 19 )
Firm   Mean      SS   $S^2$          $S$         CV
A      7 ($m)    10   2.5 ($m)^2     1.58 ($m)   22.6%
B      55 ($m)   40   10 ($m)^2      3.16 ($m)   5.7%
C      17 ($)    10   2.5 ($)^2      1.58 ($)    9.3%
3.7. Coefficient of Variation
$$CV = \frac{S}{\bar{x}} \times 100\%$$
CV is expressed in %, independent of the unit of the data.
CV measures "relative" variation
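The comparison of the three firms can be reproduced with a few lines of Python. The helper name `coefficient_of_variation` is illustrative; `stdev` from the standard library is the sample standard deviation (divisor n - 1).

```python
from statistics import mean, stdev

samples = {
    "A": [5, 6, 7, 8, 9],       # profit, $ million
    "B": [51, 53, 55, 57, 59],  # profit, $ million
    "C": [15, 16, 17, 18, 19],  # price, $
}


def coefficient_of_variation(data):
    """CV in percent: sample standard deviation divided by the mean."""
    return stdev(data) / mean(data) * 100


for firm, data in samples.items():
    print(firm, mean(data), round(stdev(data), 2),
          round(coefficient_of_variation(data), 1))
# A 7 1.58 22.6
# B 55 3.16 5.7
# C 17 1.58 9.3
```

Firm B has the largest standard deviation but the smallest CV: its profits vary the most in absolute terms and the least relative to their level.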
3.8. Interquartile Range
The interquartile range is the distance between the 3rd quartile and the 1st quartile
$$IQR = Q_3 - Q_1$$
IQR is the width of the middle 50% of the data
[Figure: ordered data split into four parts of 25% each; the IQR spans the two middle parts]
Outlier
The data have a Lower Limit (LL) and an Upper Limit (UL)
Observations smaller than LL or greater than UL are outliers
By quartiles: Lower Limit is $Q_1 - 1.5 \times IQR$
Upper Limit is $Q_3 + 1.5 \times IQR$
Key-points and Boxplot
Find the 5 key points (Min, $Q_1$, $Q_2$, $Q_3$, Max) and the outliers
Salary           10   11   12   13   14   15   16   17   18
No. of workers   10   16   30   19   14   10   0    0    1
[Figure: boxplot with whiskers reaching at most 1.5 IQR beyond each end of the box]
Boxplot
Table, Histogram, Boxplot
[Figure: frequency table and histogram of Salary (values 10 to 18, frequencies as in the salary table above), with a boxplot marking the key values 10, 11, 12, 13.5, 18]
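The key values and the outlier in the salary example can be reproduced with the sketch below. Quartile conventions vary; the hypothetical helper `quartiles_by_halves` takes $Q_1$ and $Q_3$ as the medians of the lower and upper halves of the ordered data, a convention chosen here because it reproduces the key values 11 and 13.5 shown on the slide.

```python
from statistics import median

# Salary frequency table from the slide: salary -> number of workers.
salary_freq = {10: 10, 11: 16, 12: 30, 13: 19, 14: 14, 15: 10, 16: 0, 17: 0, 18: 1}

# Expand the table into an ordered list of 100 observations.
data = sorted(v for v, f in salary_freq.items() for _ in range(f))


def quartiles_by_halves(ordered):
    """Q1 and Q3 as medians of the lower and upper halves (assumed convention)."""
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd number of values the middle observation is left out of both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


q1, q2, q3 = quartiles_by_halves(data)
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(min(data), q1, q2, q3, max(data))  # 10 11.0 12.0 13.5 18
print(lower_limit, upper_limit)          # 7.25 17.25
print(outliers)                          # [18]
```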
Boxplot: Key values and Whiskers
Project          A      B      C      D      E      F
Max              6      6      7      9      6      4
Q3               5      4      6      6      4      3
Q2               4.5    2.5    5.5    4.5    2.5    2.5
Q1               3      2      4      4      1      2
Min              1      1      1      3      -1     1
Mean $\bar{x}$   4.2    2.8    5.16   4.84   2.5    2.5
Boxplot
[Figure: boxplots by year for 2014, 2015, 2016, 2017, marking Max, Q3, Q2, Q1, Min, and Mean]
3.9. Skewness (Sk)
[Figure: distribution shapes for different skewness values]
Sk = 0: two-tail (symmetric)
Sk = 0.3: right short tail
Sk = -0.3: left short tail
Sk = 1.3: right long tail
Sk = -1.3: left long tail
3.10. Covariance & Correlation
Covariance: combined variability of $X$ and $Y$ in the sample:
$$Cov(X, Y) = S_{XY} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
[Figure: two scatter plots divided by the lines "Mean of X" and "Mean of Y", one showing positive covariance and one showing negative covariance]
Correlation Coefficient
$$r_{XY} = \frac{Cov(X, Y)}{S_X S_Y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$
$-1 \le r_{XY} \le 1$, no unit
$r_{XY}$ measures the linear relationship between $X$ and $Y$
$r_{XY} = -1$: perfect negative linear relationship
$-1 < r_{XY} < 0$: negatively correlated
$r_{XY} = 0$: not correlated
$0 < r_{XY} < 1$: positively correlated
$r_{XY} = 1$: perfect positive linear relationship
Correlation
Graph and Correlation Coefficient ($r$)
[Figure: four scatter plots with their correlation coefficients]
r = 0.5: weakly positively correlated
r = 0.8: strongly positively correlated
r = -0.5: negatively correlated
r = 0: not correlated
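Correlation Coefficient
X: Advertising; Y: Sales
Month   $x_i$   $y_i$   $x_i - \bar{x}$   $y_i - \bar{y}$   $(x_i - \bar{x})^2$   $(y_i - \bar{y})^2$   $(x_i - \bar{x})(y_i - \bar{y})$
Jan     5       10      -3                -7                9                     49                    21
Feb     6       15      -2                -2                4                     4                     4
Mar     8       10      0                 -7                0                     49                    0
Apr     9       18      1                 1                 1                     1                     1
May     12      32      4                 15                16                    225                   60
Sum     40      85      0                 0                 30                    328                   86
Mean    8       17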
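The advertising and sales example can be worked through step by step in Python, following the covariance and correlation formulas of section 3.10. This is a sketch with illustrative variable names; the computation is done by hand rather than with a library call so that each column of the worksheet is visible.

```python
from math import sqrt

advertising = [5, 6, 8, 9, 12]  # X, January to May
sales = [10, 15, 10, 18, 32]    # Y, January to May

n = len(advertising)
mean_x = sum(advertising) / n   # 8.0
mean_y = sum(sales) / n         # 17.0

# Column sums of the worksheet.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(advertising, sales))  # 86.0
sxx = sum((x - mean_x) ** 2 for x in advertising)                           # 30.0
syy = sum((y - mean_y) ** 2 for y in sales)                                 # 328.0

covariance = sxy / (n - 1)      # sample covariance
r = sxy / sqrt(sxx * syy)       # correlation coefficient

print(covariance)   # 21.5
print(round(r, 3))  # 0.867
```

A coefficient of about 0.87 indicates a strong positive linear relationship between advertising and sales in these five months.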
3.11. Standardized value
The Z-score of a value in the data has no unit
$$z = \frac{x - \bar{x}}{S}$$
Ex. Compare one student's scores in Microeconomics and Macroeconomics within the class if:
Micro score = 7.5; Macro score = 9
Mean of Micro in class = 6; Mean of Macro = 7
S.D. of Micro = 1; S.D. of Macro = 2
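The comparison in the example comes down to two Z-scores, sketched here in Python. The function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Standardized value: distance from the mean in units of standard deviation."""
    return (x - mean) / sd


z_micro = z_score(7.5, mean=6, sd=1)
z_macro = z_score(9, mean=7, sd=2)

print(z_micro)  # 1.5
print(z_macro)  # 1.0
```

Although the raw Macro score is higher, the Micro score lies further above its class mean in standard-deviation units, so the student's relative position is better in Microeconomics.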
Excel: Statistic Functions
Statistic                                Function
Sum                                      = SUM(array)
Mean $\bar{x}$                           = AVERAGE(array)
Median                                   = MEDIAN(array)
k-th quartile ($Q_1$, $Q_2$, $Q_3$)      = QUARTILE(array, k)
Sample variance ($S^2$)                  = VAR(array)
Sample S.D. ($S$)                        = STDEV(array)
Covariance $Cov(X, Y)$                   = COVARIANCE.S(array1, array2) for the sample covariance (divisor n - 1); the older COVAR(array1, array2) divides by n
Correlation $r_{XY}$                     = CORREL(array1, array2)
$X \sim N(\mu, \sigma^2)$: $P(X < b)$    = NORMDIST(b, µ, σ, 1)
Exercise
[1] Chapter 3:
(p110) 2, 3, 6, 7, 11, 13,
(p120) 26, 27, 29, 33,
(p133) 49, 50, 52,
(p143) 56, 58, 59,
(p152) 62, 63, 70,
Case Problem 1, 4