# Lecture notes: Probability & Statistics - Lecture 3: Numerical summary - Bùi Dương Hải


## Lecture 3. NUMERICAL SUMMARY

- Data Measurements
- Locations
- Variability Measures
- Shape
- [1] Chapter 3, pp. 99 - 162
- [3] Chapter 2

PROBABILITY & STATISTICS - Bui Duong Hai - NEU - www.mfe.edu.vn/buiduonghai

### Comparison

Profit of two projects, A and B (percentage at each profit level):

| Profit (million) | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Project A | 5% | 10% | 15% | 20% | 30% | 20% |
| Project B | 20% | 30% | 20% | 15% | 10% | 5% |

Profit of projects C and D:

| Profit (million) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Project C | 2% | 5% | 8% | 15% | 20% | 30% | 20% | 0% | 0% |
| Project D | 0% | 0% | 20% | 30% | 20% | 15% | 8% | 5% | 2% |

Profit of projects E and F:

| Profit (million) | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|---|
| Project E | 5% | 10% | 15% | 20% | 20% | 15% | 10% | 5% |
| Project F | 0% | 0% | 10% | 40% | 40% | 10% | 0% | 0% |

### Data Measurements

- Location:
  - Minimum, Maximum
  - Central Tendency: Mean, Median, Mode
  - Quantile: Quartile, Percentile
- Variability:
  - Range
  - Variance (Var)
  - Standard Deviation (SD)
  - Coefficient of Variation (CV)
  - Interquartile Range (IQR)

### 3.1. Mean (arithmetic mean)

- Applies to scale variables only.
- $\text{Mean} = \dfrac{\text{sum of all values}}{\text{number of values}}$
- Has the same unit as the original data.

| | Population | Sample |
|---|---|---|
| Data | $\{x_1, x_2, \dots, x_N\}$ | $\{x_1, x_2, \dots, x_n\}$ |
| Mean | $\mu = \dfrac{x_1 + x_2 + \dots + x_N}{N}$ | $\bar{x} = \dfrac{x_1 + x_2 + \dots + x_n}{n}$ |

### Weighted mean

- Prices (\$) in Quarters 1, 2, 3, 4 are 10, 12, 18, 14, respectively: $\bar{x} = \dfrac{10 + 12 + 18 + 14}{4} = 13.5$
- Is there any difference if the sales volumes in Quarters 1, 2, 3, 4 are 70, 90, 110, 130?
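The question above can be checked numerically. The following is a minimal Python sketch, assuming the quarterly sales volumes act as the weights; the function names `simple_mean` and `weighted_mean` are illustrative and do not come from the lecture.

```python
# Quarterly prices and sales volumes taken from the slide.
prices = [10, 12, 18, 14]
volumes = [70, 90, 110, 130]


def simple_mean(values):
    """Arithmetic mean: every value counts equally."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted mean: each value counts in proportion to its weight."""
    total = sum(w * x for x, w in zip(values, weights))
    return total / sum(weights)


print(simple_mean(prices))             # 13.5
print(weighted_mean(prices, volumes))  # 13.95
```

With these figures the weighted mean is higher than the simple mean, because the quarters with the larger volumes (Q3 and Q4) also have the higher prices.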
| | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|
| Price (value $x_i$) | 10 | 12 | 18 | 14 |
| Volume (weight $w_i$) | 70 | 90 | 110 | 130 |

### Weighted Mean

- In general, for grouped data: $\bar{x} = \dfrac{w_1 x_1 + w_2 x_2 + \dots + w_k x_k}{w_1 + w_2 + \dots + w_k} = \dfrac{\sum w_i x_i}{\sum w_i}$
- For the price example: $\bar{x} = \dfrac{70 \times 10 + 90 \times 12 + 110 \times 18 + 130 \times 14}{70 + 90 + 110 + 130} = 13.95$

### Mean of Grouped data

- Frequency, Proportion, Percent table

| Wage (\$) | 7 | 8 | 9 |
|---|---|---|---|
| Number of workers (Frequency) | 4 | 10 | 6 |
| Proportion (Relative frequency) | 0.2 | 0.5 | 0.3 |
| Percent | 20% | 50% | 30% |

### Compare the Mean

- Compare the means of the following data:
  - Data 1: {10, 10, 11, 12, 12}
  - Data 2: {5, 5, 6, 6, 100}
- The mean is easily affected by extreme values (outliers).
- This may lead to a biased comparison.
- Therefore, use other measures as well.

### 3.2. Median

- The median, denoted by $m_e$, is the midpoint of the ordered list of values.
- The median can also be applied to ordinal variables.
- Examples:
  - Data: {5, 6, 9, 5, 6}; ordered data: {5, 5, 6, 6, 9}: Median = 6
  - Ordered data: {6, 6, 7, 8, 9, 11}: Median = 7.5
  - Data: {XXS, XS, S, S, S, M, L, XL, XXL}: Median = S
- The median is the "cutoff point" between the lower 50% and the upper 50% of the data.

[Figure: Discrete vs Continuous, showing the median as the boundary between the lower 50% and the upper 50%]

### 3.3. Mode

- The mode, denoted by $m_0$, is the value that occurs most often: the frequency of $(X = m_0)$ is the largest.
- There may be no mode or several modes.
- The mode can also be applied to nominal variables.

Example: what are the modes?
- Data 1: {5, 6, 6, 7, 7, 7, 9}
- Data 2: {5, 6, 7, 8, 9}
- Data 3: {5, 6, 9, 5, 6}
- Data 4: {Yellow, Yellow, Red, Blue, Green}

### Mean, Median, Mode

[Figure: four data sets on a 0-10 axis with their mean, median and mode marked; labels on the slide include Mean = 4, Mean = 3, Median = 3, No Mode, Median = 5.5, Mean = Median = Mode = 5, Mode: 7, Mean = 4.8]

- Symmetric: Mean = Median = Mode
- Right skewed: Mode < Median < Mean
- Left skewed: Mean < Median < Mode

### Grouped data

- Customer's waiting time
- The median is in the group [5 - 10)
- Modal group: [5 - 10)
- Mean: computed using the middle value of each group

| Waiting time | 0 - 5 | 5 - 10 | 10 - 15 | 15 - 20 | 20 + |
|---|---|---|---|---|---|
| Frequency | 15 | 20 | 8 | 5 | 2 |

| Waiting time (middle value) | 2.5 | 7.5 | 12.5 | 17.5 | 22.5 |
|---|---|---|---|---|---|
| Frequency | 15 | 20 | 8 | 5 | 2 |

### 3.4. Quartile

- Quartiles divide the data into 4 equal parts by 3 cutoff points: the 3 quartiles $Q_1, Q_2, Q_3$.
- 2nd quartile: $Q_2 = m_e$ (the median)

[Figure: data split into four parts of 25% each]

### Quantile

- 5 equal parts by 4 cutoff points: 4 quintiles
- 10 equal parts by 9 cutoff points: 9 deciles
- 100 equal parts: 99 percentiles
- 10th percentile = 1st decile
- 20th percentile = 2nd decile = 1st quintile
- 25th percentile = 1st quartile
- 50th percentile = 2nd quartile = median

### Microsoft Excel Function

| Measure | Command / Function |
|---|---|
| Mean | `= AVERAGE(data)` |
| Median | `= MEDIAN(data)` |
| Mode | `= MODE(data)` |
| Quartile k (k = 1, 2, 3) | `= QUARTILE(data, k)` |
| Percentile k (k = 1, 2, ..., 99) | `= PERCENTILE(data, k/100)` |

### Variability

- Measures of central tendency alone may not provide sufficient information about the data.
- Data sets can have the same mean and median but differ in variability (dispersion, spread).

[Figure: four data sets on a 0-9 axis, all with Mean = Median = 5]

### 3.5. Range

- Range = largest value - smallest value = $x_{max} - x_{min}$
- The simplest measure, but the least informative.

[Figure: two data sets on a 0-10 axis, one with Range = 7 and one with Range = 6]

### 3.6. Variance & Standard Deviation

- Sample data: $x_1, x_2, \dots, x_n$, with mean $\bar{x}$
- Deviation: $x_i - \bar{x}$, which can be positive, negative or zero
- Sum of Squares: $SS = \sum (x_i - \bar{x})^2$
- Variance: $S^2 = \dfrac{SS}{n - 1} = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
- The unit of the variance is the squared unit of $X$.

### Standard Deviation

- The standard deviation is the square root of the variance: $S = \sqrt{S^2}$
- The standard deviation has the same unit as $X$.
- Variance and S.D. measure "absolute" variability.
- If $S_X^2 > S_Y^2$ then:
  - $X$ is more variable, dispersed, widespread and fluctuating than $Y$
  - $Y$ is more stable and concentrated than $X$

### Population and Sample

- Difference between Population and Sample

| | Population | Sample |
|---|---|---|
| Data | $\{x_1, x_2, \dots, x_N\}$ | $\{x_1, x_2, \dots, x_n\}$ |
| Mean | $\mu = \dfrac{\sum x_i}{N}$ | $\bar{x} = \dfrac{\sum x_i}{n}$ |
| SS | $SS = \sum (x_i - \mu)^2$ | $SS = \sum (x_i - \bar{x})^2$ |
| Variance | $\sigma^2 = \dfrac{SS}{N}$ | $S^2 = \dfrac{SS}{n - 1}$ |
| Std. Dev. | $\sigma = \sqrt{\sigma^2}$ | $S = \sqrt{S^2}$ |

### Compare variability

- Compare 3 samples:
  - Firm A: Profit (\$ mil.): (5, 6, 7, 8, 9)
  - Firm B: Profit (\$ mil.): (51, 53, 55, 57, 59)
  - Firm C: Price (\$): (15, 16, 17, 18, 19)

| | Mean | SS | $S^2$ | $S$ | CV |
|---|---|---|---|---|---|
| A | 7 (\$m) | 10 | 2.5 (\$m)² | 1.58 (\$m) | 22.6 % |
| B | 55 (\$m) | 40 | 10 (\$m)² | 3.16 (\$m) | 5.7 % |
| C | 17 (\$) | 10 | 2.5 (\$)² | 1.58 (\$) | 9.3 % |

### 3.7. Coefficient of Variation

- $CV = \dfrac{S}{\bar{x}} \times 100\%$
- CV is expressed in %, independent of the unit of the data.
- CV measures "relative" variation.
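The SS, variance, standard deviation and CV columns of the Firm A, B, C comparison can be reproduced with a few lines of Python. This is a minimal sketch that assumes the sample formulas (division by n - 1); the function name `summarize` and the dictionary keys are illustrative only.

```python
import math


def summarize(sample):
    """Return mean, sum of squares, sample variance, sample S.D. and CV (%)."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    variance = ss / (n - 1)                    # sample variance, n - 1 denominator
    sd = math.sqrt(variance)
    cv = sd / mean * 100                       # coefficient of variation in percent
    return {"mean": mean, "SS": ss, "S2": variance, "S": sd, "CV": cv}


firms = {
    "A": [5, 6, 7, 8, 9],
    "B": [51, 53, 55, 57, 59],
    "C": [15, 16, 17, 18, 19],
}

for name, sample in firms.items():
    s = summarize(sample)
    print(name, s["mean"], s["SS"], s["S2"], round(s["S"], 2), round(s["CV"], 1))
```

Between the two profit samples, Firm B has the larger standard deviation but the smaller CV, which is the distinction between "absolute" and "relative" variability.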
### 3.8. Interquartile Range

- The interquartile range is the range between the 3rd quartile and the 1st quartile: $IQR = Q_3 - Q_1$
- IQR is the width of the middle 50% of the data.

[Figure: data split into four parts of 25% each, with the IQR spanning the two middle parts]

### Outlier

- The data have a Lower Limit (LL) and an Upper Limit (UL).
- Observations smaller than LL or greater than UL are outliers.
- By quartiles:
  - Lower Limit: $LL = Q_1 - 1.5 \times IQR$
  - Upper Limit: $UL = Q_3 + 1.5 \times IQR$

### Key-point and Boxplot

- Find the 5 key points and the outliers.

| Salary | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---|---|---|---|---|---|---|---|---|---|
| No. of workers | 10 | 16 | 30 | 19 | 14 | 10 | 0 | 0 | 1 |

- Five key points: Min = 10, $Q_1$ = 11, Median = 12, $Q_3$ = 13.5, Max = 18

[Figure: Table, Histogram, Boxplot of Salary, with whiskers limited to 1.5 IQR on each side of the box]

### Boxplot: Key values and Whiskers

| | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| Max | 6 | 6 | 7 | 9 | 6 | 4 |
| $Q_3$ | 5 | 4 | 6 | 6 | 4 | 3 |
| $Q_2$ | 4.5 | 2.5 | 5.5 | 4.5 | 2.5 | 2.5 |
| $Q_1$ | 3 | 2 | 4 | 4 | 1 | 2 |
| Min | 1 | 1 | 1 | 3 | -1 | 1 |
| $\bar{x}$ | 4.2 | 2.8 | 5.16 | 4.84 | 2.5 | 2.5 |

[Figure: boxplots for 2014, 2015, 2016 and 2017, marking Max, Q3, Q2, Q1, Min and Mean]

### 3.9. Skewness (Sk)

- Sk = 0: two-tail
- Sk = 0.3: short right tail
- Sk = -0.3: short left tail
- Sk = 1.3: long right tail
- Sk = -1.3: long left tail
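Returning to the salary table in "Key-point and Boxplot", the five key points and the outlier limits can be sketched in Python. The quartile rule used here (the median of the lower half and of the upper half) is an assumption, chosen because it reproduces the key values 10, 11, 12, 13.5, 18 shown on the slide; other quartile conventions can give slightly different results. The names `expand`, `five_number_summary` and `outlier_limits` are illustrative.

```python
from statistics import median

# Salary -> number of workers, taken from the slide.
salary_freq = {10: 10, 11: 16, 12: 30, 13: 19, 14: 14, 15: 10, 16: 0, 17: 0, 18: 1}


def expand(freq):
    """Turn a frequency table into a sorted list of individual observations."""
    return sorted(value for value, count in freq.items() for _ in range(count))


def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles as medians of the two halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd n the middle observation is left out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def outlier_limits(q1, q3, k=1.5):
    """Lower and upper limits: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


salaries = expand(salary_freq)
minimum, q1, q2, q3, maximum = five_number_summary(salaries)
lower_limit, upper_limit = outlier_limits(q1, q3)
outliers = [x for x in salaries if x < lower_limit or x > upper_limit]

print(minimum, q1, q2, q3, maximum)
print(lower_limit, upper_limit, outliers)
```

Under this convention the limits are 7.25 and 17.25, so the single salary of 18 is flagged as an outlier.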
### 3.10. Covariance & Correlation

- Covariance: the combined variability of $X$ and $Y$; in a sample: $Cov(X, Y) = S_{XY} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

[Figure: two scatter plots divided at the Mean of X and the Mean of Y, one showing positive covariance and one showing negative covariance]

### Correlation Coefficient

$$r_{XY} = Corr(X, Y) = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$

- $-1 \le r_{XY} \le 1$, no unit
- $r_{XY}$ measures the linear relationship between $X$ and $Y$
- $r_{XY} = -1$: perfect negative linear relationship
- $-1 < r_{XY} < 0$: negatively correlated
- $r_{XY} = 0$: uncorrelated
- $0 < r_{XY} < 1$: positively correlated
- $r_{XY} = 1$: perfect positive linear relationship

### Correlation

- Graph and Correlation Coefficient ($r$)

[Figure: scatter plots for r = 0.5, r = -0.5, r = 0.8 and r = 0, illustrating weak and strong, positive and negative, and no correlation]

### Correlation Coefficient

- X: Advertising; Y: Sales

| | $x_i$ | $y_i$ | $x_i - \bar{x}$ | $y_i - \bar{y}$ | $(x_i - \bar{x})^2$ | $(y_i - \bar{y})^2$ | $(x_i - \bar{x})(y_i - \bar{y})$ |
|---|---|---|---|---|---|---|---|
| Jan | 5 | 10 | | | | | |
| Feb | 6 | 15 | | | | | |
| Mar | 8 | 10 | | | | | |
| Apr | 9 | 18 | | | | | |
| May | 12 | 32 | | | | | |
| Sum | | | | | | | |
| Mean | | | | | | | |

### 3.11. Standardized value

- The Z-score of a value in the data has no unit: $Z = \dfrac{x - \bar{x}}{S}$

Example:
Compare the Microeconomics and Macroeconomics scores of one student in one class if:

- Micro score = 7.5; Macro score = 9
- Mean of Micro in class = 6; Mean of Macro = 7
- S.D. of Micro = 1; S.D. of Macro = 2

### Excel: Statistic Functions

| Statistic | Function |
|---|---|
| Sum | `= SUM(array)` |
| Mean $\bar{X}$ | `= AVERAGE(array)` |
| Median | `= MEDIAN(array)` |
| Kth Quartile ($Q_1, Q_2, Q_3$) | `= QUARTILE(array, k)` |
| Sample variance ($S^2$) | `= VAR(array)` |
| Sample S.D. ($S$) | `= STDEV(array)` |
| Covariance $Cov(X, Y)$ | `= COVAR(array1, array2)` (divides by $n$; `= COVARIANCE.S(array1, array2)` divides by $n - 1$) |
| Correlation $r_{XY}$ | `= CORREL(array1, array2)` |
| $X \sim N(\mu, \sigma^2)$; $P(X < b)$ | `= NORMDIST(b, µ, σ, 1)` |

### Exercise

[1] Chapter 3:

- (p110) 2, 3, 6, 7, 11, 13
- (p120) 26, 27, 29, 33
- (p133) 49, 50, 52
- (p143) 56, 58, 59
- (p152) 62, 63, 70
- Case Problem 1, 4
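Two worksheets in this lecture are left for the reader: the advertising and sales correlation table and the Micro versus Macro score comparison. The sketch below works through both in plain Python, assuming the sample formulas with the n - 1 denominator; the function names `sample_covariance`, `correlation` and `z_score` are illustrative and not taken from the slides.

```python
import math

# X: advertising, Y: sales, January to May (from the slide).
advertising = [5, 6, 8, 9, 12]
sales = [10, 15, 10, 18, 32]


def mean(values):
    return sum(values) / len(values)


def sample_covariance(xs, ys):
    """Sum of cross-products of deviations, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Sum of cross-products over the square root of the two sums of squares."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(ssx * ssy)


def z_score(value, group_mean, group_sd):
    """Standardized value: distance from the mean in units of S.D."""
    return (value - group_mean) / group_sd


print(sample_covariance(advertising, sales))
print(round(correlation(advertising, sales), 3))
print(z_score(7.5, 6, 1), z_score(9, 7, 2))
```

For these data the covariance is 21.5 and the correlation is about 0.867, a strong positive linear relationship. The Z-scores are 1.5 for Micro and 1.0 for Macro, so the Micro score is the better one relative to the class even though the raw Macro score is higher.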