1 What Is Statistics?
The word statistics derives from the Latin word “status”, meaning state. Its original meaning was the presentation of data and charts describing a state's economic, demographic, and political situation.
Statistics is a scientific method that helps people make objective decisions under uncertainty. The process consists of collecting, organizing, presenting, analyzing, and interpreting data. Inferences are then drawn from the results of the analysis, leading to reasonable judgments and valid conclusions.
Uses of Statistics
For example, weather reports, traffic accident statistics, public safety incident statistics, national income, and price indexes are all closely related to human activity.
Statistical concepts and analytical methods are used constantly in research in the social and natural sciences, in the administration of government agencies, and in the operation and management of businesses.
Statistics helps people make decisions in the face of uncertainty. Its purpose is to solve problems.
The main content of statistics consists of statistical theory and statistical methods.
Statistical theory studies and explains statistical methods, and is therefore the foundation on which statistical methods are built and developed.
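Steps statistics follows to solve a problem:
1. Collect data relevant to the problem.
2. Apply statistical methods, supported by statistical theory.
3. Provide a reasonable solution.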
Statistical theory is generally divided into:
○ Mathematical statistics, which uses mathematical principles to explain the theory behind statistical methods and to derive the various statistical formulas;
◎ Applied statistics, which focuses on how to apply statistical methods to scientific research, business operations, and administrative measures, for example biostatistics, economic statistics, and government statistics.
Statistical methods focus on solving practical problems; the steps are collecting, organizing, presenting, analyzing, and interpreting statistical data.
Based on the results of the analysis, statistical methods can usually go a step further and infer the characteristics of all the objects under study.
Following these steps, the scope of statistics can be divided into:
descriptive statistics
inferential statistics
• Descriptive statistics covers the steps of collecting, organizing, presenting, analyzing, and interpreting data; that is, it only describes the characteristics of the data themselves, without generalizing their meaning to a wider scope.
• Drawing conclusions about broader facts or phenomena from the results of a descriptive analysis falls within the scope of inferential statistics.
• Inferential statistics uses the results of analyzing part of the data (the sample) to make reasonable estimates of certain characteristics of a larger body of data (the population).
Example 1.1
Government agencies compile national income statistics every year. After the collected data are organized, the annual economic growth rate can be calculated and compared across years to explain and analyze economic development; this falls within the scope of descriptive statistics.
For convenience in comparison and analysis, we often construct statistical charts and tables and compute certain statistical measures (such as the mean, variance, and proportion). These methods help organize the data and quickly provide information for comparison and analysis; they are the main topics of Chapters 2 and 3.
Example 1.2
In Example 1.1, if the data from past years, together with economic, social, political, and world-economic factors, are used to estimate the growth rate of national income next year or over the next few years, the analysis falls within the scope of inferential statistics.
Because future phenomena are uncertain, grasping that uncertainty requires measuring its degree; probability theory is the main tool for this and is the subject of Chapters 4, 5, and 6. From Chapter 7 onward, this book introduces the foundations and methods of inferential statistics and discusses them systematically, following the procedure of statistical inference.
Population: the set of all objects that the investigator wants to study.
Sample: a subset of the population.
Example 1.3
If a researcher wants to study the national unemployment rate, the population is all citizens of the country; similarly, if he is interested only in the unemployment rate of Taipei City, then all residents of Taipei City form the population he wants to study.
Hence the scope of a population can be large or small, depending entirely on the “objects to be studied”. In actual data collection, usually only part of the population is drawn for observation (see Chapter 7 of this book on sampling for details); this collection of specific individuals drawn from the population is called a sample.
Summary values used in statistics to describe the characteristics of a population or a sample are called statistical measures (statistical measurements).
A characteristic value that describes a population is called a parameter, or population parameter. Generally speaking, population parameters are what statistics ultimately wants to know. Examples: the population mean, population proportion, and population standard deviation.
A characteristic value that describes a sample is called a sample statistic; it is the main quantity used to infer population parameters. Examples: the sample mean, sample proportion, and sample standard deviation.
Statistics is closely related to probability theory and often uses it as its theoretical foundation.
Simply put, the two differ in direction: probability theory reasons from the population to the probability of obtaining a particular sample, whereas statistics works the other way, inferring the parameters of a large population from a small sample.
Parameters:
$\sigma^2$: population variance,
$P$: population proportion, etc.
Statistics:
$\bar{X}$: sample mean,
$\hat{p}$: sample proportion, etc.
The Functions of Statistics
3. Analyzing relationships among variables: changes in social phenomena are often affected by many factors, which may interact with one another and may exhibit some causal relationship or regularity. For example, the factors influencing changes in economic growth include price levels, population changes, political and legal factors, and the money supply. Statistical methods help us discover the causal relationships and regularities among the factors that influence a phenomenon, and then analyze and compare them.
4. Prediction: statistical methods are used to clarify the relationships among relevant variables and then to make predictions, so as to understand the trend of a phenomenon and use it as a basis for future planning. (Chapters 13 to 15)
The Application of Statistics in Business Decision-Making
All social and natural phenomena that can be expressed in numbers (that is, quantified) can be studied with the principles and methods of statistics, in fields including economics, politics, scientific research, and industry and commerce.
1. Scope of application of statistics in business decision-making:
Ex1. In manufacturing, statistical methods are used for sampling inspection and quality control, in order to achieve the lowest cost or the highest quality.
Ex2. In human resource management, statistical methods can be used to analyze employee training, performance evaluation, and human resource planning, producing useful information that management can use as a reference for its decisions.
For example, we may want to know how the class's midterm scores in statistics are distributed.
Statistical data usually consist of the values of one or more variables.
A variable is a measurable characteristic that can take different values or outcomes; every measurable characteristic is called a variable, and the quantity (or value) that a variable takes is called a variate.
For example, age is a variable whose variates may range from 1 to 130; gender, income, length, price, and so on are also variables.
(2) Quantitative data, also called numerical data, are data measured on a numerical scale.
Examples include age, height, weight, scores, temperature, speed, and the number of accidents; all of them can be expressed as numbers.
Quantitative data can generally be divided into discrete data and continuous data. Discrete data are countable and have a smallest counting unit.
Examples: the number of students in a class, the number of non-defective products, the number of traffic accidents.
Continuous data are obtained by measurement and cannot be counted.
Determine whether each of the following is qualitative or quantitative data; if quantitative, state whether it is continuous or discrete:
(a) The ages of a company's employees.
(c) The duration of each long-distance phone call.
(d) Season.
(e) The job grades of a company's employees.
Qualitative data: (d), (e)
Quantitative data: (a), (b), (c); of these, (b) is discrete data, while (a) and (c) are continuous data.
Note 1: To tell qualitative data from quantitative data, ask about the variable: if the natural answer is a number, the data are quantitative; otherwise they are qualitative.
Note 2: Once qualitative data are coded as numbers, those numbers cannot be used in certain statistical calculations; they serve only as category codes.
Statistical data
├─ Qualitative data
└─ Quantitative data
   ├─ Discrete data
   └─ Continuous data
According to their level (scale) of measurement, data are divided into four categories:
◎ Nominal scale
◎ Ordinal scale
◎ Interval scale
◎ Ratio scale
Different statistical methods are used for data on different scales.
◎ Nominal scale: numbers (codes) serve only to label the attributes (categories) of a variable.
◎ Ordinal scale: used for data that have the properties of a nominal scale and whose codes (numbers) also express a ranking. Examples: students' class rankings, consumers' degree of preference for brands.
Numerical data on an ordinal scale express only the order or rank of the values, not the distance between them.
(For example, a ranking cannot tell whether the gap between first and second place equals the gap between second and third place.)
※※ Numerical data on either of the two scales above cannot be used in the four arithmetic operations (addition, subtraction, multiplication, division). ※※
◎ Interval scale: the data have the properties of an ordinal scale, and the difference between two numbers indicates the size of the distance between them; however, there is no absolute origin (zero point).
Examples: temperature, calendar year. Interval-scale data are always numerical and can be added and subtracted, but not multiplied or divided.
For example, temperature differences and score differences are meaningful, but ratios of temperatures or of scores have no real meaning.
Saying that 30 °C is 1.5 times as hot as 20 °C has no real meaning: in Fahrenheit the same two temperatures are 86 °F and 68 °F, whose ratio is about 1.26, not 1.5. The ratio changes with the unit because the zero point is arbitrary (there is no absolute zero).
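The temperature argument can be checked numerically. This small Python sketch (the helper name c_to_f is ours, not the text's) shows that the ratio of two temperatures changes when the unit changes, while the ratio of two temperature differences does not, which is what separates an interval scale from a ratio scale.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Ratio of two temperatures: depends on the (arbitrary) zero point.
ratio_c = 30 / 20                       # on the Celsius scale
ratio_f = c_to_f(30) / c_to_f(20)       # 86 / 68 on the Fahrenheit scale

# Ratio of two differences: the arbitrary zero cancels when subtracting,
# so the result is the same on both scales.
diff_ratio_c = (30 - 20) / (20 - 10)
diff_ratio_f = (c_to_f(30) - c_to_f(20)) / (c_to_f(20) - c_to_f(10))

print(f"temperature ratios: {ratio_c:.2f} (C) vs {ratio_f:.2f} (F)")
print(f"difference ratios:  {diff_ratio_c:.2f} (C) vs {diff_ratio_f:.2f} (F)")
```

On a ratio scale such as weight, converting kilograms to pounds multiplies every value by the same constant, so ratios of the values themselves would also be preserved.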
• Ratio scale:
(1) The data have all the properties of the interval scale.
(2) Ratios between data values are meaningful.
The difference between the ratio scale and the interval scale is that the former has an absolute origin (zero point), whereas the origin of the latter is chosen arbitrarily. Ratio-scale data support all four arithmetic operations.
Examples: length, weight, sales, income, and so on, since their distances or magnitudes are measured from an absolute zero.
A car priced at 1,000,000 yuan is 500,000 yuan more expensive than one priced at 500,000 yuan, and its price is also twice as high.
Relationships between numerical and non-numerical data and the four measurement scales

Scale     | Non-numerical data                  | Numerical data
Nominal   | Indicates the category of the data  | Numbers are used only to represent categories
Ordinal   | Indicates the ordering of the data  | Numbers can represent the ordering
Interval* | (none)                              | Differences between values represent distances; no absolute origin
Ratio#    | (none)                              | Differences between values represent distances; has an absolute origin

Note: * supports addition and subtraction; # supports addition, subtraction, multiplication, and division.
The main statistical methods for analyzing nominal-scale and ordinal-scale data belong to nonparametric statistics.
Common statistical methods for analyzing interval-scale and ratio-scale data belong to parametric statistics.
Grouped frequency table: a table constructed by arranging all values in order of size, dividing them into several intervals (groups), and treating each interval as one class. It is suitable for continuous data and for large sets of discrete data.
Statistical data presented in a grouped frequency table are called grouped data. Although grouping loses the detail of the original data, it still reveals the regularity and distribution of the data, and it condenses and simplifies the original data.
Constructing a Frequency Distribution Table
1. Steps for constructing a grouped frequency table
(1) Step 1: Arrange. Sort the observations in order of magnitude.
(2) Step 2: Find the range: R = maximum - minimum.
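(3) Step 3: Choose the number of classes k and determine the class width:
d = R / k (class width = range / number of classes)
The class width should have the same number of significant digits as the data values.
Suggested number of classes k for n observations:
n = 50 to 99: k = 5 to 10
n = 100 to 249: k = 7 to 12
n > 249: k = 10 to 20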
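(4) Step 4: Determine the class limits (upper limit U and lower limit L of each class):
d = U - L + smallest measurement unit
For example, if the observations are 2.30, 3.12, 2.47, …, the smallest measurement unit is 0.01. If L = 2.10 and d = 0.50, then 0.50 = U - 2.10 + 0.01, so U = 2.59. The next class is 2.60 to 3.09, and so on.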
(5) Step 5: Classify and tally each observation (using five-stroke tally marks such as 卌 or 正).
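The five steps can be sketched in Python as follows. This is a minimal illustration, assuming integer-valued scores for the tally and rounding the class width up to a multiple of 5, as Example 2.3 recommends. The function names make_classes and frequency_table and the twelve scores are hypothetical, not taken from the text. Decimal keeps limits such as 2.59 exact.

```python
import math
from decimal import Decimal

def make_classes(lower, width, unit, k):
    """Step 4: return k (lower limit, upper limit) pairs.

    From d = U - L + unit it follows that U = L + d - unit; each new class
    starts one class width above the previous lower limit.
    """
    L, d, u = Decimal(str(lower)), Decimal(str(width)), Decimal(str(unit))
    return [(L + i * d, L + i * d + d - u) for i in range(k)]

def frequency_table(data, k, step=5):
    """Steps 1-5 for integer-valued data (smallest measurement unit = 1)."""
    ordered = sorted(data)                            # Step 1: arrange
    data_range = ordered[-1] - ordered[0]             # Step 2: R = max - min
    width = math.ceil(data_range / k / step) * step   # Step 3: d >= R/k, a multiple of `step`
    lower = (ordered[0] // step) * step               # first lower limit at or below the minimum
    classes = make_classes(lower, width, 1, k)        # Step 4: class limits
    counts = [sum(lo <= x <= hi for x in ordered)     # Step 5: tally each class
              for lo, hi in classes]
    if sum(counts) != len(data):
        raise ValueError("classes do not cover every observation; increase k or the width")
    return data_range, width, list(zip(classes, counts))

# The Step 4 example from the text: L = 2.10, d = 0.50, smallest unit 0.01.
print(make_classes("2.10", "0.50", "0.01", 2))

# Hypothetical scores (NOT the 48 grades of Example 2.3).
scores = [45, 72, 58, 91, 66, 38, 77, 83, 60, 52, 69, 74]
R, d, table = frequency_table(scores, k=4)
print("range:", R, "class width:", d)
for (lo, hi), f in table:
    print(f"{lo}-{hi}: {f}")
```

The final check mirrors step (f) of Example 2.3: the class frequencies must add up to the number of observations.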
Example 2.3
Suppose the semester grades in statistics of the 48 students in a class are as follows. Construct a frequency distribution table.
Solution
(a) Arrange: sort the values by size, as shown below:
(b) Find the range: the maximum is 93 and the minimum is 28, so R = 93 - 28 = 65.
(c) Number of classes and class width: with 48 observations in total, take k = 5 classes, so the class width is d = R/k = 65/5 = 13. For convenient grouping, the class width is often a multiple of 5 or 10; here it is more convenient to take d = 15 (the width must be greater than 13 so that five classes cover every observation from 28 to 93).
(f) Count the frequencies: count the tally marks of each class and record the counts in the frequency column. Adding the class frequencies gives a total of 48, which matches the number of students in the original data.
Table 2.3 Frequency distribution of the statistics grades of 48 students
2. Principles for constructing a frequency distribution table
(1) On the number of classes k (Sturges' rule):
$k = 1 + 3.32\log_{10} n$
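Sturges' rule is easy to evaluate directly. The sketch below assumes the result is rounded up to a whole number of classes (conventions differ; some texts round to the nearest integer), and the function name sturges_classes is our own.

```python
import math

def sturges_classes(n: int) -> int:
    """Number of classes by Sturges' rule, k = 1 + 3.32 * log10(n), rounded up."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    return math.ceil(1 + 3.32 * math.log10(n))

for n in (48, 100, 250, 1000):
    raw = 1 + 3.32 * math.log10(n)
    print(f"n = {n:4d}: 1 + 3.32*log10(n) = {raw:.2f} -> k = {sturges_classes(n)}")
```

For n = 48 the formula gives about 6.58, i.e. 7 classes, whereas Example 2.3 settles on 5; the rule is a starting point rather than a requirement.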
(2) On class width:
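Advantages of equal class widths:
1. The frequencies of the classes are easy to compare.
2. The table is easy to use.
3. Various statistical measures are easy to compute.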
For certain special distributions, unequal class widths may be used when the facts require it. For example, when the population of a region is tabulated into a frequency distribution by wealth, with one thousand yuan as the counting unit, one could use:
(3) On class limits:
(a) The class limits must have the same number of significant digits as the observations.
(b) The classes must cover all observations.
(c) Avoid open-ended classes as far as possible, since they make certain statistics difficult to compute. Examples of open-ended classes: “below 37” or “above 83”.
• Relative frequency is the frequency of each class divided by the total number of observations. A table that lists the relative frequency of each class in turn is called a relative frequency distribution table. It shows each class's share of the total, and it allows a fair comparison between two or more distributions that use the same classes, even when their totals differ.
$f_i = \dfrac{F_i}{n}$, where $F_i$ is the frequency of the $i$-th class and $n$ is the total number of observations.
• Cumulative frequency is obtained by adding up the class frequencies in turn; a table listing the cumulative frequencies is called a cumulative frequency distribution table. In many cases we are interested not in how many observations fall within a class, but in how many fall above or below a particular value.
Less-than cumulative frequency: obtained by adding the class frequencies in order from the class with the smallest values to the class with the largest values (counting everything up to each class's upper limit);
Greater-than cumulative frequency: obtained by adding the class frequencies in order from the class with the largest values to the class with the smallest values (counting everything from each class's lower limit upward).
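Relative, less-than cumulative, and greater-than cumulative frequencies can all be derived from the class frequencies alone. Here is a minimal Python sketch, using hypothetical frequencies (not those of Table 2.3) and a function name of our own choosing:

```python
from itertools import accumulate

def frequency_summaries(freqs):
    """Return relative, less-than cumulative, and greater-than cumulative frequencies.

    `freqs` lists the class frequencies F_i from the lowest class to the highest.
    """
    n = sum(freqs)
    relative = [f / n for f in freqs]                       # f_i = F_i / n
    less_than = list(accumulate(freqs))                     # up to each class's upper limit
    greater_than = list(accumulate(reversed(freqs)))[::-1]  # from each class's lower limit upward
    return relative, less_than, greater_than

# Hypothetical frequencies for five classes, lowest class first.
freqs = [2, 5, 7, 4, 2]
rel, lt, gt = frequency_summaries(freqs)
for i, f in enumerate(freqs):
    print(f"class {i + 1}: F = {f}, f = {rel[i]:.2f}, less-than = {lt[i]}, greater-than = {gt[i]}")
```

At every cut point between two adjacent classes, the less-than count below it plus the greater-than count above it equals n. That is why the 31 students at or below 67 and the 17 students at or above 68 reported for Table 2.4 add up to 48.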
Table 2.4 Relative and cumulative frequency distributions for Table 2.3
Table 2.4 shows that the class from 53 to 67 points holds the largest share of the 48 students, about 1/3. The greater-than cumulative frequencies show that 17 students scored 68 or above, and the less-than cumulative frequencies show that 31 students scored 67 or below.
2.4 Graphical Presentation of Frequency Distributions
A frequency distribution graph is a statistical graph that displays the distribution given in a frequency distribution table. The frequency distribution table must therefore be constructed before the graph is drawn.
A frequency distribution graph lets the viewer grasp the data clearly and as a whole in very little time. It is also useful for comparison.
Common frequency distribution graphs:
• For continuous data: histograms, frequency polygons, cumulative frequency graphs (ogives), etc.;
• For discrete data: bar charts, area charts, etc.
(1) Histogram
A histogram is a statistical graph that represents the frequency of each class in a frequency distribution by the area of a rectangle. Histograms are suitable for continuous data. The horizontal axis shows the class boundaries of each class, and the vertical axis shows the class frequencies.
Lower class boundary = lower class limit - ½ × (smallest unit of observation)
Upper class boundary = upper class limit + ½ × (smallest unit of observation)
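The two boundary formulas can be applied in Python as follows. Decimal is used so that values such as 2.595 stay exact. The function name and the sample class limits are illustrative assumptions.

```python
from decimal import Decimal

def class_boundaries(limits, unit):
    """Convert class limits to class boundaries.

    lower boundary = lower limit - unit/2, upper boundary = upper limit + unit/2,
    so adjacent classes share a boundary and leave no gap on the axis.
    """
    half = Decimal(str(unit)) / 2
    return [(Decimal(str(lo)) - half, Decimal(str(hi)) + half) for lo, hi in limits]

# Hypothetical integer-valued classes (smallest unit 1).
print(class_boundaries([(35, 49), (50, 64), (65, 79)], 1))
# The classes from the Step 4 example (smallest unit 0.01).
print(class_boundaries([("2.10", "2.59"), ("2.60", "3.09")], "0.01"))
```

Because each upper boundary equals the next lower boundary, the rectangles of a histogram drawn on these boundaries touch without gaps.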
Example 2.5
Refer to Table 2.3 and draw a histogram of the statistics grades of the 48 students.
Solution
If the vertical axis of the histogram is changed to relative frequency, the graph is called a relative frequency distribution graph (or relative frequency histogram). Note that the area of each rectangle then represents the relative frequency of its class, so the vertical axis should show the relative frequency divided by the class width; that is, the height of each rectangle should be:
$\text{height} = \dfrac{\text{relative frequency}}{\text{upper class boundary} - \text{lower class boundary}}$
This adjustment makes the total area of all rectangles equal to 1, the sum of the relative frequencies. Such a graph is usually used to compare each class's share of the total: the area of each bar represents the proportion (probability) of observations falling in that class.
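Here is a short sketch of the height rule for a relative frequency histogram. It uses hypothetical classes of unequal width to show why dividing by the class width matters; the function name is our own.

```python
def density_heights(freqs, boundaries):
    """Height of each rectangle = relative frequency / (upper boundary - lower boundary)."""
    n = sum(freqs)
    return [(f / n) / (upper - lower) for f, (lower, upper) in zip(freqs, boundaries)]

# Hypothetical classes with UNEQUAL widths: 10, 10, 20, 40.
freqs = [10, 20, 20, 10]
boundaries = [(0, 10), (10, 20), (20, 40), (40, 80)]
heights = density_heights(freqs, boundaries)

# Total area = sum of height * width, which equals the sum of relative frequencies.
total_area = sum(h * (upper - lower) for h, (lower, upper) in zip(heights, boundaries))
print([round(h, 4) for h in heights], round(total_area, 10))
```

The first and third classes get the same height even though the third holds twice as many observations, because it is twice as wide. Plotting raw frequencies would exaggerate the third class. For reference, NumPy's numpy.histogram with density=True applies the same normalization.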
Figure 2.3(a) Relative frequency histogram of the statistics grades of 48 students
The difference between a histogram and a bar chart is that the histogram uses area, not height, to represent the relative frequency of a class. A histogram consists of a set of rectangles, and the area of each rectangle represents the percentage of the sample in the corresponding interval.
The height of each rectangle represents the density of the sample: the percentage of the sample in the interval divided by the length of the interval (the width of the rectangle). Each area is a percentage, and the total area is 100%. The area under the histogram between two values gives the percentage of the sample that falls in that interval.
However, some presentations use the frequency (or relative frequency) itself as the height of each class; such a histogram only shows the trend of change across classes.
(2) Frequency polygon
A frequency polygon can be drawn from the histogram, using the class midpoint of each class as the horizontal coordinate and the class frequency as the vertical coordinate.
$\text{class midpoint} = \dfrac{\text{lower class limit} + \text{upper class limit}}{2} = \dfrac{\text{lower class boundary} + \text{upper class boundary}}{2}$
First extend the histogram by one extra class before the first class and one after the last class, setting the frequency of both to 0. Then connect the midpoints of the tops of the rectangles with straight lines to form a closed figure; this is the frequency polygon.
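The polygon construction can be written as a list of vertices. This sketch assumes equal class widths and integer data, and uses hypothetical class limits and frequencies. Computing the midpoint from the class limits or from the class boundaries gives the same value, e.g. (35 + 49)/2 = (34.5 + 49.5)/2 = 42.

```python
def polygon_vertices(limits, freqs):
    """Vertices (class midpoint, frequency) of a frequency polygon.

    Assumes equal class widths and at least two classes. A hypothetical empty
    class is added before the first and after the last class, so the polygon
    returns to the horizontal axis at both ends.
    """
    if len(limits) < 2 or len(limits) != len(freqs):
        raise ValueError("need at least two classes and one frequency per class")
    width = limits[1][0] - limits[0][0]
    mids = [(lo + hi) / 2 for lo, hi in limits]   # midpoint = (lower limit + upper limit) / 2
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))

# Hypothetical classes (smallest unit 1) and their frequencies.
limits = [(35, 49), (50, 64), (65, 79), (80, 94)]
print(polygon_vertices(limits, [2, 3, 5, 2]))
```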
Table 2.5 Frequency distribution of the statistics grades of 48 students (with a hypothetical class added at each end)
When there is a large amount of data, so that the relative frequency histogram has many classes with very small class widths, the resulting relative frequency polygon can be regarded as a frequency curve. It is commonly compared with continuous probability distributions (for example, the normal distribution).
(3) Cumulative frequency graph (ogive)
A graph drawn from the cumulative frequencies (both less-than and greater-than) is called a cumulative frequency distribution graph, or ogive.
As in the histogram, its horizontal axis uses the class boundaries:
The less-than ogive uses the upper boundary of each class as the horizontal coordinate.
The greater-than ogive uses the lower boundary of each class as the horizontal coordinate.
Both use cumulative frequency as the vertical coordinate.
Cumulative frequency distribution graph
Note that in Figure 2.6 the two curves intersect at a point whose vertical coordinate (cumulative frequency) is 24 and whose horizontal coordinate is 60.94 (the median M). This means that about 24 students scored above 60.94 and about 24 scored below it, each being half (50%) of the class. This is the idea of the median; see Chapter 3.
• A bar chart is a statistical chart that represents values by the lengths of rectangles. Bar charts are used to compare two or more values of a single variable (at different times or under different conditions) and are usually used with smaller data sets. The bars can also be drawn horizontally, or arranged to show several dimensions. A similar kind of graph is the histogram, although the histogram is more complex than the bar chart.
• When drawing a bar chart, the center line of each bar or group of bars must be aligned with its category tick mark. When the values are large and close together, a broken-axis mark (~) can be used to omit part of the scale and widen the visible differences between values, making the chart clearer.
Differences Between a Histogram and a Bar Chart
○ Bar charts are usually used for discrete data, while histograms are used for continuous data.
○ The horizontal axis of a histogram shows class boundaries, and the width of each rectangle represents the class width. A bar chart has no concept of class width or class boundary, and the width of its rectangles has no meaning (it is chosen only for readability and appearance).
○ In appearance, the rectangles of a histogram sit side by side, whereas there may be gaps between the rectangles (bars) of a bar chart.
Area chart
Circle chart (pie chart): represents the frequencies or relative frequencies of a frequency distribution by the areas of the parts of a plane figure.
2.5 Conclusion
Data
├─ Quantitative data
│  ├─ Continuous
│  │  ├─ Tabular methods: frequency distribution, relative frequency distribution, cumulative frequency distribution
│  │  └─ Graphical methods: histogram, frequency polygon, ogive, stem-and-leaf plot
│  └─ Discrete
│     ├─ Tabular methods: frequency distribution, relative frequency distribution, cumulative frequency distribution
│     └─ Graphical methods: bar chart, pie chart, stem-and-leaf plot
└─ Qualitative data
   └─ Graphical methods: bar chart, pie chart
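CHAPTER 3 Descriptive Statistics (II): Statistical Measures
• 3.1 Measures of central tendency
• 3.3 Applications of the mean and standard deviation
• 3.4 Measures of skewness, measures of kurtosis, and moments
• 3.5 Computing statistical measures for grouped data
• 3.6 Exploratory data analysis
• 3.7 Conclusion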
• Although statistical charts help us understand the characteristics of a set of data, specific description and comparative analysis require further measures: the location or concentration of the data, and the variability of the data. These quantities, called statistical measures, are specific numbers that represent the characteristics of a set of data.
• The main statistical measures describing the characteristics of data are:
◇ Measures of variability
◇ Measures of skewness (coefficient of skewness)
◇ Measures of kurtosis (coefficient of kurtosis)
○ A measure of central tendency, also simply called a measure of central location, is a measure of the common tendency shared by the observations in a set of data.
○ Because it reflects the location of the observed values in a set of data, it is also called a location measure. The most commonly used ones are:
◇ Measures of central tendency: mean, median, mode.
◇ Relative location measures: percentile, quartile, decile.
Mean
○ The mean is arguably the most important measure of central tendency and can serve as the representative value of a set of data. In general, the mean serves several functions: simplification, representation, and comparison.
● Simplification: the mean summarizes the distribution of a set of data in a single number.
● Representation: the mean represents the average level of a set of data; because it is the central value of the data, it is the most suitable single value to represent the whole distribution.
● Comparison: by condensing all values into one number that represents the average level of the whole set, the mean makes it easy to compare two or more distributions.
For example, a class's average score can serve as a reference for judging whether its grades are good or poor.
• Harmonic mean: if the data form a harmonic progression (the reciprocals of the data form an arithmetic progression), the harmonic mean should be used. In practice it suits quantities such as the average price when a fixed amount of money is spent at each price, or the average speed when a fixed distance is traveled at each speed.
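To see why the harmonic mean suits average speed over a fixed distance, consider a hypothetical trip made of two 120 km legs, driven at 60 km/h and 40 km/h. Python's standard statistics module provides harmonic_mean. The trip figures are illustrative only.

```python
from statistics import harmonic_mean, mean

speeds = [60, 40]          # km/h on two legs of equal length (hypothetical)
distance_per_leg = 120     # km (hypothetical)

# Correct average speed: total distance divided by total time.
total_time = sum(distance_per_leg / v for v in speeds)          # 2 h + 3 h
true_average = distance_per_leg * len(speeds) / total_time

print("arithmetic mean:", mean(speeds))
print("harmonic mean:  ", harmonic_mean(speeds))
print("distance / time:", true_average)
```

The arithmetic mean (50 km/h) overstates the average speed, because more time is spent on the slower leg. The harmonic mean (about 48 km/h) matches total distance divided by total time.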
● Meaning of the mean
The mean is the simplest, most important, and most commonly used measure of central tendency.
Suppose a set of data has $n$ values $X_1, X_2, \ldots, X_n$; then its mean is:
$\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$
NOTE: For positive data, the harmonic mean is never greater than the geometric mean, which in turn is never greater than the arithmetic mean; the three are equal only when all values are equal.
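The ordering in the note can be checked with Python's statistics module (geometric_mean requires Python 3.8 or later). The data values are illustrative only, and the helper name three_means is our own.

```python
from statistics import geometric_mean, harmonic_mean, mean

def three_means(data):
    """Return (harmonic, geometric, arithmetic) means of positive data."""
    if any(x <= 0 for x in data):
        raise ValueError("all values must be positive")
    return harmonic_mean(data), geometric_mean(data), mean(data)

# Mathematically HM = 3.2, GM = 4, AM = 5 for [2, 8];
# printed floats may differ from these in the last digit.
hm, gm, am = three_means([2, 8])
print(hm, gm, am)

# All three coincide when every value is the same.
print(three_means([5, 5, 5]))
```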
Example 3.1
A class is divided into groups A and B, with 5 students in group A and 4 in group B. Their scores on a statistics test are as follows:
甲組:89,72,55, 68, 78
乙組:88,63,76,69
該次測驗結果,二組成績優?
The calculation result shows that the average score of Group A is 72.4, which is lower than that of Group
B, which is 74, so the score of Group B is better in this test.
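The comparison in Example 3.1 can be reproduced with a few lines of Python; `mean` here is a small helper written for the sketch, not a library call.

```python
group_a = [89, 72, 55, 68, 78]   # group A scores from Example 3.1
group_b = [88, 63, 76, 69]       # group B scores


def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


mean_a, mean_b = mean(group_a), mean(group_b)
print(mean_a, mean_b)  # 72.4 74.0
```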
Important properties of the mean
(1) In any set of data, the algebraic sum of the differences between each observation and the mean (called deviations) is 0, that is:
$$\sum_{i=1}^{n}(X_i - \bar{X}) = 0$$
Hence the mean can be regarded as the center of gravity of the data.
(2) In any set of data, the sum of squared deviations from the mean is smaller than the sum of squared differences from any other value A, that is:
$$\sum_{i=1}^{n}(X_i - \bar{X})^2 < \sum_{i=1}^{n}(X_i - A)^2, \qquad A \neq \bar{X}$$
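A small numerical check of properties (1) and (2), reusing the group A scores from Example 3.1; the trial values compared against the mean are arbitrary choices for the sketch.

```python
data = [89, 72, 55, 68, 78]        # group A scores from Example 3.1
x_bar = sum(data) / len(data)      # 72.4

# Property (1): deviations from the mean sum to zero (up to rounding error).
sum_of_deviations = sum(x - x_bar for x in data)


def sum_of_squares(c):
    """Sum of squared differences between each observation and the constant c."""
    return sum((x - c) ** 2 for x in data)


# Property (2): no trial value gives a smaller sum of squares than the mean.
trial_values = [55, 65, 70, x_bar, 75, 80, 89]
best = min(trial_values, key=sum_of_squares)
print(abs(sum_of_deviations) < 1e-9, best == x_bar)  # True True
```

Algebraically, the sum of squares about any value c equals the sum about the mean plus n(c − x̄)², which is why the minimum sits exactly at the mean.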
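(3) If there are k groups of data whose sizes and means are (n1, X̄1), (n2, X̄2), …, (nk, X̄k), and the groups are merged into one group of n = n1 + n2 + ⋯ + nk observations with overall mean X̄, then:
$$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots + n_k\bar{X}_k}{n_1 + n_2 + \cdots + n_k} \qquad (3\text{-}4)$$
(weighted mean)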
Example 3.2
Classes A, B, and C have 50, 48, and 52 students, respectively. On a statistics exam, the mean scores of the three classes are 80, 76, and 85. Find the overall mean score of the three classes.
Solution
From the problem: (n1, X̄1) = (50, 80), (n2, X̄2) = (48, 76), (n3, X̄3) = (52, 85).
By Eq. (3-4), the overall mean score is:
$$\bar{X} = \frac{50(80) + 48(76) + 52(85)}{50 + 48 + 52} = \frac{12068}{150} \approx 80.45$$
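A minimal sketch of Eq. (3-4) in Python, assuming the class sizes in Example 3.2 are 50, 48, and 52; `pooled_mean` is a helper written for this illustration.

```python
def pooled_mean(groups):
    """Overall mean of k groups from (n_i, mean_i) pairs: sum(n_i * mean_i) / sum(n_i)."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# Example 3.2, assuming class sizes 50, 48 and 52.
classes = [(50, 80), (48, 76), (52, 85)]
print(round(pooled_mean(classes), 2))  # 80.45
```

Each class mean is weighted by its size, so the larger class C (mean 85) pulls the overall mean slightly above the unweighted average of the three class means.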
Example 3.3
The 15 original members of the Dali High School girls' tug-of-war team had an average weight of 53.8 kg. Five of them, whose average weight is 52 kg, have graduated. What is the average weight of the team's 10 current members?
Solution
Let (n1, X̄1) and (n2, X̄2) denote the size and mean weight of the 10 current members and of the 5 graduates, respectively. By Eq. (3-4):
$$53.8 = \frac{10\bar{X}_1 + 5(52)}{15}$$
Solving gives X̄1 = 54.7, so the average weight of the 10 current members of the Dali High School girls' tug-of-war team is 54.7 kg.
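The same rearrangement of Eq. (3-4) in Python: the total weight of the current members is the original total minus the graduates' total.

```python
# Example 3.3: recover the mean of the remaining group from the pooled mean.
n_total, mean_total = 15, 53.8     # original team
n_left, mean_left = 5, 52.0        # graduated members
n_stay = n_total - n_left          # 10 current members

# n_total * mean_total = n_stay * mean_stay + n_left * mean_left
mean_stay = (n_total * mean_total - n_left * mean_left) / n_stay
print(round(mean_stay, 1))  # 54.7
```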
Using the mean as the measure of central tendency of a set of data has advantages and disadvantages:
• Advantages
① The mean is readily accepted as a representative value.
② Every value in the data set is included in its calculation.
③ It can be manipulated algebraically, which makes it well suited to mathematical applications.
• Disadvantages
① The mean is easily affected by extreme values, and on its own it does not reveal how spread out the data are.
Ex. The scores of five students in class A are 60, 61, 62, 63, 64, and those of five students in class B are 38, 40, 45, 92, 95.
Although both classes have a mean score of 62, the scores in class A are close together, whereas those in class B differ widely.
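○ Meaning of the median. Arrange a set of data x1, x2, …, xn in order of magnitude; the median is the value in the middle position, that is:
$$M = \begin{cases} x_{((n+1)/2)}, & n \text{ odd} \\ \dfrac{x_{(n/2)} + x_{(n/2+1)}}{2}, & n \text{ even} \end{cases}$$
where x_(i) denotes the i-th value after sorting. When n is even, the median is the mean of the values in positions n/2 and n/2 + 1. The median is generally denoted by M (or Me).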
When a data set contains too many extreme values, the mean becomes less representative, and the median is the better measure.
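Example 3.4
Find the median of each of the following two data sets:
I: 13, 20, 8, 15, 7
First arrange the data in order of size, then find the middle position. For data set I, the sorted values are 7, 8, 13, 15, 20, and the middle (3rd) value is 13. Data set II has an even number of values, so its median is the mean of its two middle values: (11 + 15)/2 = 13.
The medians of data sets I and II are both 13.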
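A minimal median function following the odd/even rule above. Only data set I is listed in full, so the sketch checks it and also applies the rule to the class A and class B scores from the earlier example.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # 1-based position (n + 1) / 2
    return (s[mid - 1] + s[mid]) / 2   # 1-based positions n/2 and n/2 + 1


print(median([13, 20, 8, 15, 7]))                                   # 13
print(median([60, 61, 62, 63, 64]), median([38, 40, 45, 92, 95]))   # 62 45
```

Both classes have a mean of 62, but the median of class B is 45, which shows how the two large scores (92 and 95) pull the mean upward.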
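Example 3.5
The mean is:
$$\bar{X} = \frac{15 + 3 + 46 + 623 + 126 + 64}{6} = \frac{877}{6} \approx 146.2 \text{ days}$$
When calculating the median, first arrange the data in order of size: 3, 15, 46, 64, 126, 623. The median is the mean of the 3rd and 4th values, (46 + 64)/2 = 55 days.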
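The Example 3.5 figures can be checked with Python's standard `statistics` module, which makes the effect of the single extreme value (623) easy to see.

```python
import statistics

days = [15, 3, 46, 623, 126, 64]        # data from Example 3.5 (days)
print(round(statistics.mean(days), 1))  # 146.2, pulled upward by 623
print(statistics.median(days))          # 55.0 = (46 + 64) / 2
```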
• Property of the median
In any set of data, the sum of the absolute differences between each observation and the median is the smallest:
$$\sum_{i=1}^{n}|x_i - M| \le \sum_{i=1}^{n}|x_i - A|$$
where A is any value in the data set. The farther A is from M, the larger the gap between the two sums.
For example, for x = 2, 5, 7, 8, 10, the median is M = 7.
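A short sketch that tabulates the sum of absolute deviations for each candidate A in the example data 2, 5, 7, 8, 10; `sum_abs_dev` is a helper defined here.

```python
data = [2, 5, 7, 8, 10]   # example from the text; median M = 7


def sum_abs_dev(a):
    """Sum of |x_i - A| over the data for a trial value A."""
    return sum(abs(x - a) for x in data)


for a in data:
    print(a, sum_abs_dev(a))   # 2 22, 5 13, 7 11, 8 12, 10 18
```

The smallest sum (11) occurs at the median, and the sum grows as A moves away from 7 in either direction.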
○ Advantages of the median:
① It is simple and easy to understand.
② It is not easily affected by extreme values.
○ Disadvantages of the median:
① The median considers only the value(s) in the middle position and ignores the magnitudes of the other values, so it lacks sensitivity.
② For the reason given in ①, the median is not suited to algebraic operations.
○ When the data contain extreme values, the median is more representative than the mean.
• Meaning of the mode
In a set of data, the value that occurs most often is the mode. It is generally denoted by Mo.
The mode marks the point where the data are most concentrated and is a position-based measure; it is especially suitable for qualitative data.
For example, if blood type A is the most common in a class, the mode of the class's blood-type data is A.
Example 3.6
Find the mode of each of the following three data sets:
In data set I, 15 appears most often (4 times), so the mode is 15.
In data set II, both 10 and 12 appear three times (the most), so the modes are 10 and 12.
In data set III, every value appears only once, so there is no mode.
This example shows that a mode may not exist, or there may be more than one, so the mode is not necessarily unique.
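A minimal mode function built on `collections.Counter` that reflects the three outcomes above (one mode, several modes, no mode). The data sets of Example 3.6 are not listed in the text, so the inputs below are hypothetical.

```python
from collections import Counter


def modes(values):
    """All values with the highest frequency; [] when every value occurs only once."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []   # no mode: each value appears once
    return sorted(v for v, c in counts.items() if c == top)


print(modes(["A", "O", "A", "B", "A", "AB", "O"]))   # ['A'] (qualitative data)
print(modes([10, 12, 10, 12, 11, 10, 12]))           # [10, 12]
print(modes([3, 1, 4, 5, 9]))                        # []
```

The standard library's `statistics.multimode` also returns every most-frequent value, but when all values occur once it returns all of them rather than reporting that no mode exists, so a small wrapper like this one is needed to follow the textbook's convention.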
• Properties of the mode
(1) Advantages of the mode:
① It is simple and easy to understand.
② It is not affected by extreme values.
(2) Disadvantages of the mode:
① Like the median, the mode considers only a few values, so it is not suited to algebraic operations.
② When every value in the data appears only once, there is no mode; when there are two or more modes, it is hard to decide which one should represent the center of the data.
○ The mode is appropriate when many of the values cluster at or near a particular value.
Summary
• Measurement scale
-Nominal scale: mode
-Ordinal scale: mode or median
-Interval and ratio scales: mean
• Advantages and disadvantages
-Mode: it can serve as a decision criterion for categorical data (for example, when expressing public opinion, the minority yields to the majority), and it is not affected by extreme values. However, if the observed values are not concentrated, the mode is not a suitable criterion; moreover, the mode is not suited to mathematical operations.
-Median: it is not affected by extreme values, and the cumulative proportion of observations up to the median is 50%. However, like the mode, it is not suited to mathematical operations.
Percentile
○ Meaning of the percentile: after the data are arranged in order of size, if at least k% of the observations are less than or equal to a certain value and at least (100 − k)% of the observations are greater than or equal to it, that value is called the k-th percentile of the data, commonly denoted Pk.
NOTE: The key point is that at least k% of the observations must be less than or equal to Pk.
Ex. If the sorted data are divided into 100 equal parts, the value at the end of the 32nd part is P32.
○ Steps for calculating a percentile
(1) Arrange the data in order of size.
(2) Compute the index i of the position of the percentile Pk:
$$i = \frac{k}{100} \times n$$
where n is the number of observations.
(3) If i is not an integer, round it up: Pk is the value in the next integer position. For example, if i = 9.23, take the value in the 10th position as Pk.
If i is an integer, Pk is the mean of the values in positions i and i + 1.
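A minimal sketch of steps (1) to (3), assuming k is an integer strictly between 0 and 100. The index i = k × n / 100 is kept as integer arithmetic so that rounding error cannot misclassify an integer index; `percentile` and `position` are helpers written for this illustration.

```python
def percentile(values, k):
    """k-th percentile (integer k, 0 < k < 100) by the three-step rule."""
    if not values or not 0 < k < 100:
        raise ValueError("need non-empty data and an integer k with 0 < k < 100")
    s = sorted(values)               # step (1): sort
    num = k * len(s)                 # step (2): i = num / 100, kept as integers
    if num % 100:                    # step (3a): i not an integer -> next integer position
        return s[num // 100]         # 1-based position num // 100 + 1
    i = num // 100                   # step (3b): i an integer -> mean of positions i and i + 1
    return (s[i - 1] + s[i]) / 2


def position(k, n):
    """1-based sorted position(s) the rule uses for P_k with n observations."""
    num = k * n
    return (num // 100 + 1,) if num % 100 else (num // 100, num // 100 + 1)


data = list(range(1, 11))            # hypothetical data 1, 2, ..., 10
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))   # 3 5.5 8
print({k: position(k, 50) for k in (25, 30, 50, 75)})
```

For the n = 50 applicants of Example 3.7, `position` shows that P25 is the 13th value, P30 is the mean of the 15th and 16th values, P50 is the mean of the 25th and 26th values, and P75 is the 38th value. Other software (for example NumPy's default) interpolates between positions, so its results can differ slightly from this rule.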
Example 3.7
An airline recruiting flight attendants received 50 applicants. Their weights, arranged from smallest to largest, are shown in Table 3.1. Find P25, P30, P50, and P75.
Table 3.1 Weights of the 50 applicants in ascending order (kg)
Special percentiles
Using the same idea of division, we obtain:
Quartiles: Q1 = P25, Q2 = P50 (the median), Q3 = P75.
Deciles: D1 = P10, D2 = P20, D3 = P30, …, D9 = P90.
Table 3.2 Correspondence between percentiles and the median, quartiles, and deciles
The concept and calculation of other quantiles (fractiles), such as tertiles and septiles, are similar to those of percentiles.
Although the groups have the same mean score (80 points), their score distributions differ.
Range
In a set of data, the difference between the largest and the smallest value is called the range, generally denoted by R:
$$R = x_{\max} - x_{\min}$$
Advantages: its meaning is simple and easy to understand, and it is easy to calculate.
Disadvantages: it is easily affected by extreme values and cannot measure the differences among the observations in between.
For example, in factory quality control, the upper and lower limits of a product's standard specification are set in advance. If all product measurements fall within this range, the manufacturing process is in control; otherwise it is out of control and corrective action must be taken.
Example 3.8
Consider the following two data sets:
B: 3, 8, 8, 9, 9, 9, 10, 15
Find the range, mean, and median of each, and compare them.
Solution
For data set B, R = 15 − 3 = 12, X̄ = 71/8 = 8.875, and M = (9 + 9)/2 = 9.
This shows that measuring dispersion by the range alone gives very unreliable results.
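Data set A of Example 3.8 is not listed in the text, so this sketch computes the three measures for data set B only, using Python's `statistics` module.

```python
import statistics

b = [3, 8, 8, 9, 9, 9, 10, 15]   # data set B from Example 3.8
r = max(b) - min(b)              # range R = largest - smallest
print(r, statistics.mean(b), statistics.median(b))   # 12 8.875 9.0
```

Most of the values lie between 8 and 10, yet the range is 12 because of the two end values 3 and 15.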
Quartile deviation
$$IQR = Q_3 - Q_1, \qquad Q.D. = \frac{Q_3 - Q_1}{2}$$
Example 3.9
Calculate the interquartile range and the quartile deviation of the weights of the 50 applicants in Table 3.1.
Solution
Refer to Example 3.7.
1. The quartile deviation is often used together with the median: when the median describes the central tendency of a frequency distribution, the quartile deviation can be added to describe its dispersion.
2. The interquartile range is simple to calculate, easy to understand, and not affected by extreme values.
3. It considers only the middle half of the values and ignores the other half at the two ends, so it cannot describe the dispersion of all the values; this shortcoming, however, is less serious than that of the range.
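A sketch applying the percentile rule from the calculation steps to data set B of Example 3.8 (the Table 3.1 weights are not reproduced, so they cannot be used here). The `pct` helper restates that rule so the block runs on its own.

```python
def pct(values, k):
    """k-th percentile (integer 0 < k < 100): i = k*n/100; round up if fractional,
    otherwise average the values in positions i and i + 1."""
    s, n = sorted(values), len(values)
    num = k * n
    if num % 100:
        return s[num // 100]           # 1-based position num // 100 + 1
    i = num // 100
    return (s[i - 1] + s[i]) / 2


b = [3, 8, 8, 9, 9, 9, 10, 15]         # data set B from Example 3.8
q1, q3 = pct(b, 25), pct(b, 75)
iqr = q3 - q1                          # interquartile range
qd = iqr / 2                           # quartile deviation
print(q1, q3, iqr, qd)                 # 8.0 9.5 1.5 0.75
```

The interquartile range (1.5) ignores the end values 3 and 15, whereas the range of the same data is 12, which illustrates point 2 above.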
○ The difference between each value in a set of data and a measure of central tendency is called a deviation.
○ The difference between each value and the mean is called the deviation about the mean, and the difference from the median is called the deviation about the median. Because deviations from the mean are used more often, "deviation" usually refers to the deviation from the mean.
◎ The mean absolute deviation (MAD) takes the absolute value of each deviation and then averages them:
$$MAD = \frac{1}{n}\sum_{i=1}^{n}|x_i - \bar{x}|$$
Example 3.10
The mean of the first data set is 10, and the mean of the second data set is 6.75.
The only difference between the two data sets is that the second lacks one large value (23), yet their mean deviations differ greatly. This shows that the mean deviation is easily affected by extreme values.
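The example's own data are not reproduced in the text, so this sketch uses hypothetical data to show the same effect: dropping one large value changes the mean absolute deviation sharply.

```python
def mean_abs_dev(values):
    """Mean absolute deviation about the mean: (1/n) * sum(|x_i - mean|)."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


with_large = [2, 4, 6, 8, 30]      # hypothetical data containing one large value
without_large = [2, 4, 6, 8]       # the same data without it
print(mean_abs_dev(with_large), mean_abs_dev(without_large))   # 8.0 2.0
```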
The advantages and disadvantages of using the mean deviation to measure the dispersion of a set of data are as follows:
●Advantages
1) Its meaning is simple and it is easy to calculate.
2) It is computed from all the values, so it reflects the dispersion of the whole data set and is more sensitive than the range and the interquartile range.
●Disadvantages
1) It sums the absolute values of the deviations, and absolute values are awkward to handle when manipulating mathematical expressions.
2) Because the mean deviation is itself an average, it is susceptible to extreme values, just like the mean. (The smaller the mean deviation, the smaller the differences among the values.)
Variance and Standard Deviation
Let x1, x2, …, xN be a set of data (a population) with mean μ. The population variance is the mean of the squared deviations:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 \qquad (3\text{-}9)$$
• Squaring the deviations has a serious drawback: the unit of measurement is squared as well, which often yields a meaningless unit.
• The standard deviation, the square root of the variance, restores the original unit of the data and does not inflate the original differences. The population standard deviation is generally denoted by σ:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
Example 3.11
Compute the variances of the following two data sets (populations) and compare them:
A: 8, 9, 10, 11, 12
B: 4, 7, 10, 13, 16
Solution
Both data sets have mean 10, and their variances are:
$$\sigma_A^2 = \frac{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2}{5} = 2, \qquad \sigma_B^2 = \frac{(-6)^2 + (-3)^2 + 0^2 + 3^2 + 6^2}{5} = 18$$
Although groups A and B have the same mean, the variance of group A is much smaller than that of group B, so the data in group A vary less and its mean is more representative.
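A direct implementation of Eq. (3-9), dividing by N, checked on the two data sets of Example 3.11; `population_variance` is a helper written for the sketch.

```python
def population_variance(values):
    """Population variance: (1/N) * sum((x_i - mu) ** 2)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


a = [8, 9, 10, 11, 12]
b = [4, 7, 10, 13, 16]
print(population_variance(a), population_variance(b))                 # 2.0 18.0
print(population_variance(a) ** 0.5, population_variance(b) ** 0.5)   # standard deviations
```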
Sample variance
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \qquad (3\text{-}11)$$
The denominator is n − 1 rather than n for the following reason. Comparing Eqs. (3-9) and (3-11), the deviations in the former are xi − μ, whereas in the latter they are xi − x̄. Because the population mean μ is generally unknown, it is estimated by the sample mean x̄, which uses up one degree of freedom (d.f.); hence the division by n − 1. With this denominator, S² is an unbiased estimator of σ².
Example 3.12
Compute the variance of the following sample data: 3.4, 2.5, 4.1, 1.2, 2.8, 3.7
The sample mean is x̄ = 2.95, and the variance follows from Eq. (3-11):
$$S^2 = \frac{1}{6-1}\left[(3.4-2.95)^2 + (2.5-2.95)^2 + (4.1-2.95)^2 + (1.2-2.95)^2 + (2.8-2.95)^2 + (3.7-2.95)^2\right] = \frac{5.375}{5} = 1.075$$
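The same computation in Python, cross-checked against `statistics.variance`, which uses the n − 1 denominator of Eq. (3-11) (`statistics.pvariance` would divide by n instead).

```python
import statistics

sample = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]                 # data from Example 3.12
n = len(sample)
x_bar = sum(sample) / n                                 # 2.95
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)    # divide by n - 1
print(round(x_bar, 2), round(s2, 3))                    # 2.95 1.075
print(round(statistics.variance(sample), 3))            # 1.075
```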
Properties of the Variance and Standard Deviation
◎ The variance and standard deviation have the following important properties:
• If the standard deviation of a distribution is small, most values are concentrated around the mean and the mean is highly representative; otherwise, the mean is less representative.
• The standard deviation is always greater than or equal to zero; it equals zero only when all values are equal.
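Two sets of (population) data with sizes, means, and standard deviations (N1, μ1, σ1) and (N2, μ2, σ2) are to be combined into one set. Then:
N = N1 + N2
$$\mu = \frac{N_1\mu_1 + N_2\mu_2}{N}$$
$$\sigma^2 = \frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1(\mu_1 - \mu)^2 + N_2(\mu_2 - \mu)^2}{N}$$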
Example 3.13
Classes A and B have the following mean statistics scores, standard deviations, and numbers of students:
Calculate the mean and standard deviation of the statistics scores of all students in the two classes.
Solution
Overall mean score
Variance of all scores
Hence the standard deviation of all scores is √179.94 ≈ 13.414 (points).
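The class figures of Example 3.13 are not reproduced in the text, so this sketch verifies the pooling formulas on hypothetical scores: combining the two groups' (N, mean, SD) summaries gives the same result as computing directly on the merged data. The variance formula used is the within-group plus between-group form written above; `summarize` and `combine` are helpers defined here.

```python
import math


def summarize(xs):
    """Return (N, mean, population SD) of a list."""
    n = len(xs)
    m = sum(xs) / n
    return n, m, math.sqrt(sum((x - m) ** 2 for x in xs) / n)


def combine(n1, mu1, sd1, n2, mu2, sd2):
    """Pool two populations from their (N, mean, SD) summaries."""
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n
    # Within-group variation plus between-group variation.
    var = (n1 * sd1 ** 2 + n2 * sd2 ** 2
           + n1 * (mu1 - mu) ** 2 + n2 * (mu2 - mu) ** 2) / n
    return n, mu, math.sqrt(var)


g1, g2 = [60, 70, 80], [75, 85, 90, 100]   # hypothetical class scores
pooled = combine(*summarize(g1), *summarize(g2))
direct = summarize(g1 + g2)
print(pooled)
print(direct)   # agrees with the pooled result up to rounding
```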
Example 3.14
So,
after one value (64) is removed, the mean and the sum of squares become:
From Eq. (3-10),
the average weight of the girls in Mr. Ma's class should be 55.667 kg, with a standard deviation of 1.163 kg.
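The details of Example 3.14 are not reproduced, so this is only a sketch of the general technique of removing one observation using the running count, mean, and sum of squares (Σx²). It assumes a population variance computed with the shortcut σ² = Σx²/N − μ²; whether that matches the book's Eq. (3-10) is an assumption. The data in the check are hypothetical.

```python
def remove_value(n, mean, sum_sq, x):
    """Update count, mean, sum of squares (sum of x_i ** 2) and population variance
    after deleting observation x."""
    new_n = n - 1
    new_mean = (n * mean - x) / new_n
    new_sum_sq = sum_sq - x ** 2
    # Assumed shortcut for the population variance: sum of squares / N - mean ** 2.
    new_var = new_sum_sq / new_n - new_mean ** 2
    return new_n, new_mean, new_sum_sq, new_var


# Hypothetical data [50, 52, 54, 64]: N = 4, mean = 55, sum of squares = 12216.
result = remove_value(4, 55, 12216, 64)
print(result)   # remaining data [50, 52, 54]: mean 52, variance 8/3
```

The shortcut formula is convenient when only summary figures are available, but it subtracts two large numbers, so with real measurements it can lose precision compared with summing squared deviations directly.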
Advantages and disadvantages of the variance and standard deviation:
2. Both the variance and the standard deviation are closely related to the mean and can be handled algebraically, so they are very widely applied.
3. Their disadvantage is that, like the mean, they are susceptible to extreme values.
3.3 Coefficient of Variation
• The range, the quartile deviation (interquartile range), the mean deviation, and the standard deviation all carry the same unit as the original data and are called measures of absolute dispersion. Two sets of data of the same nature, in the same unit, and with similar means can be compared using measures of absolute dispersion.
• However, if two or more sets of data differ in nature or in unit, or have very different means, measures of relative dispersion should be used for comparison.
● A measure of relative dispersion is the ratio of a measure of absolute dispersion to a measure of central tendency or another appropriate value, and it is usually expressed as a percentage. One such measure is the coefficient of variation, usually denoted CV:
$$CV = \frac{S}{\bar{x}} \times 100\% \quad \left(\text{or } \frac{\sigma}{\mu} \times 100\% \text{ for a population}\right)$$
• The coefficient of variation can be used to compare the dispersion of two or more data sets in the following two situations:
1. Data in different units.
2. Data in the same unit but with very different means.
Example 3.15
In 2012, the mean annual per-capita GDP of country A was $7,800, with a median of $8,900 and a variance of 9,000,000; in country B, the mean was $16,000, the median $15,800, and the variance 16,000,000. Which country had the larger gap between rich and poor in 2012?
Solution
Although both countries' GDP is measured in US dollars, their means differ considerably, so comparing the variances or standard deviations directly is not appropriate. The coefficients of variation should be computed first:
$$CV_A = \frac{\sqrt{9{,}000{,}000}}{7{,}800} \times 100\% = \frac{3{,}000}{7{,}800} \times 100\% \approx 38.46\%, \qquad CV_B = \frac{\sqrt{16{,}000{,}000}}{16{,}000} \times 100\% = \frac{4{,}000}{16{,}000} \times 100\% = 25\%$$
Since CV_A > CV_B, country A had the larger gap between rich and poor in 2012.
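The CV comparison of Example 3.15 in Python; note that the example gives variances, so the standard deviation is their square root. `cv_percent` is a helper written for the sketch.

```python
import math


def cv_percent(mean, variance):
    """Coefficient of variation in percent: standard deviation / mean * 100."""
    return math.sqrt(variance) / mean * 100


cv_a = cv_percent(7_800, 9_000_000)      # country A: SD = 3,000
cv_b = cv_percent(16_000, 16_000_000)    # country B: SD = 4,000
print(f"A: {cv_a:.2f}%  B: {cv_b:.2f}%")  # A: 38.46%  B: 25.00%
```

Country B has the larger standard deviation (4,000 versus 3,000), but relative to its mean its dispersion is smaller, which is exactly why the absolute measures would mislead here.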
Z-scores
• The z-score combines the mean and the standard deviation to locate each observation relative to the rest of the data.
• The z-score is defined as:
$$z_i = \frac{x_i - \bar{x}}{S} \quad \text{or} \quad z_i = \frac{x_i - \mu}{\sigma}$$
zi = z-score of the i-th observation
x̄ or μ = sample (or population) mean
S or σ = sample (or population) standard deviation
◎ The z-score is often called the standardized value; it indicates how many standard deviations an observation xi lies from the mean.
• Ex. zi = 1.2: 1.2 standard deviations above the mean; zj = −0.5: 0.5 standard deviations below the mean.
Example 3.16
The mean score on the 2016 TOEIC test in Taiwan was 537 points. Assuming a standard deviation of 150, what is the z-score of Mr. Zhang, who scored 720 points?
Solution
Mr. Zhang's z-score is:
$$z = \frac{720 - 537}{150} = 1.22$$
With the mean as the origin (0) and the standard deviation as the unit, 720 lies 1.22 units above 537.
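The z-score of Example 3.16 as a one-line helper; `z_score` is defined here for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z = z_score(720, 537, 150)
print(round(z, 2))   # 1.22
```

A negative result would mean the score lies below the mean, for example `z_score(462, 537, 150)` gives −0.5.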
Chebyshev's Theorem and the Empirical Rule
● When the standard deviation of a data set is small, its values differ little from one another and most of them are concentrated around the mean.
How many values fall within a given interval around the mean?
• Chebyshev's Theorem
For any k > 1, at least 1 − 1/k² of the observations fall within k standard deviations of the mean, that is, within the interval (x̄ − kS, x̄ + kS):
k = 2: interval (x̄ − 2S, x̄ + 2S), at least 1 − 1/2² = 75% of the observations
k = 3: interval (x̄ − 3S, x̄ + 3S), at least 1 − 1/3² ≈ 88.9% of the observations
This theorem applies to any data distribution, whether the data are a population or a sample.
However, it gives only the minimum proportion falling within the interval; it says nothing about the actual proportion, which may be considerably larger.
Example 3.17
Thirty oranges are randomly selected from a batch and weighed (grams); the records are as follows:
Use Chebyshev's theorem to find the proportion of oranges whose weights fall within the interval (68, 128).
Solution
First calculate the mean and standard deviation: x̄ = 98 and S ≈ 15. Before applying Chebyshev's theorem, we need the value of k, which follows from the interval: 128 − 98 = 98 − 68 = 30 = kS, so k = 30/15 = 2. By Chebyshev's theorem, at least 1 − 1/2² = 3/4 (75%) of the observations fall within this interval; that is, at least 3/4 × 30 = 22.5 observations lie in (68, 128). In the actual data, 28 of the 30 observations fall within (68, 128), which is consistent with Chebyshev's theorem.
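The Chebyshev calculation of Example 3.17 as a small Python sketch; `chebyshev_lower_bound` is a helper defined here, and it rejects k ≤ 1 because the bound is then 0 or negative and says nothing.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k ** 2


mean, sd = 98, 15          # Example 3.17 (S is approximately 15)
low, high = 68, 128        # interval symmetric about the mean
k = (high - mean) / sd     # 2.0
p = chebyshev_lower_bound(k)
print(k, p, p * 30)        # 2.0 0.75 22.5
```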
• Empirical rule
When the data distribution is bell-shaped (symmetric), then:
○ About 68% of the observations fall within the interval (x̄ − S, x̄ + S).
○ About 95% of the observations fall within the interval (x̄ − 2S, x̄ + 2S).
○ About 99.7% of the observations fall within the interval (x̄ − 3S, x̄ + 3S).
◎ The empirical rule tells us more specifically what proportion of the observations falls in each interval, but it applies only to data that are (approximately) symmetrically distributed.
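The empirical-rule percentages are the areas under a normal curve within k standard deviations of the mean. This sketch computes them with `statistics.NormalDist` (Python 3.8+) and places them next to Chebyshev's distribution-free lower bound.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, SD 1
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}   # share within k SDs
for k, share in shares.items():
    chebyshev = 1 - 1 / k ** 2    # lower bound for any distribution
    print(k, round(share, 4), round(chebyshev, 4))
```

For bell-shaped data the actual shares (about 0.6827, 0.9545, 0.9973) are much larger than Chebyshev's guarantees (0, 0.75, 0.8889), which is why the empirical rule is more informative when its symmetry assumption holds.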
3.4 Measures of Skewness, Measures of Kurtosis, and Moments
● Moments
A moment is a mathematical expression of the overall characteristics of a set of data: an average based on powers. Usually,
1) a specific number a is first subtracted from each observation,
2) the results are raised to a power r, and
3) the arithmetic mean of all of them is computed; this is generally called the r-th moment about that specific number:
$$\frac{1}{n}\sum_{i=1}^{n}(x_i - a)^r$$
The concept of moments comes from physics, where moments are quantities that describe the shape of an object (how its mass is distributed) and serve as important parameters for recognizing an object's shape.
1. Raw moments (the specific number is 0):
$$m'_r = \frac{1}{n}\sum_{i=1}^{n} x_i^r$$
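A minimal moment function following steps 1) to 3): subtract a constant, raise to the r-th power, then average. With the constant 0 it gives raw moments; with the mean it gives central moments. The data are hypothetical.

```python
def moment(values, r, about=0.0):
    """r-th moment about a constant: (1/n) * sum((x_i - about) ** r).
    about=0 gives the raw moment; about=mean gives the central moment."""
    return sum((x - about) ** r for x in values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]                  # hypothetical data
m1_raw = moment(data, 1)                         # first raw moment = mean = 5.0
m2_central = moment(data, 2, about=m1_raw)       # second central moment = population variance = 4.0
print(m1_raw, m2_central)                        # 5.0 4.0
```

The first central moment is always 0 (property (1) of the mean), so skewness and kurtosis measures start from the second, third, and fourth central moments.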
Figure 3.1 (a) Symmetric distribution: the mean, median, and mode all lie at the center and coincide. (b) Left-skewed distribution (relatively more large values): the mean is the smallest, and the median lies between the mean and the mode. (c) Right-skewed distribution (relatively more small values): the mean is the largest, and the three measures appear in the reverse order of the left-skewed case.
Pearson's coefficient of skewness: in a (moderately) skewed distribution, the distance from the mean x̄ to the mode Mo is about 3 times the distance from the mean to the median M.
Relative skewness:
$$SK = \frac{\bar{x} - Mo}{S} \approx \frac{3(\bar{x} - M)}{S}$$
(1) When SK = 0, x̄ = M, and the distribution of the data is approximately symmetric.
(2) When SK < 0, x̄ < M, and the distribution is approximately left-skewed (negatively skewed).
(3) When SK > 0, x̄ > M, and the distribution is approximately right-skewed (positively skewed).
The moment coefficient of skewness is
$$\alpha_3 = \frac{m_3}{m_2^{3/2}}, \qquad m_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^r$$
From the moment coefficient of skewness:
(1) When α3 = 0, the distribution of the data is approximately symmetric.
(2) When α3 < 0, the distribution of the data is approximately left-skewed.
(3) When α3 > 0, the distribution of the data is approximately right-skewed.
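A sketch of both skewness measures on hypothetical data. Pearson's coefficient here uses the sample standard deviation from `statistics.stdev` for S, while the moment coefficient uses central moments with divisor n; these divisor choices are assumptions of the sketch and only rescale the values, not their signs.

```python
import statistics


def pearson_sk(values):
    """Pearson's coefficient of skewness: 3 * (mean - median) / S."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)


def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (central moments, divisor n)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]   # hypothetical data with a long right tail
symmetric = [1, 2, 3, 4, 5]
print(pearson_sk(right_skewed), moment_skewness(right_skewed))   # both positive
print(pearson_sk(symmetric), moment_skewness(symmetric))         # both zero
```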
Measures of kurtosis
Kurtosis: graphical definitions
The moment coefficient of kurtosis is
$$\alpha_4 = \frac{m_4}{m_2^{2}}$$
From its value:
1. When α4 = 3, the distribution has a normal peak (mesokurtic), the usual shape.
2. When α4 > 3, the distribution has a high, narrow peak (leptokurtic), with values concentrated near the mean or mode.
3. When α4 < 3, the distribution has a low, broad peak (platykurtic), with values spread out evenly toward both ends.
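A sketch of the moment coefficient of kurtosis α4 = m4 / m2², using central moments with divisor n, on two hypothetical data sets: evenly spread values (platykurtic) and values piled at the center with a few far out (leptokurtic).

```python
def moment_kurtosis(values):
    """Moment coefficient of kurtosis: m4 / m2 ** 2 (central moments, divisor n).
    About 3 for normal-shaped data; > 3 leptokurtic, < 3 platykurtic."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m4 / m2 ** 2


flat = list(range(1, 11))          # evenly spread values 1..10
peaked = [0] * 8 + [-5, 5]         # most values at the center, two far out
print(round(moment_kurtosis(flat), 4), moment_kurtosis(peaked))   # 1.7758 5.0
```

Some software reports "excess kurtosis" (α4 − 3), so a normal shape corresponds to 0 there rather than 3.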