
1.1 何謂統計學

1.1 What is Statistics

統計(statistics)一詞乃源自拉丁字「status」,意指狀態(state)。

The word statistics is derived from the Latin word “status”, which means state.

state 的原義:表現與一個有關的經濟、人口與政治情勢 的資料與圖形。

The original meaning of "state": data and charts describing the economic, demographic, and political
conditions of a state.

統計學是在不確定的情況下,提供人們能做出客觀決策的科學方法,其過程包括:對資料的蒐集、整
理、陳述,分析與解釋。 透過此一過程,進而根據分析的結果加以推論,從而可以獲得合理的研判
與有效的結論。
Statistics is a scientific method that enables people to make objective decisions under uncertainty.
Its process comprises the collection, organization, presentation, analysis, and interpretation of data.
Through this process, inferences are drawn from the results of the analysis, yielding reasonable
judgments and valid conclusions.

統計的運用

The use of statistics

例如,氣象報告、交通意外事件統計、公共安全事件統 計、國民所得、物價指數等等,皆與人類活
動有密切的 關係。

For example, weather reports, traffic accident statistics, public security incident statistics, national
income, price index, etc., are closely related to human activities.

舉凡社會科學與自然科學的研究、政府機構的施政,以 及工商企業的經營管理上,經常會用到統計
的概念與分 析方法。

Statistical concepts and analytical methods are used routinely in research in the social and natural
sciences, in the administration of government agencies, and in the operation and management of
industrial and commercial enterprises.

統計學能幫助人們在面對不確定的情況下做出決策。

Statistics can help people make decisions in the face of uncertainty.

其目的在於解決問題。

Its purpose is to solve problems.

統計學的主要內容包括統計理論及統計方法。

The main content of statistics includes statistical theory and statistical methods.

統計理論是研究與闡明統計方法的理論,故而成為統計 方法賴以建立和發展的基礎。

Statistical theory studies and explains the theory behind statistical methods, and thus forms the basis
on which those methods are established and developed.
統計學解決問題的步驟:

Steps to solve problems in statistics:

1. 蒐集與問題有關的資料

1. Collect data related to the problem

2. 運用統計方法輔以統計理論

2. Use statistical methods supplemented by statistical theory

3. 提供一套合理的解決方法。

3. Provide a set of reasonable solutions.

統計理論一般可分為:

Statistical theory can generally be divided into:

數理統計(mathematical statistics), 應用統計(applied statistics)。

Mathematical statistics, applied statistics.

○ 數理統計著重在以數學原理闡明統計方法的理論,證 明各種統計公式的來源;

○ Mathematical statistics focuses on explaining the theory behind statistical methods using
mathematical principles and on deriving the various statistical formulas;

◎ 應用統計則著重在如何將統計方法應用於各種科學研 究、企業經營,以及行政措施等,例如:生
物統計、經濟統計及政府統計等。
◎ Applied statistics focuses on how to apply statistical methods to scientific research, business
operations, and public administration, for example biostatistics, economic statistics, and
government statistics.

統計方法則偏重於解決實際的問題,其步驟包括: 蒐集、整理、陳述、分析與解釋統計資料。

Statistical methods focus on solving practical problems. Their steps are collecting, organizing,
presenting, analyzing, and interpreting statistical data.

統計方法通常可根據分析的結果,進一步加以推論,以 推知全部研究對象的特性。

With statistical methods, the results of the analysis can usually be taken a step further, by inference,
to learn the characteristics of the entire set of objects under study.

依據統計方法的幾個步驟,可將統計學的範圍區分為:

According to several steps of statistical methods, the scope of statistics can be divided into:

敘述統計(descriptive statistics)

推論統計(inferential statistics)
• 敘述統計乃包括統計方法中的蒐集、整理、陳述、分 析及解釋資料等步驟,亦即僅就統計資料本
身的特性 加以描述,並不將其意義推廣至更大範圍者。

• Descriptive statistics covers the steps of collecting, organizing, presenting, analyzing, and
interpreting data; that is, it describes only the characteristics of the data at hand and does not
extend their meaning to a wider scope.

•根據敘述統計所分析的結果,進而推論某些事實現象 者,則屬於推論統計的範圍。

• Drawing inferences about real-world phenomena from the results of descriptive statistics belongs to
the scope of inferential statistics.

•推論統計是根據分析部分資料(樣本,sample)的結 果,對更大範圍資料(母體,population)的某些特
性,做一合理的推測與估計。

• Inferential statistics uses the results of analyzing part of the data (the sample) to make reasonable
conjectures and estimates about certain characteristics of a larger body of data (the population).

例題 1.1

Example 1.1

政府機構每年皆編有國民所得統計,根據所蒐集的資料加以整理後便可計算出每年的經濟成長,並
可做歷年的比較,由此解釋與分析經濟發展的情況;這是屬於敘述統計的 範圍。

Government agencies compile national income statistics every year. Once the collected data have been
organized, annual economic growth can be calculated and compared across years, and the state of
economic development can be explained and analyzed. This falls within the scope of descriptive statistics.

為了比較與分析上的便利,我們往往編製計圖、表,並計算某些統計量數(如平均數、變異數、比例
等),這些統計方法有助於資料的整理,並能迅速提供我們 藉以比較與分析的資訊;有關這方面的課
題乃第 2 章與第 3 章的主要內容。

To make comparison and analysis easier, we often construct statistical charts and tables and calculate
summary measures (such as the mean, variance, and proportion). These methods help organize the data
and quickly provide information for comparison and analysis; these topics are the main content of
Chapters 2 and 3.

例題 1.2

Example 1.2

在例題 1.1 中,如果根據過去多年的資料, 再參考一些經濟、 社會、政治以及世界經濟等有關的因


素,或可預估明年甚至未來幾年的國民所得成長率;此乃屬於推論統計的範圍 。

In Example 1.1, by drawing on data from past years and on relevant economic, social, political, and
world-economy factors, one may be able to estimate the growth rate of national income for next year or
even the next few years; this falls within the scope of inferential statistics.
由於未來各種現象存在著不確定性,因此欲掌握這些不確定 性,必須設法衡量其不確定程度;此時機
率理論成為主要的工具,這是第 4、5、6 章的主題。 本書第 7 章以後,開始介紹 推論統計的基礎與
統計方法,並將依據統計推論的程序,做有系統的討論。

Since there are uncertainties in future phenomena, to grasp these uncertainties, one must try to
measure their degree of uncertainty; at this time probability theory becomes the main tool, which is the
subject of Chapters 4, 5, and 6. After Chapter 7 of this book, the foundation and statistical methods of
inferential statistics will be introduced, and a systematic discussion will be made based on the procedure
of statistical inference.

1.2 統計學的兩個基本概念— 母體與樣本


1.2 Two basic concepts of statistics: population and sample

母體(population):調查者所欲研究的全部對象所成的集合。

Population: The collection of all the subjects that the investigator wants to study.

樣本(sample) :母體的部分集合。

Sample: A subset of the population.

例題 1.3

Example 1.3

某一研究者想瞭解全國失業率的問題,此時母體即為全體國民;同理,若他僅對台北市的失業率感興
趣,則全體台北市民即成為其所欲研究的母體。

If a researcher wants to study the national unemployment rate, the population is all citizens of the
country. Similarly, if the researcher is interested only in the unemployment rate of Taipei City, then all
Taipei residents form the population to be studied.

由此可知,母體的範圍可大可小,完全視「所欲研究的對象」而定。 當實際進行資料蒐集後,一般皆
僅從母體抽取其中一部分來觀察(詳細說明請參見本書第 7 章有關抽樣的課 題),此乃母體所包含的
某特定個體之集合,即稱為樣本。

Thus the scope of the population can be large or small; it depends entirely on the "objects to be
studied". When data are actually collected, usually only part of the population is drawn for observation
(see Chapter 7 of this book on sampling for details). This set of particular individuals contained in the
population is called the sample.

統計學中來闡釋母體和樣本特性的摘要性數值稱為統計表 徵數或統計測量數(statistical
measurements)。

Summary values used in statistics to illustrate the characteristics of a population and a sample are called
statistical representations or statistical measurements.

用來描述母體的特徵量數稱為參數或母體參數(population parameter),一般而言,母體參數是統計學
想要知道之核 心。 如母體平均數,母體比例、母體標準差。
A characteristic measure that describes the population is called a parameter or population parameter.
Generally speaking, population parameters are what statistics ultimately wants to know. Examples are
the population mean, the population proportion, and the population standard deviation.

用來描述樣本的特徵量數稱為樣本統計量(sample statistic),是用來推論母體參數之主要特徵量值。
如樣本平均數、樣本比例、樣本標準誤。
A characteristic measure that describes the sample is called a sample statistic; it is the main quantity
used to infer population parameters. Examples are the sample mean, the sample proportion, and the
sample standard error.

統計學與機率論聯繫緊密,統計學常以機率論為理論基礎。

Statistics is closely related to probability theory, and statistics is often based on probability theory.

簡單地講,兩者不同點在於機率論-從母(群)體中推匯出樣本的機率。

Simply put, the difference between the two is this: probability theory reasons from the population to
the probability of obtaining a sample.

正好相反地,統計學– 從小的樣本中推論出大的母(群)體的參數。

Statistics works in the opposite direction: it infers the parameters of a large population from a small sample.

參數(parameters):

μ:母體平均值(population mean)、

σ²:母體變異數(population variance)、

σ:母體標準差(population standard deviation)、

p:母體比例(population proportion)等。

統計量(statistics):

X̄:樣本平均值(sample mean)、

S²:樣本變異數(sample variance)、

S:樣本標準差(sample standard deviation)、

p̂:樣本比例(sample proportion)等。
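The distinction between parameters and statistics can be illustrated with a minimal Python sketch. The population below is made-up data and the variable names (`population`, `sample`, `mu`, `x_bar`, and so on) are assumptions for illustration, not part of the text; the only point is that parameters are computed from the whole population, while statistics are computed from a sample and vary from sample to sample.

```python
import random
import statistics

# Hypothetical population: exam scores of 1,000 students (illustrative data only).
random.seed(0)
population = [random.randint(20, 100) for _ in range(1000)]

# Population parameters: computed from every member of the population.
mu = statistics.fmean(population)          # population mean
sigma2 = statistics.pvariance(population)  # population variance (divides by N)
sigma = statistics.pstdev(population)      # population standard deviation

# Sample statistics: computed from a random sample of 50 members.
sample = random.sample(population, 50)
x_bar = statistics.fmean(sample)           # sample mean
s2 = statistics.variance(sample)           # sample variance (divides by n - 1)
s = statistics.stdev(sample)               # sample standard deviation

print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}")
print(f"sigma = {sigma:.2f}, s = {s:.2f}")
```

Drawing a different sample (for example by changing the seed) changes `x_bar` and `s` but leaves `mu` and `sigma` untouched.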
統計的功用
The functions of statistics

3.分析各種變項之間的關係:社會現象的變動往往受到許多因 素所影響,而這些因素彼此間可能具
有相互作用的關係, 其中可能存在某種因果關係或規律性。 如經濟成長的變 動,其影響因素有物
價水準,人口變動、政治與法律的因 素、貨幣供給量等。 統計方法的運用可幫助我們發現針對
一現象各種影響因素間的因果關係與規律性,進而加以分析與比較。
3. Analyzing relationships among variables: Changes in social phenomena are often affected by many
factors, and these factors may interact with one another, possibly through some causal relationship or
regularity. Changes in economic growth, for example, are influenced by the price level, population
changes, political and legal factors, the money supply, and so on. Statistical methods can help us
discover causal relationships and regularities among the factors that influence a phenomenon, and
then analyze and compare them.

4.預測:根據統計方法釐清相關變項間的關係,並進而運用統 計方法進行預測,藉以了解某一現象之
變動趨勢,做為籌 劃未來的依據。 (13~15 章)

4. Prediction: Use statistical methods to clarify the relationships among the relevant variables, and
then to make predictions, so as to understand the trend of a phenomenon and use it as a basis for
future planning. (Chapters 13~15)
統計在經營決策中的應用
The application of statistics in business decision-making

舉凡一切能用數字表示(量化)的社會現象與自然現象,皆可利用統計學的原理與方法來研究,包括經
濟、政治、科學研究及工商企業等領域。
All social and natural phenomena that can be represented by numbers (quantified) can be studied using
the principles and methods of statistics, including economics, politics, scientific research, and industrial
and commercial enterprises.

1. 統計在經營決策中的應用範圍:
1. Scope of application of statistics in business decision-making:

Ex1.在生產製造活動上,運用統計方法執行抽樣檢 驗與品質管制,如此可達到以最低的成本或至最
高 的品質。
Ex1. In manufacturing activities, use statistical methods to perform sampling inspection and quality
control, so as to achieve the lowest cost or the highest quality.

Ex2.在企業人力資源管理上,可利用統計方法對員 工的訓練、績效的評鑑以及人力資源規劃等進
行統 計分析,以獲得有用的資訊,做為管理當局的決策 參考

Ex2. In enterprise human resource management, statistical methods can be used to conduct statistical
analysis on employee training, performance evaluation and human resource planning to obtain useful
information and serve as a reference for decision-making by management authorities.

2.1 資料的型態(TYPE OF DATA)


敘述統計最基本的目的在於彙總與描述一群統計資料的重要 特徵。 其方式不外乎利用統計圖表
或統計量數。
The most basic purpose of descriptive statistics is to summarize and describe the important
characteristics of a set of statistical data. This is done with statistical charts and tables or with
summary measures.

例如,想知道班上同學統計學期中考成績的分布狀況。
For example, we may want to know the distribution of the statistics midterm exam scores in a class.
統計資料通常是由一個或多個變數之值所組成的。
Statistical data usually consist of the values of one or more variables.

所謂變數(variable)乃是一種具有不同值或結果的特徵之衡 量,凡一切可計量的特徵皆稱為變數,而
變數的量(或值) 稱為變量(variate)。

A variable is a measure of a characteristic that can take different values or outcomes. Every measurable
characteristic is called a variable, and a quantity (or value) taken by a variable is called a variate.

例如,年齡為一變數,其變量可能為 1~130;另外,性別、 所得、長度、物價等等,亦皆為變數。

For example, age is a variable whose variates may range from 1 to 130; gender, income, length, price,
and so on are also variables.

(2)量的資料(quantitative data)亦稱為屬量資料,是依據數 字尺度所衡量得出的資料。

(2) Quantitative data (屬量資料) are data measured on a numerical scale.

例如,年齡、身高、體重、分數、溫度、速度、 意外事件次數等等,它們皆可用數字表達。

For example, age, height, weight, scores, temperature, speed, number of accidents, etc., can all be
expressed numerically.

量的資料一般又可區分為間斷資料(discrete data)與連續資料(continuous data)。 間斷資料是可計數的(countable)且具有最小計數單位。

Quantitative data can generally be divided into discrete data and continuous data. Discrete data are
countable and have a smallest counting unit.

例如,統計某班人數、產品良品個數、車禍件數等。

For example, count the number of people in a certain class, the number of good products, the number
of car accidents, etc.

連續資料是由量測(measure)而得,這是不可計數的。

Continuous data are obtained by measurement and are not countable.

例如,身高、體重、時間等資料。 凡屬於物理基本單位所得的資料皆屬連續資料。

For example, height, weight, and time. All data measured in basic physical units are continuous data.

試區分下列資料屬於質的資料或量的資料;若為量的資料 並指出其為連續或間斷資料:

Classify each of the following as qualitative or quantitative data; for quantitative data, also state
whether they are continuous or discrete:

(a)某公司員工之年齡。

(a) The age of a company employee.


(b)某月長途電話的通數。

(b) The number of long distance calls in a month.

(c)每通長途電話的時間。

(c)The time of each long distance call.

(d) 季節。

(d) Season.

(e)某公司員工之職位等級。

(e)The job rank of an employee of a company.



Solution

質的資料: (d)、(e)

Qualitative data: (d)、(e)

量的資料: (a)、(b)、(c);其中(b)為間斷資料,而(a)、(c)為連續資料。

Quantitative data: (a), (b), (c); of these, (b) is discrete data, and (a), (c) are continuous data.

Note 1 :欲區分質的資料或量的資料,只要當別人問你某一變數時,若直接以數字回答,則為量的資料,否則便是質的資料。

Note 1: To tell qualitative from quantitative data, consider how you would answer when asked about the
variable: if you answer directly with a number, the data are quantitative; otherwise they are qualitative.

Note 2 :當質的資料經過量化之後,其代表的數字不能做某些統計計算,只能當類別的代號。

Note 2: After qualitative data have been coded as numbers, those numbers serve only as category
labels and cannot be used in certain statistical calculations.
統計資料(statistical data)
  質的資料(qualitative data)
  量的資料(quantitative data)
    間斷資料(discrete data)
    連續資料(continuous data)

2.2 衡量的尺度(MEASUREMENT SCALE)


根據資料的量測水準(measurement level)或衡量尺度(measurement scale)不同而分為四類:

According to their measurement level, or measurement scale, data are divided into four categories:

◎名目尺度(nominal scale)

◎ 順序尺度(ordinal scale)

◎ 等距尺度(interval scale)

◎ 比率尺度(ratio scale)

針對不同尺度的資料,所選用的統計方法會有所不同。

For data of different scales, the selected statistical methods will be different.

◎ 名目尺度(nominal scale):僅以簡單數字(代號)標示(代表)變數的屬性。

◎ Nominal scale: simple numbers (codes) are used only to label (stand for) the attributes of a variable.

例如:性別的屬性分男性(代號 1)以及女性(代號 2)。 名目尺度中的數字只有名義而無實質內容,主要用於分類。
For example: the attribute gender is divided into male (code 1) and female (code 2). The numbers on a
nominal scale are labels only, with no substantive content, and are used mainly for classification.

◎ 順序尺度(ordinal scale):用於衡量具有名目尺度性質且其代號(數字)具有等級排列之特性的資
料。 例如:學生成績名次,消費者對品牌的偏好程度等。

◎ Ordinal scale: used to measure data that have the properties of a nominal scale and whose codes
(numbers) also carry a rank order. Examples: students' class rank by grade, consumers' degree of
preference for brands, etc.

順序尺度的數值資料,亦僅能衡量數值間的順序或等級,不 能衡量其間之距離。

Numerical data on an ordinal scale can only measure the order or level between values, but not the
distance between them.

(無法就順序上分辨第一、二位之間和第二、三位之間的差距)

(From the ranking alone, one cannot tell how the gap between first and second place compares with the gap between second and third.)

※※上述兩種尺度的(數值)資料,均不能用來做四則運算。 ※※

※※ (Numerical) data on the two scales above cannot be added, subtracted, multiplied, or divided. ※※

◎ 等距尺度(interval scale): 當資料具有順序尺度的性質且數字與數字之間之差異表示距離的大小,但沒有絕對的原點(零點)。

◎ Interval scale: the data have the properties of an ordinal scale, and the difference between two
numbers represents the size of the distance between them, but there is no absolute origin (zero point).
例如:溫度、年份等。 等距尺度的資料必定為數值,而且可作加減,但不可做乘除。

For example: temperature, calendar year, etc. Interval-scale data are always numerical and can be
added and subtracted, but not multiplied or divided.

例如:可考慮溫差、分數差距,但溫度或分數的比例並未 具有實質上的意義。

For example, temperature differences and score differences are meaningful, but ratios of temperatures
or of scores have no real meaning.

「溫度熱到攝氏 30 度是攝氏 20 度的 1.5 倍」,此句話並無實質意義,因為同樣的狀況用華氏來考慮,各所代表的數字將會不相同,主要是因為沒有固定的原點存在之緣故(無絕對零點)。

The statement "30 degrees Celsius is 1.5 times as hot as 20 degrees Celsius" has no real meaning,
because the same two temperatures expressed in Fahrenheit give different numbers and a different
ratio. The reason is that the scale has no fixed origin (no absolute zero).
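The Celsius and Fahrenheit comparison can be checked numerically. The sketch below uses the standard conversion F = C × 9/5 + 32; the function name `c_to_f` and the variable names are assumptions for illustration. It shows that the ratio of two temperatures depends on the scale chosen, which is why ratios are not meaningful on an interval scale.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

low_c, high_c = 20.0, 30.0
low_f, high_f = c_to_f(low_c), c_to_f(high_c)   # 68.0 and 86.0

ratio_c = high_c / low_c   # 1.5 on the Celsius scale
ratio_f = high_f / low_f   # 86 / 68, about 1.26, on the Fahrenheit scale

diff_c = high_c - low_c    # a difference of 10 Celsius degrees
diff_f = high_f - low_f    # the same difference is 18 Fahrenheit degrees

print(ratio_c, round(ratio_f, 2))
print(diff_c, diff_f)
```

The two ratios disagree, whereas the difference converts consistently: every gap of 10 Celsius degrees corresponds to a gap of 18 Fahrenheit degrees.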

• 比率尺度(ratio scale):

(1)資料具有等距尺度的所有性質。

(1) The data have all the properties of the interval scale.

(2)資料數值間的比值是有意義的。

(2)The ratio between the data values is meaningful.

比率尺度與等距尺度的區別在於,前者有絕對的原點(零點),而後者的原點是任意選定的。 比率尺
度的資料是可以做四則運算
The difference between the ratio scale and the interval scale is that the former has an absolute origin
(zero point), while the origin of the latter is chosen arbitrarily. Ratio-scale data can be added,
subtracted, multiplied, and divided.

例如:長度、重量、銷售量、所得等等。 因為其距離或大小都 可以從一個絕對的零點算起。

For example: length, weight, sales, income, etc. Because its distance or size can be calculated from an
absolute zero point.

一部 100 萬元的汽車比 50 萬元的汽車貴 50 萬元,且其價格為 後者的 2 倍。

A 1 million yuan car is 500,000 yuan more expensive than a 500,000 yuan car, and its price is twice as
high.

數值與非數值資料和四種衡量尺度之間的關係

Relationships between numerical and non-numerical data and the four measurement scales

衡量尺度 | 非數值資料 | 數值資料
名目 | 表示資料的類別 | 數值是用來代表資料的類別
順序 | 表示資料的排序 | 數值可用來代表資料的排序
等距 | - | 數值之間的差異代表距離大小,不具有絕對的原點 *
比率 | - | 數值之間的差異代表距離大小,且具有絕對的原點 #

Scale | Non-numerical data | Numerical data
Nominal | Indicates the category of the data | Numbers are used to represent the category of the data
Ordinal | Indicates the ordering of the data | Numbers can be used to represent the ordering of the data
Interval | - | Differences between values represent distances; there is no absolute origin *
Ratio | - | Differences between values represent distances; there is an absolute origin #

註:*可做加減運算;#可做加減乘除運算。

Note: * addition and subtraction are allowed; # addition, subtraction, multiplication and division are allowed.
分析名目尺度與順序尺度資料的主要統計方法為無母數統計學(nonparametric statistics)。
The main statistical methods for analyzing nominal-scale and ordinal-scale data are nonparametric statistics.

分析等距尺度與比率尺度資料的常用統計方法為母數統計學(parametric statistics)。
The statistical methods commonly used for interval-scale and ratio-scale data are parametric statistics.

2.3 次數分配(FREQUENCY DISTRIBUTION)

◎ 統計資料經蒐集完成後,必須加以整理或彙總才能顯示資料所具有的特徵。 最基本的整理或彙總方法就是編製成統計圖表,而製表或繪圖之前,一般皆先將其歸類分組。 次數分配,乃此一工作所進行的步驟。
◎ After statistical data have been collected, they must be organized or summarized before their
characteristics become visible. The most basic way to do so is to compile statistical charts and tables,
and before tabulating or plotting, the data are usually classified into groups. Constructing a frequency
distribution is the step that carries out this work.

◎ 次數分配(frequency distribution)是將資料依數量大小或類別而分成若干組,並計算各組的資料個數(次數),由此可顯示資料分布的情況。 次數分配一般可用表或圖來表示,分別稱為次數分配表(table)或次數分配圖(chart)。
◎ A frequency distribution divides the data into several classes by magnitude or by category and counts
the number of observations (the frequency) in each class, which shows how the data are distributed. A
frequency distribution can be presented as a table or as a chart, called a frequency distribution table
or a frequency distribution chart, respectively.
次數分配表包括簡單次數表與分組次數表。
Frequency distribution tables include simple frequency tables and grouped frequency tables.

簡單次數表:將變量相同者(或屬於同一類別者)逐一歸併彙總而得。 適用於質的資料與規模不大的間斷資料。
Simple frequency table: obtained by merging and counting the observations that have the same value
(or belong to the same category). It is suitable for qualitative data and for small sets of discrete data.
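A simple frequency table can be sketched in a few lines of Python with `collections.Counter` from the standard library. The blood-type data below are made up for illustration and are not from the text.

```python
from collections import Counter

# Hypothetical qualitative data: blood types of 12 people (illustrative only).
blood_types = ["A", "O", "B", "O", "A", "AB", "O", "A", "O", "B", "O", "A"]

# Merge identical categories and count how often each one occurs.
simple_table = Counter(blood_types)

for category, freq in sorted(simple_table.items()):
    print(f"{category:<3}{freq}")
print(f"Total {sum(simple_table.values())}")
```

Each row of the printed table is one category with its frequency, and the frequencies add up to the number of observations.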

分組次數表:將全部變量依其大小次序,分為若干段落(分組),以每一段落為一組(class)所編製的表。適用於連續資料與規模較大的間斷資料。
Grouped frequency table: a table compiled by dividing all values, in order of magnitude, into several
intervals, each of which forms one class. It is suitable for continuous data and for large sets of
discrete data.

以分組次數表所表示之統計資料,稱為分組資料(grouped data)。 雖然分組會喪失資料原有的細節,但仍可助了解原始資料的規律性及其分布情況,具有彙整與簡化原始資料的功用。

Statistical data presented in a grouped frequency table are called grouped data. Although grouping
loses the original detail of the data, it still helps reveal the regularity and distribution of the raw data,
and it serves to consolidate and simplify them.
次數分配表的編製
Construction of a frequency distribution table

1.分組次數表之編製步驟

1. Steps for constructing a grouped frequency table

(1)步驟 1:排列-將各觀測值依大小順序排成一序列。

(1) Step 1: Arrangement – Arrange the observations into a sequence in order of magnitude.

(2)步驟 2:求全距(range)R=最大值-最小值

(2) Step 2: Find the range, R = maximum - minimum

(3)步驟 3:決定組數與決定組距(class interval)

(3) Step 3: Determine the number of classes and the class interval (class width)

D = R/K(組距=全距/組數)

D = R/K (class width = range / number of classes)


組距的有效位數=資料數值的有效位數

The class width has the same number of significant digits as the data values.

n:50~99,K:5~10

n:100~249,K:7~12

n>249,K:10~20

(4)步驟 4:決定組限:組上限/組下限(upper class limit / lower class limit)

(4) Step 4: Determine the class limits: the upper class limit and the lower class limit

D=U-L+最小計數單位

D=U-L+smallest counting unit

假設觀察值為 2.30、3.12、2.47…,則最小計數單位為 0.01。 若取 L=2.10,D=0.50,則 0.50=U-2.10+0.01,即 U=2.59。 那下一組應為 2.60~3.09,依此類推。

Suppose the observed values are 2.30, 3.12, 2.47, …; the smallest counting unit is then 0.01. Taking
L=2.10 and D=0.50 gives 0.50=U-2.10+0.01, that is, U=2.59. The next class is therefore 2.60 to 3.09,
and so on.
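The class-limit arithmetic above can be reproduced with a short Python sketch. The helper `class_limits` and its parameters are hypothetical names introduced here; `decimal.Decimal` is used so that values such as 2.59 come out exactly rather than with binary floating-point rounding.

```python
from decimal import Decimal

def class_limits(first_lower: str, width: str, unit: str, k: int):
    """Return k (lower, upper) class-limit pairs satisfying D = U - L + unit."""
    lower, d, u = Decimal(first_lower), Decimal(width), Decimal(unit)
    limits = []
    for _ in range(k):
        upper = lower + d - u      # rearranged from D = U - L + unit
        limits.append((lower, upper))
        lower = upper + u          # the next class starts one unit higher
    return limits

limits = class_limits("2.10", "0.50", "0.01", 3)
for lower, upper in limits:
    print(f"{lower} to {upper}")
```

With a first lower limit of 2.10, a class width of 0.50, and a unit of 0.01, the first two classes are 2.10 to 2.59 and 2.60 to 3.09, matching the worked values.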

(5)步驟 5:歸類與劃記(以「卌」或「正」字劃記)

(5) Step 5: Classify and tally (using five-stroke tally marks such as 卌 or 正)

(6)步驟 6:計算次數-計算各組次數,稱為組次數(class frequency),登載於表之最後一行,並加總得總次數。
(6) Step 6: Count the frequencies. Count the observations in each class (the class frequency), enter
the counts in the last column of the table, and add them up to obtain the total frequency.
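The six steps can be sketched end to end in Python. The scores below are made-up integers, not the textbook's data, and the names `scores`, `table`, `first_lower`, and so on are assumptions for illustration. The number of classes (5), the class width (15), and the first lower limit (23) are taken to mirror the choices described in Example 2.3.

```python
# Hypothetical integer scores (illustrative only; not the textbook's data).
scores = [28, 35, 41, 47, 52, 55, 58, 60, 61, 63,
          66, 67, 70, 72, 75, 78, 81, 84, 90, 93]

ordered = sorted(scores)                 # Step 1: arrange in order of magnitude
data_range = ordered[-1] - ordered[0]    # Step 2: R = maximum - minimum
k = 5                                    # Step 3: chosen number of classes
width = 15                               #         D, rounded up from R / k = 13
unit = 1                                 # smallest counting unit of the data
first_lower = 23                         # Step 4: lower limit of the first class

table = []
for i in range(k):
    lower = first_lower + i * width
    upper = lower + width - unit         # from D = U - L + unit
    # Steps 5 and 6: classify each observation and count the class frequency.
    freq = sum(lower <= x <= upper for x in ordered)
    table.append((lower, upper, freq))

total = sum(freq for _, _, freq in table)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
print(f"Total: {total}")
```

Checking that the total frequency equals the number of observations is a quick way to confirm that the classes cover every value exactly once.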

例題 2.3

Example 2.3

假定某一班級 48 個學生的統計學學期成績如下,試編製一次數分配表。

Suppose the semester statistics scores of 48 students in a class are as follows. Construct a
frequency distribution table.

Solution

(a)排列:依數值大小重新排列,如下所示:

(a) Arrange: Rearrange according to the size of the value, as follows:

(b)求全距:最大值為 93,最小值為 28,故全距

(b) Find the range: the maximum value is 93 and the minimum value is 28, so the range is

R=93-28=65

(c)定組數及組距:由於總觀測值個數=48,故取組數=5。 於是組距 D=R/K = 65/5 = 13;為便利分組,組距往往會取 5 或 10 的倍數。 就本例而言,取 D=15(需大於 13)較為方便。

(c) Number of classes and class width: since the total number of observations is 48, take K=5 classes.
The class width is then D=R/K = 65/5 = 13. To make grouping easier, the class width is usually taken
to be a multiple of 5 or 10; in this example D=15 (which must be greater than 13) is convenient.

(f)計算次數:將劃記欄中各組的記號計數並記載於次數欄 中,然後將各組之次數加總,可得總次數
=48,與原始資 料總人數相符合。

(f) Count the frequencies: count the tally marks of each class in the tally column and record the counts
in the frequency column; then add up the class frequencies to obtain a total of 48, which agrees with
the total number of students in the raw data.
表 2.3

Table 2.3

48 位學生統計學成績之次數分配

Frequency distribution of statistical scores of 48 students

2. 編製次數分配表之原則

2. Principles for constructing a frequency distribution table

(1)關於組數:因旨在顯示一群資料的集中情形與分佈狀態,且為 便於計算與運用。 故組數不宜太


多或太少。
(1) Number of classes: the table is meant to show the concentration and spread of a set of data while
remaining easy to compute and use, so the number of classes should be neither too large nor too
small.

(Sturges' rule)

K = 1 + 3.32 log₁₀ n
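Sturges' rule is easy to evaluate directly. The sketch below defines a hypothetical helper `sturges_classes` (the name is an assumption) and applies it to n = 48, the class size in Example 2.3. The rule gives a guideline rather than a requirement, so the result is normally rounded and adjusted for convenience.

```python
import math

def sturges_classes(n: int) -> float:
    """Suggested number of classes by Sturges' rule: K = 1 + 3.32 * log10(n)."""
    return 1 + 3.32 * math.log10(n)

k_48 = sturges_classes(48)
print(round(k_48, 2))   # a little over 6.5 for n = 48
```

For 48 observations the rule suggests between 6 and 7 classes, which is of the same order as the 5 classes chosen in the example.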

(2)關於組距:

(2) Class width:


一般皆採用相等的組距。
Equal class widths are generally used.

優點:

Advantages:
1.便於比較各組的次數

1. Class frequencies are easy to compare

2. 便於使用

2. The table is easy to use

3.便於計算各種統計量數

3. Summary statistics are easy to calculate

在某些特殊的分配上,基於事實需要,有時可採用不等組距。 例如:某地區的人口依其財富而編製次
數分配表,

In some special distributions, unequal class widths are sometimes used because the facts require it.
For example, when a frequency distribution table of the population of a region is compiled according
to wealth,

若以千元為計數單位,可採用

with amounts counted in thousands, one may use the classes

0~50, 51~100, 101~500, …, 100,001~500,000

(3)關於組限:

(3) Class limits:

(a)組限的決定,要求其有效數字位數必須與觀察值之有效位數相同。

(a) The class limits must have the same number of significant digits as the observed values.

(b) 分組的結果必須涵蓋所有觀察值。

(b) The results of the grouping must cover all observations.

(c) 儘可能地避免採用敞開組(open class),如此才便利某些統計量數的計算。 敞開組的例子:37 以下或 83 以上。

(c) Avoid open classes as far as possible, so that certain summary statistics remain easy to calculate.
Examples of open classes: "37 or below" or "83 or above".

可利用電腦統計軟體(如 excel)來試驗各種不同的組數、組距與 組限等之選擇值,以選出最符合個


人研究目的的次數分配表。
Statistical software (such as Excel) can be used to try different choices for the number of classes,
class width, and class limits, in order to select the frequency distribution table that best suits the
purpose of one's study.
相對次數分配與累積次數分配
Relative frequency distribution and cumulative frequency distribution
相對次數分配
Relative frequency distribution

•相對次數(relative frequency)是指將各組次數除以總次數。 將各組之相對次數依次列出的相對次


數表,此表 稱為相對次數分配表(relative frequency distribution table)。 其功用在於顯示各組次數對
總次數所佔的比 重,並能用以表示兩個或更多個次數分配同組間之合 理的比較。

• The relative frequency of a class is its class frequency divided by the total frequency. A table that
lists the relative frequencies of all classes in order is called a relative frequency distribution table. It
shows the share of the total frequency that falls in each class, and it allows a fair comparison of
corresponding classes across two or more frequency distributions.

第 i 組相對次數 = Fi / n = Fi / ΣFi,

Relative frequency of class i = Fi / n = Fi / ΣFi,

其中 Fi 為第 i 組的組次數,n 為總次數。

where Fi is the class frequency of class i and n is the total frequency.
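As a minimal sketch of this formula in Python, the class frequencies below are hypothetical counts for five classes (they are not the entries of the textbook's Table 2.3), and the names `freqs` and `rel_freqs` are assumptions for illustration.

```python
# Hypothetical class frequencies for five classes (illustrative only).
freqs = [5, 10, 16, 12, 5]

n = sum(freqs)                        # total frequency
rel_freqs = [f / n for f in freqs]    # relative frequency of each class

for f, r in zip(freqs, rel_freqs):
    print(f"{f:>3}  {r:.3f}")
print(f"Sum of relative frequencies: {sum(rel_freqs):.3f}")
```

Because every class frequency is divided by the same total, the relative frequencies always add up to 1.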


累積次數分配
Cumulative frequency distribution

• 累積次數(cumulative frequency)是指將各組次數依次累 加;列出累加次數的表即稱為累積次數分


配表 (cumulative frequency distribution table)。 在許多情 況,我們所感興趣的可能不是落在某一組
內的觀察值個 數,而是落在某一特定值之上或之下的觀察值之個數。

• Cumulative frequency refers to accumulating the frequency of each group in turn; the table listing the
cumulative frequency is called the cumulative frequency distribution table. In many cases, we may not
be interested in the number of observations that fall within a group, but the number of observations
that fall above or below a particular value.

以下累積:是由數值较小的組之組次數,依序累加至數值較大的組之次數而得(取每組的組上限以
下);

Less-than cumulative frequency (以下累積): obtained by adding the class frequencies successively
from the class with the smallest values to the class with the largest values (counting the observations
at or below each upper class limit);

以上累積:是由數值較大的組之組次數 ,依序累加至數值較小的組之組次數而得(取每組的組下限以
上)。

Or-more cumulative frequency (以上累積): obtained by adding the class frequencies successively
from the class with the largest values to the class with the smallest values (counting the observations
at or above each lower class limit).

表 2.4 表 2.3 之相對次數分配與累積次數分配

Table 2.4 Relative frequency distribution and cumulative frequency distribution for Table 2.3
由表 2.4 可知,48 名學生成績落在 53~67 分之區間的比重最大,約佔 1/3。根據以上累積次數知,統計學成績 68 分以上的學生人數有 17 人,而由以下累積次數知,統計學成績 67 分以下的學生有 31 人。

Table 2.4 shows that, among the 48 students, the class from 53 to 67 points has the largest share,
about 1/3. The or-more cumulative frequencies show that 17 students scored 68 or above in statistics,
and the less-than cumulative frequencies show that 31 students scored 67 or below.

2.4. 次數分配的圖示法

2.4. Graphical method of frequency distribution

次數分配圖是一種統計圖,用以繪示次數分配表中分配的情 況。 故欲繪製次數分配圖之前,必須
先編好次數分配表。
The frequency distribution chart is a statistical graph used to show the distribution in the frequency
distribution table. Therefore, before drawing the frequency distribution diagram, the frequency
distribution table must be compiled first.

次數分配圖的功能在於:使看圖的人僅花費少許時間,便可 對統計資料得到一明確的綜合性觀念。
亦具有比較的功能。
The function of the frequency distribution diagram is to enable the viewer to get a clear and
comprehensive idea of the statistical data with only a small amount of time. It also has a comparison
function.

常見的次數分配圖:

Common frequency distribution diagrams:

•用於連續型的統計資料,有直方圖、多邊形圖、累積次數分配圖等;

• For continuous data: histograms, frequency polygons, cumulative frequency distribution charts, etc.;

•用於間斷型的統計資料,有長條圖、面積圖等。
• For discrete data: bar charts, area charts, etc.

(1)直方圖

(1) Histogram

直方圖(histogram)是以長方形面積大小,表示次數分配中各組次數的一種統計圖*。 直方圖適用於連續型資料。 橫坐標代表各組的組界(class boundary),縱坐標代表組次數。

A histogram is a statistical chart* that represents the class frequencies of a frequency distribution by
the areas of rectangles. Histograms are suitable for continuous data. The horizontal axis shows the
class boundaries of each class, and the vertical axis shows the class frequency.

下組界=組下限-1/2(觀察值最小單位)

Lower class boundary = lower class limit - 1/2 (smallest unit of the observations)

上組界=組上限+1/2(觀察值最小單位)

Upper class boundary = upper class limit + 1/2 (smallest unit of the observations)
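The two boundary formulas translate directly into a small Python helper. The function name `class_boundaries` is an assumption for illustration; the class limits used are the 15-point classes starting at 23, with a smallest unit of 1 point.

```python
def class_boundaries(lower_limit: float, upper_limit: float, unit: float = 1.0):
    """Return (lower boundary, upper boundary) for one class."""
    half = unit / 2
    return lower_limit - half, upper_limit + half

limits = [(23, 37), (38, 52), (53, 67)]
boundaries = [class_boundaries(lo, hi) for lo, hi in limits]

for (lo, hi), (lb, ub) in zip(limits, boundaries):
    print(f"limits {lo}-{hi} -> boundaries {lb}-{ub}")
```

The upper boundary of each class coincides with the lower boundary of the next one, so the rectangles of the histogram touch with no gaps.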

例題 2.5

Example 2.5

參考表 2.3,試繪製 48 位學生統計學成績之直方圖。

Referring to Table 2.3, draw a histogram of the statistics scores of the 48 students.

解

Solution

首先就表 2.3 中的組限計算其相對應的組界,如表 2.5 所示。由於觀測值之最小單位為 1(分),故第 1 組的下組界等於 23 - 1/2(1) = 22.5,而第 1 組的上組界等於 37 + 1/2(1) = 37.5;其他各組組界,依此類推。 表 2.5 之直方圖,如圖 2.3 所示。 由圖中可看出各組界皆已呈連續。 此外,成績介於 52.5~67.5 分者最多。
First, compute the class boundaries corresponding to the class limits in Table 2.3, as shown in Table
2.5. Since the smallest unit of the observations is 1 (point), the lower class boundary of class 1 is
23 - 1/2(1) = 22.5 and its upper class boundary is 37 + 1/2(1) = 37.5; the boundaries of the other
classes follow in the same way. The histogram of Table 2.5 is shown in Figure 2.3. The figure shows
that adjacent classes now meet with no gaps, and that scores between 52.5 and 67.5 are the most
frequent.

直方圖的縱座標若改為相對次數,則稱為相對次數分配圖(或相對次數直方圖)。此時需特別注意的是,每一矩形(直方)的面積代表各組的相對次數,因此縱座標應改為相對次數除以組距,即每一矩形之高度應為:

If the vertical axis of the histogram is changed to relative frequency, the chart is called a relative
frequency distribution chart (or relative frequency histogram). Note that the area of each rectangle
then represents the relative frequency of its class, so the vertical axis should be the relative frequency
divided by the class width; that is, the height of each rectangle should be:

相對次數直方圖的矩形高度 = 相對次數 /(上組界-下組界)

Rectangle height in a relative frequency histogram = relative frequency / (upper class boundary - lower class boundary)

如此調整可使所有矩形面積總和為 1,亦即相對次數的總和為 1。 通常用在比較各組佔總數的比率(即各長條面積代表各組發生之機率)。

With this adjustment the areas of all rectangles sum to 1, that is, the relative frequencies sum to 1.
This form is usually used to compare each class's share of the total (the area of each bar represents
the probability of that class occurring).
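The claim that the rectangle areas sum to 1 can be verified numerically. In this sketch the class boundaries follow the 15-point classes starting at 22.5, and the class frequencies are hypothetical counts chosen for illustration (not the textbook's table); the variable names are assumptions.

```python
boundaries = [22.5, 37.5, 52.5, 67.5, 82.5, 97.5]
freqs = [5, 10, 16, 12, 5]   # hypothetical class frequencies
n = sum(freqs)

# Height of each rectangle = relative frequency / class width.
heights = [
    (f / n) / (upper - lower)
    for f, lower, upper in zip(freqs, boundaries, boundaries[1:])
]

# Area of each rectangle = height * width, which recovers the relative frequency.
areas = [
    h * (upper - lower)
    for h, lower, upper in zip(heights, boundaries, boundaries[1:])
]
total_area = sum(areas)

print([round(h, 4) for h in heights])
print(round(total_area, 6))
```

Dividing by the class width matters most when the classes have unequal widths; with equal widths the heights are simply the relative frequencies rescaled by a constant.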
圖 2.3(a) 48 位學生統計學成績之相對次數直方圖

Figure 2.3(a) Relative frequency histogram of the statistics scores of 48 students

直方圖與長條圖的區別在於,

The difference between a histogram and a bar chart is that,

直方圖是用面積來表示該組的相對次數,而非以高度來表示該 組的數量。 直方圖由一組矩形組


成,每一個矩形的面積表示在 相應的區間中樣本百分數。

The histogram uses the area to represent the relative frequency of the group, not the height to
represent the number of the group. The histogram consists of a set of rectangles, and the area of each
rectangle represents the percentage of samples in the corresponding interval.

每個矩形的高度表示樣本密度,即區間中樣本百分數除以該區 間長度(或稱矩形寬度)。 其面積為


百分數,總面積為 100%。 直方圖下兩個數值之間的面積給出了落在那個區間內樣本百分 數。

The height of each rectangle represents the sample density, which is the percentage of samples in the
interval divided by the length of the interval (or the width of the rectangle). Its area is a percentage, and
the total area is 100%. The area between the two values under the histogram gives the percentage of
samples that fall within that interval.

但有些說法用次數(相對次數)當每組的高度,其直方圖涵義僅 顯示組間變化的趨勢。

However, some texts use the frequency (or relative frequency) as the height of each bar. A histogram drawn that way only shows the trend of change across groups.

(2)多邊形圖

(2) Frequency polygon


多邊形圖(polygon)可由直方圖繪出,是以各組組中點(class midpoint)為橫坐標,縱坐標仍為組次數。

A frequency polygon can be drawn from a histogram, with the class midpoint of each group as the abscissa and the group frequency as the ordinate.

組中點 = (組下限 + 組上限) / 2 = (下組界 + 上組界) / 2

Group midpoint = (lower group limit + upper group limit) / 2 = (lower group boundary + upper group boundary) / 2

先將直方圖的第一組與最後一組各再延伸一組,此二組的組次數均設為 0,再將各矩形上端的組中點連接起來成為一封閉曲線,即可構成多邊形圖。

First, extend the histogram by one extra group before the first group and one after the last group, and set the frequency of these two groups to 0. Then connect the midpoints at the top of each rectangle to form a closed curve, which gives the frequency polygon.

48 位學生統計學成績之次數分配

Frequency distribution of statistical scores of 48 students

表 2.5,48 位學生統計學成績之次數分配

Table 2.5, the frequency distribution of statistical scores of 48 students

假想組(hypothetical group)

Hypothetical group
當資料數量多,其相對次數直方圖的組數多且組距非常小時,所得之相對次數 多邊形圖可視為次數
曲線(frequency curve),通常可用來和連續型的機率分配圖做對比比較。 (例如:常態分配)

When the data set is large, so that the relative frequency histogram has many groups and a very small group interval, the resulting relative frequency polygon can be regarded as a frequency curve. It can usually be compared with the graph of a continuous probability distribution (e.g. the normal distribution).

(3)累積次數分配圖(肩型圖)

(3) Cumulative frequency distribution chart (ogive)

根據累積次數所繪製的圖(包含以下累積與以上累積),即為累積次數分配圖(cumulative frequency distribution chart)或肩型圖(ogive plot)。

A graph drawn from cumulative frequencies (both less-than and more-than cumulative frequencies) is called a cumulative frequency distribution chart or an ogive plot.

其橫座標仍類似直方圖中的各個組界:

Its abscissa still resembles the group boundaries in the histogram:


以下累積次數分配圖是以各組的上組界為橫座標。
The less-than cumulative frequency chart uses the upper group boundary of each group as the abscissa.

以上累積次數分配圖是以各組的下組界為橫座標。

The more-than cumulative frequency chart uses the lower group boundary of each group as the abscissa.
二者均以累積次數為縱座標。
Both use the cumulative frequency as the ordinate.
累積次數分配圖
Cumulative frequency distribution chart
值得注意的是,圖 2.6 中的二段曲線之相交點,其縱座標(累積次數)為 24,而橫座標為 60.94 (中位數
M),此一交點所代表的意義是,成績高於 60.94 分者約有 24 人,而成績低於 60.94 分者亦有 24 人左
右,皆有一半(50%)的人數,此即中位數的觀念,參見第 3 章的介紹。

It is worth noting that the two curves in Figure 2.6 intersect at a point whose ordinate (cumulative frequency) is 24 and whose abscissa is 60.94 (the median M). This intersection means that about 24 students scored above 60.94 and about 24 scored below 60.94, each being half (50%) of the total. This is the idea of the median; see the introduction in Chapter 3.
•長條圖(bar chart),是一種以長方形的長度為變量的統計圖表。 有 長條圖用來比較兩個或以上的
價值(不同時間或者不同條件) 只有一個變量,通常利用於較小的數據集分析。 長條圖亦可橫向 排
列,或用多維方式表達。 類似的圖形表達為直方圖,不過後者 較長條圖而言更複雜

• A bar chart is a statistical chart in which the length of each rectangle represents the value of a variable. Bar charts are used to compare two or more values (at different times or under different conditions) of a single variable, and are usually applied to smaller data sets. Bar charts can also be drawn horizontally or extended to several dimensions. A histogram is a similar graphical form, although it is more complex than a bar chart.

•繪製長條圖時,長條柱或柱組中線須對齊項目刻度。 在數字大且接近時,可使用波浪形省略符號,以擴大表現數據間的差距,增強理解和清晰度。

• When drawing a bar chart, the center line of each bar or group of bars must be aligned with its category tick. When the values are large and close together, a wavy break mark on the axis can be used to magnify the differences between the data and make the chart easier to read.
直方圖和長條圖之差別
Difference Between Histogram and Bar Chart

○通常長條圖用在間斷型資料(discrete data),而直方圖用在 連續型資料(continuous data)

○ Usually bar charts are used for discrete data, while histograms are used for continuous data

○直方圖的橫軸代表組界,其矩形的寬度表示組距的大小;至於長條圖並無組距和組界的觀念,其矩形
的寬度亦不具任何意義。 (僅為了觀察方便及圖表的美觀。)

○The horizontal axis of the histogram represents the group boundary, and the width of the rectangle
represents the size of the group interval; as for the bar graph, there is no concept of group interval and
group boundary, and the width of the rectangle does not have any meaning. (Only for the convenience
of observation and the beauty of the chart.)

○ 就外觀而言,直方圖的每一矩形(直方)是併排列著,而長 條圖的每一矩形(長條)之間可存在間隙
(gap)

○ In terms of appearance, the rectangles of a histogram are placed side by side with no space between them, whereas the bars of a bar chart may be separated by gaps.
面積圖
Area chart

圓形圖:以平面圖形面積的大小,表示次數分配之次數或相對次數。 (pie chart)

Pie chart: the size of each area of the plane figure represents a frequency or relative frequency of the frequency distribution.
2.5 結語

2.5 Conclusion

資料 Data

• 量的資料 Quantitative data

○ 連續型 Continuous

◇ 圖示法:直方圖、多邊形圖、肩形圖、枝葉圖
◇ Graphical methods: histogram, frequency polygon, ogive, stem-and-leaf plot

◇ 列表法:次數分配、相對次數分配、累積次數分配
◇ Tabular methods: frequency distribution, relative frequency distribution, cumulative frequency distribution

○ 間斷型 Discrete

◇ 圖示法:長條圖、圓形圖、枝葉圖
◇ Graphical methods: bar chart, pie chart, stem-and-leaf plot

◇ 列表法:次數分配、相對次數分配、累積次數分配
◇ Tabular methods: frequency distribution, relative frequency distribution, cumulative frequency distribution

• 質的資料 Qualitative data

◇ 圖示法:長條圖、圓形圖
◇ Graphical methods: bar chart, pie chart

◇ 列表法:次數分配、相對次數分配
◇ Tabular methods: frequency distribution, relative frequency distribution

CHAPTER 3 敘述統計(II)——統計量數

CHAPTER 3 Descriptive Statistics (II): Statistical Measures


本章綱要
Chapter outline

•3.1 集中趨勢量數

•3.1 Measures of central tendency

•3.2 差異量數

•3.2 Measures of variability

•3.3 平均數與標準差的應用

•3.3 Applications of the mean and standard deviation

•3.4 偏態量數、峰態量數與動差

•3.4 Skewness, kurtosis, and moments

•3.5 分組資料之各種統計量數的計算

•3.5 Calculation of various statistics of grouped data

•3.6 探索性資料分析

•3.6 Exploratory Data Analysis

•3.7 結語

• 3.7 Conclusion

•雖然統計圖表有助於了解一組資料的特性,但為了獲 得具體的描述與比較分析資料,需進一步求出
資料之 位置或集中的量數,以及資料的差異量數。 這些量數 稱為統計量數,它是代表一組資料之
特質的具體數 字。
•Although statistical charts help us understand the characteristics of a data set, a concrete description and comparative analysis also require measures of the location or central tendency of the data and measures of its variability. These are called statistical measures: specific numbers that represent the characteristics of a data set.

•敘述統計資料之特性的統計量數,主要如下:

• The main statistical measures used to describe the characteristics of data are:

◇集中趨勢量數(measure of central tendency)*

◇差異量數(measure of variability)*

◇偏態量數(coefficient of skewness)

◇峰態量數(coefficient of kurtosis)

3.1 集中趨勢量數(measure of central tendency)

○集中趨勢量數亦簡稱為集中量數,代表一組資料中,各觀察值某種特性有共同趨勢存在之量數。

○ A measure of central tendency, also called a central measure for short, is a number that expresses the common tendency shared by the observations in a data set with respect to some characteristic.

○ 因其可反映該組資料觀察值的位置的量數,又稱為位置量數(location measure),較常用的有下列數:
○ Because it reflects where the observations of the data set are located, it is also called a location measure. The most commonly used ones are:

◇集中趨勢量數(measure of central tendency): 平均數(mean)、中位數(median)、眾數(mode)。

◇ Measures of central tendency: mean, median, mode.

◇相對位置量數(relative location measure): 百分位數(percentile)、四分位數(quartile)、十分位數(decile)。

◇ Relative location measures: percentile, quartile, decile.

平均數(mean)

Mean

○ 平均數(mean)可說是最重要的集中量數,它可做為一組資料的代表值。 一般而言,平均數具有簡
化作用,代表作用 及比較作用等幾項功用。

○ The mean is arguably the most important measure of central tendency, and it can serve as a representative value for a data set. In general, the mean serves three purposes: simplification, representation, and comparison.

●簡化作用:平均數能以一個簡約的數字,概括一組資料分配的特徵。

● Simplification: the mean summarizes the characteristics of a data distribution in a single concise number.

●代表作用:平均數能代表一組資料的平均水準,此乃 因為平均數是一組資料的中心数值,故以它來
代表整個資料分配最為恰當。
●Representative role: The average can represent the average level of a group of data, because the
average is the central value of a group of data, so it is most appropriate to represent the entire data
distribution.

● 比較作用:平均數簡化所有的數值後,以該數代表整 組資料的平均水準,如此可便於兩個或兩個以
上的資料分配間做比較。
● Comparison function: After simplifying all the values, the average represents the average level of the
entire group of data, which can facilitate the comparison between two or more data distributions.

例如,以班級的平均分數來做成績好壞的參考。

For example, use the average grade of the class as a reference for grades.

• 幾何平均數(geometric mean):幾何平均數適用於平均改 變率、平均成長率、平均比率或是對數


分配等之資料的平 均之求算。 常見的幾何平均數有平均經濟成長率、物價等 具有基期之資料。
• Geometric mean: The geometric mean is suitable for averaging data such as average rate of change,
average growth rate, average ratio, or logarithmic distribution. Common geometric averages include
the average economic growth rate, prices and other data with a base period.

• 調和平均數(harmonic mean):若資料呈現調和級數(資料的倒數為等差級數)時,適用調和平均數來計。 在實際的應用中,如物價固定下的平均物價、距離固定下之平均時速等資料皆適合使用。

• Harmonic mean: if the data form a harmonic series (the reciprocals of the data form an arithmetic series), the harmonic mean should be used. In practice it suits data such as the average price under fixed prices and the average speed over a fixed distance.

● 平均數(mean)的意義

● Meaning of the mean

平均數是集中量數中,最簡單、最重要且最常採用的量數。
The mean is the simplest, most important, and most commonly used measure of central tendency.

設一組資料有 n 個數值 x1, x2, …, xn,則其平均數為:

Suppose a data set has n values x1, x2, …, xn; then its mean is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

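NOTE 對於正數資料,調和平均數不大於幾何平均數,而幾何平均數又不大於算術平均數(各數值皆相等時三者相等)。

NOTE For positive data, the harmonic mean is never greater than the geometric mean, which in turn is never greater than the arithmetic mean (the three are equal only when all values are equal).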
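
As a quick check of this ordering, the three means can be computed with Python's standard `statistics` module. The data values below are hypothetical and chosen only for illustration; `geometric_mean` requires Python 3.8 or later.

```python
from statistics import mean, geometric_mean, harmonic_mean

data = [2.0, 4.0, 8.0]  # hypothetical positive values

am = mean(data)            # arithmetic mean
gm = geometric_mean(data)  # geometric mean
hm = harmonic_mean(data)   # harmonic mean

# For positive data we expect hm <= gm <= am.
print(hm, gm, am)
```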

例題 3.1

Example 3.1

某班甲、乙兩組學生·甲組 5 人、乙組 4 人、某次統計學測驗成績如下:

A class is divided into groups A and B, with 5 students in group A and 4 in group B. Their scores on a statistics test are as follows:

甲組:89,72,55, 68, 78

Group A: 89, 72, 55, 68, 78

乙組:88,63,76,69

Group B: 88, 63, 76, 69

該次測驗結果,哪一組成績較優?

Which group performed better on this test?



Solution
計算結果,甲組平均成績為 72.4 比乙組平均成績 74 為低所以此次測驗成績乙組較優。

The calculation result shows that the average score of Group A is 72.4, which is lower than that of Group
B, which is 74, so the score of Group B is better in this test.
平均數具有一些重要的特性
The mean has some important properties

(1)任一組資料中,各觀測值與其平均數之差(稱為離差(deviation))的代數和為 0,亦即:

(1) In any data set, the algebraic sum of the differences between each observation and the mean (called deviations) is 0, that is:

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

故平均數可視為資料的重心。
The mean can therefore be regarded as the center of gravity of the data.

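(2)任一組資料中,各觀測值與其平均數之差的平方和,較各觀測值與平均數以外的任何數值之差的平方和為小,亦即:

(2) In any data set, the sum of squared differences between each observation and the mean is smaller than the sum of squared differences between each observation and any value A other than the mean, that is:

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 < \sum_{i=1}^{n} (x_i - A)^2, \quad A \ne \bar{x}$$

(3)若有 k 組資料,其項數與平均數分別為 (n1, x̄1), (n2, x̄2), …, (nk, x̄k),若將 k 組資料合併成一組,其項數變為 n,而總平均數為 x̄,則:

(3) If there are k groups of data whose sizes and means are (n1, x̄1), (n2, x̄2), …, (nk, x̄k), and the k groups are merged into one group of size n with overall mean x̄, then:

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{1}{n}\sum_{j=1}^{k} n_j\bar{x}_j \tag{3-4}$$

(加權平均數)

(weighted mean)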
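
The first two properties can be checked numerically. The sketch below uses group A from Example 3.1; the alternative centers 60, 70, 75, and 80 are arbitrary values picked for illustration.

```python
group_a = [89, 72, 55, 68, 78]  # group A from Example 3.1
mean_a = sum(group_a) / len(group_a)

# Property (1): deviations from the mean sum to zero (up to rounding error).
deviation_sum = sum(x - mean_a for x in group_a)


def sum_of_squares(values, center):
    """Sum of squared differences between each value and `center`."""
    return sum((x - center) ** 2 for x in values)


# Property (2): the sum of squares is smallest when the center is the mean.
ss_at_mean = sum_of_squares(group_a, mean_a)
ss_elsewhere = [sum_of_squares(group_a, a) for a in (60, 70, 75, 80)]

print(round(deviation_sum, 10), ss_at_mean, min(ss_elsewhere))
```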

例題 3.2

Example 3.2

設有三個班級甲、乙、丙,其學生人數分別為 50、48、52 人。 某次統計學考試,此三個班級的平均成績分別為 80、76、85,試求出此三個班級統計學之總平均成績。

Three classes A, B, and C have 50, 48, and 52 students respectively. In a statistics exam, the mean scores of the three classes were 80, 76, and 85 respectively. Find the overall mean statistics score of the three classes.

Solution

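依題意知:
From the problem statement:

$$n_1 = 50,\ \bar{x}_1 = 80; \quad n_2 = 48,\ \bar{x}_2 = 76; \quad n_3 = 52,\ \bar{x}_3 = 85$$

根據(3-4)式,總平均成績為:

According to formula (3-4), the overall mean score is:

$$\bar{x} = \frac{50(80) + 48(76) + 52(85)}{50 + 48 + 52} = \frac{12068}{150} \approx 80.45$$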

例題 3.3

Example 3.3

已知大里高中女子拔河隊原 15 位選手的平均體重為 53.8 公斤,因其中 5 位同學畢業,其平均體重 52 公斤,試問目前該校女子拔河隊 10 位同學之平均體重為何?

The original 15 members of the Dali High School women's tug-of-war team had a mean weight of 53.8 kg. Five of them, with a mean weight of 52 kg, have graduated. What is the mean weight of the 10 students currently on the team?

Solution

令 (N1, X1) 與 (N2, X2) 分別代表 10 位選手與 5 位畢業選手之人數與平均體重,則由(3-4)式可得:

Let (N1, X1) and (N2, X2) denote the size and mean weight of the 10 current players and of the 5 graduated players respectively. Formula (3-4) then gives:

$$53.8 = \frac{10 X_1 + 5(52)}{15}$$
由此解得 X1=54.7;故目前大里高中 10 位女子拔河隊選手 的平均體重為 54.7 公斤。

From this solution, X1=54.7; therefore, the average weight of the 10 women's tug-of-war players in Dali
High School is 54.7 kg.
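
Examples 3.2 and 3.3 both apply the combined-mean formula (3-4). A minimal sketch follows; the function name `combined_mean` is a hypothetical helper, not something defined in the text.

```python
def combined_mean(sizes, means):
    """Weighted mean of group means, using the group sizes as weights."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Example 3.2: three classes of 50, 48 and 52 students.
overall = combined_mean([50, 48, 52], [80, 76, 85])

# Example 3.3: 15 * 53.8 = 10 * X1 + 5 * 52, solved for X1.
remaining_mean = (15 * 53.8 - 5 * 52) / 10

print(round(overall, 2), round(remaining_mean, 1))
```

Example 3.3 runs the same formula in reverse: the overall mean is known and one group mean is the unknown.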

以平均數做為一組資料之集中量數,有其優缺點:

Using the mean as the measure of central tendency for a data set has advantages and disadvantages:

• 優點

• Advantages

① 平均數的代表性易為人接受。

① The representativeness of the mean is easily accepted.

② 計算平均數時,該組資料內的所有数值皆列入計 算。

② When calculating the average, all the values in this group of data are included in the calculation.

③ 可用代数方法處理,頗適合數學的應用。

③ It can be dealt with by algebraic method, which is quite suitable for the application of mathematics.

• 缺點

• Disadvantages

•若存在幾個特別大或特別小的數值(稱為極端值 (extreme value)),則很容易受其影響而削弱平均


數的代表性。
•If there are several very large or very small values (called extreme values), it is easy to be affected by
them and weaken the representativeness of the average.

Ex. A 班五位同學的成績為 60,61, 62, 63, 64, B 班五位同學的成績為 38,40, 45, 92, 95,

Ex. The grades of five students in class A are 60, 61, 62, 63, 64, and the grades of five students in class B
are 38, 40, 45, 92, 95,

雖然兩組同學的平均成績均為 62. 但 A 班同學成績比較平均, B 班同學成績落差卻很大。

Although both classes have a mean score of 62, the scores in class A are tightly clustered, while the scores in class B vary widely.

○ 中位數(median)的意義。 一組按大小順序排列的資料 x1, x2, …, xn,其中位數為位於中間位置的數值,亦即:

○ The meaning of the median. For a data set x1, x2, …, xn arranged in order of magnitude, the median is the value in the middle position, that is:

當 n 為奇數時,第 (n+1)/2 位置的數值為其中位數;

When n is odd, the value in position (n+1)/2 is the median.

當 n 為偶數時,第 n/2 和 n/2+1 位置之二數值的平均為其中位數。 一般以 Me 表示。

When n is even, the mean of the values in positions n/2 and n/2+1 is the median. It is generally denoted by Me.

當一組資料中存在過多極端值時,其平均數的代表性變 得比較差,此時採用中位數為較佳的量。

When there are too many extreme values in a set of data, the representativeness of the mean becomes
poor, and the median is the better value.

Ex. A 班五位同學的成績為 60, 61, 62, 63, 64, B 班五位同學的成績為 38, 40, 45, 92, 95,

Ex. The scores of five students in class A are 60, 61, 62, 63, 64, and the scores of five students in class B are 38, 40, 45, 92, 95.

雖然兩組同學的平均成績均為 62. 但 A 班同學成績比較平均, B 班同學成績落差卻很大。

Although both classes have a mean score of 62, the scores in class A are tightly clustered, while the scores in class B vary widely.

求下列二組資料之中位數:

Find the median of the following two sets of data:

I:13, 20,8,15,7

II: 5,10,19, 23,11,15



Solution

先將資料按大小順序排列,然後找中間位置:

First arrange the data in order of size, then find the middle position:

I: 7, 8, 13, 15, 20 (middle value: 13)

II: 5, 10, 11, 15, 19, 23 (middle two values: 11 and 15)

二數平均 = (11+15)/2 = 13

Mean of the two middle values = (11+15)/2 = 13

I、II 兩組資料之中位數均等於 13。

The medians of data sets I and II are both equal to 13.
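
The odd and even cases of the median rule can be sketched as a small function and run on data sets I and II. This hand-written `median` is only an illustration of the rule; Python's standard library also offers `statistics.median`.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # position (n + 1) / 2, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2  # positions n/2 and n/2 + 1


print(median([13, 20, 8, 15, 7]))        # data set I
print(median([5, 10, 19, 23, 11, 15]))   # data set II
```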

例題 3.5

Example 3.5

某家醫院報導其 6 位移植心臟的病人在手術完成後,其活存的時間分別是 15, 3, 46, 623, 126, 64 天,試求出該醫院換心病人之活存時間的平均數與中位數,並加以比較。
A hospital reported that its 6 heart transplant patients survived 15, 3, 46, 623, 126, and 64 days after surgery. Find the mean and the median survival time of these patients and compare them.

Solution

平均數為:

The mean is:

(15+3+46+623+126+64)/6 = 877/6 = 146.2 天,

(15+3+46+623+126+64)/6 = 877/6 = 146.2 days.

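計算中位數時,需先將資料依大小順序排列:

To calculate the median, first arrange the data in order of size:

3, 15, 46, 64, 126, 623

The median is the mean of the two middle values, (46+64)/2 = 55 days. The mean (146.2 days) is far larger than the median because it is pulled up by the extreme value 623.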

• 中位數的特性

• Properties of the median

任一組資料中,各觀測值與其中位數差之絕對值總和為最小。

In any data set, the sum of the absolute differences between each observation and the median is the smallest:

$$\sum_{i=1}^{n} |x_i - Me| \le \sum_{i=1}^{n} |x_i - A|$$

其中 A 為該組資料中任一數值。

where A is any value in the data set.

當 A 與 Me 相差愈大,兩總和之間的差距亦愈大。

The farther A is from Me, the larger the gap between the two sums.

例如,xi = 2, 5, 7, 8, 10,其中 Me = 7。

For example, xi = 2, 5, 7, 8, 10, with Me = 7:

$$\sum |x_i - 5| = 13 > \sum |x_i - 8| = 12 > \sum |x_i - 7| = 11$$

因為 5 比 8 離中位數 7 還要遠。

This is because 5 is farther from the median 7 than 8 is.
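
The comparison in this example can be reproduced directly. The helper name `sum_abs_dev` is hypothetical.

```python
def sum_abs_dev(values, center):
    """Sum of absolute differences between each value and `center`."""
    return sum(abs(x - center) for x in values)


data = [2, 5, 7, 8, 10]  # the example above, with median 7
for a in (5, 7, 8):
    print(a, sum_abs_dev(data, a))
```

The sum is smallest at the median and grows as the chosen center moves away from it.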

○ 中位數的優點如下:

○ The advantages of the median are as follows:

① 性質簡單,易於瞭解。

① The nature is simple and easy to understand.

② 不易受極端值的影響。

② Not easily affected by extreme values.

○中位數的缺點如下:

○ The disadvantages of the median are as follows:

① 中位數只考慮居中位置的幾個數值,忽略了其他數值的大小,故缺乏敏感性。

① The median considers only the few values in the middle and ignores the magnitudes of the other values, so it lacks sensitivity.

② 基於上述①的理由,中位數不適合代數運算。

② For the reasons of ① above, the median is not suitable for algebraic operations.

例如:已知兩組資料之中位數 Me1, Me2,但兩組資料合併之中位數無法由個別之中位數 Me1, Me2 求出。
For example: even if the medians Me1 and Me2 of two data sets are known, the median of the merged data cannot be obtained from Me1 and Me2 alone.

○當資料存在極端值時,中位數的代表性較平均數為佳。

○When there are extreme values in the data, the representativeness of the median is better than that
of the mean.

例如,政府發布所得資料常使用中位數。 因擁有超高所得的人數為極少數,若採用平均數,無法表現出一般的所得水準。
For example, governments often use the median when publishing income data. Because very few people have extremely high incomes, the mean would not reflect the typical income level.

• 眾數(mode)的意義

• The meaning of the mode

一組資料中,出現次數最多的數值即為眾數。

In a data set, the value that occurs most frequently is the mode.

一般以 Mo 表示。

It is generally denoted by Mo.

眾數為最集中且以位置為主的一個量數,特別適用於質的資料。

The mode is the most concentrated and location-based quantity, which is especially suitable for
qualitative data.

Ex. 某班級 40 位同學的血型資料如下:

Ex. The blood type information of 40 students in a class is as follows:

該班之血型 A 型最多,故血型資料的眾數為 A

The class has the most blood type A, so the mode of the blood type data is A

例題 3.6

Example 3.6

試求出下列三組資料之眾數:

Find the mode of the following three sets of data:

I: 15, 18, 20, 15, 15, 20, 25, 15

II: 10, 12, 10, 10, 8, 12, 12, 14

III: 2, 7, 5, 9, 16, 20, 8, 10



Solution

第 I 組資料中,15 出現 4 次最多,故眾數為 15。

In the first group of data, 15 appeared 4 times the most, so the mode was 15.

第 II 組資料中,10 與 12 皆出現 3 次(最多),故眾數為 10 與 12。

In Group II data, both 10 and 12 appear three times (the most), so the mode is 10 and 12.

第 III 組資料中,各數值皆僅出現一次,故眾數不存在。
In the data of group III, each value appears only once, so the mode does not exist.

由此例子可知,眾數可能不存在或者不只一個,因此眾數並不具 有唯一存在的性質。

From this example, it can be seen that the mode may not exist or there may be more than one, so the
mode does not have the property of unique existence.
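
The three cases in Example 3.6 (one mode, two modes, no mode) can be sketched with `collections.Counter`. The function name `modes` is hypothetical, and returning an empty list when every value occurs once is a choice made here to match the convention used in the text.

```python
from collections import Counter


def modes(values):
    """Return all most frequent values, or [] if every value occurs only once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # convention used in the text: no mode exists
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([15, 18, 20, 15, 15, 20, 25, 15]))   # data set I
print(modes([10, 12, 10, 10, 8, 12, 12, 14]))    # data set II
print(modes([2, 7, 5, 9, 16, 20, 8, 10]))        # data set III
```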

• 眾數的特性

• Properties of Modes

(1)眾數的優點如下:

(1) The advantages of mode are as follows:

① 性質簡單、易於瞭解。

① The nature is simple and easy to understand.

② 眾數不受極端值的影響。

②The mode is not affected by extreme values.

(2)眾數的缺點如下:

(2) The disadvantages of mode are as follows:

① 眾數與中位數一樣,僅考慮其中的幾個數值,故不適合代數運算。

①The mode, like the median, only considers a few values, so it is not suitable for algebraic operations.

② 當資料中的各數值皆只出現一次時,即不存在眾數:又兩個以上的眾數之情況,究竟選擇哪一個
為集中量 數的代表,頗難取捨。

② When every value in the data occurs only once, no mode exists; and when there are two or more modes, it is hard to decide which one should represent the central tendency.

O 當資料中有較多的數值向某一數值或其附近集中的情形時,採用眾數頗恰當。

O When there are many values in the data that are concentrated to a certain value or its vicinity, the
mode is appropriate.

Summary

• 尺度特性

• Scale properties

-名目尺度:眾數

-Nominal scale: Mode

-順序尺度:眾數或中位數
-Ordinal scale: mode or median

-等距尺度及比率尺度:平均數

-Interval scale and ratio scale: mean

• 優缺點

• Advantages and disadvantages

-眾數:具有作為類別資料的判斷準則(例如在民意的 表達,少數服從多數)、不受極端值影響等之優
點。 但是如果觀察值的分佈並不集中,則不適用眾數為判 斷準則;另外眾數不適合數學運算。

-Mode: It has the advantage of being a criterion for judging category data (for example, in the
expression of public opinion, the minority obeys the majority), and it is not affected by extreme values.
However, if the distribution of the observed values is not concentrated, the mode is not suitable for the
judgment criterion; in addition, the mode is not suitable for mathematical operations.

-中位數:具有不受極端值的影響,代表機率累積到中位數時所佔之機率值為 50%等優點。 但是中位數一樣不適合數學運算。
-Median: its advantages are that it is not affected by extreme values and that the cumulative probability up to the median is 50%. However, like the mode, the median is not suitable for mathematical operations.

-算術平均數:具有可進行四則運算、誤差平方和(Error sum of squares)最小、母體平均數的最佳估計式等優點。 但是容易受極端值影響及資料分配呈現雙峰分配時,無法代表集中趨勢。

-Arithmetic mean: its advantages are that it supports the four arithmetic operations, minimizes the error sum of squares, and is the best estimator of the population mean. However, it is easily affected by extreme values, and it cannot represent the central tendency when the data distribution is bimodal.

百分位數(percentile)

○ 百分位數的意義:將資料按大小順序排列後,若至少有 k% 的觀測值位於某一數值以下,且至少有 (100-k)% 的觀測值位於該值以上,則此數值稱為該組資料的第 k 個百分位數(k-th percentile),一般常用 Pk 表示。

○ The meaning of the percentile: after the data are arranged in order of size, if at least k% of the observations lie at or below a certain value and at least (100-k)% of the observations lie at or above it, that value is called the k-th percentile of the data, commonly denoted by Pk.

NOTE: 重點在於小於或等於數值 Pk 。 一定要有 k%的觀察值。

NOTE: The key point is that at least k% of the observations must be less than or equal to Pk.

Ex. 將一組已排好序的資料之分布範圍大小分割為相等的 100 等分,取第 32 個等分點,則其所在位置的數值即為 P32。

Ex. Divide the range of a sorted data set into 100 equal parts and take the 32nd division point; the value at that position is P32.

○ 百分位數的計算步驟
○ Steps for calculating a percentile

(1)將資料依大小順序排列。

(1) Arrange the data in ascending order.

(2)求出百分位數(Pk)所在位置的指標(index),設為 i,則

(2) Find the index i of the position of the percentile Pk:

i = (k/100) × n (n 表示觀測值的個數)。

i = (k/100) × n, where n is the number of observations.

(3)若 i 為非整數,則 Pk 為下一個整數位置的數值,

(3) If i is not an integer, Pk is the value at the next integer position.

例如 i=9.23,則取第 10 個位置之數值為 Pk;

For example, if i = 9.23, take the value in the 10th position as Pk.

若 i 為整數,則取第 i 與 i+1 位置的兩個數值之平均,即為所求的 Pk。

If i is an integer, take the average of the values in positions i and i+1; this is the required Pk.
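
The steps above can be sketched as follows. The function name `percentile` is hypothetical and the sketch assumes an integer k with 0 < k < 100; it works with k × n in integer arithmetic so that the test "is i an integer?" is not affected by floating-point rounding. Other textbooks and software libraries use different percentile conventions, so results may differ from theirs.

```python
def percentile(values, k):
    """k-th percentile (integer k, 0 < k < 100) using the index i = k/100 * n."""
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(k * n, 100)  # i = whole + remainder / 100
    if remainder == 0:
        # i is an integer: average the values at positions i and i + 1 (counting from 1).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not an integer: take the value at the next integer position.
    return ordered[whole]


data = [5, 10, 19, 23, 11, 15]  # a small data set with n = 6
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))
```

With this rule P50 coincides with the median of the data.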

例題 3.7

Example 3.7

某航空公司招募空姐,共來了 50 位應徵者,其體重由小到大依序排列,如表 3.1 所示。 試求出 P25, P30, P50, P75。

An airline recruiting flight attendants received 50 applicants, whose weights are listed in ascending order in Table 3.1. Find P25, P30, P50, and P75.

表 3.1 50 位應徵者的體重由小到大排列表(以公斤為單位)

Table 3.1 Weights of the 50 applicants in ascending order (in kilograms)

特殊的百分位數
Special percentile
用同樣的分割概念,我們可得

Using the same segmentation concept, we can get

四分位數(quartiles):Q1=P25,Q2=P50,Q3= P75.。

Quartiles: Q1=P25, Q2=P50, Q3=P75.

十分位數(deciles):D1=P10,D2=P20,D3=P30,…,D9=P90。

Deciles: D1=P10, D2=P20, D3=P30, …, D9=P90.

表 3.2 百分位數與中位數、四分位數,十分位數之對照

Table 3.2 Comparison of percentiles, medians, quartiles, and deciles

其他分位數(fractiles)的觀念和求法,和百分位數類似。 例如,三分位數、七分位數等。

The concepts and calculation of other quantiles (fractiles) are similar to those of percentiles, for example tertiles and septiles.

3.2 差異量數(measures of variability) 。

差異量數(measures of variability, dispersion measures)則在衡量一組資料中,各個觀測值之間的差異或離散的程度(故差異量數亦稱為離散量數),由此可表現出資料的散布範圍及分布的型態,並可反映出平均數的代表性。
Measures of variability (also called dispersion measures) gauge how much the observations in a data set differ from one another or are spread out. They show the spread and the shape of the distribution, and they also indicate how representative the mean is.

雖然各組平均成績相同(80 分),但成績分佈各不相同
Although the average score was the same across the groups (80 points), the distribution of scores
varied

全距(range)

一組資料中,數值最大者與最小者之差稱為全距(range), 一般以 R 表示。

In a set of data, the difference between the largest value and the smallest value is called the range,
which is generally represented by R.

優點:意義簡明易解且計算容易。

Advantages: The meaning is concise and easy to understand and the calculation is easy.

缺點:易受極端值之影響,無法測出中間各個觀察值之間 的差異情形。

Disadvantages: It is easily affected by extreme values, and cannot measure the difference between each
observation in the middle.

例如,工廠在進行品管時,事先設定產品標準規格之上下 限。 若產品規格的差異均在全距範圍內,
則此製造過程處於 控制狀態內。 否則超出控制之外,就必須採取矯正行動。

For example, in quality control a factory sets upper and lower limits for the product specification in advance. If the variation in product specifications stays within that range, the manufacturing process is under control. Otherwise it is out of control and corrective action must be taken.

例題 3.8

Example 3.8

設有二組資料如下:

There are two sets of data as follows:

A:3, 4, 5, 6, 7, 9, 9, 10, 12, 15

B: 3, 8, 8, 9, 9, 9, 10, 15

試求出其全距、平均數與中位數,並做比較。

Find the range, mean, and median of each group and compare them.

Solution

A、B 二組資料之全距均為 12。

The range of both groups A and B is 12.

在 A 組中,其平均數與中位數皆為 8,但各項數值分散於 3 至 15 之間;在 B 組中,其平均數與中位數皆為 9,但大部分之數值趨於中央。
In group A, the mean and median are both 8, but the values are spread between 3 and 15. In group B, the mean and median are both 9, but most of the values cluster near the center.

由此可知,僅由全距來測度其差異量數,則其結果甚不可靠。

This shows that measuring variability by the range alone gives very unreliable results.
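
As a small illustration, the range, mean, and median of group A can be computed with the standard library. Only group A is used here.

```python
from statistics import mean, median

group_a = [3, 4, 5, 6, 7, 9, 9, 10, 12, 15]  # group A from Example 3.8

data_range = max(group_a) - min(group_a)  # largest value minus smallest value
print(data_range, mean(group_a), median(group_a))
```

The range depends only on the two extreme values, which is why it says nothing about how the values in between are distributed.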

四分位差(quartile deviation)

四分位距(interquartile range, IQR): IQR = Q3 - Q1

其中 Q1: lower quartile,Q3: upper quartile。

Interquartile range (IQR): IQR = Q3 - Q1, where Q1 is the lower quartile and Q3 is the upper quartile.

四分位距的意義:一組資料中間一半的觀察值之全距。 PS. 此可克服極端值差異很大時,使用全距測度所產生的缺點。
The meaning of the interquartile range: the range of the middle half of the observations in a data set. PS. This overcomes the weakness of the range when the extreme values differ greatly.

四分位差(quartile deviation, Q.D.):Q.D. = (Q3 - Q1)/2

Quartile deviation (Q.D.): Q.D. = (Q3 - Q1)/2

當資料為對稱分配時,Me - Q1 = Q3 - Me = Q.D.。 此時 Q.D. 較 IQR 更為實用。 我們可利用此等式做為判別某一資料分配是否為對稱的必要條件。
When the data are symmetrically distributed, Me - Q1 = Q3 - Me = Q.D., and in that case Q.D. is more practical than IQR. This equality can serve as a necessary condition for judging whether a data distribution is symmetric.

例題 3. 9

Example 3.9

試計算表 3.1 中,50 位應徵者體重的四分位距與四分位差。

Calculate the interquartile range and the quartile deviation of the weights of the 50 applicants in Table 3.1.

Solution

參考例題 3.7,已知 Q1 = P25 = 57.2,Q3 = P75 = 64.6,因此 IQR = Q3 - Q1 = 64.6 - 57.2 = 7.4,Q.D. = (Q3 - Q1)/2 = 7.4/2 = 3.7。

From Example 3.7, Q1 = P25 = 57.2 and Q3 = P75 = 64.6, so IQR = Q3 - Q1 = 64.6 - 57.2 = 7.4 and Q.D. = (Q3 - Q1)/2 = 7.4/2 = 3.7.
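
The calculation in Example 3.9, together with the symmetry condition Me - Q1 = Q3 - Me, can be sketched as below. The function names and the tolerance `tol` are hypothetical, and the values passed to the symmetry check in the last line are made up for illustration.

```python
import math


def quartile_spread(q1, q3):
    """Return (IQR, Q.D.) from the first and third quartiles."""
    iqr = q3 - q1
    return iqr, iqr / 2


def passes_symmetry_check(q1, me, q3, tol=1e-9):
    """Necessary (not sufficient) condition for symmetry: Me - Q1 equals Q3 - Me."""
    return math.isclose(me - q1, q3 - me, abs_tol=tol)


iqr, qd = quartile_spread(57.2, 64.6)  # quartiles from Example 3.9
print(round(iqr, 1), round(qd, 1))
print(passes_symmetry_check(5, 8, 11), passes_symmetry_check(5, 8, 10))
```

Passing the check does not prove that a distribution is symmetric; failing it shows that it is not.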


○ 四分位距與四分位差具有下列的特性:

○ The interquartile range and the quartile deviation have the following properties:

1. 四分位差常配合中位數一起運用,即以中位數表示一次數分配之集中趨勢時,可再輔以四分位差來表示其差異情形。
1. The quartile deviation is often used together with the median: when the median represents the central tendency of a frequency distribution, the quartile deviation can be added to describe its variability.

2. 四分位距的優點是計算簡便,易於瞭解,而且不 受極端值的影響。

2.The advantage of the interquartile range is that it is simple to calculate, easy to understand, and not
affected by extreme values.

3. 它僅考慮中間一半的數值,而對兩端之另一半的數值皆未涉及,故不能表示全部數值之分散及差異的情形,惟此項缺點並不如全距之甚。
3. It considers only the middle half of the values and ignores the other half at the two ends, so it cannot describe the dispersion of all the values, although this shortcoming is less severe than that of the range.

平均偏差(mean absolute deviation)

○ 一組資料中各個數值與其集中量數的差,稱為離差 (deviation)。

○ The difference between each value in a data set and its measure of central tendency is called a deviation.

○ 各個數值與平均數之差稱離均差(deviation about the mean),而與中位數之差稱離中差(deviation


about the median)。 由於離均差使用較為普遍,因此離差通常用來 指離均差。

○ The difference between each value and the mean is called deviation about the mean, and the
difference with the median is called deviation about the median. Since the use of deviation from the
mean is more common, the deviation is usually used to refer to the deviation from the mean.

◎ 平均偏差(mean absolute deviation, MAD),乃是各個數值之離差取絕對值,然後再求其平均數:

◎ The mean absolute deviation (MAD) is obtained by taking the absolute value of each value's deviation and then averaging:

$$MAD = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$

例題 3.10

Example 3.10

求算 5,6,7,9,23 與 5,6,7,9 兩組資料之平均偏差。

Calculate the mean absolute deviation of the two data sets 5, 6, 7, 9, 23 and 5, 6, 7, 9.

Solution

第一組資料的平均數為 10,第二組資料的平均數為 6.75,於是

The mean of the first data set is 10 and the mean of the second is 6.75, so

$$MAD_1 = \frac{5 + 4 + 3 + 1 + 13}{5} = 5.2, \qquad MAD_2 = \frac{1.75 + 0.75 + 0.25 + 2.25}{4} = 1.25$$

此二組資料之差異僅在於第二組少了一較大的值(23),但其平均偏差卻有很大的差異。 由此可知,平均偏差易受極端值的影響。
The only difference between the two data sets is that the second lacks one large value (23), yet their mean absolute deviations differ greatly. This shows that the mean absolute deviation is easily affected by extreme values.
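
The two mean absolute deviations can be computed with a short function. The name `mean_absolute_deviation` is a hypothetical helper.

```python
def mean_absolute_deviation(values):
    """Average absolute deviation of the values from their mean."""
    center = sum(values) / len(values)
    return sum(abs(x - center) for x in values) / len(values)


print(mean_absolute_deviation([5, 6, 7, 9, 23]))  # data set with the extreme value
print(mean_absolute_deviation([5, 6, 7, 9]))      # same data without 23
```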

以平均偏差來衡量一組資料的差異情形的優、缺點如下:

The advantages and disadvantages of using the mean deviation to measure the difference of a set of
data are as follows:

●優點

●Advantages

1)意義簡明,計算容易。

1)The meaning is concise and the calculation is easy.

2)平均偏差係根據全部數值求得,故可表示整組資料 之完整的差異情形,較全距與四分位距感應靈
敏。
2) The average deviation is obtained based on all the values, so it can represent the complete difference
of the whole set of data, and is more sensitive than the full range and the interquartile range.

●缺點

●Disadvantages

1) 計算平均偏差時,是取離差的絕對值加總,而有關 絕對值的運算其意義不明顯。 (以數學式子


之處理 過程中,取絕對值運算較複雜。)

1) The mean absolute deviation sums the absolute values of the deviations, and absolute values are awkward to work with. (Taking absolute values complicates the algebraic manipulation of formulas.)

2) 平均偏差因具有平均的意涵,故與平均數的缺點一 樣,易受極端值的影響。 (平均偏差愈小,則


表示 各數值間差異愈小。)

2) Because the mean deviation has the meaning of the average, it is susceptible to the influence of
extreme values just like the disadvantage of the mean. (The smaller the mean deviation, the smaller the
difference between the values.)
變異數與標準差
Variance and Standard Deviation

設有一組資料(母體)x1, x2, …, xN,其平均數

Given a set of (population) data x1, x2, …, xN with mean

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

則變異數(variance)的計算式為

the variance is calculated as

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 \tag{3-9}$$



• 變異數(variance)取離差的平方亦有一嚴重的缺點:離差取平方後,資料的量測單位亦需跟著平方,此時往往變為無意義的單位。

• Squaring the deviations in the variance also has a serious drawback: once the deviations are squared, the unit of measurement is squared as well, which often yields a meaningless unit.

• 標準差(standard deviation)的觀念,它是變異數取平方根,如此便可將單位還原與原來資料的單位相一致,且無加大原來差數之嫌。 母體標準差一般以 σ 表示,

• The standard deviation is the square root of the variance, which restores the unit to that of the original data and avoids exaggerating the original differences. The population standard deviation is generally denoted by σ:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$
例題 3.11

Example 3.11

設有二組資料(母體)如下,試計算其變異數,並做比較:

Given the following two sets of (population) data, calculate their variances and compare them:

A: 8, 9, 10, 11, 12

B: 4, 7, 10, 13, 16

Solution

此二組資料的平均數均為 10,而其變異數分別計算如下:

The mean of both data sets is 10, and their variances are calculated as follows:

$$\sigma_A^2 = \frac{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2}{5} = 2, \qquad \sigma_B^2 = \frac{(-6)^2 + (-3)^2 + 0^2 + 3^2 + 6^2}{5} = 18$$

由上可知,A、B 二組資料的平均數雖相同,但 A 組的變異數遠小於 B 組,故 A 組資料的差異程度較小,而且其平均數較具代表性。

Although groups A and B have the same mean, the variance of group A is much smaller than that of group B, so the data in group A vary less and its mean is more representative.
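
The population variance and standard deviation of the two groups can be computed as follows. The function name `population_variance` is hypothetical; the standard library's `statistics.pvariance` computes the same quantity.

```python
import math


def population_variance(values):
    """Mean of the squared deviations from the population mean (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


group_a = [8, 9, 10, 11, 12]
group_b = [4, 7, 10, 13, 16]

for name, group in (("A", group_a), ("B", group_b)):
    var = population_variance(group)
    print(name, var, math.sqrt(var))  # variance and standard deviation
```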
樣本變異數(sample variance)

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad (3\text{-}11)$$

分母取 $n-1$,而不是 $n$。 其理由如下:比較(3-9)與(3-11)式,前者的離差為 $x_i - \mu$,而後者的離差為 $x_i - \bar{x}$;一般而言,母體平均數皆為未知,故通常以樣本平均數來推估,致使失去一個自由度(degree of freedom, d.f.),因此分母以 $n-1$ 來除。 $S^2$ 亦是 $\sigma^2$ 的一個不偏估計式。

The denominator is $n-1$ rather than $n$. The reason is as follows. Comparing equations (3-9) and (3-11), the deviation in the former is $x_i - \mu$, whereas in the latter it is $x_i - \bar{x}$. The population mean is generally unknown and is usually estimated by the sample mean, which costs one degree of freedom (d.f.); the sum is therefore divided by $n-1$. $S^2$ is also an unbiased estimator of $\sigma^2$.

樣本標準差(sample standard deviation):$S = \sqrt{S^2} = \sqrt{\text{sample variance}}$

例題 3.12

Example 3.12

求算下列樣本資料的變異數:

Find the variance of the following sample data:

3.4, 2.5, 4.1, 1.2, 2.8, 3.7



Solution

樣本平均數 $\bar{x} = 2.95$,利用(3-11)式可計算其變異數:

The sample mean is $\bar{x} = 2.95$, and the variance follows from formula (3-11):

$$S^2 = \frac{1}{6-1}\left[(3.4-2.95)^2 + (2.5-2.95)^2 + (4.1-2.95)^2 + (1.2-2.95)^2 + (2.8-2.95)^2 + (3.7-2.95)^2\right] = 1.075$$
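As a check on Example 3.12, the sketch below computes the sample variance with the n-1 divisor and compares it with `statistics.variance` from the Python standard library, which uses the same divisor. The helper name `sample_variance` is illustrative only.

```python
from statistics import fmean, variance

sample = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = fmean(values)
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


s2 = sample_variance(sample)   # sample variance S^2
s = s2 ** 0.5                  # sample standard deviation S

print(round(s2, 3))            # 1.075
print(round(variance(sample), 3))  # same value from the standard library
```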
變異數與標準差之特性
Characteristics of Variance and Standard Deviation

◎ 變異數與標準差具有如下重要的性質:

◎ The variance and standard deviation have the following important properties:

• 若一資料分配的標準差很小,則表示大部分的數值 集中於平均數附近,此時平均數的代表性高;反
之,平均數的代表性低。

• If the standard deviation of a data distribution is small, it means that most of the values are
concentrated around the mean, and the representativeness of the mean is high; otherwise, the
representativeness of the mean is low.

•標準差恆大於等於零,除非所有的數值皆相等,標準差才會為零。

• The standard deviation is always greater than or equal to zero; it equals zero only when all values are equal.

已知兩組(母體)資料,要合併為一組資料,其相關數據分別為

Suppose two groups of (population) data are to be combined into a single group, where the relevant quantities are

$N = N_1 + N_2$
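The combining formulas themselves did not survive in the text above, so the following is only a sketch based on the standard identities for merging two populations: the combined mean is the size-weighted mean, and the combined variance adds each group's variance to the squared distance of its mean from the combined mean. The function name `combine_groups` and the two small data sets are hypothetical and used purely for illustration.

```python
from statistics import fmean, pvariance


def combine_groups(n1, mu1, var1, n2, mu2, var2):
    """Combine the size, mean and population variance of two groups."""
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n
    var = (n1 * (var1 + (mu1 - mu) ** 2) + n2 * (var2 + (mu2 - mu) ** 2)) / n
    return n, mu, var


# Hypothetical scores, not taken from the text.
g1 = [60.0, 70.0, 80.0]
g2 = [50.0, 55.0, 65.0, 90.0]

n, mu, var = combine_groups(len(g1), fmean(g1), pvariance(g1),
                            len(g2), fmean(g2), pvariance(g2))

# The result matches a direct computation on the merged data.
print(n, round(mu, 3), round(var, 3))
print(round(fmean(g1 + g2), 3), round(pvariance(g1 + g2), 3))
```

Working from summary statistics is what makes this useful: the raw observations are not needed once each group's size, mean and variance are known.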
例題 3.13

Example 3.13

設有 A、B 二班,其統計學平均成績、標準差與人數如下所 示:

Classes A and B have the following mean scores, standard deviations and numbers of students in Statistics:

試計算兩班全體同學之統計學平均成績與標準差。
Calculate the mean and standard deviation of the Statistics scores for all students in the two classes combined.

Solution
總平均成績
Overall mean score

全體成績之變異數
Variance of the overall scores
故全體成績之標準差為 $\sqrt{179.94} = 13.414$(分)

Therefore, the standard deviation of the overall scores is $\sqrt{179.94} = 13.414$ (points)

例題 3.14

Example 3.14

衛保組提供馬老師其導師班學生體重的資料,25 位女生平均體重為 56 公斤,標準差為 2 公斤。 但馬老師發現該班僅有 24 位女生,經比對資料後發現其中一位體重 64 公斤是男生的體重。 請替馬老師計算該班女生真實的平均體重與標準差。
The health office gave Mr. Ma the weight data for the students in his homeroom class: the 25 girls had a mean weight of 56 kg with a standard deviation of 2 kg. However, Mr. Ma found that the class has only 24 girls, and after checking the records he discovered that one entry, 64 kg, was actually a boy's weight. Calculate the true mean weight and standard deviation of the girls in this class for Mr. Ma.

Solution
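已知 $N = 25$,$\mu = 56$,$\sigma = 2$,
It is known that $N = 25$, $\mu = 56$ and $\sigma = 2$.
利用(3-10)式,可得

Using formula (3-10), we get

$$\sigma^2 = \frac{1}{N}\sum x_i^2 - \mu^2 \;\Rightarrow\; 2^2 = \frac{1}{25}\sum x_i^2 - 56^2$$

所以,

So,

$$\sum x_i = 25 \times 56 = 1400, \qquad \sum x_i^2 = 25 \times (2^2 + 56^2) = 78{,}500$$
剔除一數(64)之後,平均數與平方和

After removing the one value (64), the mean and the sum of squares

變為:

become:

$$\mu_{24} = \frac{1400 - 64}{24} = 55.667, \qquad \sum x_i^2 = 78{,}500 - 64^2 = 74{,}404$$

由(3-10)式知

From formula (3-10) we know

$$\sigma_{24}^2 = \frac{74{,}404}{24} - \left(\frac{1336}{24}\right)^2 = 1.389$$

於是,$\sigma_{24} = \sqrt{1.389} = 1.179$

So, $\sigma_{24} = \sqrt{1.389} = 1.179$

馬老師班級女生的平均體重應為 55.667 公斤,標準差為 1.179 公斤

The mean weight of the girls in Mr. Ma's class should be 55.667 kg, with a standard deviation of 1.179 kg.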
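The correction in Example 3.14 can be sketched in Python. It assumes the shortcut form of the population variance, variance = (sum of squares)/N minus the squared mean, to recover the sums from the reported summary statistics; the function name `remove_value` is hypothetical.

```python
def remove_value(n, mean, sd, value):
    """Recompute the mean and population standard deviation after dropping one value.

    Uses only the summary statistics (n, mean, sd) of the original data.
    """
    total = n * mean                     # sum of the original values
    sum_sq = n * (sd ** 2 + mean ** 2)   # sum of the original squared values
    n_new = n - 1
    mean_new = (total - value) / n_new
    var_new = (sum_sq - value ** 2) / n_new - mean_new ** 2
    return n_new, mean_new, var_new ** 0.5


n_new, mean_new, sd_new = remove_value(25, 56.0, 2.0, 64.0)
print(n_new, round(mean_new, 3), round(sd_new, 3))
```

At full precision this gives a mean of about 55.667 kg and a standard deviation of about 1.179 kg. The subtraction is sensitive to rounding: squaring a mean already rounded to 55.667 yields a variance of about 1.352 and a standard deviation of about 1.163, so the unrounded mean should be carried through the calculation.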

變異數與標準差之優缺點:

Advantages and disadvantages of variance and standard deviation:


1. 變異數或標準差計算與所有數值有關,故感應靈敏;亦即,資料中若有任一數值變動,則其變異數
或標準差亦必隨之變動。
1. The variance and standard deviation are computed from every value, so they are sensitive: if any single value in the data changes, the variance and standard deviation change with it.

2. 變異數或標準差皆與平均數具有密切關係,且均能滿足代數運算,故應用範圍很廣泛。

2. Both the variance and the standard deviation are closely tied to the mean, and both lend themselves to algebraic manipulation, so they are widely applicable.

3. 至於變異數或標準差之缺點則為易受極端值之影響,此點與平均數相同。

3. The drawback of the variance and standard deviation is that, like the mean, they are easily affected by extreme values.

3.3 變異係數

3.3 Coefficient of variation

• 全距、四分位差(距) 平均偏差及標準差均帶有與原資料相同的單位,稱為絕對差異量數(measures
of absolute dispersion)。 凡性質相同、單位相同,平均數相差不大的 二組統計資料,皆可用絕對差
異量數來比較。
• The range, the interquartile range (quartile deviation), the mean deviation and the standard deviation all carry the same unit as the original data and are called measures of absolute dispersion. Two data sets of the same nature, in the same unit and with similar means can be compared using measures of absolute dispersion.

•但如果兩種或以上的性質不同,或單位不同,或平均數相 差很大的二組資料,則需使用相對差異量
數(measures of relative dispersion)來比較。

• However, when two or more data sets differ in nature, are measured in different units, or have very different means, measures of relative dispersion should be used for the comparison.

● 相對差異量數,乃為絕對差異量數與某一集中量數或其他 適當數值之比,且通常以百分比表示。
變異係數 (coefficient of variation),通常以 CV 表示。

● A measure of relative dispersion is the ratio of a measure of absolute dispersion to a measure of central tendency or another appropriate value, and is usually expressed as a percentage. The coefficient of variation is usually denoted CV.

• 在下面兩種場合欲比較兩組或以上的資料之差異情況,可採用變異係數:
• The coefficient of variation can be used to compare the differences between two or more data sets in
the following two situations:

1. 單位不同的資料。

1. Data measured in different units.

2. 單位相同,但平均數相差很大的資料。

2. Data in the same unit but with very different means.

例題 3.15

Example 3.15

2012 年甲國每人每年 GDP 的平均數是 7,800 美元,中位數是 8,900 美元,變異數是 9,000,000;乙國的


平均數是 16,000 美 元,中位數是 15,800 美元,變異數是 16,000,000。 請問哪一國 在 2012 年的貧富
差距較大?

In 2012, the mean annual per-capita GDP of country A was 7,800 US dollars, the median was 8,900 US dollars, and the variance was 9,000,000; for country B the mean was 16,000 US dollars, the median was 15,800 US dollars, and the variance was 16,000,000. Which country had the larger gap between rich and poor in 2012?

Solution

本例題中,二個國家 GDP 的計算單位雖然都是美元,但因二 者的平均數相差甚大,不宜直接比較變


異數或標準差,應求 出其變異係數後再做比較:

In this example both countries' GDP figures are in US dollars, but because their means differ greatly, the variances or standard deviations should not be compared directly. The coefficient of variation should be computed for each country and then compared:
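The comparison in Example 3.15 can be sketched as follows, assuming the usual definition of the coefficient of variation, CV = (standard deviation / mean) × 100%. The formula is not shown in the surviving text, so treat it as an assumption; the function name is illustrative.

```python
import math


def coefficient_of_variation(mean, variance):
    """Standard deviation as a percentage of the mean."""
    return math.sqrt(variance) / mean * 100.0


cv_a = coefficient_of_variation(7_800.0, 9_000_000.0)    # country A
cv_b = coefficient_of_variation(16_000.0, 16_000_000.0)  # country B

print(f"CV of A: {cv_a:.2f}%")  # CV of A: 38.46%
print(f"CV of B: {cv_b:.2f}%")  # CV of B: 25.00%
```

Country B has the larger standard deviation (4,000 versus 3,000 US dollars), but relative to its mean the spread is smaller. Under this measure, country A shows the larger relative disparity in 2012.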

Z 分數(Z-scores)

•Z 分數是透過平均數與標準差的組合,可決定一組資料之 各觀察值的相對位置。

• A Z-score combines the mean and the standard deviation to locate the relative position of each observation within a data set.
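• Z 分數的定義如下:

• The Z-score is defined as follows:

$$Z_i = \frac{x_i - \bar{x}}{S} \quad \text{or} \quad Z_i = \frac{x_i - \mu}{\sigma}$$

$Z_i$ = 第 $i$ 項觀測值的 Z 分數

$Z_i$ = the Z-score of the $i$-th observation

$\bar{x}$ 或 $\mu$ = 樣本(或母體)平均數

$\bar{x}$ or $\mu$ = the sample (or population) mean

$S$ 或 $\sigma$ = 樣本(或母體)標準差

$S$ or $\sigma$ = the sample (or population) standard deviation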

◎ Z 分數通常稱為標準化值(standardized value),它可表示某一觀察值 $x$ 與 $\bar{x}$ 的距離有幾個標準差。

◎ The Z-score is often called the standardized value; it states how many standard deviations an observation $x$ lies from the mean $\bar{x}$.

• Ex. $Z = 1.2$:大於平均數 1.2 個標準差;$Z = -0.5$:小於平均數 0.5 個標準差

• Ex. $Z = 1.2$: 1.2 standard deviations above the mean; $Z = -0.5$: 0.5 standard deviations below the mean

例題 3.16

Example 3.16

2016 年 TOEIC 測驗台灣地區之平均成績為 537 分,假定標準 差為 150,張先生考了 720 分,請問其 Z


分數為多少?

The mean TOEIC score in Taiwan in 2016 was 537. Assuming a standard deviation of 150, what is the Z-score of Mr. Zhang, who scored 720?

Solution

張先生的 Z 分數為:

Mr. Zhang’s Z-score is:

$$Z = \frac{720 - 537}{150} = 1.22$$
以平均值為中心(0), 標準差為單位(格), 720 相對於 537 的位置。

Taking the mean as the origin (0) and one standard deviation as the unit (one grid division), this is the position of 720 relative to 537.
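Example 3.16 can be reproduced with a minimal sketch. The function names `z_score` and `from_z_score` are illustrative; the second simply inverts the standardization to recover a raw score.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def from_z_score(z, mean, sd):
    """Inverse of z_score: the raw value that corresponds to a given Z-score."""
    return mean + z * sd


z = z_score(720, mean=537, sd=150)
print(round(z, 2))  # 1.22

# A score 0.5 standard deviations below the mean:
print(from_z_score(-0.5, mean=537, sd=150))  # 462.0
```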

Chebyshev 定理與經驗法則

Chebyshev's theorem and the empirical rule

● 當一組資料之標準差較小時,表示其各數值間的差異較 小,且大多數的數值集中在平均值附近。

● When the standard deviation of a group of data is small, it means that the difference between its
values is small, and most of the values are concentrated around the mean value.

想知道到底有多少個數值落於平均值附近某一區間內?

How many values actually fall within a given interval around the mean?

• Chebyshev 定理

• Chebyshev’s Theorem

在任何資料分配中,觀測值落於平均數左右 $k$ 個標準差的區間內之比例,至少為 $1 - 1/k^2$,$k$ 必須大於 1。
In any data distribution, the proportion of observations that fall within $k$ standard deviations of the mean is at least $1 - 1/k^2$, where $k$ must be greater than 1.

各種不同值之 Chebyshev 定理的應用

Application of Chebyshev’s Theorem for Various Values

k    區間    落於該區間內觀測值的比例

k    Interval    Proportion of observations that fall within the interval

此定理適用於任何的資料分配,且不管資料為母體或樣本皆可適用。

This theorem applies to any data distribution, whether the data are a population or a sample.

不過,它只說明至少有多少比例落在此區間內,至於實際上 最多卻有多少比例,則不得而知。

However, it only gives the minimum proportion that falls within the interval; it says nothing about how large the actual proportion may be.

例題 3.17

Example 3.17

茲從一批柳丁產品隨機抽出 30 顆測量其重量(公克),記錄如下:

Thirty oranges are randomly selected from a batch, and their weights (in grams) are recorded as follows:
試利用 Chebyshev 定理求出,有多少比例的柳丁重量落在 (68,128)的區間內。

Use Chebyshev's theorem to find what proportion of the orange weights fall within the interval (68, 128).

Solution

首先計算平均數與標準差,分別為 $\bar{x} = 98$,$S \approx 15$。 利用 Chebyshev 定理之前,需先知道 $k$ 值,而 $k$ 值則可由該區間求出,參見下圖知:

First, calculate the mean and standard deviation, which are $\bar{x} = 98$ and $S \approx 15$. Before applying Chebyshev's theorem we need the value of $k$, which can be obtained from the interval (see the figure below):

算出 $k = 2$。 於是,由 Chebyshev 定理可知,至少有 $1 - 1/2^2 = 3/4$(75%)的比例落於此區間內;亦即,至少有 $3/4 \times 30 = 22.5$(個)觀測值落在 (68, 128) 區間內。 事實上,根據實際的資料顯示,此 30 個觀測值有 28 個落在 (68, 128) 區間內,由此可驗證 Chebyshev 定理。

Solving $98 + 15k = 128$ (equivalently $98 - 15k = 68$) gives $k = 2$. By Chebyshev's theorem, at least $1 - 1/2^2 = 3/4$ (75%) of the observations fall within this interval; that is, at least $3/4 \times 30 = 22.5$ observations fall within (68, 128). In fact, 28 of the 30 observations in the actual data fall within (68, 128), which is consistent with Chebyshev's theorem.
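The calculation in Example 3.17 can be sketched in Python using only the summary values quoted above (mean 98, standard deviation about 15, 30 observations). The function names are illustrative, and the values of k in the final loop are arbitrary choices rather than the rows of the original table.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def k_from_interval(lower, upper, mean, sd):
    """Number of standard deviations from the mean to each end of a symmetric interval."""
    k_low = (mean - lower) / sd
    k_high = (upper - mean) / sd
    if abs(k_low - k_high) > 1e-9:
        raise ValueError("interval is not symmetric about the mean")
    return k_high


k = k_from_interval(68, 128, mean=98, sd=15)
bound = chebyshev_lower_bound(k)
min_count = bound * 30

print(k, bound, min_count)  # 2.0 0.75 22.5

# Lower bounds for a few other values of k.
for kk in (1.5, 2, 3, 4):
    print(f"k = {kk}: at least {chebyshev_lower_bound(kk):.1%}")
```

The observed count of 28 out of 30 is well above the guaranteed minimum, as expected: the theorem gives a floor, not an estimate.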

• 經驗法則(empirical rule)

當資料分配呈鐘形形狀(bell-shaped)時,亦即為對稱分配,則:

When the data distribution is bell-shaped, and hence symmetric:

○ 約有 68% 的觀測值落於 $(\bar{x} - S, \bar{x} + S)$ 的區間內。

○ About 68% of the observations fall within the interval $(\bar{x} - S, \bar{x} + S)$.


○ 約有 95% 的觀測值落於 $(\bar{x} - 2S, \bar{x} + 2S)$ 的區間內。

○ About 95% of the observations fall within the interval $(\bar{x} - 2S, \bar{x} + 2S)$.

○ 約有 99.7% 的觀測值落於 $(\bar{x} - 3S, \bar{x} + 3S)$ 的區間內。

○ About 99.7% of the observations fall within the interval $(\bar{x} - 3S, \bar{x} + 3S)$.

◎ Chebyshev 定理的限制是,只能說出「多少比例以上」的訊息,但可適用於任何形式的資料分配。
◎ The limitation of Chebyshev's theorem is that it only states "at least this proportion", but it applies to data of any distribution.

◎ 經驗法則可較具體告訴我們「約有多少比例」的訊息,但它的限制是,只能適用於(近似)對稱分配的資料。
◎ The empirical rule gives the more specific statement "approximately this proportion", but it applies only to data with an (approximately) symmetric distribution.
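To make the contrast concrete, the sketch below compares the Chebyshev lower bound with the proportion that a normal distribution places within k standard deviations of the mean. Using the normal distribution as the bell-shaped model is an assumption made here for illustration; the text itself only says "bell-shaped". `NormalDist` is from the Python standard library.

```python
from statistics import NormalDist


def chebyshev_lower_bound(k):
    """Guaranteed minimum proportion within k standard deviations (any distribution)."""
    return 1 - 1 / k ** 2


def normal_proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of its mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    normal = normal_proportion_within(k)
    if k > 1:
        print(f"k = {k}: normal {normal:.1%}, Chebyshev at least {chebyshev_lower_bound(k):.1%}")
    else:
        print(f"k = {k}: normal {normal:.1%}, Chebyshev gives no bound")
```

For k = 2 the normal model gives roughly 95% while Chebyshev guarantees only 75%. That gap is the price of making no assumption about the shape of the distribution.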
3.4 偏態量數、峰態量數與動差

3.4 Measures of skewness, measures of kurtosis, and moments

● 動差

● Moments

動差(moment)用來表達資料數據全體特性的數學模式,是一種利用冪次方做其轉換基礎的平均數,通常
A moment is a mathematical device for expressing the overall characteristics of a data set. It is a kind of average based on powers of the observations. Typically:

1) 觀察值都會先減去一特定數,

1) a specific number is first subtracted from each observation,

2) 再取冪次方後,

2) the result is raised to a power, and

3) 計算全體算術平均數,一般稱為以特定數為中心之動差。

3) the arithmetic mean of all these powers is computed. The result is called the moment about that specific number.

動差,又稱矩,其概念來自於物理學。 在物理學中,動差是用來表示 物體形狀的物理量。

The concept of a moment comes from physics, where moments are physical quantities used to describe the shape of an object.
動差是用於物體形狀識別的重要參數指標。
Moments are important parameters for object shape recognition.

1. 原動差(特定數為 0)

1. Raw moments (the specific number is 0)

$$m'_r = \frac{1}{N}\sum_{i=1}^{N} x_i^r$$

稱為第 $r$ 級原動差($r$th moment about zero)。

is called the $r$-th raw moment ($r$th moment about zero).


2. 主動差(特定數為 $\mu$)

2. Central moments (the specific number is $\mu$)

$$m_r = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^r$$

稱為第 $r$ 級主動差($r$th principal moment)

is called the $r$-th central moment ($r$th principal moment).
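The three-step recipe for a moment (subtract a specific number, raise to a power, take the arithmetic mean) translates directly into code. In this sketch the function name `moment_about` and its `center` parameter are illustrative; the data are group A from Example 3.11.

```python
from statistics import fmean


def moment_about(values, r, center=0.0):
    """r-th moment about `center`: the mean of (x - center) ** r over all values."""
    return fmean([(x - center) ** r for x in values])


data = [8, 9, 10, 11, 12]
mu = moment_about(data, 1)             # first raw moment (about zero) = mean

first_central = moment_about(data, 1, center=mu)   # always 0
second_central = moment_about(data, 2, center=mu)  # population variance
third_central = moment_about(data, 3, center=mu)   # 0 for symmetric data

print(mu, first_central, second_central, third_central)  # 10.0 0.0 2.0 0.0
```

The first raw moment is the mean and the second central moment is the population variance, so the measures of central tendency and dispersion covered earlier are special cases of moments.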


偏態量數
Measures of skewness

偏態量數(coefficient of skewness, SK),是用來衡量資料分布的形狀。 其形狀若以中心位置呈對稱形狀時稱為對稱分配;或是偏向中心位置左邊,稱為左偏分配;或是偏向中心位置的右邊,稱為右偏分配。
The coefficient of skewness (SK) measures the shape of a data distribution. A distribution that is symmetric about its center is called a symmetric distribution; one that is skewed to the left of the center is called left-skewed; one that is skewed to the right of the center is called right-skewed.

(a)對稱分配

(a) Symmetrical distribution

(b)左偏分配 (相對大的資料比較多)

(b) Left-skewed distribution (most of the data are relatively large values)

(c)右偏分配 (相對小的資料比較多)

(c) Right-skewed distribution (most of the data are relatively small values)


圖 3.1 (a) 曲線為對稱分配,此時中心位置就是平均數、中位數與眾數的所在,三者為同一點,呈現三點合一的情形;(b) 曲線為左偏分配,此時平均數最小,中位數則介於平均數與眾數之間;(c) 曲線為右偏分配,此時平均數為最大,且呈現與左偏分配相反之位置分布。

Figure 3.1: Curve (a) is a symmetric distribution; the mean, median and mode all lie at the center and coincide. Curve (b) is a left-skewed distribution; the mean is the smallest and the median lies between the mean and the mode. Curve (c) is a right-skewed distribution; the mean is the largest and the three measures appear in the opposite order to the left-skewed case.

皮爾森偏態量數(Pearsonian coefficient of skewness) 在偏態分配中,$\bar{x}$ 至 $M_o$ 的距離是 $\bar{x}$ 至 $M_e$ 距離的 3 倍。

Pearsonian coefficient of skewness: in a skewed distribution, the distance from the mean $\bar{x}$ to the mode $M_o$ is 3 times the distance from $\bar{x}$ to the median $M_e$.

相對偏態:

Relative skewness:

(1)當 SK=0 時,

(1) When SK = 0,

由於 $\bar{x} = M_o$,表示資料的分配會近似對稱分配。

since $\bar{x} = M_o$, the data are approximately symmetrically distributed.

(2)當 SK<0 時,

(2) When SK < 0,

由於 $\bar{x} < M_o$,表示資料的分配會近似左偏分配。

since $\bar{x} < M_o$, the data are approximately left-skewed.

(負偏態分布)

(negatively skewed distribution)

(3)當 SK>0 時,

(3) When SK > 0,

由於 $\bar{x} > M_o$,表示資料的分配會近似右偏分配。

since $\bar{x} > M_o$, the data are approximately right-skewed.

(正偏態分布)

(positively skewed distribution)

2. 動差偏態量數(moment coefficient of skewness)

2. Moment coefficient of skewness

$$\alpha_3 = \frac{1}{\sigma^3} \cdot \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^3$$

依據動差偏態量數可知:

According to the moment coefficient of skewness:

(1)當 $\alpha_3 = 0$ 時,表示資料的分配會近似對稱分配。

(1) When $\alpha_3 = 0$, the data are approximately symmetrically distributed.

(2)當 $\alpha_3 < 0$ 時,表示資料的分配會近似左偏分配。

(2) When $\alpha_3 < 0$, the data are approximately left-skewed.

(3)當 $\alpha_3 > 0$ 時,表示資料的分配會近似右偏分配。

(3) When $\alpha_3 > 0$, the data are approximately right-skewed.
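Both skewness measures can be sketched in a few lines. The moment coefficient is taken here as the third central moment divided by the cube of the population standard deviation, and the Pearson coefficient in its median form, 3 × (mean − median) / standard deviation. Both are standard definitions assumed for this sketch, since the formulas are not visible in the text, and the data sets are hypothetical.

```python
from statistics import fmean, median, pstdev


def moment_skewness(values):
    """Third central moment divided by the cube of the population standard deviation."""
    mu = fmean(values)
    m3 = fmean([(x - mu) ** 3 for x in values])
    return m3 / pstdev(values) ** 3


def pearson_skewness(values):
    """Pearson's median-based coefficient: 3 * (mean - median) / standard deviation."""
    return 3 * (fmean(values) - median(values)) / pstdev(values)


# Hypothetical data sets chosen to show each case.
symmetric = [8, 9, 10, 11, 12]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]            # long tail to the right
left_skewed = [11 - x for x in right_skewed]         # mirror image

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(name, round(moment_skewness(data), 3), round(pearson_skewness(data), 3))
```

The two coefficients agree in sign for these data: zero for the symmetric set, positive for the right-skewed set, negative for its mirror image.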

峰態量數(kurtosis)

峰態量數(coefficient of kurtosis, KC),是用來衡量資料分配的峰態高低。 在統計學上觀察各種分配的峰態時,通常會先依據母體資料定義出四級動差為母體峰態,而母體峰態量數為:
The coefficient of kurtosis (KC) measures how peaked a data distribution is. When examining the kurtosis of various distributions, the population kurtosis is usually defined from the fourth moment of the population data, and the population coefficient of kurtosis is:

$$KC = \frac{1}{\sigma^4} \cdot \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^4$$

峰態圖形定義
Definition of kurtosis shapes

依據量數值可知:

According to the value of KC:

1. 當 KC = 3 時,表示資料分布呈常態峰,形成一般正常的型態。

1. When KC = 3, the distribution has a normal peak (mesokurtic), the ordinary shape.

2. 當 KC > 3 時,表示資料分布呈高狹峰,集中於平均數或眾數附近。

2. When KC > 3, the distribution has a high, narrow peak (leptokurtic), with values concentrated near the mean or mode.

3. 當 KC < 3 時,表示資料分布呈低闊峰,平均地分散於兩端。

3. When KC < 3, the distribution has a low, broad peak (platykurtic), with values spread more evenly toward both ends.
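The three cases can be illustrated with a short sketch that computes the coefficient of kurtosis as the fourth central moment divided by the square of the second central moment (equivalently, by the fourth power of the population standard deviation). This is the standard definition for which a normal distribution gives 3, assumed here because the formula is not visible in the text. The function name and both data sets are hypothetical.

```python
from statistics import fmean


def kurtosis_coefficient(values):
    """Fourth central moment divided by the squared second central moment."""
    mu = fmean(values)
    m2 = fmean([(x - mu) ** 2 for x in values])
    m4 = fmean([(x - mu) ** 4 for x in values])
    return m4 / m2 ** 2


flat = [1, 2, 3, 4, 5]               # evenly spread values
peaked = [0] * 8 + [-3, 3]           # most values at the center, two far out

kc_flat = kurtosis_coefficient(flat)
kc_peaked = kurtosis_coefficient(peaked)

print(round(kc_flat, 3))    # 1.7  (< 3: low, broad peak)
print(round(kc_peaked, 3))  # 5.0  (> 3: high, narrow peak)
```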
