
4.1 INTRODUCTION TO DATA MANAGEMENT

ESSENTIAL TERMS IN UNDERSTANDING THE NATURE OF DATA:
1. Raw data - data collected in its original form.
2. Range - the difference between the highest value and the lowest value in a distribution.
3. Frequency distribution - the organization of data in tabular form, using mutually exclusive classes and showing the number of observations in each.
4. Class limits/Apparent limits - the highest and lowest values describing a class.
5. Class boundaries/Real limits - the upper and lower values of a class in a grouped frequency distribution; they have one more decimal place than the class limits and end with the digit 5.
6. Interval/Width - the distance between the lower class boundary and the upper class boundary, denoted by the symbol I.
7. Frequency - the number of values in a specific class of a frequency distribution.
8. Percentage - obtained by multiplying the relative frequency by 100%.
9. Cumulative frequency - the sum of the frequencies accumulated up to the upper boundary of a class in a frequency distribution.
10. Midpoint - the point halfway between the class limits of each class; it is representative of the data within that class.

GRAPHING STATISTICAL DATA
1. Histogram - a graph in which the classes are marked on the horizontal axis (x-axis) and the class frequencies on the vertical axis (y-axis).
2. Frequency polygon - a graph that displays the data using points connected by lines. The frequencies are represented by the heights of the points at the midpoints of the classes.
3. Cumulative frequency polygon or ogive (read as oh'-jive) - a graph that displays the cumulative frequencies for the classes in a frequency distribution.
4. Pareto chart - a graph used to represent a frequency distribution for categorical (nominal-level) data; the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.
5. Bar graph or bar chart - similar to a histogram. The bases of the rectangles are arbitrary intervals whose centers are the codes.
6. Pie graph or pie chart - a circle divided into portions that represent the relative frequencies (percentages) of the data belonging to different categories.
7. Time series graph - represents data that occur over a specific period of time under observation.
8. Pictograph - immediately suggests the nature of the data being shown.
9. Scatter plot - used to examine possible relationships between two numerical variables. The two variables are plotted on the x-axis and y-axis.

GUIDELINES IN MAKING GRAPHS/CHARTS
• The graph/chart should include a title.
• The scales for all axes should be included.
• The scale on the y-axis should start at zero.
• The x-axis and y-axis should be properly labeled.
• The graph/chart should not contain unnecessary decorations.
• The simplest possible graph/chart should be used for any data set.

DEFINITION OF STATISTICS
• Statistics can be defined in two ways.

Constructing a histogram
step 1: Find the midpoint of each class.
step 2: Draw and label the x-axis and y-axis.
step 3: Represent the frequency on the y-axis and the midpoints on the x-axis.
step 4: Use the frequency to represent the height and draw the vertical bars.

Constructing a cumulative frequency polygon (ogive)
step 1: Find the cumulative distribution of the data set.
step 2: Draw and label the x-axis and y-axis.
step 3: Represent the frequency on the y-axis and the upper class boundaries on the x-axis.
step 4: Connect the adjacent points with line segments.

Pareto chart
step 1: Arrange the data from highest to lowest.
step 2: Draw and label the x-axis (products) and y-axis (sales).
step 3: Construct the chart by arranging the frequencies from highest to lowest, from left to right. Make the bars the same width and draw the heights corresponding to the frequencies.

Bar chart
step 1: Draw and label the x-axis.
step 2: Make the bars the same width and draw the heights corresponding to the frequencies.

Pie chart
step 1: Since there are 360° in a circle, the frequency of each class must be converted into a proportional part of the circle. This conversion is done by applying the formula
degrees = (f/n)(360°)
where f = frequency of each class and n = sum of frequencies.

Pictograph
step 1: Draw and label the x-axis and y-axis.
step 2: Label the x-axis for years and the y-axis for number of houses.
step 3: Draw a house to represent the number of houses.

FORMULAS
Percentage = (f/n)(100%), where f = frequency and n = sum of frequencies.

Rule 1. Suggested class interval:
$I = \dfrac{\text{range}}{\text{number of classes}} = \dfrac{HV - LV}{k}$
Where: HV = highest value in a data set
LV = lowest value in a data set
k = number of classes
I = suggested class interval

Rule 2. Suggested class interval:
$I = \dfrac{\text{range}}{1 + 3.322\,\log(\text{total frequencies})}$

Degrees = (f/n)(360°)
Where f = frequency of each class, and n = sum of frequencies.
Hence, the following conversions are obtained. The degrees should total 360°.
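The class-interval rules and the percentage and pie-chart conversions can be sketched in a few lines of Python. This is only an illustration: the function names and the sample frequencies are hypothetical, and the logarithm in Rule 2 is assumed to be base 10, which the notes do not state explicitly.

```python
import math

def class_interval_rule1(values, k):
    """Rule 1: I = (HV - LV) / k, with k classes chosen by the analyst."""
    return (max(values) - min(values)) / k

def class_interval_rule2(values):
    """Rule 2: I = range / (1 + 3.322 * log(n)); base-10 log is an assumption."""
    n = len(values)
    return (max(values) - min(values)) / (1 + 3.322 * math.log10(n))

def percentages(freqs):
    """Percentage = (f / n) * 100 for each class frequency f."""
    n = sum(freqs)
    return [f / n * 100 for f in freqs]

def pie_degrees(freqs):
    """Degrees = (f / n) * 360 for each class frequency f."""
    n = sum(freqs)
    return [f / n * 360 for f in freqs]

# Hypothetical class frequencies, used only to exercise the formulas.
sample_freqs = [25, 15, 10]
print(percentages(sample_freqs))
print(pie_degrees(sample_freqs))
```

Whatever frequencies are used, the degrees returned by `pie_degrees` should add up to 360, which is a quick check on the conversion.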
4.2 Measures of Central Tendency: Mean, Median, Mode
A measure of central tendency is a summary statistic that represents the center point or typical value of a dataset. These measures indicate where most values in a distribution fall and are also referred to as the central location of a distribution. You can think of it as the tendency of data to cluster around a middle value. In statistics, the three most common measures of central tendency are the mean, median, and mode. Each of these measures calculates the location of the central point using a different method.

• Locating the Center of Your Data
The three distributions below represent different data conditions. In each distribution, look for the region where the most common values fall. Even though the shapes and types of data are different, you can find that central location. That is the area in the distribution where the most common values are located.
[Figure: Histogram that shows a continuous, symmetric distribution. The area of central tendency is circled.]
[Figure: Histogram that shows a continuous, skewed distribution. The area of central tendency is circled.]
[Figure: Bar chart of ice cream preference to illustrate the central tendency for categorical data.]

The central tendency of a distribution represents one characteristic of a distribution. Another aspect is the variability around that central value. Measures of variability are a separate topic, but this property describes how far away the data points tend to fall from the center. The graph below shows how distributions with the same central tendency (mean = 100) can actually be quite different. The panel on the left displays a distribution that is tightly clustered around the mean, while the distribution on the right is more spread out. It is crucial to understand that the central tendency summarizes only one aspect of a distribution and that it provides an incomplete picture by itself.

Mean
The mean is the arithmetic average, and it is probably the measure of central tendency that you are most familiar with. Calculating the mean is very simple. You just add up all of the values and divide by the number of observations in your dataset.

Median
The median is the middle value. It is the value that splits the dataset in half. To find the median, order your data from smallest to largest, and then find the data point that has an equal number of values above it and below it. The method for locating the median varies slightly depending on whether your dataset has an even or odd number of values.

Mode
The mode is the value that occurs most frequently in your data set. On a bar chart, the mode is the highest bar. If the data have multiple values that are tied for occurring most frequently, you have a multimodal distribution. If no value repeats, the data do not have a mode.

4.3 Measures of Dispersion
Example 2:
Suppose there are two factories producing batteries. From each factory, 10 batteries are drawn to test for the lifetime (in hours). These lifetimes are:
Factory 1: 10.1, 9.9, 10.1, 9.9, 9.9, 10.1, 9.9, 10.1, 9.9, 10.1
Factory 2: 16, 5, 7, 14, 6, 15, 3, 13, 9, 12
The mean lifetimes of the two factories are both 10. However, by looking at the data, it is obvious that the batteries produced by factory 1 are much more reliable than the ones produced by factory 2. This implies that other measures for measuring the "dispersion" or "variation" of the data are required.

(I) Range:
range = (largest value of the data) - (smallest value of the data).
Example 2 (continued):
Range of lifetime data for factory 1 = 10.1 - 9.9 = 0.2
Range of lifetime data for factory 2 = 16 - 3 = 13
The range of battery lifetimes for factory 1 is much smaller than the one for factory 2.
Note: the range is seldom used as the only measure of dispersion. The range is highly influenced by an extremely large or an extremely small data value.

(II) Interquartile Range:
The interquartile range is the difference between the third and the first quartiles. That is,
IQR = Q3 - Q1
Example 2 (continued):
The first quartile and the third quartile for the data from factory 1 are 9.9 and 10.1, respectively, and 6 and 14 for the data from factory 2. Therefore,
IQR (factory 1) = 10.1 - 9.9 = 0.2
IQR (factory 2) = 14 - 6 = 8
The interquartile range of battery lifetimes for factory 1 is much smaller than the one for factory 2.

(III) Variance and Standard Deviation:
Population deviation about the mean: $x_i - \mu$
Sample deviation about the mean: $x_i - \bar{x}$
Intuitively, the population deviation and the sample deviation measure how far the data are from the "center" of the data. The population variance and the sample variance are then built from the sum of the squared deviations, divided by N for a population and by n - 1 for a sample. The standard deviation is the square root of the variance.

(IV) Coefficient of Variation:
The coefficient of variation is another useful statistic for measuring the dispersion of the data. The coefficient of variation is the ratio of the standard deviation to the mean:
$CV = \dfrac{s}{\bar{x}}$
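The dispersion measures for the two factories in Example 2 can be checked with a short Python sketch. The helper names are hypothetical, and the quartiles are taken as the medians of the lower and upper halves of the sorted data, which is an assumption chosen because it reproduces the quartiles quoted in the example (other quartile conventions give slightly different values).

```python
import statistics

factory1 = [10.1, 9.9, 10.1, 9.9, 9.9, 10.1, 9.9, 10.1, 9.9, 10.1]
factory2 = [16, 5, 7, 14, 6, 15, 3, 13, 9, 12]

def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data."""
    ordered = sorted(values)
    half = len(ordered) // 2  # the middle value is left out when n is odd
    return statistics.median(ordered[:half]), statistics.median(ordered[-half:])

def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, q3 = quartiles(values)
    return q3 - q1

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

for name, data in (("factory 1", factory1), ("factory 2", factory2)):
    print(name, data_range(data), iqr(data),
          statistics.variance(data), coefficient_of_variation(data))
```

Both factories share the same mean, so every measure printed here is larger for factory 2, which matches the informal observation that its batteries are less consistent.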
The coefficient of variation is invariant with respect to the scale of the data. On the other hand, the standard deviation is not scale-invariant.

4.4 MEASURES OF RELATIVE POSITION: Z-SCORES, PERCENTILES, QUARTILES AND BOX-AND-WHISKERS PLOTS

PERCENTILES
A percentile is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall.
The formula of percentiles is
percentile = (number of scores below the given score / total number of scores) x 100
The 25th percentile is also called the first quartile.
The 50th percentile is generally the median (if you are using the third definition; see below).
The 75th percentile is also called the third quartile.
The difference between the third and first quartiles is the interquartile range.

Example: the scores of students are 40, 45, 49, 53, 61, 65, 71, 79, 85, 91. What is the percentile for score 71?
Solution:
No. of scores below 71 = 6
Total no. of scores = 10
Percentile of 71
= 6/10 x 100
= 0.6 x 100
= 60

Percentiles divide the data into 100 equal parts. At the nth percentile, n% of the data lies at or below a given value.
Formula: [not recoverable from the source]
Where l = location of the data value
p = percentile as a whole number
n = sample size

Z-SCORES/STANDARD SCORE
A standard score or z-score is used when direct comparison of raw scores is impossible. Its value is obtained by subtracting the mean from the value and dividing the result by the standard deviation. The symbol for the standard score is z.
z = (value - mean) / standard deviation

Find the Standard Score:
Suppose that the mean on test 1 was 80.1 with a standard deviation of 6.3 points. If a student made a 92.5, what is the student's standard score?
z = (92.5 - 80.1) / 6.3 ≈ 1.97

Quartiles
Values that divide a list of numbers into quarters:
• Put the list of numbers in order
• Then cut the list into four equal parts
• The quartiles are at the "cuts"

Example: 5, 7, 4, 4, 6, 2, 8
• Put them in order: 2, 4, 4, 5, 6, 7, 8
• Cut the list into quarters
And the result is:
Q1 = 4
Q2, which is also the median, = 5
Q3 = 7

4.5 Probabilities and Normal Distribution
The normal distribution or Gaussian distribution is a continuous distribution that describes data that cluster around the mean. The graph of the associated probability density function is bell shaped, with a peak at the mean, and is known as the Gaussian function or bell curve.

Normal Curve
- was developed mathematically in 1733 by Abraham de Moivre (1667-1754) as an approximation to the binomial distribution.

A. Standard Normal Distribution
- any normal distribution can be converted into a standard normal distribution by obtaining the z value. A z value is the signed distance between a selected value, designated x, and the mean, divided by the standard deviation.

Characteristics of Normal Distribution
The characteristics of a normal distribution are that it is symmetric and asymptotic, and that the mean, median, and mode are all equal.
Symmetric
- characterized by or exhibiting symmetry; well proportioned, as a body or whole; regular in form or arrangement of corresponding parts.
Asymptotic
- of or relating to an asymptote
- approaching a given value as an expression containing a variable tends to infinity.
Mean, Median, Mode
- are equal; they are three kinds of average. The mean is the average, the median is the middle value, and the mode is the value that occurs most often.

4.6 HYPOTHESIS TESTING
Hypothesis testing is an act in statistics whereby an analyst tests an assumption regarding a population parameter. The methodology employed by the analyst depends on the nature of the data used and the reason for the analysis. Hypothesis testing is used to infer the result of a hypothesis performed on sample data from a larger population. Statistical analysts test a hypothesis by measuring and examining a random sample of the
population being analyzed. All analysts use a random population sample to test two different hypotheses: the null hypothesis and the alternative hypothesis.

Differences and similarity of null and alternative hypothesis:
• Null hypothesis
Is the hypothesis the analysts believe to be true.
• Alternative hypothesis
Is the hypothesis the analysts believe to be untrue and the opposite of the null hypothesis.
Thus they are mutually exclusive and only one can be true. However, one of the two hypotheses will always be true.

4 steps of Hypothesis Testing
All hypotheses are tested using a four-step process:
1. The first step is for the analyst to state the two hypotheses so that only one can be right.
2. The next step is to formulate an analysis plan, which outlines how the data will be evaluated.
3. The third step is to carry out the plan and physically analyze the sample data.
4. The fourth and final step is to analyze the results and either accept or reject the null hypothesis.

HYPOTHESIS STATEMENT
- If you are going to propose a hypothesis, it is customary to write a statement. Your statement will look like this:
"If I (do this to an independent variable)... then (this will happen to the dependent variable)".
For example:
1. If I (decrease the amount of water given to herbs) then (the herbs will increase in size).
2. If I (give patients counseling in addition to medication) then (their overall depression scale will decrease).
3. If I (give the exam at noon instead of 7) then (student test scores will improve).

4.7 LINEAR REGRESSION AND CORRELATION

Correlation
Is a statistical method used to determine whether a relationship between variables exists.
A variable here is a characteristic of the population being observed or measured.
• The sample then consists of random observations of the variable describing a given population.

REGRESSION ANALYSIS
Is a statistical method used to describe the nature of the relationship between variables, that is, either positive or negative, linear or nonlinear.
There are two types of relationships: simple and multiple.
In a SIMPLE RELATIONSHIP, there are two variables: an independent variable (or explanatory variable or predictor variable) and a dependent variable (or response variable).
A simple linear relationship can be positive or negative.
A positive relationship exists when both variables increase at the same time or both decrease at the same time.
On the contrary, in a negative relationship, as one variable increases, the other variable decreases, or vice versa.
• A scatter diagram is a useful tool for checking the assumptions in a regression analysis.

PEARSON PRODUCT-MOMENT CORRELATION
• Is the measure most widely used in statistics for the degree of the relationship between linearly related variables.
• Formula used to calculate the Pearson r
• The Pearson r correlation would require both variables to be normally distributed.
• Correlation refers to the departure of two random variables from independence.
• The correlation coefficient is defined as the covariance divided by the product of the standard deviations of the two variables.
• Pearson's product-moment correlation, or simply the correlation coefficient (or Pearson's r), is a measure of the strength of the linear association between two variables.
• It was developed by Karl Pearson.
• A test of significance for the coefficient of correlation may be used to find out whether the computed Pearson's r could have occurred in a population in which the two variables are related or not.
• The formula of the t-test
• Where t = t-test for the correlation coefficient
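The formulas for Pearson's r and its t-test did not survive in these notes, so the sketch below uses the standard textbook forms as an assumption: r is the covariance divided by the product of the standard deviations, and t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The function names and the small data set are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y over the product of their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t-test statistic for a correlation coefficient (assumed standard form)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical paired observations, e.g. hours studied and quiz score.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
t = t_statistic(r, len(hours))
print(round(r, 4), round(t, 4))
```

The computed t would then be compared with a critical value from a t table at the chosen significance level; that lookup is left out of the sketch.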
SIMPLE LINEAR REGRESSION ANALYSIS
• Regression analysis is a simple statistical tool used to model the dependence of a variable on one (or more) explanatory variables.
• Simple linear regression is the least squares estimator of a linear regression model with a single predictor (or one independent variable).
• The least squares model determines a regression equation by minimizing the sum of squares of the vertical distances between the actual y values and the predicted values of y.
• The difference between an observed and a predicted value is called a residual.
• The mean of the residuals is always zero.
• The points that fall outside the overall pattern of the other points are known as outliers.
• The scores whose removal greatly changes the regression line are called influential scores.

4.8 CHI SQUARE / USES
A chi-squared test, also written as χ² test, is any statistical hypothesis test where the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Without other qualification, "chi-squared test" is often used as short for Pearson's chi-squared test. The chi-squared test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.

Chi-squared test for variance in a normal population
If a sample of size n is taken from a population having a normal distribution, then there is a result (see distribution of the sample variance) which allows a test to be made of whether the variance of the population has a pre-determined value. For example, a manufacturing process might have been in stable condition for a long period, allowing a value for the variance to be determined essentially without error.

Chi-square test explained with Example:
We will cover the following important steps in our journey through the chi-square test for independence of two variables.
1. State the hypothesis
2. Formulate the data analysis plan
3. Analyze the sample data
4. Interpret the outcome

Step 1: State the Hypothesis:
Here we need to start by establishing a null hypothesis and a counter hypothesis (alternative hypothesis) as given below.
Null hypothesis:
Ho: Gender and voting preferences are independent.

Step 2: Let's Build Our Data Analysis Plan
Here we will try to find the p-value and compare it with the significance level.
Let's take the standard and accepted level of significance to be 0.05.
Given the sample data in the table above, let's try to employ the chi-square test for independence and deduce the probability value.

Step 3: Let's Do the Sample Analysis Plan:
Here we will analyze the given sample data to compute
• Degrees of freedom
• Expected frequency count of the sample variable
• The chi-square test statistic value
All the above values will help us find the p-value.

Degrees of Freedom Calculation: let's calculate df = (r - 1) * (c - 1). In the given table, we have r (rows) = 2 and c (columns) = 3, so
df = (2 - 1) * (3 - 1) = 1 * 2 = 2

CALCULATION:
Let Eij represent the expected value of cell (i, j) if the two variables are independent of one another.
Eij = (i-th row total x j-th column total) / grand total.

5.1 SIMPLE AND COMPOUND INTEREST
Interest is the cost of borrowing money, where the borrower [...] and the interest that accumulates on it in every period.
Since simple interest is calculated only on the principal amount of a loan or deposit, it is easier to determine than compound interest.

Simple Interest
Simple interest is calculated using the following formula:
Simple interest = P × r × n
Where:
P = Principal amount
r = Annual interest rate
n = Term of loan in years

Real-Life Simple Interest Loans
Two good examples of simple interest loans are auto loans and the interest owed on lines of credit such as credit cards. A person could take out a simple interest car loan, for example. If the car cost a total of $100, to finance it the buyer would need to take out a loan with a $100 principal, and the stipulation could be that the loan has an annual interest rate of 5% and must be paid back in one year.

Compound Interest
Compound interest accrues and is added to the accumulated interest of previous periods; it includes interest on interest, in other words. The formula for
compound interest is:
$\text{Compound Interest} = P(1 + r)^{t} - P$
where:
P = Principal amount
r = Annual interest rate
t = Number of years interest is applied
It is calculated by multiplying the principal amount by one plus the annual interest rate raised to the number of compound periods, and then subtracting the principal.

Examples of Simple and Compound Interest
Below are some examples of simple and compound interest.
Example 1: Suppose you plunk $5,000 into a one-year certificate of deposit (CD) that pays simple interest at 3% per annum. The interest you earn after one year would be $150:
$5,000 × 3% × 1

5.2 Credit Cards and Consumer Loans

A. Credit Card
The credit card is one of the means most widely used by individuals to pay for purchases without bringing cash on hand. The government requires credit card companies to fully disclose to the customer, in writing, the cost of the credit. The finance charge is the peso amount that is paid for the credit, while the annual percentage rate (APR) is the effective interest rate being charged.
This section provides some insights on how to compute the finance charges whenever we use our credit card, and its monthly balance computation.
The computation proceeds as follows:
First, compute the average daily balance of the credit card, which is equal to the quotient of the sum of the amounts owed each day of the given month and the number of days in the billing period.
Then, the finance charge is computed as the average daily balance times the interest rate per month.
Lastly, the current balance is computed. The computation for cash advances is also included in this section.

$\text{Average Daily Balance (ADB)} = \dfrac{\text{Sum of the amounts owed each day}}{\text{Number of days in the billing period}}$

Finance Charge = ADB x Periodic Rate

Cash Advance Fee = Amount of Cash Advance x Rate

Current balance = (Previous balance) - (Payments) + (New charges) + (Finance charge)

B. Annual Percentage Rate
The Annual Percentage Rate (APR) is the effective annual interest rate which is the basis of credit payments. The principle behind the APR is that only the unpaid balance will earn interest. The Annual Percentage Rate (APR) of a simple interest rate loan can be estimated by the formula
$\text{APR} \approx \dfrac{2mr}{m + 1}$
Where m is the number of payments and r is the simple interest rate.

Example 1: Redentor borrows P66,000 from a bank that advertises a 7% simple interest rate and repays the loan in six equal monthly payments. Find the finance charge and estimate the annual percentage rate.
Solution:
Given: m = 6, r = 7% = 0.07
Step 1: Determine the finance charge.
Finance Charge = Amount Financed x Finance Rate
Finance Charge = 66,000.00 x 0.07 = P4,620.00
Step 2: Estimate the Annual Percentage Rate.
$\text{APR} \approx \dfrac{2mr}{m + 1} = \dfrac{2(6)(0.07)}{6 + 1} \approx 0.12 \text{ or } 12\%$
The finance charge is P4,620.00 and the APR is 12%.

C. Consumer Loans
Most consumer loans apply the APR interest rate, such as a car loan, housing loan, furniture loan, appliance loan, and other types of loan with a regular payment schedule. The payment amount for these loans is given by the following formula.
$R = \dfrac{A i}{1 - (1 + i)^{-n}}$
Where
R = regular payment
A = loan amount
r = annual interest rate
m = number of payments per year
t = term of the loan in years
n = tm
i = r ÷ m
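As a rough check on the APR estimate and the regular-payment formula, the sketch below codes both directly. The function names are hypothetical; the inputs are Redentor's loan (P66,000 at 7% simple interest, six payments) and a loan of P46,700 at 9% paid quarterly for 3 years.

```python
def finance_charge(amount_financed, rate):
    """Finance charge = amount financed x finance rate."""
    return amount_financed * rate

def estimate_apr(m, r):
    """APR estimate for a simple interest loan: 2mr / (m + 1)."""
    return 2 * m * r / (m + 1)

def regular_payment(loan_amount, annual_rate, payments_per_year, years):
    """R = A*i / (1 - (1 + i)^(-n)), with i = r / m and n = t * m."""
    i = annual_rate / payments_per_year
    n = years * payments_per_year
    return loan_amount * i / (1 - (1 + i) ** -n)

charge = finance_charge(66_000, 0.07)
apr = estimate_apr(6, 0.07)
payment = regular_payment(46_700, 0.09, 4, 3)

print(f"Finance charge: {charge:,.2f}")
print(f"Estimated APR: {apr:.2%}")
print(f"Quarterly payment: {payment:,.2f}")
```

Keeping i and n inside `regular_payment` means the caller only supplies the quantities the loan terms actually state (annual rate, payments per year, years).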
Example 1: RFS Electronics is offering customers who purchase the latest model of smartphone an annual interest rate of 9% for 3 years. If Mirasol purchases an iPhone 8 for P46,700 from RFS Electronics, determine her quarterly payment.
Solution:
Given: A = 46,700
r = 9% = 0.09
t = 3 years
m = 4 (quarterly payments)
i = r ÷ m = 0.09 ÷ 4 = 0.0225
n = tm = 3(4) = 12
$R = \dfrac{A i}{1 - (1 + i)^{-n}} = \dfrac{(46{,}700)(0.0225)}{1 - (1 + 0.0225)^{-12}} \approx 4{,}484.01$
The quarterly payment for the smartphone is P4,484.01.

5.3 STOCKS, BONDS, and MUTUAL FUNDS
One of the key roles of a financial planner is to provide advice to clients on how to invest their money for the long term, such as saving for retirement. One of the best ways to save for the long term is through growth investment products.
There are two main categories of investments: growth investments and income investments. Growth investments have the greatest potential for growth over the long term, while income investments have lower risk and are excellent for short-term investments or if regular income is needed from the investment.
The three most common types of growth investment products are stocks, some bonds, and some mutual funds. The three are very different, but they share the common characteristic of being investment tools that are often good for long-term saving.
For example, while the return on many income investments is under 5%, it is not uncommon for the returns on growth investments to be 10% or higher. While the volatility of growth investments makes it risky to anticipate any positive return in the short term, most growth investments have shown steady growth over the decades, another reason they are good long-term investments.

Stocks are considered a volatile investment because their value depends on how well the company is doing, and many different factors, often unexpected, can affect how a company is faring. Natural disasters, economic downturns, increased competition, new inventions, and many other factors can affect a company either negatively or positively.

A market index (indices is the plural) is a grouping of a number of stocks whose performance is tracked together. For example, when you hear that the Dow Jones fell 12 points, it means that, taken together, the average decline of stocks in the Dow Jones Industrial Average was 12 points, but the actual performance of individual stocks in the index may have been higher or lower.
In addition to tracking indices, sometimes it is helpful to track the performance of a sector of the economy, such as high technology or the automobile industry, or a certain category of stocks, such as large cap or small cap stocks.

Bonds can be purchased in a variety of ways, the most common being from a full-service or discount broker. A bond broker can also purchase bonds but usually requires a minimum purchase of $5,000.
If you are purchasing bonds, it is a good idea to check the credit rating of the companies whose bonds you are purchasing.
One common way to reduce risk when purchasing bonds is to invest in a mutual fund that includes a diverse portfolio of bonds. Investing in many bonds at once reduces the impact on the investment if one company or government entity is unable to pay.

5.4 HOME OWNERSHIP
5.8 Weighted Graph

A weighted graph refers to one where weights are assigned to each edge. Weighted graphs can be represented in two ways: directed graphs, where the edges have arrows that show path direction, and undirected graphs, where edges are bi-directional and have no arrows.
These weighted edges can be used to compute the shortest path. A weighted graph consists of:
A set of vertices V.
A set of edges E.
A number w (weight) that is assigned to each edge. Weights might represent things such as costs, lengths or capacities.
In a simple (unweighted) graph, every edge is assumed to have a weight of 1.

Graph Types
In addition to simple and weighted descriptions, there are two types of graphs:
Directed Graph: In a directed graph, edges have direction (edges with arrows connect one vertex to another).
Undirected Graph: In an undirected graph, edges have no direction (arrowless connections). It is basically the same as a directed graph but has bi-directional connections between nodes.
Figure 1 outlines an example of directed and undirected weighted graphs.
Figure 1: Examples of directed and undirected graphs.

Why Use Graphs?
Using graphs, we can clearly and precisely model a wide range of problems. For example, we can use graphs for:
Coloring maps (modeling cities and roads)
Social relations (sociology)
Protein interactions (biology)
Social networking (e.g. Facebook and Twitter)

Graph Representation
Graphs can be represented in two specific ways: 1) by using an adjacency matrix and 2) by using an adjacency list.

Adjacency Matrix
An adjacency matrix is a two-dimensional array of size V times V, where V is the number of vertices in a graph. For example, if we have an array (M), M{i,j} = 1 indicates that there is an edge from vertex i to vertex j. An adjacency matrix for an undirected graph is always symmetric. An adjacency matrix can also be used to represent weighted graphs. For example, if M{i,j} = w, then there is an edge from vertex i to vertex j with weight w.

Adjacency List
An adjacency list uses an array of linked lists. The size of the array is equal to the number of vertices. For example, if we have an array (V), V{i} represents the linked list of vertices adjacent to the i-th vertex. This representation can also be applied to a weighted graph. The weights of the edges can be stored in the nodes of the linked lists.

6.0 Graph Coloring

What is graph coloring?
* Graph coloring is an assignment of colors (or any distinct marks) to the vertices of a graph. Strictly speaking, a coloring is a proper coloring if no two adjacent vertices have the same color.

Why graph coloring?
* Many problems can be formulated as a graph coloring problem, including time tabling, channel assignment, etc.
* A lot of research has been done in this area.

Channel Assignment
* Find a channel assignment for R radio stations such that no station has a conflict (there is a conflict if they are in the vicinity of each other).
* Vertices - radio stations, edges - conflicts, colors - available channels.

Terminology
* k-coloring
= a k-coloring of a graph G is a mapping of V(G)
onto integers 1..k such that adjacent vertices map into different integers.
= a k-coloring partitions V(G) into k disjoint subsets such that vertices from different subsets have different colors.
* k-colorable
= a graph G is k-colorable if it has a k-coloring.
* Chromatic Number
= The smallest integer k for which G is k-colorable is called the chromatic number of G.

Example Problem:
A state legislature has a number of committees that meet each week for an hour. How can we schedule the committee meeting times such that the least amount of time is used but two committees with overlapping membership do not meet at the same time?

* Graph Coloring Algorithm
= There is no known efficient algorithm for coloring a graph with the minimum number of colors.
= The graph coloring problem is a known NP-complete problem.

* NP-Complete Problem
= The interesting part is that if any one of the NP-complete problems can be solved in polynomial time, then all of them can be solved.
= Although the graph coloring problem is NP-complete, there are some approximate algorithms to solve it.

Basic Greedy Algorithm
1. Color the first vertex with the first color.
2. Do the following for the remaining V-1 vertices.
a) Consider the currently picked vertex and color it with the lowest numbered color that has not been used on any previously colored vertices adjacent to it. If all previously used colors appear on vertices adjacent to v, assign a new color to it.

6.1 Modular Arithmetic
DEFINITION:
• Let a, b and n be integers with n > 0.
We write a ≡ b mod n if and only if n divides a - b.
n is called the modulus.
b is called the remainder.

Example:
29 ≡ 15 mod 7 because 7 | (29 - 15)
12 ≡ 3 mod 9; 3 is a valid remainder since 9 divides 12 - 3
12 ≡ 21 mod 9; 21 is a valid remainder since 9 divides 21 - 3
12 ≡ -6 mod 9; -6 is a valid remainder since 9 divides -6 - 3

1. a ≡ b (mod n) if n | (a - b)
2. a ≡ b (mod n) implies b ≡ a (mod n)
3. a ≡ b (mod n) and b ≡ c (mod n) imply a ≡ c (mod n)

Proof of 1.
If n | (a - b), then (a - b) = kn for some k. Thus, we can write a = b + kn. Therefore,
(a mod n) = (remainder when b + kn is divided by n) = (remainder when b is divided by n) = (b mod n)
23 ≡ 8 (mod 5) because 23 - 8 = 15 = 5 x 3
-11 ≡ 5 (mod 8) because -11 - 5 = -16 = 8 x (-2)
81 ≡ 0 (mod 27) because 81 - 0 = 81 = 27 x 3

Types of Modular Arithmetic
> We can add and subtract congruent elements without losing congruence:
1. [(a mod n) + (b mod n)] mod n = (a + b) mod n
2. [(a mod n) - (b mod n)] mod n = (a - b) mod n
> Multiplication also works:
3. [(a mod n) x (b mod n)] mod n = (a x b) mod n

Proof of 1.
Let (a mod n) = Ra and (b mod n) = Rb. Then, we can write
a = Ra + jn for some integer j and b = Rb + kn for some integer k.
(a + b) mod n = (Ra + jn + Rb + kn) mod n
= [Ra + Rb + (k + j)n] mod n
= (Ra + Rb) mod n
= [(a mod n) + (b mod n)] mod n

Property            Expression
Commutative law     (w + x) mod n = (x + w) mod n
Associative laws    [(w + x) + y] mod n = [w + (x + y)] mod n
                    [(w x x) x y] mod n = [w x (x x y)] mod n
Distributive law    [w x (x + y)] mod n = [(w x x) + (w x y)] mod n
Identities          (0 + w) mod n = w mod n
                    (1 x w) mod n = w mod n
Additive inverse    For each w in Zn, there exists a z such that
(-w)                w + z ≡ 0 mod n.

Karl Friedrich Gauss, 1801.
• Modular arithmetic = "wrap-around" computations
Example: Start at 12 o'clock. 5 hours plus 8 hours equals 1 o'clock.
5 + 8 ≡ 1 (mod 12)
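The congruence definition and the three arithmetic rules of section 6.1 can be spot-checked numerically. The sketch below is an illustration rather than a proof: the helper name `congruent` and the test ranges are hypothetical, and it relies on Python's `%` operator returning a non-negative remainder for a positive modulus.

```python
def congruent(a, b, n):
    """a is congruent to b (mod n) exactly when n divides a - b."""
    return (a - b) % n == 0

# Examples from the notes.
examples = [(29, 15, 7), (23, 8, 5), (-11, 5, 8), (81, 0, 27), (5 + 8, 1, 12)]
print(all(congruent(a, b, n) for a, b, n in examples))

def rules_hold(a, b, n):
    """Check the addition, subtraction and multiplication rules for one triple."""
    add_ok = ((a % n) + (b % n)) % n == (a + b) % n
    sub_ok = ((a % n) - (b % n)) % n == (a - b) % n
    mul_ok = ((a % n) * (b % n)) % n == (a * b) % n
    return add_ok and sub_ok and mul_ok

# Brute-force check over a small range of integers and moduli.
print(all(rules_hold(a, b, n)
          for a in range(-20, 21)
          for b in range(-20, 21)
          for n in range(1, 13)))
```

A brute-force pass like this cannot replace the proof of rule 1, but it is a quick way to catch a mistyped rule.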
6.2 Applications of Modular Arithmetic
Hashing, pseudo-random numbers, ciphers.

RSA: Ronald Rivest, Adi Shamir, Leonard Adleman, 1977.
• Pick two primes p and q.
• Compute n = pq.
• Pick an encryption exponent e such that e and (p - 1)(q - 1) don't have any common prime factors.
• Make n and e public. Keep p and q private.

Anyone can encrypt, because n and e are public.
• To encrypt, convert your message into a set of plaintext numbers P, each less than n.
• For each P, compute C = P^e (mod n).
• The numbers C are your ciphertext.
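A toy version of the key setup and encryption steps can be written with Python's built-in modular exponentiation. This is a sketch under stated assumptions: the function names and the tiny primes (61 and 53) are illustrative only, and the decryption step, which these notes do not cover, is added just to confirm the round trip using the standard private exponent d, the inverse of e modulo (p - 1)(q - 1). Real RSA uses very large primes and padding.

```python
import math

def make_public_key(p, q, e):
    """Return (n, e) after checking that e shares no factor with (p - 1)(q - 1)."""
    phi = (p - 1) * (q - 1)
    if math.gcd(e, phi) != 1:
        raise ValueError("e must share no prime factor with (p - 1)(q - 1)")
    return p * q, e

def encrypt(plaintext_numbers, n, e):
    """C = P^e (mod n) for each plaintext number P, each of which must be below n."""
    if any(not (0 <= P < n) for P in plaintext_numbers):
        raise ValueError("every plaintext number must be less than n")
    return [pow(P, e, n) for P in plaintext_numbers]

def decrypt(cipher_numbers, p, q, e):
    """Not covered in the notes: recover P = C^d (mod n), where d inverts e."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return [pow(C, d, n) for C in cipher_numbers]

p, q, e = 61, 53, 17              # toy values, far too small for real use
n, e = make_public_key(p, q, e)   # n = 3233
message = [65, 42, 7]
ciphertext = encrypt(message, n, e)
print(ciphertext)
print(decrypt(ciphertext, p, q, e))
```

Only n and e are needed by `encrypt`, which mirrors the point that anyone can encrypt; `decrypt` needs p and q, which stay private.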
