
Further Mathematics – Core Material Notes

Categorical data
Type of data: Categorical
Graph: Bar chart; Segmented bar chart
Qualifications on use: No more than 4-5 categories.

Categorical data: data obtained when classifying or naming some quality or attribute.

[Figures: Bar chart; Segmented bar chart]

Other variants include: ‘Percentage segmented bar chart’

» Analysing Categorical Data

Writing up a Report
Skills check:
- Write a brief report to describe the distribution of a numerical variable in terms of shape, centre, spread and outliers (if any).
- Write a brief report to describe the distribution of a categorical variable in terms of the dominant category (if any), the order of occurrence of each category and their relative importance.

Numerical data
Type of data: Numerical
Graph | Qualifications on use
Histogram | Medium to large data sets
Stem plot | Best for small to medium sized data sets
Dot plot | Suitable for only small data sets

Numerical data: data obtained by measuring or counting some quantity.
• Discrete: distinct values that can be counted (e.g. number of people; you cannot have a fraction of a person).
• Continuous: data that can have any value, even with decimals (e.g. temperature, height; anything that requires a measuring device).

[Figures: Histogram (continuous); Histogram (discrete); Stem-and-leaf plot; Dot plot; Split stem; Back-to-back stem plot]

» Analysing Numerical Data

Shape
Symmetric

Perfectly symmetrical data has an equal mean, median and mode.
Positively skewed
Negatively skewed
Bimodal data shows two equal modes (usually indicates there are two groups of data that need to be separated, such as height of boys and girls).

Measures of centre
Mean: the average value.
Median: the midpoint of a distribution (50th percentile).
Mode: the most commonly occurring value/s (only used when there is a high number of scores)

Measures of spread
Range: the difference between the smallest value and the largest.
IQR (interquartile range): the range in which 50% of the values lie.
$\text{IQR} = Q_3 - Q_1$, where $Q_3$ → 75th percentile whilst $Q_1$ → 25th percentile

Note: when the graph is skewed, the median and IQR are used when measuring centre and spread.

Outliers
Any data value/s that stands out from the main part of the data (values that are unusually high or low).

» Five-number summary and Box Plots
A listing of the median, M, the 1st and 3rd quartiles, and the smallest and largest data values of a distribution:
Minimum, $Q_1$, M, $Q_3$, Maximum
From this five-number summary, a box plot can be constructed:

[Figures: General Box Plot; Box Plot with outlier/s]

Note: the lower fence and upper fence are not drawn in but it must be understood that values that lie outside these fences are classified as ‘possible outliers’ (possible, in that the distribution may just have a very long tail and there is not enough data to pick up other values within the tail).

» Box Plots and Distribution Shape

[Figures: box plots of Symmetric, Positively skewed and Negatively skewed distributions]

» Standard Deviation
Standard Deviation: used to measure the spread of a data distribution around the mean.
The standard deviation can be estimated by assuming that around 95% of the data values lie within two standard deviations of the mean (four in total):
$s \approx \frac{\text{range}}{4}$

Note: now that all of the summary statistics have been explained, the following summary statistics are usually used together:
o Mean and standard deviation
o Median and IQR
o Mode and range

» 68-95-99.7% Rule

» Standard scores/z-scores
Standard scores/z-scores: transformed data values that show the number of standard deviations that the values lie from the mean of the distribution.
$z = \frac{x - \bar{x}}{s}$

Example: The mean study score for Further Mathematics in 2011 was 30 with a standard deviation of 7. Student A received a study score of 47, while student B received a score of 23. Calculate the z-scores for both students and comment.
For student A: $z = \frac{47 - 30}{7} \approx 2.43$
For student B: $z = \frac{23 - 30}{7} = -1$
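As a quick check of the z-score example, the calculation can be sketched in Python. The function names are illustrative, and the formula assumed is the standard one: subtract the mean, then divide by the standard deviation.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


def estimate_sd(smallest, largest):
    """Rough estimate of the standard deviation: range divided by four."""
    return (largest - smallest) / 4


mean, sd = 30, 7  # values given in the example
z_a = z_score(47, mean, sd)  # student A
z_b = z_score(23, mean, sd)  # student B
print(round(z_a, 2), round(z_b, 2))
```

A z-score above 2 lies outside the central 95% of a bell-shaped distribution, and a z-score of -1 sits at the lower edge of the central 68%.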

Drawing this out graphically, student A lies at the BLUE point and student B lies at the GREEN point:

[Figure: distribution graph with student A (blue) and student B (green) marked]

From this graph, it can be understood that student A’s score lies within the top 2.5% of the state, whilst student B lies in the bottom 16% of the state, scoring lower than 84% of the rest of the state.

BIVARIATE DATA
Bivariate data sets can be of three types:
o Categorical – Categorical
o Numerical – Categorical
o Numerical – Numerical
For any bivariate data set, one variable is dependent and the other independent:
The dependent variable responds to change in the independent variable.
The independent variable explains the change in the dependent variable.

Categorical – categorical
» Two-way categorical frequency table

[Tables: two-way frequency table; the same table converted into percentages]

Note: Unless the two column sums are equal, it is incorrect to make a judgement regarding the relationship between these variables based on the first table. In order to accurately compare the two variables, it is important that the table entries are converted into percentages.
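Converting a two-way frequency table into percentages can be sketched as follows. The counts and category labels are hypothetical, and the sketch assumes each entry is expressed as a percentage of its column total, so that every column sums to 100%.

```python
def column_percentages(table):
    """Convert a two-way frequency table into column percentages.

    `table` maps each column label to a dict of {row label: count}.
    """
    result = {}
    for column, counts in table.items():
        total = sum(counts.values())
        result[column] = {
            row: round(100 * count / total, 1) for row, count in counts.items()
        }
    return result


# Hypothetical counts: attitude (rows) by sex (columns)
counts = {
    "Male": {"For": 32, "Against": 26},
    "Female": {"For": 30, "Against": 12},
}
percentages = column_percentages(counts)
for column, rows in percentages.items():
    print(column, rows)
```

In this made-up table the raw counts "For" look similar (32 and 30), but the percentages differ noticeably because the column totals are unequal.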

» Segmented Bar Chart
The relationship between two categorical variables can also be compared by using a percentaged segmented bar chart:

[Figure: percentaged segmented bar chart]

Numerical – categorical
» Parallel box plots
Note: When analysing parallel box plots, compare the following features:
- Medians
- IQRs and/or ranges
- Shapes (symmetric or skewed)

Numerical – numerical
» Scatterplots

» Interpreting Scatterplots
The object of bivariate analysis is to determine whether a relationship exists between two variables and if so, its:
Strength
- strong, moderate, weak or no relationship
- Correlation coefficient (r)
- Coefficient of determination (r²)
Form
- Linear or non-linear
Direction
- Positive, negative, or no correlation

» Pearson’s Correlation Coefficient
Correlation coefficient: A value that shows how strong the linear relationship between two variables is, where a ‘strong’ relationship shows points that lie close to a straight line (a perfectly linear graph being the strongest) whilst a graph with no correlation would have values scattered everywhere with no clear pattern.
o Positive values always represent relationships with a positive gradient.
o Negative values always represent relationships with a negative gradient.
Note:
1. The correlation coefficient is designed for numerical and linear data only.
2. It should be used with caution if outliers are present.

» Coefficient of Determination
Coefficient of determination: describes the amount of influence that the independent variable had on the dependent variable (usually expressed as a percentage).
The standard analysis is:
The coefficient of determination, calculated to be [ ], shows that [ %] of the variation in [dependent variable] can be explained by variation in [independent variable]. The other [(100-100r²)]% of variation in [dependent variable] can be explained by other factors or influences.
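A minimal sketch of calculating r and r² and filling in the standard analysis sentence is given below. The data (hours studied and marks) are hypothetical, and the formula used is the usual Pearson product-moment formula, which these notes do not write out.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired numerical data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]       # hypothetical independent variable
marks = [52, 55, 61, 64, 70]  # hypothetical dependent variable

r = pearson_r(hours, marks)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
print(
    f"{100 * r_squared:.1f}% of the variation in marks can be explained "
    f"by variation in hours; the other {100 - 100 * r_squared:.1f}% can be "
    "explained by other factors or influences."
)
```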

Note: The coefficient of determination does not entirely determine whether there is a relationship between two variables; therefore, it is important to consider other factors. For example, it was calculated that the relationship between height and intellectual ability had a strong correlation. However, this may simply be because taller people are generally older than those who are shorter and thus, intellectual ability may be due to age rather than height.

Choosing a Suitable Graph
Type of data
Dependent variable | Independent variable | Graph
Categorical | Categorical | Segmented bar chart
Categorical | Numerical | Parallel box plots
Categorical (two categories only) | Numerical | Back-to-back stem plots; Parallel box plots (preferred)
Numerical | Numerical | Scatterplot

REGRESSION
Least Squares Regression
Linear regression: the process of fitting a straight line to bivariate data with the aim of modelling the relationship between two numerical variables. This straight line can be found using two methods:

» Least Squares Method
Note: Least Squares regression line is usually used for data without outliers. This method assumes that the variables are linearly related.** When using this method, the IV and DV must be correctly identified.
Residuals: the vertical distances between the actual y value and the predicted y value which lies on the least squares line
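As an illustration of the least squares method and of residuals, the sketch below fits a line of the form y = a + bx to hypothetical data and lists the residual (actual value minus predicted value) for each point. The slope and intercept formulas used are the standard least squares ones, which are not written out in these notes.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # y-intercept
    return a, b


def residuals(xs, ys, a, b):
    """Actual value minus predicted value, for each data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]  # hypothetical independent variable
ys = [2, 4, 5, 4, 5]  # hypothetical dependent variable

a, b = least_squares(xs, ys)
res = residuals(xs, ys, a, b)
print(f"y = {a:.2f} + {b:.2f}x")
print([round(e, 2) for e in res])
```

Plotting these residuals against x gives a residual plot; for a least squares line the residuals always sum to zero, so it is the pattern of the points, not their total, that matters.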

Least squares line: minimises the sum of the squares of the residuals.
$y = a + bx$
Note: The slope, b, predicts the change in y when x changes by one unit.
o +b means y increases as x increases
o –b means y decreases as x increases
The y-intercept, a, predicts the value of y when x=0

** To see whether two variables are linearly related, and thus whether or not the least squares method should be applied, you can plot what is known as the ‘residual plot’. A residual plot shows important information about a relationship and allows you to view the residual values for each point.
o A residual plot with a random pattern indicates that the data is linear.
o A residual plot that appears like a positive or negative parabola indicates that the data is non-linear and a transformation should be applied to the data.

Extrapolation and Interpolation
Interpolation: predicting within the range of data.
Extrapolation: predicting outside the range of data.
Note: extrapolation is a less reliable process than interpolation as you are going beyond your original data.

» The Three Median Line
Note: The Three Median regression line is usually used for data with outliers. This method assumes that the variables are linearly related.**
1. Plot the data on a scatterplot
2. Divide the points symmetrically into three groups (if you are unable to divide the points equally, divide it in a way that the left and right sides both have equal amounts).
3. Find the median point of each group by finding the median of the x and y values.
4. Connect the two outside points and move this line one third of the way towards the middle point.
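The three median line steps can be sketched in Python as below. This is a sketch under assumptions: the data are hypothetical, leftover points are shared out so that the two outer groups stay equal in size, and moving the line one third of the way towards the middle point is done by adjusting the intercept while keeping the slope of the line through the outer median points.

```python
from statistics import median


def three_median_line(xs, ys):
    """Return (a, b) for the three median line y = a + b*x."""
    points = sorted(zip(xs, ys))  # order the points by x value
    n = len(points)
    # Outer groups are equal in size; an extra single point goes to the
    # middle group, and two extra points go one to each outer group.
    outer = n // 3 + (1 if n % 3 == 2 else 0)
    groups = [points[:outer], points[outer:n - outer], points[n - outer:]]
    # Median point of each group: (median of x values, median of y values)
    (xl, yl), (xm, ym), (xu, yu) = [
        (median(x for x, _ in g), median(y for _, y in g)) for g in groups
    ]
    b = (yu - yl) / (xu - xl)  # slope of the line through the outer points
    # Shift the line one third of the way towards the middle median point
    a = ((yl + ym + yu) - b * (xl + xm + xu)) / 3
    return a, b


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]         # hypothetical data
ys = [3, 5, 4, 8, 7, 9, 12, 10, 30]      # last point is an outlier

a, b = three_median_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

Because medians are used, the unusually large final y value has little effect on the fitted line.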

Data Transformation
Note: When a transformation is needed, use the values of r and r² to help determine which is best.

TIME SERIES
Summarising Time Series Data
Time series data is simply data with a timeframe as the independent variable. There are four ways in which it can be described:

Trend
Data displays a trend (or secular trend) when a consistent increase or decrease can be seen in the data over a significant period of time. A trend line can be fitted to such data.

Seasonality/Seasonal variation
Seasonal data are repetitive fluctuating movements which occur within a time period of one year or less. For example, sales of warm drinks might fluctuate every winter. This data can be deseasonalised.

Cycles
Cyclic data shows fluctuations, but not at consistent intervals, amplitudes or seasons, and these occur in time intervals of more than one year. This includes data such as stock prices.

Random (Variation)
Random data shows no pattern. All fluctuations occur by chance and cannot be predicted.

Smoothing Time Series Data
Time series can be smoothed in two ways:

» Moving Means

[Worked examples: 3-moving mean smoothing; 5-moving mean smoothing; 4-moving mean smoothing]

When it comes to even numbers, the centre of the set of points is not a point belonging to the original series. This problem is solved by using a process called centring. This is done by taking two smoothed values beside one another and smoothing those two values. In other words, the moving mean smoothing process is done twice.
Note: Moving means smoothing is not limited to 3, 5 and 4. These numbers have been chosen specifically for the sake of explaining the process of smoothing time series data.

» Moving Medians
Median smoothing is very similar to moving means smoothing, however the median of the points is taken instead of the average. This method is generally preferred to moving average smoothing when outliers are present.

Seasonal Indices
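For the smoothing methods described above (moving means, centring for an even number of points, and moving medians), a minimal Python sketch is given below. The series is hypothetical and the function names are illustrative.

```python
from statistics import mean, median


def moving_smooth(values, window, average=mean):
    """Smooth with a moving window, giving one value per complete window."""
    return [
        average(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


def centred_moving_mean(values, window):
    """Moving mean for an even window, with centring.

    The series is smoothed once, then each pair of neighbouring smoothed
    values is averaged so the result lines up with an original time point.
    """
    first_pass = moving_smooth(values, window)
    return moving_smooth(first_pass, 2)


series = [10, 12, 15, 11, 14, 18, 13, 16]  # hypothetical time series

three_mean = moving_smooth(series, 3)
four_mean_centred = centred_moving_mean(series, 4)
three_median = moving_smooth(series, 3, average=median)

print(three_mean)
print(four_mean_centred)
print(three_median)
```

Each smoothed series is shorter than the original because the first and last few time points do not have a complete window around them.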

Note:
- Remember, we deseasonalise when there is a seasonal component/when there is a repetition for every season.
- Deseasonalised data helps show the trend in the series more clearly and the individual months that are different from the usual seasonal pattern (i.e. it helps remove the seasonal component).
- The sum of the seasonal indices equals the number of seasons (e.g. if the seasons are months, the seasonal indices add to 12).
- Therefore, the average seasonal index is 100%.

The seasonal index is defined by:
$\text{seasonal index} = \frac{\text{value for season}}{\text{seasonal average}}$

» Interpreting Seasonal Indices
To interpret a seasonal index, convert the seasonal index into a percentage (note: this is an optional step). Once this value is obtained, subtract 100% from it. Alternatively, if the percentage conversion is not carried out, subtracting 1 will also give you the answer.
Example: The seasonal index for unemployment for the month of February is 1.2. Therefore, this means that February unemployment figures tend to be 20% higher than the monthly average. If we obtained a negative answer, say for example -10%, this would mean that unemployment figures for that month tend to be 10% lower than the monthly average.
Seasonal indices can also be used to comment on the relationship between seasons, for example:
o A seasonal index of 1.3 means that the season is 30% above the average of the seasons
o A seasonal index of 1.0 means that the season is equal to the average of the seasons
o A seasonal index of 0.8 means that the season is 20% below the average of the seasons

Example: Mikki runs a shop and she wishes to determine quarterly seasonal indices based on her last year’s sales, which are shown in the table below.
Summer | Autumn | Winter | Spring
920 | 1085 | 1241 | 446
1. The seasons are quarters. Write the formula in terms of quarters:
$\text{seasonal index} = \frac{\text{value for quarter}}{\text{quarterly average}}$

2. Find the quarterly average for the year.
3. Work out the seasonal index (SI) for each time period.
4. Check that the seasonal indices sum to 4 (the number of seasons). The slight difference here is due to rounding error.
5. Write out your answers as a table of the seasonal indices.
Seasonal Indices
Summer | Autumn | Winter | Spring
0.997 | 1.176 | 1.345 | 0.483

» Steps in calculating seasonal indices for several years’ data
1. Calculate the seasonal indices for Years 1, 2 and 3, separately. You should then have three different sets of seasonal indices (or three tables that represent each of the different years).
2. Average the three sets of seasonal indices at the end to obtain a single set of seasonal indices (e.g. find the average of quarter 1 for years 1, 2 and 3, find the average of quarter 2 for years 1, 2 and 3, and so on. In the end you should have only one table representing the seasonal indices).

Fitting a Trend Line and Forecasting
» Fitting a trend line
» Forecasting
» Taking seasonality into account
» Making predictions with deseasonalised data
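The seasonal index steps for Mikki's quarterly sales, and the use of seasonal indices with deseasonalised data, can be sketched in Python as follows. The sales figures come from the example; the helper names, the deseasonalised forecast of 950 and the relationships deseasonalised = actual ÷ seasonal index and forecast = deseasonalised forecast × seasonal index are assumptions based on the usual approach, since the notes list these topics without writing out formulas.

```python
sales = {"Summer": 920, "Autumn": 1085, "Winter": 1241, "Spring": 446}

# Step 2: quarterly average for the year
quarterly_average = sum(sales.values()) / len(sales)

# Step 3: seasonal index = value for quarter / quarterly average
seasonal_indices = {
    quarter: round(value / quarterly_average, 3)
    for quarter, value in sales.items()
}

# Step 4: the indices should sum to about 4 (the number of seasons)
index_total = round(sum(seasonal_indices.values()), 3)


def deseasonalise(actual, index):
    """Remove the seasonal component from an actual value."""
    return actual / index


def reseasonalise(deseasonalised, index):
    """Put the seasonal component back into a deseasonalised value."""
    return deseasonalised * index


print(quarterly_average)
print(seasonal_indices)
print(index_total)

# Hypothetical deseasonalised forecast of 950 for next Winter
winter_forecast = reseasonalise(950, seasonal_indices["Winter"])
print(round(winter_forecast))
```

For several years of data, the same calculation would be repeated for each year separately and the resulting indices for each quarter averaged.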

CHECK QUESTION 2A OF CHAPTER 7E

CAS CALCULATOR TUTORIAL FOR ‘CORE MATERIAL’
Plotting: Five-number summary and Box Plots (and other types of plots)
How to find the standard deviation and mean (and other ones)
How to find the r value and r^2 value
Least squares regression line
Residual Plot
Applying transformations to a set of values of a function