
Computational Statistics

Unit -4
S1 - Data Ranges & Frequencies
S2- Shifting
Time series

Time series is a series of data points in which each data point is associated with a timestamp.

A simple example is the price of a stock in the stock market at different points of time on a given day.

Another example is the amount of rainfall in a region at different months of the year.
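As a minimal sketch of the stock-price example, the snippet below pairs each observation with its timestamp in a pandas Series. The prices and times are made-up illustration values, not real market data.

```python
import pandas as pd

# Hypothetical prices observed at four times on one trading day.
timestamps = pd.to_datetime(['2022-10-20 09:30', '2022-10-20 11:00',
                             '2022-10-20 13:30', '2022-10-20 16:00'])
prices = pd.Series([101.2, 102.5, 101.9, 103.1], index=timestamps)

print(prices)

# Each value can be looked up by the timestamp it was recorded at.
at_eleven = prices.loc[pd.Timestamp('2022-10-20 11:00')]
print(at_eleven)
```

The index is a DatetimeIndex, which is what makes the time-based operations in the rest of this unit available.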
Time series
The Python standard library includes data types for date and
time data, as well as calendar-related functionality.

The datetime, time, and calendar modules are the main places to start.

>>> from datetime import datetime
>>> now = datetime.now()
>>> now

OUTPUT:
datetime.datetime(2022, 10, 20, 4, 47, 15, 947391)

>>> now.year, now.month, now.day
(2022, 10, 20)
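A short sketch of a few other standard-library operations on dates may help here: date arithmetic with timedelta, formatting and parsing with strftime and strptime, and a calendar lookup. The specific dates are arbitrary illustration values.

```python
import calendar
from datetime import datetime, timedelta

start = datetime(2022, 10, 20, 4, 47)

# Adding a timedelta moves a datetime forward by a fixed duration.
later = start + timedelta(days=12)

# strftime formats a datetime as text; strptime parses text into a datetime.
label = later.strftime('%Y-%m-%d')
parsed = datetime.strptime('2022-10-20', '%Y-%m-%d')

# monthrange returns (weekday of the first day, number of days in the month).
first_weekday, days_in_month = calendar.monthrange(2022, 10)

print(later, label, parsed, days_in_month)
```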
Time series

Time series analysis in Python starts from the assumption that data collected over time may have some internal structure. It analyses time series data to extract these valuable characteristics.
Time series
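# df is assumed to be an existing DataFrame with a 'value_date' column.
# Assign the parsed dates back, otherwise the conversion is discarded.
df['value_date'] = pd.to_datetime(df['value_date'])
df.index = df['value_date']

import matplotlib.pyplot as plt

df.plot(figsize=(15, 6))
plt.show()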
DATA RANGE
The range() function returns a sequence of numbers that starts from 0 by default, increments by 1 by default, and stops before a specified number.

Syntax
=======
range(start, stop, step)

Parameter Values
===============
Parameter   Description
===================
start       Optional. An integer specifying the position at which to start. Default is 0.
stop        Required. An integer specifying the position at which to stop (not included).
step        Optional. An integer specifying the increment. Default is 1.
DATA RANGE
Examples:
========
Create a sequence of numbers from 0 to 5, and print each item in the sequence:

x = range(6)
for n in x:
    print(n)

Create a sequence of numbers from 3 to 19, but increment by 2 instead of 1:

x = range(3, 20, 2)
for n in x:
    print(n)
DATA RANGE
Create a sequence of numbers from 5 to 10, and print each item in the sequence:

x = range(5, 11)
for n in x:
    print(n)
DATE RANGE
pandas.date_range is responsible for generating a DatetimeIndex with an indicated length according to a particular frequency.

import pandas as pd
index = pd.date_range('4/1/2012', '5/1/2012')
index
DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04', '2012-04-05',
'2012-04-06', '2012-04-07', '2012-04-08', '2012-04-09', '2012-04-10', '2012-04-11',
'2012-04-12', '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16', '2012-04-17',
'2012-04-18', '2012-04-19', '2012-04-20', '2012-04-21', '2012-04-22', '2012-04-23',
'2012-04-24', '2012-04-25', '2012-04-26', '2012-04-27', '2012-04-28', '2012-04-29',
'2012-04-30', '2012-05-01'], dtype='datetime64[ns]', freq='D')
DATE RANGE
By default, date_range generates daily timestamps. If you pass only a start or end date, you must also pass the number of periods to generate:

pd.date_range(start='4/1/2012', periods=10)
DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04', '2012-04-05',
'2012-04-06', '2012-04-07', '2012-04-08', '2012-04-09', '2012-04-10'],
dtype='datetime64[ns]', freq='D')

import pandas as pd
pd.date_range(end='6/1/2012', periods=20)

DatetimeIndex(['2012-05-13', '2012-05-14', '2012-05-15', '2012-05-16', '2012-05-17',
'2012-05-18', '2012-05-19', '2012-05-20', '2012-05-21', '2012-05-22', '2012-05-23',
'2012-05-24', '2012-05-25', '2012-05-26', '2012-05-27', '2012-05-28', '2012-05-29',
'2012-05-30', '2012-05-31', '2012-06-01'], dtype='datetime64[ns]', freq='D')

By default, date_range preserves the time (if any) of the start or end timestamp.
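The time-preserving behaviour can be sketched as follows. The start timestamp is an arbitrary illustration value; normalize is an existing date_range parameter that snaps the timestamps to midnight.

```python
import pandas as pd

# The start timestamp carries a time of day, which is kept on every generated date.
with_time = pd.date_range('2012-05-02 12:56:31', periods=5)

# normalize=True resets the time to midnight before the range is generated.
at_midnight = pd.date_range('2012-05-02 12:56:31', periods=5, normalize=True)

print(with_time)
print(at_midnight)
```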
Frequencies
Given a string, the task is to find the frequencies of all the characters in that string and return a dictionary with key as the character and its value as its frequency in the given string.

# initializing string
str1 = "Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems"

# using naive method to get count of each element in string
freq = {}
for i in str1:
    if i in freq:
        freq[i] += 1
    else:
        freq[i] = 1

# printing result
dict(sorted(freq.items()))

OUTPUT:
{' ': 13, ',': 2, 'D': 1, 'a': 7, 'c': 6, 'd': 4, 'e': 10, 'f': 2, 'g': 1, 'h': 3,
'i': 11, 'l': 3, 'm': 3, 'n': 6, 'o': 3, 'p': 2, 'r': 4, 's': 14, 't': 8, 'u': 1, 'y': 2}
Frequencies
The get() method can be used to look up the count recorded so far for an item: if the item is new, it returns 0 as the initial value and 1 is added to it; otherwise 1 is added to the value previously held for that item in the dictionary. The example below counts the words of a string with an equivalent explicit membership test.

str1 = "good day, sunny day, good player, excellent player"
counts = dict()
words = str1.split()

for word in words:
    if word in counts:
        counts[word] += 1
    else:
        counts[word] = 1
print(counts)

OUTPUT:
{'good': 2, 'day,': 2, 'sunny': 1, 'player,': 1, 'excellent': 1, 'player': 1}
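For comparison, here is a sketch of the same word count written with dict.get(), alongside collections.Counter from the standard library, which performs the same tally in one call. The variable names are chosen for illustration.

```python
from collections import Counter

text = "good day, sunny day, good player, excellent player"

# get(word, 0) returns the current count, or 0 if the word has not been seen yet.
counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1

# Counter builds the same mapping directly from the list of words.
counter = Counter(text.split())

print(counts)
print(counter.most_common(2))
```

Note that split() separates on whitespace only, so 'player,' and 'player' are counted as different words.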
Frequencies
Frequencies in pandas are composed of a base frequency and a multiplier.
Base frequencies are typically referred to by a string alias, like 'M' for monthly or 'H' for hourly (pandas 2.2 and later spell these 'ME' and 'h').

from pandas.tseries.offsets import Hour, Minute
hour = Hour(4)
hour

Output:
<4 * Hours>
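Offset objects such as Hour and Minute can also be added together and applied to a timestamp. The following is a small sketch; the start date is an arbitrary illustration value.

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

# Two fixed-length offsets combine into a single 2.5-hour offset.
combined = Hour(2) + Minute(30)

start = pd.Timestamp('2000-01-01')
shifted = start + combined

print(combined)
print(shifted)
```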
Frequencies
Putting an integer before the base frequency creates a multiple:

import pandas as pd
pd.date_range('1/1/2000', '1/3/2000 23:59', freq='4h')
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 04:00:00',
'2000-01-01 08:00:00', '2000-01-01 12:00:00',
'2000-01-01 16:00:00', '2000-01-01 20:00:00',
'2000-01-02 00:00:00', '2000-01-02 04:00:00',
'2000-01-02 08:00:00', '2000-01-02 12:00:00',
'2000-01-02 16:00:00', '2000-01-02 20:00:00',
'2000-01-03 00:00:00', '2000-01-03 04:00:00',
'2000-01-03 08:00:00', '2000-01-03 12:00:00',
'2000-01-03 16:00:00', '2000-01-03 20:00:00'],
dtype='datetime64[ns]', freq='4H')
Frequencies and Date Offsets

pd.date_range('1/1/2000', periods=10, freq='1h30min')

DatetimeIndex(['2000-01-01 00:00:00',
'2000-01-01 01:30:00', '2000-01-01 03:00:00',
'2000-01-01 04:30:00', '2000-01-01 06:00:00',
'2000-01-01 07:30:00', '2000-01-01 09:00:00',
'2000-01-01 10:30:00', '2000-01-01 12:00:00',
'2000-01-01 13:30:00'],
dtype='datetime64[ns]', freq='90T')
Frequencies and Date Offsets
Week of month dates
One useful frequency class is “week of month”, whose aliases start with WOM. This enables you to get dates like the third Friday of each month:

rng = pd.date_range('1/1/2012', '9/1/2012', freq='WOM-3FRI')
rng

DatetimeIndex(['2012-01-20', '2012-02-17',
'2012-03-16', '2012-04-20', '2012-05-18', '2012-06-15',
'2012-07-20', '2012-08-17'],
dtype='datetime64[ns]', freq='WOM-3FRI')
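A quick way to convince yourself that these really are third Fridays is to check the weekday and the day of the month of each date. This is only a sanity-check sketch built on the same date_range call.

```python
import pandas as pd

rng = pd.date_range('1/1/2012', '9/1/2012', freq='WOM-3FRI')

# dayofweek counts from Monday=0, so Friday is 4.
all_fridays = bool((rng.dayofweek == 4).all())

# The third occurrence of any weekday always falls on day 15 to 21 of the month.
all_third_week = bool(((rng.day >= 15) & (rng.day <= 21)).all())

print(all_fridays, all_third_week)
```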
Shifting Data

The bitwise left shift operator in Python shifts the bits of the binary representation of the
input number to the left side by a specified number of places.
The empty bits created by shifting are filled with 0s. The syntax for the bitwise left shift is a << n.
Shifting Data

# Right Shift
a = 10
a >> 1

OUTPUT
5

# Left Shift
a = 10
a << 2

OUTPUT
40
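To make the bit movement visible, the sketch below prints the binary form of each result. For non-negative integers, shifting left by n is the same as multiplying by 2**n, and shifting right by n is the same as floor division by 2**n.

```python
a = 10  # binary 1010

left = a << 2    # bits move two places left: 101000
right = a >> 1   # bits move one place right: 101

print(bin(a), bin(left), bin(right))

# Equivalent arithmetic for non-negative integers.
same_as_multiply = left == a * 2 ** 2
same_as_floor_divide = right == a // 2 ** 1
print(same_as_multiply, same_as_floor_divide)
```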
Shifting (Leading and Lagging) Data
“Shifting” refers to moving data backward and forward through time. Both Series and DataFrame have a shift method for doing naive shifts forward or backward, leaving the index unmodified.

import pandas as pd
import numpy as np
ts = pd.Series(np.random.randn(4),
index=pd.date_range('1/1/2000', periods=4, freq='M'))
ts

2000-01-31 0.203365
2000-02-29 -0.725373
2000-03-31 -1.002077
2000-04-30 -0.511180
Freq: M, dtype: float64
Shifting (Leading and Lagging) Data
A common use of shift is computing percent changes in a time
series or multiple time series as DataFrame columns.

ts = pd.Series(np.random.randn(4),
index=pd.date_range('1/1/2000', periods=4, freq='M'))
print(ts)
print(ts.shift(2))

2000-01-31 -0.466083
2000-02-29 -0.951154
2000-03-31 1.473405
2000-04-30 -0.771737
Freq: M, dtype: float64
2000-01-31 NaN
2000-02-29 NaN
2000-03-31 -0.466083
2000-04-30 -0.951154
Freq: M, dtype: float64
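The percent-change use of shift can be sketched as below. The prices are made-up values chosen so the result is easy to check by hand, and a daily index is used for simplicity.

```python
import pandas as pd

# Hypothetical prices on four consecutive days.
prices = pd.Series([100.0, 110.0, 99.0, 108.9],
                   index=pd.date_range('2000-01-01', periods=4, freq='D'))

# shift(1) moves each value one step later, so this divides every
# observation by the one before it. The first entry has no predecessor
# and therefore becomes NaN.
pct_change = prices / prices.shift(1) - 1

print(pct_change)
```

The result is approximately +10%, -10% and +10% for the second, third and fourth days.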
Shifting (Leading and Lagging) Data

ts = pd.Series(np.random.randn(4),
index=pd.date_range('1/1/2000', periods=4, freq='M'))
print(ts.shift(2, freq='M'))

2000-03-31 -1.483250
2000-04-30 0.651565
2000-05-31 -1.150464
2000-06-30 -0.521721
Freq: M, dtype: float64
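The difference between a naive shift and a shift with freq can be seen side by side in the following sketch. It uses fixed values and a daily frequency, both chosen only for illustration.

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 3.0, 4.0],
               index=pd.date_range('2000-01-01', periods=4, freq='D'))

# Naive shift: values move two rows down, the index stays, two NaNs appear
# and the last two values are dropped.
naive = ts.shift(2)

# Shift with freq: the timestamps move two days forward and no data is lost.
by_freq = ts.shift(2, freq='D')

print(naive)
print(by_freq)
```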
Shifting (Leading and Lagging) Data

Shifting dates with offsets

The pandas date offsets can also be used with datetime or Timestamp objects.

from datetime import datetime
from pandas.tseries.offsets import Day, MonthEnd

now = datetime(2011, 11, 17)
now + 3 * Day()

Timestamp('2011-11-20 00:00:00')
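Anchored offsets such as MonthEnd behave a little differently from Day: adding one moves the date to the next month end rather than by a fixed duration. A minimal sketch, using the same example date:

```python
from datetime import datetime
from pandas.tseries.offsets import MonthEnd

now = datetime(2011, 11, 17)

# Adding an anchored offset rolls the date forward to the next month end.
next_month_end = now + MonthEnd()

# rollforward and rollback move to the nearest month end in each direction.
rolled_forward = MonthEnd().rollforward(now)
rolled_back = MonthEnd().rollback(now)

print(next_month_end, rolled_forward, rolled_back)
```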