S1 CS - U4 Data Ranges - Frequencies - Shifting

Computational Statistics
Unit 4
S1 - Data Ranges & Frequencies
S2 - Shifting
Time series
Time series is a series of data points in which each data point is associated with a timestamp.
A simple example is the price of a stock in the stock market at different points of time on a given day.
Another example is the amount of rainfall in a region in different months of the year.
Time series
The Python standard library includes data types for date and time data, as well as calendar-related functionality.
The datetime, time, and calendar modules are the main places to start.
>>> from datetime import datetime
>>> now = datetime.now()
>>> now
datetime.datetime(2022, 10, 20, 4, 47, 15, 947391)
>>> now.year, now.month, now.day
(2022, 10, 20)
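The datetime and calendar modules mentioned above also support date arithmetic and calendar queries. The sketch below is illustrative only: it uses a fixed date instead of datetime.now() so the results are reproducible, and the variable names are hypothetical.

```python
import calendar
from datetime import datetime, timedelta

# A fixed date (instead of datetime.now()) keeps the result reproducible
start = datetime(2022, 10, 20)
later = start + timedelta(days=12)   # datetime + timedelta -> datetime
gap = later - start                  # datetime - datetime -> timedelta

print(later.strftime('%Y-%m-%d'))    # 2022-11-01
print(gap.days)                      # 12
print(calendar.monthrange(2022, 10)[1])  # 31 days in October 2022
print(calendar.isleap(2024))         # True
```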
Time series
Time series analysis assumes that data collected over time may have some internal structure, and it analyses the series to extract its valuable characteristics.
Time series
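import pandas as pd
import matplotlib.pyplot as plt

# assumes df is a DataFrame with a 'value_date' column of date strings
df['value_date'] = pd.to_datetime(df['value_date'])  # parse the strings into timestamps
df.index = df['value_date']  # index the rows by date
df.plot(figsize=(15, 6))  # plot the numeric columns against the dates
plt.show()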
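Because the snippet above assumes an existing df, here is a self-contained sketch of the same idea. The column names value_date and price and the numbers are hypothetical sample data, and the plotting step is left out so it runs without a display.

```python
import pandas as pd

# Hypothetical sample data: 'value_date' and 'price' are illustrative names
df = pd.DataFrame({
    "value_date": ["2022-10-18", "2022-10-19", "2022-10-20"],
    "price": [101.5, 102.0, 100.75],
})

# Convert the strings to datetime64 values and use them as the index
df["value_date"] = pd.to_datetime(df["value_date"])
df = df.set_index("value_date")

print(df.index)
# Rows can now be looked up by timestamp
print(df.loc[pd.Timestamp("2022-10-19"), "price"])  # 102.0
```

set_index moves the column into the index, whereas assigning df.index keeps a copy of the dates as a regular column too.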
DATA RANGE
The range() function returns a sequence of numbers that starts from 0 by default, increments by 1 by default, and stops before a specified number.
Syntax
=======
range(start, stop, step)
Parameter Values
===============
Parameter   Description
=========   ===========
start       Optional. An integer specifying the number at which to start. Default is 0.
stop        Required. An integer specifying the number at which to stop (not included).
step        Optional. An integer specifying the increment. Default is 1.
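A range object produces its numbers lazily; wrapping it in list() is a quick way to see exactly which values the three parameters produce. The negative step in the last line is an extra illustration of the step parameter, not one of the slide examples.

```python
# Materialise range objects with list() to see every value they produce
default_steps = list(range(6))         # start defaults to 0, step to 1
odd_steps = list(range(3, 20, 2))      # stop=20 is excluded
countdown = list(range(10, 0, -3))     # a negative step counts down

print(default_steps)  # [0, 1, 2, 3, 4, 5]
print(odd_steps)      # [3, 5, 7, 9, 11, 13, 15, 17, 19]
print(countdown)      # [10, 7, 4, 1]
```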
DATA RANGE
Examples:
========
Create a sequence of numbers from 0 to 5, and print each item in the sequence:
x = range(6)
for n in x:
    print(n)
Create a sequence of numbers from 3 to 19, but increment by 2 instead of 1:
x = range(3, 20, 2)
for n in x:
    print(n)
DATA RANGE
Create a sequence of numbers from 5 to 10, and print each item in the sequence:
x = range(5, 11)
for n in x:
    print(n)
DATE RANGE
pandas.date_range generates a DatetimeIndex of a given length, with timestamps spaced according to a particular frequency.
import pandas as pd
index = pd.date_range('4/1/2012', '5/1/2012')
index
DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04', '2012-04-05',
'2012-04-06', '2012-04-07', '2012-04-08', '2012-04-09', '2012-04-10', '2012-04-11',
'2012-04-12', '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16', '2012-04-17',
'2012-04-18', '2012-04-19', '2012-04-20', '2012-04-21', '2012-04-22', '2012-04-23',
'2012-04-24', '2012-04-25', '2012-04-26', '2012-04-27', '2012-04-28', '2012-04-29',
'2012-04-30', '2012-05-01'], dtype='datetime64[ns]', freq='D')
DATE RANGE
By default, date_range generates daily timestamps. If you pass only a start or an end date, you must also pass the number of periods to generate:
import pandas as pd
pd.date_range(start='4/1/2012', periods=10)
DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04', '2012-04-05',
'2012-04-06', '2012-04-07', '2012-04-08', '2012-04-09', '2012-04-10'],
dtype='datetime64[ns]', freq='D')
pd.date_range(end='6/1/2012', periods=20)
DatetimeIndex(['2012-05-13', '2012-05-14', '2012-05-15', '2012-05-16', '2012-05-17',
'2012-05-18', '2012-05-19', '2012-05-20', '2012-05-21', '2012-05-22', '2012-05-23',
'2012-05-24', '2012-05-25', '2012-05-26', '2012-05-27', '2012-05-28', '2012-05-29',
'2012-05-30', '2012-05-31', '2012-06-01'], dtype='datetime64[ns]', freq='D')
date_range by default preserves the time (if any) of the start or end timestamp.
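A minimal sketch of that behaviour: the start timestamp below carries a time of day, and every generated timestamp keeps it. The normalize=True argument, an existing date_range parameter not mentioned above, resets the timestamps to midnight instead.

```python
import pandas as pd

# The start timestamp carries a time of day (12:56:31)
kept = pd.date_range('2012-05-02 12:56:31', periods=5)
# normalize=True resets each timestamp to midnight
midnight = pd.date_range('2012-05-02 12:56:31', periods=5, normalize=True)

print(kept)      # 2012-05-02 12:56:31 ... 2012-05-06 12:56:31
print(midnight)  # 2012-05-02 ... 2012-05-06
```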
Frequencies
Given a string, the task is to find the frequencies of all the characters in that string and return a dictionary with each character as a key and its frequency in the given string as the value.
# initializing string
str1 = "Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems"

# using naive method to get count of each element in string
freq = {}
for i in str1:
    if i in freq:
        freq[i] += 1
    else:
        freq[i] = 1

# printing result, sorted by character
print(dict(sorted(freq.items())))
Output:
{' ': 13, ',': 2, 'D': 1, 'a': 7, 'c': 6, 'd': 4, 'e': 10, 'f': 2, 'g': 1, 'h': 3, 'i': 11, 'l': 3, 'm': 3, 'n': 6, 'o': 3, 'p': 2, 'r': 4, 's': 14, 't': 8, 'u': 1, 'y': 2}
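The standard library's collections.Counter performs the same tally as the explicit loop. This is a swapped-in alternative, not the method used above; most_common() additionally lists the most frequent characters.

```python
from collections import Counter

str1 = "Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems"

# Counter counts every character of the string in one call
freq = Counter(str1)
print(dict(sorted(freq.items())))
print(freq.most_common(3))  # [('s', 14), (' ', 13), ('i', 11)]
```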
Frequencies
The dictionary get() method gives a shorter way to count. counts.get(word, 0) returns the count stored so far for word, or 0 if word has not been seen yet; adding 1 to that value and storing it back updates the count. The example below counts words (split on whitespace) instead of characters:
str1 = "good day, sunny day, good player, excellent player"
counts = dict()
words = str1.split()
for word in words:
    # get() returns 0 for a word seen for the first time
    counts[word] = counts.get(word, 0) + 1
print(counts)
Output:
{'good': 2, 'day,': 2, 'sunny': 1, 'player,': 1, 'excellent': 1, 'player': 1}
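In the word count, 'player,' and 'player' are counted separately because split() keeps the trailing comma. A minimal sketch, assuming punctuation should be ignored, strips it with string.punctuation before counting:

```python
import string

str1 = "good day, sunny day, good player, excellent player"
counts = {}
for word in str1.split():
    # Drop leading/trailing punctuation so 'player,' and 'player' match
    cleaned = word.strip(string.punctuation)
    counts[cleaned] = counts.get(cleaned, 0) + 1

print(counts)  # {'good': 2, 'day': 2, 'sunny': 1, 'player': 2, 'excellent': 1}
```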
Frequencies
Frequencies in pandas are composed of a base frequency and a multiplier. Base frequencies are typically referred to by a string alias, like 'M' for monthly or 'H' for hourly (pandas 2.2 and later prefer 'ME' and 'h'). For each base frequency there is a corresponding date offset object:
from pandas.tseries.offsets import Hour, Minute
hour = Hour(4)
hour
Output:
<4 * Hours>
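Offset objects of different units can be combined with +, and an offset can be added directly to a Timestamp. A short sketch using the Hour and Minute classes imported above; the variable names are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

# Offsets of different units add up to a single offset (150 minutes here)
combined = Hour(2) + Minute(30)
print(repr(combined))

# An offset can also be added directly to a Timestamp
stamp = pd.Timestamp('2000-01-01') + combined
print(stamp)  # 2000-01-01 02:30:00
```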
Frequencies
Putting an integer before the base frequency creates a multiple:
import pandas as pd
pd.date_range('1/1/2000', '1/3/2000 23:59', freq='4h')
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 04:00:00',
'2000-01-01 08:00:00', '2000-01-01 12:00:00',
'2000-01-01 16:00:00', '2000-01-01 20:00:00',
'2000-01-02 00:00:00', '2000-01-02 04:00:00',
'2000-01-02 08:00:00', '2000-01-02 12:00:00',
'2000-01-02 16:00:00', '2000-01-02 20:00:00',
'2000-01-03 00:00:00', '2000-01-03 04:00:00',
'2000-01-03 08:00:00', '2000-01-03 12:00:00',
'2000-01-03 16:00:00', '2000-01-03 20:00:00'],
dtype='datetime64[ns]', freq='4H')
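The string '4h' and the offset object Hour(4) describe the same frequency, so either can be passed as freq. A quick check, as a sketch:

```python
import pandas as pd
from pandas.tseries.offsets import Hour

by_string = pd.date_range('1/1/2000', '1/3/2000 23:59', freq='4h')
by_offset = pd.date_range('1/1/2000', '1/3/2000 23:59', freq=Hour(4))

print(by_string.equals(by_offset))  # True
print(len(by_offset))               # 18 (six 4-hour steps per day for 3 days)
```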
Frequencies and Date Offsets
Frequency strings can also combine units, such as '1h30min' for a 90-minute step:
pd.date_range('1/1/2000', periods=10, freq='1h30min')
DatetimeIndex(['2000-01-01 00:00:00',
'2000-01-01 01:30:00', '2000-01-01 03:00:00',
'2000-01-01 04:30:00', '2000-01-01 06:00:00',
'2000-01-01 07:30:00', '2000-01-01 09:00:00',
'2000-01-01 10:30:00', '2000-01-01 12:00:00',
'2000-01-01 13:30:00'],
dtype='datetime64[ns]', freq='90T')
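The combined string is parsed into a single 90-minute offset, which newer pandas versions display as '90min' rather than '90T'. A sketch using pandas' to_offset helper to confirm this:

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset
from pandas.tseries.offsets import Minute

# '1h30min' is parsed into one offset of 90 minutes
offset = to_offset('1h30min')
print(offset == Minute(90))  # True

idx = pd.date_range('1/1/2000', periods=10, freq='1h30min')
print(idx[1] - idx[0])       # 0 days 01:30:00
```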
Week of month dates
One useful frequency class is “week of month”, starting with WOM. This enables you
to get dates like the third Friday of each month:
rng = pd.date_range('1/1/2012', '9/1/2012', freq='WOM-3FRI')
rng
DatetimeIndex(['2012-01-20', '2012-02-17',
'2012-03-16', '2012-04-20', '2012-05-18', '2012-06-15',
'2012-07-20', '2012-08-17'],
dtype='datetime64[ns]', freq='WOM-3FRI')
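A quick sanity check of the 'WOM-3FRI' result: every date should be a Friday (weekday() returns 4 for Friday) and, being the third Friday, must fall between the 15th and the 21st of its month.

```python
import pandas as pd

rng = pd.date_range('1/1/2012', '9/1/2012', freq='WOM-3FRI')

# Third Fridays are always Fridays falling on day 15 to 21 of the month
print(all(d.weekday() == 4 for d in rng))   # True
print(all(15 <= d.day <= 21 for d in rng))  # True
print(len(rng))                             # 8 (January to August 2012)
```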
Shifting Data
The bitwise left shift operator in Python shifts the bits of the binary representation of the input number to the left by a specified number of places.
The empty bits created by the shift are filled with 0s. The syntax for the bitwise left shift is a << n; the right shift a >> n moves the bits to the right and discards the bits shifted out.
Shifting Data
# Right Shift
a = 10
a >> 1
OUTPUT
5
# Left Shift
a = 10
a << 2
OUTPUT
40
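Printing the values with bin() makes the bit movement visible: 10 is 0b1010, shifting right by 1 drops the last bit, and shifting left by 2 appends two zeros. For Python integers this is the same as floor-dividing or multiplying by 2**n.

```python
a = 10
print(bin(a))       # 0b1010
print(bin(a >> 1))  # 0b101    -> 5
print(bin(a << 2))  # 0b101000 -> 40

# Shifting by n multiplies or floor-divides by 2**n
print(a << 2 == a * 2**2)  # True
print(a >> 1 == a // 2**1) # True
```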
Shifting (Leading and Lagging) Data
“Shifting” refers to moving data backward and forward through time. Both Series and DataFrame have a shift method for doing naive shifts forward or backward, leaving the index unmodified.
import pandas as pd
import numpy as np
ts = pd.Series(np.random.randn(4),
index=pd.date_range('1/1/2000', periods=4, freq='M'))
ts
2000-01-31 0.203365
2000-02-29 -0.725373
2000-03-31 -1.002077
2000-04-30 -0.511180
Freq: M, dtype: float64
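With random data the effect of shift is hard to follow, so this sketch uses fixed, hypothetical values. A positive shift lags the data (values move later and NaN fills the start); a negative shift leads it. The index itself is unchanged. MonthEnd() is used in place of the 'M' alias, which pandas 2.2 renamed to 'ME'.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Fixed values instead of random ones so the output is reproducible
ts = pd.Series([1.0, 2.0, 3.0, 4.0],
               index=pd.date_range('1/1/2000', periods=4, freq=MonthEnd()))

lag = ts.shift(1)    # values move one step later; first entry becomes NaN
lead = ts.shift(-1)  # values move one step earlier; last entry becomes NaN

print(lag)
print(lead)
```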
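A common use of shift is computing percent changes in a time series or multiple time series as DataFrame columns.
np.random.randn draws new values each time the code is run, so the numbers below do not match the output above or each other.
print(ts)
print(ts.shift(2))
Output of print(ts):
2000-01-31 -0.466083
2000-02-29 -0.951154
2000-03-31 1.473405
2000-04-30 -0.771737
Output of print(ts.shift(2)):
2000-01-31 NaN
2000-02-29 NaN
2000-03-31 -0.466083
2000-04-30 -0.951154
Passing freq shifts the timestamps instead of the data, so no values are lost:
print(ts.shift(2, freq='M'))
2000-03-31 -1.483250
2000-04-30 0.651565
2000-05-31 -1.150464
2000-06-30 -0.521721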
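The percent-change use of shift mentioned above divides each value by the previous one. A minimal sketch with hypothetical prices chosen so the results are easy to check; pandas also provides Series.pct_change() for the same calculation.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Hypothetical prices chosen so the percent changes are easy to check
ts = pd.Series([1.0, 2.0, 3.0, 6.0],
               index=pd.date_range('1/1/2000', periods=4, freq=MonthEnd()))

# Each value divided by the previous one, minus 1
pct = ts / ts.shift(1) - 1
print(pct)  # NaN, 1.0, 0.5, 1.0
```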
Shifting dates with offsets

The pandas date offsets can also be used with datetime or Timestamp objects:
from datetime import datetime
from pandas.tseries.offsets import Day, MonthEnd

# pd.datetime was removed in pandas 2.0; use the standard library class
now = datetime(2011, 11, 17)
now + 3 * Day()
Timestamp('2011-11-20 00:00:00')
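MonthEnd, imported above, is an anchored offset: adding it rolls a date forward to the next month end, and its rollforward and rollback methods move a date to the nearest month end after or before it. A sketch with illustrative variable names:

```python
from datetime import datetime
from pandas.tseries.offsets import MonthEnd

now = datetime(2011, 11, 17)
offset = MonthEnd()

end1 = now + MonthEnd()         # next month end: 2011-11-30
end2 = now + MonthEnd(2)        # two month ends ahead: 2011-12-31
fwd = offset.rollforward(now)   # 2011-11-30
back = offset.rollback(now)     # 2011-10-31

print(end1, end2, fwd, back)
```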
Thank you