
Lesson outline

In this lesson you will learn how to read data, select subsets of it and generate useful plots,
using pandas and matplotlib. The documentation links below are for your reference.
 Read stock data from CSV files:
 pandas.DataFrame
 pandas.read_csv
 Select desired rows and columns:
 Indexing and Slicing Data
 Gotchas: Label-based slicing conventions
 Visualize data by generating plots:
 Plotting
 pandas.DataFrame.plot
 matplotlib.pyplot.plot
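
As a rough sketch of the read-and-select steps listed above: the CSV text here is made up for illustration and stands in for a real file (with a real file you would pass its path to pandas.read_csv instead of the in-memory buffer). Note the slicing gotcha: positional slices exclude the end, label-based slices include it.

```python
import io
import pandas as pd

# Hypothetical CSV contents (made-up values) standing in for a stock data file.
csv_text = """Date,Open,Close,Volume
2010-01-04,100.0,101.0,1000
2010-01-05,101.0,102.5,1200
2010-01-06,102.5,101.5,900
2010-01-07,101.5,103.0,1100
"""

# Use the Date column as the index and parse it into real timestamps.
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

close = df["Close"]                 # one column -> Series
subset = df[["Open", "Close"]]      # list of columns -> DataFrame

first_two = df.iloc[0:2]            # positional slice: end position is excluded
by_label = df.loc["2010-01-05":"2010-01-06"]  # label slice: both ends included

print(close.max())
print(by_label)
# With matplotlib installed, df["Close"].plot() would draw this column.
```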



Lesson outline
Here's an overview of what you'll learn to do in this lesson. Documentation links are for
reference.

Read in multiple stocks:


 Create an empty pandas.DataFrame with dates as index: pandas.date_range
 Drop missing date rows: pandas.DataFrame.dropna
 Incrementally join data for each stock: pandas.DataFrame.join

Manipulate stock data:


 Index and select data by row (dates) and column (symbols)
 Plot multiple stocks at once (still using pandas.DataFrame.plot)
 Carry out arithmetic operations across stocks
How to load:
 Start with an empty target dataframe (df) indexed by the dates of interest.
 Read each company's stock data individually. SPY is the reference for many things; for
example, it indicates the days on which the market traded.
 Join the target df with SPY on the dates, then join the result with each of the other
stocks.
 The series of joins leaves the subset of dates that have data.


Lesson summary

To read multiple stocks into a single dataframe, you need to:


 Specify a set of dates using pandas.date_range
 Create an empty dataframe with dates as index
 This helps align stock data and orders it by trading date
 Read in a reference stock (here SPY) and drop non-trading days
using pandas.DataFrame.dropna
 Incrementally join dataframes using pandas.DataFrame.join
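
The steps above can be sketched as follows. This is a minimal illustration, not the lesson's own code: the prices are made up, and the small in-memory frames stand in for what pandas.read_csv would return for each symbol.

```python
import pandas as pd

# 1. Specify the dates of interest (six calendar days, weekend included).
dates = pd.date_range("2010-01-01", "2010-01-06")

# 2. Empty dataframe with those dates as index.
df = pd.DataFrame(index=dates)

# Hypothetical per-stock data; in practice each frame comes from a CSV file.
trading_days = pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06"])
stock_frames = {
    "SPY": pd.DataFrame({"SPY": [113.3, 113.6, 113.7]}, index=trading_days),
    "IBM": pd.DataFrame({"IBM": [132.4, 130.9, 130.0]}, index=trading_days),
    "GLD": pd.DataFrame({"GLD": [109.8, 109.7, 111.5]}, index=trading_days),
}

# 3. Join the reference stock first, then drop the days it did not trade.
df = df.join(stock_frames["SPY"])      # left join: all dates kept, NaN where missing
df = df.dropna(subset=["SPY"])

# 4. Incrementally join the remaining stocks.
for symbol in ["IBM", "GLD"]:
    df = df.join(stock_frames[symbol])

print(df)
```

DataFrame.join defaults to a left join, so the dates already in df decide which rows survive each step.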

Once you have multiple stocks, you can:


 Select a subset of stocks by ticker symbols
 Slice by row (dates) and column (symbols)
 Plot multiple stocks at once (still using pandas.DataFrame.plot)
 Carry out arithmetic operations across stocks, e.g. normalize by the first day's price
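
A small sketch of these operations, assuming a dataframe with one column per symbol and dates as index (the values are made up for illustration):

```python
import pandas as pd

dates = pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07"])
df = pd.DataFrame(
    {
        "SPY": [100.0, 102.0, 101.0, 104.0],   # hypothetical prices
        "IBM": [50.0, 51.0, 49.0, 52.5],
        "GLD": [200.0, 198.0, 202.0, 204.0],
    },
    index=dates,
)

# Select a subset of stocks by ticker symbol.
two_stocks = df[["SPY", "IBM"]]

# Slice by row (dates) and column (symbols) in one step.
window = df.loc["2010-01-05":"2010-01-06", ["IBM", "GLD"]]

# Normalize by the first day's price so every stock starts at 1.0.
normalized = df / df.iloc[0]

print(window)
print(normalized)
# With matplotlib installed, normalized.plot() would draw all stocks on one chart.
```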

Numpy:
 numpy.ndarray.shape: Dimensions (height, width, ...)
 numpy.ndarray.ndim: No. of dimensions = len(shape)
 numpy.ndarray.size: Total number of elements
 numpy.ndarray.dtype: Datatype
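
For illustration, these attributes on a small array (the 5x4 shape is an arbitrary choice):

```python
import numpy as np

a = np.zeros((5, 4))    # 5 rows, 4 columns, float64 by default

print(a.shape)          # (5, 4)
print(a.ndim)           # 2
print(a.size)           # 20
print(a.dtype)          # float64
```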

 numpy.sum: Sum of elements - along rows, columns or all


 numpy.min, numpy.max, numpy.mean: Simple statistics
Also: numpy.random.seed to (re)set the random number generator.

axis=0 works down the rows and gives one result per column; axis=1 works across the
columns and gives one result per row.
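
A short sketch of the axis argument and of reseeding, using a small hand-written array so the results are easy to check (the seed value is arbitrary):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.sum())           # 21, all elements
print(a.sum(axis=0))     # [5 7 9], one sum per column
print(a.sum(axis=1))     # [ 6 15], one sum per row
print(a.min(axis=0))     # [1 2 3]
print(a.max(axis=1))     # [3 6]
print(a.mean())          # 3.5

# Reseeding the generator reproduces the same "random" numbers.
np.random.seed(693)
first = np.random.randint(0, 10, size=5)
np.random.seed(693)
second = np.random.randint(0, 10, size=5)
print(np.array_equal(first, second))   # True
```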


.mean() -

Documentation:
 time.time: Current time in seconds (float value)
 timeit: Average execution time measurement
 profile: Code profiling

IPython "magics":
 %time: How long a statement takes to run once
 %timeit: Averaged over multiple runs
 %prun/%lprun: Per-function/line profiling
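
Outside IPython, the same measurements can be approximated with the standard library. The sketch below assumes a hypothetical helper, manual_mean, written only to have something slow to compare against numpy's built-in mean; the array size and repeat count are arbitrary.

```python
import time
import timeit
import numpy as np

def manual_mean(arr):
    """Mean of a 2D array using explicit Python loops (hypothetical helper)."""
    total = 0.0
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            total += arr[i, j]
    return total / arr.size

np.random.seed(0)
nd = np.random.random((200, 200))

# time.time: take a timestamp before and after a single run.
start = time.time()
loop_result = manual_mean(nd)
loop_seconds = time.time() - start

# timeit: average over many runs for a steadier estimate.
runs = 100
numpy_seconds = timeit.timeit(lambda: nd.mean(), number=runs) / runs
numpy_result = nd.mean()

print("loop: ", loop_result, loop_seconds, "seconds")
print("numpy:", numpy_result, numpy_seconds, "seconds per run")
```

The two results should agree to within floating-point error; the timings vary by machine.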

Access a NumPy array element with [row, col]; indexing starts at 0.


You can also index an array with another array (or list) of indices.

You can also select the values that meet a condition by indexing with a boolean mask.
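
The three access styles can be sketched on small hand-written arrays (values chosen only so the results are easy to read):

```python
import numpy as np

a = np.array([[10, 11, 12, 13],
              [20, 21, 22, 23],
              [30, 31, 32, 33]])

# Single element by [row, col], counting from 0.
element = a[1, 2]                 # 22

# Slices work on both axes; the end index is excluded.
block = a[0:2, 1:3]               # [[11 12] [21 22]]

# Indexing with an array of indices (repeats are allowed).
b = np.array([5, 6, 7, 8, 9])
indices = np.array([1, 1, 2, 3])
picked = b[indices]               # [6 6 7 8]

# Boolean mask: keep only the values below the mean (21.5 here).
below_mean = a[a < a.mean()]      # [10 11 12 13 20 21]

print(element, block, picked, below_mean, sep="\n")
```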

Arithmetic operations on arrays are element-wise. For matrix operations, use the relevant
functions:
 numpy.add: Element-wise addition, same as + operator
 numpy.subtract: Element-wise subtraction, same as -
 numpy.multiply: Element-wise multiplication, same as *
 numpy.divide: Element-wise division, same as /
 numpy.dot: Dot product (1D arrays), matrix multiplication (2D)
Note: Arrays need to be compatible with each other for these operations to work
(see: Broadcasting).
For more matrix operations, see: Linear algebra and the matrix class.
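
To make the difference between element-wise and matrix operations concrete, here is a small sketch with hand-written 2x2 arrays:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[10, 20],
              [30, 40]])

added = a + b              # same as np.add(a, b):      [[11 22] [33 44]]
multiplied = a * b         # same as np.multiply(a, b): [[10 40] [90 160]]
divided = b / a            # same as np.divide(b, a):   [[10. 10.] [10. 10.]]

matrix_product = np.dot(a, b)                 # [[70 100] [150 220]]
dot_1d = np.dot(np.array([1, 2, 3]),
                np.array([4, 5, 6]))          # 1*4 + 2*5 + 3*6 = 32

# Broadcasting: the 1D array is applied to every row of a.
shifted = a + np.array([100, 200])            # [[101 202] [103 204]]

print(added, multiplied, divided, matrix_product, dot_1d, shifted, sep="\n")
```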
