
Datasets for Machine Learning


scikit-learn
Machine Learning in Python
● Simple and efficient tools for data mining and
data analysis
● Accessible to everybody, and reusable in
various contexts
● Built on NumPy, SciPy, and matplotlib
● Open source, commercially usable - BSD
license

Scikit-learn Dataset loading utilities
http://scikit-learn.org/stable/user_guide.html

5.1. General dataset API
5.2. Toy datasets
5.3. Sample images
5.4. Sample generators
5.5. Datasets in svmlight / libsvm format
5.6. Loading from external datasets
5.7. The Olivetti faces dataset
5.8. The 20 newsgroups text dataset
5.9. Downloading datasets from the mldata.org repository
5.10. The Labeled Faces in the Wild face recognition dataset
5.11. Forest covertypes
5.12. RCV1 dataset
5.13. Boston House Prices dataset
5.14. Breast Cancer Wisconsin (Diagnostic) Database
5.15. Diabetes dataset
5.16. Optical Recognition of Handwritten Digits Data Set
5.17. Iris Plants Database
5.18. Linnerrud dataset
Toy datasets (from SciKit-Learn)
scikit-learn comes with a few small standard datasets that do not require
downloading any files from an external website.

load_boston([return_X_y])
Load and return the boston house-prices dataset (regression).
load_iris([return_X_y])
Load and return the iris dataset (classification).
load_diabetes([return_X_y])
Load and return the diabetes dataset (regression).
load_digits([n_class, return_X_y])
Load and return the digits dataset (classification).
load_linnerud([return_X_y])
Load and return the linnerud dataset (multivariate regression).
Loading Data from SK-learn
from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()
Sample images
scikit-learn also embeds a couple of sample JPEG images.
Those images can be useful for testing algorithms and
pipelines on 2D data.
from sklearn.datasets import load_sample_image
load_sample_images()
Load sample images for image manipulation.
load_sample_image(image_name)
Load the numpy array of a single sample image
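For example (a small sketch; "china.jpg" is one of the two images bundled with scikit-learn, and the printed values are indicative):

from sklearn.datasets import load_sample_image

china = load_sample_image("china.jpg")  # returns the image as a NumPy array
print(china.shape)   # e.g. (427, 640, 3): height, width, RGB channels
print(china.dtype)   # uint8 pixel values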
Hand-written Digits Dataset
Number of Instances: 5620
Number of Attributes: 64
Attribute Information:
8x8 images of integer pixels
in the range 0..16 (i.e., only 17 gray levels)
Hand-written Digits Dataset
Loading Input data and target labels
digits.data gives access to the features that can be
used to classify the digits samples:
>>> print(digits.data)
[[ 0. 0. 5. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 10. 0. 0.]
[ 0. 0. 0. ..., 16. 9. 0.]
...,
[ 0. 0. 1. ..., 6. 0. 0.]
[ 0. 0. 2. ..., 12. 0. 0.]
[ 0. 0. 10. ..., 12. 1. 0.]]
Loading Input data and target labels

digits.target gives the ground truth for the digits
dataset, that is, the number corresponding to each
digit image that we are trying to learn:

>>> digits.target
array([0, 1, 2, ..., 8, 9, 8])
Shape of the data arrays
The data is a 2D array, shape (n_samples, n_features).
In the case of the digits, each original sample is an
image of shape (8, 8) and can be accessed using:

>>> digits.images[0]
array([[ 0., 0., 5., 13., 9., 1., 0., 0.],
[ 0., 0., 13., 15., 10., 15., 5., 0.],
[ 0., 3., 15., 2., 0., 11., 8., 0.],
[ 0., 4., 12., 0., 0., 8., 8., 0.],
[ 0., 5., 8., 0., 0., 9., 8., 0.],
[ 0., 4., 11., 0., 1., 12., 7., 0.],
[ 0., 2., 14., 5., 10., 12., 0., 0.],
[ 0., 0., 6., 13., 10., 0., 0., 0.]])
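In other words, digits.data is just the flattened version of digits.images. A small sketch to verify this relationship (illustrative, not part of the original slides):

from sklearn.datasets import load_digits
import numpy as np

digits = load_digits()
print(digits.images.shape)                  # (n_samples, 8, 8)
print(digits.data.shape)                    # (n_samples, 64)
flat = digits.images.reshape((len(digits.images), -1))
print(np.array_equal(flat, digits.data))    # True: data is the flattened images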
Iris Plants Database
Data Set Characteristics:
Number of Instances: 150 (50 in each of three classes)
Number of Attributes: 4 numeric, predictive attributes and the class
Attribute Information:
sepal length in cm
sepal width in cm
petal length in cm
petal width in cm
class:
Iris-Setosa
Iris-Versicolour
Iris-Virginica
Class Distribution: 33.3% for each of 3 classes.
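These characteristics can be checked directly on the bundled copy, e.g. (a short sketch; the comments show the expected output):

from sklearn import datasets
import numpy as np

iris = datasets.load_iris()
print(iris.data.shape)           # (150, 4): 150 samples, 4 numeric attributes
print(np.bincount(iris.target))  # [50 50 50]: 50 samples in each of the 3 classes
print(iris.target_names)         # the three iris class names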
Iris Plants Database
sepal length (cm), sepal width (cm), petal length (cm), petal width (cm), class
4.5,2.3,1.3,0.3,Iris-setosa
4.4,3.2,1.3,0.2,Iris-setosa
5.0,3.5,1.6,0.6,Iris-setosa
5.1,3.8,1.9,0.4,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
6.7,2.5,5.8,1.8,Iris-virginica
7.2,3.6,6.1,2.5,Iris-virginica
6.5,3.2,5.1,2.0,Iris-virginica
6.4,2.7,5.3,1.9,Iris-virginica
Important Notes
Before Starting to Program
Variables initialization with
Random Values
tf.random_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32,
seed=None, name=None)
Outputs random values from a normal distribution.

tf.truncated_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32,
seed=None, name=None)
The generated values follow a normal distribution with the specified
mean and standard deviation, except that values whose magnitude is
more than two standard deviations from the mean are dropped
and re-picked.
Variables initialization with
Random Values
tf.random_uniform(shape, minval=0, maxval=None,
dtype=tf.float32, seed=None, name=None)

dtype: The type of the output: float32, float64, int32, or int64.
For floats, the default range is [0, 1).
For ints, at least maxval must be specified explicitly.

The generated values follow a uniform distribution in the range
[minval, maxval). The lower bound minval is included in the
range, while the upper bound maxval is excluded.
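As a quick illustration (a minimal TensorFlow 1.x sketch; the shapes and ranges are arbitrary examples):

import tensorflow as tf

# Normal distribution with mean 0.0 and stddev 1.0
n = tf.random_normal([2, 3], mean=0.0, stddev=1.0, seed=1)
# Truncated normal: values more than 2 stddev from the mean are dropped and re-picked
t = tf.truncated_normal([2, 3], mean=0.0, stddev=1.0, seed=1)
# Uniform distribution over [0, 10)
u = tf.random_uniform([2, 3], minval=0, maxval=10, seed=1)

with tf.Session() as sess:
    print(sess.run(n))
    print(sess.run(t))
    print(sess.run(u))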
Generation of same repeatable
sequence
To generate the same repeatable sequence for an op across sessions, set the
seed for the op:

a = tf.random_uniform([1], seed=1)
b = tf.random_normal([1])
# Repeatedly running this block with the same graph will generate the same
# sequence of values for 'a', but different sequences of values for 'b'.
print("Session 1")
with tf.Session() as sess1: Output
print(sess1.run(a)) # generates 'A1' Session 1
print(sess1.run(a)) # generates 'A2'
[ 0.23903739]
print(sess1.run(b)) # generates 'B1'
[ 0.22267115]
[-0.56301004]
print(sess1.run(b)) # generates 'B2'
[-0.97901398]
print("Session 2")
with tf.Session() as sess2: Session 2
print(sess2.run(a)) # generates 'A1' [ 0.23903739]
print(sess2.run(a)) # generates 'A2' [ 0.22267115]
print(sess2.run(b)) # generates 'B3' [ 1.26448703]
print(sess2.run(b)) # generates 'B4' [-0.76988888]
Generation of same repeatable
sequence
To make the random sequences generated by all ops be repeatable across sessions, set a graph-level seed:

tf.set_random_seed(1234)
a = tf.random_uniform([1])
b = tf.random_normal([1])
# Repeatedly running this block with the same graph will generate the same
# sequences of values for 'a' and 'b'.
print("Session 1")
with tf.Session() as sess1:
    print(sess1.run(a))  # generates 'A1'
    print(sess1.run(a))  # generates 'A2'
    print(sess1.run(b))  # generates 'B1'
    print(sess1.run(b))  # generates 'B2'

print("Session 2")
with tf.Session() as sess2:
    print(sess2.run(a))  # generates 'A1'
    print(sess2.run(a))  # generates 'A2'
    print(sess2.run(b))  # generates 'B1'
    print(sess2.run(b))  # generates 'B2'

Output:
Session 1
[ 0.93559742]
[ 0.87699151]
[ 2.46717691]
[ 1.58331776]
Session 2
[ 0.93559742]
[ 0.87699151]
[ 2.46717691]
[ 1.58331776]
Randomly shuffles a tensor along its
first dimension.
tf.random_shuffle(value, seed=None, name=None)

>>> import numpy as np
>>> import tensorflow as tf
>>> sess = tf.Session()
>>> c = tf.constant([[1, 2], [3, 4], [5, 6]])
>>> shuff = tf.random_shuffle(c)
>>> sess.run(shuff)
>>> sess.run(shuff)
>>> sess.run(shuff)

Output:
Shuffle 1
array([[1, 2],
       [5, 6],
       [3, 4]], dtype=int32)
Shuffle 2
array([[1, 2],
       [3, 4],
       [5, 6]], dtype=int32)
Shuffle 3
array([[5, 6],
       [3, 4],
       [1, 2]], dtype=int32)
Split Data into
Training and Testing
train_test_split(*arrays, **options)

Arrays
lists, NumPy arrays, and SciPy sparse matrices are allowed

Options
test_size : float, int, or None (default is None)
train_size : float, int, or None (default is None)
If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test/train split.
If int, represents the absolute number of test/train samples.
If None, the value is automatically set to the complement of the other size.
If both train_size and test_size are None, test_size is set to 0.25.

random_state : int (the seed of the random number generator).
If None, the split will differ on each run.
Split Data into
Training and Testing
from sklearn.model_selection import train_test_split
import numpy as np
X, y = np.arange(24).reshape((8, 3)), range(8)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

Input
>>> X
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17],
       [18, 19, 20],
       [21, 22, 23]])
>>> y
[0, 1, 2, 3, 4, 5, 6, 7]

Output
Training Data
>>> X_train
array([[21, 22, 23],
       [ 6,  7,  8],
       [12, 13, 14],
       [ 9, 10, 11],
       [18, 19, 20]])
>>> y_train
[7, 2, 4, 3, 6]

Test Data
>>> X_test
array([[ 3,  4,  5],
       [15, 16, 17],
       [ 0,  1,  2]])
>>> y_test
[1, 5, 0]

Note: if repeated, you will get the same X_train and X_test samples in the same
order (as the random state is fixed). To get a different random split on every
run, do not set random_state.
Conversion from 1-D Target vector to
ONE-HOT vector
Traditional target vector (labels), assuming three classes 0, 1, 2:

Target = [0, 0, 0, 1, 0, 2, 2]

ONE-HOT target =
[[1 0 0]
 [1 0 0]
 [1 0 0]
 [0 1 0]
 [1 0 0]
 [0 0 1]
 [0 0 1]]
Conversion from 1-D Target vector to
ONE-HOT vector
To convert from target to one-hot:

p = np.eye(3)
one_hot = p[target]

Input (Target): [0, 0, 0, 1, 0, 2, 2]

Output (one_hot):
[[1 0 0]
 [1 0 0]
 [1 0 0]
 [0 1 0]
 [1 0 0]
 [0 0 1]
 [0 0 1]]

Note:
p[0] = [1 0 0]
p[1] = [0 1 0]
p[2] = [0 0 1]
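As a runnable illustration of this trick (a minimal NumPy sketch; the variable names are illustrative):

import numpy as np

target = np.array([0, 0, 0, 1, 0, 2, 2])  # traditional labels
num_labels = len(np.unique(target))       # 3 classes
p = np.eye(num_labels)                    # p[k] is the one-hot row for class k
one_hot = p[target]                       # fancy indexing picks one row per label
print(one_hot)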
Build Neural Network
Steps to Build a Neural Network (Example)

1) Get the data set.
2) Read the data set as inputs and targets.
3) Shape the inputs as an m x n array, where m is the number of samples and n is the
length of the feature vector.
4) Shape the targets as one-HOT vectors. Each target is Q x 1, where Q is the
number of classes; all values are "zero" except the value corresponding
to the proper class, which equals "one".
5) Select RANDOMLY a percentage of the data set samples to be training
data and the rest as test data.
6) Define the architecture of the neural network.
7) Initialize random weights for the NN (Uniform / Normal: define min & max for
Uniform, mean & SD for Normal) and define the randomness seed
(operation-level or graph-level).
Neural Network
Define Import Libraries
import tensorflow as tf

import numpy as np

from sklearn import datasets

from sklearn.model_selection import train_test_split


Weight Initialization Function
def init_weights(shape):
    """ Weight initialization """
    weights = tf.random_normal(shape, stddev=0.1)
    return tf.Variable(weights)

shape defines the dimensions of the weight matrix.

This function returns a TensorFlow variable
initialized from a normal distribution.

tf.set_random_seed(1234)
Remember to call this function (outside the function definition) to make the initialization repeatable.
Read Data from Dataset
def get_iris_data():
    """ Read the iris data set and split it into training and test sets """
    iris = datasets.load_iris()
    data = iris["data"]
    target = iris["target"]

Remember:
● data and target still need to be split into training
and testing samples
● target is NOT one-HOT (it requires conversion)
Add Bias to Input Data
Convert target to one-hot
    # Prepend the column of 1s for bias
    N, M = data.shape
    all_X = np.ones((N, M + 1))
    all_X[:, 1:] = data

    # Convert the targets into one-hot vectors
    num_labels = len(np.unique(target))
    all_Y = np.eye(num_labels)[target]
    return train_test_split(all_X, all_Y, test_size=0.33, random_state=1234)

Notes:
● Use a fixed random_state in order to get the same
training and testing samples on each run of the program.
● 2/3 of the data is used for training and 1/3 for testing.
● all_X and all_Y are returned as NumPy arrays (not
TensorFlow variables), so placeholders will be required to feed them in.
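The later slides use train_X, test_X, train_y, and test_y without showing the call; a one-line sketch of how they would be obtained (the unpacking order follows train_test_split):

train_X, test_X, train_y, test_y = get_iris_data()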


Feed forward
to Calculate Outputs
def forwardprop(X, w_1, w_2):
    h = tf.nn.sigmoid(tf.matmul(X, w_1))  # The \sigma function
    yhat = tf.matmul(h, w_2)
    return yhat

Notes:
● yhat is not passed through softmax here, since TensorFlow's
softmax_cross_entropy_with_logits() does that internally.
● "h" is the output of the hidden layer (calculated as X · W_1, then
squashed using the sigmoid activation function).
● "yhat" is the output of the output layer (calculated as h · W_2).
NN Architecture
x_size = train_X.shape[1] # Number of input nodes: 4 features and 1 bias
h_size = 256 # Number of hidden nodes
y_size = train_y.shape[1] # Number of outcomes (3 iris flowers)

# Symbols
X = tf.placeholder("float", shape=[None, x_size])
y = tf.placeholder("float", shape=[None, y_size])

Notes:
● As train_X, train_y, test_X, and test_y are NumPy variables, placeholders are
needed to receive their values and pass them to TensorFlow operations.
● W_1 should have size x_size * h_size.
● W_2 should have size h_size * y_size.
(A short sketch wiring these together follows below.)
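Putting the notes above together, a minimal sketch (reusing init_weights and forwardprop from the earlier slides) for creating the weights and the forward pass:

# Weight initializations (sizes as in the notes above)
w_1 = init_weights((x_size, h_size))  # input layer  -> hidden layer
w_2 = init_weights((h_size, y_size))  # hidden layer -> output layer

# Forward propagation: per-class scores for each input row
yhat = forwardprop(X, w_1, w_2)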
Backward Propagation
(Training)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=yhat, labels=y))
updates = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

Notes:
● The cost is calculated by performing "Softmax" on "yhat" before calculating the
LOSS, so there was no need to define "Softmax" in the output layer.
● There are different ways to calculate the loss (other than cross-entropy).
● There are different ways to minimize the loss other than Gradient Descent.
● Refer to the TensorFlow documentation for other methods.
Calculating Output Accuracy
predict = tf.argmax(yhat, dimension=1)

train_accuracy = np.mean(np.argmax(train_y, axis=1) ==
                         sess.run(predict, feed_dict={X: train_X, y: train_y}))
test_accuracy = np.mean(np.argmax(test_y, axis=1) ==
                        sess.run(predict, feed_dict={X: test_X, y: test_y}))

Remember:
● "yhat" holds one score per class, and the train_y / test_y labels are one-HOT
vectors; argmax converts both back to traditional class labels before comparing.
● "predict" converts "yhat" into traditional labels (the index of the largest score).
● "predict" requires "yhat", which requires train_X and train_y (or test_X and
test_y). None of these are TensorFlow variables, so it is required to use feed_dict
with the placeholders.
Epochs

Remember:
● Each epoch requires UPDATING the weights once per training sample, then
calculating the accuracy ONCE (see the training-loop sketch below).
● For better results the system should iterate a suitable number of epochs: not too
few (bad training), not too many (memorizing instead of learning).
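A minimal sketch of such a training loop, assuming the placeholders, updates, predict, and data splits defined on the earlier slides (the epoch count and per-sample update scheme are illustrative, not prescribed by the slides):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(100):
        # Update the weights once per training sample
        for i in range(len(train_X)):
            sess.run(updates, feed_dict={X: train_X[i: i + 1], y: train_y[i: i + 1]})
        # Calculate the accuracy ONCE per epoch
        train_accuracy = np.mean(np.argmax(train_y, axis=1) ==
                                 sess.run(predict, feed_dict={X: train_X, y: train_y}))
        test_accuracy = np.mean(np.argmax(test_y, axis=1) ==
                                sess.run(predict, feed_dict={X: test_X, y: test_y}))
        print("Epoch = %d, train accuracy = %.2f%%, test accuracy = %.2f%%"
              % (epoch + 1, 100. * train_accuracy, 100. * test_accuracy))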
