
corrplot / seaborn heatmap for a correlation heat map

KDE (kernel density estimation), e.g. with a box kernel, to estimate a probability density function non-parametrically


density estimate, normalized by the number of points $n$ and the kernel bandwidth $h$, summed over all $x_i$:
$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$
its expectation value is a convolution of the kernel with the true density:
$E[\hat{f}_h(x)] = \frac{1}{h} \int K\!\left(\frac{x - x'}{h}\right) p(x')\, dx'$
the kernel sets the shape of the estimated distribution
Gaussian kernel: $K(u) = (2\pi)^{-1/2} e^{-u^2/2}$ (multivariate: $(2\pi)^{-d/2} e^{-\frac{1}{2} u^T u}$)
the estimate is a sum of Gaussians surrounding each data point
$h$ is the bandwidth for KDE
univariate case: changing the bandwidth changes the smoothness of the estimate
MISE (mean integrated squared error): $\mathrm{MISE}(h) = E\!\left[\int \left(\hat{f}_h(x) - f(x)\right)^2 dx\right]$
AMISE (asymptotic MISE): drops the integral into a closed form, decomposed as squared bias + variance + an $\epsilon^2$ error term
KL divergence between two pdfs: $D_{\mathrm{KL}}(f \,\|\, g) = \int f(x) \log \frac{f(x)}{g(x)}\, dx$; it is asymmetric
Scott's rule: with $\Sigma$ the covariance matrix of the $d$-dimensional data, bandwidth $H = n^{-1/(d+4)}\, \Sigma^{1/2}$
Silverman's rule of thumb: $h = \left(\frac{4\hat{\sigma}^5}{3n}\right)^{1/5}$, with $\hat{\sigma}$ the standard deviation
both rules are used to find a near-optimal bandwidth
Gaussian kernel is the usual choice for KDE
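A minimal sketch of this in scipy, which implements both bandwidth rules (the sample data is made up):

# Gaussian KDE with Scott's and Silverman's bandwidth rules;
# the estimate is a sum of Gaussians centred on each data point.
import numpy as np
from scipy.stats import gaussian_kde

x = np.random.normal(size=200)                    # toy sample
kde_scott = gaussian_kde(x, bw_method='scott')
kde_silverman = gaussian_kde(x, bw_method='silverman')

grid = np.linspace(-4, 4, 100)
density = kde_scott(grid)                         # evaluate the estimate on a grid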
____________
multivariate case: seaborn jointplot for two variables (e.g. x and x̄)
clusters show up in e.g. the joint KDE plot
jointplot can also fit a regression (kind='reg')
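A minimal seaborn sketch of a joint KDE plot (the iris columns stand in for the two variables):

import seaborn as sns

df = sns.load_dataset('iris')
# bivariate KDE; clusters appear as separate density blobs
sns.jointplot(data=df, x='sepal_length', y='petal_length', kind='kde')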
pattern classification using Bayes
______________________________
NLP: two libraries; Coursera course (V.G. Vinod Vydiswaran)
textual entailment: a relation between two sentences; WordNet
Lin similarity over the WordNet trees
MD = modal auxiliary (can, cannot) in the Penn Treebank tagset
$\mathrm{LinSim}(u, v) = \frac{2 \log P(\mathrm{LCS}(u, v))}{\log P(u) + \log P(v)}$, where LCS is the least common subsumer of $u$ and $v$
wn.synset('word.n.01'): noun, first meaning
Lin similarity uses corpus information content, unlike path similarity
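A minimal NLTK sketch of Lin similarity, assuming the wordnet and wordnet_ic corpora have been downloaded:

from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')   # information content from the Brown corpus
deer = wn.synset('deer.n.01')              # noun, first sense
elk = wn.synset('elk.n.01')
print(deer.lin_similarity(elk, brown_ic))  # 2 log P(LCS) / (log P(u) + log P(v))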
similarity functions for collocations, plus a frequency filter
NLTK library:
Porter stemmer
lemmatizer
tokenizer
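A minimal sketch of those three NLTK tools (assumes the punkt and wordnet data are downloaded):

from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer

tokens = word_tokenize("The cats were running")               # tokenize
stems = [PorterStemmer().stem(t) for t in tokens]             # 'running' -> 'run'
lemmas = [WordNetLemmatizer().lemmatize(t) for t in tokens]   # 'cats' -> 'cat'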
_______
for topics: probabilistic latent semantic analysis (pLSA)
latent Dirichlet allocation (LDA); text-mining example: a Harry Potter text-clustering problem
LDA pipeline:
tokenize, lowercase everything
stop-word removal after tokenization
stemming to the root form, as we saw before
after preprocessing, build a dictionary data structure (word-to-id mapping)
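A minimal gensim sketch of the tail end of that pipeline (the toy tokenized documents stand in for the Harry Potter text):

from gensim import corpora, models

docs = [["harry", "wand", "spell"], ["quidditch", "broom", "harry"]]
dictionary = corpora.Dictionary(docs)            # word <-> id mapping
corpus = [dictionary.doc2bow(d) for d in docs]   # bag-of-words counts
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary)
print(lda.print_topics())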
named entity recognition: boundary detection, then classifying the entity type (tagging); used in question answering

_____________________________
Naive Bayes in text mining
features and labels
example: a query containing "python" classified as CS:
$P(\mathrm{CS} \mid \text{python}) = \frac{P(\mathrm{CS}) \cdot P(\text{python} \mid \mathrm{CS})}{P(\text{python})}$ (prior × likelihood / evidence)
Laplace smoothing: add 1 to every count and n dummy counts to the denominator
two variants: multinomial, and multivariate Bernoulli with binary features
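A minimal scikit-learn sketch (toy documents; alpha=1.0 is the Laplace smoothing above):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["python is great for cs", "the weather is nice today"]
labels = ["cs", "other"]
vect = CountVectorizer()
X = vect.fit_transform(docs)                 # word -> token counts
clf = MultinomialNB(alpha=1.0)               # add-one (Laplace) smoothing
clf.fit(X, labels)
print(clf.predict(vect.transform(["python code"])))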
SVM: maximum-margin classifiers; the support vectors define the boundary
support vector classifiers using scikit-learn's sklearn.svm
scikit-learn micro/macro averaging for multi-class scores
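A minimal sketch of both notes together (iris as toy data):

from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel='linear').fit(X_tr, y_tr)   # support vector classifier
pred = clf.predict(X_te)
print(f1_score(y_te, pred, average='micro')) # pool all classes
print(f1_score(y_te, pred, average='macro')) # unweighted mean over classes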
____________________________________
autoencoders: dimensionality reduction at the bottleneck ('neck') layer
LSTM: long short-term memory units, with input/output gates and feedback (recurrent) memory
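A minimal Keras sketch of the autoencoder idea (the 32 -> 2 -> 32 layer sizes are assumptions):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(2, activation='relu', input_shape=(32,)),  # bottleneck 'neck'
    Dense(32, activation='sigmoid'),                 # reconstruct the input
])
model.compile(optimizer='adam', loss='mse')
model.summary()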
_______________________________________________
data cleaning; linear distances come in three types, e.g. taxicab (L1)
nonlinear: kernel methods, ANNs (neural nets)
feature selection: sampling interval (per 60 min), distance sampled over time
numpy unique() for the distinct driver IDs
scatter plot of the sampling intervals
visually choose a 60-minute time-series window
and 4 out of the 24 hourly features
next: a histogram with frequency along the y-axis (the previous y-axis variable is now on the x-axis)
next: per-person-ID histograms
for distinguishing distinct traces
which feature has more discriminating power?
principal component analysis (see the sketch below)
feature scaling first
PCA reduces the dimensions, not the number of points
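A minimal sketch of scaling followed by PCA (the 24 hourly features are simulated):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 24)                   # toy per-driver hourly features
X_scaled = StandardScaler().fit_transform(X)  # scale features before PCA
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)            # fewer dimensions, same n points
print(pca.explained_variance_ratio_)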
_____
How to distinguish patterns in univariate or 2-D data
from the sampling time/frequency histogram, make a scatter plot
output rendered with matplotlib
scatter plot with a restricted region
scatter plot of people's location patterns
using person events keyed by driver ID
for every person
if two separate clusters appear, scale both up/down
use the location patterns to build a model
distances as features, persons as classes
find the event group using Euclidean distance
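A minimal sketch of that last step: assign each event to the nearest person's location centroid by Euclidean distance (the coordinates are made up):

import numpy as np

events = np.array([[1.0, 2.0], [8.0, 9.0], [1.5, 2.5]])   # event locations
centroids = {'driver_a': np.array([1.2, 2.1]),
             'driver_b': np.array([8.5, 8.8])}

for e in events:
    dists = {p: np.linalg.norm(e - c) for p, c in centroids.items()}
    print(e, '->', min(dists, key=dists.get))             # nearest person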
________
validation
risk and loss functions
squared / absolute / Lp loss
KL loss between the two pdfs: $\int f(x; \theta) \log \frac{f(x; \theta)}{f(x; \hat{\theta})}\, dx$
risk is the expected loss: $R(\theta, \hat{\theta}) = E[L(\theta, \hat{\theta})]$,
where $\theta$ is the true value of the parameter
and $\hat{\theta}$ the estimated value of the parameter
bias, variance, and complexity depend on the features
the cost on the training and test sets
is based on the model complexity of the features
generalization bound: with probability $1 - \delta$, test error $\le$ training error $+$ a term that grows with the model complexity and the number of features (the test error stays inside this upper bound with confidence $1 - \delta$)
for linear classifiers the complexity is $h = (\text{number of features}) + 1$
confusion matrix: true/false positives and negatives; rows = actual, columns = predicted
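A minimal scikit-learn sketch (toy labels):

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(confusion_matrix(y_true, y_pred))
# [[TN, FP],
#  [FN, TP]]  -- rows are actual, columns predicted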
___________________
pandas apply/applymap for DataFrame cleaning
df.applymap: one-to-one, per cell (square)
df + series: values added where labels align, NaN elsewhere
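A minimal sketch of both behaviours:

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df.applymap(lambda v: v ** 2)    # element-wise: every cell squared
df.apply(sum)                    # axis-wise: one value per column

s = pd.Series({'a': 10, 'c': 1})
print(df + s)                    # aligns on labels: 'b' and 'c' become NaN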
___________________
SENTIMENT ANALYSIS
drop rows whose rating has no value: dropna(inplace=True)
remove the neutral rating of 3
condition for a positive rating (e.g. rating > 3)
find the mean to see the class balance
split into train and test sets
inspect the first training entry; shape gives the length
CountVectorizer: word to token counts
fit the vectorizer on the training set
features come from the training set
get_feature_names lists the vocabulary
transform the training set to vectors
use logistic regression on the vectorized training set
words in the test set but not in training are ignored, because the vectorizer was fit on training data
transform X_test, predict with the model, and compute the ROC AUC score
get the array of coefficients with feature names and argsort it
print the best features by sorted index: the first 10 are the most negative,
the last are the most positive; the largest coefficients give words like 'good', 'amazing'
repeat with TF-IDF and compare the AUC score
e.g. 'phone working'
to score new text: vectorizer.transform, then the logistic regression model.predict
binary output
compile the model and print its summary
evaluate model.predict with word2vec features
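A minimal sketch of the whole pipeline above (the column names, file name, and rating threshold are assumptions; feature names use the modern scikit-learn call):

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv('reviews.csv')               # hypothetical ratings file
df.dropna(subset=['Rating'], inplace=True)    # drop rows with no rating
df = df[df['Rating'] != 3]                    # remove neutral ratings
df['Positive'] = (df['Rating'] > 3).astype(int)
print(df['Positive'].mean())                  # class balance

X_tr, X_te, y_tr, y_te = train_test_split(df['Review'], df['Positive'])
vect = CountVectorizer().fit(X_tr)            # vocabulary from training only
model = LogisticRegression().fit(vect.transform(X_tr), y_tr)
print(roc_auc_score(y_te, model.predict(vect.transform(X_te))))

order = model.coef_[0].argsort()              # smallest -> most negative words
names = np.array(vect.get_feature_names_out())
print('negative:', names[order[:10]])
print('positive:', names[order[-10:]])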

________________
anomaly detection on time-series data (folder)
Apache SystemML for scaling; shaping the input through Keras
one-hot indexing/encoding
extract the archive to a folder
spark.read.csv gives an Apache Spark DataFrame (see the sketch below)
write the DataFrame as Parquet files to Object Storage
write dummy.csv in IBM Data Science Experience
dataplatform.ibm.com
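A minimal PySpark sketch of those two steps (the paths are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.csv('dummy.csv', header=True, inferSchema=True)
df.write.parquet('output.parquet')   # e.g. out to object storage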
train on 100 samples to predict a future sample
a decreased data set
sample updater
gradient descent
autoencoder
filter by file ID from the MATLAB file
Row objects
sequence array
neural network

Watson has GitHub integration


___________________________

autoencoder
sequence padding; append the preceding and succeeding neighbour words
one-hot label encoding: transform, reshape, and check the shape
38 word pairs are output, each with 2 dimensions
the one-hot-encoded succeeding word is a sparse 50-value vector
fit the dimensionality of the encoded word to a sparse vector/matrix representation befitting a
neural network
exactly one value among the 50 is 1
bottleneck: predict the neighbour through 2 ReLU units, then expand back to 50 dimensions
recover the original distribution through softmax
after training, the bottleneck layer is a low-dimensional word embedding (word2vec style); model.compile,
model.summary, and model.fit
then reuse the 2-D codes in a second neural network to predict
calculate the vectors for every word
check the code
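A minimal Keras sketch of that idea: predict a one-hot neighbour through a 2-unit bottleneck, then read the bottleneck activations off as 2-D word vectors (the vocabulary size of 50 and the toy shifted data are assumptions):

import numpy as np
from keras.models import Model, Sequential
from keras.layers import Dense

vocab = 50
X = np.eye(vocab)           # one-hot current words (toy: one per row)
y = np.roll(X, 1, axis=0)   # one-hot 'neighbour' words (toy shift)

model = Sequential([
    Dense(2, activation='relu', input_shape=(vocab,)),  # bottleneck
    Dense(vocab, activation='softmax'),                 # back to 50 dims
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()
model.fit(X, y, epochs=5, verbose=0)

# the 2-D embedding for every word = the bottleneck activations
embed = Model(inputs=model.input, outputs=model.layers[0].output)
vectors = embed.predict(X)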
