_____________________________
naive Bayes in text mining
features and labels (e.g. the word "python" as a feature, "CS" as a label)
P(CS | python) = prior probability of CS × likelihood of "python" given CS / evidence P(python)
Laplace smoothing: add 1 to each count (and n dummy counts to the denominator)
multivariate (Bernoulli) naive Bayes uses binary word-presence features; multinomial uses counts
SVM: maximum-margin classifiers; the support vectors are the points closest to the separating boundary
support vector classifiers via scikit-learn's svm module
scikit-learn micro/macro averaging for multi-class evaluation
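The naive Bayes lines above can be sketched with scikit-learn; the documents, labels, and the "python as a CS word" framing below are invented examples, not from the original notes.

```python
# Minimal sketch: multinomial naive Bayes text classifier with scikit-learn.
# alpha=1.0 is Laplace (add-one) smoothing, as in the notes above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["python code and algorithms",
        "python snakes in the wild",
        "data structures in python code",
        "snakes and reptiles habitat"]
labels = ["cs", "nature", "cs", "nature"]

clf = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
clf.fit(docs, labels)

print(clf.predict(["python algorithms"]))  # -> ['cs']
```

"python" occurs more often in the CS documents, so after smoothing the posterior favors CS.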
____________________________________
autoencoders: dimensionality reduction happens in the bottleneck ("neck") layer
LSTM: long short-term memory; input/output/forget gates with feedback give the cell its memory
_______________________________________________
data cleaning; distance measures, e.g. taxicab (Manhattan) distance as one of 3 types
nonlinear methods: kernel machines, ANNs (neural nets)
feature selection: sampling interval (seconds / 60), distance sampled over time
numpy unique() to list the driver ids
scatterplot of the sampling intervals
visually choose a 60-minute time-series interval
and 4 of the 24 hourly features
next, a histogram with frequency along the y-axis; what was previously on the y-axis is now on the x-axis
then a histogram per person id
for distinguishing distinct traces:
which feature has more discriminating power?
principal component analysis
feature scaling
fewer dimensions, not more points
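The sampling-interval feature above can be sketched with pandas; the column names ("driverid", "ts") and the timestamps are assumptions, not from the original data set.

```python
# Sketch: distinct driver ids, then per-driver sampling interval in minutes
# (seconds / 60), as in the feature-selection notes above.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "driverid": [1, 1, 1, 2, 2],
    "ts": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 01:00",
                          "2020-01-01 02:30", "2020-01-01 00:00",
                          "2020-01-01 00:45"]),
})

print(np.unique(df["driverid"]))          # distinct driver ids

df = df.sort_values(["driverid", "ts"])
# diff() within each driver gives the time between consecutive samples
df["interval_min"] = (df.groupby("driverid")["ts"].diff()
                        .dt.total_seconds() / 60)
print(df["interval_min"].tolist())        # first row per driver is NaN
```

A scatterplot or histogram of `interval_min` is what the notes use to pick the 60-minute interval visually.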
_____
How to distinguish patterns in 1-D or 2-D data
in the sampling time/frequency histogram, make a scatter plot
output is a matplotlib plot
scatter plot with a restricted region
scatter plot of each person's location pattern
using person events keyed by driver/person id
for every person
if there are two separate clusters: bottom-up / top-down scaling
use the location patterns to build a model
distances as features, persons as classes
find the event group using Euclidean distance
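A minimal sketch of "distances as features, persons as classes": assign a new event to the person whose location-cluster centroid is nearest in Euclidean distance. The coordinates and person labels are invented.

```python
# Nearest-centroid classification by Euclidean distance.
import numpy as np

events = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
person = np.array(["a", "a", "b", "b"])

# one centroid per person, from that person's location events
centroids = {p: events[person == p].mean(axis=0) for p in np.unique(person)}

def classify(point):
    # Euclidean distance to each person's centroid; pick the smallest
    dists = {p: np.linalg.norm(point - c) for p, c in centroids.items()}
    return min(dists, key=dists.get)

print(classify(np.array([0.1, 0.0])))   # lands in person "a"'s cluster
```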
________
validation
risk and loss functions
squared / absolute / 0-1 / Lp loss
Kullback-Leibler loss: integrate log(f(x;θ)/f(x;θ̂)) weighted by f(x;θ)
risk: the integral (expectation) of the loss between θ and θ̂,
where θ is the true value of a parameter
and θ̂ the estimated value of the parameter
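The KL loss and risk lines above, written out explicitly (θ the true parameter, θ̂ its estimate):

```latex
L_{KL}(\theta, \hat\theta)
  = \int \log\frac{f(x;\theta)}{f(x;\hat\theta)}\, f(x;\theta)\, dx

R(\theta, \hat\theta)
  = \mathbb{E}_{\theta}\!\left[ L(\theta, \hat\theta) \right]
  = \int L\big(\theta, \hat\theta(x)\big)\, f(x;\theta)\, dx
```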
bias-variance trade-off: variance grows with feature/model complexity
cost is measured on the training and the test set
and depends on model complexity (number of features)
generalization bound: with probability 1 − η, test error ≤ training error + a term that grows with
model complexity / number of features — the test error sits inside a box around the training error,
an upper bound that holds with confidence 1 − η
linear classifiers: complexity h = number of features + 1
confusion matrix: true/false positives and negatives; rows = actual, columns = predicted (false positive = actual −, predicted +)
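The confusion-matrix layout can be sketched with scikit-learn; the actual/predicted labels below are invented.

```python
# Confusion matrix: rows = actual, columns = predicted.
from sklearn.metrics import confusion_matrix

actual    = [1, 0, 1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 0, 1, 1]

# labels=[0, 1]: row/col 0 = negative class, row/col 1 = positive class
cm = confusion_matrix(actual, predicted, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()   # [[TN, FP], [FN, TP]]
print(cm)
print(tn, fp, fn, tp)
```

Here the one "actual 0, predicted 1" entry is the false positive the notes describe.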
___________________
pandas apply/map for dataframe cleaning
df.apply: 1-to-1 over each cell of the grid (applymap works cell by cell)
df + series: values added where labels align, NaN elsewhere
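The two pandas behaviors above can be sketched on an invented dataframe: apply acts 1-to-1 per cell, and adding a Series aligns on labels, producing NaN where they differ.

```python
# df.apply for cell-wise transforms, and df + series label alignment.
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

squared = df.apply(lambda col: col ** 2)   # column-wise, still 1-to-1 per cell
print(squared)

s = pd.Series({"a": 10, "c": 20})
print(df + s)   # column "a" shifted by 10; "b" and "c" become all NaN
```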
___________________
SENTIMENT ANALYSIS
drop rows with no rating value: dropna(inplace=True)
remove the neutral rating 3
define the condition for a positive rating
find the mean (class balance)
split into train/test sets
inspect the first training entry; shape gives the length
CountVectorizer: word to token counts
fit the vectorizer to the training set
featurize the training set
get_feature_names for the vocabulary
transform the training set into vectors
fit logistic regression on the vectorized training set
words in test but not in train are ignored, because the vectorizer is fit on training data only
vectorize-transform X_test, predict with the model, compute the ROC AUC score
get the array of feature names and the coefficient argsort indices
print the best features: the 10 most positive
and the 10 most negative; the largest coefficients give words like "good", "amazing"
TF-IDF variant: compare by AUC score
e.g. "phone working"
model.predict on the vectorizer-transformed text
gives binary output
compile the model and print a summary
evaluate model.predict; compare against word2vec features
________________
anomaly detection on a time-series folder
Apache SystemML for scaling; Keras input shapes
one-hot indexing
extract the archive to a folder
spark.read.csv into an Apache Spark dataframe
write the dataframe as Parquet files to object storage
write dummy.csv in IBM Data Science Experience
dataplatform.ibm.com
train on 100 samples, predict a future sample
decreased (downsampled) data set
sample updater
gradient descent
autoencoder
filter by file id from the MATLAB file
Row objects
sequence array
neural network
autoencoder
sequence padding; append the preceding and succeeding neighbours of each word
one-hot label-encode, transform, reshape; check the shape
38 word pairs with 2 dimensions are output
the succeeding word's one-hot-encoded output is a sparse 50-value vector
fit the dimensionality of the encoded word to the sparse-vector matrix representation expected by the
neural network
exactly one value among the 50 is 1
bottleneck: map the neighbour through 2 units with ReLU activation, then expand back to 50 dimensions
reconstruct the original through softmax
the trained bottleneck layer is a low-dimensional embedding (word2vec-style); model.compile
then summary and fit
sequentially reuse the 2-D layer; run the network's predict
to calculate vectors for every word
check the code
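The data-prep half of the word2vec-style steps above can be sketched in plain numpy: pair each word with its preceding and succeeding neighbours, then one-hot encode over the vocabulary (the sentence, window size, and vocabulary are invented; the real notes use a 50-word vocabulary and 38 pairs).

```python
# Build (word, neighbour) pairs and one-hot targets for the embedding network.
import numpy as np

words = "the pump showed anomalous vibration near the bearing".split()
vocab = sorted(set(words))
index = {w: i for i, w in enumerate(vocab)}

pairs = []
for i, w in enumerate(words):
    if i > 0:
        pairs.append((w, words[i - 1]))   # preceding neighbour
    if i < len(words) - 1:
        pairs.append((w, words[i + 1]))   # succeeding neighbour

def one_hot(word):
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0                  # exactly one value is 1
    return v

X = np.array([one_hot(w) for w, _ in pairs])
y = np.array([one_hot(n) for _, n in pairs])
print(X.shape, y.shape)   # (n_pairs, vocab_size) one-hot matrices
```

Feeding `X` through a dense ReLU bottleneck of 2 units and a softmax output of `len(vocab)` units, then fitting against `y`, is the embedding training the notes describe; the bottleneck activations are the per-word vectors.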