Assignment For Data Scientist

Uploaded by

02.satya.2001

ML Assignment

The data contains features extracted from text similar to the one shown
below.

You have to create an ML model that predicts the probability that a piece
of text belongs to a particular class. Use techniques like Bag of Words,
tf-idf vectorization and word embeddings. Please use the Hash field value
and explain how you are going to use it.
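
One possible way to use the Hash field, sketched under assumptions: because the hash is cryptographic, similar texts do not produce similar hashes, so the value can only act as an identity token (equal hash means equal raw text). The sketch below treats each hash as a single "word", maps it to a fixed column index with the hashing trick, and also records how often it was seen in training. The sample hash strings, the bucket count of 1024 and the function names are illustrative and not part of the assignment.

```python
import hashlib
from collections import Counter


def hash_bucket(hash_value: str, n_buckets: int = 1024) -> int:
    """Map an opaque hash string to a stable column index (hashing trick)."""
    digest = hashlib.md5(hash_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def encode(hash_value: str, counts: Counter, n_buckets: int = 1024):
    """Return (bucket index, training frequency) for one hash value.

    Hash values never seen in training get a frequency of 0.
    """
    return hash_bucket(hash_value, n_buckets), counts.get(hash_value, 0)


# Hypothetical hash column; real values would be read from train.csv.
train_hashes = ["a1f3", "9bc0", "a1f3", "77de", "a1f3"]
counts = Counter(train_hashes)

bucket, frequency = encode("a1f3", counts)
print(bucket, frequency)
```

The bucket index can feed a sparse one-hot column, which is the Bag of Words idea applied to a one-token document, while the frequency gives the model a cheap signal for how common the underlying text is.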

Data extraction
From the documents, nGrams have been extracted. Each row in train.csv
corresponds to one such nGram.

Features
For a given nGram, several features (145) have been extracted. These
features have been saved in train.csv and test.csv. They carry
content, parsing, spatial and relational information.
• Content: The cryptographic hash of the raw text.
• Parsing: nGram is a number, text, alphanumeric, etc.
• Spatial: Position and size of the nGram
• Relational: details of text nearby the nGram
The feature values can be:
• Numbers. Continuous/discrete numerical values.
• Boolean. The values include YES (true) or NO (false).
• Categorical. Values within a finite set of possible values.
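
The three kinds of feature values listed above could be told apart per column roughly as follows. This is a minimal sketch, assuming every cell arrives as a string, that booleans are spelled exactly YES and NO as stated above, and that an empty cell means a missing value (the assignment does not say how missing values are encoded). The function names are hypothetical.

```python
def infer_column_type(values):
    """Classify a column of raw strings as boolean, numeric or categorical."""
    non_empty = [v.strip() for v in values if v.strip()]
    if non_empty and all(v in ("YES", "NO") for v in non_empty):
        return "boolean"
    try:
        for v in non_empty:
            float(v)
    except ValueError:
        return "categorical"
    # A column with no usable values is left as categorical.
    return "numeric" if non_empty else "categorical"


def convert_column(values):
    """Return (kind, converted values); empty cells become None."""
    kind = infer_column_type(values)
    cleaned = [v.strip() for v in values]
    if kind == "boolean":
        return kind, [None if not v else v == "YES" for v in cleaned]
    if kind == "numeric":
        return kind, [None if not v else float(v) for v in cleaned]
    return kind, cleaned


print(convert_column(["YES", "NO", "YES"]))
print(convert_column(["1", "2.5", ""]))
print(convert_column(["a1f3", "9bc0"]))
```

Deciding the type per column rather than per cell avoids treating a stray numeric-looking value in a categorical column as a number.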

Labels

These are the labels: each one gives the probability that the current
sample belongs to the given class. This is a multilabel problem, and hence
a given sample can belong to more than one class.
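
To illustrate what multilabel means for the outputs, here is a small sketch that assumes a model produces one raw score per label and passes each score through its own sigmoid. The scores are made up. Unlike a softmax over mutually exclusive classes, the resulting probabilities are independent and need not sum to 1.

```python
import math


def sigmoid(score: float) -> float:
    """Numerically stable logistic function."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def label_probabilities(scores):
    """One independent probability per label for a single sample."""
    return [sigmoid(s) for s in scores]


# Hypothetical raw scores for one sample and four labels.
probs = label_probabilities([2.0, 0.0, -2.0, 1.5])
print(probs)
print(sum(probs))  # greater than 1: several labels can be likely at once
```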

File descriptions
All the files are CSV.
• train.csv - the features x of the training set. Each row
corresponds to a different sample, while each column is a different
feature.
• trainLabels.csv - the expected labels y for the training set. Each
row corresponds to a different sample, while each column is a
different label. The order of the rows is aligned with train.csv.
• test.csv - the features x of the test set. Each row corresponds to
a different sample, while each column is a different feature.
• sampleSubmission.csv - example of the expected probabilities ŷ for
the test set. Each row contains two columns: a string identifying a
sample and a label, and the probability of that sample belonging to
that label. For example, if test.csv has 3 samples and 4 labels, the
submission file must have 13 rows (one header row plus 3 × 4
predictions) with these strings in the first column:
id_label, 1_y1, 1_y2, 1_y3, 1_y4, 2_y1, 2_y2, 2_y3, 2_y4, 3_y1, 3_y2,
3_y3, 3_y4
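
The submission layout can be sketched as follows for 3 samples and 4 labels. Two things here are assumptions: the header of the second column ("pred") is not given in the text, and sample ids are taken to run from 1 upward, whereas real ids should come from test.csv. The probabilities are made up.

```python
import csv
import io


def submission_rows(probabilities, first_id=1):
    """Flatten a samples x labels matrix into (id_label, probability) rows."""
    rows = [("id_label", "pred")]  # "pred" is an assumed column name
    for offset, sample in enumerate(probabilities):
        for label_index, p in enumerate(sample, start=1):
            rows.append((f"{first_id + offset}_y{label_index}", p))
    return rows


def to_csv_text(rows):
    """Serialize the rows as CSV text, one row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical predictions: 3 samples, 4 labels each.
example = [
    [0.9, 0.1, 0.0, 0.3],
    [0.2, 0.8, 0.5, 0.1],
    [0.0, 0.0, 1.0, 0.6],
]
rows = submission_rows(example)
print(len(rows))
print(to_csv_text(rows))
```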
