By
Neha tyagi
What is Data Science
1.2
Why data science?
• Data is the oil of today’s world. With the right tools, technologies,
and algorithms, we can convert data into a distinctive
business advantage.
• Data Science can help you detect fraud using advanced machine
learning algorithms
• It helps you prevent significant monetary losses
• It allows you to build intelligence into machines
• You can perform sentiment analysis to gauge customer brand
loyalty
• It enables you to make better and faster decisions
• It helps you recommend the right product to the right customer to
enhance your business
1.3
1.4
Statistics:
Visualization:
1.5
Machine learning
Data Science: Manual methods can work as well, though they are not very useful.
Machine Learning: Algorithms are hard to implement manually.
Data Science: High RAM and SSDs are used, which helps overcome I/O bottleneck problems.
Machine Learning: GPUs are used for intensive vector operations.
1.7
Data scientist job role:
1.8
Data Scientist:
Role: A Data Scientist is a professional who manages enormous
amounts of data to derive compelling business insights by
using various tools, techniques, methodologies, and algorithms.
Languages: R, SAS, Python, SQL, Hive, Matlab, Pig, Spark
Data Engineer:
Role: A data engineer works with large amounts of data, and
develops, constructs, tests, and maintains architectures
such as large-scale processing systems and databases.
Languages: SQL, Hive, R, SAS, Matlab, Python, Java, Ruby, C++,
and Perl
1.9
Data Analyst:
Role: A data analyst is responsible for mining vast amounts of data,
looking for relationships, patterns, and trends in it. The analyst
then delivers compelling reports and visualizations that help the
business take the most viable decisions.
Languages: R, Python, HTML, JS, C, C++, SQL
Statistician:
Role: The statistician collects, analyses, and interprets qualitative and
quantitative data by using statistical theories and methods.
Languages: SQL, R, Matplotlib, Python, Perl, Spark, and Hive
Data Administrator:
Role: A data administrator ensures that the database is accessible to
all relevant users, and also makes sure that it is performing correctly
and is being kept safe from hacking.
1.10
Challenges of Data science Technology
• High variety of information & data is required for accurate analysis
• An adequate data science talent pool is not available
• Management does not provide financial support for a data science team
• Unavailability of/difficult access to data
• Data Science results not effectively used by business decision makers
• Explaining data science to others is difficult
• Privacy issues
• Lack of significant domain expertise
• A very small organization may not be able to have a Data Science team
1.11
Traits of big data
1.12
Types Of Big Data
Following are the types of Big Data:
1. Structured
2. Unstructured
3. Semi-structured
Structured
Any data that can be stored, accessed, and processed in a
fixed format is termed ‘structured’ data. Over time,
computer science has achieved great success in
developing techniques for working with this kind of data (where
the format is well known in advance) and for deriving value from
it. However, issues now arise when the size of such
data grows to a huge extent; typical sizes are in the range of
multiple zettabytes.
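As a minimal sketch of what "fixed format" means in practice, the snippet below stores a few rows in an in-memory SQLite table and queries them by column name. The `employee` table, its columns, and the rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "dept TEXT NOT NULL, salary INTEGER NOT NULL)"
)
rows = [
    (1, "Asha", "Finance", 65000),
    (2, "Ravi", "Admin", 50000),
    (3, "Meera", "Finance", 70000),
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", rows)

# Because the format is known in advance, a query can address columns by name.
avg_by_dept = dict(
    conn.execute("SELECT dept, AVG(salary) FROM employee GROUP BY dept")
)
print(avg_by_dept)
```

Every row has the same columns and types, which is what lets the database aggregate the data without inspecting each record individually.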
1.13
Structured data
1.14
Unstructured
Any data with an unknown form or structure is classified as
unstructured data. In addition to its huge size, unstructured
data poses multiple challenges when it is processed to
derive value from it. A typical example of unstructured data is a
heterogeneous data source containing a combination of simple text
files, images, videos, etc. Nowadays organizations have a wealth of data
available to them, but unfortunately they don’t know how to
derive value from it, since this data is in its raw,
unstructured form.
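To illustrate why raw data is harder to use, here is a small sketch that pulls some structure out of free text with regular expressions. The sample text and the simplified email pattern are assumptions made for this example; real sources would need far more robust handling.

```python
import re
from collections import Counter

# Hypothetical raw text; in practice this could be emails, logs, or documents.
raw_text = """
Loved the new phone, battery lasts two days. Contact me at asha@example.com.
Delivery was late and the box was damaged!! refund please - ravi@example.org
"""

# No schema says where the fields are, so structure has to be extracted.
# Simplified email pattern, not a full address validator.
emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", raw_text)

# Count lower-cased words as a first, crude step toward analysis.
words = Counter(re.findall(r"[a-z']+", raw_text.lower()))

print(emails)
print(words.most_common(3))
```

The extraction rules live in the code rather than in the data, so each new kind of source needs its own processing step before any value can be derived.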
1.15
Semi-structured
Semi-structured data can contain both forms of data. It looks
structured in form, but it is not actually
defined by, for example, a table definition in a relational DBMS. An example of
semi-structured data is data represented in an XML file.
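The XML case can be sketched with Python's standard `xml.etree.ElementTree` module. The document below, including its tag names and values, is hypothetical; it shows records that describe their own fields without sharing one fixed table definition.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; tag names and values are made up for illustration.
xml_text = """
<people>
  <person><name>Asha</name><age>35</age></person>
  <person><name>Ravi</name><email>ravi@example.org</email></person>
</people>
""".strip()

root = ET.fromstring(xml_text)

# Each record is self-describing: tags name the fields, and records may differ.
people = [
    {child.tag: child.text for child in person}
    for person in root.findall("person")
]
print(people)
```

The two records carry different fields (`age` in one, `email` in the other), which a relational table with a fixed definition would not allow without extra nullable columns.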
1.17