DATA
STREAK
A monthly digest on all things Data
Article Barn
In A Nutshell
New Innovations
DataShots
Data Ticklers
Career Transitions
Guesstimate Solutions
01 How To Be Transition-Ready?
If there’s one thing I’ve learned as a Student Mentor for the Data
Science batch, it is this: getting feedback on your data science
job application or interview is virtually impossible. We, at upGrad,
have been constantly taking reviews from recruiters and have
also been modifying our career content accordingly.
Reason 1: Python For Data Science
Data Exploration:
Feature Selection:
90% of the time, your dataset will have way more features
than you need (which leads to excessive training time, and a
heightened risk of overfitting). Get familiar with basic filter
methods (look up scikit-learn’s VarianceThreshold and
SelectKBest classes), and more sophisticated model-based
feature selection methods (look up SelectFromModel).
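
To make the idea of a filter method concrete, here is a minimal sketch in plain Python. It is written in the spirit of a variance-based filter such as VarianceThreshold, but it does not use scikit-learn itself. The function name drop_low_variance_features and the toy dataset are assumptions made up for this illustration.

```python
from statistics import pvariance


def drop_low_variance_features(rows, threshold=0.0):
    """Keep only the columns whose population variance exceeds the threshold.

    Returns the indices of the kept columns and the filtered rows.
    """
    columns = list(zip(*rows))  # transpose rows into columns
    kept = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    filtered = [[row[i] for i in kept] for row in rows]
    return kept, filtered


# Toy data (hypothetical): column 0 is constant, so it carries no information.
rows = [
    [1.0, 0.0, 3.2],
    [1.0, 1.0, 2.9],
    [1.0, 0.0, 3.1],
    [1.0, 1.0, 3.0],
]

kept, filtered = drop_low_variance_features(rows)
print(kept)         # indices of the surviving columns
print(filtered[0])  # first row after filtering
```

A constant column cannot help a model tell examples apart, which is why it is the first thing a filter method throws away. On real projects you would normally reach for the scikit-learn classes named above instead of hand-rolling this.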
Bayes’s Theorem:
It’s a foundational pillar of
probability theory, and it comes
up all the time in interviews.
You should practice doing
some basic Bayes Theorem
whiteboarding problems.
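
A typical whiteboard exercise of this kind is the diagnostic test question. The sketch below works one through using Bayes' theorem for a binary hypothesis. The numbers (1% prevalence, 99% sensitivity, 5% false positive rate) and the function name posterior are assumptions chosen for illustration only.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) for a binary hypothesis H given evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of people have the condition, the test catches
# 99% of true cases, and it wrongly flags 5% of healthy people.
p = posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05)
print(round(p, 4))  # 0.1667
```

Even with a very sensitive test, a positive result here implies only about a 1 in 6 chance of having the condition, because healthy people vastly outnumber sick ones. Being able to explain that is usually the point of the interview question.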
Basic Probability
You should be able to answer
questions like these.
Model Evaluation
Reason 5: Technical Knowledge
Data scientists are increasingly required to take on software
engineering work. Many employers insist that applicants
should understand how to manage their code and keep clean
notebooks and scripts in particular:

Version Control
You should know how to use Git, and Google indexing for
your repository. If you don’t, start with this tutorial.

Web Development
Some companies like their data scientists to be comfortable
accessing data that’s stored on their web app, or via an API.
Getting comfortable with the basics of web development is
important, and the best way to do that is to learn a bit of
Flask.

Web Scraping
This is slightly related to web development. Sometimes,
you’ll need to automate data collection by scraping data
from live websites. Two great tools to consider for this are
BeautifulSoup and Scrapy.

Clean Code
Learn how to use docstrings. Don’t overuse inline comments.
Break your functions up into smaller functions. Way smaller.
There shouldn’t be functions in your code longer than 10 lines.
Give your functions good, descriptive names (function_1 is not
a good name). Follow pythonic convention and name your
variables with underscores like_this and not LikeThis or
likeThis. Don’t write python modules (.py files) with more than
400 lines of code.

Reason 6: Employability Test
Imagine you have an interview with a recruiter and they ask
you to give a pre-hiring test to understand your knowledge.
You probably won’t have enough time to prepare for the
same and may end up failing in the first round. Despite having
relevant experience, you might have had to face rejection.
During your course with upGrad, every learner has to give an
Employability Test. The reason for us providing you with a
series of employability tests is to get you ready for the
unprecedented job interview test. Having said that, this will
also give you an insight on your overall knowledge of the topics
covered in the test. Did you know, these tests are based on
the recruiters POV rather than academic knowledge?
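
As a small illustration of the Clean Code advice under Reason 5 (docstrings, short functions, descriptive snake_case names), here is a minimal sketch. The function names and the sample input are invented for this example, so treat it as one possible style and not a standard.

```python
def parse_scores(raw_scores):
    """Convert a comma-separated string of numbers into a list of floats."""
    return [float(value) for value in raw_scores.split(",") if value.strip()]


def mean_score(scores):
    """Return the arithmetic mean of a non-empty list of scores."""
    return sum(scores) / len(scores)


def summarize_scores(raw_scores):
    """Parse raw scores and return their count and mean."""
    scores = parse_scores(raw_scores)
    return {"count": len(scores), "mean": mean_score(scores)}


print(summarize_scores("3.5, 4.0, 4.5"))  # {'count': 3, 'mean': 4.0}
```

Each function does one thing, fits well within ten lines, and has a name that says what it returns, so the docstrings stay short and inline comments are barely needed.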
Reason 7: Business Instinct
Many people believe that getting hired is about showing that you’re the most technically skilled
applicant for a role. It’s not. In reality, companies want to hire people who can help them make
more money, faster. In general, that means moving beyond technical ability alone and building
a number of additional skills:
02 Article Barn
Data science is a high-ranking profession that allows the curious to make game-changing
discoveries in the field of Big Data. A report from Indeed, one of the top job sites, has shown a
29% increase in demand for data scientists year over year. Moreover, since 2013, the demand
has increased by a whopping 344%. What’s the reason for such a demand?
Researchers at Duke University’s Pratt School of Engineering have developed a machine learning
algorithm that can increase the resolution of optical coherence tomography (OCT), an imaging
technology similar to ultrasound that uses light instead of sound waves.
The Data Science Life Cycle Consists Of 7 Phases
In this post, we will go through each of them briefly. Check how the infographic depicts the different
phases in the data science life cycle. Data science, being a mixture of various tools, algorithms
and machine learning principles, aims to identify hidden patterns or insights in our data,
which helps us make improved decisions.
Let us understand what skillsets you require to be employable and the current trends in the
market. As a data scientist, you are in high demand. So, how can you increase your marketability
even more? Check out these current trends in skills most desired by employers in 2019.
Therefore it is important for us to know the Regulation and Ethics in Data Science and Machine
Learning. Let’s understand why regulation is essential and what are the three requirements
every statistical algorithm should satisfy in order to be a better risk assessment tool on
responsible machine learning principles.
03 Mergers And Acquisitions
Blackberry is
making its largest
acquisition ever!
04 In A Nutshell
Anyone running a company, or who’s part of a company that requires business travel,
is familiar with the challenges of travel expenses and of figuring out
how much a business trip will cost. The new startup TravelBank is now working towards
reducing both the struggle and the cost of arranging business trips, by applying machine
learning to help employees document spending and file expense reports. The app
is not only designed to help employees keep track of expenses, but it also focuses on
helping employees change their behaviour, to ensure they spend less money. In return,
companies that use TravelBank can “reward” employees who spend less than
the predicted budget with half of what they saved.
Making music is one of the most human things we do, but in recent years, AI has stepped
in to lend a helping hand. Now, AI is reaching the mastering process, raising hard
questions about the need for human experts in the most specialized areas of music
production. Mastering is the final step in audio post production, and balances out all of a
song’s elements so it will sound consistent no matter how you’re listening to it - on Spotify,
in iTunes, or on a CD. The goal of mastering is to make the listening experience balanced
and cohesive from song to song. As mastering engineer Ian Cooper says, mastering is “a
bit like photography - you can make the sky bluer, the greens greener.”
SomaDetect - A unique application of Machine Learning
in the dairy industry
“Can I meet your cows?”- that is always one of the questions Bethany asks the dairy
farmers she meets these days. After studying math and environmental studies, Bethany
Deshpande earned a PhD in biology, researching Thermokarst lakes - shallow lakes
caused by the thawing and collapse of ice-rich permafrost - and modeling their oxygen,
and microbial composition. The education didn’t teach her about the dairy industry, but
she gained a lot of knowledge about sensors, developed a deep appreciation for data,
and learned that tough problems can be solved with a healthy mix of curiosity and
determination. Today, she is an accidental AgTech (agricultural technology) entrepreneur,
leading a company that uses data science to improve the sustainability and efficiency of
dairy farming. In a dairy farm, the milk is tested every few days by the farmer and upon
delivery. The farmers’ compensation is directly tied to the quality of the milk, a measure of
concentration of different components such as fat, protein, and their ratio. Further, the
milk gets tested for presence of antibiotics and somatic cell counts (SCC). Even small
traces of antibiotics or SCC levels above a certain threshold may lead to rejection by the
processor per regulatory standards. SomaDetect marries a century-old technology with
the latest data science algorithms to address the deficiencies of the current system in fast
and accurate analysis of the quality of the milk and the health of a cow.
Artists are supposed to be among the least likely to lose their jobs to automation, but what
happens when AI-enabled features start painting, editing, and doing other parts of their
jobs for them? AI tools are already starting to automate what used to be time-consuming
manual processes - but the results may be good for artists’ creativity, rather than potential
job killers. Companies that make industry-standard creative tools like Adobe and Celsys
have been adding AI features to their digital art software in recent years in the hopes that
it’ll speed up workflows by eliminating drudge work, and give artists more time to
experiment. From machine learning tools that help find specific video frames faster, to
features that color in entire works of line art with just a button, AI is being incorporated in
subtle, but surprisingly impactful ways.
05
6 Machine Learning Applications Enhancing The Healthcare Sector 2019
The ever increasing population of the world has put tremendous
pressure on the healthcare sector to provide quality treatment and
healthcare services. Now, more than ever, people are demanding
smart healthcare services, applications, and wearables that will help
them to lead better lives and prolong their lifespan.
Also, the fact that the healthcare sector’s data burden is increasing
by the minute (owing to the ever-growing population and higher
incidence of diseases) is making it all the more essential to
incorporate Machine Learning into its canvas. With Machine Learning, there
are endless possibilities. Through its cutting-edge applications, ML
is helping transform the healthcare industry for the better.
Now that you are familiar with the core components of ML
systems, it’s time to take a look at the different ways they “learn.”
Personalised Treatment
And Behavioral Modification
Between 2012-2017, the penetration rate of Electronic Health
Records in healthcare rose from 40% to 67%. This naturally means
more access to individual patient health data. By compiling this
personal medical data of individual patients with ML applications
and algorithms, health care providers (HCPs) can detect and
assess health issues better. Based on supervised learning,
medical professionals can predict the risks and threats to a
patient’s health according to the symptoms and genetic information
in his medical history. This is precisely what IBM Watson
Oncology is doing. Using patients’ medical information and medical
history, it is helping physicians to design better treatment plans
based on an optimized selection of treatment choices. Behavioral
modification is a crucial aspect of preventive medicine. ML

Guesstimate 2
There are 2 jugs with 4 litres and 5 litres of water respectively.
The objective is to pour exactly 7 litres of water in a bucket.
How can it be accomplished?
Drug Discovery And Manufacturing
Machine learning applications have found their way into the field
of drug discovery, especially in the preliminary stage, right from
initial screening of a drug’s compounds to its estimated success
rate based on biological factors. This is primarily based on
next-generation sequencing. Machine Learning is being used by
pharma companies in the drug discovery and manufacturing
process. However, at present, this is limited to using unsupervised
ML that can identify patterns in raw data. The focus here is to
develop precision medicine powered by unsupervised learning,
which allows physicians to identify mechanisms for “multifactorial”
diseases. The MIT Clinical Machine Learning Group is one of the
leading players in the game. Its precision medicine research aims
to develop algorithms that can help to understand the
disease processes better and accordingly chalk out effective
treatment for health issues like Type 2 diabetes. Apart from this,
R&D technologies, including next-generation sequencing and
precision medicine, are also being used to find alternative
paths for the treatment of multifactorial diseases. Microsoft’s
Project Hanover uses ML-based technologies for developing
precision medicine. Even Google has joined the drug discovery
bandwagon. Pharmaceutical manufacturers can harness the data
from the manufacturing processes to reduce the overall time
required to develop drugs, thereby also reducing the cost of
manufacturing.

Identifying Diseases and Diagnosis
Machine Learning, along with Deep Learning, has helped make a
remarkable breakthrough in the diagnosis process. Thanks to these
advanced technologies, today, doctors can diagnose even diseases
that were previously beyond diagnosis, be it a tumour or cancer in
the initial stages or a genetic disease. For instance, IBM Watson
Genomics integrates cognitive computing with genome-based tumour
sequencing to further the diagnosis process so that treatment can
be started head-on. Then there’s Microsoft’s InnerEye initiative,
launched in 2010, that aims to develop breakthrough diagnostic
tools for better image analysis.
Personalised Treatment
By leveraging on patient medical history, ML technologies
can help develop customised treatments and medicines
that can target specific diseases in individual patients.
This, when combined with predictive analytics, reaps
further benefits. So, instead of choosing from a given set
of diagnoses or estimating the risk to the patient based on
his/her symptomatic history, doctors can rely on the
predictive abilities of ML to diagnose their patients. IBM
Watson Oncology is a prime example of delivering
personalised treatment to cancer patients based on their
medical history.

Understanding the importance of people in the healthcare
sector, Kevin Pho states:

“Technology is great. But people and process improve
care. The best predictions are merely suggestions until
they’re put into action. In healthcare, that’s the hard part.
Success requires talking to people and spending time
learning context and workflows - no matter how badly
vendors or investors would like to believe otherwise.”

Robotic Surgery
Today, doctors can successfully operate even in the most
complicated situations, with precision, thanks to robotic
surgery. Case in point - the Da Vinci robot. This robot allows
surgeons to control and manipulate robotic limbs to perform
surgeries with precision and fewer tremors in tight spaces of
the human body. Robotic surgery is also widely used in hair
transplantation procedures as it involves fine detailing and
delineation. Today robotics is spearheading in the field of
surgery. Robotics powered by AI and ML algorithms enhance the
precision of surgical tools by incorporating real-time surgery
metrics, data from successful surgical experiences, and data
from pre-op medical records within the surgical procedure.
According to Accenture, robotics has reduced the length of stay
in surgery by almost 21%. Mazor Robotics uses AI to enhance
customization and keep invasiveness at a minimum in surgical
procedures involving body parts with complex anatomies, such as
the spine.
06 Did You Know?
AI Can Now
Cheer You Up!
AI Is In Line
To Be The Next
Picasso!
Machines Are
Now Chefs!
AI Is Now A
Producer And
A Director!
07 New Innovations
01. Story
There was a recent buzz about an AI system in Japan writing a novel,
“The Day a Computer Writes a Novel”, that was supposed to win a
literary prize. The Research and Development team started to write
their own novel and then deconstructed it into several parts. After
this process, the AI was commissioned to sequentially arrange the
parts it was assigned and create “another story similar to the
sample novel,” constructing it from the “different words, phrases,
characters, and plot outlines that had been fed to it.”

Source: Wakefield, J. (2015). Can a machine become an artist?. [online] BBC
News. Available at: https://www.bbc.com/news/technology-33677271
[Accessed 16 Jun. 2019].
02. Music
The famous Rockstar “David Bowie” was
the co-writer behind a program whose
function was to generate “lyric ideas.”
08 Kaggle Problem Statement
New York City Airbnb Open Data
Since 2008, guests and hosts have used Airbnb to expand on traveling possibilities and
present a more unique, personalised way of experiencing the world. This dataset
describes the listing activity and metrics in NYC, NY for 2019.
Problem Link:
https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data/activity
India is one of the fastest developing nations of the world and trade between nations is
the major component of any developing nation. This dataset includes the trade data for
India for commodities in the HS2 basket.
Problem Link:
https://www.kaggle.com/lakshyaag/india-trade-data/kernels
This is a list of over 3,500 pizzas from multiple restaurants provided by Datafiniti's Business
Database. The dataset includes the category, name, address, city, state, menu information,
price range, and more for each pizza restaurant.
Problem Link:
https://www.kaggle.com/datafiniti/pizza-restaurants-and-the-pizza-they-sell/kernels
Trending YouTube Video Statistics
YouTube (the world-famous video sharing website) maintains a list of the top trending
videos on the platform. According to Variety magazine, “To determine the year’s top-trending
videos, YouTube uses a combination of factors including measuring user interactions
(number of views, shares, comments and likes). Note that they’re not the most-viewed
videos overall for the calendar year”. Top performers on the YouTube trending list are
music videos (such as the famously virile “Gangnam Style”), celebrity and/or reality TV
performances, and the random dude-with-a-camera viral videos that YouTube is
well-known for.
Problem Link:
https://www.kaggle.com/datasnaek/youtube-new
Craigslist is the world's largest collection of used vehicles for sale, yet it's very difficult to
collect all of them in the same place. A student built a scraper for a school project and
expanded upon it later to create this dataset which includes every used vehicle entry
within the United States on Craigslist.
Problem Link:
https://www.kaggle.com/austinreese/craigslist-carstrucks-data/kernels
09 Why Our Learners Love Data Streak
I read the Data Streak magazine and I am delighted to see the nice curated content on Data
Science and AI and the best part was the news about companies utilising AI solutions for
different uses, like the one news of AI Game based interviews in Unilever.
Also the top Data Scientist to follow is important. Thanks for starting this initiative.
Kapil Manchanda - PGDML & AI
The games at work are very nice and new to me. Also, how the industry is changing from
traditional way of hiring is very informative. I am so much interested in Student Article & Career
transitions & latest trends. I will always eagerly look into this and actively look into In a Nutshell to
know the industry, current developments, case studies, new ideas. And yes, guesstimates are
very nice too. Data ticklers - finally makes us relax.
Balamurugan Gurusamy - PGDML& AI
Love those problem solving approaches. I love SME sections and what we learn on latest
trends. I’m certainly looking forward to more technical content or articles.
Bhavin Panchal - PGDML & AI
I went through the Data Streak magazine and I found it very useful to know about the latest
happenings in AI and ML. I like the dataShots section. It has brief info about how AI and ML is being
applied in real life.
Pradeep Kumar Reddy Kondreddy - PGC
The data streak for the July month is amazing. The content is really a treat to read.
Great initiative by upGrad and I am waiting for the Data streak for next month.
Tavish Aggarwal PGDML & AI
10 Top Data Scientists To Follow
His contribution in the field of digital transformation has been recognised by
organisations such as Onalytica, Dataconomy, and Klout. He is also an author for
a number of leading big data websites, including The Guardian, The Datafloq, and
Data Science Central, and he regularly speaks at high-profile events and
conferences. You must follow him if you are an ardent enthusiast of data
science, big data, the IoT (Internet of Things), predictive analytics, and
business intelligence.

He is one of the biggest data science, machine learning, AI and deep learning
stalwarts within the HR niche. He has over 20 years of experience and has built
the most trusted brand in data science and machine learning. He is the founder
and chief data scientist at V-Squared Data Strategy. His LinkedIn article “How
to Become a Data Scientist, No Matter Where Your Career Is At Now” is a great
read.
11 Data Shots
Top Linkedin Profiles Of The Month
About:
Believe in Innovation and Growth, Hard work and Knowledge.

Industrial Knowledge on:
Artificial Intelligence, Machine Learning, Deep Learning, Neural Networks,
Natural Language Processing, Conversational AI, Computer Vision and
Predictive Analytics.
Cherish, Adore and Challenge.

Experience:
2.8+ years of experience as a Data Scientist in Genpact, Hyderabad.

About:
Result oriented and dedicated professional skilled with 2.5 years of experience
in Oil & Gas Wellsite Operations and Data Visualization & Analysis and Coding
Skills like SQL, C++ and Python. Known for displaying high ethical standards,
integrity and confidentiality while always exceeding expectations independently
or as a part of a team.

Experience:
2.5 years of experience as a Senior Process Data Engineer-Wells in Royal Dutch
Shell, Chennai.
12 Data Ticklers
One of the Monty Python team has invented an unmanned aircraft that does
sky-writing that’s spelled the same backwards as forwards?

What do you call a 3.14 m long python?
The Great Hack: Review
Karan Mehta
Student Mentor, Data Science Vertical
upGrad - Student Success Team

The Great Hack is a documentary cum movie that examines the effects when
private companies harvest online information about us.
Using the collected data, Cambridge Analytica set out to create fear and apathy
to achieve the results of the political parties that hired them. The lawsuit
filed by Carroll, the main character, is an attempt to retrieve the data
collected on him. To say that we live in a world of illusions would be true,
but we don’t admit it. Why? Simply because we're addicted to it.

To conclude, The Great Hack will alarm you, infuriate you, and - hopefully -
activate you. We’re told that tech companies are the richest businesses in the
world, and since data is the hottest commodity on the market, they’ll do
anything to get it. That we don’t know what’s being done with our data is the
scariest aspect of all this. “The Great Hack” hammers that point home quite
successfully.

So to all my wonderful readers, be careful from today, as the photos you click,
the messages you exchange, and the things you buy reflect your personality, and
all this data is easily available to the apps that live in your electronic
devices. The most recent example of this is FaceApp, which became an internet
trend within hours, but no one bothered to read its privacy agreement.

My rating for this movie is a 3.5/5 as it paints people like Mark Zuckerberg as
the villains of this modern world where data is apparently “more valuable than
oil”, yet it offers nothing that we don’t already know.
14 Career Transitions
BOBY JOHNSON
15 GUESSTIMATE SOLUTIONS
1
A birthday cake has to be divided into
8 equal pieces in exactly 3 cuts. Determine the way
to make this division possible.
This puzzle is not really difficult to solve if you really put your mind to work.
The approach entails slicing the cake horizontally down the centre, followed
by making another division vertically through the centre.
The two divisions made across horizontal and vertical directions will give
you 4 equal pieces of the cake.
In the final step, simply stack the 4 pieces one above the other, and then
make the third division, splitting the stack in half.
This gives you the 8 equal pieces of cake, along with the answer to your puzzle.
2
There are 2 jugs with 4 litres and 5 litres of water
respectively. The objective is to pour exactly 7 litres
of water in a bucket. How can it be accomplished?
The approach here is to initially fill the 5L jug with water and pour from it
into the 4L jug until the 4L jug is full. The 5L jug will be left with 1L of water, which is
poured into the bucket. Meanwhile, empty the 4L jug.
The above step is repeated, so that the bucket is filled with 2L of water.
Finally, fill the 5L jug with water and empty the same into the bucket.
The bucket will now have 7L of water, as you add 5L directly to the
previously collected 2L of water in the bucket.
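
The steps above can be checked with a short simulation. This is a minimal sketch that treats the jugs as having capacities of 4L and 5L, as the solution implies. The helper names pour and measure_seven_litres are made up for this illustration.

```python
def pour(source, target, target_capacity):
    """Pour from source into target until target is full or source is empty.

    Returns the new (source, target) amounts.
    """
    moved = min(source, target_capacity - target)
    return source - moved, target + moved


def measure_seven_litres():
    """Follow the solution's steps and return the litres in the bucket."""
    small_capacity, large_capacity = 4, 5
    small, large, bucket = 0, 0, 0

    for _ in range(2):
        large = large_capacity                            # fill the 5L jug
        large, small = pour(large, small, small_capacity)  # top up the 4L jug
        bucket += large                                    # the leftover 1L goes in the bucket
        large = 0
        small = 0                                          # empty the 4L jug

    large = large_capacity                                 # fill the 5L jug one last time
    bucket += large                                        # and empty it into the bucket
    return bucket


print(measure_seven_litres())  # 7
```

Each pass through the loop contributes 1L (5L minus the 4L poured off), so two passes give 2L and the final full 5L jug brings the bucket to 7L.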
To share your stories/articles/blogs,
write to us at
pgdds@upgrad.com | pgdml@upgrad.com
FIND US HERE:
upGrad Education Private Limited, Nishuvi, 75, Dr. Annie Besant Road, Worli, Mumbai – 400018