It's hard to know where to start once you’ve decided that, yes, you want to dive into the fascinating world of data and AI. Just looking at all
the technologies you have to understand and tools you’re supposed to master can be dizzying. What data science steps do you take first?
Luckily for you, building your first data analytics project plan is actually not as hard as it seems. Yes, a tool such as Dataiku, designed to empower people of all backgrounds and levels of expertise, makes it easier, but first you need to understand the data science process itself. Becoming data-powered (/want-to-be-a-data-powered-organization-start-with-these-3-steps) is first and foremost about learning the basic steps and phases of a data analytics project and following them from raw data preparation to building a machine learning model, and ultimately, to operationalization (/wtf-is-operationalization-o16n).
The following is our take on defining a data project through the fundamental steps of a data analytics project plan in this exciting age of AI, machine learning, and big data! These seven data science steps will help you realize business value from each unique project and mitigate the risk of error.
https://blog.dataiku.com/2019/07/04/fundamental-steps-data-project-success 1/6
11/5/21, 2:48 PM 7 Fundamental Steps to Complete a Data Analytics Project
If you’re working on a personal project or playing around with a dataset or an API, defining a business objective may seem irrelevant. It’s not. Simply downloading a cool open dataset is not enough. To give your work motivation, direction, and purpose, you have to identify a clear objective of what you want to do with data: a concrete question to answer, a product to build, etc.
Connect to a database: Ask your data and IT teams for the data that’s available or open up your private database and start digging through it
to understand what information your company has been collecting.
Use APIs: Think of the APIs for all the tools your company has been using and the data these tools have been collecting. You have to work on getting them all set up so you can use those email open and click stats, the information your sales team put in Pipedrive or Salesforce, the support tickets people have submitted, etc. If you’re not an expert coder, plugins in Dataiku (https://www.dataiku.com/product/plugins/) give you lots of possibilities to bring in external data!
Look for open data: The Internet is full of datasets to enrich what you have with additional information. For example, census data will help you add the average income for the district where your user lives, and OpenStreetMap can show you how many coffee shops are on a given street. A lot of countries have open data platforms (like data.gov (https://www.data.gov/) in the U.S.). If you're working on a fun project outside of work, these open datasets are also an incredible resource (/top-ds-resources-for-data-scientists)!
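A good first move when you open up a database is simply taking inventory of what it contains. Here is a minimal sketch using Python’s built-in `sqlite3` module against an in-memory database; the `customers` and `tickets` tables are hypothetical stand-ins, and a real company database would need its own driver and connection details.

```python
import sqlite3


def inventory_tables(conn: sqlite3.Connection) -> dict[str, int]:
    """Return a mapping of table name -> row count for every user table."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Table names cannot be bound as query parameters, so quote them explicitly.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


if __name__ == "__main__":
    # Hypothetical tables standing in for a real company database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
    conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, customer_id INTEGER)")
    conn.executemany("INSERT INTO customers (country) VALUES (?)", [("FR",), ("US",)])
    conn.execute("INSERT INTO tickets (customer_id) VALUES (1)")
    print(inventory_tables(conn))  # {'customers': 2, 'tickets': 1}
```

Row counts are only a starting point; the next questions are what each column means and how the tables relate to one another.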
Once you have your data, it’s time to get to work on it in the third data analytics project phase. Start digging to see what you’ve got and how you can link everything together to achieve your original goal. Start taking notes on your first analyses, and ask business stakeholders, the IT team, or other groups what all your variables mean.
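A quick per-column profile is a common first pass at this exploration. The sketch below is a plain-Python illustration, not how Dataiku profiles data: it assumes your records are already loaded as a list of dicts with hashable values, and it reports how many values are missing, how many distinct values appear, and the most common one.

```python
from collections import Counter


def profile(rows: list[dict]) -> dict[str, dict]:
    """Summarize each column: missing values, distinct values, most common value."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        # Absent keys, None, and empty strings all count as missing.
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "top": counts.most_common(1)[0][0] if counts else None,
        }
    return report


if __name__ == "__main__":
    # Hypothetical sample records.
    rows = [
        {"country": "France", "plan": "pro"},
        {"country": "france", "plan": "free"},
        {"country": "", "plan": "pro"},
    ]
    for col, stats in profile(rows).items():
        print(col, stats)
```

A column whose distinct count is suspiciously high (for example, two spellings of the same country) is exactly the kind of finding to note down and ask about.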
The next step (and by far the most dreaded one) is cleaning your data. You’ve probably noticed that even though you have a country feature, for instance, it contains different spellings of the same country, or even missing values. It’s time to look at every one of your columns to make sure your data is homogeneous and clean.
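As a minimal sketch of that homogenization, assuming a hand-maintained mapping of known spellings (the `COUNTRY_ALIASES` table below is hypothetical and would grow as you discover new variants), the code normalizes a country column, keeps missing values explicit as `None`, and collects unrecognized spellings for manual review instead of silently guessing.

```python
# Hypothetical alias table: lowercase variant -> canonical country name.
COUNTRY_ALIASES = {
    "us": "United States",
    "usa": "United States",
    "u.s.": "United States",
    "united states": "United States",
    "fr": "France",
    "france": "France",
}


def clean_country(raw):
    """Map a raw country value to a canonical name, or None if missing/unknown."""
    if raw is None:
        return None
    key = " ".join(str(raw).split()).lower()  # collapse whitespace, ignore case
    if not key:
        return None
    return COUNTRY_ALIASES.get(key)


def clean_column(values):
    """Clean a column and collect unrecognized spellings for manual review."""
    cleaned, unknown = [], set()
    for raw in values:
        value = clean_country(raw)
        # Non-blank input that did not map to a known country needs a human look.
        if value is None and raw is not None and str(raw).strip():
            unknown.add(raw)
        cleaned.append(value)
    return cleaned, unknown
```

For example, `clean_column(["USA", " france ", None, "Deutschland"])` yields `["United States", "France", None, None]` with `{"Deutschland"}` flagged as unknown, which tells you which alias to add next.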
Warning! This is probably the longest, most annoying step of your data analytics project. It's going to be painful for a little bit, but as long as
you keep focused on the final goal, you’ll get through it.
Finally, one crucial element of data preparation not to overlook is making sure that your data and your project comply with data privacy regulations (/executing-data-projects-in-the-age-of-data-privacy). Personal data privacy and protection are becoming a priority for users, organizations, and legislators alike, and they should be one for you from the very start of your data journey. To execute privacy-compliant projects, you’ll need to centralize all your data efforts, sources, and datasets in one place or tool to facilitate governance. Then, you’ll need to clearly tag datasets and projects that contain personal and/or sensitive data and therefore need to be treated differently.
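Tagging can start with a simple heuristic. The sketch below is one hypothetical approach, not a compliance tool: it flags a dataset as containing personal data when a column name matches a keyword list or when values look like email addresses. Both `PII_KEYWORDS` and the email pattern are assumptions to adapt to your own data and to the regulations that apply to you; keyword matching will also produce false positives such as a `filename` column.

```python
import re

# Hypothetical keywords suggesting a column holds personal data.
PII_KEYWORDS = ("name", "email", "phone", "address", "birth", "ssn")
# Deliberately loose pattern: something@something.tld
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def pii_columns(rows: list[dict]) -> set[str]:
    """Return the columns that look like they contain personal data."""
    flagged = set()
    columns = {key for row in rows for key in row}
    for col in columns:
        if any(word in col.lower() for word in PII_KEYWORDS):
            flagged.add(col)
            continue
        # Fall back to inspecting the values for email-like strings.
        if any(
            isinstance(row.get(col), str) and EMAIL_PATTERN.match(row[col])
            for row in rows
        ):
            flagged.add(col)
    return flagged


def tag_dataset(name: str, rows: list[dict]) -> dict:
    """Build a catalog entry recording whether the dataset holds personal data."""
    columns = sorted(pii_columns(rows))
    return {"dataset": name, "personal_data": bool(columns), "pii_columns": columns}
```

Storing entries like these in one central catalog is what makes the later governance step practical: restricted datasets can be found and handled differently.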
Now that you have clean data, it’s time to manipulate it to get the most value out of it. You should start the data enrichment phase of the project by joining all your different sources and aggregating logs to narrow your data down to the essential features. One example of that is to enrich your data by creating time-based features, such as:
Extracting date components (month, hour, day of the week, week of the year, etc.)
Calculating differences between date columns
Flagging national holidays
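Here is a minimal sketch of the three kinds of time-based features listed above, using only Python’s standard `datetime` module. The field names (`signup`, `first_purchase`) and the `HOLIDAYS` set are hypothetical; in practice you would load holidays for each country you operate in from a maintained calendar rather than hard-coding them.

```python
from datetime import date, datetime

# Hypothetical holiday calendar; replace with a maintained source per country.
HOLIDAYS = {date(2019, 7, 4), date(2019, 12, 25)}


def time_features(signup: datetime, first_purchase: datetime) -> dict:
    """Derive date components, a date difference, and a holiday flag."""
    _, iso_week, iso_weekday = signup.isocalendar()
    return {
        "month": signup.month,
        "hour": signup.hour,
        "day_of_week": iso_weekday,  # 1 = Monday ... 7 = Sunday
        "week_of_year": iso_week,    # ISO 8601 week number
        "days_to_purchase": (first_purchase - signup).days,  # whole days, floored
        "is_holiday": signup.date() in HOLIDAYS,
    }


if __name__ == "__main__":
    print(time_features(datetime(2019, 7, 4, 14, 30), datetime(2019, 7, 10, 9, 0)))
```

ISO week numbering is one choice among several; whichever convention you pick, apply it consistently across datasets so the features line up when you join them.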
Another way of enriching data is by joining datasets — essentially, retrieving columns from one dataset or tab into a reference dataset. This is
a key element of any analysis, but it can quickly become a nightmare when you have an abundance of sources. Luckily, some tools such as
Dataiku allow you to blend data through a simplified process (/7-awesome-things-you-can-do-without-coding-in-dataiku), by easily retrieving
data or joining datasets based on specific, fine-tuned criteria.
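In code, that kind of retrieval is usually a left join: every row of the reference dataset is kept, and selected columns are copied over from the other source. A minimal plain-Python sketch follows; the `user_id` key and column names are hypothetical, and it assumes the lookup side has at most one row per key (duplicates raise an error here, since they would otherwise silently multiply or overwrite rows).

```python
def left_join(reference, lookup, key, columns):
    """Copy `columns` from `lookup` rows into each `reference` row matched on `key`.

    Reference rows with no match get None for the retrieved columns, so nothing is dropped.
    """
    index = {}
    for row in lookup:
        if row[key] in index:
            raise ValueError(f"duplicate key in lookup: {row[key]!r}")
        index[row[key]] = row
    joined = []
    for row in reference:
        match = index.get(row[key], {})
        joined.append({**row, **{col: match.get(col) for col in columns}})
    return joined


if __name__ == "__main__":
    users = [{"user_id": 1, "plan": "pro"}, {"user_id": 2, "plan": "free"}]
    crm = [{"user_id": 1, "industry": "retail", "owner": "ana"}]  # hypothetical CRM export
    print(left_join(users, crm, "user_id", ["industry"]))
```

Checking for duplicate keys up front is one way to keep joins from becoming a nightmare as the number of sources grows.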
When collecting, preparing, and manipulating your data, you need to be extra careful not to introduce unintended bias or other undesirable patterns into it. Indeed, the data used to build machine learning models and AI algorithms is often a representation of the outside world, and thus can be deeply biased against certain groups and individuals. One of the things that make people fear data and AI (/finding-hope-in-artificial-intelligence) the most is that an algorithm can’t recognize bias on its own. As a result, when you train your model on biased data, it will treat recurring bias as a pattern to reproduce, not something to correct.
This is why an important part of the data manipulation process is making sure that the datasets used aren’t reproducing or reinforcing any bias that could lead to biased, unjust, or unfair outputs. Accounting for the machine learning model’s decision-making process and being able to interpret it (/why-machine-learning-interpretability-matters) is now as important a skill for a data scientist as being able to build models in the first place, if not more so.
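One simple, widely used check is to compare how often each group receives the favorable outcome, either in your training data or in your model’s predictions. The sketch below computes per-group positive rates and the ratio of the lowest rate to the highest; the default 0.8 threshold echoes the “four-fifths rule” from U.S. employment-discrimination guidelines and is only a rough screening heuristic, not proof that a dataset is fair. The `group` and `approved` field names are hypothetical.

```python
from collections import defaultdict


def positive_rates(rows, group_field, outcome_field):
    """Share of rows with a truthy outcome, computed separately for each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for row in rows:
        group = row[group_field]
        totals[group] += 1
        positives[group] += bool(row[outcome_field])
    return {g: positives[g] / totals[g] for g in totals}


def disparate_impact(rows, group_field, outcome_field, threshold=0.8):
    """Return (lowest / highest group rate, whether that ratio falls below threshold)."""
    rates = positive_rates(rows, group_field, outcome_field)
    highest = max(rates.values())
    ratio = min(rates.values()) / highest if highest else 1.0
    return ratio, ratio < threshold


if __name__ == "__main__":
    rows = [{"group": "A", "approved": a} for a in (1, 1, 1, 0)]
    rows += [{"group": "B", "approved": b} for b in (1, 0, 0, 0)]
    print(positive_rates(rows, "group", "approved"))
    print(disparate_impact(rows, "group", "approved"))
```

A flagged ratio is a prompt to investigate how the data was collected, not an automatic verdict; a passing ratio likewise does not rule out other kinds of bias.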
The tricky part of visualizing your data is being able to dig into your graphs at any time and answer any question someone might have about a given insight. That’s when the data preparation comes in handy: you’re the guy or gal who did all the dirty work, so you know the data like the back of your hand!
If this is the final step of your project, it’s important to use APIs and plugins so you can push those insights to wherever your end users want them.
Graphs are also a way to enrich your dataset and develop more interesting features. For example, by putting your data points on a map, you might notice that specific geographic zones are more telling than specific countries or cities.
By working with clustering algorithms (also known as unsupervised learning), you can build models to uncover trends in the data that were not distinguishable in graphs and stats. These algorithms create groups of similar events (or clusters) and more or less explicitly show which features are decisive in forming them.
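To make “groups of similar events” concrete, here is a bare-bones k-means, one common clustering algorithm, in plain Python. It is a teaching sketch under simplifying assumptions (caller-supplied initial centroids, a fixed iteration cap, no feature scaling); real projects would typically use a library implementation and scale features first.

```python
import math


def assign_clusters(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def kmeans(points, centroids, iterations=10):
    """Lloyd's k-means starting from the given centroids.

    Returns (final centroids, cluster index for each point).
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = assign_clusters(points, centroids)
        # Update step: move each centroid to the mean of its assigned points.
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster where it was
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, assign_clusters(points, centroids)


if __name__ == "__main__":
    points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
    print(kmeans(points, [(0, 0), (10, 10)]))
```

Comparing the final centroid coordinates feature by feature is one way to see which feature separates the clusters.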
More advanced data scientists can go even further and predict future trends with supervised algorithms. By analyzing past data, they find
features that have impacted past trends, and use them to build predictions. More than just gaining knowledge, this final step can lead to
building entirely new products and processes.
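The simplest supervised example is fitting a straight line to past observations and extrapolating it. The sketch below implements ordinary least squares for a single feature in plain Python; the monthly sales figures are made up for illustration, and real trend forecasting would usually involve more features, validation on held-out data, and a library such as scikit-learn or statsmodels.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    months = [1, 2, 3, 4, 5]
    sales = [100, 110, 120, 130, 140]  # made-up history
    model = fit_line(months, sales)
    print(predict(model, 6))  # 150.0
```

The fitted slope is the “feature that impacted past trends” in its most reduced form: it says how much the target moved per unit of the input.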
Even if you’re not quite there yet in your personal data journey or that of your organization, it’s important to understand the process so all the
parties involved will be able to understand what comes out in the end.
Finally, in order to garner real value from your project, your predictive model must not sit on the shelf; it needs to be operationalized.
Operationalization (o16n) (/why-operationalization-is-hard-but-it-doesnt-have-to-be) simply means deploying a machine learning model for
use across an organization. Operationalization is vital for your organization and for you to realize the full benefits of your data science efforts.
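At its core, operationalization means turning a trained model into an artifact other systems can load and call. As a deliberately minimal sketch, assuming a simple linear model described by a slope and an intercept, the code below saves the parameters to a JSON file and loads them back as a scoring function. The file layout and the `version` field are hypothetical; in practice the scorer would sit behind an API endpoint or a batch job, which is omitted here.

```python
import json
from pathlib import Path


def save_model(path, slope, intercept, version):
    """Write model parameters to a JSON artifact other services can load."""
    payload = {"version": version, "slope": slope, "intercept": intercept}
    Path(path).write_text(json.dumps(payload))


def load_scorer(path):
    """Load a saved artifact and return a function that scores new inputs."""
    params = json.loads(Path(path).read_text())

    def score(x):
        return params["slope"] * x + params["intercept"]

    score.version = params["version"]  # expose which model version is live
    return score
```

Recording a version alongside the parameters makes it possible to tell which model produced a given prediction once retrained versions start replacing each other.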
Ironically, in order to successfully complete your first data project, you need to recognize that your model will never be fully “complete.” For it to remain useful and accurate, you need to constantly reevaluate it, retrain it, and develop new features. If there's anything you take away from these fundamental steps in analytics and data science, it is that a data scientist’s job is never really done, but that’s what makes working with data all the more fascinating!
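Deciding when to retrain can be partly automated with a monitoring rule. The sketch below compares the model’s mean absolute error on recent data with the error measured when it was deployed and flags retraining when performance degrades beyond a tolerance; the 20% default tolerance is an arbitrary assumption for illustration, and real monitoring often also tracks drift in the input data.

```python
def mean_absolute_error(actuals, predictions):
    """Average absolute difference between observed and predicted values."""
    if not actuals or len(actuals) != len(predictions):
        raise ValueError("need equally sized, non-empty sequences")
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)


def needs_retraining(baseline_mae, actuals, predictions, tolerance=0.2):
    """True when recent error exceeds the baseline by more than `tolerance` (a fraction)."""
    recent = mean_absolute_error(actuals, predictions)
    return recent > baseline_mae * (1 + tolerance)
```

Running a check like this on a schedule turns “constantly reevaluate” into a concrete, repeatable step rather than an occasional manual review.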
July 4, 2019
Data Basics
Alivia Smith