
Software Engineering XOR Data Science

Abstract

I am pretty sure that many readers question or reflect on what they do in their professional lives. Besides, they also compare their projects, development practices, managers, and even colleagues. Since I have completed six years in the industry and have had the chance to work on both Software Engineering and Data Science projects, I would like to write a blog post about their similarities, differences, and essentials.

Keywords: Data Sources, Software and Data Science, MLOps, DevOps, Splunk, Datadog.

Introduction

Software Engineering

Software Engineering is the application of a disciplined engineering approach to the design, development, and maintenance of software. It involves analyzing users' requirements, focusing on the best processes and methodologies, and producing high-quality software. Choosing a suitable programming language and solving algorithmic problems in compliance with users' requirements is a prime objective of a software engineer. Software engineering ensures that an application is built consistently, error-free, and on budget. As an application is built, users' requirements change at an immense rate, and this is where Software Engineering proves resourceful.

Data Science

The domain of Data Science is incredibly diverse and requires skills from several different domains to be used together for adequate results. If you are an Iron Man fan, you already know about Jarvis, Tony Stark's assistant, a virtual AI that helps Tony predict an outcome for any given action. The process of collecting data, analyzing it, and predicting a certain outcome is Data Science. Finally, the question "Software Engineer vs Data Scientist": which profession is better? Both Data Science and Software Engineering require you to have programming skills. While Data Science includes statistics and Machine Learning, Software Engineering focuses more on coding languages.
Both career choices are in demand and highly rewarding. Ultimately, it depends on your interests. Although the field of data science is soaring, its importance will never outgrow that of software engineering, because we will always need software engineers to build the software that data scientists work on. We will likewise always need Data Scientists to analyze the data and open new scope for the business, on which Software Engineering can build software. Thus, we reach the end of our Data Science vs Software Engineering comparison. We hope you have gained a few insights regarding the topic. Please feel free to ask any questions regarding Data Science vs Software Engineering in the comments below.

Data Science vs Software Engineering

- Data Science focuses on gathering and processing data, while Software Engineering focuses on the development of applications and features for users.
- Data Science includes machine learning and statistics, while Software Engineering focuses more on coding languages.
- Data Science deals with data visualisation tools, data analytics tools, and database tools, while Software Engineering deals with programming tools, database design tools, CMS tools, testing tools, integration tools, etc.
- Data Science deals with exploratory data analysis, while Software Engineering focuses on systems building.
- Data Science is process-oriented, while Software Engineering is methodology-oriented.
- Data Science skills include programming, machine learning, statistics, and data visualization, while Software Engineering skills include the ability to program and code in multiple languages.

What Is Data Science?

Data science is an interdisciplinary domain derived from computer science that uses several
scientific processes and methods to study different kinds of data—structured, semi-structured, and
unstructured. It involves using numerous technologies like data transformation, data purging, and
data mining to study and analyze that data. While both data science and software engineering rely
heavily on programming knowledge, data scientists focus more on manipulating large datasets.

A data scientist exploits a huge amount of data for prediction, understanding, intervention, and
exploration. They focus on the value of approximation, the results of data analysis, and the
understanding of its results. Like software engineers, data scientists aim to optimize algorithms and
manage the trade-off between speed and accuracy. They coordinate with experts and work together
to achieve a balance between the assumptions and results.

Data science requires specialized knowledge in analytics, statistics, and mathematics. Data science, as a separate and independent discipline, was conceived by William S. Cleveland, after which it became more popular across the world. Data science is a fast-growing field; the job of data scientist was ranked the second best in America in 2021.

What Is Software Engineering?


Software engineering, on the other hand, is the process of developing software by systematically applying the principles of engineering. A software engineer analyzes user requirements, then designs, builds, and tests software applications to verify that they fulfill the set requirements.

Prominent German computer scientist Fritz Bauer defined software engineering as “the
establishment and use of sound engineering principles in order to obtain economically software that
is reliable and works efficiently on real machines”.

Often the term will be used informally to refer to a range of activities related to system analysis or
computer programming. It’s related to several other disciplines like computer science, economics,
management science, and system engineering.

Software engineering serves as a foundation for understanding software in computer science and helps in the estimation of resources in economics. It employs management science for labor-intensive work. It's currently one of the most widely chosen careers worldwide.

A Software Engineer has extensive knowledge of programming languages and is expected to have sound knowledge of software development, computer programming, operating systems, and good analytical skills. Software engineering knowledge is considered a foundation for any computer-related stream, job, or opportunity.

Software Engineers are expected to build software, solve software issues, and provide Infrastructure,
Maintenance, and Testing. There are a variety of software domains that a Software Engineer can
develop, such as Operating Systems, Business Software, Games, Control Systems, Payment Gateway,
etc.

However, a Data Scientist is more focused on defining a Problem Statement, Querying Data,
performing Exploratory Data Analysis, developing Models, and Interpreting Results. 

A Data Scientist works on structured and unstructured Big Data and combines data with mathematics and science to derive conclusions from it. Their usual job is to get the data from a Data Engineer, identify the features and labels, model them with an algorithm, train and test the model, and then interpret or forecast the results.

Project Management

As you might guess, the phases Business Goal or Demand, Planning, Tech Design, Implementation, and Maintenance/Monitoring have been the same or similar for both types of projects, as far as I have worked. Project managers or product owners, depending on the size of the company or the aspect, finalize the business goal after many meetings; functional and technical requirements are defined and estimated with both product and technical teams, then the application is developed. Finally, the released application is monitored, and required changes and features are developed later on.

Application Development Phases


Data Analysis

With great planning, a smart design, and clean implementation, great software can be released. For DS projects, I would like to add a data analysis section between the Demand and Planning phases. In this phase, first the hypothesis is defined, then existing or possible data sources are checked and validated, and finally the hypothesis itself is validated. I have seen the project stopped or modified at any step of this process. As in life, unless we have a solid basis, one of the best decisions is to give up on our hypothesis as soon as it falters.

Regarding data source validation, low-quality data may sustain an average software project, but data quality is critical for DS projects. If the company wants to make decisions from the output of a DS application, the accuracy of the model must be acceptable, or its errors negligible, at least for the first iterations. Creating a base model is a great way to observe whether a newly proposed model is better or worse. This base model may even be employed if it has reasonable accuracy; then the complexity is increased in order to outperform the naive model in further developments. Even when the accuracy is great, data quality must be verified. So, alignment between data engineers and data scientists or ML engineers is significant for data applications, because the training data and the prediction data must be consistent. For example, if there is a batch prediction, the data migration must be verified before the prediction. Otherwise, the prediction and training parts would differ, and that may lead to catastrophic conclusions for the company.
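The base-model idea above can be sketched in a few lines. This is a minimal illustration with toy data and hypothetical model names, not a recipe: a naive majority-class baseline is scored next to a candidate model, and the candidate is adopted only if it beats the baseline.

```python
# A minimal sketch (pure Python, toy data) of comparing a naive base model
# against a proposed model before increasing complexity.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def naive_model(features):
    """Base model: always predict the majority class seen in training (here, 0)."""
    return [0 for _ in features]

def proposed_model(features):
    """Hypothetical candidate: predict 1 when the single feature exceeds a threshold."""
    return [1 if x > 0.5 else 0 for x in features]

# Toy validation set: one numeric feature per sample and its true label.
X_val = [0.1, 0.9, 0.7, 0.2, 0.8]
y_val = [0, 1, 1, 0, 0]

base_acc = accuracy(y_val, naive_model(X_val))
candidate_acc = accuracy(y_val, proposed_model(X_val))

# Only adopt the more complex model if it actually beats the naive baseline.
adopt_candidate = candidate_acc > base_acc
```

In a real project the baseline might be a heuristic the business already uses; the point is that the comparison, not the candidate's absolute score, drives the decision.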

Tech Design

Using a clear software design such as Hexagonal Architecture makes adding new features or changing the technology stack easier for both SW and DS projects. Data sources, frameworks, and even programming languages can change before the development. For example, if we want to implement an application with a very short response time for payment processes, a cache can be introduced to keep this crucial information and reduce the workload of the DB.
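As a minimal sketch of that design (all class names here are hypothetical): the application core depends only on a port (an interface), so a plain DB adapter can later be wrapped in a caching adapter without touching business logic.

```python
from abc import ABC, abstractmethod

# Port: in hexagonal style, the application core depends only on this
# interface, so the storage technology can change without touching it.
class PaymentRepository(ABC):
    @abstractmethod
    def get_amount(self, payment_id: str) -> int: ...

# Adapter 1: a plain "DB" lookup (a dict stands in for a real database).
class DbPaymentRepository(PaymentRepository):
    def __init__(self, db):
        self.db = db
        self.db_reads = 0  # counts how often the slow backing store is hit

    def get_amount(self, payment_id):
        self.db_reads += 1
        return self.db[payment_id]

# Adapter 2: wraps any repository with an in-memory cache to cut DB workload.
class CachedPaymentRepository(PaymentRepository):
    def __init__(self, inner: PaymentRepository):
        self.inner = inner
        self.cache = {}

    def get_amount(self, payment_id):
        if payment_id not in self.cache:
            self.cache[payment_id] = self.inner.get_amount(payment_id)
        return self.cache[payment_id]

db_repo = DbPaymentRepository({"p1": 100, "p2": 250})
repo: PaymentRepository = CachedPaymentRepository(db_repo)
repo.get_amount("p1")
repo.get_amount("p1")  # second call is served from the cache; DB read only once
```

Swapping the dict for a real database client, or the dict cache for Redis, would change only the adapters.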

For DS projects, the application flow is similar to SW projects except for the model training, model selection, and prediction phases. So, in the tech design, the complexity of the MLOps setup can be defined. Initially, we can start with the simplest approach, in which Model Training, Model Selection, and Model Deployment are handled manually; these steps can then be automated. Besides MLOps, the algorithm type and the complexity of the model are other important factors for the tech design.
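The "simplest way" of model selection mentioned above can be sketched as a plain loop over candidates (toy threshold "models" and made-up data here): score each one on a validation set and keep the best. This is exactly the step an MLOps pipeline would later automate.

```python
# A minimal manual model-selection sketch: train several candidates, score
# each on a validation set, and keep the best one.

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy "models": each is just a threshold on a single feature.
def make_threshold_model(threshold):
    def model(features):
        return [1 if x > threshold else 0 for x in features]
    return model

candidates = {
    "t=0.3": make_threshold_model(0.3),
    "t=0.5": make_threshold_model(0.5),
    "t=0.7": make_threshold_model(0.7),
}

X_val = [0.2, 0.4, 0.6, 0.8]
y_val = [0, 0, 1, 1]

# Manual model selection: evaluate every candidate and pick the best score.
scores = {name: accuracy(y_val, model(X_val)) for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
```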

Planning

Once all validations are done and the requirements are finalized, we can initiate the planning phase. As the team gets to know the dynamics and the development environment better, this phase improves, as I have observed. Tasks can be created as granularly as possible, each team member can give better and more consistent sprint points, and the overall estimation and planning can be finalized.
On the software side, there might be some underestimation of DevOps operations. Operations on the servers and the permissions between the applications or the data sources may take longer than estimated. A new application may need to call an endpoint from an existing application, or a new field may need to be added to a response. Hence, these communications should happen before the planning to make better estimations.

On the data side, the details are generally similar to the software side. In addition, the planning for the Feature Engineering, Model Training, and Model Selection steps can be overestimated, which is also normal since they are data-related processes. Better data analysis and starting with a simple approach may help with better planning. Furthermore, simple thresholds can be assigned for the models to outperform at these steps. For example, if the F1 score of the classifier is higher than some value X, we can stop there and save our excitement for further model training.
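That stop criterion can be sketched as follows, with a from-scratch F1 computation and a made-up threshold (the real X would come from the planning discussion):

```python
# A minimal sketch of the "stop when good enough" threshold: compute the F1
# score of a classifier's predictions and stop iterating once it exceeds X.

def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

X_THRESHOLD = 0.8  # hypothetical value agreed with the team during planning

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

score = f1_score(y_true, y_pred)
good_enough = score >= X_THRESHOLD  # if True, stop and defer further training
```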

Development

Development, the part I enjoy most on both sides, is the last step before the final one, monitoring. Implementing the services, handling the unit, integration, and functional tests, and preparing the server for the application are phases common to both kinds of applications.

In SW applications, implementing the tests for an unexceptional application is a bit easier than in DS applications. Asserting on the exact result of the AI model is not a good approach if we employ a complex model. However, mocks can be implemented for these models, just as they can be for external services in the applications.
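A minimal sketch of that mocking idea, using the standard library's unittest.mock (the service and model names are hypothetical): the heavy model is replaced with a mock that returns a fixed score, so the test exercises only the business logic around it.

```python
# Mocking a complex model in a unit test, just as one would mock an
# external service; only the surrounding business logic is under test.
from unittest.mock import Mock

class FraudService:
    """Business logic under test; the heavy ML model is injected."""
    def __init__(self, model):
        self.model = model

    def is_suspicious(self, transaction):
        # The real model might be a large neural network; here we only care
        # that the service reacts correctly to the model's score.
        score = self.model.predict(transaction)
        return score > 0.9

# In the test, replace the complex model with a mock returning a fixed score.
mock_model = Mock()
mock_model.predict.return_value = 0.95

service = FraudService(mock_model)
result = service.is_suspicious({"amount": 10_000})

# We can also assert the model was called exactly as expected.
mock_model.predict.assert_called_once_with({"amount": 10_000})
```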

Feature Engineering, Model Training, and Model Selection are the phases that distinguish DS from SW projects. As mentioned in the previous section, there might be thresholds with two dimensions: one dimension is a success criterion to stop, and the other is a time criterion. Since these steps are among the most desirable and most hyped developments nowadays, this development and research should be time-boxed to increase productivity. If the new application brings some success and requires better accuracy, these three phases can continue in further developments.

Monitoring

Finally, it’s time to reap the fruits for both sides. Using the monitoring tools such
as Splunk and Datadog which extract information from the application logs has a significant role.
After a new release, it would be the first place to validate that the new release works seamlessly.
Metrics of the software such as number of the requests and response time and error types can be
extracted by writing good logs. It is also great to do data checks in the controller class where the
requests are received and forwarded to desired services before starting the operations for both SW
and DS projects unless the UI side doesn’t do any check. Regex validations and data type checks can
literally save the applications at both performance and security sides. An easy SQL injection may
be prevented by a simple validation and it can even save the company’s future.
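A minimal sketch of such controller-side checks (field names and the ID format are hypothetical): a regex validation plus a data-type check reject malformed input, including a crude injection attempt, before it reaches any service or SQL query.

```python
# Controller-side input checks: a regex validation and data-type checks
# run before the request is forwarded to any service.
import re

PAYMENT_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{1,32}$")  # hypothetical format

def validate_request(payload):
    """Return True only for well-formed requests; reject everything else."""
    payment_id = payload.get("payment_id")
    amount = payload.get("amount")
    if not isinstance(payment_id, str) or not PAYMENT_ID_PATTERN.match(payment_id):
        return False  # blocks e.g. "1; DROP TABLE payments--"
    if not isinstance(amount, int) or amount <= 0:
        return False
    return True

ok = validate_request({"payment_id": "pay_42", "amount": 100})
injected = validate_request({"payment_id": "1; DROP TABLE payments--", "amount": 100})
```

Parameterized queries remain the real defense against SQL injection; this kind of whitelist validation is a cheap additional layer.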

On the DS side, there should be further monitoring besides the software metrics. The distributions of the model features and the model predictions are vital. If the distribution of any data column changes, there might be a data shift, and a new model training may be required. If the prediction results can be validated within a short time, they must be monitored, and the developers should be warned if the accuracy falls outside a given threshold.
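A crude sketch of that data-shift check (toy numbers, and the three-sigma tolerance is an arbitrary assumption, not a standard): compare the mean of a feature in live traffic against the training window, measured in training standard deviations.

```python
# A minimal data-shift check: flag drift if the live mean of a feature moves
# more than `tolerance` training standard deviations from the training mean.
from statistics import mean, stdev

def detect_drift(train_values, live_values, tolerance=3.0):
    """Return True when the live distribution looks shifted from training."""
    mu, sigma = mean(train_values), stdev(train_values)
    if sigma == 0:
        return mean(live_values) != mu
    return abs(mean(live_values) - mu) > tolerance * sigma

train_feature = [10, 11, 9, 10, 10, 11, 9, 10]   # feature values at training time
stable_feature = [10, 9, 11, 10]                  # recent live values, same regime
shifted_feature = [25, 27, 26, 24]                # recent live values, shifted

stable = detect_drift(train_feature, stable_feature)    # no alarm
shifted = detect_drift(train_feature, shifted_feature)  # retraining signal
```

Real drift monitoring usually compares full distributions (for example with statistical tests) rather than means alone, but the alerting principle is the same.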

Common Practices

Before the conclusion, I would like to mention some common practices that have helped me on both types of projects.

Having great business domain knowledge: It is indispensable for both sides. In order to acquire this knowledge, we must understand the client's requirements. This knowledge provides benefits such as implementing significant edge cases in SW projects and extracting great features in DS projects. It also helps to write better tests and prepare a better environment for the application.

Using OKRs for the project output: In order to deliver better business value in the applications, it's better to speak the business language. We should transform goals from 'reduce the response time by x' to 'accelerate the onboarding process by y', or from 'reduce the fraud rate by x' to 'increase the payment conversion rate by y'. These transformations help teams in different departments communicate better and reduce the technical jargon between technical and non-technical teams.

Application of development practices: Clean coding, designing a great service architecture, pair programming, and code reviews are the most common ones, as far as I remember. For maintenance, code quality is indispensable. Even for a single Jupyter Notebook file, another person should be able to understand the script well enough to implement new features. Pair programming and reviews are also important to increase productivity and reduce overlooked errors.

Great documentation: During the planning and design sessions, it is also significant to have great documentation, which can be treated almost as a legal document before starting the implementation. It can be used as a source for the development and contain brief explanations and flow charts for other technical and non-technical members of the company.

Conclusion

I hope some of these practices will be useful to readers, as they have benefited me in the past and continue to do so. As mentioned above, better planning, great and visible communication, thorough analysis, efficient development, and careful monitoring generally drive great developments. These are the challenges that I have faced, and I just wanted to share my experiences. These inferences may be useful for other developers.

