
Comparison of Python, R and SAS Efficiency

 
20MSM3013- Suraj Kumar
20MSM3016- Muskan Parihar
20MSM3001- Harleen Kaur
Master of Science in Data Science,
Chandigarh University, Gharuan, Mohali, Punjab
Email- Suraj29may@gmail.com,
hkaur3099@gmail.com,
muskan.parihar86@gmail.com

Keywords:

Abstract:

Introduction:
For more than two decades, data scientists have debated the merits of R and SAS for data
processing. The debate has never been settled, and Python has now joined the race as a modern,
widely used data science tool. This paper discusses the properties of SAS, R and Python and
tries to reach a clear view of which tool is better suited to data science. Professionals rely
on both their own skills and the best available tools to do their work. Veteran practitioners
draw on lessons learned from work experience, while newcomers look for guidance.

The study in this paper concentrates on quantitative and qualitative characteristics in order to
compare these performance-focused tools. Public data were taken from reports by well-known data
science and data analysis sources, Burtch Works and KDnuggets. KDnuggets lists Python, R and SAS
among the top four analytics and data mining tools [1]. In 2017, Burtch Works [2] conducted a
flash survey of over one thousand data professionals to assess their preferences among Python,
R and SAS. On the basis of these surveys, this paper focuses on these three tools. A quote from
Tim O'Reilly's report [3] would quickly sum up the inspiration behind this article.

More than four decades ago, metrics were developed to measure the complexity of algorithms and
languages. The metrics now known as the Halstead complexity measures are attributed to Maurice
H. Halstead, the founder of this field. These measures have been subjected to rigorous research
and testing [5], [6], [7]. The cyclomatic complexity model, also known as McCabe's complexity,
was also developed in the 1970s [9].

Since their introduction, both models have drawn criticism. One concern is that measured code
complexity may reflect more than the language itself: it can also reflect the complexity of the
task the code performs, or poor coding practices [10]. Another concern [11] is the direct
correlation between complexity and the number of lines of code. Despite these concerns, both
the Halstead and McCabe complexity measures continue to be used.

Python, created by Guido van Rossum as a successor to the ABC language and first released in
1991, relies for its continued development and growth on the dedication of its large user and
developer community, self-organized into PUGs (Python User Groups). It has a "well-established
and growing" scientific community: a group of scientists, engineers and researchers who use,
extend and promote Python in scientific research [12]. A recent survey [14] reported that
Python's NumPy and SciPy packages were among the most favored for statistical analysis, while
scikit-learn was a favorite for data mining.
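
As an illustration of the two complexity measures mentioned above, the following is a minimal
sketch in Python, not the procedure used in the cited studies. It assumes the standard Halstead
definitions (vocabulary n = n1 + n2, length N = N1 + N2, volume V = N log2 n, difficulty
D = (n1/2)(N2/n2), effort E = D V) and approximates McCabe's measure as the number of decision
points plus one. The function names, the hand-classified token lists and the sample function
are hypothetical and exist only for this example.

```python
import ast
import math
from dataclasses import dataclass


@dataclass
class Halstead:
    vocabulary: int
    length: int
    volume: float
    difficulty: float
    effort: float


def halstead(operators, operands):
    """Compute basic Halstead measures from pre-classified, non-empty token lists."""
    n1, n2 = len(set(operators)), len(set(operands))  # distinct operators / operands
    N1, N2 = len(operators), len(operands)            # total operators / operands
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 * N2) / (2 * n2)
    return Halstead(vocabulary, length, volume, difficulty, difficulty * volume)


def cyclomatic(source):
    """Approximate McCabe complexity of Python source as decision points + 1."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra branches
            decisions += len(node.values) - 1
    return decisions + 1


# Tokens of the statement "z = x + y * x", classified by hand (assumption).
metrics = halstead(["=", "+", "*"], ["z", "x", "y", "x"])

# Hypothetical sample function with two decision points (if / elif).
SAMPLE = '''
def sign(v):
    if v > 0:
        return 1
    elif v < 0:
        return -1
    return 0
'''
complexity = cyclomatic(SAMPLE)
```

For the hand-tokenized statement, the vocabulary is 6 and the length is 7. The sample function
has three independent paths, so its approximate cyclomatic complexity is 3. A production tool
would tokenize the source automatically and cover more branching constructs.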

3.1: Python: Python is an interpreted, high-level, general-purpose, multi-paradigm programming
language, and it is open source. Python lets programmers use different programming styles to
create simple or complex applications, get results faster and write code that reads almost like
a human language. It has a wide range of libraries for data transformation, data filtering,
data mining, machine learning, predictive analysis and more, which allow it to be used in many
fields.
3.2: R: R is a programming language and free software environment for statistical computing and
graphics, supported by the R Foundation for Statistical Computing. It is useful both for
complex quantitative and statistical calculations and for data visualization. Owing to its
open-source nature, R has a wide audience. The R language is widely used by statisticians and
data miners to develop statistical software and perform data analysis. The R community has
produced a vast range of packages over the years, making R capable of handling virtually all
data science tasks.
3.3: SAS: SAS stands for Statistical Analysis System. The SAS language is a computer
programming language for statistical analysis, developed at North Carolina State University by
Anthony James Barr. It can read data from common spreadsheets and databases and output the
results of statistical analyses as tables, graphs, and RTF, HTML and PDF documents. It is a
proprietary computing platform for predictive analytics. SAS is very expensive. It is used by
large corporations, but it is out of reach for individuals and small groups.

SAS vs. R vs. Python
There has never been a definite answer to the question of which is superior: SAS, R or Python.
Each of the three excels in some cases and falls short in others. Which tool one selects should
depend on one's own requirements. We compare them on the following points:
1. Ease of learning
2. Data handling ability
3. Data visualization
4. Cost-effectiveness
5. Customer service and community support
6. Updates
7. Market demand

1. Ease of learning:
● Of the three, SAS is the easiest to learn and use. It has a powerful GUI that
makes it much easier to learn and use. Prior knowledge of SQL helps in using
SAS effectively.
● Python is renowned for its versatility and simplicity. It has no widely used
GUI, but notebooks are making it increasingly popular. Python is a high-level,
object-oriented language that is simpler to understand than R.
● R has the steepest learning curve of the three. Even simple tasks require more
programming, and without prior experience of sound coding practices the code
for the simplest tasks can become long and messy.
In terms of ease of learning, the order is SAS, followed by Python and then R.
2. Data handling ability: Data is growing in volume and complexity every day. A
data science platform must be able to store and handle large quantities of data
efficiently.
● SAS is fast and secure when managing data on stand-alone computers.
● Python has libraries, such as pandas and NumPy, that make data handling very
easy.
● R holds its data in memory (RAM), which makes working with huge datasets very
slow. It has packages such as plyr and dplyr that make managing data much
simpler. R can also be integrated with Hadoop, enabling distributed data
management and retrieval.
All three can handle large data efficiently, either with their base packages or with added
extensions.
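
To illustrate why in-memory processing becomes a bottleneck for large datasets, the following
is a minimal sketch that averages one column of a CSV source while holding only a single row in
memory at a time. It uses only Python's standard library rather than pandas or NumPy, and the
function name, column name and sample data are hypothetical.

```python
import csv
import io


def streaming_mean(lines, column):
    """Average one numeric CSV column while holding only one row in memory."""
    total, count = 0.0, 0
    for row in csv.DictReader(lines):
        total += float(row[column])
        count += 1
    return total / count if count else float("nan")


# Hypothetical in-memory sample standing in for a large file on disk.
sample = io.StringIO("id,amount\n1,10.0\n2,20.0\n3,30.0\n")
mean_amount = streaming_mean(sample, "amount")
```

The same function accepts an open file object in place of the in-memory sample, so memory use
stays roughly constant regardless of file size. Tools that load the whole dataset first trade
this property for faster repeated access.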
3. Data visualization:
● SAS can plot graphs and create basic charts, but its data visualization
features are merely functional.
● Python has libraries such as Matplotlib and Seaborn that make it simple to
build custom graphs.
● R excels at data visualization with packages like ggplot2, map, Rvis, Rgis, etc.
It is an ideal tool for visualizing data.
For data visualization, R is the clear winner.
4. Cost-effectiveness:
● SAS is licensed software. It is very expensive and out of reach for
individuals and for small and medium-sized enterprises. A free university
edition is available, but it has limited features.
● Python is an open-source language. It can be used for free by learners as
well as by practitioners.
● R is open source. Anyone can use it and contribute to it. R is used by most
start-ups and by large firms too.
Being open source, R and Python have an advantage over SAS in cost-effectiveness.
5. Customer service and community support:
● SAS has dedicated customer support that handles both installation and
usage issues. However, because of its cost, its community is not that big.
● Python is open source, and so it also has a wide community. However,
Python has risen to prominence only in recent years, so its community is not
as large as R's.
● R has no dedicated customer support staff, but it has a wide community.
There are people in the R community from almost all industries and from all
over the world, so a question is likely to be answered by the larger
community.
While SAS has dedicated customer service, its community pales in comparison with those of R and
Python.
6. Updates:
● SAS can be updated only with each new release. However, the SAS team ships
new features after rigorous testing, so releases are virtually free of
errors.
● R and Python are open source. New functionality can be added to them through
new packages and plugins, so they receive the latest updates more quickly.
New updates arrive much faster for R and Python, and anyone can contribute to them.
7. Market demand:
SAS was previously the leading player in corporate data analytics jobs, but the
situation has been changing in recent years. More and more companies are choosing
open-source tools such as R and Python, creating more job openings for people with
expertise in them. Large businesses such as Ford use R along with Hadoop for data
processing, and they need practitioners skilled in these technologies. In addition to
data science, Python is also used in the web and software development industries,
which attracts experienced specialists.
In corporate jobs, SAS has been a world leader in data mining. Today, however,
open-source technology has taken hold, and we expect job opportunities for R and
Python to continue to grow.

Conclusion
References
[1] G. Piatetsky, "Four main languages for Analytics, Data Mining, Data Science," 18 August 2014. [Online]. Available: https://www.kdnuggets.com/2014/08/four-main-languages-analytics-data-mining-data-science.html. [Accessed 10 November 2017].
[2] Burtch Works, "Burtch Works," Burtch Works, 19 June 2017. [Online]. Available: https://www.burtchworks.com/2017/06/19/2017-sas-r-python-flash-survey-results/. [Accessed 11 November 2018].
[3] T. O'Reilly, "O'Reilly Media," 30 September 2005. [Online]. Available: http://www.oreilly.com/pub/a/web/archive/what-is-web20.html. [Accessed 12 March 2018].
[4] CrowdFlower, "2017 Data Scientist Report," CrowdFlower, San Francisco, 2017.
[5] L. M. Ottenstein, V. B. Schneider and M. H. Halstead, "Predicting the Number of Bugs Expected in a Program Module," Purdue University, West Lafayette, 1976.
[6] N. Bulut, M. H. Halstead and R. Bayer, "Experimental Validation of a Structural Property of Fortran Algorithms," Purdue University, West Lafayette (Ankara, Munich), 1974.
[7] N. Bulut and M. H. Halstead, "Impurities Found in Algorithm Implementations," Purdue University, West Lafayette, 1974.
[8] T. J. McCabe, "A Complexity Measure," IEEE Transactions on Software Engineering, Fort Meade, 1976.
[9] A. Measday, "npath - C Source Complexity Measures," GEONius, 25 June 2016. [Online]. Available: http://www.geonius.com/software/tools/npath.html.
[10] G. Jay, J. E. Hale, R. K. Smith, D. Hale, N. A. Kraft and C. Ward, "Cyclomatic Complexity and Lines of Code: Empirical," Scientific Research, Tuscaloosa, 2009.
[11] J. K. Millman and M. Aivazis, "Python for Scientists and Engineers," Computing in Science & Engineering, vol. 13, no. 12, pp. 9-12, 2011.
[12] W. McKinney, "Chapter 1 - Preliminaries," in Python for Data Analysis, Sebastopol, O'Reilly Media, 2013, p. 3.
[13] P. Barlas, I. Lanning and C. Heavey, "A survey of open source data science tools," International Journal of Intelligent Computing and Cybernetics Information, vol. 8, no. 3, pp. 243-245, May-June 2015.
[14] P. H. Vasconcellos, "Top 5 Python IDEs For Data Science," 22 June 2017. [Online]. Available: https://www.datacamp.com/community/tutorials/data-science-python-ide. [Accessed 11 November 2017].
