Big Data

#What is R? Explain features of R.
=== 1) R is a well-developed, simple and effective programming language which includes conditionals, loops,
user-defined recursive functions, and input and output facilities. 2) R has an effective data handling and storage
facility. 3) R provides a suite of operators for calculations on arrays, lists, vectors and matrices. 4) R provides a
large, coherent and integrated collection of tools for data analysis.
#Define the following terms. 1) Population, 2) Sample, 3) Data Analysis ===
1) Population == The aggregate of all elements under study having one or more common characteristics; in other
words, the collection of all individuals or items under consideration in a statistical study.
2) Sample == A part of the population chosen at random for participation in the study; that is, the part of the
population from which information is collected.
3) Data Analysis == "Data analytics is the science of drawing insights from raw information sources. Many of the
techniques and processes of data analytics have been automated into mechanical processes and algorithms that
work over raw data for human consumption".
#What is digital data? List its types. === Digital data, in information theory and information systems, is the discrete,
discontinuous representation of information or works. Numbers and letters are commonly used representations.
Structured data == Data whose elements are addressable for effective analysis. It has been organized into a
formatted repository, typically a database. It covers all data that can be stored in an SQL database in a table
with rows and columns.
#Explain correlation with its types. === Correlation analysis is a method of statistical evaluation used to study the
strength of a relationship between two numerically measured, continuous variables (e.g. height and weight). This
type of analysis is useful when we want to establish whether there are possible connections between variables.
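As a rough illustration of measuring the strength of such a relationship, base R's `cor()` can compute a correlation coefficient between two numeric vectors. The height and weight values below are invented sample numbers, and the choice of the Pearson and Spearman methods is an assumption made for the sketch, not a list taken from these notes.

```r
# Hypothetical sample data (made up for illustration)
height <- c(150, 160, 165, 172, 180, 185)   # cm
weight <- c(52, 58, 63, 70, 78, 85)         # kg

# Pearson correlation (the default): strength of the linear relationship
r_pearson <- cor(height, weight)

# Spearman correlation: rank-based, measures a monotonic relationship
r_spearman <- cor(height, weight, method = "spearman")

print(r_pearson)
print(r_spearman)
```

A value near +1 or -1 suggests a strong relationship, while a value near 0 suggests little or no linear relationship.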
#Write an R program to find Sum, Mean and product of a Vector. ===
vec <- c(1, 2, 3, 4)   # the vector to summarise
print(sum(vec))        # sum of the elements
print(mean(vec))       # mean of the elements
print(prod(vec))       # product of the elements
#Explain applications of Big Data. === Financial Firms: Currently, capital firms use advanced technology to store
huge volumes of data, but increasing data sources such as the Internet and social media require them to adopt big
data storage systems. Law Enforcement: Law enforcement officials try to predict the next crime
location using past data, i.e., type of crime, place and time; social media data; and drone and smartphone tracking.
#What is big data? === The data lying in the servers of your company was just data until yesterday, sorted and filed.
Suddenly the term Big Data became popular, and now the data in your company is Big Data. The term covers each and
every piece of data your organization has stored till now. It includes data stored in clouds and even the URLs that
you bookmarked. Your company might not have digitized all the data. You may not have structured all the data
already. But all the digital, paper, structured and non-structured data with your company is now Big Data.
#What is data manipulation? === Data manipulation involves modifying data to make it easier to read and to be
more organized. We manipulate data for analysis and visualization. It is also used with the term 'data exploration'
which involves organizing data using available sets of variables.
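A minimal sketch of what such manipulation can look like in base R is given below: filtering, sorting and summarising a small data frame. The data frame `df` and its columns are hypothetical and were invented for the example.

```r
# Hypothetical data frame (values invented for illustration)
df <- data.frame(
  name  = c("A", "B", "C", "D"),
  dept  = c("HR", "IT", "IT", "HR"),
  score = c(70, 85, 90, 60)
)

# Filter rows: keep only scores of 70 or more
high <- df[df$score >= 70, ]

# Sort rows by score, highest first
sorted <- df[order(-df$score), ]

# Summarise: mean score per department
by_dept <- aggregate(score ~ dept, data = df, FUN = mean)

print(high)
print(sorted)
print(by_dept)
```

Each step returns a new data frame and leaves `df` unchanged, which makes it easy to compare the organized result with the original data.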
#What is data science? === Data science is the study of data to extract meaningful insights for
business. It is a multidisciplinary approach that combines principles and practices from
the fields of mathematics, statistics, artificial intelligence, and computer engineering to
analyze large amounts of data. This analysis helps data scientists to ask and answer
questions like what happened, why it happened, what will happen, and what can be done
with the results.
#What is statistical Inference? === Statistical Inference is the process of using data analysis to deduce properties of
an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example
by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger
population.
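As a small sketch of estimation and hypothesis testing, the example below uses base R's `t.test()` on a sample. The measurements and the hypothesised population mean of 5 are assumptions chosen only for illustration.

```r
# Hypothetical sample of 8 measurements (made-up values)
x <- c(5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 4.8, 5.3)

# Point estimate of the population mean
est <- mean(x)

# One-sample t-test of the hypothesis that the population mean equals 5
res <- t.test(x, mu = 5)

print(est)
print(res$conf.int)   # 95% confidence interval for the population mean
print(res$p.value)    # p-value of the test
```

The confidence interval is an estimate about the population, derived only from the observed sample.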
#Define Machine learning. === Machine learning is a scientific discipline that is concerned with the design and
development of algorithms that allow computers to evolve behaviors based on empirical data, such as data from
sensors or databases.
#Define SVM. === A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating
hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal
hyperplane which categorizes new examples. In two-dimensional space, this hyperplane is a line dividing a plane into
two parts, with each class lying on either side.
#What is the use of histogram? === A histogram is used to plot a continuous variable. It breaks the data
into bins and shows the frequency distribution of these bins. We can always change the bin size and see the effect it
has on the visualization.
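The effect of bin size can be sketched with base R's `hist()`. The data values and the two sets of breaks below are assumptions picked for the example; `plot = FALSE` is used so that the bin counts can be inspected without drawing the chart.

```r
# Hypothetical continuous data (made-up values)
x <- c(2.1, 3.4, 3.9, 4.2, 5.0, 5.5, 6.1, 6.8, 7.3, 9.0)

# plot = FALSE returns the bin information without drawing the histogram
h_wide   <- hist(x, breaks = seq(2, 10, by = 4), plot = FALSE)  # 2 wide bins
h_narrow <- hist(x, breaks = seq(2, 10, by = 2), plot = FALSE)  # 4 narrower bins

print(h_wide$counts)     # frequency in each wide bin
print(h_narrow$counts)   # frequency in each narrow bin
```

Dropping `plot = FALSE` draws the histogram itself; the narrower bins show more detail of the same distribution.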
#What is data analysis? === Data analysis is defined as a process of cleaning, transforming, and modeling data to
discover useful information for business decision-making. The purpose of data analysis is to extract useful
information from data and to take decisions based on that analysis.
#Give advantages and disadvantages of Machine Learning. === Advantages == 1. It is used in a variety of applications
such as the banking and financial sector, healthcare, retail, publishing and social media, robot locomotion, game
playing etc. 2. It can handle multi-dimensional and multi-variety data in dynamic or uncertain
environments. 3. It allows time cycle reduction and efficient utilization of resources.
Disadvantages == 1. Acquisition of relevant data is the major challenge. Depending on the algorithm, data needs to be
processed before it is provided as input, and this has a significant impact on the results obtained.
2. It is impossible to make immediate accurate predictions with a machine learning system.
3. Machine learning needs a lot of training data for future prediction.
#State advantages and disadvantages of SVM. === Advantages == 1. It works really well with a clear margin of
separation. 2. It is effective in high-dimensional spaces. 3. It is effective in cases where the number of dimensions
is greater than the number of samples. 4. It uses a subset of training points in the decision function (called support
vectors), so it is also memory efficient.
Disadvantages == 1. It does not perform well when we have a large data set, because the required training time is
higher. 2. It also does not perform very well when the data set has more noise, i.e. target classes are overlapping.
3. SVM does not directly provide probability estimates; these are calculated using an expensive five-fold
cross-validation.
#Explain types of regression models. === (1) Simple Regression Analysis: 1) It is used to estimate the relationship
between a dependent variable and a single independent variable. 2) Regression models that involve one
explanatory variable are called Simple Regression. (2) Multiple Regression Analysis: 1) It is used to
estimate the relationship between a dependent variable and two or more independent variables. 2) When two
or more explanatory variables are involved, the relationships are called Multiple Regressions. 3) For example, the
relationship between the salaries of employees and their experience and education.
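The two model types can be sketched with base R's `lm()`, following the salary example above. The data frame `emp` and all of its values are hypothetical and exist only to make the sketch runnable.

```r
# Hypothetical employee data (values invented for illustration)
emp <- data.frame(
  experience = c(1, 3, 5, 7, 9, 11),        # years of experience
  education  = c(12, 16, 14, 16, 18, 16),   # years of schooling
  salary     = c(30, 42, 48, 58, 70, 74)    # salary in thousands
)

# Simple regression: one explanatory variable
simple_fit <- lm(salary ~ experience, data = emp)

# Multiple regression: two explanatory variables
multi_fit <- lm(salary ~ experience + education, data = emp)

print(coef(simple_fit))   # intercept and one slope
print(coef(multi_fit))    # intercept and two slopes
```

The simple model estimates one slope, while the multiple model estimates one coefficient per explanatory variable.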
#Explain Naive Bayes with the help of example. === 1) The Naïve Bayes algorithm is a Supervised Learning Algorithm.
2) It is a classification technique based on Bayes' Theorem with an assumption of independence among predictors.
3) In simple terms, a Naïve Bayes classifier assumes that the presence of a particular feature in a class is unrelated
to the presence of any other feature.
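A small hand-computed sketch of this idea is shown below. The weather data set is invented for illustration, and the calculation simply multiplies the class prior by the per-feature conditional probabilities, which is exactly where the independence assumption is used. No smoothing is applied, which a practical classifier would usually need.

```r
# Hypothetical training data (made up): two features and a class label
weather <- c("Sunny", "Sunny", "Rainy", "Sunny", "Rainy", "Overcast", "Overcast", "Rainy")
windy   <- c("No",    "Yes",   "No",    "No",    "No",    "No",       "Yes",      "Yes")
play    <- c("Yes",   "No",    "No",    "Yes",   "Yes",   "Yes",      "Yes",      "No")

# Score each class for the new observation (weather = Sunny, windy = No)
classes <- c("Yes", "No")
score <- sapply(classes, function(cl) {
  in_class  <- play == cl
  p_class   <- mean(in_class)                       # prior P(class)
  p_weather <- mean(weather[in_class] == "Sunny")   # P(weather = Sunny | class)
  p_windy   <- mean(windy[in_class] == "No")        # P(windy = No | class)
  # Naive assumption: features are independent given the class
  p_class * p_weather * p_windy
})

# Normalise the scores so that they sum to 1
posterior <- score / sum(score)
print(posterior)
```

The class with the larger posterior probability is the predicted class for the new observation.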
#What is data visualization? ===Data visualization is a technique used for the graphical representation of data. By
using elements like scatter plots, charts, graphs, histograms, maps, etc., we make our data more understandable.
Data visualization makes it easy to recognize patterns, trends, and exceptions in our data. It enables us to convey
information and results in a quick and visual way.
#Tools used in Big Data. === NoSQL Databases: MongoDB, CouchDB, Cassandra, Redis, BigTable, HBase, Hypertable,
Voldemort, Riak, ZooKeeper. MapReduce: Hadoop, Hive, Pig, Cascading, Cascalog, mrjob,
Caffeine, S4, MapR, Acunu, Flume, Kafka, Azkaban, Oozie, Greenplum. Storage: S3, Hadoop
Distributed File System. Servers: EC2, Google App Engine, Elastic Beanstalk, Heroku.
#What is the term ‘big data’? Ans= Big data refers to data that is so large, fast or complex that it's difficult or
impossible to process using traditional methods. The act of accessing and storing large amounts of information for
analytics has been around for a long time.
#What are the five V’s of Big Data? Ans= Big data is a collection of data from many different sources and is often
described by five characteristics: volume, value, variety, velocity, and veracity.
#What is big data analytics? Ans= Big data analytics is the process of collecting, examining, and analyzing large
amounts of data to discover market trends, insights, and patterns that can help companies make better business
decisions.
#What is digital data? Ans = Digital data is data that represents other forms of data using specific machine language
systems that can be interpreted by various technologies.
#Write advantages and disadvantages of big data. Ans= Advantages of Big Data: ➨Big data analysis derives
innovative solutions. ➨Big data analysis helps in understanding and targeting customers. ➨It helps in optimizing
business processes. ➨It helps in improving science and research. ➨It improves healthcare and public health through
the availability of patient records. ➨It helps in financial trading, sports, polling, security/law enforcement etc.
Disadvantages of Big Data: ➨Traditional storage can cost a lot of money to store big data. ➨Lots of big data is
unstructured. ➨Big data analysis can violate principles of privacy. ➨It can be used for manipulation of customer
records. ➨It may increase social stratification.
#Enlist phases of data analytics life cycle. Ans= 1: Discovery 2: Data Preparation 3: Model Planning 4: Model Building
5: Communicate Results 6: Operationalize
#What is data science? Ans= Data science is the study of data to extract meaningful insights for business. It is a
multidisciplinary approach that combines principles and practices from the fields of mathematics, statistics, artificial
intelligence, and computer engineering to analyze large amounts of data.
#Explain need of data analytics. Ans=The role of data analytics is to extract and catalogue data, so that organisations
can pinpoint and evaluate relationships, patterns and trends so they can glean insights and draw conclusions based
on the data and use these to make informed decisions.
#Explain life cycle of data analytics. Ans= The Data Analytics Lifecycle defines the roadmap of how data is generated,
collected, processed, used, and analyzed to achieve business goals. It offers a systematic way to manage data and
convert it into information that can be used to fulfill organizational and project goals.
#What is statistical modelling? Ans= A statistical model can provide intuitive visualizations that aid data scientists in
identifying relationships between variables and making predictions by applying statistical models to raw data.
Examples of common data sets for statistical analysis include census data, public health data, and social media data.
#Explain probability distribution modelling. Ans= A probability distribution gives the possible outcomes of a
random event together with their probabilities. It is defined on the underlying sample space, the set of all possible
outcomes of a random experiment. This set could be a set of real numbers, a set of vectors, or a set of any entities.
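As a concrete sketch, the distribution of the number of heads in three tosses of a fair coin can be tabulated with base R's `dbinom()`. The coin-toss experiment is an assumed example and does not come from these notes.

```r
# Sample space: number of heads in 3 tosses of a fair coin
outcomes <- 0:3

# Binomial probability of each outcome (3 tosses, probability of heads 0.5)
probs <- dbinom(outcomes, size = 3, prob = 0.5)

# Show the distribution as a table of outcome and probability
dist <- data.frame(heads = outcomes, probability = probs)
print(dist)

# The probabilities over the whole sample space sum to 1
print(sum(probs))
```

Every outcome in the sample space receives a probability, and together those probabilities add up to 1.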
#What is machine learning? Ans= Machine learning (ML) is a branch of artificial intelligence (AI) that enables
computers to “self-learn” from training data and improve over time, without being explicitly programmed. Machine
learning algorithms are able to detect patterns in data and learn from them, in order to make their own predictions.
#What is supervised machine learning? Ans= Supervised learning is a type of machine learning in which machines
are trained using well "labelled" training data, and on the basis of that data, machines predict the output. Labelled
data means that some input data is already tagged with the correct output.
#Short note: Naive Bayes. Ans= Naive Bayes is a probabilistic technique for constructing classifiers. The characteristic
assumption of the naive Bayes classifier is that the value of a particular feature is independent of the
value of any other feature, given the class variable.
#What is Regression analysis? Ans= Regression analysis is a set of statistical methods used for the estimation of
relationships between a dependent variable and one or more independent variables. It can be utilized to assess the
strength of the relationship between variables and for modeling the future relationship between them.
#Describe KNN in detail. Ans= The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric,
supervised learning classifier, which uses proximity to make classifications or predictions about the grouping of an
individual data point.
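The proximity idea can be sketched in a few lines of base R. The training points below are made up, and `knn_predict` is a hypothetical helper written for this illustration (Euclidean distance plus majority vote), not a function from any package.

```r
# Hypothetical labelled training points (made up): two features, two classes
train <- data.frame(
  x1    = c(1.0, 1.5, 2.0, 6.0, 6.5, 7.0),
  x2    = c(1.0, 2.0, 1.5, 6.0, 7.0, 6.5),
  label = c("A", "A", "A", "B", "B", "B")
)

# Hypothetical helper: classify one new point by majority vote of its k nearest neighbours
knn_predict <- function(train, new_point, k = 3) {
  # Euclidean distance from the new point to every training point
  d <- sqrt((train$x1 - new_point[1])^2 + (train$x2 - new_point[2])^2)
  # Labels of the k closest training points
  nearest <- as.character(train$label[order(d)[1:k]])
  # Most frequent label among those neighbours
  names(which.max(table(nearest)))
}

print(knn_predict(train, c(2, 2)))
print(knn_predict(train, c(6, 6.5)))
```

There is no training step beyond storing the data, which is why KNN is described as non-parametric: all the work happens when a new point is classified.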
#What is non-linear regression? Ans= Nonlinear regression is a form of regression analysis in which data is fit to a
model and then expressed as a mathematical function. Simple linear regression relates two variables (X and Y) with a
straight line (y = mx + b), while nonlinear regression relates the two variables in a nonlinear (curved) relationship.
#Short note linear regression. Ans= Linear regression analysis is used to predict the value of a variable based on the
value of another variable. The variable you want to predict is called the dependent variable. The variable you are
using to predict the other variable's value is called the independent variable.
#Short note: Apriori algorithm. Ans= Apriori is an algorithm for frequent item set mining and association rule
learning over relational databases. It proceeds by identifying the frequent individual items in the database and
extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database.
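The first two passes of this idea can be sketched in base R as follows. The transactions, the `support` helper and the 0.5 support threshold are all assumptions made for the illustration; a full Apriori implementation would continue to larger item sets and then derive association rules.

```r
# Hypothetical transactions (made up for illustration)
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("butter", "milk"),
  c("bread", "butter", "jam"),
  c("bread", "butter", "milk")
)
min_support <- 0.5   # assumed threshold: item set must appear in at least half the transactions

# Hypothetical helper: fraction of transactions containing every item of the item set
support <- function(itemset) {
  mean(sapply(transactions, function(t) all(itemset %in% t)))
}

# Pass 1: frequent individual items
items <- sort(unique(unlist(transactions)))
frequent_1 <- items[sapply(items, support) >= min_support]

# Pass 2: candidate pairs built only from frequent items, then filtered by support
candidates <- combn(frequent_1, 2, simplify = FALSE)
frequent_2 <- Filter(function(s) support(s) >= min_support, candidates)

print(frequent_1)
print(frequent_2)
```

Building candidates only from item sets that were already frequent is the pruning step that keeps the search manageable: an infrequent item such as "jam" never appears in a larger candidate.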
#What is WEKA? Ans= Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can
either be applied directly to a dataset or called from your own Java code.
#What is Pipe Operator? Ans= The pipe operator is a special operator available in the magrittr and
dplyr packages (it was originally developed in magrittr), which allows us to pass the result of one function as an
argument to the next one in sequence.
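The sketch below shows the same idea using base R's native pipe `|>` rather than magrittr's `%>%`, so that it runs without installing a package. This assumes R 4.1 or later; for simple chains like this one the two operators behave alike.

```r
# Without a pipe: nested calls are read inside-out
nested <- round(sqrt(sum(c(1, 4, 9, 16))), 2)

# With the native pipe (assumes R >= 4.1): the result of each step is passed
# as the first argument of the next call, so the steps read left to right
piped <- c(1, 4, 9, 16) |> sum() |> sqrt() |> round(2)

print(nested)
print(piped)
```

Both expressions compute the same value; the piped form lists the operations in the order in which they are applied.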
#What is a Histogram? Ans=
#What is Box Plot? Ans= When we display the data distribution in a standardized way using the five-number summary
(minimum, Q1 (first quartile), median, Q3 (third quartile), and maximum), it is called a Box plot.
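The five numbers can be computed directly in base R, as sketched below with made-up data. Note that `fivenum()` reports Tukey's hinges, which are close to, and for this data equal to, the quartiles Q1 and Q3.

```r
# Hypothetical data (made-up values)
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 12)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
print(five)

# boxplot.stats() returns the values a box plot would draw, without plotting
stats <- boxplot.stats(x)$stats
print(stats)
```

Calling `boxplot(x)` draws the plot itself: the box spans the hinges, the line inside the box marks the median, and the whiskers extend toward the minimum and maximum.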
#What is base R Graphics? Ans= In R, graphs are typically created interactively. Creating a new graph by issuing a
plotting command, such as plot() , hist() , boxplot() , among others, will typically overwrite a previous graph. In
addition one can specify fonts, colors, line styles, axes, reference lines, etc.
#Explain different types of data analytics. ===
#Explain the process of data analysis. ===
#Explain data frame with example. ===
#Explain functions included in the “dplyr” package. ===
#Advantages of big data. ===
#Advantages and disadvantages of EM algorithms. ===
#What is population? ===
#What are operators in R? ===
#Define array in R? ===
#Define sample. ===
#What is machine learning? ===
#Define data frame. ===
#Define market basket analysis. ===
#What is data analytics. ===

#Define head () and tail (). ===
#Enlist data types in R. ===
#Explain probability in details. ===
#Explain the types of analytics. ===
#Explain correlation with its type. ===
#Explain the application of big data. ===
#How Naïve bayes algorithm works. ===
#Explain decision tree with example. ===
#Explain support vector machine with example. ===
#Explain digital data with its types. ===
#Explain association rule mining. ===
#Data manipulation function. ===
#Loops in R. ===
#Write an R program to find out whether a number is positive or negative. ===
#Write an R program to sort a vector in ascending and descending order. ===
#Write an R program to print Multiplication table of 2. ===
#Write an R program to check whether a number is an Armstrong number or not. ===
