KIT-601
STREAM COMPUTING
• Stream computing is a way to analyze and process Big Data in real time to gain current
insights, so that appropriate decisions can be made or new trends predicted in the immediate future.
• Stream computing is a computing paradigm that reads data from collections of software or
• Stream computing uses software programs that compute continuous data streams.
• Stream computing is one effective way to support Big Data by providing extremely low-
• It is increasingly regarded as one of the fastest and most efficient ways to obtain useful knowledge from Big Data.
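As a minimal sketch of the idea of computing over a continuous stream, the Python snippet below updates a running mean each time a new reading arrives, rather than waiting for the full data set. The names `running_mean` and `sensor_readings` are hypothetical, and the fixed list of readings merely stands in for a live source; real stream-computing platforms are far more elaborate.

```python
from typing import Iterable, Iterator


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all readings seen so far, one value per reading."""
    count = 0
    total = 0.0
    for reading in stream:
        count += 1
        total += reading
        # An up-to-date result is available after every element,
        # without storing the whole stream.
        yield total / count


def sensor_readings() -> Iterator[float]:
    """Hypothetical stand-in for a live data source (assumption)."""
    for value in [20.0, 22.0, 21.0, 25.0]:
        yield value


means = list(running_mean(sensor_readings()))
```

Because the function is a generator, each result is produced as soon as its input arrives, which is the property that makes low-latency analysis possible.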
SAMPLING
guarantee that the sample is representative of the entire population, the target population must be
2. Selecting a sampling technique. The next step is to choose the best sampling method based on the
research question and the characteristics of the population under study. There are several methods for
drawing samples from data such as simple random sampling, cluster sampling, stratified sampling and
systematic sampling.
3. Determining the sample size. The optimum sample size required to produce accurate and reliable results
should be decided in this phase. This decision may be influenced by certain factors, such as money, time
constraints and the requirement for greater precision. The sample size should be large enough to be
representative of the population, but not so large that it becomes impractical to work with.
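Of the techniques named in step 2, simple random sampling is the most direct to illustrate. The sketch below draws a sample of a chosen size without replacement using Python's standard `random` module; the helper name `simple_random_sample`, the population of 100 numbered units and the sample size of 10 are assumptions made purely for illustration.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct units, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, sample_size)


# Hypothetical population: units numbered 1 to 100.
population = list(range(1, 101))
sample = simple_random_sample(population, sample_size=10, seed=42)
```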
DATA SAMPLING PROCESS
4. Collecting the data. The data is collected from the sample using the
5. Analyzing the sample data. After collecting the data sample, it's processed
and analyzed to draw conclusions about the population. The results of the
• In systematic sampling, every nth unit of the population is taken, that is, units are
selected at a regular interval. The starting point is selected randomly and, after that,
every nth element is selected. In the figure below, n=3, so every 3rd element is
selected.
• Say the population size is x and the required sample size is m. The interval is then
n = x/m, so each selected individual is x/m positions after the previous one.
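The procedure above can be sketched in Python as follows. The interval is taken as the population size divided by the sample size (rounded down), and the starting point is chosen at random within the first interval. The function name `systematic_sample` and the example population are assumptions for illustration only.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Pick a random start, then take every interval-th unit after it."""
    interval = len(population) // sample_size  # population size / sample size
    if interval < 1:
        raise ValueError("sample_size cannot exceed the population size")
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start within the first interval
    return [population[start + i * interval] for i in range(sample_size)]


# Hypothetical population of 12 units and a sample of 4, giving an interval of 3.
population = list(range(12))
sample = systematic_sample(population, sample_size=4, seed=1)
```

With 12 units and a sample of 4, consecutive selected units are always 3 positions apart, whichever starting point is drawn.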
STRATIFIED SAMPLING
In this type of sampling, we divide the population into subgroups (called
strata) based on traits such as gender or category. We then select
the sample(s) from each of these subgroups.
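A minimal sketch of this idea in Python: units are grouped into strata by a trait, and a random sample is then drawn from every stratum. The function name `stratified_sample`, the fixed number of units per stratum and the example records are assumptions for illustration; in practice the number drawn per stratum is often made proportional to the stratum's size.

```python
import random
from collections import defaultdict


def stratified_sample(units, key, per_stratum, seed=None):
    """Group units into strata by key, then sample from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[key(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Never ask for more units than the stratum contains.
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical records: ten people with alternating gender labels.
people = [{"name": f"p{i}", "gender": "F" if i % 2 else "M"} for i in range(10)]
sample = stratified_sample(people, key=lambda p: p["gender"], per_stratum=2, seed=0)
```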
CLUSTER SAMPLING
• In this type of sampling, the whole population is divided into groups
called clusters. Units with similar characteristics are kept in one cluster. For example,
people can be grouped according to their age or country.
• The researcher then randomly picks some of these clusters (according to
requirements and resources) and carries out the research on the units in the
chosen clusters.
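The selection step described above, where whole groups are picked at random and every unit in a chosen group is studied, might be sketched in Python as follows. The function name `cluster_sample` and the country-labelled example records are assumptions for illustration.

```python
import random
from collections import defaultdict


def cluster_sample(units, key, num_clusters, seed=None):
    """Group units by key, pick whole groups at random, keep all their units."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for unit in units:
        clusters[key(unit)].append(unit)
    # Sort the labels so the draw is repeatable for a given seed.
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [unit for label in chosen for unit in clusters[label]]


# Hypothetical records: six people grouped by country.
countries = ["IN", "IN", "US", "US", "UK", "UK"]
people = [{"name": f"p{i}", "country": c} for i, c in enumerate(countries)]
sample = cluster_sample(people, key=lambda p: p["country"], num_clusters=2, seed=0)
```

Unlike the stratified approach, only the chosen groups contribute to the sample, and they contribute all of their units.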
NON-PROBABILITY SAMPLING
1. AVAILABILITY SAMPLING: Also known as convenience sampling, this occurs
when the researcher selects samples based on availability. For example, a
student researching how many college students use the canteen for lunch
might survey only their own college and nearby colleges.
The researcher selects this technique when they feel that other sampling