
STREAM SAMPLING

KIT-601
STREAM COMPUTING
• Stream computing is a way to analyze and process Big Data in real time, yielding current insights that support timely decisions and predictions of trends in the immediate future.

• Stream computing is a computing paradigm that reads data from collections of software or hardware sensors in stream form and computes over continuous data streams.

• Stream computing uses software programs that compute continuous data streams.

• Stream computing is one effective way to support Big Data, providing extremely low latency through massively parallel processing architectures.

• It is becoming one of the fastest and most efficient ways to obtain useful knowledge from Big Data.
SAMPLING

Sampling is a method that allows us to get information about a population based on the statistics from a subset of the population (the sample), without having to investigate every individual.
STREAM SAMPLING
Sampling is the practice of selecting a subset of a population in order to study the whole population.

Stream sampling is the process of collecting a representative sample of the elements of a data stream. The sample is usually much smaller than the entire stream, but it can be designed to retain many important characteristics of the stream and can be used to estimate many important aggregates on the stream. Every sampling type comes under two broad categories:
• Probability sampling - Random selection techniques are used to select the sample.
• Non-probability sampling - Non-random selection techniques based on certain criteria are used to select the sample.
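One common way to draw a probability sample from a stream of unknown length is reservoir sampling. The slides do not name this technique, so the sketch below is an illustrative assumption rather than the course's prescribed method; the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable.

    The stream is read once and only k items are kept in memory,
    which suits data streams whose total length is not known in advance.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

# Usage: sample 5 readings from a simulated sensor stream.
readings = (x * x for x in range(10_000))
print(reservoir_sample(readings, k=5, rng=random.Random(42)))
```

Each element of the stream ends up in the sample with the same probability, so aggregates computed on the reservoir can serve as estimates for the whole stream.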
DATA SAMPLING PROCESS
1. Defining the population. The population is the entire set of data from which the sample is drawn. To guarantee that the sample is representative of the entire population, the target population must be precisely defined, including all essential traits and criteria.
2. Selecting a sampling technique. The next step is to choose the best sampling method based on the research question and the characteristics of the population under study. There are several methods for drawing samples from data, such as simple random sampling, cluster sampling, stratified sampling and systematic sampling.
3. Determining the sample size. The optimum sample size required to produce accurate and reliable results should be decided in this phase. This decision may be influenced by factors such as budget, time constraints and the requirement for greater precision. The sample size should be large enough to be representative of the population, but not so large that it becomes impractical to work with.
4. Collecting the data. The sample is drawn using the sampling technique that was chosen, and data is then collected from it through methods such as interviews, surveys or observations. Selection may be random or may follow other stated criteria, depending on the research question. For example, in random sampling, data points are selected at random from the population.
5. Analyzing the sample data. After the data sample is collected, it is processed and analyzed to draw conclusions about the population. The results of the analysis are then generalized or applied to the entire population.
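The five steps can be sketched end to end in a few lines. This is a minimal illustration, assuming a synthetic population of normally distributed values and a simple random sample; the population, the sample size of 400 and the helper `estimate_mean` are all hypothetical choices, not taken from the slides.

```python
import random
import statistics

def estimate_mean(sample):
    """Return (sample mean, standard error of the mean)."""
    mean = statistics.fmean(sample)
    # Standard error: sample standard deviation divided by sqrt(sample size).
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return mean, se

rng = random.Random(7)

# Step 1: define the population (synthetic data, for illustration only).
population = [rng.gauss(50, 10) for _ in range(100_000)]

# Steps 2-4: choose simple random sampling, fix the sample size, draw the sample.
sample = rng.sample(population, 400)

# Step 5: analyze the sample and generalize to the population.
mean, se = estimate_mean(sample)
print(f"estimated population mean: {mean:.2f} +/- {1.96 * se:.2f}")
```

The interval printed at the end is an approximate 95% confidence interval, which is one way to express how far the sample estimate may be from the true population value.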


TYPES OF SAMPLING TECHNIQUES
• PROBABILITY SAMPLING: In probability sampling, elements are selected at random and every element of the population has a known, non-zero chance of being selected (an equal chance in the case of simple random sampling). Probability sampling gives us the best chance of creating a sample that is truly representative of the population.

• NON-PROBABILITY SAMPLING: In non-probability sampling, elements are selected by non-random criteria, so not all elements have an equal chance of being selected. Consequently, there is a significant risk of ending up with a non-representative sample that does not produce generalizable results.
SIMPLE RANDOM SAMPLING
This is a sampling technique you have probably come across at some point. Here, every individual is chosen entirely by chance, and each member of the population has an equal chance of being selected.

This can be done by putting chits in a bowl, as in a lottery, or by spinning a wheel. The advantage of simple random sampling is that it is easy, cost-efficient and reliable, and it tends to represent the whole population.
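In code, the lottery-style draw described above maps onto the standard library's `random.sample`, which selects without replacement and gives every member the same chance. The wrapper name `simple_random_sample` and the example population are hypothetical.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members, each with an equal chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(list(population), k)

# Usage: pick 5 students out of 30 by roll number.
roll_numbers = range(1, 31)
print(simple_random_sample(roll_numbers, k=5, seed=1))
```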
SYSTEMATIC SAMPLING
• In this type of sampling, the first individual is selected randomly and the others are selected using a fixed ‘sampling interval’.

• In systematic sampling, every nth unit of the population is taken. That means elements are selected from the population at a regular interval. The starting point is selected randomly and, after that, every nth element is selected. In the figure below, n = 3, so every 3rd element is selected.

• The interval follows from the sizes involved. Say the population size is x and we have to select a sample of size n (here n denotes the sample size, not the interval). Then the sampling interval is x/n, so each selected individual is x/n positions after the previous one.
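The interval rule can be sketched as follows, using x for the population size and n for the sample size as in the last bullet. The function name `systematic_sample` is hypothetical, and rounding the interval down with integer division is an assumption made so the code also works when x is not a multiple of n.

```python
import random

def systematic_sample(population, n, rng=None):
    """Select n items: a random start, then one item every x // n positions."""
    rng = rng or random.Random()
    x = len(population)
    if not 0 < n <= x:
        raise ValueError("sample size must be between 1 and the population size")
    interval = x // n                # sampling interval, rounded down
    start = rng.randrange(interval)  # random start within the first interval
    return [population[start + i * interval] for i in range(n)]

# Usage: 100 units, sample of 10, so every 10th unit is taken.
units = list(range(100))
print(systematic_sample(units, n=10, rng=random.Random(3)))
```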
STRATIFIED SAMPLING
In this type of sampling, we divide the population into subgroups (called strata) based on traits such as gender or category, and then select the sample from each of these subgroups.

Example: Suppose we want to collect reviews of a book across a country. We can divide the population into non-overlapping age groups such as 18-25 years, 26-35 years, 36-45 years, 46-55 years and 56-65 years. Each age group is one stratum. Then a particular number of members is selected from each age group to review the book. These members form the final sample.
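The book-review example can be sketched as below, assuming non-overlapping age bands and the same number of readers drawn from each band. The names `stratified_sample` and `age_group`, and the exact band boundaries, are hypothetical.

```python
import random
from collections import defaultdict

AGE_BANDS = [(18, 25), (26, 35), (36, 45), (46, 55), (56, 65)]

def age_group(age):
    """Map an age to its stratum label, e.g. 30 -> '26-35'."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "other"

def stratified_sample(population, stratum_of, per_stratum, rng=None):
    """Randomly select up to per_stratum members from every stratum."""
    rng = rng or random.Random()
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # A stratum smaller than per_stratum contributes all of its members.
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample

# Usage: 500 readers with random ages, 3 reviewers per age group.
rng = random.Random(5)
readers = [rng.randint(18, 65) for _ in range(500)]
print(sorted(stratified_sample(readers, age_group, per_stratum=3, rng=rng)))
```

Drawing a fixed number per stratum is only one option; drawing in proportion to each stratum's size is a common alternative.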
CLUSTER SAMPLING
In a cluster sample, we use subgroups of the population as the sampling unit rather than individuals. The population is divided into subgroups, known as clusters, and whole clusters are randomly selected to be included in the study.

• In this type of sampling, the whole population is divided into groups or clusters. Units that share a characteristic are kept in one cluster. For example, people can be grouped according to their age or country.

• Clusters should not be confused with strata: in stratified sampling every stratum contributes members to the sample, whereas here the researcher randomly picks only some clusters (according to the requirements and resources) and performs the research on those.
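A minimal sketch of selecting whole clusters at random is shown below. The function name `cluster_sample` and the example clusters are hypothetical; every unit inside a chosen cluster is included in the study.

```python
import random

def cluster_sample(clusters, n_clusters, rng=None):
    """Randomly choose n_clusters whole clusters.

    clusters maps a cluster name to the list of units it contains.
    """
    rng = rng or random.Random()
    # Sample cluster names; sorting first makes the draw repeatable for a given seed.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Usage: people grouped by country, two countries studied in full.
people_by_country = {
    "India": ["Asha", "Ravi", "Meena"],
    "Japan": ["Yuki", "Ken"],
    "Brazil": ["Ana", "Luis", "Rafa"],
    "Kenya": ["Amina", "Otieno"],
}
print(cluster_sample(people_by_country, n_clusters=2, rng=random.Random(2)))
```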
NON-PROBABILITY SAMPLING
1. AVAILABILITY SAMPLING: This is also known as convenience sampling. It occurs when the researcher selects samples based on availability. For example, if a student wants to research how many college students use the canteen for lunch, they will survey their own college and nearby colleges.
2. JUDGMENTAL SAMPLING: This is also called purposive sampling. Here, samples are selected on the basis of the researcher's own knowledge, experience and intuition. A researcher chooses this technique when other sampling techniques seem too time-consuming and they are confident in their own knowledge.


3. QUOTA SAMPLING: In this type of sampling, the researcher divides the population into quotas according to some characteristic and selects members from each quota.
4. SNOWBALL SAMPLING: This is also known as chain-referral sampling. Here, referrals from existing sample members are used to recruit further members.
• For example, if a person is surveying a rare disease and knows only a few patients, they can ask those patients for the contacts of other patients. In this way, using snowball sampling, researchers can get in touch with hard-to-contact sufferers.
