
ASSIGNMENT 1 FRONT SHEET

Qualification BTEC Level 5 HND Diploma in Computing

Unit number and title Unit 06: Planning a computing project

Submission date Date Received 1st submission

Re-submission Date Date Received 2nd submission

Student Name LAM XUAN PHUONG NAM Student ID BH00902

Class Assessor name NGUYEN TRONG HUNG

Student declaration

I certify that the assignment submission is entirely my own work and I fully understand the consequences of plagiarism. I
understand that making a false declaration is a form of malpractice.

Student’s signature NAM

Grading grid

P1 P2 P3 P4 M1 M2 D1
❒ Summative Feedback: ❒ Resubmission Feedback:

Grade: Assessor Signature: Date:

IV Signature:
Table of Contents
A. INTRODUCTION
B. BODY
I. P1 Demonstrate qualitative and quantitative research methods to generate relevant primary data for an identified theme
1. What Is a Primary Source? (primary source, 2023)
2. Primary sources in Big Data
II. P2 Examine secondary sources to collect relevant secondary data and information for an identified theme
1. What Is a Secondary Source? (secondary source, 2023)
2. Secondary sources in Big Data
III. P3 Discuss the features and operational areas of a business in an identified sector
IV. P4 Discuss the role of stakeholders and their impact on the success of a business
C. Conclusion
References

Figure 1: Big Data
Figure 2: The 5Vs
Figure 3: Volume
Figure 4: Velocity
Figure 5: Variety
Figure 6: Veracity
Figure 7: Value
Figure 8: Primary source
Figure 9: Secondary source
Figure 10: Hadoop Distributed File System (HDFS)
Figure 11: NoSQL databases
Figure 12: Apache Spark
Figure 13: Apache Flink
Figure 14: Apache Storm
Figure 15: Elasticsearch
Figure 16: Apache Solr
Figure 17: Tableau
Figure 18: Power BI
Figure 19: TensorFlow
Figure 20: scikit-learn
Figure 21: Data encryption
Figure 22: Network security
A. INTRODUCTION
Big Data is a term for data sets so large and complex that they are difficult to process using traditional methods. Businesses analyze this huge amount of data and convert it into actionable information to solve business problems.

Big Data took shape in the 1980s and 1990s. In 1984, Teradata Corporation brought the DBC 1012 parallel data processing system to market, and Teradata's systems were among the first to store and analyze up to 1 terabyte of data, in 1992. Hard drives reached 2.5 GB of capacity in 1991.

In 2000, Seisint Inc. (now LexisNexis Corporation) developed a C++-based distributed file-sharing framework for storing and querying data. The system stores and distributes structured, semi-structured, and unstructured data across multiple servers. In 2004, Google published its paper on MapReduce, providing a parallel processing model for handling huge amounts of data, and related implementations followed.

In 2005, many businesses began to realize how much data their users were generating through YouTube, Facebook, and other online services. That same year, Hadoop (an open-source framework created specifically for storing and analyzing Big Data) was developed, and NoSQL databases also began to gain popularity. Frameworks such as Hadoop (and, more recently, Spark) have been essential to the growth of Big Data, making it easier to operate on and cheaper to store.

Today, thanks to the Internet of Things, the volume of Big Data is growing ever larger, and at extremely high speed. This is because data is no longer created only by humans but is also generated automatically by machines. Big Data has become a valuable resource for businesses, especially e-commerce businesses, helping them increase their competitive advantage and serve customers better.

Figure 1: Big Data
5 characteristics of Big Data (5V, 2023)

Big Data used to be defined by three Vs, but five are now commonly cited; these are also called the characteristics of Big Data: Volume, Velocity, Variety, Veracity, and Value. Each is described below.
Figure 2: The 5Vs
Volume

The name Big Data itself is associated with very large size: Volume refers to the sheer amount of data.

The size of a data set plays a central role in determining its value; whether particular data actually counts as Big Data depends on its volume. Volume is therefore a key characteristic to consider when working with Big Data.

For example, in 2016, global mobile traffic was estimated at 6.2 exabytes (6.2 billion GB) per month, and by 2020 total data was expected to reach nearly 40,000 exabytes.
Figure 3: Volume

Velocity

Velocity refers to the high speed at which data accumulates.

In Big Data, data streams in at high speed from sources such as machines, networks, social media, and mobile phones.

This large and continuous flow of data determines its potential: the rate at which data is generated and processed must keep up with demand.

Data sampling can help manage the velocity problem.

For example, Google handles over 3.5 billion searches per day, and Facebook's user base grows by about 22% annually.
Figure 4: Velocity
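The data sampling mentioned above can be sketched as reservoir sampling, which maintains a fixed-size uniform random sample over a stream of any length and speed (a minimal illustration, not taken from any specific Big Data tool):

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of size k from a stream of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            # Replace an existing element with decreasing probability k/(i+1)
            j = random.randint(0, i)
            if j < k:
                sample[j] = item
    return sample

# Sample 5 events from a simulated high-velocity stream of 100,000 events
print(reservoir_sample(range(100_000), 5))
```

Because the sample never grows beyond k items, memory stays constant no matter how fast the stream arrives.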

Variety

Variety refers to the nature of data: structured, semi-structured, and unstructured. It also covers heterogeneous sources.

Variety is essentially the emergence of data from new sources, both inside and outside the business, in structured, semi-structured, and unstructured forms.

Structured data: organized data with a defined length and format, such as rows in a relational database.

Semi-structured data: partially organized data that does not conform to a formal schema. Log files are a typical example.

Unstructured data: unorganized data that does not fit the traditional row-and-column structure of a database. Text, images, and videos are examples of unstructured data that cannot be stored in rows and columns.

Figure 5: Variety
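The three forms of data can be illustrated with small Python values (the sample records are invented for illustration):

```python
import json

# Structured: fixed fields with defined types, as in a database row
structured = {"order_id": 1001, "customer": "Alice", "amount": 59.90}

# Semi-structured: self-describing but schema-flexible, e.g. a JSON log entry
log_line = '{"ts": "2023-05-01T10:00:00", "level": "ERROR", "extra": {"retries": 3}}'
semi_structured = json.loads(log_line)

# Unstructured: free text with no predefined fields; must be parsed or
# analyzed before any meaning can be extracted
unstructured = "Customer emailed to say the delivery arrived two days late."

print(structured["amount"])
print(semi_structured["level"])
print(len(unstructured.split()))
```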

Veracity

Veracity refers to inconsistency and uncertainty in data: available data can be messy, and its quality and accuracy are difficult to control.

Big Data is also subject to change because of the many data dimensions generated by different data sources and types.

For example, large volumes of data can create confusion, while small amounts may convey only partial or incomplete information.
Figure 6: Veracity

Value

Data that is mostly valueless will not benefit a company unless it is turned into something useful: by itself, data has no value or importance until information is extracted from it.
Figure 7: Value
B. BODY

I. P1 Demonstrate qualitative and quantitative research methods to generate relevant primary data for an identified theme.

1. What Is a Primary Source? (primary source, 2023)


A primary source is an original material created at the time a historical event occurs, or soon afterward, and can be original
documents, creative works, material published in modern times, institutional and government documents, or relics and artifacts.
Authors citing primary sources relay the subjective interpretation of a witness to an event, which allows historians to use the materials
to interpret and analyze the past.
Diaries, letters, memoirs, personal journals, speeches, manuscripts, direct interviews, and other unpublished works can be primary
sources and typically serve as the main objects of an analysis or research work. Published pieces, including newspaper or magazine
articles, photographs, audio or video recordings, research reports in the natural or social sciences, or original literary or theatrical
works are all considered primary sources.
An example of how a primary source is used includes the collection of research associated with the spread of a particular disease and
the use of source material that may include medical statistical data, interviews with medical experts and patients, and laboratory
results. In cases of research related to historical events, an author may not be able to access direct evidence because the people
associated with the event may no longer be alive, but sources produced by witnesses at that time may be used. This includes
photographs, video footage, letters, diary entries, and newspaper reports at the time of the event.
Figure 8:Primary

2. Primary sources in Big Data


In a Big Data context, primary sources are the large, complex, original data sets that traditional data processing applications cannot handle. Working with them involves collection, storage, transmission, monitoring, search, sharing, visualization, querying, analysis, and privacy protection, and it helps uncover new correlations in business, health, crime prevention, and many other fields. ETL (Extract, Transform, Load) tools play an important role in extracting and transforming data from primary sources for use in different systems.
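A minimal sketch of the Extract, Transform, Load idea in plain Python (the CSV fields and data are invented for illustration):

```python
import csv
import io

raw = """name,age,city
Alice, 30 ,Hanoi
Bob,,Da Nang
"""

def extract(text):
    """Extract: read raw CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop incomplete rows, normalize whitespace and types."""
    cleaned = []
    for row in rows:
        if all(v.strip() for v in row.values()):
            cleaned.append({"name": row["name"].strip(),
                            "age": int(row["age"]),
                            "city": row["city"].strip()})
    return cleaned

def load(rows, target):
    """Load: write the cleaned rows to the target (a stand-in for a warehouse)."""
    target.extend(rows)

warehouse = []
load(transform(extract(raw)), warehouse)
print(warehouse)
```

Bob's row is dropped because his age is missing; Alice's row is normalized before loading.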

• Industrial sensors, weather sensors, and environmental sensors are examples of IoT devices that provide data through IoT sensors.

• LinkedIn, Facebook, Twitter, and Instagram are all sites that generate data from user activities - this is commonly referred to as
social media data.

• Online applications and websites: interaction and user-access data gleaned from online applications and websites.
• Business information: transaction records, customer data, and other enterprise database entries.

• Measurement tools and scientific instruments: Gathering information from fields like physical sciences, medicine and other areas of
study.

• Online transactions, surveys, and tests are some of the direct sources for collecting data.

II. P2 Examine secondary sources to collect relevant secondary data and information for an identified theme.

1. What Is a Secondary Source? (secondary source, 2023)


Secondary sources are created by someone who did not experience firsthand or participate in the events or conditions being
researched. Secondary sources are used to interpret and analyze primary sources. These sources are one or more steps removed from
the event and may contain pictures, quotes, or graphics of primary sources. They are used to interpret, assign value to, conjecture
upon, and draw conclusions about the events reported in primary sources. Textbooks, edited works, books, and articles that interpret or
review research works, histories, biographies, literary criticism and interpretation, reviews of law and legislation, political analyses,
and commentaries are all examples of secondary sources.
Authors of research studies cite secondary sources to support arguments, formulate new theories, or argue against existing information
in the field. Using secondary sources, researchers reinforce theories or arguments based on primary sources.
Figure 9:Secondary source

2. Secondary sources in Big Data


• Academic documents, market reports, and scientific studies are all fair game when it comes to mining data for research purposes.
References and published research materials also become valuable resources for information.

• Examples of enterprise data available include financial and customer data gathered from enterprise systems, such as CRM and
financial systems, capturing transactional information.

• Data from various sources such as social networks, media applications, and websites are compiled to form auxiliary data. This may
involve scrutinizing data extracted from a social network, data from media endeavours, and data from websites.

• Financial and statistical data: economic indicators from government sources, financial institutions, and other monetary sources, providing transaction records, income figures, and cost data.
• Information from surveys, experiments, and research projects is frequently used as a source of secondary data. These sources may be scientific research projects or social-statistics research.

• Secondary data is also commonly derived from online communities and open-source ventures, such as data from online forums and open-source projects.

III. P3 Discuss the features and operational areas of a business in an identified sector.
 Technology plays an important role in the development of Big Data. Here are some technologies that contribute to this process:
 Storage technology: systems such as the Hadoop Distributed File System (HDFS) and NoSQL databases allow large amounts of data to be stored and managed.

Figure 10: Hadoop Distributed File System (HDFS)

Figure 11: NoSQL databases
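The core idea of a NoSQL document store, schema-flexible records addressed by a key, can be sketched with an in-memory toy (not a real database client):

```python
# Toy document store: records are schema-free dicts addressed by a key,
# unlike rows that must all fit one fixed relational schema.
store = {}

def put(doc_id, doc):
    """Insert or overwrite a document under the given key."""
    store[doc_id] = doc

def get(doc_id):
    """Fetch a document by key, or None if absent."""
    return store.get(doc_id)

put("u1", {"name": "Alice", "tags": ["admin"]})
put("u2", {"name": "Bob", "last_login": "2023-05-01"})  # different fields are fine

print(get("u1")["name"])
```

Real systems add persistence, replication, and query languages on top of this key-to-document model.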

 Data processing and analysis technology: Technologies such as Apache Spark, Apache Flink and Apache Storm allow
Big Data to be processed and analyzed quickly and effectively.
Figure 12: Apache Spark

Figure 13: Apache Flink

Figure 14: Apache Storm
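The parallel processing model these engines build on can be illustrated with a pure-Python map/reduce word count (the real Spark, Flink, and Storm APIs differ; this only shows the idea):

```python
from collections import Counter
from functools import reduce

lines = ["big data needs big tools", "data tools scale"]

# Map: turn each line into per-line word counts.
# In a real engine, this step runs in parallel across many machines.
mapped = [Counter(line.split()) for line in lines]

# Reduce: merge the partial counts into one result.
totals = reduce(lambda a, b: a + b, mapped, Counter())

print(totals["big"], totals["data"])
```

Because each mapped partial result is independent, the work can be split across a cluster and merged at the end.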

 Query and search technology: Elasticsearch and Apache Solr allow Big Data to be searched and queried quickly and accurately.
Figure 15: Elasticsearch

Figure 16: Apache Solr
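Search engines like Elasticsearch and Apache Solr are built around an inverted index, which maps each term to the documents containing it; a minimal sketch:

```python
from collections import defaultdict

docs = {1: "big data storage", 2: "fast data search", 3: "search engines"}

# Build the inverted index: term -> set of document ids containing it
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(*terms):
    """Return ids of documents containing all of the given terms."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

print(sorted(search("data", "search")))  # → [2]
```

Looking a term up in the index is fast because the documents were scanned once at indexing time, not at query time.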

 Data visualization technology: Technologies like Tableau and Power BI help represent Big Data in an intuitive and
easy-to-understand way.
Figure 17: Tableau

Figure 18: Power BI

 Machine learning and AI technology: libraries like TensorFlow and scikit-learn help analyze Big Data and build machine learning and artificial intelligence models.
Figure 19: TensorFlow

Figure 20: scikit-learn
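The kind of model such libraries fit can be shown with a tiny ordinary-least-squares linear regression in plain Python (scikit-learn's LinearRegression does the same at scale, among much else; the data points are invented):

```python
# Fit y = a*x + b by ordinary least squares on a toy data set
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]   # exactly y = 2x + 1

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: covariance of x and y divided by variance of x
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

print(a, b)          # fitted slope and intercept
print(a * 5 + b)     # prediction for x = 5
```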

 Security technology: data encryption, network security monitoring, and access control technologies help protect Big Data from threats and intrusions.
Figure 21: Data encryption

Figure 22: Network security
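The idea behind symmetric data encryption, where one key both encrypts and decrypts, can be shown with a toy XOR cipher (for illustration only; it is not secure, and real systems use vetted algorithms such as AES):

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR each byte with a repeating key. NOT secure."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"secret-key"
ciphertext = xor_cipher(b"customer record 42", key)
plaintext = xor_cipher(ciphertext, key)   # applying the same key again decrypts

print(plaintext)
```

Because XOR is its own inverse, the same function both encrypts and decrypts, which is exactly the symmetry that real symmetric ciphers also exhibit.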


 Beyond the technologies above, many others, such as IoT (Internet of Things), blockchain, and cloud computing, also contribute to the development of Big Data. Together, these technologies form an increasingly strong Big Data technology ecosystem.

IV. P4 Discuss the role of stakeholders and their impact on the success of a business.
 There are some common difficulties and solutions in the Big Data field. Here are some examples:
 Handling big data

Difficulty: Handling big data - Big data is defined as data that is large in size and grows rapidly. This creates challenges in
storing, processing and analyzing data.

Solution: Use distributed technology and parallel data storage to process big data. Big Data tools and systems like Hadoop and
Spark are developed to process and analyze big data efficiently.

 Data reliability and quality

Difficulty: Data reliability and quality - Big data can include heterogeneous and uncertain data sources. This raises issues
about the reliability and quality of the data.

Solution: Develop a data auditing process to eliminate inaccurate and incomplete data, and apply specialized algorithms and tools to clean and preprocess data before it is analyzed.
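The auditing and cleaning step described above can be sketched as a simple filter (the field names and plausibility rules are invented for illustration):

```python
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 29},              # incomplete: email missing
    {"id": 3, "email": "c@example.com", "age": -5}, # implausible: negative age
]

def clean(rows):
    """Keep only records with all fields present and a plausible age."""
    return [r for r in rows
            if all(v not in ("", None) for v in r.values())
            and 0 <= r["age"] <= 120]

print(clean(records))
```

Only the first record survives: one is dropped for incompleteness, the other for failing a plausibility check.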

 Security and privacy

Difficulty: Security and privacy - Big data contains a lot of sensitive and personal information. Protecting data and ensuring
privacy is an important issue in this sector.

Solution: Apply data security measures such as data encryption, access management, and authentication technology to ensure
data safety. Legal regulations and privacy policies should also be complied with.

 Understanding and mining data

Difficulty: Understanding and mining data - Big data contains large amounts of information, and extracting knowledge from it can require an understanding of complex and varied data analysis tools and techniques.
Solution: Train and prepare human resources with expertise in data analysis. Use advanced data analysis tools and techniques such as machine learning and data mining to extract knowledge from data.

 The challenges and solutions in Big Data can vary depending on the context and project scale. However, understanding and
dealing with these challenges is necessary to successfully use Big Data.

C. Conclusion
In the age of digitalization and the proliferation of information, Big Data has emerged as a powerful force impacting the economy and society. This report offers comprehensive insights into the promising prospects, challenges, applications, and origins of Big Data.

Utilizing Big Data is more than just processing vast datasets: it involves discovering, analyzing, and strategically using the information within data to generate value. Many businesses and organizations have harnessed Big Data to foster innovation, enhance operational efficiency, and create novel solutions to intricate problems.

Our capacity to comprehend and dissect data is constantly expanding with the rise of artificial intelligence and machine learning, and the potential of Big Data is equally remarkable. Its influence is set to grow even further, spanning domains such as healthcare, education, and the management of smart cities.

Maximizing the societal benefits of Big Data necessitates both a properly trained workforce and appropriate supporting policies. It is
crucial to equip individuals with the necessary skills through education and training to access and utilize data effectively. Creating a
favorable business environment through policies can aid entities using Big Data.

Big Data holds the power to drive modern-world development. It provides us with the opportunity to unlock a brighter future if we can
effectively harness its benefits. However, this responsibility requires us to act collectively and with a sense of awareness, ensuring that
Big Data is used in appropriate and advantageous ways.
References
5V, 2023. [Online] Available at: http://iottuonglai.com/5-dac-trung-cua-big-data.html

primary source, 2023. [Online] Available at: https://www.wgu.edu/blog/what-difference-between-primary-secondary-source2304.html#close

secondary source, 2023. [Online] Available at: https://www.wgu.edu/blog/what-difference-between-primary-secondary-source2304.html
