
BIG DATA ASSIGNMENT

IN PARTIAL FULFILLMENT OF THE POST GRADUATE DEGREE


"MASTER OF FASHION MANAGEMENT (MFM)"

Department: Fashion Management Studies

Semester: 03

Submitted To: Dr. GULNAZ BANU .P

Submitted By: Shruti Jhunjhunwala

(MFM/19/401)

Batch: 2019 – 21

NIFT Bengaluru
CERTIFICATE
This is to certify that Shruti Jhunjhunwala, a student of Master of Fashion Management, Semester 3, Batch 2019–21, of the National Institute of Fashion Technology (NIFT), Bengaluru, has successfully completed the report on “Big Data” under the guidance of Dr. Gulnaz Banu .P towards the fulfilment of this project.

Dr. Gulnaz Banu .P

Professor
Department of FMS

NIFT, Bengaluru

Q1. What is Machine Learning?


Machine learning is the study of computer algorithms that allow computer programs to
automatically improve through experience. Machine-learning algorithms use statistics to find
patterns in massive amounts of data, and data here encompasses many things: numbers,
words, images, clicks. If it can be digitally stored, it can be fed into a machine-learning
algorithm.

Machine learning is the process that powers many of the services we use today: recommendation
systems like those on Netflix, YouTube, and Spotify; search engines like Google; social-media
feeds like Facebook and Twitter; voice assistants like Siri and Alexa; and many more.

In all of these instances, each platform collects as much data about us as possible: what
genres we like watching, what links we are clicking, which statuses we are reacting to. It then
uses machine learning to make a highly educated guess about what we might want next, or, in the
case of a voice assistant, about which words match best with the funny sounds coming out of our
mouths.

Machine learning involves computers discovering how they can perform tasks without being
explicitly programmed to do so: they learn from the data provided so that they can carry out
certain tasks.

For simple tasks assigned to computers, it is possible to program algorithms telling the machine
how to execute all steps required to solve the problem at hand; on the computer's part, no
learning is needed. For more advanced tasks, it can be challenging for a human to manually
create the needed algorithms. In practice, it can turn out to be more effective to help the
machine develop its own algorithm, rather than having human programmers specify every
needed step.

Data mining is a related field of study, focusing on exploratory data analysis through
unsupervised learning. In its application across business problems, machine learning is also
referred to as predictive analytics.

Simple Definition: Machine learning is an application of artificial intelligence (AI) that
provides systems the ability to automatically learn and improve from experience without being
explicitly programmed. Machine learning focuses on the development of computer programs
that can access data and use it to learn for themselves.

Machine learning approaches

Machine learning approaches are traditionally divided into three broad categories, depending
on the nature of the "signal" or "feedback" available to the learning system:

 Supervised learning: The computer is presented with example inputs and their
desired outputs, given by a "teacher", and the goal is to learn a general rule
that maps inputs to outputs.
 Unsupervised learning: No labels are given to the learning algorithm, leaving it
on its own to find structure in its input. Unsupervised learning can be a goal in itself
(discovering hidden patterns in data) or a means towards an end (feature learning); a short
sketch follows this list.
 Reinforcement learning: A computer program interacts with a dynamic
environment in which it must perform a certain goal (such as driving a vehicle or playing
a game against an opponent). As it navigates its problem space, the program is provided
feedback that's analogous to rewards, which it tries to maximize.
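
As a minimal illustration of the unsupervised case, here is a hedged Python sketch using scikit-learn's k-means algorithm to find two clusters in a handful of unlabelled points (the values are invented for illustration):

# Unsupervised learning sketch: k-means receives no labels and finds
# structure (two clusters) in the input on its own.
from sklearn.cluster import KMeans

points = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9]]
labels = KMeans(n_clusters=2, n_init=10).fit_predict(points)
print(labels)  # e.g. [0 0 1 1]: the two nearby pairs are grouped together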

Q2. What is the difference Between ML and AI?


Artificial Intelligence: The term Artificial Intelligence comprises two words, “Artificial”
and “Intelligence”. Artificial refers to something made by humans rather than occurring
naturally, and Intelligence means the ability to understand or think. There is a misconception
that Artificial Intelligence is a system; it is not a system, but something implemented in a
system. There are many definitions of AI; one is: “It is the study of how to train
computers so that computers can do things which, at present, humans can do better.”
It is therefore intelligence in which we want to give machines all the capabilities that humans
possess.

Machine Learning: Machine Learning is learning in which a machine learns on its own
without being explicitly programmed. It is an application of AI that provides systems the ability
to automatically learn and improve from experience. Here, a program can be generated by
integrating the input and output of that program. One simple definition of Machine
Learning is: “A program is said to learn from experience E with respect to some class of
tasks T and a performance measure P if its performance at tasks in T, as measured
by P, improves with experience E.”

Deep learning (DL), machine learning (ML) and AI are often pictured as three concentric
circles: DL is a subset of ML, which is in turn a subset of AI.

For example, consider a training table that identifies the type of fruit based on its
characteristics: each row lists a fruit's weight and texture along with its type, while the last
row gives only the weight and texture, without the type of fruit. A machine learning algorithm
can be developed to identify whether that fruit is an orange or an apple. After the algorithm is
fed the training data, it learns the characteristics that differentiate an orange from an apple.
Therefore, if provided with weight and texture data, it can accurately predict the type of fruit
with those characteristics, as the sketch below shows.
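
A hedged sketch of this fruit example in Python, using scikit-learn's decision tree classifier; the weights and the texture encoding are invented assumptions, not values from an actual table:

# Supervised learning on the fruit example: train on labelled rows,
# then predict the fruit for an unlabelled row.
from sklearn.tree import DecisionTreeClassifier

# Features: [weight in grams, texture] with texture 0 = smooth, 1 = bumpy
X_train = [[150, 1], [170, 1], [140, 0], [130, 0]]
y_train = ["orange", "orange", "apple", "apple"]

clf = DecisionTreeClassifier().fit(X_train, y_train)
print(clf.predict([[160, 1]]))  # expected: ['orange']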

Q3. What is Big Data, what are the Five V’s, and what are the recent advancements in Big Data?

There is no place where Big Data does not exist! The curiosity about what is Big Data has been
soaring in the past few years. Let me tell you some mind-boggling facts! Forbes reports that
every minute, users watch 4.15 million YouTube videos, send 456,000 tweets on Twitter,
post 46,740 photos on Instagram and there are 510,000 comments posted and 293,000
statuses updated on Facebook!

Just imagine the huge chunk of data that is produced with such activities. This constant
creation of data using social media, business applications, telecom and various other domains is
leading to the formation of Big Data.

The most common myth associated with it is that it is just about the size or volume of data.
Actually, it is not just about the “big” amounts of data being collected: Big Data refers to the
large amounts of data pouring in from various data sources in different formats. Huge volumes
of data were being stored in databases even earlier, but because of the varied nature of this
data, traditional relational database systems are incapable of handling it. Big Data is much
more than a collection of datasets with different formats; it is an important asset which can be
used to obtain innumerable benefits.

However, there are certain basic tenets of Big Data that will make it even simpler to answer
what is Big Data:

 It refers to a massive amount of data that keeps on growing exponentially with time.
 It is so voluminous that it cannot be processed or analysed using conventional data
processing techniques.
 It includes data mining, data storage, data analysis, data sharing, and data
visualization.
 The term is an all-comprehensive one including data, data frameworks, along with
the tools and techniques used to process and analyse the data.

The three different formats of big data are:

1. Structured: Organised data format with a fixed schema. Ex: RDBMS


2. Semi-Structured: Partially organised data which does not have a fixed schema (see the
short sketch after this list). Ex: XML, JSON
3. Unstructured: Unorganised data with an unknown schema. Ex: Audio, video files
etc.
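
As a minimal illustration of the semi-structured format, the following Python snippet parses a small JSON record; the field names and values are invented:

import json

# In semi-structured data the field names travel with the record,
# so there is no fixed schema shared by all records.
record = '{"id": 101, "name": "T-shirt", "tags": ["cotton", "summer"]}'
doc = json.loads(record)
print(doc["name"], doc["tags"])  # T-shirt ['cotton', 'summer']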

The 5 Characteristics of Big Data

What is the difference between regular data analysis and “Big” data? Although the answer to
this question cannot be universally determined, there are a number of characteristics that
define Big Data.

The characteristics of Big Data are commonly referred to as the five Vs:

1. Volume of Big Data -

The volume of data refers to the size of the data sets that need to be analyzed and processed,
which are now frequently larger than terabytes and petabytes. The sheer volume of the data
requires processing technologies distinct from traditional storage and processing
capabilities. In other words, the data sets in Big Data are too large to process with a regular
laptop or desktop processor. An example of a high-volume data set would be all credit card
transactions made in Europe on a single day.

2. Velocity of Big Data –

Velocity refers to the speed with which data is generated. High-velocity data is generated at
such a pace that it requires distinct (distributed) processing techniques. An example of data
generated with high velocity would be Twitter messages or Facebook posts.
3. Variety of Big Data –

Variety is what makes Big Data really big. Big Data comes from a great variety of sources and is
generally one of three types: structured, semi-structured and unstructured data. The variety in
data types frequently requires distinct processing capabilities and specialist algorithms. An
example of a high-variety data set would be the CCTV audio and video files generated at various
locations in a city.

4. Veracity of Big Data –

Veracity refers to the quality of the data being analysed. High-veracity data has many
records that are valuable to analyse and that contribute in a meaningful way to the overall
results. Low-veracity data, on the other hand, contains a high percentage of meaningless data;
the non-valuable records in these data sets are referred to as noise. An example of a
high-veracity data set would be data from a medical experiment or trial.

5. Value of Big Data –

Value refers to the worth that can be extracted from the data. Collecting and processing Big
Data is only worthwhile when the resulting insights translate into tangible benefit, such as
better decisions, lower costs or new revenue. An example of high-value data would be sales
records that directly guide stock and pricing decisions.

Data that is high volume, high velocity and high variety must be processed with advanced tools
(analytics and algorithms) to reveal meaningful information. Because of these characteristics of
the data, the knowledge domain that deals with the storage, processing, and analysis of these
data sets has been labelled Big Data.

BIG DATA TECHNOLOGIES


1. R Programming

R is a programming language and an open-source project. It is free software widely used for
statistical computing and visualization, with support in unified development environments such
as Eclipse and Visual Studio.
Experts say it has become one of the most prominent languages in the world. Besides being
used by data miners and statisticians, it is widely employed for designing statistical software
and, above all, in data analytics.

2. Data Lakes

A data lake is a consolidated repository for storing all formats of data, structured and
unstructured, at any scale.

In the process of data accumulation, data can be saved as it is, without first transforming it
into structured form, while numerous kinds of data analytics can be executed on it, from
dashboards and data visualization to big data transformation, real-time analytics, and machine
learning for better business inferences.

Organizations that use data lakes can outperform their peers: new types of analytics can be
conducted, such as machine learning across new sources like log files, social-media data,
click-streams and even data from IoT devices held in the lake.

This helps organizations recognise and respond to opportunities for faster business growth by
attracting and engaging customers, sustaining productivity, maintaining devices proactively,
and taking informed decisions.

3. Artificial Intelligence

From Siri to self-driving cars, AI is developing very swiftly. Being an interdisciplinary branch
of science, it draws on many approaches, such as machine learning and deep learning, to make a
remarkable shift in almost every tech industry.

The remarkable aspect of AI is its ability to reason and make decisions that offer a plausible
likelihood of achieving a definite goal. AI is evolving consistently to deliver benefits in
various industries. For example, AI can be used for drug treatment, healing patients, and
conducting surgery in the operating theatre (OT).

Artificial Intelligence (AI) is also changing the fashion industry, playing a crucial role in
key divisions from design to manufacturing, the logistics supply chain and marketing.

4. NoSQL Database

NoSQL covers a broad range of distinct database technologies that are evolving to support
modern applications. The term denotes a non-SQL, or non-relational, database that provides a
mechanism for the storage and retrieval of data. NoSQL databases are deployed in real-time web
applications and big data analytics.
They store unstructured data, deliver faster performance, and offer flexibility when dealing
with a variety of datatypes at huge scale. Examples include MongoDB, Redis, and
Cassandra.

Their advantages include simplicity of design, easier horizontal scaling across arrays of
machines, and finer control over availability. Because NoSQL uses data structures different
from those used by default in relational databases, some computations are quicker. Companies
like Facebook, Google and Twitter, for example, store terabytes of user data every single day.
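
A hedged sketch of the document model in Python with the pymongo driver; it assumes a MongoDB server running on localhost:27017, and the database, collection and field names are illustrative:

from pymongo import MongoClient

# Documents in the same collection need not share a fixed schema.
client = MongoClient("localhost", 27017)
products = client["fashion_db"]["products"]

products.insert_one({"sku": "TSH-001", "colour": "blue", "sizes": ["S", "M"]})
products.insert_one({"sku": "BAG-042", "material": "leather"})

print(products.find_one({"sku": "TSH-001"}))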

5. Predictive Analytics

A subpart of big data analytics, predictive analytics endeavours to predict future behaviour
from prior data. It works using machine learning technologies, data mining, statistical
modelling and mathematical models to forecast future events.

The science of predictive analytics generates forward-looking inferences with a compelling
degree of precision. With the tools and models of predictive analytics, a firm can deploy past
and current data to draw out trends and behaviours that could occur at a particular time, for
example by exploring the relationships among various trending parameters. Such models are
designed to assess the promise or risk presented by a specific set of possibilities.
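
A minimal predictive-analytics sketch in Python: fitting a linear model on invented monthly sales figures to forecast the next month (the numbers are assumptions for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Past data: month number -> units sold that month (invented figures)
months = np.array([[1], [2], [3], [4], [5], [6]])
sales = np.array([120, 135, 150, 160, 178, 190])

model = LinearRegression().fit(months, sales)
print(model.predict([[7]]))  # forecast for month 7: roughly 204 units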

6. Apache Spark

With built-in features for streaming, SQL, machine learning and graph processing, Apache Spark
has earned a reputation as the fastest and most widely used engine for big data
transformation. It supports the major big data languages, including Python, R, Scala, and Java.

Spark was introduced to overcome the main limitation of Hadoop's MapReduce: data-processing
speed. It reduces the waiting time between querying data and executing a program. Spark is
often used alongside Hadoop, with Hadoop providing the storage and Spark the processing, and it
can be up to a hundred times faster than MapReduce.
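
A hedged sketch of Spark's DataFrame API through PySpark; it assumes a local Spark installation, and the app name, column names and SKUs are illustrative:

from pyspark.sql import SparkSession

# Start a local Spark session and aggregate sales per SKU in parallel.
spark = SparkSession.builder.appName("SalesDemo").getOrCreate()

df = spark.createDataFrame(
    [("TSH-001", 120), ("BAG-042", 45), ("TSH-001", 80)],
    ["sku", "units"],
)
df.groupBy("sku").sum("units").show()
spark.stop()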

7. In-memory Database

An in-memory database (IMDB) is stored in the main memory of the computer (RAM) and
controlled by an in-memory database management system. Traditionally, conventional databases
have been stored on disk drives.

Conventional disk-based databases are organised around the block-oriented devices to which
data is written and read: when one part of the database refers to another part, different
blocks may have to be read from disk. This is a non-issue with an in-memory database, where
links within the database are followed using direct pointers.

In-memory databases are built to minimise access time by eliminating the need to read from
disk. However, because all data is held and managed entirely in main memory, there is a high
risk of losing it upon a process or server failure.
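
A minimal sketch of working with an in-memory store in Python using the redis-py client; it assumes a Redis server on localhost:6379, and the key and value are illustrative:

import redis

# Reads and writes go to RAM, so no disk blocks are touched on access.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
r.set("session:42:last_viewed", "TSH-001")
print(r.get("session:42:last_viewed"))  # TSH-001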

8. Hadoop Ecosystem

The Hadoop ecosystem is a platform that helps resolve the challenges surrounding big data. It
incorporates a variety of components and services, namely for ingesting, storing, analysing,
and maintaining data.

Most services prevalent in the Hadoop ecosystem complement its core components, which include
HDFS, YARN, MapReduce and Hadoop Common.

The Hadoop ecosystem comprises both Apache open-source projects and a wide variety of
commercial tools and solutions. A few of the well-known open-source examples include Spark,
Hive, Pig, Sqoop and Oozie.

Q4. Big Data in Fashion Industry?


To keep up with the demands of fast-changing fashion and to reduce the ‘turnaround time’
from ramp to stores, fashion retailers are increasingly turning to ‘big data’. Big Data is
touching every aspect of the fashion industry, from design to demand, resale, and operations.

The fashion industry is one of the latest sectors to aggressively embrace data analytics,
probably because of its proven results. Extremely large sets of data are segregated into groups
and analyzed to reveal patterns, associations, and the latest trends in the fashion industry.
Big data helps designers come to startling conclusions about their designs and helps them
create a product line that will sell.

1. Big Data Allows You to Harness the Power of ‘Social Media Data’
Websites such as Twitter, Facebook, Instagram, and Pinterest are sources of raw and
uncensored public opinion, and sentiment analysis is used to extract insights from it. The
volume of this data is huge and mostly unstructured, so it needs to be cleaned and transformed;
if harnessed properly, it has great potential to yield insight, as the sketch below
illustrates.
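
A hedged sketch of sentiment analysis in Python using NLTK's VADER analyser; the sample comments are invented for illustration:

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the sentiment lexicon
sia = SentimentIntensityAnalyzer()

# The compound score runs from -1 (very negative) to +1 (very positive).
for comment in ["Love the new summer collection!", "The fabric feels cheap."]:
    print(comment, "->", sia.polarity_scores(comment)["compound"])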

Many companies now release photos of a new collection on social media first and study the
general public's reactions and comments to make changes before a large-scale launch of the
collection. This helps manufacturers know their audience and its likings on a real-time basis,
and it also means designers can provide the product their customers actually desire.

Fashion has become an experience, and industry leaders are using the power of social media to
convert this experience into a well-defined data set from which to devise their next fashion
trends.

2. Big Data Helps Reduce the Time Elapsed between Order and Distribution

We all know Zara is one of the biggest fashion brands and key retailers, with hundreds of stores
across the globe. Until a few years ago, most Zara stores faced the problem of limited supply:
retail stores used to wait for stock to run out before placing their order, and hence there was
usually a time gap between order and distribution, leading to a supply crunch.

Zara created an adaptive, data-driven supply chain management system to deal with the problem.
Unlike traditional retailers who order clothes in bulk for the entire season, Zara orders only a
small amount of merchandise. Once a given product line hits the store, Zara keeps track of its
sales data and analyzes sales against supply for each SKU. In addition, Zara analyses the sales
data of each SKU to identify the sales trend in that area; for example, it might find that in a
particular country slim-fit pants sell better than loose-fit, or that a given colour is
preferred. Zara then uses all these insights to guide its following order, which ensures that
the product is restocked on time based on sales trends. A minimal sketch of this kind of
analysis follows.
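
A minimal sketch of such per-store, per-SKU sales analysis in Python with pandas; the store names, SKUs and figures are invented for illustration:

import pandas as pd

# Which fit sells better in each store? The answer guides the next order.
sales = pd.DataFrame({
    "store": ["Delhi", "Delhi", "Mumbai", "Mumbai"],
    "sku":   ["SLIM-PANT", "LOOSE-PANT", "SLIM-PANT", "LOOSE-PANT"],
    "units": [320, 110, 150, 290],
})
print(sales.groupby(["store", "sku"])["units"].sum().unstack())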

With the use of data analytics and big data, Zara has now adopted the concept of ‘fast fashion’,
where the entire process, from designing a collection to putting it on sale in stores, takes a
maximum of 21 days.
3. Big Data and Fashion Quality Control

‘Replica fashion wear’, or pirated merchandise, is one of the biggest problems that has plagued
the fashion industry. As per the report released by the Office of the US Trade Representative in
its annual Notorious Markets List, pirated merchandise imports are estimated at nearly half a
trillion dollars, or around 2.5% of global imports.

From replica Gucci apparel to fake Rolex watches, from counterfeit ‘Victoria's Secret’ lingerie
to replica ‘L’Oréal’ make-up, anything and everything is now being pirated. With the latest
technology available, it is now possible to replicate even the minutest details, which makes it
difficult to differentiate between the original and the fake. E-commerce giants like eBay,
Amazon, and Flipkart have also made it easier to sell and buy pirated products. While
counterfeiting has been around for ages, it is only recently that it has started having such a
huge impact on the fashion industry.

These huge numbers have garnered the attention of industry leaders, who are now using big data
to solve the problem of counterfeit merchandise. Companies are using pattern recognition
coupled with big data to protect the integrity of their brands. For example, a designer can
create a new design and use pattern recognition to find out whether something of a similar
nature has ever been created before. Cognitive Prints, a suite of AI tools, can be used for such
pattern recognition: it scans through huge amounts of data looking for similarities. A
simplified sketch of the idea follows.
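
As a hedged, simplified stand-in for such a system (this is not Cognitive Prints itself), the following Python sketch compares two print designs with a perceptual hash from the Pillow and imagehash libraries; the file names are illustrative:

from PIL import Image
import imagehash

# Perceptual hashes of two print designs; a small Hamming distance
# between the hashes suggests visually similar patterns.
new_design = imagehash.phash(Image.open("new_print.png"))
archived = imagehash.phash(Image.open("archive_print.png"))

distance = new_design - archived
print("similar" if distance <= 10 else "distinct", distance)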

In addition to fighting piracy, this helps in building brand exclusivity and recognition. And
if a designer wants to include patterns from a given era in his or her designs, big data can
help achieve that too.

4. Dealing with Competition through Data

One of the biggest concerns of most fashion retailers is understanding their competitors'
strategies and being ready to outshine them. With appropriate big data tools, fashion retailers
can get real-time insights into what their competitors are creating and how their campaigns are
performing. Armed with these insights, they can strategize their campaigns better than their
peers. Is your competitor planning to launch a sportswear line? You can introduce a new line of
‘athleisure’ and appeal to both sporty and non-sporty people. Big data can help fashion
retailers stay on top of the game and launch the right products, at the right time and in the
right way.

Role of Artificial Intelligence and its Impact on the Fashion Industry

In the age of digitalization, AI and machine learning (ML) based technologies in the fashion
industry are providing automated solutions to manufacturers, helping them leverage the
intelligence of AI in fashion and exploit the best possibilities in their field. For example:

 AI in Fashion Manufacturing, Supply Chain & Fashion Store
 AI in Fashion Design
 AI Robots in Fashion & Sewing
 AI in Fashion Retail
 AI Fashion Stylist 
 AI in Fast Fashion with Smart Mirror
 AI Interactive Smart Mirrors
 AI in Online Fashion with Recommendation in Ecommerce
 AI in Visual Search

Nowadays, AI is playing a crucial role in the fashion industry, with huge potential to be
integrated into various other subfields. It is empowering manufacturers to redefine how fashion
businesses engage and interact with their customers.

AI-enabled applications and systems are enhancing the customer experience in ways that go
beyond personalized ads, notification alerts on price drops, or chatbot assistance.

With this kind of technology, fashion brands strive to put customization at the forefront for
customers during their buying journey.
Moreover, AI will not only help designers predict upcoming trends in the current fast-changing
environment, but also examine and minimize the environmental impact of producing fashion
garments and accessories.

Customers are now becoming aware of AI-enabled features when searching for or buying clothing
or fashion accessories online. They can take a photo and match accessories and clothes across
brands to find the same design.

Apart from that, AI is also reducing errors and speeding up the product delivery process
through automated warehouse management.

Companies and brands can now ask for feedback and suggestions through AI-featured
applications.
The impact of artificial intelligence on fashion will make this industry smarter and more
intelligent in understanding the sentiments and fashion tastes of customers.
