BDU - Document - Dominant Color in An Image Using K

TABLE OF CONTENTS
i) Acknowledgement
ii) Certificate
Abstract
1 About the Project
2 Data Dictionary
3 Logical Development
4 Program Design
5 Testing
6 Conclusion
7 References
8 Appendix
i) Source Code
ii) O/P Screenshots
ABSTRACT
Finding the most dominant color in an image using the K-Means algorithm leverages machine
learning techniques to extract fundamental color information from digital photographs.
Employing the K-means clustering methodology, this project identifies the most prominent
colors in an image by grouping pixels with similar color values. The main goal is to
automate color extraction so that it may be used more effectively for tasks like object
recognition, image classification, and creating visual content. Traditional color analysis
techniques often focus on extracting the average color of an image, which may not accurately
represent the diverse color distribution within the image. This can lead to inaccurate results
in applications where understanding the dominant colors and their proportions is crucial. So,
this project aims to train a model to predict dominant color clusters in an image using K-
means clustering and provide insights into color proportions. Overall, the project aims to
create a comprehensive framework for color dominance analysis in images, contributing to a
broader understanding of image processing and unsupervised learning techniques.
CHAPTER-1
ABOUT THE PROJECT
1.1 INTRODUCTION
"Finding Most Dominant Color in an Image Using K-Means Algorithm," a research project,
explores the fields of image processing and machine learning to present a method for
obtaining basic color information from digital photos. The study groups pixels with similar
color values to identify the most prevalent colors in an image using the K-Means clustering
algorithm. This project has significant consequences for a variety of fields, including graphic
design, image processing, and text analysis. The main goal is to automate color extraction so
that it may be used more effectively for tasks like object recognition, image classification,
and creating visual content. This study highlights how machine learning can simplify image
processing workflows, opening up a promising path for the advancement of numerous visual-
based applications.
In digital image processing, understanding the color composition of an image is often crucial
for various applications like image segmentation, content-based image retrieval, and color-
based object detection. One fundamental task in this domain is to identify the dominant
colors present in an image.
One effective method to achieve this is through the application of the k-means clustering
algorithm. K-means clustering is a popular unsupervised learning technique used to partition
data into clusters based on similarity. In the context of image processing, this algorithm can
be employed to group pixels into clusters based on their color similarity, allowing us to
identify the most prevalent colors within the image.
The process typically involves converting the image into a suitable numerical representation,
applying the k-means algorithm to partition the color space into clusters, and then analyzing
the resulting clusters to extract the dominant colors.
In this document, we will explore the steps involved in finding dominant colors in an image
using the k-means clustering algorithm. We will provide a step-by-step guide along with
code examples in Python using popular libraries such as OpenCV and scikit-learn.
By the end of this document, you will have a clear understanding of how to
programmatically extract the dominant colors from an image, enabling you to incorporate
this technique into your own image processing projects with ease.
1.2 K-MEANS CLUSTERING ALGORITHM
K-Means clustering is an unsupervised learning algorithm used to solve clustering
problems in machine learning and data science. This section describes what the K-means
clustering algorithm is and how it works, along with its Python implementation.
It groups unlabeled data into clusters, offering a convenient way to discover the
categories in an unlabeled dataset on its own, without the need for any labeled training data.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main
aim of this algorithm is to minimize the sum of squared distances between the data points and
the centroids of their corresponding clusters.
The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters,
and repeats the assignment and update steps until the clusters stop changing. The value of k
must be chosen in advance.
The k-means clustering algorithm mainly performs two tasks:
o Determines the best value for K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center. The data points nearest to a
particular k-center form a cluster.
Hence each cluster contains data points with some commonalities and is separated from the
other clusters.
The below diagram explains the working of the K-means Clustering Algorithm:
Figure 1: Working of K-means Clustering Algorithm
The working of the K-Means algorithm is explained in the below steps:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids. (They need not be points from the
input dataset.)
Step-3: Assign each data point to its closest centroid, which forms the predefined K
clusters.
Step-4: Calculate the mean of the points in each cluster and place the new centroid there.
Step-5: Repeat the third step, that is, reassign each data point to the new closest
centroid.
Step-6: If any reassignment occurs, go to Step-4; otherwise go to FINISH.
Step-7: The model is ready.
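The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's source code: the function name kmeans, the max_iter safeguard, the fixed seed, and the toy data are all assumptions made for the example, and the initial centroids are drawn from the data points themselves.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means iteration; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Step 2: pick k distinct data points as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Steps 3 and 5: assign every point to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 6: stop when no point changed cluster
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Toy data: two well-separated groups of 2D points
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(data, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other, whichever two points are drawn as the initial centroids.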
1.3 WORKING METHODS
Basically, k-means is a clustering algorithm used in machine learning where a set of data
points is to be categorized into ‘k’ groups. It works on a simple distance calculation.
1. Select ‘k’ points at random, not necessarily from the dataset.
2. Assign each data point to the closest cluster.
3. Compute and place the new centroid of each cluster.
4. Reassign the data points to the new closest cluster. If any reassignment took
place, go to step 3; otherwise the model is ready.
1.3.1 Applying to images
● As an image is made of three channels (Red, Green and Blue), we can think of each
pixel as a point (x=Red, y=Green, z=Blue) in 3D space and so can apply the k-means
clustering algorithm to it.
● After processing each pixel with the algorithm, the cluster centroids are the
required dominant colors.
● We are going to use this image (dimension - 100 x 100)
Figure 2: Sample Image

● Plotting the points in 3D space using Python and matplotlib:
import matplotlib.pyplot as plt
import cv2
# read the image (OpenCV loads it in BGR channel order)
img = cv2.imread('colors.jpg')
# convert from BGR to RGB
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# split into channels and flatten each one to a 1D array
r, g, b = cv2.split(img)
r = r.flatten()
g = g.flatten()
b = b.flatten()
# plot every pixel as a point (x=red, y=green, z=blue)
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(r, g, b)
plt.show()
Figure 3: 3D plot of “colors.jpg” using x=red, y=green, z=blue
From the plot one can easily see that the data points form groups: some regions of the
graph are denser, which we can interpret as the dominance of different colors in the image. We
will try to recover these clusters through k-means clustering.
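As a rough sketch of that clustering step, the snippet below runs scikit-learn's KMeans on the pixels of an image. To stay self-contained it builds a synthetic two-color 100 x 100 array in place of 'colors.jpg'; that stand-in image, the choice of two clusters, and the fixed random_state are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 100 x 100 RGB image standing in for 'colors.jpg' (an assumption):
# the left 60 columns are red, the right 40 columns are blue
img = np.zeros((100, 100, 3), dtype=np.uint8)
img[:, :60] = (255, 0, 0)
img[:, 60:] = (0, 0, 255)

# Each pixel becomes one point (R, G, B) in 3D space
pixels = img.reshape(-1, 3).astype(float)

# Cluster the points; the centroids are the candidate dominant colors
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
dominant_colors = np.rint(model.cluster_centers_).astype(int)
```

With a real photograph, the array returned by cv2.imread (after conversion to RGB) would take the place of the synthetic img.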
Figure 4: clusters in plot
Figure 5: Cluster visualization
1.4 EXISTING SYSTEM
The project, "Finding the Most Dominant Color in an Image using K-Means Algorithm
Based on Machine Learning," explores the application of K-Means clustering for identifying
and visualizing dominant colors in images. The literature review outlines key research areas
related to image clustering, color analysis, and visualization techniques.
1.4.1. Image Clustering Techniques
Jain, A. (2010). Data Clustering: 50 Years Beyond K-Means. Pattern Recognition Letters.
Jain's comprehensive review provides insights into the evolution of clustering techniques,
emphasizing the significance of K-Means and its variants. Understanding the advancements
in clustering algorithms informs the choice of K-Means in the project.
1.4.2. Color Representation and Analysis
Cheng, Z., Yang, J., Shi, Y., & Huang, T. (2001). Color Image Segmentation: Advances and
Prospects. Pattern Recognition.
Cheng et al. delve into color image segmentation, discussing various methods for extracting
meaningful information from color images. The review aids in understanding the
complexities of color representation and segmentation, which are fundamental to the project's
objectives.
1.4.3. Applications of K-Means in Image Processing
Huang, K., & Aviyente, S. (2015). K-Means-Based Clustering Approach for Color Image
Segmentation. EURASIP Journal on Image and Video Processing. This study specifically
explores the application of K-Means clustering for color image segmentation. The insights
provided contribute to the rationale behind using K-Means for color dominance analysis in
the project.
1.4.4. Visualization in Image Analysis
Ware, C. (2012). Information Visualization: Perception for Design. Elsevier. Ware's work on
information visualization is crucial for understanding principles that enhance the
interpretability of visual representations. The project benefits from these principles to
effectively communicate dominant colors through graphical outputs.
1.4.5. Evaluation of Unsupervised Learning Results
Milligan, G. W., & Cooper, M. C. (1985). An Examination of Procedures for Determining
the Number of Clusters in a Data Set. Psychometrika. While not directly related to image
processing, Milligan and Cooper's examination of clustering evaluation methods provides
insights into considerations for assessing the quality of clustering results, which is relevant
for the unsupervised learning task in the project.
1.4.6. Identified Gaps and Opportunities:
The existing literature provides a strong foundation for image clustering, color analysis, and
visualization techniques.
Overall, the project integrates insights from diverse literature sources to create a
comprehensive framework for color dominance analysis in images, contributing to the
broader understanding of image processing and unsupervised learning techniques.
1.4.7. Authors' Contributions to Dominant Color Extraction with K-Means Clustering
1. J. He, J. Wang, and M. Li:
He, Wang, and Li proposed a novel approach for dominant color extraction using k-means
clustering in their paper titled "Color Image Quantization Based on K-means Clustering with
Markov Random Field Model." They introduced a Markov random field model to refine the
clustering results and improve color quantization accuracy.
Their method demonstrated superior performance in terms of color fidelity and visual quality
compared to traditional k-means clustering techniques, particularly in scenarios with
complex color distributions or texture patterns.
2. A. Singh and N. Ahuja:
Singh and Ahuja investigated the application of k-means clustering for dominant color
extraction in their paper "Color Clustering for Image Segmentation." They proposed an
adaptive k-means algorithm capable of automatically determining the optimal number of
clusters based on image characteristics.
Their adaptive approach addressed the challenge of selecting the appropriate number of
clusters, which is critical for accurate color segmentation and dominant color extraction in
diverse image datasets.
3. S. Zeng and X. Chen:
Zeng and Chen contributed to the field of dominant color extraction with their paper "A
Novel Image Quantization Algorithm Using K-means Clustering with Density Peaks." They
introduced a density peaks-based initialization method for k-means clustering to improve
convergence speed and clustering quality.
Their approach demonstrated robust performance in various image quantization tasks,
including dominant color extraction and color-based image retrieval, by effectively capturing
the underlying structure of image color distributions.
1.5 PROPOSED SYSTEM
The proposed system aims to automate color extraction in images using the K-Means
clustering algorithm. By identifying dominant color clusters, it provides accurate insights
into color distribution, crucial for tasks like object recognition and image classification. The
system's framework encompasses preprocessing, clustering, and visualization stages, offering
versatility and scalability. Leveraging machine learning techniques, it enhances accuracy and
efficiency in color dominance analysis, contributing to advancements in image processing
and unsupervised learning.
1.6 PROBLEM STATEMENT
Traditional color analysis techniques often focus on extracting the average color of an image,
which may not accurately represent the diverse color distribution within the image. This can
lead to inaccurate results in applications where understanding the dominant colors and their
proportions is crucial. So this project aims to train a model to predict dominant color clusters
in an image using K-means clustering and to provide insights into color proportions.
1) Data Preparation: Collect a dataset of images with labeled dominant color clusters.
Preprocess the images to a consistent size.
2) Feature Extraction: Extract features from images (possibly using color histograms, K-
means cluster centers, etc.).
3) Model Training: Train a machine learning model (e.g., a classifier) on the labeled dataset
to predict the dominant color clusters.
4) Evaluation: Evaluate the model's performance on a test set using appropriate metrics
(accuracy, precision, recall).
5) Application: Integrate the trained model into an application for automatic color
quantization.
1.7 OBJECTIVE
This objective statement outlines the goals and milestones for implementing the k-means
clustering algorithm for extracting dominant colors from images, emphasizing the
importance of image preprocessing, clustering, color extraction, visualization, performance
evaluation, and documentation.
CHAPTER-2
DATA DICTIONARY
The k-means clustering algorithm is an unsupervised machine learning technique that seeks
to group similar data into distinct clusters to uncover patterns in the data that may not be
apparent to the naked eye.
2.1 K-Means Clustering in OpenCV
It is possibly the most widely known algorithm for data clustering and is implemented in the
OpenCV library.
2.1.1. OpenCV
The necessary libraries are imported, including OpenCV (cv2), NumPy (np), Matplotlib
(plt), imutils, and KMeans from scikit-learn.
Parameters Setup: The number of clusters (clusters) is set, which determines the number of
dominant colors to be extracted from the image.
Read and Resize Image: The script reads an image using OpenCV's imread function and then
resizes it to a fixed height of 200 pixels using the imutils.resize function. Resizing helps
speed up the processing, and a fixed height is chosen to maintain the aspect ratio of the
original image.
Flatten Image: The image is flattened into a 2D array using NumPy's reshape function. This
step converts the 3D image array (height × width × channels) into a 2D array (pixels ×
channels), where each row represents a pixel and each column represents a color channel (R,
G, B).
Apply K-Means Clustering: K-means clustering is applied to the flattened image array to
group similar colors into the number of clusters given by clusters. Each cluster center
represents a dominant color in the image.
Calculate Dominant Colors and Percentages: The cluster centers
(dominant colors) and their corresponding percentages of pixels in the image are calculated.
This information is stored in the p_and_c variable as tuples of (percentage, color).
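One way the (percentage, color) tuples could be computed from cluster labels is sketched below. The helper name color_percentages and the toy labels and centers are hypothetical; only the p_and_c name and the tuple layout come from the description above.

```python
import numpy as np

def color_percentages(labels, centers):
    """Return (percentage, color) tuples sorted from most to least dominant."""
    labels = np.asarray(labels)
    # Count how many pixels were assigned to each cluster
    counts = np.bincount(labels, minlength=len(centers))
    percentages = counts / counts.sum()
    pairs = [(float(p), tuple(int(v) for v in c)) for p, c in zip(percentages, centers)]
    return sorted(pairs, key=lambda pc: pc[0], reverse=True)

# Toy input: 10 pixels already assigned to 3 clusters
labels = [0, 0, 1, 2, 2, 2, 2, 2, 1, 0]
centers = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
p_and_c = color_percentages(labels, centers)
```

Here the percentages are fractions between 0 and 1; multiplying by 100 gives values suitable for display.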
2.1.2 Visualize Dominant Colors:
Matplotlib is used to visualize the dominant colors.
Two plots are created:
● The first plot shows individual color blocks representing each dominant color along
with their percentages.
● The second plot is a horizontal bar chart showing the proportions of colors in the
image. Each color block's width represents its proportion in the image.
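The proportion bar in the second plot can be built as a plain image array before it is handed to Matplotlib. The sketch below assumes (percentage, color) tuples with percentages expressed as fractions; the function name proportion_bar and the 500 x 50 strip size are illustrative choices, not taken from the project code.

```python
import numpy as np

def proportion_bar(p_and_c, width=500, height=50):
    """Build an RGB strip where each color's width matches its proportion."""
    bar = np.zeros((height, width, 3), dtype=np.uint8)
    start = 0.0
    for percentage, color in p_and_c:
        end = start + percentage * width
        # Fill the columns belonging to this color
        bar[:, int(round(start)):int(round(end))] = color
        start = end
    return bar

bar = proportion_bar([(0.5, (0, 0, 255)), (0.3, (255, 0, 0)), (0.2, (0, 255, 0))])
```

The resulting array can be shown with matplotlib.pyplot.imshow(bar).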
Overlay Dominant Colors on the Original Image:

The original image is resized back to its original dimensions. A semi-transparent rectangle is
drawn on the upper part of the image to provide a background for displaying the dominant
colors. Text is added to indicate that these are the most dominant colors in the image. Then,
the dominant colors are displayed below the text, numbered from 1 to 5.
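The semi-transparent rectangle amounts to a weighted sum of a solid color and the original pixels, which is the same kind of blend that OpenCV's addWeighted function computes. The NumPy sketch below illustrates the idea on a black stand-in image; the function name, the alpha value of 0.6, and the rectangle position are assumptions.

```python
import numpy as np

def overlay_rectangle(image, top, bottom, color=(255, 255, 255), alpha=0.6):
    """Blend a solid color over rows top..bottom of an RGB image."""
    out = image.astype(float)
    region = out[top:bottom]
    # Weighted sum: alpha * rectangle color + (1 - alpha) * original pixels
    out[top:bottom] = alpha * np.array(color, dtype=float) + (1.0 - alpha) * region
    return np.rint(out).astype(np.uint8)

img = np.zeros((100, 200, 3), dtype=np.uint8)  # black stand-in image
result = overlay_rectangle(img, top=0, bottom=40, alpha=0.6)
```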
Display and Save Results:

Matplotlib displays the final image with the dominant colors overlaid. Additionally, OpenCV
displays the image in a window. The final image is saved as 'output.png' using OpenCV's
imwrite function.
The K-means clustering algorithm works by iteratively assigning pixels to the nearest cluster
centroid and updating the centroids based on the mean of the assigned pixels. This process
continues until convergence, resulting in cluster centroids that represent the dominant colors
in the image.
This chapter covers OpenCV’s k-means clustering algorithm for color quantization of images,
including:
● What data clustering is within the context of machine learning.
● Applying the k-means clustering algorithm in OpenCV to a simple two-dimensional dataset
containing distinct data clusters.
● How to apply the k-means clustering algorithm in OpenCV for color quantization of
images.
● Clustering as an Unsupervised Machine Learning Task
● Discovering k-Means Clustering in OpenCV
● Color Quantization Using k-Means
2.1.3. Clustering as an Unsupervised Machine Learning Task
Cluster analysis is an unsupervised learning technique. It involves automatically grouping
data into distinct groups (or clusters), where the data within each cluster are similar but
different from those in the other clusters. It aims to uncover patterns in the data that may not
be apparent before clustering. There are many different clustering algorithms,
with k-means clustering being one of the most widely known. The k-means
clustering algorithm takes unlabelled data points. It seeks to assign them to k clusters, where
each data point belongs to the cluster with the nearest cluster center, and the center of each
cluster is taken as the mean of the data points that belong to it. The algorithm requires that
the user provide the value of k as an input; hence, this value needs to be known a priori or
tuned according to the data.
2.1.4. Discovering k-Means Clustering in OpenCV

Let’s first consider applying k-means clustering to a simple two-dimensional dataset
containing distinct data clusters before moving on to more complex tasks.
For this purpose, we shall be generating a dataset consisting of 100 data points (specified
by n_samples), which are equally divided into 5 Gaussian clusters (identified by centers)
having a standard deviation set to 1.5 (determined by cluster_std). To be able to replicate the
results, let’s also define a value for random_state, which we’re going to set to 10:
from sklearn.datasets import make_blobs
from matplotlib.pyplot import scatter, show
# Generating a dataset of 2D data points and their ground truth labels
x, y_true = make_blobs(n_samples=100, centers=5, cluster_std=1.5, random_state=10)
# Plotting the dataset
scatter(x[:, 0], x[:, 1])
show()
The code above should generate the following plot of data points:
Figure 6: Scatter Plot of Dataset Consisting of 5 Gaussian Clusters
If we look at this plot, we may already be able to visually distinguish one cluster from
another, which means that this should be a sufficiently straightforward task for the k-means
clustering algorithm.
In OpenCV, the k-means algorithm is not part of the ml module but can be called directly.
To be able to use it, we need to specify values for its input arguments as follows:
● The input, unlabelled data.
● The number, K, of required clusters.
● Termination criteria, TERM_CRITERIA_EPS and TERM_CRITERIA_MAX_ITER,
defining the desired accuracy and the maximum number of iterations, respectively;
the algorithm stops iterating when either is reached.
● The number of attempts, denoting the number of times the algorithm will be executed
with different initial labeling to find the best cluster compactness.
2.1.5. The k-means clustering algorithm in OpenCV returns:
The compactness, computed as the sum of the squared distances of each data
point to its corresponding cluster center. A smaller compactness value indicates that the data
points are distributed closer to their corresponding cluster centers and, hence, the clusters
are more compact.
The predicted cluster labels, y_pred, which associate each input data point with its
corresponding cluster.
The center coordinates of each cluster of data points.
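To make the compactness measure concrete, the sketch below recomputes a sum of squared distances from points, labels, and centers with NumPy. It is an illustration of the definition rather than OpenCV's internal code; the function name and the toy values are assumptions.

```python
import numpy as np

def compactness(points, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.asarray(labels).ravel()
    # Look up each point's own center, then measure the offset
    diffs = points - centers[labels]
    return float(np.sum(diffs ** 2))

points = [[0, 0], [2, 0], [10, 10], [10, 12]]
labels = [0, 0, 1, 1]
centers = [[1, 0], [10, 11]]
value = compactness(points, labels, centers)
```

Every point here lies at distance 1 from its center, so the sum of squared distances is 4.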
Let’s now apply the k-means clustering algorithm to the dataset generated earlier. Note that
we are type-casting the input data to float32, as expected by the kmeans() function in
OpenCV:
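# Imports needed by this snippet; x is the dataset generated earlier
from cv2 import kmeans, TERM_CRITERIA_MAX_ITER, TERM_CRITERIA_EPS, KMEANS_RANDOM_CENTERS
from numpy import float32
from matplotlib.pyplot import scatter, show
# Specify the algorithm's termination criteria: at most 10 iterations or an accuracy of 1.0
criteria = (TERM_CRITERIA_MAX_ITER + TERM_CRITERIA_EPS, 10, 1.0)
# Run the k-means clustering algorithm on the input data
compactness, y_pred, centers = kmeans(data=x.astype(float32), K=5, bestLabels=None,
                                      criteria=criteria, attempts=10, flags=KMEANS_RANDOM_CENTERS)
# Plot the data clusters, each having a different color, together with the corresponding cluster centers
scatter(x[:, 0], x[:, 1], c=y_pred.ravel())
scatter(centers[:, 0], centers[:, 1], c='red')
show()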
CHAPTER-3
LOGICAL DEVELOPMENT
3.1 UNDERSTANDING THE PROBLEM:
Identify the task of extracting dominant colors from an image. Recognize that this task
involves identifying clusters of similar colors in the image.
3.2 RESEARCH AND BACKGROUND:
Study existing methods and algorithms for color extraction, including clustering algorithms
like K-Means. Understand machine learning concepts related to clustering, such as
unsupervised learning, where the algorithm learns patterns in the data without explicit labels.
3.3 DEFINING REQUIREMENTS:
Specify requirements such as input image format (e.g., JPEG, PNG), desired output format
(e.g., color palette, visualization), and any constraints or preferences regarding color
granularity.
3.4 IMAGE PREPROCESSING:
Image Acquisition: Obtain the target image using libraries like OpenCV, ensuring
compatibility with subsequent processing steps.
3.5 DATA PREPARATION:
Feature Extraction: Convert the image data into a format suitable for analysis. For color
extraction, this typically involves representing each pixel as a feature vector containing color
information (e.g., RGB values).
3.6 NORMALIZATION:
Normalize the feature vectors to ensure that each feature contributes equally to the clustering
process, especially if different color channels have different ranges.
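A minimal sketch of this normalization step is shown below, assuming 8-bit RGB input. The helper names are hypothetical. Since all three RGB channels already share the 0 to 255 range, this uniform scaling mainly keeps the values in a convenient range; it matters more when channels with different ranges are mixed.

```python
import numpy as np

def normalize_pixels(pixels):
    """Scale 8-bit channel values from [0, 255] to [0.0, 1.0]."""
    return np.asarray(pixels, dtype=np.float32) / 255.0

def denormalize_colors(colors):
    """Map normalized centroids back to 8-bit integers."""
    return np.clip(np.rint(np.asarray(colors) * 255.0), 0, 255).astype(np.uint8)

pixels = np.array([[255, 128, 0], [0, 0, 0]], dtype=np.uint8)
scaled = normalize_pixels(pixels)
restored = denormalize_colors(scaled)
```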
3.7 CLUSTERING:
K-Means Clustering: Apply the K-Means algorithm to the feature vectors. Choose the
number of clusters (k) based on the desired number of dominant colors in the image.
3.8 COLOR EXTRACTION:
3.8.1 CLUSTER CENTER RETRIEVAL: Retrieve the cluster centroids obtained
from the K-Means algorithm. These centroids represent the dominant colors in the image.
3.8.2 COLOR REPRESENTATION: Convert the centroid values into color
representations (e.g., RGB or HEX) for visualization and interpretation.
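Converting a centroid to a HEX code is a short formatting step. The helper below is a sketch with a hypothetical name; it rounds the floating-point centroid values and clamps them to the 0 to 255 range before formatting.

```python
def rgb_to_hex(color):
    """Format an (R, G, B) triple as '#rrggbb', rounding and clamping each channel."""
    channels = [min(255, max(0, int(round(c)))) for c in color]
    return "#{:02x}{:02x}{:02x}".format(*channels)

hex_code = rgb_to_hex((254.6, 128.2, 0.0))
```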
3.9 ANALYSIS AND VISUALIZATION:
3.9.1 COLOR PROPORTIONS CALCULATION: Calculate the proportion of each
dominant color in the image by determining the percentage of pixels assigned to each cluster.
3.9.2 VISUALIZATION: Create visualizations of the dominant colors using libraries
like Matplotlib. This may include histograms, pie charts, or color swatches to depict the color
distribution in the image.
3.10 DATA FLOW DIAGRAM
1. Image as input.
2. Convert the raw data from the image to a raw dataframe in “long” format.
3. Use the data obtained for clustering. Set an initial seed.
4. Apply the K-Means algorithm. Obtain the colors from the clustered data.
5. Using the raw colors extracted from the image, calculate the crop stage.
6. Decision: does the image have the expected color as the starting crop color?
Outcomes: Starting Stage or Ending Stage.
CHAPTER-4
PROGRAM DESIGN
The K-means clustering implementation is a method through which a set of data points can
be partitioned into several disjoint subsets where the points in each subset are deemed to be
‘close’ to each other (according to some metric). A common metric, at least when the points
can be geometrically represented is the standard Euclidean distance function. The ‘k’ just
refers to the number of subsets desired in the final output. It turns out that this approach is
exactly what we need to divide our image into a set of colours. Implementing the
original K-means clustering is very straightforward. Suppose the image is to be reduced to K
colors: first, a palette is established by selecting K random pixels from the input image. Second,
the K-means algorithm is used to refine this palette. Third, the input image is
converted to the output image by mapping its pixels to the palette colors.
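The third step, converting the input image with a finished palette, can be sketched as a nearest-color lookup. The function name quantize, the tiny 2 x 2 image, and the two-color palette are assumptions for illustration; in practice the palette would be the cluster centers produced by K-means.

```python
import numpy as np

def quantize(image, palette):
    """Replace every pixel with its nearest palette color (Euclidean distance in RGB)."""
    pixels = image.reshape(-1, 3).astype(float)
    palette = np.asarray(palette, dtype=float)
    # Distance from every pixel to every palette color
    dists = np.linalg.norm(pixels[:, None, :] - palette[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return palette[nearest].astype(np.uint8).reshape(image.shape)

image = np.array([[[250, 10, 5], [3, 2, 240]],
                  [[200, 30, 30], [10, 10, 200]]], dtype=np.uint8)
palette = [(255, 0, 0), (0, 0, 255)]
output = quantize(image, palette)
```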
4.1 Dominant color selection from image using k-means clustering
By using k-means clustering, it is possible to detect the dominant colors in an image. After
experimenting with a range of online code snippets, excellent results were observed with the
k-means colour clustering algorithm. K-means clustering can be implemented
in Python using libraries such as scikit-learn and OpenCV.
Basically, k-means is a clustering algorithm used in machine learning applications where a
series of data points is to be categorized into ‘k’ groups. The algorithm works on a simple
distance calculation. It works as follows:
1. Randomly choose ‘k’ points from the dataset.
2. For every data point, calculate its distance from each of the ‘k’ cluster centroids and
assign it to the least distant cluster.
3. After every assignment, recalculate each cluster centroid by averaging all the
points associated with it.
4. In the next step, recompute the distance from each data point to each of the ‘k’ centroids.
5. After the distance calculation, form the clusters and calculate the mean value of
each cluster, which gives its particular colour.
6. If the data value is higher for a particular cluster, compute the mean value of that
cluster. If the mean is greater than some threshold, split the cluster in half. Continue
this process until all intensities are done, yielding a more efficient clustering of the
data points.
7. Finally, separate the colours.
Figure 7: Dominant Color Extraction with K-Means
4.2 METHODOLOGY
1) Image Acquisition: Obtain the target image for color dominance analysis. The
image should be in a format compatible with the OpenCV library (commonly JPEG or
PNG).
2) Image Resizing: Use the imutils library to resize the image to a specified height
while maintaining the aspect ratio. This step ensures uniformity in the analysis and
reduces computational complexity.
3) Flatten Image: Flatten the resized image to create a feature vector. Reshape the
image to a 2D array where each row represents a pixel and each column represents the
RGB values.
4) K-Means Clustering: Apply the K-Means clustering algorithm to the flattened
image. Choose the number of clusters based on the desired level of color granularity.
The scikit-learn library provides the KMeans class for this purpose.
5) Dominant Colors Extraction: Retrieve the cluster centers, which represent the
dominant colors in the image. These centers are the average color values of pixels
assigned to each cluster during the clustering process.
6) Color Proportions Calculation: Calculate the proportion of each dominant color in
the image by computing the percentage of pixels assigned to each cluster. This
information is crucial for understanding the color composition.
7) Visualization: Utilize Matplotlib to create visualizations of the dominant colors.
This may include subplots displaying each dominant color and a bar chart illustrating
the proportions of each color.
8) Image Overlay: Resize the original image to a desired display size, maintaining the
aspect ratio. Create a copy of the resized image and overlay it with a white rectangle
to display information about the most dominant colors.
9) Final Visualization: Combine the original image with the overlaid information,
creating a final visualization that highlights the dominant colors. Add text annotations
to convey relevant details about the color analysis.
10) Display and Save Results: Display the final visualization using OpenCV.
Optionally, save the resulting image to a file (e.g., 'output.jpg') for future reference.
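For the resizing in step 2, the target width follows from the chosen height and the original aspect ratio. The small helper below is a sketch of that arithmetic with a hypothetical name; it mirrors the usual scale-by-height calculation rather than reproducing any library's code.

```python
def resized_dimensions(width, height, target_height):
    """Return (new_width, new_height) for a fixed target height, keeping the aspect ratio."""
    scale = target_height / height
    return int(width * scale), target_height

new_size = resized_dimensions(width=800, height=400, target_height=200)
```

An 800 x 400 image resized to a height of 200 therefore becomes 400 x 200.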
CHAPTER-5
TESTING
5.1 Identification of Dominant Color Patches
The outcome of the K-means algorithm consists of a set of centroids, one for each K cluster.
Those coordinates are a list of K RGB triplets of dominant colors, which represent the main
colors found in the input digital image. The second stage seeks the correct identification of
the color chart patches from the dominant colors from the K-means algorithm. The dominant
colors extracted in the RGB color space can be easily transformed to CIE XYZ tristimulus
values and CIELAB coordinates as required for subsequent processing. With the CIELAB
coordinates the color differences are computed between each dominant color and the whole
set of color chips. It is then possible to identify the nearest patches to each dominant color
found in the image by looking for the lowest ∆E∗ab.
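The conversion chain and patch lookup described above can be sketched in plain Python. The code assumes 8-bit sRGB input with a D65 white point and uses the CIE76 form of the color difference (Euclidean distance in CIELAB); the function names and the three-entry patches dictionary are hypothetical stand-ins for a real color chart.

```python
import math

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (D65 white point)."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB -> CIE XYZ tristimulus values
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    # Normalize by the D65 reference white before applying f
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def delta_e_ab(lab1, lab2):
    """CIE76 color difference: Euclidean distance in CIELAB."""
    return math.dist(lab1, lab2)

def nearest_patch(color, patches):
    """Return the patch name with the lowest color difference to the given RGB color."""
    lab = srgb_to_lab(color)
    return min(patches, key=lambda name: delta_e_ab(lab, srgb_to_lab(patches[name])))

# Hypothetical chart with three patches
patches = {"red": (255, 0, 0), "green": (0, 255, 0), "white": (255, 255, 255)}
match = nearest_patch((240, 20, 30), patches)
```

A threshold such as the 10 CIELAB units mentioned later in this chapter could be applied to the value returned by delta_e_ab before accepting a match.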
For the identification of color patches, four different approaches are tested after varying the
input data and colorimetric criteria:
(1) using the n nearest patches found in the whole image (NN),
(2) removing the color chart from the scene and identifying the n nearest patches (RC),
(3) using a set of samples corresponding to pigment and support classes and creating a new,
joined synthetic image (JS), and
(4) using the same sample set used in (3), but computing each class separately (SS).
Note that while in the first method we do not specify any colorimetric criteria, in the remaining
cases we search for the nearest patches to each dominant color with ∆E∗ab color differences
less than 10 CIELAB units (maximum value allowed in cultural heritage guidelines).
Although the use of the nearest patches extracted from dominant colors using K-means offers
accurate colorimetric results, it does not always yield a properly characterized output image,
that is, an image with true colorimetric sense. Our experience shows that a set of minimum
fixed patches should be included in order to obtain accurate and colorimetrically sound
characterized images, including the primary additive colors and the basic grayscale chips.
24
5.2 K Number of Clusters
The first issue to consider is the proper determination of the K number of clusters. That
number can be determined by simple heuristics, for instance by testing several values of K
and taking some convenient value, or with specific data mining tools such as the Elbow or
the Silhouette graphic methods. A preliminary experiment was conducted under the worst-
case scenario; that is, using the whole image as input data which gives some sort of upper
bound in terms of computation time. The results about the running time required for
computing K-means clustering are displayed in Table 1. According to those results, K = 24
seems to be an acceptable number of clusters.
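The Elbow heuristic mentioned above can be sketched with scikit-learn by recording the inertia (within-cluster sum of squares) for several values of K. The synthetic three-color pixel data, the range of K values, and the fixed seeds are assumptions for illustration and are unrelated to the experiment reported in Table 1.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic pixel data (an assumption): three color groups with small noise
rng = np.random.default_rng(0)
base = np.array([[200, 30, 30], [30, 200, 30], [30, 30, 200]], dtype=float)
pixels = np.vstack([c + rng.normal(0, 5, size=(200, 3)) for c in base])

# Inertia for each candidate K
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    inertias[k] = model.inertia_

# Improvement gained by adding one more cluster; the elbow is where it collapses
drops = {k: inertias[k - 1] - inertias[k] for k in range(2, 7)}
```

For this data the improvement is large up to K = 3 and small afterwards, which is the pattern the Elbow plot is meant to reveal.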
Evaluation Metrics
Prediction Score:
CHAPTER-6
CONCLUSION
The proposed methodology provides a systematic approach to analyzing dominant colors in
images using K-Means clustering. Each step contributes to the overall goal of understanding
the color composition of an image and visually presenting the results. Parameters such as
the number of clusters or the visualization style can be adjusted to suit specific
requirements and preferences. This project successfully trained a model to predict
dominant color clusters in an image using K-means clustering and to provide insights
into color proportions. Overall, the project aims to create a comprehensive framework for
color dominance analysis in images, contributing to a broader understanding of image
processing and unsupervised learning techniques.
CHAPTER-7
REFERENCES
1. Sesana, E.; Gagnon, A.S.; Bertolin, C.; Hughes, J. Adapting cultural heritage to climate
change risks: Perspectives of cultural heritage experts in Europe. Swiss J. Geosci. 2018, 8,
305.
2. Cerra, D.; Plank, S.; Lysandrou, V.; Tian, J. Cultural Heritage Sites in Danger—Towards
Automatic Damage Detection from Space. Remote Sens. 2016, 8, 781.
3. Barrile, V.; Fotia, A.; Bilotta, G.; De Carlo, D. Integration of geomatics methodologies and
creation of a cultural heritage app using augmented reality. Virtual Archaeol. Rev. 2019, 10,
40–51.
4. Lerma, J.L.; Navarro, S.; Cabrelles, M.; Villaverde, V. Terrestrial laser scanning and close
range photogrammetry for 3D archaeological documentation: the Upper Palaeolithic Cave of
Parpalló as a case study. J. Archaeol. Sci. 2010, 37, 499–507.
5. Previtali, M.; Valente, R. Archaeological documentation and data sharing: digital
surveying and open data approach applied to archaeological fieldworks. Virtual Archaeol.
Rev. 2019, 10, 17–27.
6. Pan, Y.; Dong, Y.; Wang, D.; Chen, A.; Ye, Z. Three-dimensional reconstruction of
structural surface model of heritage bridges using UAV-based photogrammetric point clouds.
Remote Sens. 2019, 10, 1204.
7. Tang, P.; Chen, F.; Zhu, X.; Zhou, W. Monitoring cultural heritage sites with advanced
multi-temporal InSAR technique: The case study of the Summer Palace. Remote Sens.
2016, 8, 432.
8. Padín, J.; Martín, A.; Anquela, A.B. Archaeological microgravimetric prospection inside
don church (Valencia, Spain). J. Archaeol. Sci. 2012, 39, 547–554.
9. Rogerio-Candelera, M.A. Digital image analysis based study, recording, and protection of
painted rock art. Some Iberian experiences. Digit. Appl. Archaeol. Cult. Herit. 2015, 2, 68–78.
10. Federica, M.; Di Giulio, R.; Piaia, E.; Medici, M.; Ferrari, F. Enhancing Heritage fruition
through 3D semantic modelling and digital tools: the INCEPTION project. IOP Conf. Ser.
Mater. Sci. Eng. 2018, 364, 012089.
11. Cabrelles, M.; Blanco-Pons, S.; Carrión-Ruiz, B.; Lerma, J.L. From Multispectral 3D
Recording and Documentation to Development of Mobile Apps for Dissemination of Cultural
Heritage. In Cyber-Archaeology and Grand Narratives: Digital Technology and Deep-Time
Perspectives on Culture Change in the Middle East; Levy, T.E., Jones, I.W.N., Eds.; Springer:
Cham, Switzerland, 2018; pp. 67–90.
12. Ippoliti, E.; Calvano, M. Enhancing the Cultural Heritage Between Visual Technologies
and Virtual Restoration. In Digital Curation: Breakthroughs in Research and Practice; IGI
Global: Hershey, PA, USA, 2019; pp. 309–348.
13. del Hoyo-Meléndez, J.M.; Carrión-Ruiz, B.; Riutort-Mayol, G.; Lerma, J.L. Lightfastness
assessment of Levantine rock art by means of microfading spectrometry. Color Res. Appl.
2019, 44, 547–555.
14. Korytkowski, P.; Olejnik-Krugly, A. Precise capture of colors in cultural heritage
digitization. Color Res. Appl. 2017, 42, 333–336.
15. Boochs, F.; Bentkowska-Kafel, A.; Degrigny, C.; Karaszewski, M.; Karmacharya, A.;
Kato, Z.; Picollo, M.; Sitnik, R.; Trémeau, A.; Tsiafaki, D. Colour and Space in Cultural
Heritage: Key Questions in 3D Optical Documentation of Material Culture for Conservation,
Study and Preservation. In Lecture Notes in Computer Science, Proceedings of the Digital
Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection.
EuroMed 2014, Limassol, Cyprus, 3–8 November 2014; Springer: Berlin, Germany; Volume
8740, pp. 11–24.
CHAPTER-8
APPENDIX
8.1 SOURCE CODE
app.py
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; select it before importing pyplot
from flask import Flask, render_template, request, redirect, url_for
import os
import cv2
import numpy as np
from sklearn.cluster import KMeans
import imutils
import matplotlib.pyplot as plt

app = Flask(__name__)
UPLOAD_FOLDER = 'most-dominant-color/static/uploads'
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER


def process_image(img_path, clusters=5):
    # Read the image (OpenCV loads it in BGR order) and keep a copy of the original
    img = cv2.imread(img_path)
    org_img = img.copy()
    # Downscale to speed up clustering, then flatten to one row per pixel
    img = imutils.resize(img, height=200)
    flat_img = np.reshape(img, (-1, 3))
    # Cluster the pixels; the cluster centers are the dominant colors
    kmeans = KMeans(n_clusters=clusters, random_state=0)
    kmeans.fit(flat_img)
    dominant_colors = np.array(kmeans.cluster_centers_, dtype='uint')
    percentages = (np.unique(kmeans.labels_, return_counts=True)[1]) / flat_img.sh…
most_dominantcolor.py
import cv2
import numpy as np
import imutils
from sklearn.cluster import KMeans

clusters = 5  # try changing it
img = cv2.imread(r'most-dominant-color/static/uploads/images.jpeg')
org_img = img.copy()
print('Org image shape --> ', img.shape)
# rows = 200
# cols = int((img.shape[0]/img.shape[1])*rows)
img = imutils.resize(img, height=200)
# img = cv2.resize(img,dsize=(rows,cols),interpolation=cv2.INTER_LINEAR)
print('After resizing shape --> ', img.shape)
flat_img = np.reshape(img, (-1, 3))
print('After Flattening shape --> ', flat_img.shape)
kmeans = KMeans(n_clusters=clusters, random_state=0)
kmeans.fit(flat_img)  # the model must be fitted before cluster_centers_ is available
dominant_colors = np.array(kmeans.cluster_centers_, dtype='uint')
percentages = (np.uniqu…
import cv2
import numpy as np
import imutils
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Number of clusters
clusters = 5
# Read the image
img = cv2.imread(r'most-dominant-color/static/uploads/images.jpeg')
# Resize the image
img = imutils.resize(img, height=200)
# Flatten the image
flat_img = np.reshape(img, (-1, 3))
# Apply KMeans clustering
kmeans = KMeans(n_clusters=clusters, random_state=0)
kmeans.fit(flat_img)
# Get labels and cluster centers
labels = kmeans.labels_
cluster_centers = kmeans.cluster_centers_
# Plot scatter plot
plt.figure(figsize=(8, 6))
plt.scatter(flat_img[:, 0], flat_img[:, 1], c=labels, cmap='viridis')
plt.scatter(cluster_centers[:, 0], cluster_centers[:, 1], c='red', marker='x', s=100)  # Plot cluster…
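The listings above are cut off at the step that turns cluster labels into color proportions. The following is a separate, minimal sketch of that idea and not the project's own code: the function name and the sample arrays are hypothetical, and the (proportion, color) ordering is chosen only to mirror the `{% for p, c in p_and_c %}` loop in index.html.

```python
import numpy as np


def color_proportions(labels, centers):
    """Pair each cluster centre with its share of pixels, largest share first."""
    counts = np.bincount(labels, minlength=len(centers))
    shares = counts / counts.sum()
    order = np.argsort(shares)[::-1]
    # Centres are rounded to integer channel values; the channel order is whatever
    # the clustered pixels used (cv2.imread yields BGR, not RGB).
    return [(float(shares[i]), tuple(int(v) for v in np.rint(centers[i]))) for i in order]


# Hypothetical stand-ins for kmeans.labels_ and kmeans.cluster_centers_.
labels = np.array([0, 0, 0, 1, 1, 2, 0, 1, 0, 0])
centers = np.array([[12.4, 200.6, 33.0], [250.0, 250.0, 250.0], [0.2, 0.4, 0.1]])
p_and_c = color_proportions(labels, centers)
for share, color in p_and_c:
    print(f"Color: {color} - Percentage: {share * 100:.2f}%")
```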
index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Dominant Color Extractor </title>
<link rel="stylesheet"
href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css"
integrity="sha384-JcKb8q3iqJ61gNV9KGb8thSsNjpSL0n8PARn9HuZOnIxN0hoP+VmmDGMN5t9UJ0Z"
crossorigin="anonymous">
<link rel="stylesheet" href="../static/stylesheets/style.css">
<link rel="stylesheet" href="../static/stylesheets/fullpage.min.css">
<link
href="https://fonts.googleapis.com/css2?family=Lato&family=Open+Sans+Condensed:wght@700&display=swap"
rel="stylesheet">
<style>
body {
width: 100vw;
height: 100vh;
}
*{
box-shadow: none !important;
}
h1 {
font-family: 'Open Sans Condensed', sans-serif;
font-weight: 700;
}
h4 {
font-family: 'Lato', sans-serif;
}
h1#title1 {
font-size: 75px;
}
h4#subtitle1 {
font-size: 30px;
}
h1#title2 {
font-size: 55px;
margin-bottom: 20px;
color: white;
}
h4#subtitle2 {
font-size: 22px;
text-align: justify;
color: white;
}
.image-container {
width: 300px;
min-height: 100px;
border: 2px solid #dddddd;
margin-bottom: 15px;
display: flex;
align-items: center;
justify-content: center;
font-weight: bold;
color: #cccccc;
}
.image-preview__image {
display: none;
width: 100%;
}
#linkedin,
#github,
#submit {
margin-top: 10px;
background-color: transparent;
border-radius: 15px;
}
#linkedin:hover,
#github:hover,
#submit:hover {
color: white;
background-color: black;
}
.interactiveDiv {
background-color: rgba(255, 255, 255, 0.452);
padding: 20px;
border-radius: 15px;
}
@media only screen and (max-width: 767px) {

.interactiveDiv {
background-color: rgba(255, 255, 255, 0);
}
#caption {
background-color: white;
}
}
#textbox1,
#textbox2,
#textbox3 {
animation: fadeInAnimation ease 3s;
animation-fill-mode: forwards;
animation-play-state: paused;
}
@keyframes fadeInAnimation {
0% {
opacity: 0;
}
100% {
opacity: 1;
}
}
</style>
</head>
<body>
<div id="fullpage">
<div class="section">
<div class="container" id="textbox1">
<div class="row">
<div class="col-lg-6">
<h1 id="title1">Dominant Color <br> Extraction</h1>
<h4 id="subtitle1">For Agriculture Purpose</h4>
</div>
<div class="d-none d-lg-block col-lg-6">
<img src="../static/images/s1.svg" width="450" class="img-fluid">
</div>
</div>
</div>
</div>
<div class="row">
<center>
<img src="../static/images/s2.svg" width="600" style="padding-top: 50px"
class="img-fluid">
</center>
</div>
<div class="col-lg-5 col-12">

{% if starting_stage %}
<p>Starting stage</p>
{% endif %}
{% if ending_stage %}
<p>Ending stage</p>
{% endif %}
</div>
</div>
<div style="margin-left: 700px; margin-bottom: 300px;">
<h1 style="color:aliceblue">Upload an Image</h1>
<form method="POST" enctype="multipart/form-data">
<input type="file" name="file" id="customFile" accept="image/*">
<button type="submit" >Upload</button>
</form>
</div>
</div>
</div>
</div>
</div>
<center>
<div class="row">
<div class="col-md-5 col-12 {{'offset-md-3 pl-md-5' if not output}}">
<div class="ml-md-5 interactiveDiv">
<h5 class="pb-1">Interactive Model</h5>
<div class="image-preview mb-3" id="imagePreview">
<img alt="Image Preview" class="image-preview__image img-fluid"
style="width: auto; max-height: 60vh;">
</div>
</div>
<div class="col-md-6 col-12">

{% if p_and_c %}
<h2 style="color:rgb(0, 4, 75);font-weight:bold">Most Dominant Colors in
the Image</h2>
<h3 style="color:rgb(0, 4, 75)">Proportions of colors in the image</h3>
<ul>
{% for p, c in p_and_c %}
<p style="color:rgb(0, 4, 75)">Color: {{ c }} - Percentage: {{ '%.2f' % (p * 100) }}%</p>
{% endfor %}
</ul>
<img src='../static/uploads/color_proportions.png' alt="color proportion"
height="200px" style=" margin-left: 280px; margin-top:0;">
{% endif %}
</div>
</center>
</div>
<div class="row">
<div class="col-lg-6 col-12 mb-4 mb-md-0">
<h1>ABOUT</h1>
<h4 style="text-align: justify;">This project is made by Neelanjan Manna, a
fellow Machine
Learning and Data Science enthusiast currently enrolled in Bachelor of
Technology course at
KIIT University.</h4>
<a href="https://linkedin.com/in/neelanjan00/" target="blank"><button
class="btn" id="linkedin"
style="border: 1px solid black;">My Linkedin</button></a>
<a href="https://github.com/neelanjan00/" target="blank"><button class="btn
ml-3" id="github"
style="border: 1px solid black;">My Github</button></a>
<h1 class="mt-5">REFERENCES</h1>
<h4>
<a href="https://ieeexplore.ieee.org/abstract/document/8953774/"
style="color: black;">1.
ManTra-Net: Manipulation Tracing Network for Detection and
Localization of Image
Forgeries With Anomalous Features</a><br><br>
<a href="https://towardsdatascience.com/image-forgery-detection-2ee6f1a65442"
style="color: black;">2. Image forgery detection, Using the power of

CNN's to detect
image manipulation</a><br>
</h4>
</div>
<img src="../static/images/s3.svg" width="500" class="pt-5 img-fluid">
</div>
</div>
</div>
</div>
</div>
<script src="https://code.jquery.com/jquery-3.4.1.slim.min.js"
integrity="sha384-J6qa4849blE2+poT4WnyKhv5vZF5SrPo0iEjwBvKU7imGFAV0wwj1yYfoRSJoZ+n"
crossorigin="anonymous"></script>
<script src="../static/scripts/fullpage.min.js"></script>
<script>
var fullpage_api = new fullpage('#fullpage', {
autoScrolling: true,
scrollHorizontally: true,
verticallyCentered: true,
responsiveWidth: 767,
anchors: ['section1', 'section2', 'section3', 'section4'],
afterLoad: function (origin, destination, direction) {

if (destination.index === 0)
document.getElementById('textbox1').style.animationPlayState = 'running';
}
});
$(".custom-file-input").on("change", function () {
var fileName = $(this).val().split("\\").pop();
$(this).siblings(".custom-file-label").addClass("selected").html(fileName);
});
const inpFile = document.getElementById('customFile');

const previewContainer = document.getElementById('imagePreview');
const previewImage = previewContainer.querySelector('.image-preview__image');
inpFile.addEventListener('change', function () {
const file = this.files[0];
if (file) {
const reader = new FileReader();
previewImage.style.display = "block";
reader.addEventListener("load", function () {
previewImage.setAttribute("src", this.result);
});
reader.readAsDataURL(file);
}
})
</script>
{% if output %}
<script>
fullpage_api.silentMoveTo(3);
</script>
{% endif %}
</body>
</html>
8.2 SCREEN SHOTS
Input image
