Group 5 Project
Fashion e-commerce
Soumya (22M2411)
Using Clickstream data for targeted approach when acquiring new customers
By utilising clickstream data and adopting a targeted approach, e-commerce businesses can attract and retain
high-value customers, leading to higher revenue and profitability. Personalised advertising scenarios can be a
useful tool for boosting e-commerce sales and reducing advertising costs.
Data Requirements: Customer Data: unique customer IDs or user identifiers to track
individual customer behaviour
The Specific Item ID link is connected to the second file Product DB (PDB),
which contains detailed information about the products, such as product name,
price, current discount, etc. By analyzing the product-related data, the proposed
strategy can recommend personalized advertising scenarios that are tailored to
each customer's preferences and needs.
The CM index is the total weighted sum of the visited links, where the weights
of possible links are denoted as w1, w2, w3, and w4. The CM index takes into
account all types of adjustments and parameter settings, such as defining the
minimum time spent on the webpage to ensure that the customer’s awareness
of the presented information is sufficient. Only the customers with a sufficient
value of the index CM can be assigned to the advertising campaign for showing
ads to them.
The session starts with the customer visiting a webpage showing a group of
items, followed by clicking on a specific product item and transferring it to
the basket. The customer then proceeds to preview another product before
making a purchase of a specific item. At each step of the browsing session, the
CM index is recalculated and checked against the limit value L.
Customers with a CM index above the limit L are assigned to the advertising
campaign, and their CM index is set to the initial value for starting new
calculation. If the CM index does not reach the limit value L, it is updated
dynamically based on the customer's activities. The initial CM index can be set
to different values depending on the customer's merits. The default initial value
is 0, but it can be set to a larger value for customers who have registered
with the e-shop, agreed to receive promotional information, or engaged in other
interactions that justify starting the dynamic calculation of CM from a
higher initial value.
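The CM index update described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mapping of the four weights w1 to w4 onto link types, the weight values, the minimum dwell time and the limit L are all assumed for the example, since the text names these parameters without fixing their values.

```python
from dataclasses import dataclass, field

# Hypothetical link types and weights (w1..w4); the source does not give values.
WEIGHTS = {
    "group_page": 1.0,  # w1: page showing a group of items
    "item_page": 2.0,   # w2: page of a specific product item
    "basket": 3.0,      # w3: item transferred to the basket
    "purchase": 4.0,    # w4: purchase of an item
}
MIN_DWELL_SECONDS = 5.0  # assumed minimum time on a page for the visit to count
LIMIT_L = 8.0            # assumed limit value L


@dataclass
class CustomerState:
    initial_cm: float = 0.0  # may be raised for registered or opted-in customers
    assigned_to_campaign: bool = False
    cm: float = field(init=False, default=0.0)

    def __post_init__(self):
        # The running index starts from the customer's initial value.
        self.cm = self.initial_cm


def record_visit(state, link_type, dwell_seconds):
    """Update the CM index after one browsing step.

    Returns True when the index reaches the limit L, in which case the
    customer is assigned to the campaign and the index is reset.
    """
    if dwell_seconds < MIN_DWELL_SECONDS:
        return False  # too short to assume the customer noticed the content
    state.cm += WEIGHTS[link_type]
    if state.cm >= LIMIT_L:
        state.assigned_to_campaign = True
        state.cm = state.initial_cm  # restart the calculation
        return True
    return False


# Example session: group page, item page, basket, short preview, purchase.
customer = CustomerState()
for link, seconds in [("group_page", 10), ("item_page", 20), ("basket", 8),
                      ("item_page", 2), ("purchase", 30)]:
    record_visit(customer, link, seconds)
```

Passing a non-zero `initial_cm` models the head start suggested for registered customers, who then reach the limit after fewer interactions.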
Identify & personalize ad to target lookalike customers
Data Model
Techniques to be incorporated:
● Feature Engineering to create new features from available data points (average order value, product
category preference)
● Collaborative filtering for Modeling
A comprehensive data model forms the bedrock of a successful CF system, capturing information from diverse
sources to paint a detailed picture of individual preferences and trends.
Customer Data:
Product Data:
● Product Attributes: Category, brand, style, color, size, material, price, and seasonality.
● Product Descriptions and Reviews: Enrich data with details and customer sentiment.
● Product Images and Videos: Showcase product details and visual characteristics.
Data Sources:
By harnessing this diverse data set, the system creates a user-item matrix, where rows represent users and
columns represent items. Each cell reflects the user's interaction with the corresponding item, such as
purchase history, ratings, or browsing behavior.
Among the various data-driven and knowledge-driven techniques suitable for e-commerce personalization,
collaborative filtering (CF) is a powerful and versatile approach.
CF leverages the fundamental premise that users with similar preferences tend to purchase similar items. It
operates in three core steps:
● Building the User-Item Matrix: As mentioned earlier, this matrix forms the foundation of CF, capturing
user interactions with various items.
● Identifying Similar Users: Employing similarity measures like cosine similarity, Pearson correlation
coefficient, or Jaccard similarity, the system identifies users who exhibit similar tastes and preferences
based on their interaction patterns within the user-item matrix.
● Recommendation Generation: The system analyzes the purchase history or interactions of similar
users and recommends items that these users have interacted with but which the target user has not
yet encountered.
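The three steps above can be illustrated with a small user-based CF sketch. The user names, items and interaction values are invented for the example, and cosine similarity is just one of the similarity measures listed; a production system would use a sparse matrix and a dedicated library rather than plain Python lists.

```python
from math import sqrt

# Step 1: toy user-item matrix (rows = users, columns = items, 1 = interacted).
# All names and values here are made up for illustration.
ITEMS = ["dress", "jeans", "jacket", "scarf", "boots"]
MATRIX = {
    "alice": [1, 1, 0, 0, 0],
    "bob":   [1, 1, 1, 0, 0],
    "carol": [0, 0, 0, 1, 1],
}


def cosine(u, v):
    """Cosine similarity between two interaction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(target, matrix, items, top_n=2):
    """Recommend items that similar users interacted with but the target has not."""
    target_row = matrix[target]
    scores = [0.0] * len(items)
    for user, row in matrix.items():
        if user == target:
            continue
        similarity = cosine(target_row, row)  # Step 2: find similar users
        for i, value in enumerate(row):
            if value and not target_row[i]:
                scores[i] += similarity * value  # Step 3: score unseen items
    ranked = sorted((i for i in range(len(items)) if scores[i] > 0),
                    key=lambda i: scores[i], reverse=True)
    return [items[i] for i in ranked[:top_n]]


suggestions = recommend("alice", MATRIX, ITEMS)
```

In this toy data, "bob" shares two items with "alice", so his remaining item is the only candidate with a positive score; "carol" has no overlap with anyone and therefore receives no recommendations.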
Advantages of CF:
● Scalability: CF can handle large data sets effectively, making it suitable for e-commerce platforms with
a vast user base.
● Cold Start Problem Mitigation: For new users with a limited purchase history, CF can leverage
data from similar users to generate relevant recommendations. Users with no interactions at all
still need a fallback, such as recommending popular items.
● Interpretability: Understanding the logic behind CF recommendations allows for easier troubleshooting
and improvement.
While CF offers a strong foundation, it can be further enhanced by incorporating other techniques like content-
based filtering, which recommends items based on the attributes of previously purchased items, or hybrid
approaches that combine CF with other techniques for a more comprehensive recommendation strategy.
Implementation
Building a successful CF system involves a well-defined process, encompassing several key stages:
● Data Preprocessing: Ensure data quality by cleaning missing values, handling inconsistencies, and
normalizing data if necessary.
● Model Selection: Choose an appropriate similarity measure and recommendation generation algorithm
based on factors like data characteristics, desired outcomes, and computational resources.
● Model Training: Train the chosen model using the prepared user-item interaction data. This training
process allows the model to learn the underlying relationships between users and items.
● Evaluation: Measure the model's performance using metrics like precision, recall, and NDCG
(Normalized Discounted Cumulative Gain). These metrics assess the accuracy and relevance of the
recommendations generated by the model.
● Continuous Refinement: Regularly update the model with new data and experiment with different
hyperparameters and algorithms to optimize its performance and tailor it to the specific needs of the
platform.
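The evaluation metrics named above can be computed as in the following sketch, which assumes binary relevance (an item is either relevant to the user or not) and uses made-up item IDs. Graded relevance or per-user averaging would need small extensions.

```python
from math import log2


def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Share of all relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k):
    """Normalized Discounted Cumulative Gain with binary relevance."""
    dcg = sum(1 / log2(rank + 2)
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1 / log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Hypothetical ranked recommendations and held-out relevant items for one user.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
metrics = {
    "precision@3": precision_at_k(recommended, relevant, 3),
    "recall@3": recall_at_k(recommended, relevant, 3),
    "ndcg@3": ndcg_at_k(recommended, relevant, 3),
}
```

Precision and recall ignore the order of the top-k list, while NDCG rewards placing relevant items nearer the top, which is why the three are usually reported together.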
Anticipated Outcomes
A well-implemented collaborative filtering recommender system lets e-commerce platforms create highly
personalized experiences for their customers. This translates into numerous benefits:
● Effortless Product Discovery: CF uncovers items that align with their unique preferences, reducing
the time and effort spent browsing through vast collections.
● Elevated Fashion Sense: Exposure to a wider range of products and styles, handpicked based on
their preferences, expands their sartorial horizons and encourages experimentation.
● Streamlined Shopping Experience: Recommendations tailored to their interests and preferences
create a smoother and more enjoyable shopping experience.
● Enhanced Confidence in Purchases: Reviews and recommendations from users with similar tastes
reduce uncertainty and boost confidence in purchase decisions, particularly for new or unfamiliar items.
Accurate recommendations not only boost sales but also drive up the average order value as customers are
more likely to add multiple suggested items to their carts. Personalization strengthens customer relationships
by fostering loyalty, reducing churn, and stimulating positive word-of-mouth, contributing to long-term growth.
Moreover, analyzing successful recommendations provides valuable insights into customer preferences,
market trends, and growth opportunities. Leveraging collaborative filtering for recommendation systems
enables targeted marketing campaigns, optimizing marketing spend by efficiently reaching the most
relevant audiences with customized offers.
Image recognition system:
● Data Required:
Product Images: High-quality images of your fashion items from various angles (front, back, sides) with
consistent backgrounds and lighting.
Product Labels: Information associated with each image, such as product ID, name, brand, category (e.g.,
dress, shirt), material, color, size, etc.
User Interaction Data: If possible, track user interactions with product images (clicks, views, purchases) to
understand user preferences and refine recommendations.
● Data Sources
Internal Data: Your product images and associated information stored in your product management system.
External Data: Collaborate with fashion photographers for diverse, high-quality images or consider licensing
image datasets specific to fashion.
● Data Preprocessing:
Resizing and Normalization: Uniform image size across the dataset for consistent processing.
Data Cleaning: Remove blurry, low-resolution images, or images with incorrect labels.
Data Augmentation: Artificially create variations of existing images (e.g., rotations, flips) to increase training
data size and improve model robustness.
● Modelling:
Convolutional Neural Networks (CNNs): The standard choice for image-related tasks due to their ability to
learn hierarchical visual features. Popular CNN models for image recognition include:
● VGGNet
● ResNet
Pretrained Models and Transfer Learning: Start with an existing, pre-trained model (like those listed above)
trained on large image datasets (like ImageNet). Refine it on your specific fashion dataset. This often leads to
faster training and better results than training from scratch.
2. Feature Extraction
The CNN acts as a feature extractor: Input an image; the lower layers of the CNN learn basic features like
edges and color gradients. Higher layers learn more complex combinations of features, representing different
parts or styles of clothing.
Output: The output from the CNN is a set of features (sometimes called an "embedding") that represent the
image in a compact, numerical form.
3. Classification
Model Type: Use a classifier like a Support Vector Machine (SVM), Random Forest, or a simple fully
connected neural network layer.
Training: This classifier is trained on the extracted features and their corresponding labels (product categories,
attributes). The goal is to learn to map these features to the correct labels.
Prediction: When given a new image, the same CNN extracts its features, and the trained classifier predicts its
product category or attributes.
4. Similarity Search & Recommendation Engine
Feature Space: The features extracted by the CNN act as a "fashion search index": items that look similar
are placed close together in a multidimensional space.
Nearest Neighbors: When a user interacts with an image, find the closest images (nearest neighbors) in this
feature space. These closest images are the system's findings.
Example: Recommending shirts in our catalog that are similar to an input image
Introduction
We propose a personalized fashion recommender system that generates recommendations for the user
based on a given input. Unlike conventional systems that rely on the user's previous purchases and
history, this project uses an image of a product, supplied by the user, to generate recommendations,
since people often see something they are interested in and then look for similar products. We use
neural networks to process the images from the Fashion Product Images Dataset and a
nearest-neighbour-backed recommender to generate the final recommendations.
Methodology
Data Preprocessing
The initial stage involves preparing the fashion images to be compatible with the ResNet50 model. This
includes resizing, normalization, and augmentation techniques to enhance the diversity and quality of
the dataset for better training outcomes.
Model Training
The core of the system relies on transfer learning from ResNet50, a proven model in image recognition
tasks. The project enhances the model by adding custom layers tailored to the specifics of fashion item
recognition, enabling fine-tuning to the project's unique requirements.
Recommendation Generation
For generating recommendations, the system extracts features from the user-input fashion image using
the trained CNN. These features represent high-level attributes of the clothing item, such as texture,
shape, and color. The Nearest Neighbor algorithm then searches the inventory database for items with
similar features. This search can be accelerated using approximation methods or indexing structures to
ensure scalability. The most similar items are presented to the user as recommendations, providing a
personalized shopping experience.
As shown in the figure, the neural networks are trained first; an inventory is then selected for generating
recommendations, and a database is created for the items in the inventory. The nearest neighbours algorithm is
used to find the most relevant products based on the input image, and recommendations are generated.
Once the data is pre-processed, the neural networks are trained, utilizing transfer learning from ResNet50.
Additional layers are added at the end of the network, replacing the final layers of ResNet50, in order to
fine-tune the model for the task at hand. The figure shows the ResNet50 architecture.
Due to constraints in time and resources, the images from the Kaggle Fashion Product Images Dataset are used
for the experiment. The inventory is then run through the neural networks to classify items and generate
embeddings, and the output is used to generate recommendations. The figure shows a sample set of inventory data.
To generate recommendations, our proposed approach uses Sklearn Nearest neighbours. This allows us to
find the nearest neighbours for the given input image. The similarity measure used in this project is the
cosine similarity measure. The top 5 recommendations are extracted from the database and their images are
displayed.
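The retrieval step can be illustrated with the sketch below. It is not the project's code: the project uses Sklearn's nearest-neighbours search over ResNet50 embeddings, whereas this example uses a plain-Python brute-force search over invented three-dimensional vectors so that it stays self-contained. The item IDs and embedding values are hypothetical.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(query, inventory, k=5):
    """Return the IDs of the k inventory items most similar to the query embedding."""
    scored = [(cosine_similarity(query, embedding), item_id)
              for item_id, embedding in inventory.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Made-up 3-d embeddings; real ResNet50 features have far more dimensions.
INVENTORY = {
    "shirt_a": [1.0, 0.0, 0.0],
    "shirt_b": [0.9, 0.2, 0.0],
    "jacket_a": [0.5, 0.5, 0.0],
    "dress_a": [0.0, 1.0, 0.0],
    "dress_b": [0.1, 0.9, 0.2],
    "shoe_a": [0.0, 0.0, 1.0],
}
QUERY = [1.0, 0.1, 0.0]  # embedding of the user's input image (hypothetical)

recommendations = top_k_similar(QUERY, INVENTORY)  # top 5 by default
```

Brute-force search compares the query against every item, which is fine for a small inventory; for large catalogues, approximate or index-based search keeps lookups fast.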
KBS for VENDOR/SUPPLIER MANAGEMENT
Introduction: We will be utilizing a rule-based expert system with the following functionalities
● Product ID
● Vendor ID
● Vendor historic transactions
● Product Name: Ensure consistency and adherence to naming conventions.
● Attributes (Size, Color, Material): Validate and standardize attribute values.
● Category: Ensure accurate categorization based on predefined categories and subcategories.
● Brand: Validate and standardize brand names.
Sources: Vendor Input Forms, API Integrations (integrate with vendor systems through APIs to validate
and standardize data during data exchange), External Data Providers
Automated Categorization:
● Category IDs: Assign products to appropriate categories and subcategories based on predefined rules.
● Attribute details (e.g., Style, Type): Use attributes to further refine categorization.
Sources: Fashion Taxonomy Databases (for categories and subcategories, as well as attributes),
ML model for categorization.
Pricing Rules:
● Price: Apply pricing rules based on factors such as cost, competitor prices, or market demand.
● Discounts, Promotions: Implement rules for applying discounts and promotions dynamically.
Sources: Competitor Pricing Data, Marketplace Data, Cost Data (from vendors)
1. Knowledge Base:
b. Automated Categorization:
● Category Rules:
● Develop rules to categorize products based on attributes.
● Utilize historical data and fashion taxonomies to inform categorization.
c. Pricing Rules:
● Dynamic Pricing Rules:
● Formulate rules for dynamic pricing based on market trends, competition, and cost.
● Incorporate strategies for discounts, promotions, and pricing adjustments.
2. Inference Engine:
● Decision-Making Logic:
● Implement the logic that interprets rules and makes decisions.
● Use if-then statements or rule-based engines for processing.
7. Feedback Loop:
● User Feedback Mechanism:
● Establish a system for vendors and administrators to provide feedback on the effectiveness of
rules.
● Use feedback to refine and improve the knowledge base continuously.
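The knowledge base and inference engine outlined above can be sketched as a list of if-then rules applied to a product record. Every rule, field name, category label and markup factor below is hypothetical and chosen only to illustrate the mechanism; the actual rules would come from the vendor data, taxonomies and pricing sources listed earlier.

```python
# Knowledge base: each rule is (name, condition, action) operating on a product dict.
# All rules and thresholds are illustrative assumptions, not the project's real rules.
RULES = [
    ("standardize_brand",
     lambda p: isinstance(p.get("brand"), str),
     lambda p: p.update(brand=p["brand"].strip().title())),
    ("categorize_jeans",
     lambda p: p.get("material") == "denim" and p.get("type") == "trousers",
     lambda p: p.update(category="Bottoms > Jeans")),
    ("cost_plus_price",
     lambda p: "price" not in p and "cost" in p,
     lambda p: p.update(price=round(p["cost"] * 1.5, 2))),  # assumed 50% markup
    ("match_competitor",
     lambda p: "competitor_price" in p
     and p.get("price", 0) > p["competitor_price"] >= p.get("cost", 0),
     lambda p: p.update(price=p["competitor_price"])),
]


def run_rules(product, rules=RULES):
    """Inference engine: apply each rule whose condition holds, in order.

    Returns the updated product and the names of the rules that fired,
    which can be logged to support the feedback loop.
    """
    product = dict(product)  # work on a copy, leave the vendor input untouched
    fired = []
    for name, condition, action in rules:
        if condition(product):
            action(product)
            fired.append(name)
    return product, fired


vendor_input = {
    "brand": " acme apparel ",
    "material": "denim",
    "type": "trousers",
    "cost": 20.0,
    "competitor_price": 28.0,
}
result, fired_rules = run_rules(vendor_input)
```

Keeping the rules as data, separate from the engine, means vendors' and administrators' feedback can be applied by editing the rule list without touching the decision-making logic.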
Benefits:
Customer Profile: A customer who follows fashion influencers and enjoys discovering new trends. They value
quality and are willing to pay a premium for unique pieces, but are budget-conscious and seek good deals.
Data Types:
● Categorical: Product categories (dresses, tops, etc.), brands, colours, sizes, discount codes,
influencer names, weather conditions.
● Numerical: Prices, purchase frequency, time spent browsing, number of clicks, social media
engagement metrics, income level.
● Sequential: Product browsing history (sequence of viewed items), purchase history (chronological
order of purchases).
AI/ML Techniques:
Model Selection:
Model Training:
● Data Preprocessing: Clean, normalise, and engineer features from various data sources.
● Model Training and Evaluation: Train and evaluate different AI/ML models based on chosen
techniques and data types.
● API Development: Develop an API to connect the chosen model to the e-commerce platform for real-
time price adjustments.
● Integration and Deployment: Integrate the API with the platform and deploy the dynamic pricing
system for individual customers.
● Monitoring and Refinement: Continuously monitor model performance, customer behavior, and
market trends. Retrain and refine the model based on new data and insights.
The expected outcomes for this implementation are as follows:
● Increased conversion rate: Personalized pricing nudges the consumer towards purchasing items they
are interested in at prices they find acceptable.
● Improved customer satisfaction: The consumer feels valued and appreciates the tailored shopping
experience that caters to their preferences and budget.
● Optimized pricing strategy: The company maximizes revenue by offering consumers the right price
without compromising margins.
● Enhanced customer segmentation: AI/ML insights help the company create targeted marketing
campaigns and promotions for different customer segments.
● Data-driven decision making: The company can make informed decisions about product pricing,
inventory management, and marketing strategies based on real-time customer data and AI/ML
analysis.
For Consumers:
● Personalized shopping experience: The consumer finds products they love at reasonable prices, leading
to a more enjoyable shopping experience.
● Transparency and choice: The company can explain the factors influencing dynamic pricing, allowing
consumers to make informed purchase decisions.
Data Requirements: Text data reviewing products and product-specific service
Data Source: Customer reviews, order feedback inputs, return reasons and
other text-based inputs from customers, taken from CRMs
For initial sentiment scoring, we go through the text data and determine whether a review, comment or piece
of feedback related to a particular product ID is negative or positive. This can be done by attributing each
review to its product and user IDs, followed by preprocessing, tokenization and n-gram-based sentiment
scoring using the NLTK library.
https://colab.research.google.com/drive/1XMGLlaHhnVtYIN9sZ2TgrJWQza4hvhQK?usp=sharing
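As a rough illustration of n-gram-based scoring, the sketch below tokenizes each review, scores bigrams before unigrams so that negations such as "not good" are caught, and sums the scores per product ID. It uses plain Python rather than NLTK, and the lexicon, scores and product IDs are invented for the example; it does not reproduce the linked notebook.

```python
import re

# Tiny hand-made lexicons for illustration only; a real system would use a
# trained model or an established sentiment lexicon.
UNIGRAM_SCORES = {"good": 1, "great": 2, "love": 2,
                  "bad": -1, "poor": -2, "terrible": -2}
BIGRAM_SCORES = {("not", "good"): -2, ("not", "bad"): 1}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Sum lexicon scores, preferring a bigram match over its unigrams."""
    tokens = tokenize(text)
    score = 0
    i = 0
    while i < len(tokens):
        bigram = tuple(tokens[i:i + 2])
        if bigram in BIGRAM_SCORES:
            score += BIGRAM_SCORES[bigram]
            i += 2  # both words are consumed by the bigram
        else:
            score += UNIGRAM_SCORES.get(tokens[i], 0)
            i += 1
    return score


def label(text):
    """Map a score to a coarse sentiment label."""
    score = sentiment_score(text)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def product_sentiment(reviews):
    """Aggregate review scores per product ID from (product_id, text) pairs."""
    totals = {}
    for product_id, text in reviews:
        totals[product_id] = totals.get(product_id, 0) + sentiment_score(text)
    return totals


# Hypothetical reviews keyed by product ID.
REVIEWS = [
    ("P1", "The fabric is great, love it"),
    ("P2", "Not good, the stitching is terrible"),
    ("P1", "Bad zipper"),
]
totals = product_sentiment(REVIEWS)
```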
Attributing negative sentiment to specific product factors goes beyond what simple NLP libraries can do, and
would involve integration with existing GPT implementations (like FashionGPT) that have fashion knowledge
built in. These GPT systems will be able to attribute negative sentiments to each of the product factors
effectively because their underlying LLMs have been adapted to fashion queries and issues.
When the negative feedback is clustered along multiple dimensions (product, brand, seller) using simple
methods like K-means clustering, we can identify which sellers and brands attract which kinds of complaints.
This helps us take action against sellers that repeatedly perform badly in areas they are accountable for,
and share the same information with the brands as feedback or warnings.
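A minimal K-means sketch of this idea is shown below. It assumes each seller has already been summarised as a vector of complaint counts per complaint type; the seller data, the three complaint types and the choice of k = 2 are invented for the example, and the first k points are used as starting centroids so the result is deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain K-means; uses the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical complaint counts per seller: [sizing, quality, delivery].
SELLERS = ["seller_1", "seller_2", "seller_3", "seller_4"]
COMPLAINTS = [
    [9, 1, 0],
    [0, 1, 8],
    [8, 2, 1],
    [1, 0, 9],
]
assignments, centroids = kmeans(COMPLAINTS, k=2)
clusters = dict(zip(SELLERS, assignments))
```

Each centroid summarises the dominant complaint pattern of its cluster, which is what would be shared with the affected sellers and brands.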
Outcome: Feedback received from customers is acted upon when there is a high volume of negative sentiment
around a specific product, with improvement focused on the specific attributes concerned. This appeals to
existing customers, who see their voices being heard.
Image Recognition and matching for item selection and customer profiling
Data Requirements: Image datasets for clothing, input images reflecting customer interest,
description text from all product and image postings
Data Source: Internal display images, Fashion-MNIST for training, social media
fashion image interests, clickstream data from images, banners
This involves either training a model from scratch on fashion image datasets (such as Fashion-MNIST) or
utilizing existing models like R-CNN or CLIP and adding additional layers on top, trained using images from
our portfolio of fashion products. This will build a functional model which will be able to:
a. Separate all the fashion objects in a given image into separate entities
b. Identify each fashion entity by mapping it to a given object in the catalog (in case of lower
matching scores, mapping it to a generic fashion name instead)
Outcomes: Search improvement within own collection of products, use of image clicks to develop interests
and give user exactly what they need
Personalizing websites with dynamic content layouts - Personalized filtering
Data Requirements: Product metadata that the user interacts with, user-generated search data
Data Source: Analytics engine, clickstream data, purchase data, filter data, CRM, etc.
● Dynamic Content Blocks: Creating dynamic sections on websites that adapt based on user behavior.
For example - displaying a “Recommended for You” section with personalized product
recommendations.
● Personalized Product Listings: Arranging product listings based on user preferences. For example -
showing products in the user's preferred category or style at the top.
● Tailored Navigation Menus: Customized navigation menus based on user interests. For example -
highlighting relevant categories or collections.
● Individualized Product Pages: Customized product pages with content relevant to the user. For
example - show reviews, related products, and styling tips.
1. Gathering data: Metadata connected to products that the visitor has recently interacted with is collected.
For example, a visitor places 2 items in their cart, a red and a green t-shirt, but ultimately purchases
only the red one. The following data is gathered:
2. Engagement score calculation: The score of each attribute value is updated by the type of interaction
with it as well as when it occurred. A correlation between the user interaction and an attribute value
must first be determined.
In our example, let's say the default weights for interactions are as follows:
● Purchase: 4X
● Add to cart: 3X
● Add to wishlist: 2X
● Product view: 1X
Engagement score = interaction type weight x attribute value count
3. Recency score calculation: Since user behavior and preferences change over time, importance must be
given to how recently the interaction has taken place.
Now the affinity profile reflects two strong scores (engagement and recency), enabling us to showcase
products the user is more likely to engage with at any given moment in time.
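The scoring steps above can be sketched as follows, using the red and green t-shirt example and the default interaction weights from the text. The engagement score follows the stated formula (interaction type weight x attribute value count). The recency weighting is an assumption: the text does not give a formula, so an exponential decay with a hypothetical 30-day half-life is used here purely for illustration.

```python
from collections import defaultdict

# Default interaction weights from the text.
INTERACTION_WEIGHTS = {
    "purchase": 4,
    "add_to_cart": 3,
    "add_to_wishlist": 2,
    "product_view": 1,
}
HALF_LIFE_DAYS = 30.0  # assumed: an interaction counts half as much after 30 days


def engagement_scores(events):
    """Engagement score = interaction type weight x attribute value count."""
    counts = defaultdict(int)
    for interaction, attributes, _days_ago in events:
        for attribute, value in attributes.items():
            counts[(attribute, value, interaction)] += 1
    scores = defaultdict(int)
    for (attribute, value, interaction), count in counts.items():
        scores[(attribute, value)] += INTERACTION_WEIGHTS[interaction] * count
    return dict(scores)


def recency_weighted_scores(events):
    """Same weights, with each interaction discounted by its age (assumed decay)."""
    scores = defaultdict(float)
    for interaction, attributes, days_ago in events:
        decay = 0.5 ** (days_ago / HALF_LIFE_DAYS)
        for attribute, value in attributes.items():
            scores[(attribute, value)] += INTERACTION_WEIGHTS[interaction] * decay
    return dict(scores)


# Events as (interaction, product attributes, days since the interaction).
EVENTS = [
    ("add_to_cart", {"category": "t-shirt", "color": "red"}, 0),
    ("add_to_cart", {"category": "t-shirt", "color": "green"}, 0),
    ("purchase", {"category": "t-shirt", "color": "red"}, 0),
]
affinity = engagement_scores(EVENTS)
```

With these events, "red" scores higher than "green" because the purchase weight is added on top of the add-to-cart weight, so red t-shirts would be surfaced first.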
Outcomes: In this manner, dynamically re-ordering content and product categories according to user
preferences will allow visitors to quickly find what they’ve come for.
● Instead of defaulting search results to a ‘featured products’ page, retailers can utilize user-generated
search data to personalize results automatically, arranging product grids according to each user’s
buying preferences.
● If a user constantly filters according to rating, for instance, they can automatically be presented with a
default sorting order that features the highest-rated products first.
● If a user tends to purchase for themselves in a fixed size over their last few purchases, automatically
apply filters for their gender and size to the products being searched, enabling quicker selection and
purchase and a better-optimized journey.
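The default-sorting and auto-filter ideas in the list above can be sketched with two small helpers. The function names, the 60% threshold and the three-purchase window are hypothetical choices for the example, not values from the source.

```python
from collections import Counter


def default_sort_order(past_sort_choices, min_share=0.6, fallback="featured"):
    """Pick the user's habitual sort order, or fall back to the featured view.

    min_share is an assumed threshold: the habitual choice must account for
    at least this share of the user's past sort selections.
    """
    if not past_sort_choices:
        return fallback
    choice, count = Counter(past_sort_choices).most_common(1)[0]
    if count / len(past_sort_choices) >= min_share:
        return choice
    return fallback


def auto_size_filter(recent_purchase_sizes, last_n=3):
    """Return a size to pre-select if the last few purchases all used it."""
    recent = recent_purchase_sizes[-last_n:]
    if len(recent) == last_n and len(set(recent)) == 1:
        return recent[0]
    return None  # no stable size yet, so apply no automatic filter


sort_order = default_sort_order(["rating", "rating", "price", "rating"])
size_filter = auto_size_filter(["M", "L", "L", "L"])
```

Returning a fallback or `None` when the user's behaviour is not consistent avoids locking new or exploratory shoppers into filters they did not choose.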
This can be achieved by employing a sophisticated personalization platform with predictive analytics that
automatically tailors and transforms the layout according to the preferences of individual user segments.