
Re-usable Big Data tools & some factory use cases
Shaoang Zhang, Saikat Chatterjee, Anand Gupta, Saket Giri, Jonathan Lloyd, Caren Kee,
Amit Chattopadhyay
Big Data & Industrial IoT, HDD Ops

04/21/2021

ML in our team – building & scaling re-usable tools
Current state of ML usage & potential directions
• “AI” overstates current capability – “Functional ML” is more appropriate
• Perform isolated tasks → Narrow-AI at best
• Perform tasks more efficiently compared to legacy methods → $$ (capex savings, quality improvement)
• In terms of actual awareness of surroundings, little to none

• Next steps
• ML replicates current result → Native-ML design
• Active policy coordination – Narrow-AI agents informing each other
• Build some contextual awareness (efficient Federated or Few-shot learning)

• Some key challenges


• ML-Ops & Lifecycle management (data, model & code refresh)
• Integration with legacy (really legacy) systems
• Hardware allocation strategy

08/27/2021 © 2021 Western Digital Corporation or its affiliates. All rights reserved. | WESTERN DIGITAL CONFIDENTIAL 2
Smarter tools & smarter device?
Business problems rolled up to categories defined by
- Data size we need to handle
- Compute & latency requirements
- Efficient s/w stack to support the application

Common Theme – Low Latency solution allowing quick response

• Data size huge
• Variables/features in Millions
• Often no structure or labels
• Requires massive compute & storage platforms (Cloud Edge)
• Very high clock-speed GPUs
• Massively parallelized s/w stack

• Data size medium – MB
• Variables/features in 1000s
• Nominal storage/compute requirements
• Parallelized code needed for low latency

• Data size small – kB
• Variables/features in 100s
• Very limited storage & compute (Device Edge)
• RAM used – 700-800 kB
• Compute on ARM SOC

Tools and data sources
• Sentinel; Computer Vision framework: visual data from low-cost cameras
• Praetor & Scout; Mfg Tool mgmt: low-level streaming data from tool logs
• Sparse-ML; Embedded inference: HDD as computing device (millions of low-cost processors)
HiveMind/Sentinel – unified neural network platform

Factories with disparate local platforms

Factories utilizing one unified deep learning and computer vision enabled platform for defect detection, quality checks, data integration and sharing.

HiveMind/Sentinel – unified neural network platform

[Diagram: ML/AI and Computer Vision tools on the platform, across HDD Edge, Product and Process. Tool labels: ML, Scout, Praetor, TCR, Tako, Gripper, OSA, Candela/IRIS, Auto Label, Polish, Wash, Plating, Packing. One box is marked "??".]

Talent Acquisition and Team setup
1. Hiring of a healthy mix of senior and junior software engineers and data scientists
2. Establishment of a Scaled Agile Framework (SAFe) for multi-project and multi-client management
3. Establishment of Agile cadence and best practices
4. Setting up process maturity roadmaps
5. Collating development phases into milestones with deploy-ready outputs
6. Risk identification and management
7. Portfolio management for different projects
8. Project/team reporting to higher management at regular intervals

Computer Vision
• OpenCV
• Mask-RCNN
• Tesseract
• TensorFlow
• Keras

Messaging Layer
• Kafka streams
• RabbitMQ/ActiveMQ
• Java
• Scala
• Spark

Data Layer
• Distributed database
• Cassandra
• MongoDB
• Redis
• ElasticSearch

Web service layer
• NodeJS
• FastAPI
• ExpressJS
• OpenAPI

Tools and solutions being developed

Various Deployment Stages
• Sentinel: Computer Vision framework
o MO, SDSM, PRB, BPI, FJ Dev team
o Sarawak (staging)
o Use case for Basalt (collab with R&D team/Mipsology)
• Streaming Data: ML on “time series” data from mfg tools
o Scout @ MO (PN & SZ)
o Praetor @ PRB (Paris-C Clean Room lines)
• Sparse-ML : HDD as an Edge Device
o ML-based toolchains embedded in-drive
o Drive optimization
o Defect mapping & management (underway)

Development phase
• Generative ML models (VAE; GAN) for large-scale data augmentation & change detection
• Native Neural Network to model and predict HDD failure modes

©2018 Western Digital Corporation or its affiliates. All rights reserved. 08/27/2021 7
Sentinel: computer vision framework

Components shown in the architecture diagram:
• Service Layer (Client interface)
• Metadata manager
• Rule Engine
• Queue manager
• Model manager

Diagram annotations:
• (binary image, JSON, image, video, segmentation coordinates, …)
• (pipeline for entire system; processing using metadata & model mgr)
• (low-level metadata → Database abstraction layer)
• Currently deployed as Edge Service

Sentinel - for label checks
Auto Vision station

Current flow: AOI (Automatic Optical Inspection)
• Template-Matching based vision system
• 3rd party vendor-locked
• Underlying techniques don’t generalize. Higher chance of mistakes

New flow: Auto Verify
• Auto Verify ‘pings’ Sentinel as a service
• Upload label image (3-5 MB) to Sentinel
• Executes a collection of RPNs (Mask R-CNNs in this case)
• Accelerators help here → Basalt partnership

• SDSM data – UAT phase
• BPI line data: >500K requests
• 1s latency
Sputter Gripper tolerance control
Gripper tip alignment sample result
Vertical Tolerance = 0 to +0.3 mm
Horizontal Tolerance = ±0.1 mm
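
The two tolerance bands above can be written as a simple pass/fail check. This is only a minimal sketch: the class and function names are hypothetical, and it assumes the vision system has already produced vertical and horizontal tip offsets in millimetres.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """Allowed offset range in millimetres (inclusive on both ends)."""
    low_mm: float
    high_mm: float

    def contains(self, offset_mm: float) -> bool:
        return self.low_mm <= offset_mm <= self.high_mm


# Limits taken from the slide; the constant names are hypothetical.
VERTICAL = Tolerance(0.0, 0.3)      # 0 to +0.3 mm
HORIZONTAL = Tolerance(-0.1, 0.1)   # +/- 0.1 mm


def tip_in_tolerance(vertical_mm: float, horizontal_mm: float) -> bool:
    """True when both measured offsets fall inside their tolerance bands."""
    return VERTICAL.contains(vertical_mm) and HORIZONTAL.contains(horizontal_mm)


print(tip_in_tolerance(0.2, 0.05))   # both offsets inside their bands
print(tip_in_tolerance(-0.05, 0.0))  # vertical offset below the band
```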
Detecting pattern changes in high-frequency tool data

Problem 1: overall change
Problem 2: hidden anomalies
(Plot legend: blue = old pattern, orange = new pattern)

Why? → Profiles may shift, causing material change, while static SPC limits are not violated

• Scout (MO sputter) → Motif detection/segmentation/drift to monitor & reduce variation in sputter process

• Praetor (Drive Assembly in HDD Clean Room) → LSTM/CNN Auto-encoders to detect low-lying anomalies
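
The point about static SPC limits can be illustrated with a toy example. The profiles, limits and function names below are invented for illustration and are not tool data: both profiles stay inside the same fixed limits, yet a point-by-point comparison shows that the pattern has changed.

```python
import math


def within_spc_limits(profile, lower, upper):
    """Static SPC check: every sample must lie inside fixed limits."""
    return all(lower <= x <= upper for x in profile)


def rms_difference(a, b):
    """Root-mean-square difference between two equally long profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Invented profiles: same level and amplitude, but the new one is phase-shifted.
old_pattern = [10.0 + 2.0 * math.sin(2 * math.pi * i / 50) for i in range(100)]
new_pattern = [10.0 + 2.0 * math.sin(2 * math.pi * i / 50 + math.pi / 2) for i in range(100)]
LOWER, UPPER = 7.0, 13.0  # hypothetical static SPC limits

print(within_spc_limits(old_pattern, LOWER, UPPER))  # no limit violated
print(within_spc_limits(new_pattern, LOWER, UPPER))  # still no limit violated
print(rms_difference(old_pattern, new_pattern))      # yet the shapes differ
```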
Sputter sequence and data
• 24 Chambers and total of 410 tool-level parameters

Chamber   Layer
P24       NCVD
P23       Youtec
P22       NCT
P21       Heater
P20       Cap
P19       ECL-5
P18       Mag-5
P17       ECL-4
P16       Mag-4
P15       ECL-3
P14       Mag-3
P13       ECL-2
P12       Mag-2
P11       ECL-1
P10       Mag-1
P9        GIIL
P8        Heater
P7        ILRu2
P6        ILRu1
P5        Seed-2
P4        Seed-1
P3        SUL 2
P2        SUL Ru
P1        SUL 1

Layer groups labelled in the diagram (top to bottom): Carbon Overcoat Layer, Cap Etching Layer, Mag Layer, InterLayer, Blank, Seed Layer, Soft Under Layer

©2017 Western Digital Corporation or its affiliates. All rights reserved. Confidential. 08/27/2021 12
Scout – Motif Detection algorithm
Matrix Profile:
• Establish a reference motif; calculate distance relative to it: $d = [d_1, d_2, \dots, d_i, \dots, d_n]$
• Lightweight compute; 1 tunable hyper-parameter
• A near-universal time series similarity and anomaly detection approach

Flow: Golden Profile and Monitoring Profile → Map to Matrix Profile space → Quantitative Pattern Difference

• Currently deployed in production across all sputter lines in media
• PN MO → significant improvement in line-to-line variations
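
A minimal sketch of the distance calculation described on this slide, under stated assumptions: it uses the z-normalized Euclidean distance that is common in Matrix Profile work, it takes the window length (the length of the golden motif) to be the single tunable hyper-parameter, and the function names and data are invented. It is not the production Scout code.

```python
import math


def z_normalize(window):
    """Scale a window to zero mean and unit variance (flat windows map to zeros)."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0.0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def distance_profile(golden_motif, monitoring_series):
    """Return d = [d_1, ..., d_n]: distance of every window to the golden motif."""
    m = len(golden_motif)  # window length, assumed to be the one hyper-parameter
    if m < 2 or len(monitoring_series) < m:
        raise ValueError("series must be at least as long as the motif (>= 2 points)")
    reference = z_normalize(golden_motif)
    profile = []
    for start in range(len(monitoring_series) - m + 1):
        window = z_normalize(monitoring_series[start:start + m])
        profile.append(math.sqrt(sum((r - w) ** 2 for r, w in zip(reference, window))))
    return profile


# Invented data: one sine period as the golden motif.
golden = [math.sin(2 * math.pi * i / 20) for i in range(20)]
same_shape = [3.0 * x + 5.0 for x in golden]   # rescaled copy, same pattern
ramp = [float(i) for i in range(20)]           # different pattern

print(distance_profile(golden, same_shape))    # one window, distance close to 0
print(distance_profile(golden, ramp))          # one window, clearly larger distance
```

Because each window is z-normalized, a change in offset or gain alone does not raise the distance; only a change in shape does.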
Layout of Assembly area
[Layout diagram: stations 1 to 17 along the line, with an Optical/Touch sensor output profile]

#    MC name
1    Base Load
2    Disk Mount
3    Clamp mount
4    Disk Balancing
5    Clamp Fasten
6    Balance Measure
7    Spoiler mount
8    Ramp mount
9    HSA Mount
10   Flex/VCM Screw
11   Crash mount
12   Clip remove & Vision
13   Gang Vacuum
14   Manual inspect
15   Cover mount
16   Cover Screw
17   Base Unload
Praetor : Architecture & threat detection
[Diagram labels: Compression factor = 1; 1 < Compression factor < 5; Compression factor = 5; COMPARE]

• Bottleneck layer compresses input sequence to the time-invariant features which describe a non-anomalous input
• Comparison done with a custom loss function
• Loss function → Anomaly score

• Current FD system gets most things right, like most legacy Ops control knobs
• However, some things it gets right late – subtle pattern changes that cause issues downstream
• Praetor as a 2nd layer is the plan
• Highlights threats, rather than faults
• Currently in the UAT phase by Mfg Eng teams
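
The idea of scoring an input by how well a bottleneck can reconstruct it can be sketched as follows. This is a deliberately simplified stand-in and not the Praetor model: instead of LSTM/CNN auto-encoders it uses a linear auto-encoder with a single bottleneck unit (equivalent to projecting onto the first principal component), and plain mean squared error replaces the custom loss. All names and data are invented.

```python
import math


def fit_linear_bottleneck(normal_sequences, iterations=100):
    """Fit the mean and a one-unit linear bottleneck (first principal direction)."""
    n, dim = len(normal_sequences), len(normal_sequences[0])
    mean = [sum(seq[j] for seq in normal_sequences) / n for j in range(dim)]
    centered = [[seq[j] - mean[j] for j in range(dim)] for seq in normal_sequences]
    # Start power iteration from the centered sequence with the largest norm.
    direction = max(centered, key=lambda row: sum(x * x for x in row))
    for _ in range(iterations):
        codes = [sum(r * d for r, d in zip(row, direction)) for row in centered]
        updated = [sum(codes[k] * centered[k][j] for k in range(n)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in updated))
        if norm == 0.0:
            break
        direction = [x / norm for x in updated]
    return mean, direction


def anomaly_score(sequence, mean, direction):
    """Mean squared reconstruction error after encode -> decode."""
    residual = [x - m for x, m in zip(sequence, mean)]
    code = sum(r * d for r, d in zip(residual, direction))            # encode
    reconstruction = [m + code * d for m, d in zip(mean, direction)]  # decode
    return sum((x - y) ** 2 for x, y in zip(sequence, reconstruction)) / len(sequence)


# Invented training data: the same bump profile at slightly different amplitudes.
base = [math.sin(math.pi * i / 31) for i in range(32)]
normal = [[a * x for x in base] for a in (0.90, 0.95, 1.00, 1.05, 1.10)]
mean, direction = fit_linear_bottleneck(normal)

typical = [1.02 * x for x in base]
subtle = list(base)
subtle[10] -= 0.3  # small local dip in an otherwise normal profile

print(anomaly_score(typical, mean, direction))  # close to 0
print(anomaly_score(subtle, mean, direction))   # noticeably larger
```

A threshold on this score, chosen from the scores of known-good runs, would turn it into a flag for the kind of subtle pattern change described above.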
Sparse-ML : embedding ML code in FW
Custom port of open-source ML inference framework to convert Neural Networks

Flow shown in the diagram:
• DNN: Multi-task Network Model for a constrained optimization problem
• ML framework: Computational Graph of Model
• Conversion Utility
• Embedding in FW: C++03 model code integrated into PTM FW code

Other diagram labels: Input features, TPI, BPI

With the framework now demoed, it can be extended to harder problems/more complex models
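
To make the conversion step concrete, here is a hypothetical sketch of what a conversion utility could do for a single dense layer: take trained weights and emit C++03-compatible source (static arrays and a plain loop, with no C++11 features). The function name, the generated identifiers and the ReLU activation are all assumptions for illustration; the actual utility and the ported framework are not described in the slides.

```python
def emit_dense_layer_cpp03(name, weights, biases):
    """Return C++03 source for one dense layer with ReLU, using static arrays."""
    rows, cols = len(weights), len(weights[0])
    if len(biases) != rows or any(len(row) != cols for row in weights):
        raise ValueError("weights must be rows x cols and biases must have rows entries")

    def literal(value):
        # For finite values repr() always has a '.' or an exponent,
        # so appending 'f' yields a valid C++ float literal.
        return repr(float(value)) + "f"

    weight_rows = ",\n".join(
        "    {" + ", ".join(literal(w) for w in row) + "}" for row in weights
    )
    bias_values = ", ".join(literal(b) for b in biases)
    return (
        f"static const float {name}_W[{rows}][{cols}] = {{\n{weight_rows}\n}};\n"
        f"static const float {name}_B[{rows}] = {{{bias_values}}};\n"
        "\n"
        f"static void {name}_forward(const float* in, float* out) {{\n"
        f"    for (int r = 0; r < {rows}; ++r) {{\n"
        f"        float acc = {name}_B[r];\n"
        f"        for (int c = 0; c < {cols}; ++c) {{\n"
        f"            acc += {name}_W[r][c] * in[c];\n"
        "        }\n"
        "        out[r] = acc > 0.0f ? acc : 0.0f;  /* ReLU */\n"
        "    }\n"
        "}\n"
    )


# Invented weights for a 3-input, 2-output layer.
print(emit_dense_layer_cpp03("fc1", [[0.5, -1.0, 2.0], [1.5, 0.0, -0.25]], [0.1, -0.2]))
```

Emitting weights as constant arrays keeps the generated code free of dynamic allocation, which is one plausible way to fit the small RAM budget mentioned for the Device Edge.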
