ProxySG Performance Monitoring and Troubleshooting
April 2016

Rob Ritchardson: Product Support Specialist


Why is performance monitoring important?

• ProxySG has a finite amount of resources which it uses to process traffic.
• Internal and external issues can create situations where available
resources become scarce and traffic processing is impacted.
• It is critical that ProxySG administrators understand what those
resources are, what data is available, how to monitor that data and
how to react to issues impacting resources.
• Three key areas of performance monitoring on ProxySG:
• CPU
• Memory
• Bandwidth
Agenda

• CPU monitoring via the management console and CLI


• Statistics available and their use, CPU Monitor 
• Identifying and investigating current or past CPU issues
• Memory monitoring via the management console and CLI
• Statistics available and their use, Threshold monitor 
• Identifying and investigating current and past memory issues
• Bandwidth monitoring via the management console
• Understanding bandwidth impact on proxy
• Troubleshooting CPU and memory issues
CPU monitoring via the management
console and CLI
CPU statistics in the management console UI

• CPU percentages are available in many places within the management console UI:
• Statistics->Summary->Device

• Statistics->Health Monitoring->General

• Statistics->System->Resources->CPU
CPU statistics in the management console UI

• Statistics->System->Resources->CPU includes historical graphs of CPU usage over different time periods.
CPU statistics in the management console UI

• All three CPU reports/graphs are generated from the same set of data on ProxySG. This data is available in the persistent data manager (PDM) statistics in a sysinfo and default snapshots.
• All three CPU reports/graphs show a single CPU percentage
• On ProxySG 6.6.1.x or older (6.5, 6.4, etc.) the busiest CPU on multiple
CPU ProxySG platforms is shown in these reports/graphs
• On ProxySG 6.6.2.x and newer an average of all of the platform's CPUs is shown in these reports/graphs
• The CPU percentage shown is the average CPU over 60 seconds
• Very short spikes in CPU usage might not show in these reports/graphs
CPU statistics in the CLI

• The CLI also has many ways to display CPU:
• ‘show status’
• ‘show cpu’

#show cpu
Current maximum CPU usage (%): 3.5

• Single CPU reports follow the same calculations as the management console UI reports.
CPU statistics in the CLI

• ‘show cpu’ has two optional flags to give additional information:
• [all] which shows all CPUs individually
• [extended] which shows CPU usage over 1, 5, 10 and 60 second averages
• Command examples are from a 2 CPU SG900
• The extended information contains shorter timeframes for CPU average which allows for short CPU spike visibility.

#show cpu all
Current CPU usage (%):
CPU 0: 3.3
CPU 1: 0.6

#show cpu extended
Current maximum CPU usage (%):
        1sec  5sec  10sec  60sec
CPU:    3.3   3.3   3.3    3.5

#show cpu extended all
Current CPU usage (%):
        1sec  5sec  10sec  60sec
CPU 0:  3.2   3.3   3.3    3.5
CPU 1:  0.5   0.6   0.6    0.6
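
If the output of ‘show cpu extended all’ is saved to a text file, the per-CPU averages can be compared offline. The following is a minimal Python sketch, assuming the output has the layout of the sample above. The function names and the 20-point spike margin are illustrative assumptions, not ProxySG features.

```python
import re

# Sample text copied from the 'show cpu extended all' example in this deck.
SAMPLE = """\
Current CPU usage (%):
        1sec  5sec  10sec  60sec
CPU 0:  3.2   3.3   3.3    3.5
CPU 1:  0.5   0.6   0.6    0.6
"""

# One row per CPU: "CPU <n>:" followed by four averages.
ROW = re.compile(r"^CPU\s*(\d+):\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)$")
WINDOWS = ("1sec", "5sec", "10sec", "60sec")


def parse_cpu_extended(text):
    """Return {cpu_index: {"1sec": x, "5sec": x, "10sec": x, "60sec": x}}."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            values = [float(v) for v in match.groups()[1:]]
            result[int(match.group(1))] = dict(zip(WINDOWS, values))
    return result


def short_spikes(stats, margin=20.0):
    """CPUs whose 1-second average exceeds the 60-second average by `margin`
    percentage points. The margin is an arbitrary value for illustration."""
    return [cpu for cpu, s in stats.items() if s["1sec"] - s["60sec"] > margin]


stats = parse_cpu_extended(SAMPLE)
print(stats)
print("CPUs with a short spike:", short_spikes(stats))
```

Comparing the 1 second column with the 60 second column is what makes a short spike visible that the 60 second average used by the management console graphs would smooth out.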
CPU statistics in the advanced URLs

• Advanced URLs include two CPU statistics pages:
• CPU Usage Statistics
• CPU Monitor 
• CPU Usage Statistics Information:
• Advanced URL:
/Diagnostics/CPU/Statistics
• CPU 0 at the top and increments
going down
• 1, 5, 30 and 60 second averages
CPU Monitor general information

• CPU Monitor is a tool that lets an administrator identify suspect ProxySG processes or components when investigating high CPU issues. This information is key for quicker resolution of high CPU issues.
• CPU Monitor is the ProxySG’s equivalent of the Linux top command or the Windows Task Manager application
• CPU Monitor is available in an advanced URL and CLI
• CPU Monitor configuration is retained after a reboot
• Running CPU Monitor incurs 1-2% CPU overhead.
CPU Monitor general information

• Example:

Configured interval duration: 5 seconds
Current interval complete in: 4 seconds
CPU 0                 99%
  HTTP and FTP        89%
  Object Store        10%
  Access Logging       1%
  Miscellaneous        1%
CPU 1                 22%
  TCPIP               20%
  DNS service          1%

• Configurable interval duration
• CPUs listed with their CPU usage (rounded up)
• Components shown
• Most component names are meaningful
• Two components commonly seen that need clarification:
• Object store – Kernel, Cache Engine, Storage
• Miscellaneous – Processing that does not fit into a main component
CPU Monitor in the advanced URL

• CPU Monitor advanced URL: /Diagnostics/CPU_Monitor/Statistics
• Advanced URL options:
• Start CPU Monitor
• Stop CPU Monitor
• View CPU Monitor data (automatic browser refresh)
• CPU Monitor advanced URL included in sysinfo and default snapshots
CPU Monitor in the CLI

• Configuration is done in ‘configure terminal’ mode.
• CPU Monitor’s interval can be configured in the CLI only.

#conf t <enter>
#(config)diagnostics <enter>
#(config diagnostics)cpu-monitor ?
disable   Disable the CPU Monitor
enable    Enable the CPU Monitor
interval  Configure the CPU Monitor interval
CPU Monitor in the CLI

• Viewing CPU Monitor output from the CLI must be done from ‘enable’ mode
• Command to view CPU monitor is ‘show cpu-monitor’
• Data is not updated until interval expires
• Time remaining in interval also displayed

#show cpu-monitor
CPU Monitor:
Configured interval duration: 59 seconds
Current interval complete in: 18 seconds
CPU 0                6%
  Console Agent      3%
  Miscellaneous      2%
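
When several CPU Monitor samples have been saved during an incident, picking out the busiest component per CPU by hand gets tedious. The sketch below assumes the saved text has the same layout as the ‘show cpu-monitor’ sample above (a "CPU n" line followed by its component lines). The function names are hypothetical helpers written for this illustration.

```python
import re

# Sample text copied from the 'show cpu-monitor' example in this deck.
SAMPLE = """\
CPU Monitor:
Configured interval duration: 59 seconds
Current interval complete in: 18 seconds
CPU 0 6%
Console Agent 3%
Miscellaneous 2%
"""

# Any line of the form "<name> <number>%".
LINE = re.compile(r"^(?P<name>.+?)\s+(?P<pct>\d+)%$")


def parse_cpu_monitor(text):
    """Return {cpu_name: {"total": pct, "components": {name: pct}}}."""
    cpus = {}
    current = None
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue  # header lines carry no percentage
        name, pct = match.group("name"), int(match.group("pct"))
        if re.fullmatch(r"CPU \d+", name):
            current = cpus.setdefault(name, {"total": pct, "components": {}})
        elif current is not None:
            current["components"][name] = pct
    return cpus


def top_component(cpu_entry):
    """Name of the component with the highest percentage, or None."""
    components = cpu_entry["components"]
    return max(components, key=components.get) if components else None


parsed = parse_cpu_monitor(SAMPLE)
for cpu, entry in parsed.items():
    print(cpu, entry["total"], "% busiest component:", top_component(entry))
```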
CPU Health check alerting

• How to view CPU usage is clear but constant viewing in anticipation of a CPU issue is not reasonable. Health Monitoring includes CPU alerting along with alerting for many other resources.
• Statistics->Health Monitoring->General shows:

• CPU utilization

• Current state

• Health shows states in prioritized order: Critical, Warning, OK
CPU Health check alerting

• Health Monitor states are controlled by configurable thresholds. Changes in state trigger alerts to all of the configured alerting mechanisms.
• Configurations in Maintenance->Health Monitoring->General
• Configurable CPU percentage thresholds for critical and warning state
• Configurable intervals for critical and warning state
• Selectable log, email and trap alerting facilities
• Email and trap alerts require additional configurations on the ProxySG to function properly.
CPU Health check alerting

• Sample image of CPU alert configurations:

• Typically thresholds and intervals are not changed.


Identifying and investigating an active CPU issue

• When investigating an active CPU issue the main goal is to first identify the component(s) using the most CPU.
• Check health status in the management console UI. Is it Warning or Critical state?
• Observe CPU usage trends in Statistics->System->Resources->CPU. Constant high CPU usage versus CPU spikes must be kept in mind when working with CPU monitor.
• Enable CPU monitor and record samples based on the CPU usage trends seen in Statistics->System->Resources->CPU.
• If the management console UI is unresponsive use the CLI (SSH or serial console) and use ‘show cpu’ and ‘show cpu-monitor’
Identifying and investigating an active CPU issue

• Once a suspect component is identified analysis can begin
• Suspect: Policy
• Investigation:
• Policy change?
• Regex policy rules?
• Active sessions analysis
• Access logs
• More examples like this in the troubleshooting section

Configured interval duration: 5 seconds
Current interval complete in: 4 seconds
CPU 0                        99%
  Policy evaluation - HTTP   50%
  HTTP and FTP               35%
  Object Store               10%
  Access Logging              1%
  Miscellaneous               1%
CPU 1                        22%
  TCPIP                      20%
  DNS service                 1%
Identifying and investigating a past CPU issue

• Investigating past CPU issues is more difficult since CPU Monitor is typically not enabled. Graph or statistical data analysis is needed to match CPU trends that are found.
• Check each graph duration in Statistics->System->Resources->CPU to see if the CPU issue can be seen. Use that duration in other graph data (Traffic Mix, Client Workers, etc.) to see if there are matching trends to help identify a root cause.
• SNMP monitoring can greatly assist in this type of investigation
• SNMP monitoring knowledge asset: https://youtu.be/PvH30MLfEQY
• SNMP resource monitoring document on BTO
Identifying and investigating a past CPU issue

• If nothing is found in the graph data, statistical analysis is an option
• Persistent Data Management (PDM) statistical data is available in the ProxySG’s sysinfo, default snapshots and heartbeats.
• All graphs within the management console UI are built from PDM data.
• Snapshots and heartbeats provide historical statistics that can provide visibility into past CPU issues, sometimes at a very granular level.
• Snapshots can be viewed and downloaded from the /Diagnostics/Snapshot advanced URL
• Heartbeats are sent on a daily basis to all configured email recipients
Identifying and investigating a past CPU issue

• PDM statistical data example:


system:cpu-usage~hourly@Fri, 01 Apr 2016 00:08:00 UTC[07](60, 60): 9 9 9 9 9
9 9 9 9 9 9 9 10 9 9 9 9 9 9 9 12 9 10 9 10 9 10 9 9 9 9 9 9 9 9 9 9 9 9 9 9
10 9 9 9 9 9 9 9 12 10 9 18 12 9 10 10 10 9 9

• The above shows CPU usage over an hour of time
• Name of statistic follows PDM syntax: system:cpu-usage~hourly
• Date and time mentioned
• Current/newest sample is on the right side
• (x, y) or (60, 60) above; x = number of samples and y = time in seconds for each sample. 60 samples * 60 seconds = 3600 seconds or 1 hour
• Each sample is the value at that time, not an average.
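
The PDM layout described above is regular enough to split with a short script before graphing. This is a minimal Python sketch, assuming every entry follows the `name@date[n](samples, seconds): values` shape shown in this deck. The bracketed number is captured but not interpreted, since the slides do not explain it, and `parse_pdm` is a hypothetical helper name. The sample entry is the ‘daily’ CPU statistic from the CPU data listing in this deck.

```python
import re

# name @ date/time [n] (samples, seconds per sample): space separated values
HEADER = re.compile(
    r"^(?P<name>[^@]+)@(?P<stamp>.+?)\[(?P<index>\d+)\]"
    r"\((?P<samples>\d+),\s*(?P<seconds>\d+)\):\s*(?P<values>.*)$",
    re.DOTALL,  # values may wrap across several lines
)


def parse_pdm(entry):
    """Split one PDM statistic entry into its named parts."""
    match = HEADER.match(entry.strip())
    if match is None:
        raise ValueError("text does not look like a PDM statistic entry")
    samples = int(match.group("samples"))
    seconds = int(match.group("seconds"))
    return {
        "name": match.group("name"),
        "stamp": match.group("stamp"),
        "index": int(match.group("index")),  # meaning not covered in the slides
        "samples": samples,
        "seconds_per_sample": seconds,
        "window_seconds": samples * seconds,
        "values": [int(v) for v in match.group("values").split()],  # newest last
    }


entry = (
    "system:cpu-usage~daily@Fri, 01 Apr 2016 00:00:00 UTC[23](24, 3600): "
    "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n"
    "1 1 1 4 5 7 9 9"
)
stat = parse_pdm(entry)
print(stat["name"], "covers", stat["window_seconds"] // 3600, "hours")
print("newest sample:", stat["values"][-1], "peak:", max(stat["values"]))
```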
Identifying and investigating a past CPU issue

• All CPU related data available:


system:cpu-usage~hourly@Fri, 01 Apr 2016 00:08:00 UTC[07](60, 60): 9 9 9 9 9 9 9 9 9 9 9 9 10 9 9 9
9 9 9 9 12 9 10 9 10 9 10 9 9 9 9 9 9 9 9 9 9 9 9 9 9 10 9 9 9 9 9 9 9 12 10 9 18 12 9 10 10 10 9 9
system:cpu-usage~daily15minute@Fri, 01 Apr 2016 00:00:00 UTC[95](96, 900): 2 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 5 5 5 5 5 5 5 5 5 11 9 9 9 9 9 9 9 9 9
system:cpu-usage~daily@Fri, 01 Apr 2016 00:00:00 UTC[23](24, 3600): 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 4 5 7 9 9
system:cpu-usage~weekly@Fri, 01 Apr 2016 00:00:00 UTC[19](28, 21600): 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 5
system:cpu-usage~monthly@Fri, 01 Apr 2016 00:00:00 UTC[17](31, 86400): 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
system:cpu-usage~yearly@Sun, 27 Mar 2016 00:00:00 UTC[15](52, 604800): 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

• Default snapshots contain all of the above. Heartbeats only contain ‘daily15minute’.
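
The (samples, seconds) pair on each statistic determines how much history it holds and how coarse each sample is. The short Python sketch below applies the samples * seconds arithmetic from the previous slide to the six pairs listed above; the dictionary and function names are illustrative.

```python
# (number of samples, seconds per sample) as printed on each statistic above
SERIES = {
    "hourly": (60, 60),
    "daily15minute": (96, 900),
    "daily": (24, 3600),
    "weekly": (28, 21600),
    "monthly": (31, 86400),
    "yearly": (52, 604800),
}


def coverage_hours(samples, seconds):
    """Total time span held by a statistic, in hours."""
    return samples * seconds / 3600


for name, (samples, seconds) in SERIES.items():
    hours = coverage_hours(samples, seconds)
    print(f"{name:14s} {samples:3d} x {seconds:6d}s = {hours:7.1f} h ({hours / 24:.2f} days)")
```

Both ‘daily’ and ‘daily15minute’ span 24 hours, but ‘daily15minute’ keeps four samples per hour, which makes it the more useful of the two for locating when an issue started.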
Identifying and investigating a past CPU issue

• Once a CPU issue is found in the PDM data other types of data can be analysed to find correlations. An example:
system:cpu-usage~hourly@Fri, 01 Apr 2016 00:08:00 UTC[07](60, 60): 9 9 9 9 9
9 9 9 9 9 9 9 10 9 9 9 9 9 9 9 12 9 10 9 10 9 10 9 9 9 9 9 9 9 9 9 9 9 9 9 9
10 9 9 9 9 9 9 9 12 10 9 50 90 100 46 10 10 9 9
users:current~hourly@Fri, 01 Apr 2016 00:08:00 UTC[07](60, 60): 1 0 1 1 1 0 0
0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 1 1 0 0 0 0 1 0 0 0 1 0
0 0 0 0 1 0 3057 5932 6348 2680 1 0 0 0

• In this example we can see that a spike in user counts correlates with the spike in CPU usage.
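
The same comparison can be done mechanically once both series are in lists. The sketch below uses the last 12 samples of each statistic from the example above and flags samples that sit well above the series median. The factor of 3 and the floor of 1 are arbitrary assumptions chosen for this illustration, not ProxySG values.

```python
from statistics import median

# Last 12 samples of each series from the example above (newest on the right).
cpu = [9, 12, 10, 9, 50, 90, 100, 46, 10, 10, 9, 9]
users = [0, 0, 1, 0, 3057, 5932, 6348, 2680, 1, 0, 0, 0]


def spike_indexes(samples, factor=3.0):
    """Positions whose value exceeds `factor` times the median.
    The median is floored at 1 so a mostly-zero series still has a usable baseline."""
    baseline = max(median(samples), 1)
    return [i for i, value in enumerate(samples) if value > factor * baseline]


cpu_spikes = spike_indexes(cpu)
user_spikes = spike_indexes(users)
shared = sorted(set(cpu_spikes) & set(user_spikes))

print("CPU spikes at samples:  ", cpu_spikes)
print("User spikes at samples: ", user_spikes)
print("Spikes in both series:  ", shared)
```

Because both statistics use the same (60, 60) layout, equal list positions refer to the same minute, so overlapping positions point to the two spikes happening together.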
Identifying and investigating a past CPU issue

• There are many PDM statistics that are helpful in investigating CPU issues. Most of the names are meaningful so that you can determine what they track.
• PDM data is space separated for ease of graphing in the spreadsheet tool of your choice.
• If the CPU issue has a predictable pattern, use Health Monitoring’s ‘Warning’ state to alert you early enough, before the issue happens, so it can be investigated live.
• Enable CPU monitor and leave it enabled if future occurrences of the CPU issue are expected.
Memory monitoring via the management
console and CLI
Memory statistics in the management console UI

• Like CPU, memory percentages are available in many places within the management console UI:
• Statistics->Summary->Device:
• Only historical view
• Statistics->Health Monitoring->General:

• Statistics->System->Resources->Memory Use:
Memory statistics in the management console UI

• Statistics->System->Resources->Memory Use includes a number of data points.
• For memory issues look at
the following:
• Committed and available
memory at the top
• Committed and free
application memory at the
bottom
• Issues are usually with
application memory
Memory statistics in the CLI

• Two CLI commands show available memory:
• ‘show status’
• ‘show resources’ contains disk
and memory information
Memory statistics in the advanced URLs

• Advanced URLs include two memory statistics pages:
• System Memory Statistics
• Threshold Monitor 
• System Memory Statistics
Information:
• Advanced URL: /System/memory
• Similar information in the
management console UI
• Committed memory / Application
data
Threshold monitor general information

• Threshold Monitor is a set of statistics that track bytes of memory allocated per ProxySG component in 3 different time intervals.
• Threshold Monitor allows an administrator a way to identify suspect ProxySG components when investigating high memory issues. This information is key for quicker resolution of high memory issues.
• Threshold Monitor is available in an advanced URL and the
advanced URL data can be shown in the CLI.
• The memory allocation statistics are cleared on a reboot.
Threshold monitor general information

• Threshold Monitor statistics are included in the default snapshots and sysinfo.
• Threshold Monitor statistics in the advanced URL display component names where the same statistics in the default snapshots and sysinfo do not. Example:

Advanced URL (from hourly, Linear memory stats)    Snapshot/Sysinfo
1, Miscellaneous                                   TM004.1.0
1, Authentication                                  TM004.1.1

• Comparison of the advanced URL of a ProxySG to the snapshot/sysinfo data is recommended.
Threshold monitor advanced URL

• Threshold Monitor advanced URL: /TM/Statistics


• Statistics contain:
• Each ProxySG component and its memory usage in bytes
• Current/newest entries on the right
• ‘-’ indicates no data recorded (reboot or short uptime)
• Three groupings of sample intervals
• 60 minutes total; 30 samples at 2 minutes each
• 1 day total; 24 samples at 1 hour each
• 1 month total; 30 samples at 1 day each
• Two groupings of the above; Linear and Physical memory
• CSV values for easy graphing in preferred spreadsheet application
Threshold monitor advanced URL

• TCPIP’s memory usage over the last hour, static (this is OK):
1, TCPIP: 3358720, 3358720, 3358720, 3358720, 3358720, 3358720, 3358720, 3358720,
3358720, 3358720, 3358720, 3358720, 3358720, 3358720, 3358720, 3358720,
3358720, 3358720, 3358720, 3358720, 3358720, 3358720, 3358720, 3358720,
3358720, 3358720, 3358720, 3358720, 3358720, 3358720,

• SSL’s memory usage over the last hour, changes drastically:


1, SSL and Cryptography: 2195826688, 2200390451, 2131156718, 2001598054,
1963358617, 1908186180, 1840209783, 1707371861, 1525337702, 1403926664,
1203994077, 989983402, 978771285, 1045835776, 1109430408, 1493174681,
1624545416, 1783512541, 1830913092, 1929578222, 1929619046, 1950044706,
2146023833, 2117209838,

• The ProxySG had 4 GB of RAM in the above examples
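
Because the Threshold Monitor output is comma separated, the difference between a static component and a changing one can be quantified in a few lines. The Python sketch below assumes each entry has the `group, Component: v1, v2, ...` shape shown above. `parse_tm_line` and `swing_percent` are hypothetical helper names, and the TCPIP line is an excerpt of the first values only.

```python
def parse_tm_line(line):
    """Split 'group, Component: v1, v2, ...' into (group, component, values).
    Entries that are not numbers (such as '-' for no data) are skipped."""
    head, _, tail = line.partition(":")
    group, _, component = head.partition(",")
    values = [int(v) for v in tail.split(",") if v.strip().isdigit()]
    return int(group), component.strip(), values


def swing_percent(values):
    """Peak-to-trough swing as a percentage of the peak value."""
    if not values or max(values) == 0:
        return 0.0
    return 100.0 * (max(values) - min(values)) / max(values)


TCPIP_LINE = "1, TCPIP: 3358720, 3358720, 3358720, 3358720,"  # excerpt
SSL_LINE = """1, SSL and Cryptography: 2195826688, 2200390451, 2131156718, 2001598054,
1963358617, 1908186180, 1840209783, 1707371861, 1525337702, 1403926664,
1203994077, 989983402, 978771285, 1045835776, 1109430408, 1493174681,
1624545416, 1783512541, 1830913092, 1929578222, 1929619046, 1950044706,
2146023833, 2117209838,"""

for line in (TCPIP_LINE, SSL_LINE):
    group, component, values = parse_tm_line(line)
    print(f"{component}: {len(values)} samples, swing {swing_percent(values):.1f}%")
```

A swing of zero matches the "static" TCPIP case, while a large swing like the SSL one marks a component worth a closer look.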
Memory Health check alerting

• Similar to CPU alerting, Health Monitoring also allows for memory usage alerts.
• Statistics->Health Monitoring->General shows:
• Memory utilization

• Current state

• Health shows states in prioritized order: Critical, Warning, OK
Memory Health check alerting

• Health Monitor states are controlled by configurable thresholds. Changes in state trigger alerts to all of the configured alerting mechanisms.
• Memory warning threshold is 90% and critical threshold is 95%
• Interval times are the same; 120 seconds
• TCP regulation (a memory protection function that delays new requests, more on this on the next slide) occurs at 80% memory usage. Changing the warning threshold to 75% will alert you ahead of this event.
• Use the warning threshold to proactively alert administrators of upcoming memory issues that need to be resolved.
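
The effect of lowering the warning threshold can be seen by classifying a few memory percentages against the numbers on this slide. This is a simplified Python sketch: it ignores the 120 second interval, and treating a value equal to a threshold as crossing it is an assumption made here. The function names are illustrative.

```python
WARNING_DEFAULT = 90   # % memory, from the slide
CRITICAL_DEFAULT = 95  # % memory, from the slide
TCP_REGULATION = 80    # % memory at which TCP regulation occurs, from the slide


def memory_state(percent, warning=WARNING_DEFAULT, critical=CRITICAL_DEFAULT):
    """Health state for a memory percentage (interval timing is not modelled)."""
    if percent >= critical:
        return "Critical"
    if percent >= warning:
        return "Warning"
    return "OK"


def in_tcp_regulation(percent):
    """True when memory usage has reached the TCP regulation level."""
    return percent >= TCP_REGULATION


for percent in (70, 78, 82, 91, 96):
    print(
        f"{percent}%  default: {memory_state(percent):8s} "
        f"warning=75: {memory_state(percent, warning=75):8s} "
        f"TCP regulation: {in_tcp_regulation(percent)}"
    )
```

At 82% the default thresholds still report OK even though TCP regulation is already in effect, which is the gap the 75% warning threshold closes.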
Identifying memory issues

• Understanding TCP Acceptance Regulation


• At 80% memory pressure the ProxySG will go into TCP Regulation.
• When this occurs the proxy will STOP accepting new TCP connections until memory drops below the threshold (lower limit).
• Recorded in the event log and Threshold Monitor statistics.
Identifying memory issues

• Understand memory usage patterns as a climb in memory usage might not always be an indication of an issue
• Memory will rise and fall
during operational times

• A leak is an ever
increasing value over
time.

• A small, slow leak can be


hidden within normal
usage
Identifying and investigating an active memory issue

• When investigating an active memory issue the main goal is to first identify the component(s) using the most memory.
• Check health status in the management console UI. Is it Warning or Critical state?
• Observe memory usage trends in Statistics->Summary->Device
• Check memory statistics in Statistics->System->Resources->Memory Use
• Access Threshold Monitor’s advanced URL, save output and analyze the data to find the suspect component.
• If the management console UI is unresponsive use the CLI (SSH or serial console) to enter enable mode and save output of ‘show advanced-url /TM/Statistics’ command.
Identifying and investigating an active memory issue

• Once a suspect component is identified analysis can begin


1, Configuration: 2689495040, 2756556526, 3015291153, 3166623744, 3238112597,
3353121177, 3356315648, 3426601642, 3452166144, 3385105203, 3356364800,
3407458850, 3452166144, 3452166144, 3452165597, 3452149760, 3452149760,
3452149760, 3448963481, 3547983872, 3598999278, 3643637760, 3621263906,
3643473920,

• Suspect: Configuration
• Investigation:
• Configuration change?
• Director involvement?
• Configuration failures in event logs
• Unusual component to be leaking memory
• This is a bug
• More examples like this in the troubleshooting section
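
A few simple indicators help separate an ever increasing value from normal rise and fall. The Python sketch below computes them for the Configuration series above. The indicators and the 1% "ends at peak" tolerance are assumptions chosen for illustration, not an official leak test, and a real judgement still needs the longer daily and monthly groupings.

```python
# Configuration component samples from the example above (bytes, newest last).
CONFIGURATION = [
    2689495040, 2756556526, 3015291153, 3166623744, 3238112597,
    3353121177, 3356315648, 3426601642, 3452166144, 3385105203, 3356364800,
    3407458850, 3452166144, 3452166144, 3452165597, 3452149760, 3452149760,
    3452149760, 3448963481, 3547983872, 3598999278, 3643637760, 3621263906,
    3643473920,
]


def leak_indicators(values):
    """Summarise how a memory series moved between its first and last sample."""
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    return {
        "net_growth_percent": 100.0 * (values[-1] - values[0]) / values[0],
        "rising_steps": sum(1 for step in steps if step > 0),
        "falling_steps": sum(1 for step in steps if step < 0),
        # True when the newest sample is within 1% of the highest value seen.
        "ends_at_peak": values[-1] >= 0.99 * max(values),
    }


indicators = leak_indicators(CONFIGURATION)
for key, value in indicators.items():
    print(f"{key}: {value}")
```

Large net growth, far more rising than falling steps, and a series that ends at its peak are together consistent with the leak pattern described earlier, where small dips never return memory to its starting level.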
Identifying and investigating a past Memory issue

• Investigating a past memory issue is difficult as this is usually done after a reboot. A reboot clears all of the memory related statistics.
• Aspects of investigating CPU issues apply to memory issues
• Analysis of the graphs in the management console UI can be helpful.
• Statistics->Protocol details, most proxy types include client count graphs. A spike in clients can cause memory issues.
• Statistics->Traffic mix, look for large spikes in load
• SNMP monitoring can greatly assist in this type of investigation
Identifying and investigating a past Memory issue

• If nothing is found in the graph data, statistical analysis is an option using PDM data which we previously discussed for CPU issue investigation.
• Memory statistics are included with CPU statistics in the same
snapshot and heartbeat historical data.
• PDM statistical data example:
system:memory-usage~hourly@Tue, 21 Oct 2014 15:34:00 UTC[33](60, 60): 78 79
78 78 78 78 79 79 78 79 79 79 79 78 78 78 78 78 78 79 79 78 79 79 79 79 79 79
79 79 79 79 79 79 79 79 79 79 79 79 79 79 78 79 78 79 79 79 79 79 79 79 79 76
76 76 76 76 76 76
Identifying and investigating a past memory issue

• All memory related data available:


system:memory-usage~hourly@Tue, 21 Oct 2014 15:34:00 UTC[33](60, 60): 78 79 78 78 78 78 79 79 78 79
79 79 79 78 78 78 78 78 78 79 79 78 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 78
79 78 79 79 79 79 79 79 79 79 76 76 76 76 76 76 76
system:memory-usage~daily15minute@Tue, 21 Oct 2014 15:30:00 UTC[61](96, 900): 70 70 70 70 70 71 71
72 73 73 73 74 74 75 74 74 74 74 74 74 74 74 72 72 71 71 71 70 70 70 70 70 69 69 68 69 68 68 67 66
66 67 66 65 64 66 66 66 66 66 66 66 66 66 66 66 66 66 66 65 65 66 66 66 66 66 65 64 63 62 62 62 62
62 63 62 63 64 65 66 67 69 72 74 75 76 76 77 77 77 77 78 78 78 79 78
system:memory-usage~daily@Tue, 21 Oct 2014 15:00:00 UTC[14](24, 3600): 70 70 72 74 74 74 71 70 69
68 67 65 66 66 66 65 66 63 62 63 67 74 77 78
system:memory-usage~weekly@Tue, 21 Oct 2014 12:00:00 UTC[09](28, 21600): 0 0 0 0 0 0 0 0 0 0 0 0 0
21 60 55 56 56 55 53 54 55 54 55 68 72 66 64
system:memory-usage~monthly@Tue, 21 Oct 2014 00:00:00 UTC[16](31, 86400): 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 57 54 62
system:memory-usage~yearly@Sun, 19 Oct 2014 00:00:00 UTC[44](52, 604800): 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8
Identifying and investigating a past memory issue

• Correlate memory usage trends to trends found in the Threshold Monitor statistics. Example:
system:memory-usage~daily@Tue, 21 Oct 2014 15:00:00 UTC[14](24, 3600): 70 70
72 74 74 74 71 70 69 68 67 65 66 66 66 65 66 63 62 63 67 74 77 78
2, SSL and Cryptography: 2915475456, 2923571063, 2963431150, 2999666824,
2990583125, 2951294020, 2793216955, 2659983633, 2433017173, 2283298542,
2088039628, 1958240529, 1905309832, 1818618265, 1810595976, 1804682854,
1752943001, 1280302557, 1158922376, 1299194675, 1788627217, 2681711684,
2884126446, 2936206677,

• In this example we can see how the drops and increases in SSL’s memory usage match the memory usage percentage.
• Investigating SSL load, worker counts, connections, etc. is the next step.
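
How closely two series move together can also be expressed as a single number. The Python sketch below computes a Pearson correlation coefficient between the daily memory percentage and the SSL component samples from the example above, assuming the two 24-sample series cover the same hours. Pearson correlation is a technique swapped in here for illustration and is not something the ProxySG reports.

```python
from math import sqrt

# system:memory-usage~daily samples from the example above (percent).
MEMORY_PERCENT = [
    70, 70, 72, 74, 74, 74, 71, 70, 69, 68, 67, 65,
    66, 66, 66, 65, 66, 63, 62, 63, 67, 74, 77, 78,
]

# "SSL and Cryptography" Threshold Monitor samples from the example above (bytes).
SSL_BYTES = [
    2915475456, 2923571063, 2963431150, 2999666824,
    2990583125, 2951294020, 2793216955, 2659983633, 2433017173, 2283298542,
    2088039628, 1958240529, 1905309832, 1818618265, 1810595976, 1804682854,
    1752943001, 1280302557, 1158922376, 1299194675, 1788627217, 2681711684,
    2884126446, 2936206677,
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


r = pearson(MEMORY_PERCENT, SSL_BYTES)
print(f"correlation between memory % and SSL bytes: {r:.2f}")
```

A value close to 1 supports treating SSL as the suspect component. Running the same comparison against other components helps rule them out.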
Bandwidth monitoring via the
management console
Bandwidth impact on ProxySG

• Client and server side bandwidth processing directly affects the ProxySG’s resources; CPU and memory.
• Each ProxySG platform is sized to be able to process a certain amount of bandwidth while maintaining an appropriate level of resource usage.
• Increases in bandwidth where the amount is over what was sized for the platform can cause high CPU and/or memory usage.
• Changes in the types of traffic being processed while maintaining the same bandwidth can also cause high CPU and/or memory usage.
• Understanding bandwidth processing is needed for normal operations and problem investigations.
Bandwidth statistics in the management console UI

• Statistics->Traffic Details->Traffic Mix

• Total processed bandwidth for client side, server side and bypassed traffic.
• Multiple durations available
• By service name or proxy type
reporting
• Total bytes counted and
savings calculated
Bandwidth statistics in the management console UI

• Statistics->Traffic Details->Traffic History

• Processed bandwidth for client side, server side and bypassed traffic per service name or proxy type
• Multiple durations available
• Separate graphs showing client and server bandwidth together or individually
• Bytes counted and savings calculated
Bandwidth statistics in the management console UI

• Both Traffic Mix and Traffic History use the same service names
and proxy types.
• Service name tracks bandwidth matching IPs or ports. Within that traffic different proxies can process the traffic.
• Explicit HTTP can contain HTTPS traffic within it
• Proxy reports give better visibility into types of traffic processed
Troubleshooting CPU and memory issues
Troubleshooting CPU and memory issues

• Suspect component identification using CPU monitor or Threshold Monitor speeds up investigation time dramatically.
• Without these, graph data from the management console UI or statistical data from snapshots or heartbeats must be analyzed for correlations.
• For memory issues understand memory usage for normal operations versus a leak
• If a reboot is planned to resolve either issue then a full core should be dumped
• KB http://bluecoat.force.com/knowledgebase/articles/Solution/How-do-I-enable-a-full-core-dump-on-the-ProxySG
Copyright © 2016 Blue Coat Systems Inc. All Rights Reserved.
High CPU in TCPIP component

• CPU Monitor shows high CPU in TCPIP
• Check connection table
• Advanced URL: /TCP/connections
• Time-wait entries (2MSL)
• Attacks
• Check interface statistics
• Statistics->Network-Interface
• Errors?
• Bypass data in transparent deployments
• Extremely high packets per second

CPU 0                        99%
  TCPIP                      90%
  SSL and Cryptography        4%
  HTTP and FTP                3%
  Policy evaluation – HTTP    1%
  Object Store                1%
  Access Logging              1%
  Miscellaneous               1%

TCP connections advanced URL

• TCP connection table


• Advanced URL: /TCP/connections
• Lists all incoming and outgoing
connections
• Can be very large
• Shows problematic clients that open too many connections
• Connection states listed
• Large time_wait lists can consume CPU
• Half-opened connections could be a sign of an attack
High CPU in HTTP or Policy components

• CPU Monitor shows high CPU in HTTP or Policy
• Check active sessions
• Many connections from a single client?
• Suspicious destinations?
• User counts and connections
• Connection table
• TCP user counts in advanced URL: /TCP/users
• View access log for suspicious activity live on the ProxySG
• Statistics->Access Logging, Tail main access log

CPU 0                        99%
  Policy evaluation – HTTP   64%
  SSL and Cryptography       16%
  HTTP and FTP               10%
  TCPIP                      10%
  Object Store                1%
  Access Logging              1%
  Miscellaneous               1%
TCP users advanced URL

• User information tracked in the TCPIP stack 


• Advanced URL: /TCP/users
• Active users list shows list of IP addresses and number of connections
they have opened

• High connection counts on a single IP can be a sign of an issue


High CPU in SSL component

• CPU Monitor shows high CPU in SSL and Cryptography
• SSL interception consumes CPU
• Configurations to lower CPU:
• Disable DHE support
• Increase certificate timeout
• Add splash text to policy
• KB Article: http://bluecoat.force.com/knowledgebase/articles/Solution/000024136
• SSL interception on exception default mode in SGOS 6.2+
• Add splash text to SSL interception on exception policy rule

CPU 0                        99%
  SSL and Cryptography       62%
  HTTP and FTP               20%
  Policy evaluation – HTTP   13%
  TCPIP                       7%
  Object Store                1%
  Access Logging              1%
  Miscellaneous               1%
CPU issues caused by general load

• Proxy operations involve many components
• HTTP, SSL, TCPIP, Policy, DNS, Object Store
• High CPU issues divided amongst these components typically point to sizing issues (high bandwidth, high user counts)
• Busiest: HTTP, SSL, Policy
• Middle: Object store, TCPIP
• Lowest: DNS, Access Logging

CPU 0                        99%
  SSL and Cryptography       33%
  HTTP and FTP               25%
  Policy evaluation – HTTP   20%
  TCPIP                      15%
  Object Store                7%
  Access Logging              1%
  Miscellaneous               1%
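
The difference between a single suspect component and general load can be roughed out from the component percentages. In the Python sketch below, a component holding at least half of the listed CPU is treated as dominant. That 50% cut-off and the `load_pattern` helper are assumptions made for this illustration only. The two inputs are the TCPIP and general load CPU Monitor examples from this section.

```python
TCPIP_CASE = {
    "TCPIP": 90, "SSL and Cryptography": 4, "HTTP and FTP": 3,
    "Policy evaluation – HTTP": 1, "Object Store": 1,
    "Access Logging": 1, "Miscellaneous": 1,
}

GENERAL_LOAD_CASE = {
    "SSL and Cryptography": 33, "HTTP and FTP": 25,
    "Policy evaluation – HTTP": 20, "TCPIP": 15, "Object Store": 7,
    "Access Logging": 1, "Miscellaneous": 1,
}


def load_pattern(components, dominant_share=0.5):
    """Return (pattern, busiest component) for one CPU's component percentages.
    `dominant_share` is an illustrative cut-off, not a ProxySG setting."""
    total = sum(components.values())
    name, percent = max(components.items(), key=lambda item: item[1])
    if total and percent / total >= dominant_share:
        return "single component", name
    return "general load", name


print(load_pattern(TCPIP_CASE))
print(load_pattern(GENERAL_LOAD_CASE))
```

Percentages are compared against their own sum rather than against 100 because CPU Monitor rounds each value up, so the components can add up to slightly more than the CPU total.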
CPU issues caused by general load

• Check the following when CPU pattern looks like general load:
• Did something in the environment change that triggered the CPU?
• More traffic moving from HTTP to HTTPS?
• Cloud services being adopted?
• Bandwidth being processed from Traffic Mix graphs. Is the proxy sizing correct?
• User counts and connections, are the values expected?
• Statistics->System->Resources->Concurrent Users
• TCP Users advanced URL: /TCP/users
• Active sessions, clients accessing data that should be controlled?
High memory in HTTP/TCP/SSL/ADN components

• Analyze load in Traffic Mix. Is the bandwidth too high?
• Examine connection counts, users, and worker counts
• Connection table: /TCP/connections
• User’s connections: /TCP/users
• Management console UI Statistics->Protocol details
• Each protocol has a worker or client count, are the values the same as the baseline?
• If load looks good, verify dependency health
• Statistics->Health checks, response times for DNS, Auth, and ICAP
• Statistics->ICAP, queued connections? (sign of an ICAP server issue)
Blue Coat Customer Forums

• Community where you can learn from and share your valuable
knowledge and experience with other Blue Coat customers
• Research, post and reply to topics relevant to you at your own
convenience
• Blue Coat Moderator Team ready to offer guidance, answer
questions, and help get you on the right track
• Access at forums.bluecoat.com and register for an account
today!
Thank you for Joining Today!

• Please provide feedback on this webcast and suggestions for future webcasts to: john.dyer@bluecoat.com
• Webcast replay and slide deck found here within 48 hours: https://bto.bluecoat.com/training/customer-support-technical-webcasts (Requires BTO log-in)
Quick Survey

We are truly committed to continuous improvement for these Technical Webcasts. At the end of the event you will be redirected to a very short survey about satisfaction with this Program. Please help us out by taking two minutes to complete it. Thank you!

Questions for Rob?