
3/15/23, 3:51 PM Mail - Lamar Jones - Outlook

NFIT-Command Center-Incident Response > Closed_P4_APAC_Intermittent Svr Issue_INC0447543: channel conversation from Microsoft Teams
Lamar Jones <lamarjones@hostgator.com>
Wed 3/15/2023 3:51 PM
To: Lamar Jones <lamarjones@hostgator.com>
Microsoft Teams

Peter Jose 13 hours ago


[01:00 pm] Shoma Ann Koshy
Brand Impact: APAC Retail and APAC-BHI
Servers Impacted:
cs2002.webhostbox.net
md-in-27.webhostbox.net
sh002.hostgator.in
sh004.hostgator.in
sh020.webhostingservices.com: APAC-BHI
sh111.webhostingservices.com: APAC-BHI
sh205.hostgator.in
Number of customer connects:
BH Volume: Chats: 23 chats in queue, Calls: 9 calls in queue
Retail: Chats: 13 chats in queue, Calls: 52 calls in queue
Issue Description: Cannot access websites softcure.co.in, mahabalitech.com
Edited

[01:02 pm] Peter Jose


Harish VR/Ravindrakumar Dhote ^

[01:03 pm] Peter Jose


Shoma Ann Koshy - Impact is : Customers cannot access the sites ?

[01:03 pm] Shoma Ann Koshy


Yes

[01:03 pm] Aravind Sridharan


Are these flapping or down?

[01:03 pm] Peter Jose


Copy, Will spin a P3 

[01:03 pm] Shoma Ann Koshy


Now active http alerts for below servers only:
md-ht-1.webhostbox.net    sh002.hostgator.in    sh012.webhostingservices.com   
sh019.webhostingservices.com    sh111.webhostingservices.com

[01:04 pm] Shoma Ann Koshy


Services seem to go down, as there is an alert for the same, but they come back up
after a while

[01:04 pm] Aravind Sridharan


Ok. We are aware of the issues with the server. SA team is checking on it. 

[01:05 pm] Shoma Ann Koshy


ok thanks Aravind Sridharan

Ravindrakumar Dhote 13 hours ago


Sam Varghese

Peter Jose 13 hours ago

[01:06 pm] Shoma Ann Koshy


Is there any particular reason identified, or an ETA which we can provide to
customers?

[01:06 pm] Gopalakrishna Sheni


Aravind Sridharan We have 49 chats in queue 

[01:06 pm] Shoma Ann Koshy


Sunil Kumar Ayush Kodal ^^

[01:06 pm] Gopalakrishna Sheni


Impact is huge 

[01:07 pm] Gopalakrishna Sheni


can we have an incident 

[01:07 pm] Ayush Kodal


20 chats and 10 calls in queue for BH.in

[01:07 pm] Aravind Sridharan


ok

[01:07 pm] Sivaranjani P


Aravind Sridharan impact is high on both the brands 

[01:08 pm] Aravind Sridharan



Understood. We are getting an incident. 


(2 liked)

Ravindrakumar Dhote 13 hours ago


sam is working on sh002.hostgator.in

Peter Jose 13 hours ago


Ravindrakumar Dhote Sam Varghese - Do let us know if any teams need to be
engaged here

Ayush Kodal 13 hours ago


Peter Jose Can you please page Ajay singh to add a chat banner for BH.in

Peter Jose 12 hours ago


Ajay Singh/Indranil Ghosh

Sam Varghese 12 hours ago


hi

Sam Varghese 12 hours ago


I was checking the server..and  found that the fstab is missing.. 

Sam Varghese 12 hours ago


could not find backup

Sam Varghese 12 hours ago


so I am trying to recreate it manually.. 

Sam Varghese 12 hours ago


mounted / and /var and server is up now..  trying to figure out the correct home
partition to mount.

Sowmya Iyer 12 hours ago


Ayush Kodal Can you check if chat banner is up now?

Sam Varghese 12 hours ago


server is up now.. 

Sam Varghese 12 hours ago


fstab corrected.. 

Ayush Kodal 12 hours ago


I can see its up now, Thanks

Sam Varghese 12 hours ago


Ravindrakumar Dhote


Sam Varghese 12 hours ago


can you check service.. 

Ravindrakumar Dhote 12 hours ago


checking sam

Sam Varghese 12 hours ago


I am fixing the backup thing.. 

Peter Jose 12 hours ago


FYI - we still have an alert in Grafana for:

hostname=sh002.hostgator.in

Sam Varghese 12 hours ago


I will clear it soon..

Ravindrakumar Dhote 12 hours ago


it will resolve soon

Ravindrakumar Dhote 12 hours ago


services are up and working

Sam Varghese 12 hours ago


thanks

Sam Varghese 12 hours ago


checking few things

Ravindrakumar Dhote 12 hours ago


Thank you Sam

Peter Jose 12 hours ago


So the fix was : Manually recreating the backup and mounting / and /var ?

Ravindrakumar Dhote 12 hours ago


Created the fstab entry which was missing and remounted the partition.

Sam Varghese 12 hours ago


Summary: Initially I found that fstab was missing but the server was coming up. So I
booted the server into single-user mode and recreated a basic fstab, booted the server
back up, then took the fstab from backup, restored it, and rebooted the server.
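
A minimal sketch of how a hand-recreated fstab could be sanity-checked before a reboot, assuming the mount points mentioned in this thread (/, /var and the home partition). The device names in the sample are hypothetical and are not taken from the affected server; this is an illustration, not the procedure the SA team used.

```python
# Mount points the thread mentions; adjust for the real server layout.
REQUIRED_MOUNTS = ("/", "/var", "/home")


def parse_fstab(text):
    """Return {mount_point: (device, fs_type, options)} for non-comment lines."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"malformed fstab line: {raw!r}")
        device, mount_point, fs_type, options = fields[:4]
        entries[mount_point] = (device, fs_type, options)
    return entries


def missing_mounts(text, required=REQUIRED_MOUNTS):
    """List required mount points that have no entry in the fstab text."""
    entries = parse_fstab(text)
    return [m for m in required if m not in entries]


# Hypothetical contents: / and /var recreated, /home not yet identified.
SAMPLE = """\
# hypothetical devices, for illustration only
/dev/sda1  /     ext4  defaults  1 1
/dev/sda2  /var  ext4  defaults  1 2
"""

print(missing_mounts(SAMPLE))  # ['/home']
```

With the sample above the check reports /home as missing, which mirrors the state described earlier where / and /var were mounted and the home partition was still being identified.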

Peter Jose 12 hours ago


Alerts are cleared too, Do we have any pending actions here ?


Sam Varghese 12 hours ago


Still need some fixes, I am checking that.. but services are good now..

Sam Varghese 12 hours ago


I will update in the inc.. 

Peter Jose 12 hours ago


Copy, Let me de-escalate this to a P4 as you finish up the remedial actions

Sam Varghese 12 hours ago


sure thanks

Shoma Ann Koshy 12 hours ago

> Summary: Initially I found that fstab was missing but the server was
coming up. So I booted the server into single-user mode and recreated a basic
fstab, booted the server back up, then took the fstab from backup, restored
it, and rebooted the server.

This was the fix for which server? 

Mrityunjay Trivedi 11 hours ago


Hi all, I am taking over as IM for this incident.

Mrityunjay Trivedi 11 hours ago


Sam Varghese is there any update/progress on the pending actions?

Ayush Kodal 10 hours ago


SH033.webhostingservices.com server is having a lot of intermittent issues. Can
someone check if we have any active alert for that server?

Mrityunjay Trivedi 10 hours ago


Yup, it's alerting in Grafana as well. Sam Varghese can you please check.

Sivaranjani P 10 hours ago

 

can see few other servers as well 

Ayush Kodal 10 hours ago


sh050.webhostingservices.com| rajukarnak.com | 162.214.80.130 
Cpanel and websites inaccessible 

Sahul Hameed 10 hours ago


Servers: sh050.webhostingservices.com
sh033.webhostingservices.com
SH017.webhostingservices.com
sh020.webhostingservices.com
 
The above servers are also facing issues; cPanel and websites are not accessible.
 
Can this be checked?

Mrityunjay Trivedi 10 hours ago


getting teams

Mrityunjay Trivedi 10 hours ago


 We are engaging teams. Meanwhile, may I know how many connects we have
received so far?

Mrityunjay Trivedi 10 hours ago


Sunny Thakur is checking on the issue.

Sunny Thakur 10 hours ago


Checking the servers 

sh050.webhostingservices.com
sh033.webhostingservices.com
SH017.webhostingservices.com
sh020.webhostingservices.com

Sunny Thakur 10 hours ago


Also can you share the connects we are currently facing 

Mrityunjay Trivedi 10 hours ago


Ayush Kodal, Sahul Hameed ^^

Ayush Kodal 10 hours ago


I have 15 connects tagged to SRQBHI-461

Ayush Kodal 9 hours ago


sh050.webhostingservices.com| rajukarnak.com | 162.214.80.130

Ayush Kodal 9 hours ago


sh021.webhostingservices.com| camera24.in | 162.214.80.61


Mrityunjay Trivedi 9 hours ago


As we are getting connects, bumping it back to P3.

Mrityunjay Trivedi 9 hours ago


May I know, how we are progressing here?
Ayush Kodal please update us with the connect volumes.

Ayush Kodal 9 hours ago


0 chats 4 calls in queue. Volume is manageable as of now. 

Sunny Thakur 9 hours ago

sh050.webhostingservices.com
sh033.webhostingservices.com

Above servers are stable for now 

Mrityunjay Trivedi 9 hours ago


May we know the corrective action for these servers?

Sahul Hameed 9 hours ago


Sunny Thakur Still we are unable to access cpanel on sh050.webhostingservices.com
server

Sunny Thakur 9 hours ago


noted will check

Vysakh Nair 9 hours ago


sh008.hostgator.in | 162.241.85.160 | mysql server is offline | connects: naukriuae.com,
naukriuae.com, hilknightly.com

Ashish Shetty 9 hours ago


We are getting connects for this server from last 45+ mins 

Mrityunjay Trivedi 9 hours ago


Sunny Thakur can we please check this

Brand Impact: APAC Retail
Server Impacted: sh008.hostgator.in
Connects: naukriuae.com, naukriuae.com, hilknightly.com
Number of customer connects: Retail: Chats: 13 chats in queue
Issue Description: mysql server is offline, getting below error

Sunny Thakur 9 hours ago


Okay checking sh008.hostgator.in

Mrityunjay Trivedi 9 hours ago


 
Thanks

Sunny Thakur 9 hours ago


Sahul Hameed can you check cpanel sh050.webhostingservices.com server

Sunny Thakur 9 hours ago


let me know for which customer cpanel is not loading

Sahul Hameed 9 hours ago


sh050.webhostingservices.com server

Sahul Hameed 9 hours ago


https://i.bluehost.in/cgi/admin/user?type=domain&entry=adhyansh.in&x=0&y=0

Sahul Hameed 9 hours ago


Now its loading Sunny Thakur

Mrityunjay Trivedi 9 hours ago


so as of now, the only affected server is sh008.hostgator.in or are we seeing connects
for any other server too?

Mrityunjay Trivedi 9 hours ago


Ayush Kodal , Ashish Shetty ^^

Sunny Thakur 9 hours ago


mysql is up and running for sh008.hostgator.in


Mrityunjay Trivedi 9 hours ago


Thanks Sunny

Sahul Hameed 9 hours ago


Sunny Thakur again the issue persists on sh050.webhostingservices.com server

Sahul Hameed 9 hours ago


Getting multiple connects for the same not able to access cpanel and website

Sahul Hameed 9 hours ago


Sivaranjani P

Sunny Thakur 9 hours ago


Yes, there is an SSL attack on the sh050.webhostingservices.com server, I'm working on it

Sivaranjani P 9 hours ago


Did we engage Netops to mitigate the attack?

Mrityunjay Trivedi 8 hours ago


Do we need Netops Sunny Thakur?

Sunny Thakur 8 hours ago


not needed as of now 

Mrityunjay Trivedi 8 hours ago


Noted, let me know in case we need to engage any other teams.

Mrityunjay Trivedi 8 hours ago


Can we get an update on connects volume?

Ashish Shetty 8 hours ago


We have 15 chats and 4 calls in queue for retail

Mrityunjay Trivedi 8 hours ago


And how many are related to ongoing issue?

Ashish Shetty 8 hours ago


we wont be able to tell that till they connect to us. 

Sunny Thakur 8 hours ago

sh050.webhostingservices.com server is fine atm

https://i.bluehost.in/cgi/admin/user?type=domain&entry=adhyansh.in&x=0&y=0 -
this is loading


Sunny Thakur 8 hours ago


may i know any connects we have for this server

Ashish Shetty 8 hours ago


Ayush Kodal Kuldeep Singh can you check this ^

Akshay Rao 8 hours ago


currently, we do have 3 connects, but the services are going down intermittently.

Sivaranjani P 8 hours ago


Sunny Thakur why are the services going down intermittently?

Ishani Pandya 8 hours ago


Can we add PSSH for visibility on this? 

Sunny Thakur 8 hours ago


Yes please 

Mrityunjay Trivedi 8 hours ago


Sure

Sunny Thakur 8 hours ago


Akshay Rao can you share the connects list 

Mrityunjay Trivedi 8 hours ago


Gabriel Pineda

Akshay Rao 8 hours ago


https://i.bluehost.in/cgi/admin/user/cpanel/star-knowledge.com
https://i.bluehost.in/cgi/admin/user/cpanel/oyr.jup.mybluehostin.me
https://i.bluehost.in/cgi/admin/user?type=domain&entry=techieindia.in&x=0&y=0

Sunny Thakur 8 hours ago


Mrityunjay Trivedi mysql issue for sh008.hostgator.in was different one

Ishani Pandya 8 hours ago


Hi Gabriel Pineda we have an open Problem on APAC sites' performance issues.
Today, we've received higher connects than usual with multiple servers listed above.
Despite workarounds and reboots, it doesn't seem to be fully resolved.

Ishani Pandya 8 hours ago


SA, do we have server names alerting right now? I'd like to know if Monarx has been
deployed on these servers and if we've done any other troubleshooting besides
restarts

Sunny Thakur 8 hours ago


Akshay Rao cpanel is loading fine for all below 
https://i.bluehost.in/cgi/admin/user/cpanel/star-knowledge.com
https://i.bluehost.in/cgi/admin/user/cpanel/oyr.jup.mybluehostin.me
https://i.bluehost.in/cgi/admin/user?type=domain&entry=techieindia.in&x=0&y=0

Akshay Rao 8 hours ago


Yes, as stated above it's happening intermittently; a few mins back cPanel was not
accessible.
This is causing issues as customers are facing it every now and then. For
the above-stated accounts, the customers are still on chat complaining about the same.

Sunny Thakur 8 hours ago


Some time back there was an attack: Apache couldn't handle the requests as there
was a port 443 attack on the server, which is fixed now.
Currently I do not see any issue while loading cPanel for the above ones.
Akshay Rao Sivaranjani P

Mrityunjay Trivedi 8 hours ago

 
FYI Ishani Pandya
bh-ht-17.webhostbox.net , sh011.webhostingservices.com, sh014.webhostingservices.com,
sh017.webhostingservices.com

Ishani Pandya 8 hours ago


And this has been intermittent for 5 hours?

Ishani Pandya 8 hours ago


Or are these completely new servers?

Mrityunjay Trivedi 8 hours ago


Some are new and some are not. But yes, it's quite intermittent in the Grafana monitoring
tool as well.

Servers Impacted:
cs2002.webhostbox.net
md-in-27.webhostbox.net
sh002.hostgator.in
sh004.hostgator.in
sh020.webhostingservices.com: APAC-BHI
sh111.webhostingservices.com: APAC-BHI
sh205.hostgator.in
sh033.webhostingservices.com
SH017.webhostingservices.com
sh020.webhostingservices.com
sh021.webhostingservices.com

Ishani Pandya 8 hours ago


Thanks. Was PSSH paged? It looks like whatever SA tried to mitigate the issue
temporarily isn’t working


Mrityunjay Trivedi 7 hours ago


Yes, we have paged, and it was acked by Gabriel Pineda

Gabriel Pineda 7 hours ago


i'm looking at sh011 right now

Sunny Thakur 7 hours ago


Currently bh-ht-17 is alerting, and I'm checking sh014.webhostingservices.com, where mysql
was down

Sunny Thakur 6 hours ago


Forgot to update here, bh-ht-17 & sh014.webhostingservices.com - fixed
Do we have any issues or customer connects?

Mrityunjay Trivedi 6 hours ago


How things are looking now Ashish Shetty?

Gabriel Pineda 6 hours ago


So, discussing various things with Sunny Thakur. Don't think there's a single reason
here we can point to. There are things we can probably work on.
1) shXXX servers that are in Provo appear to be in MDT instead of UTC. Backups/upcp and
stats processing crons are running during peak traffic hours. On avg APAC servers
tend to have more resource usage per customer, so we def do not want these tasks
running during APAC daytime hours.
2) performance.pl appears to not be running on some of these boxes. The service
appears to have crashed, and I think this is related to the issue we are seeing here where
cPanel tries to change the MySQL password and fails. When performance.pl cannot
connect to MySQL it dies, and dies in a way that systemd thinks it is still running. So it
never gets restarted.
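
A rough sketch of point 1, showing how a nightly cron hour on the server clock lands in APAC daytime. The 02:00 schedule is a hypothetical example (the thread does not state actual cron times), and America/Denver is assumed here as the IANA zone for a server set to MDT.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

IST = ZoneInfo("Asia/Kolkata")


def cron_time_in_ist(hour, minute, server_tz, on_date=(2023, 3, 15)):
    """Convert a cron time on the server's clock to IST wall-clock time."""
    local = datetime(*on_date, hour, minute, tzinfo=ZoneInfo(server_tz))
    return local.astimezone(IST).strftime("%H:%M")


# Hypothetical 02:00 nightly job, evaluated on the incident date.
for tz in ("America/Denver", "UTC"):
    print(tz, "02:00 ->", cron_time_in_ist(2, 0, tz), "IST")
```

Under these assumptions a 02:00 job on an MDT server runs at 13:30 IST, and the same job on a UTC server runs at 07:30 IST, so neither falls in the APAC night.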

Ashish Shetty 6 hours ago


nothing in queue for retail Mrityunjay Trivedi 

Ashish Shetty 6 hours ago


Akshay Rao I see some calls are in queue, can you check if we still have any server
connects in BHI

Akshay Rao 6 hours ago


not at the moment

Mrityunjay Trivedi 6 hours ago


Are we good to downgrade this to P4 and keep it under monitoring for few hours?

Mrityunjay Trivedi 6 hours ago


If no objections, downgrading this to P4.

Ashish Shetty 6 hours ago


Sure 


Akshay Rao 6 hours ago


Sure

Mrityunjay Trivedi 6 hours ago


Sunny Thakur may we know, at high level what we have done to fix the issue ?

Sunny Thakur 6 hours ago


For the ones where we had customer connects for cPanel/website not loading:

There was high CPU consumption, so we needed to restart Apache, and we
stopped Sophos and restarted Tanium to reduce the high connections
and improve CPU performance.
Blocked a few IPs which were hitting continuously, and blocked a domain
which was having an SSL attack on one of the affected servers,
sh050.webhostingservices.com
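
One way the "IPs which were hitting continuously" might be spotted is by counting requests per client address in a web access log. This is only a sketch: the log lines below are made-up samples using documentation address ranges, the threshold is arbitrary, and nothing here reflects the actual tooling or data from this incident.

```python
from collections import Counter


def top_client_ips(log_lines, threshold):
    """Return (ip, hits) pairs, busiest first, for IPs with at least `threshold` hits.

    Assumes the client address is the first whitespace-separated field,
    as in the common Apache access log layout.
    """
    counts = Counter(
        line.split(maxsplit=1)[0] for line in log_lines if line.strip()
    )
    return [(ip, n) for ip, n in counts.most_common() if n >= threshold]


# Made-up sample lines; 203.0.113.0/24 and 198.51.100.0/24 are documentation ranges.
SAMPLE_LOG = [
    '203.0.113.7 - - [15/Mar/2023:13:00:01 +0530] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [15/Mar/2023:13:00:01 +0530] "GET / HTTP/1.1" 200 512',
    '198.51.100.4 - - [15/Mar/2023:13:00:02 +0530] "GET /about HTTP/1.1" 200 734',
    '203.0.113.7 - - [15/Mar/2023:13:00:02 +0530] "GET / HTTP/1.1" 200 512',
]

print(top_client_ips(SAMPLE_LOG, threshold=3))  # [('203.0.113.7', 3)]
```

The output is a candidate list for review only; whether an address should be blocked is a judgment call for the resolver team.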

Mrityunjay Trivedi 5 hours ago


Thanks. keeping it under monitoring.

Ashna Vahab 5 hours ago


sh007.hostgator.in|69.49.227.135| cpanel and website inaccessible
|connects:badgefree.com

Ashish Shetty 5 hours ago


Mrityunjay Trivedi ^^

Varsha Kumari 5 hours ago


Sunny Thakur can you check this server ?

Sunny Thakur 5 hours ago


yes there is alert for sh007.hostgator.in, will check and update

Gabriel Pineda 5 hours ago


is already responding again

Ashish Shetty 5 hours ago


Vinay Miranda Will be night MOD
EOS


Sunny Thakur 5 hours ago


yes working  
sh007.hostgator.in|69.49.227.135| cpanel and website inaccessible
|connects:badgefree.com

Ashna Vahab 4 hours ago


sh008.hostgator.in|162.241.85.160| cpanel and website inaccessible
|connects:tannerspride.com, firsefit.com

Gabriel Pineda 4 hours ago


This might be outside of the scope of what can be done in this incident. But who
would we need to check with if we want to possibly change the timezone for APAC
boxes so that they are in IST instead of a combination of MDT for some and UTC for
others? Generally speaking it is better if they are in a timezone that matches where the
majority of the traffic is coming from. Right now crons run on some servers during
basically the peak traffic hours of the box, since they are set to run based on night time for
the timezone of the server.

Gabriel Pineda 4 hours ago


Not saying this is the root issue for all our problems, but for APAC servers we are trying to
find anything we can do to make them more stable

Ishani Pandya 4 hours ago


Hi Gabe, we have a Problem meeting this afternoon. I think we can assign this out to
someone

Ishani Pandya 4 hours ago


It wouldn't be addressed in this incident unfortunately 

Gabriel Pineda 4 hours ago


ty. It would be good if we can raise that question there. 

Lamar Jones 3 hours ago


Hi teams I will be taking over here as Incident Manager. Please let us know if
anything is needed from our end, or when we have new information to share from
the upcoming problem meeting.

Lamar Jones 29 minutes ago


Hi teams, is there any new information to be shared from the Problem meeting? Are
we good to call this closed?

Kristal Cerutti-Harden 29 minutes ago


teams have taken the suggested items in the Problem Meeting

Lamar Jones 24 minutes ago


Sounds good, we will be closing this incident shortly.

Lamar Jones just now


Summary: On 2023-03-15 at 02:08:43 we received reports that 7 servers were giving
HTTP alerts, which caused degraded web services for customers. The resolver team
identified that the filesystem table was missing, and they restored it from the backup
and rebooted the server to resolve the issue. We downgraded to a P4 to monitor,
and as we were in a monitoring state we received new connects and escalated it back
to a P3 status. Resolver teams were reengaged and identified there was a high influx
of traffic coming from a particular source. Teams blocked these IP addresses and
restarted services to resolve the issue. Teams then associated this incident with
PRB0042243 to complete remedial actions.
