
5.2 reset issues:
Mainly hitting this: "/var/log is not a mountpoint"
https://rubrik.atlassian.net/wiki/spaces/~peter.abromitis/pages/961970665/More+Upgrade+Woes+and+cluster+operations+we+ve+seen+in+CDM+5.2+and+later...#SDRESET-on-5.2-prior-to-5.2.0-p2-is-stuck
To avoid it, start with a reboot:
1. Reboot:
sudo systemctl --force --force reboot
2. Reconnect, check the logs, create .rubrik_install_in_progress so you won't be disconnected during sdreset, remove the lock/cron files if they exist, and run sdreset:
sudo touch /home/ubuntu/.rubrik_install_in_progress
tail /tmp/sdtests/reset_node.out.txt
sudo ls /home/ubuntu/.rubrik_install_in_progress

sudo ls /var/lib/rubrik/sdreset_lock
sudo ls /etc/cron.d/sdreset_crontab
sudo rm -rf /home/ubuntu/.rubrik_install_in_progress
sudo rm /var/lib/rubrik/sdreset_lock
sudo /opt/rubrik/src/scripts/dev/sdreset.sh
rkcli cluster reset_node force
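The recovery steps above can be collected into one sequence. A sketch only, using the Rubrik-internal paths and tools exactly as written in these notes (not a supported script):

```shell
# Sketch of the sdreset recovery sequence from the notes above.
# All paths/tools are CDM-internal, taken as-is from the notes.
sdreset_recover() {
    sudo touch /home/ubuntu/.rubrik_install_in_progress   # avoid being disconnected during sdreset
    sudo rm -f /var/lib/rubrik/sdreset_lock               # clear a stale lock from a stuck run, if any
    sudo /opt/rubrik/src/scripts/dev/sdreset.sh           # run the reset
    tail /tmp/sdtests/reset_node.out.txt                  # check its output
    sudo rm -f /home/ubuntu/.rubrik_install_in_progress   # clean up the marker afterwards
}
```

Wrapped in a function here so it can be pasted into a session and invoked deliberately.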

keep_broadcast_interface
skip_ipmi_network_reset

Create a custom sdreset script if needed, then run it:

sudo vim /opt/rubrik/src/scripts/dev/sdreset_custom.sh
sudo chmod 777 /opt/rubrik/src/scripts/dev/sdreset_custom.sh
cd /opt/rubrik/src/scripts/dev/
sudo touch /home/ubuntu/.rubrik_install_in_progress
sudo ./sdreset_custom.sh
sudo rm -rf /home/ubuntu/.rubrik_install_in_progress
sudo rm /opt/rubrik/src/scripts/dev/sdreset_custom.sh

For in-place node replacement, preserve the data:

date; time sudo PRESERVE_HDD=1 /opt/rubrik/src/scripts/dev/sdreset.sh

To keep the broadcast domain intact:

date; time sudo KEEP_BROADCAST_INTERFACE=1 /opt/rubrik/src/scripts/dev/sdreset.sh
After sdreset completes successfully:
sudo mv /etc/cron.d/sdreset_crontab /etc/cron.d/.sdreset_crontab
If sdreset hangs for a couple of minutes on "Waiting for cluster config and node monitor to be available":
Ctrl+C
sudo reboot
Rerun sdreset.

sudo /sbin/vconfig add bond0 132
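vconfig is deprecated on newer distributions; the iproute2 equivalent of the command above is sketched below (assumes iproute2 is present on the node; adjust parent interface and VLAN ID as needed):

```shell
# iproute2 equivalent of `vconfig add bond0 132`.
# Needs root and an existing parent interface, so wrapped in a
# function rather than run directly.
add_vlan() {
    local parent="${1:-bond0}" vid="${2:-132}"
    sudo ip link add link "$parent" name "${parent}.${vid}" type vlan id "$vid"
    sudo ip link set "${parent}.${vid}" up
}
```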

MANAGEMENT NETWORK:
---------------------------------------
IP Address:
Subnet Mask:
Gateway IP:

DATA NETWORK (OPTIONAL):
---------------------------------------
IP Address:
Subnet Mask:

IPMI NETWORK:
---------------------------------------
IPMI IP Address:
Subnet Mask:
Gateway IP:

VLAN CONFIG:
---------------------------------------
VLAN ID:
VLAN IP:

rkcl exec all 'sudo /opt/rubrik/src/scripts/node-monitor/hw_health.sh 2> /dev/null | grep -A2 FRU' | paste - - -

rubrik_tool.py create_route_config 0.0.0.0 0.0.0.0 10.133.232.1 bond0

tail -f /var/log/node-monitor/current | grep -ie 'grace\|consec'

rktail -f /var/log/health-monitor/current | grep -ie 'grace\|consec'

tail -f /var/log/node-monitor/current | grep -ie 'validation'

scp -6 trusted-certificates.pem rksupport@\[fe80::3eec:efff:fe4f:a003%bond0.2226\]:/opt/rubrik/conf/release_signing/

scp -6 rubrik-image-8.0.0-p2-21860.tar.gz* rksupport@\[fe80:0:0:0:3eec:efff:fe4e:ec07%bond0\]:/home/rksupport/

scp rubrik-image-7.0.3-p2-16069.tar.gz* rksupport@10.70.0.104:/home/rksupport/

sudo service avahi-daemon start; avahi-browse -rat | grep -i -A1 rvm | grep -i -A2 ipv6 | grep -v _rubrik._tcp | grep -A2 ^= | grep -A2 bond0

scp trusted-certificates.pem rksupport@10.41.174.11:/opt/rubrik/conf/release_signing/

scp -6 trusted-certificates.pem rksupport@\[fe80::3eec:efff:fe20:e3ce%bond0\]:/opt/rubrik/conf/release_signing/

rkcl exec RVM184S002313,RVM183S049171,RVM184S014090,RVM183S049338,RVMHM181S004249,RVMHM185S002700,RVMHM185S001993,RVMHM188S004896 "ifconfig bond0 | grep -i inet6"

scp -6 /opt/rubrik/conf/release_signing/trusted-certificates.pem rksupport@\[fe80:0:0:0:3eec:efff:fe3a:baf3%bond0\]:/opt/rubrik/conf/release_signing/
scp -6 /opt/rubrik/conf/release_signing/trusted-certificates.pem rksupport@\[fe80::3eec:efff:fe20:e3ce%bond0\]:/opt/rubrik/conf/release_signing/
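A pattern for the link-local scp commands above: the brackets must be escaped (or quoted) so the shell doesn't treat them as a glob, and the %zone suffix (e.g. %bond0) is mandatory for fe80:: addresses. A hypothetical helper, with the rksupport user hardcoded as in these notes:

```shell
# Hypothetical helper for scp over an IPv6 link-local address.
# The %zone suffix (e.g. %bond0) is required for fe80:: addresses;
# quoting keeps the shell from expanding the brackets as a glob.
scp6_ll() {
    local src="$1" addr="$2" dst="$3"
    scp -6 "$src" "rksupport@[${addr}]:${dst}"
}
# Usage: scp6_ll trusted-certificates.pem fe80::1%bond0 /opt/rubrik/conf/release_signing/
```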

sudo /opt/rubrik/src/scripts/debug/FailJobTool.sh -jobId STAGE_CDM_SOFTWARE_GLOBAL_253dade3-3f55-40a0-b3dc-3a97791a3257 -instanceId 0
cqlsh -ksd -e "select * from job_instance where job_id='TIER_EXISTING_SNAPSHOTS@PARALLELIZABLE_EXECUTE_TIER_EXISTING_SNAPSHOTS_TIER_EXISTING_SNAPSHOTS_253eb2e2-8f74-4494-a293-b19381150eba_72f91df5-1504-4362-8968-916438185005###0_363' and instance_id=0"

sqlite3 /var/lib/rubrik/node_monitor_check_history.db "select id, datetime(timestamp / 1000, 'unixepoch', 'localtime'), check_name, success from check_history" | sort | cut -d\| -f2- | column -s\| -t | sed 's/0$/Fail/g' | sed 's/1$/Pass/g' | egrep -i 'OtherNodesChecker'
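The datetime(timestamp / 1000, 'unixepoch', 'localtime') in the query above converts the millisecond epochs stored in check_history to readable timestamps. The same conversion in plain shell, for spot-checking a single value (assumes GNU date):

```shell
# Convert a millisecond epoch to a readable UTC timestamp, mirroring
# sqlite's datetime(timestamp / 1000, 'unixepoch').
ms_to_ts() {
    date -u -d "@$(( $1 / 1000 ))" '+%Y-%m-%d %H:%M:%S'
}
ms_to_ts 0   # 1970-01-01 00:00:00
```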
