
1) Rack up the 4 nodes and connect the cabling to all the nodes.

2) A cable must be connected to port 3 or port 4 (which is BOND0), and a cable is mandatory on port 5, which is our IPMI port.

3) Once the cabling to the switch is done from all 4 nodes, we can power on the nodes.
4) Each node should be sitting at its login screen.
5) There will be no communication between the nodes or to the IPMI IPs, as they are still holding the
older IP configuration.
6) Connect to each node using a KVM.
7) Log in to each node as the “rksupport” user.
8) We will be sharing the passwords for all the nodes when we start the procedure.
9) After logging in as rksupport, run the commands below to set up the new IPMI IP
configuration. This step must be done on each node (a filled-in example follows step 11).
• sudo ipmitool lan set 1 ipsrc static
• sudo ipmitool lan set 1 ipaddr <ip_address>
• sudo ipmitool lan set 1 netmask <netmask>
• sudo ipmitool lan set 1 defgw ipaddr <gateway_ip>
10) Keep a record of the new set of IPs used for the IPMI configuration.
11) Once the IPMI IP is changed on all 4 nodes, we should be able to log in to the IPMI of each node
using a browser.
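A filled-in example for one node, using assumed values only (IPMI IP 10.20.25.11, a /24 netmask, and gateway 10.20.25.254 are placeholders; substitute the values allocated for your environment):

    sudo ipmitool lan set 1 ipsrc static
    sudo ipmitool lan set 1 ipaddr 10.20.25.11
    sudo ipmitool lan set 1 netmask 255.255.255.0
    sudo ipmitool lan set 1 defgw ipaddr 10.20.25.254
    # optional: verify the new settings
    sudo ipmitool lan print 1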

FROM THE IPMI, WE NEED TO FOLLOW THE PROCEDURE BELOW:

Step-by-step Guide
1. Stop services on all nodes

sdservice.sh "*" stop 

2. On nodes with the wrong IP, fix the bond configuration. Connect to each node through its IPv6 link-
local address and perform the following steps.

2.1)  Make a copy of /etc/network/interfaces.d/, just in case
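For example (the backup destination is just a suggestion; any location outside /etc/network works):

    sudo cp -a /etc/network/interfaces.d /root/interfaces.d.bak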

2.2)  Go to /etc/network/interfaces.d/ and remove all files except bond0.cfg and bond1.cfg.
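One possible way to do this, assuming nothing else in that directory needs to be preserved:

    # list what would be removed first, then delete everything except the two bond configs
    sudo find /etc/network/interfaces.d/ -maxdepth 1 -type f ! -name 'bond0.cfg' ! -name 'bond1.cfg'
    sudo find /etc/network/interfaces.d/ -maxdepth 1 -type f ! -name 'bond0.cfg' ! -name 'bond1.cfg' -delete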

 2.3) Update bond configuration.


If management and data are on the same network:

    sudo /opt/rubrik/src/scripts/forge/configure_network.py -a -f -g <GATEWAY_IP> -mvl <MANAGEMENT_VLAN> -mip <MANAGEMENT_IP> -mn <MANAGEMENT_NETMASK>

or, if management and data are split:

    sudo /opt/rubrik/src/scripts/forge/configure_network.py -a -f -g <GATEWAY_IP> -mvl <MANAGEMENT_VLAN> -mip <MANAGEMENT_IP> -mn <MANAGEMENT_NETMASK> -dip <DATA_IP> -dn <DATA_NETMASK> -dvl <DATA_VLAN>

This step can also be done manually if preferred. In that case, refer to the section "Standard
form of network configuration file" below for the expected configuration.
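As an illustration only, for the combined management/data case, using the example new IP from the mapping in step 4.7 (the gateway 10.20.26.254, VLAN 100, and /24 netmask are assumptions; substitute your environment's values):

    sudo /opt/rubrik/src/scripts/forge/configure_network.py -a -f -g 10.20.26.254 -mvl 100 -mip 10.20.26.1 -mn 255.255.255.0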

2.4) Restart networking and confirm that the correct IPs are now in use. It is recommended to do this
one node at a time, and to stay logged in through the IPv6 link-local address during the process.

      sudo systemctl restart networking.service

If the restart does not go through, retry it 2-3 times. If networking still fails to restart, reboot the
node.

3. Fix ansible host vars (if multiple nodes need to be fixed):

Pick a driving node.

Manually update /var/lib/rubrik/ansible/host_vars to reflect the new DATA IPs.
This step makes it possible to use rkcl to run the following node-specific steps from the driving
node.

Note: these files are actively managed by the node monitor based on information from the metadata
store. Keep services down so that the manual edits are not overwritten.
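A quick sanity check after editing (the old data IPs shown are the example ones from the mapping in step 4.7; use your cluster's actual old IPs):

    # should return no matches once every host_vars file references only the new DATA IPs
    grep -rn "10.10.222." /var/lib/rubrik/ansible/host_vars/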

4. On all nodes, update cockroach:

4.1 Stop cockroachdb service on node (sudo service cockroachdb stop)

4.2 Update listen_address in /etc/cassandra/cassandra.yaml to indicate the new IP for the node.

4.3 Update the seeds in the seeds_provider section. Randomly pick two of the new IP addresses as
seeds (NOTE: use the same seeds on all nodes).
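For orientation, a minimal sketch of how the relevant cassandra.yaml entries typically look, assuming the stock Cassandra key names (the exact layout of the file on the appliance may differ; the IPs are the example ones from step 4.7):

    # new IP of this node
    listen_address: 10.20.26.1
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              # same two seeds on every node
              - seeds: "10.20.26.1,10.20.26.2"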

4.4 Take a backup of the old cockroachdb certificates (just in case). Then rm -f
/var/lib/rubrik/certs/cockroachdb/node.*
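One way to take that backup (the .bak destination is just a suggestion):

    # keep a copy of the old certificates, then remove the node certificate files
    sudo cp -a /var/lib/rubrik/certs/cockroachdb /var/lib/rubrik/certs/cockroachdb.bak
    sudo rm -f /var/lib/rubrik/certs/cockroachdb/node.*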

4.5 sudo /opt/rubrik/dist/gen_tls_cert_cockroachdb.pex --mode=node --certs-dir='/var/lib/rubrik/certs/cockroachdb'
(this will use the new listen_address listed in cassandra.yaml)

4.6 sudo touch /var/lib/rubrik/flags/cockroach_certs_backed_up


4.7 Create /var/lib/cockroachdb/kronos/re_ip_host_mapping.json (the format of each entry is
OLD_IP : NEW_IP). Note that NEW_IP is what you want to see after the re_ip recovery. The following
example assumes you are moving forward (manually continuing the re_ip). If you are going
BACK (restoring the cluster to its previous IPs), the mappings will be different.
{
    "10.10.222.1": "10.20.26.1",
    "10.10.222.2": "10.20.26.2",
    "10.10.222.3": "10.20.26.3",
    "10.10.222.4": "10.20.26.4"
}

4.8 sudo /opt/rubrik/src/scripts/cockroachdb/rkcockroach kronos cluster backup --data-dir=/var/lib/cockroachdb/kronos

4.9 sudo touch /var/lib/rubrik/flags/kronos_metadata_backed_up

4.10 sudo /opt/rubrik/src/scripts/cockroachdb/rkcockroach kronos cluster re_ip --mapping-file=/var/lib/cockroachdb/kronos/re_ip_host_mapping.json --data-dir=/var/lib/cockroachdb/kronos

4.11 Start cockroachdb service on node (sudo service cockroachdb start)

If 4.10 or 4.11 fails, double-check whether the new IPs include IP(s) reused from previously removed
nodes. If that is the case, apply the recovery steps in that section.

Note: until the nodes reboot, the cockroachdb services will not be able to talk to each other across
nodes. This is OK and expected. Only the local cockroach service is needed for the following steps
until the reboot.

After the nodes reboot (step 6 at the end), the Node Monitor service will add the necessary iptables
rules to allow cockroach (and all other services) to communicate between nodes.

4.12 Update IP config in cockroach node table

        cqlsh -e "consistency local_quorum; update sd.node set data_ip_address='XXXX', management_ip_address='XXXX' where node_id='XXXX' and cluster_id='cluster'"
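For example, for the node whose new data IP is 10.20.26.1 in the step 4.7 mapping (node_id is left as a placeholder here; use the actual node_id of the node being updated, and set management_ip_address to that node's management IP, which equals the data IP only when management and data are not split):

        cqlsh -e "consistency local_quorum; update sd.node set data_ip_address='10.20.26.1', management_ip_address='10.20.26.1' where node_id='<THIS_NODE_ID>' and cluster_id='cluster'"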

In a few cases it was observed that iptables was blocking the cockroach ports at this step, which
makes the above command fail.

To overcome this, relax iptables rules as follows on each node.

sudo iptables -A IN-INTERNODE-WHITELIST -s x.x.x.x/32 -d x.x.x.x/32 -m comment --comment "Manual" -j ACCEPT

Here the -d option is the local node's data IP and -s is the data IP of another node (add one rule for
each of the other nodes).
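As an illustration, on the node whose new data IP is 10.20.26.1 (example IPs from step 4.7; repeat with -s 10.20.26.3 and -s 10.20.26.4 for the remaining peers):

    sudo iptables -A IN-INTERNODE-WHITELIST -s 10.20.26.2/32 -d 10.20.26.1/32 -m comment --comment "Manual" -j ACCEPT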
Alternatively, we can flush iptables altogether. After the node reboot, all iptables rules will be
recreated. (Use this only as a last resort; try the above first.)

sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -F

4.13 Check Cockroachdb and Kronos status:

rkcockroach node status --all

rkcockroach kronos status

5. Start services on all nodes, confirm the services come up OK, and confirm that "rknodestatus" shows all
nodes in the OK state.

On all nodes:

sdservice.sh "*" start

6. Reboot all nodes and confirm everything is still working.
