Elasticsearch is a complex piece of software by itself, but complexity is further increased when you
00:12.280 --> 00:14.410
spin up multiple instances to form a cluster.
00:15.280 --> 00:18.040
This complexity comes with the risk of things going wrong.
00:18.730 --> 00:22.660
In this lecture, we're going to explore some common issues that you're likely to encounter on your
00:22.660 --> 00:23.770
Elasticsearch journey.
00:24.550 --> 00:27.550
There are plenty more potential issues than we can squeeze into this lesson.
00:27.550 --> 00:33.820
So let's focus on the most prevalent ones, mainly related to node setup, cluster formation, and
00:33.820 --> 00:34.690
the cluster state.
00:38.440 --> 00:44.290
Potential Elasticsearch issues can be categorized according to the Elasticsearch lifecycle. First, node
00:44.290 --> 00:44.740
setup.
00:45.790 --> 00:49.060
Potential issues include the installation and initial startup.
00:49.570 --> 00:53.710
The issues can differ significantly depending on how you run your cluster, like whether it's a local
00:53.710 --> 00:56.920
installation, running in containers, or a cloud service, etc.
00:57.880 --> 01:02.860
In this lesson, we'll follow the process of a local setup and focus specifically on bootstrap checks,
01:02.860 --> 01:04.960
which are very important when starting a node up.
01:06.130 --> 01:07.750
Discovery and cluster formation.
01:08.650 --> 01:13.060
This category covers issues related to the discovery process, when the nodes need to communicate with
01:13.060 --> 01:15.220
each other to establish a cluster relationship.
01:16.000 --> 01:21.010
This may involve problems during the initial bootstrapping of the cluster, nodes not joining the cluster,
01:21.010 --> 01:22.750
and problems with master elections.
01:24.090 --> 01:25.650
Indexing data and sharding.
01:26.400 --> 01:29.040
This includes issues related to index settings and mapping.
01:29.220 --> 01:33.540
But as this is covered in other lectures, we'll just touch upon how sharding issues are reflected in
01:33.540 --> 01:34.350
the cluster state.
01:35.580 --> 01:41.070
Searching. Search, being the final step of the setup journey, can raise issues related to queries that
01:41.070 --> 01:44.460
return less relevant results or issues related to search performance.
01:44.910 --> 01:47.160
This topic is covered in another lecture in this course.
01:50.550 --> 01:54.990
Now that we have some initial background of potential issues with Elasticsearch, let's go one by one.
01:54.990 --> 01:59.550
Using a practical approach, we'll expose the pitfalls and show how to overcome them.
02:01.180 --> 02:06.280
So before we start messing up our cluster to simulate real world issues, let's back up our existing
02:06.280 --> 02:06.850
indices.
02:07.210 --> 02:08.320
This will have two benefits.
02:08.470 --> 02:12.580
After we're done, we can get back to where we ended up and just continue on in the course and we'll
02:12.580 --> 02:16.510
better understand the importance of backing up to prevent data loss while troubleshooting.
02:17.020 --> 02:18.700
First, we need to set up our repository.
02:18.850 --> 02:23.350
So let's open up our Elasticsearch YAML file using your favorite editor.
02:23.800 --> 02:24.270
I like nano.
02:24.280 --> 02:25.780
So, let's see: nano /etc/
02:26.080 --> 02:27.040
elasticsearch/
02:28.150 --> 02:29.260
elasticsearch
02:29.710 --> 02:30.310
.yml.
02:35.870 --> 02:39.290
And we want to make sure we have a registered repository path on our machine.
02:39.300 --> 02:41.270
So we're looking for the path.repo setting.
02:42.650 --> 02:43.630
Don't think there's one in here.
02:45.990 --> 02:47.190
Let's go ahead and add one then.
02:52.510 --> 02:53.440
path.repo.
02:55.430 --> 02:56.090
Square bracket,
02:56.420 --> 03:00.590
/home/student/backups. That should do the job.
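Spelled out, the setting we just typed would look something like this in elasticsearch.yml (the directory itself is our own choice):

```yaml
# Whitelist of filesystem paths where snapshot repositories may live
path.repo: ["/home/student/backups"]
```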
03:01.550 --> 03:01.910
All right.
03:02.210 --> 03:04.310
Control-O, Enter, Control-X.
03:04.430 --> 03:05.180
So that saved.
03:06.290 --> 03:10.340
And we might want to save a copy of this config file now as well so we can get back to it at the end
03:10.340 --> 03:10.880
of the lesson.
03:10.880 --> 03:20.090
So let's make a copy; we'll say sudo cp, that's /etc/elasticsearch/elasticsearch.yml, and
03:20.090 --> 03:25.460
let's just copy that into our home directory and that way we can just copy that back when we're done
03:25.460 --> 03:27.440
if we need to restore any of those settings later on.
03:28.370 --> 03:28.840
Okay.
03:28.850 --> 03:33.020
So we need to make sure that the directory exists that we're going to be storing that repository into
03:33.050 --> 03:34.780
and that Elasticsearch can write into it.
03:34.790 --> 03:39.320
So let's say mkdir -p /home/student/backups.
03:42.110 --> 03:48.020
And we'll change the group on that to elasticsearch, like so: sudo chgrp elasticsearch
03:48.770 --> 03:49.970
/home/student/backups.
03:52.940 --> 03:54.620
And finally, make it writable:
03:55.040 --> 03:59.420
sudo chmod g+w /home/student/backups.
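Put together, the directory preparation steps look like this; a sketch that assumes the Debian package created an elasticsearch group, and it needs root privileges:

```shell
# Create the repository directory and let the elasticsearch group write to it
sudo mkdir -p /home/student/backups
sudo chgrp elasticsearch /home/student/backups
sudo chmod g+w /home/student/backups
```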
04:02.580 --> 04:06.170
And we need to restart Elasticsearch to pick up that configuration change we made.
04:06.180 --> 04:13.530
So let's say sudo systemctl stop elasticsearch.service.
04:19.250 --> 04:20.330
And we'll restart it.
04:24.650 --> 04:24.980
Okay.
04:24.980 --> 04:30.590
So now we can register the new repository to Elasticsearch, at the path we configured, with a curl request.
Quote, curly brace; the type will be fs (filesystem) and the settings will have a location of /home/student/backups/
05:00.140 --> 05:05.570
backup-repo; close everything out, and it looks like that took.
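A reconstruction of that registration request from the audio; the repository name backup-repo and the exact location path are partly inferred, since a chunk of the narration is missing, and it assumes Elasticsearch is listening on localhost:9200:

```shell
# Register a filesystem snapshot repository named backup-repo
curl -X PUT "localhost:9200/_snapshot/backup-repo" \
  -H 'Content-Type: application/json' \
  -d '{ "type": "fs", "settings": { "location": "/home/student/backups/backup-repo" } }'
```

Note that the location must sit under one of the paths registered in path.repo, or the request will be rejected.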
05:06.080 --> 05:10.430
Now we can initiate the snapshot process to do the backup with a curl request.
05:10.700 --> 05:14.540
PUT localhost:9200/_snapshot/
05:17.080 --> 05:21.780
backup-repo, and we'll call it snapshot-1.
05:24.440 --> 05:25.460
So it looks like that worked.
05:25.610 --> 05:29.910
We can check the status of that with a simple GET, with a curl request.
05:30.620 --> 05:33.980
GET localhost:9200/_snapshot/
05:35.060 --> 05:35.900
backup-repo.
05:37.190 --> 05:38.030
snapshot-1.
05:39.800 --> 05:40.670
And we'll make it pretty.
05:43.640 --> 05:44.420
Looks like it worked.
05:44.510 --> 05:48.830
It says the state was SUCCESS.
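In one place, the two snapshot calls we just walked through might look like this, again assuming the backup-repo name and a local node on port 9200:

```shell
# Take the snapshot...
curl -X PUT "localhost:9200/_snapshot/backup-repo/snapshot-1"
# ...and check on its state afterwards (look for "state": "SUCCESS")
curl "localhost:9200/_snapshot/backup-repo/snapshot-1?pretty"
```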
05:49.070 --> 05:49.580
All right, cool.
05:50.090 --> 05:50.600
Very good.
05:50.630 --> 05:54.230
Now that we have our data backed up, we can proceed to nuke our cluster.
05:54.770 --> 05:55.730
So let's get started.
05:56.360 --> 05:58.130
Well, let's recap on the basics about logs.
05:58.400 --> 06:00.710
So we'll start by looking at the Elasticsearch logs.
06:01.160 --> 06:04.470
Their location will depend on the path.logs setting in your elasticsearch
06:04.670 --> 06:05.240
.yml.
06:05.540 --> 06:10.640
By default, they are found in /var/log/elasticsearch/, slash whatever your cluster name is, dot log.
06:11.390 --> 06:15.680
So basic tailing commands come in handy to monitor the logs in real time.
06:15.680 --> 06:17.870
And say we want to keep an eye on these logs off to the side.
06:18.230 --> 06:23.030
I'm actually going to start a different terminal window here, so let's go ahead and start a new terminal
06:23.030 --> 06:23.690
session here.
06:31.940 --> 06:34.470
It would help if I typed in my password correctly.
06:34.490 --> 06:34.940
There we go.
06:36.320 --> 06:36.710
All right.
06:36.950 --> 06:38.870
And let's see where those logs live.
06:39.140 --> 06:41.690
Those are going to be in var log Elasticsearch.
06:44.250 --> 06:47.160
So our account has insufficient rights to actually read these logs.
06:47.250 --> 06:49.260
Now there are various options to solve this.
06:49.410 --> 06:54.450
For example, a valid group assignment for your Linux user. One generally simpler approach is to provide
06:54.450 --> 06:57.570
the user sudo permission to run a shell as the elasticsearch user.
06:58.200 --> 07:02.130
We can do this by editing the sudoers file using visudo under root.
07:02.280 --> 07:04.290
So let's just say sudo visudo.
07:11.710 --> 07:13.420
And we will add the following line.
07:16.410 --> 07:17.460
Let's descend to the bottom here.
07:19.170 --> 07:21.060
We'll type our username,
07:21.840 --> 07:24.030
ALL equals, parenthesis,
07:24.090 --> 07:26.270
elasticsearch, close parenthesis,
07:27.120 --> 07:28.500
NOPASSWD,
07:29.640 --> 07:31.710
ALL. That should do it.
07:31.890 --> 07:32.820
So, Control-O,
07:33.330 --> 07:34.110
Control X.
07:35.530 --> 07:39.250
So after we've done that, we can run the following command to launch a new shell as the Elasticsearch
07:39.250 --> 07:42.430
user: sudo -s -u elasticsearch.
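A sketch of the sudoers entry we just typed, plus the command that uses it; "student" is an assumed username, since it isn't audible:

```
# Added at the bottom of /etc/sudoers via `sudo visudo`:
# lets student run commands as the elasticsearch user with no password prompt
student ALL=(elasticsearch) NOPASSWD: ALL
```

With that in place, `sudo -s -u elasticsearch` launches a new shell as the elasticsearch user.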
07:44.440 --> 07:44.810
Cool.
07:45.220 --> 07:46.360
So now we should have the permissions.
07:46.360 --> 07:48.970
We need to actually look at these logs, so let's try that again.
07:49.780 --> 07:50.740
cd /var/
07:50.770 --> 07:52.000
log/elasticsearch.
07:53.080 --> 07:53.750
That's better.
07:54.250 --> 07:59.080
And now we can do things like tail -n 100 to look at the last 100 lines in this log file.
07:59.560 --> 08:03.790
And our log file is actually elasticsearch.log, because we haven't changed the cluster name.
08:04.420 --> 08:05.110
And there you have it.
08:05.770 --> 08:08.260
Or sometimes you just want to look for error messages, right?
08:08.260 --> 08:15.550
So for example, we could look at the last 500 log lines and pipe that into grep for error and that
08:15.550 --> 08:16.670
would just show us any errors.
08:17.140 --> 08:20.170
Fortunately, we don't have any because our cluster is healthy, so that's cool.
08:20.680 --> 08:25.690
And sometimes it can also be useful to grab a few surrounding log lines with the context parameter because
08:25.690 --> 08:28.990
the messages and stack traces can sometimes be multi-line.
08:28.990 --> 08:34.870
So we could say, for example, cat elasticsearch.log, piped into grep bootstrap,
08:36.260 --> 08:36.560
Dash.
08:36.560 --> 08:40.610
Dash context equals three to get the three surrounding lines for each hit there.
08:41.390 --> 08:45.530
So for example, here we have a bootstrap hit and the three lines before and after it as well.
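These filtering tricks are easy to try against a throwaway file if you don't have /var/log/elasticsearch handy; the log lines below are made up for illustration:

```shell
# Build a tiny fake log so the commands are runnable anywhere
cat > sample.log <<'EOF'
[INFO ][o.e.n.Node               ] starting ...
[INFO ][o.e.b.BootstrapChecks    ] explicitly enforcing bootstrap checks
[ERROR][o.e.b.Bootstrap          ] node validation exception
[INFO ][o.e.n.Node               ] stopping ...
EOF
tail -n 100 sample.log                    # last 100 lines (fewer if the file is shorter)
grep -i error sample.log                  # only lines mentioning errors
grep --context=1 -i bootstrap sample.log  # each hit plus one line of context
```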
08:46.040 --> 08:48.350
So those are some useful tricks for looking at the logs.
08:48.710 --> 08:49.220
All right.
08:49.220 --> 08:51.440
So let's start talking about bootstrap checks.
08:51.440 --> 08:53.150
We'll go back to our primary terminal here.
08:53.900 --> 08:58.580
Bootstrap checks are pre-flight validations performed during a node start, which ensure that your node
08:58.580 --> 09:00.320
can reasonably perform its functions.
09:00.830 --> 09:03.950
There are two modes which determine the execution of bootstrap checks.
09:04.580 --> 09:10.190
Development mode is when you bind your node only to a loopback address localhost or with an explicit
09:10.190 --> 09:12.860
discovery.type of single-node.
09:13.400 --> 09:18.050
No bootstrap checks are performed in development mode. Production mode is when you bind
09:18.050 --> 09:24.440
your node to a non-loopback address like 0.0.0.0, thus making it reachable by other nodes.
09:24.800 --> 09:26.960
This is the mode where bootstrap checks are executed.
09:27.620 --> 09:31.520
Let's see them in action because when the checks don't pass, it can become tedious work to find out
09:31.520 --> 09:32.300
what's going on.
09:33.680 --> 09:38.450
So one of the first system settings recommended by Elastic is to disable heap swapping.
09:39.110 --> 09:43.670
This makes sense because Elasticsearch is highly memory intensive and you don't want to load your memory
09:43.670 --> 09:44.510
data from disk.
09:45.110 --> 09:46.370
There are two options for this.
09:46.670 --> 09:49.940
One is to remove swap files entirely, or minimize swappiness.
09:50.540 --> 09:54.080
This is the preferred option but requires considerable intervention as the root user.
09:54.650 --> 09:59.150
Or we can add the bootstrap.memory_lock parameter in elasticsearch.yml.
09:59.570 --> 10:01.080
So let's try that second option.
10:01.160 --> 10:09.470
Let's go ahead and open our main configuration file with sudo nano /etc/elasticsearch/
10:09.650 --> 10:11.930
elasticsearch.yml.
10:13.780 --> 10:16.330
And we'll go ahead and find the bootstrap.memory_lock setting.
10:19.490 --> 10:21.740
And uncomment it to set it to true.
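The uncommented line in elasticsearch.yml is just:

```yaml
# Ask the JVM to lock its heap in RAM so the OS never swaps it out
bootstrap.memory_lock: true
```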
10:23.840 --> 10:26.960
Write that out and quit and let's go ahead and restart our service.
10:27.200 --> 10:30.830
So, sudo systemctl stop elasticsearch.service.
10:32.880 --> 10:33.930
And let's restart it.
10:37.760 --> 10:42.200
And after a short wait, we should see some indication of what's happening.
10:44.650 --> 10:45.070
All right.
10:45.070 --> 10:47.800
So, yeah, we actually got an error as a result of doing that.
10:47.800 --> 10:50.110
So let's check our logs and find out what happened.
10:50.650 --> 10:54.010
So let's go spelunking through here and see what went wrong.
10:54.580 --> 10:57.970
Just got to hit the up arrow here to do a fresh tail of my log.
11:00.280 --> 11:01.360
And there we have it.
11:01.360 --> 11:06.100
So there's our error and it says bootstrap checks failed memory locking requested for Elasticsearch
11:06.100 --> 11:06.520
process.
11:06.520 --> 11:07.930
But memory is not locked.
11:08.560 --> 11:10.160
But didn't we just lock it before?
11:10.840 --> 11:11.650
Well, not really.
11:11.650 --> 11:15.370
We just requested the lock, but it didn't actually get locked, so we hit the memory lock
11:15.370 --> 11:16.360
bootstrap check here.
11:17.140 --> 11:21.490
Now, the easy way to fix this in our case is to allow locking in an override to our systemd unit
11:21.490 --> 11:22.480
file like this.
11:22.810 --> 11:24.580
So let's go back to our other window here.
11:25.270 --> 11:27.430
sudo systemctl
11:28.030 --> 11:30.670
edit elasticsearch.service.
11:32.980 --> 11:36.880
And we're going to put in the following config parameter here, under [Service]:
11:39.640 --> 11:43.120
LimitMEMLOCK=infinity.
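systemctl edit drops this override into /etc/systemd/system/elasticsearch.service.d/; the whole override file is just:

```ini
[Service]
# Lift the mlock limit so the JVM is allowed to lock its heap in memory
LimitMEMLOCK=infinity
```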
11:47.640 --> 11:48.120
All right.
11:48.270 --> 11:49.980
And let's try spinning that up again.
11:57.460 --> 11:59.170
And this time it should be okay.
12:03.170 --> 12:03.500
All right.
12:03.500 --> 12:04.400
Looks like success.
12:05.930 --> 12:06.380
Okay.
12:06.650 --> 12:08.390
So let's talk about heap settings next.
12:08.780 --> 12:13.310
Now, if you start playing with the JVM settings in the JVM dot options file, which you will likely
12:13.310 --> 12:17.450
need to do, because by default these settings are set too low for actual production usage.
12:17.960 --> 12:20.480
You may face a similar problem as we just did.
12:21.320 --> 12:21.980
So how is that?
12:22.310 --> 12:27.170
Well, by setting the initial heap size lower than the max size, which is actually quite usual in the
12:27.170 --> 12:27.890
world of Java.
12:28.490 --> 12:32.240
Let's open up that option file and lower the initial heap size to see what's going to happen.
12:32.780 --> 12:38.540
So, sudo nano /etc/elasticsearch/
12:38.840 --> 12:40.430
jvm.options.
12:43.740 --> 12:46.980
And let's go ahead and change these memory settings here.
12:49.180 --> 12:50.500
I'll comment out the original ones,
12:50.500 --> 12:59.380
so I can go back to them later, and we'll set some new ones: -Xms, 500 megabytes, and -Xmx, one
12:59.380 --> 12:59.890
gigabyte.
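In jvm.options syntax, the deliberately mismatched heap settings we just typed are:

```
## Mismatched on purpose to trigger the heap-size bootstrap check
-Xms500m
-Xmx1g
```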
13:01.540 --> 13:01.810
All right.
13:01.810 --> 13:03.340
So we've lowered the initial heap size.
13:03.700 --> 13:05.110
Let's go ahead and save this setting.
13:07.470 --> 13:09.150
And we'll restart our service again.
13:11.920 --> 13:12.420
Stop it.
13:13.790 --> 13:14.630
And I'll start it.
13:17.170 --> 13:18.970
And we'll see what happens as it spins up.
13:23.220 --> 13:24.360
Well, looks like we had an error.
13:24.370 --> 13:26.800
So let's go back to our logs and see what's going on.
13:26.850 --> 13:30.750
So back to the other window, and I'll hit the up arrow just to tail the last hundred lines again.
13:32.440 --> 13:32.920
All right.
13:32.930 --> 13:33.760
Well, there we have it.
13:34.060 --> 13:34.960
Error: bootstrap
13:34.960 --> 13:35.560
node validation
13:35.560 --> 13:37.060
exception, bootstrap checks failed:
13:37.060 --> 13:39.820
Initial heap size not equal to maximum heap size.
13:40.330 --> 13:42.910
So that's telling us pretty explicitly what the problem was there.
13:43.690 --> 13:47.800
Now, generally speaking, this problem is also related to memory locking, where the need to increase
13:47.800 --> 13:51.400
the heap size during program operation may have undesired consequences.
13:52.120 --> 13:56.770
So remember to set those numbers to equal values and for the actual values, follow the recommendations
13:56.770 --> 14:01.960
by Elastic, which in short is: lower than 32 gigabytes, and up to half of the available RAM.
14:02.080 --> 14:04.150
Let's go ahead and change those back before we forget.
14:08.670 --> 14:09.180
Yeah.
14:09.510 --> 14:15.270
So we'll set that back to one gig for both and I'll just use Control K to get rid of those lines and
14:15.270 --> 14:18.600
Control-O to save and Control-X, so we should be back in action.
14:19.110 --> 14:20.970
Let's try starting it up again, just to be sure.
14:25.460 --> 14:25.790
All right.
14:25.790 --> 14:27.110
That time it started successfully.
14:28.190 --> 14:31.370
So let's talk about some other system checks you may want to perform when things go wrong.
14:31.610 --> 14:36.290
There are many other bootstrap checks on the runtime platform and its settings, including a file descriptors
14:36.290 --> 14:41.750
check, a maximum number of threads check, a maximum size virtual memory check, and many others.
14:42.410 --> 14:46.280
You should definitely browse through their descriptions in the docs, but as we're running the official
14:46.280 --> 14:51.170
Debian distribution that comes with a predefined systemd unit file, most of these issues are resolved
14:51.170 --> 14:52.850
for us in the unit file, among others.
14:53.270 --> 14:56.630
We can check that unit file to see the individual parameters that get configured.
14:56.900 --> 14:58.760
Let's take a look at that unit file to see what's in it.
14:59.040 --> 15:02.630
We can say sudo cat /usr/lib/systemd/
15:02.630 --> 15:05.990
system/elasticsearch.service.
15:07.900 --> 15:08.200
All right.
15:09.010 --> 15:12.250
So just take a look at the different things that you have at your disposal here.
15:12.400 --> 15:16.390
All sorts of things that could go wrong, but by default, they should be okay in our installation.
15:18.160 --> 15:22.480
So just remember that if you run the Elasticsearch binary on your own, you will need to take care of
15:22.480 --> 15:23.410
these settings as well.
15:24.730 --> 15:28.270
Now, the last check we'll run is the one that will carry us nicely to the next section of the lesson
15:28.270 --> 15:29.270
dealing with clustering.
15:29.290 --> 15:33.790
But before we dive in, let's see what are the configuration parameters that Elasticsearch checks during
15:33.790 --> 15:36.550
its startup with a discovery configuration check?
15:37.270 --> 15:41.740
There are three key parameters which govern the cluster formation and discovery process.
15:41.980 --> 15:43.870
Let's pull up our YAML file to take a look.
15:44.710 --> 15:51.080
sudo nano /etc/elasticsearch/elasticsearch.yml.
15:52.690 --> 15:52.960
All right.
15:52.960 --> 15:57.550
So one is discovery.seed_hosts; it should be down here.
16:00.680 --> 16:00.930
Yep.
16:01.610 --> 16:06.200
Now, this is a list of ideally all the master eligible nodes in the cluster that we want to join and
16:06.200 --> 16:07.820
draw the latest cluster state from.
16:08.330 --> 16:12.850
Now there's also a discovery.seed_providers setting that you could set here as well,
16:12.860 --> 16:16.760
and that would allow you to provide the seed hosts lists in the form of a file that gets reloaded on
16:16.760 --> 16:20.840
any change instead of specifying it within the configuration file itself.
16:21.500 --> 16:24.530
Also, let's look at the cluster.initial_master_nodes setting here.
16:25.070 --> 16:29.540
This is a list of the node names, not hostnames for the very first master elections.
16:30.230 --> 16:34.190
Until all of these join and vote, the cluster setup won't be complete.
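As a sketch, the discovery settings discussed here might look like this in elasticsearch.yml; the addresses and node name are placeholder values, not the lecture's exact ones:

```yaml
# Transport address(es) of master-eligible nodes to contact for discovery
discovery.seed_hosts: ["127.0.0.1:9301"]
# Node *names* (not hostnames) that vote in the very first master election
cluster.initial_master_nodes: ["node-1"]
```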
16:35.330 --> 16:39.470
But what if you don't want to form any cluster, but rather just want to run in a small single node
16:39.470 --> 16:39.830
setup?
16:40.070 --> 16:43.130
Well, you might think you could just eliminate these settings in the yaml file.
16:44.150 --> 16:44.470
Right.
16:45.200 --> 16:46.040
But no, that won't work.
16:46.130 --> 16:50.180
After starting up, you would hit another bootstrap error, since at least one of those parameters needs
16:50.180 --> 16:52.490
to be set to pass a bootstrap check.
16:52.970 --> 16:56.390
So we're going to go ahead and put those back because you can't actually get away with that.
16:56.930 --> 17:00.680
So let's see why this is and dive deeper into troubleshooting the discovery process.
17:01.310 --> 17:02.540
First, I'll exit out of here.
17:04.520 --> 17:06.560
And let's shut down our cluster before we forget.
17:12.900 --> 17:14.040
Just stop the service.
17:15.030 --> 17:15.420
All right.
17:16.470 --> 17:20.850
So after we've successfully passed the bootstrap checks and started up our node for the first time,
17:20.850 --> 17:23.910
the next phase in its lifecycle is the discovery process.
17:24.540 --> 17:28.440
Now, to simulate the formation of a brand new cluster, we're going to need a clean node.
17:28.710 --> 17:33.420
So we need to remove all the data of the node and thus lose all previous cluster state information.
17:33.450 --> 17:35.580
That's why we backed everything up to a snapshot earlier.
17:36.120 --> 17:39.390
Now, remember, this is really just an experiment; in a real production setup,
17:39.690 --> 17:42.090
there would be very few reasons to do this.
17:42.480 --> 17:45.870
I'm going to go to this other window here, where I'm logged in as the elasticsearch user, because I'm
17:46.290 --> 17:48.090
going to need its permissions to do this stuff.
17:48.750 --> 17:52.500
rm -rf /var/lib/elasticsearch
17:53.980 --> 17:54.750
slash star.
17:56.100 --> 17:56.430
All right.
17:56.430 --> 17:58.320
We blew away our entire node there.
17:58.950 --> 18:04.020
So now let's imagine a situation where we already had a cluster and we just want the node to join in.
18:04.650 --> 18:10.110
So we need to make sure the cluster name is correct and linked to some seed host either by IP or hostname
18:10.110 --> 18:10.530
and port.
18:11.400 --> 18:14.160
So let's go ahead and open up our yaml file.
18:15.330 --> 18:17.520
We'll use vim, because that's what's installed under this account.
18:18.030 --> 18:22.380
That's /etc/elasticsearch/elasticsearch.yml.
18:24.600 --> 18:26.910
All right, so we need to make sure that we have a cluster name.
18:28.860 --> 18:29.070
Hit
18:29.070 --> 18:31.140
I to go into insert mode, and now I can edit it.
18:32.740 --> 18:36.370
We'll change my-application to lecture-cluster.
18:36.730 --> 18:37.720
It would help if I typed it right.
18:40.190 --> 18:42.380
And we need to set our discovery seed hosts.
18:45.460 --> 18:46.010
Do.
18:52.170 --> 19:00.240
There they are, and we'll change that to 127.0.0.1:9301. Now, this is just a demonstration, so we're
19:00.240 --> 19:01.320
using a loopback address.
19:01.350 --> 19:06.420
Normally you'd put a hostname or an IP here, and the actual transport port of one or more of your nodes
19:06.420 --> 19:07.080
in the cluster.
19:08.820 --> 19:12.360
And just to force the failure that we're interested in, I'm going to comment out this line for the
19:12.360 --> 19:15.600
initial master nodes, and that way it's not going to be able to reach the master.
19:15.630 --> 19:17.310
We'll see what happens when we hit that failure.
19:18.240 --> 19:21.300
Let's go ahead and hit escape colon WQ.
19:21.630 --> 19:23.280
Exclamation point, to write and quit.
19:24.090 --> 19:25.590
And now let's start up our service.
19:35.260 --> 19:35.560
All right.
19:35.560 --> 19:36.870
It looks like it started successfully.
19:36.880 --> 19:39.850
Let's check our root endpoint to see if it really is running.
19:39.850 --> 19:40.300
Okay.
19:40.450 --> 19:43.120
curl localhost:9200.
19:46.550 --> 19:47.330
All right.
19:47.600 --> 19:54.110
So we did get a nice response with various details here, but something is missing: the cluster UUID.
19:55.010 --> 19:57.200
This means that our cluster is not actually formed.
19:57.360 --> 20:02.090
And we can confirm this by checking the cluster state with the cluster health API.
20:02.670 --> 20:03.590
Let's say curl.
20:04.340 --> 20:07.070
localhost:9200/_
20:07.070 --> 20:08.330
cluster/health.
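In one line each, the two checks we just ran (assuming the default HTTP port):

```shell
curl "localhost:9200/"                        # root endpoint: look for the cluster_uuid field
curl "localhost:9200/_cluster/health?pretty"  # hangs here, since no master has been elected
```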
20:12.240 --> 20:15.090
And after about 30 seconds of waiting, we'll get an exception.
20:19.230 --> 20:20.340
Indeed we did: master
20:20.340 --> 20:21.450
not discovered exception.
20:21.870 --> 20:26.760
All right, let's tail our logs and see that the node didn't discover any master and will continue the
20:26.760 --> 20:27.740
discovery process.
20:27.750 --> 20:30.780
So let's check our logs and see what happened.
20:31.560 --> 20:35.850
Let's look at the past 500 lines here, and that's lecture-cluster
20:35.850 --> 20:37.920
dot log this time, because we changed the cluster name.
20:40.220 --> 20:42.620
That's the relevant message here: master not discovered.
20:42.770 --> 20:46.880
This node has not previously joined a bootstrapped cluster, and cluster.initial_master_nodes is empty on
20:46.880 --> 20:49.280
this node, so it's going to continue
20:49.280 --> 20:53.030
discovery on 127.0.0.1:9301 from the hosts providers.
20:53.660 --> 20:58.970
But yeah, that's basically telling us that we had a problem actually electing a master, because we didn't
20:58.970 --> 21:01.220
list any master nodes and it couldn't find any. Makes sense,
21:01.220 --> 21:01.460
Right?
21:02.750 --> 21:07.070
So these issues are going to be very similar when forming a new cluster and we can simulate that in
21:07.070 --> 21:09.560
our environment with the cluster.initial_master_nodes setting.
21:09.830 --> 21:12.290
So again, let's make sure there's no previous data on our node.
21:12.680 --> 21:13.970
We'll go ahead and blow that away.
21:14.360 --> 21:15.980
Let's stop the service before we forget, huh?
21:16.610 --> 21:18.590
So back to this other side here.
21:20.700 --> 21:21.540
Stop service.
21:22.560 --> 21:23.670
Now we're going to blow away.
21:24.150 --> 21:25.530
/var/lib/elasticsearch again.
21:26.800 --> 21:27.370
Like so.
21:29.690 --> 21:30.080
All right.
21:30.080 --> 21:32.720
And now we can edit our yaml file again.
21:37.490 --> 21:41.240
And now we're going to go back to make sure our cluster name is still a lecture cluster.
21:41.240 --> 21:43.010
And now we're going to set our initial master nodes.
21:45.560 --> 21:47.660
So it was complaining before that we had an empty list there.
21:47.660 --> 21:54.800
So let's give it a list. I'll go into insert mode, and now we can edit this line, uncomment it, and
21:54.800 --> 21:59.120
we'll set it to the list of node-1, node-2, and node-3.
22:04.580 --> 22:06.080
So let's go ahead and hit escape,
22:06.320 --> 22:11.720
colon WQ, exclamation point, to write and quit, and we'll restart the node again.
22:13.340 --> 22:14.240
Start the service.
22:16.120 --> 22:17.460
And see what happens this time.
22:20.290 --> 22:20.610
All right.
22:20.620 --> 22:22.030
Looks like it went okay.
22:22.030 --> 22:23.800
But again, let's check and make sure.
22:23.830 --> 22:24.490
Let's hit the root
22:24.490 --> 22:24.940
endpoint.
22:27.750 --> 22:29.880
Still, we have no cluster UUID.
22:29.910 --> 22:32.430
So we didn't actually join a cluster; that failed.
22:32.460 --> 22:33.750
And if we do a health check again.
22:37.560 --> 22:39.330
We'll have to wait 30 seconds for that to time out.
22:41.460 --> 22:41.820
All right.
22:41.820 --> 22:42.630
Same deal: master
22:42.630 --> 22:43.320
not discovered.
22:43.380 --> 22:45.690
Let's check the logs again to see what happened this time.
22:46.320 --> 22:48.540
So we'll just tail those last 500 lines again.
22:49.640 --> 22:54.200
And we're going to look for something about discovering master-eligible nodes.
22:58.080 --> 22:59.490
Probably should have grepped for WARN, huh?
23:05.380 --> 23:06.160
This looks interesting.
23:07.480 --> 23:07.900
All right.
23:07.900 --> 23:09.640
Master not discovered yet.
23:10.660 --> 23:12.370
This node must discover master-eligible nodes
23:12.370 --> 23:14.660
node-1, node-2, and node-3 to bootstrap a cluster.
23:14.680 --> 23:15.850
We only discovered node-1.
23:16.630 --> 23:19.270
So, yeah, you can't just specify nodes that don't exist there.
23:20.260 --> 23:24.310
All right, so we've only performed some simple experiments here, so you'll need to use your imagination to complete
23:24.310 --> 23:24.880
the picture.
23:25.000 --> 23:29.080
Now, in a real production scenario, there are many reasons why this problem often appears.
23:29.650 --> 23:34.030
Since we're dealing with a distributed system, many external factors such as network communication
23:34.030 --> 23:36.940
come into play and may cause the nodes to be unable to reach each other.
23:37.000 --> 23:40.640
So the problem might not just be that I listed a bunch of fictitious hosts there.
23:40.780 --> 23:44.860
It might be that those are valid hosts, but they can't be reached for some reason. To resolve these
23:44.860 --> 23:45.280
issues,
23:45.370 --> 23:46.870
you need to triple-check all your settings.
23:47.230 --> 23:49.420
So again, let's go back into vim.
23:50.440 --> 23:54.190
We need to make sure the cluster name is right, so all the nodes are joining or forming the right cluster.
23:54.910 --> 24:00.520
The node name: a mistype in the node names can invalidate candidates for the master elections. And the seed
24:00.520 --> 24:04.120
hostnames, IPs, and ports down here somewhere.
24:05.970 --> 24:10.570
Got to make sure those all have valid seed hosts linked and that the ports are actually the configured
24:10.570 --> 24:10.960
ones.
24:11.710 --> 24:14.680
We need to check connectivity between the nodes and the firewall settings.
24:14.800 --> 24:19.600
So use telnet or similar tools to inspect your network and make sure it's open for communication between
24:19.600 --> 24:22.390
the nodes, the transport layer and the ports especially.
24:23.170 --> 24:24.250
Also check SSL.
24:24.250 --> 24:29.080
Internode communication encryption is a vast topic and we're not going to touch that here, but it's a
24:29.080 --> 24:33.790
usual source of troubles: invalid certificates and untrusted certificate authorities and things like
24:33.790 --> 24:34.090
that.
24:34.840 --> 24:38.320
Also be aware that there are special requirements on the certs when encrypting node-
24:38.320 --> 24:39.370
to-node communication.
24:40.900 --> 24:44.800
All right, the last thing we're going to explore is the relationship between the shard allocation and
24:44.800 --> 24:47.440
cluster state as these two things are tightly related.
24:48.010 --> 24:52.240
But first, we need to change the elasticsearch.yml configuration to let our node successfully
24:52.240 --> 24:53.470
form a single node cluster.
24:53.950 --> 24:59.770
So back in our configuration file here, let's just set the initial master as the node itself and start
24:59.770 --> 25:00.340
the service.
25:01.580 --> 25:03.170
So I'll take out node-2 and node-
25:03.170 --> 25:05.300
3, and just hit I to go to insert mode.
25:06.410 --> 25:07.910
Forgot I was in vim there for a second.
25:09.420 --> 25:09.890
Escape.
25:10.140 --> 25:12.060
Colin WQ exclamation point.
25:12.270 --> 25:12.980
We wrote that out.
25:12.990 --> 25:15.600
So now let's restart our service yet again.
25:17.540 --> 25:17.990
Stop it.
25:19.590 --> 25:20.040
Started.
25:22.680 --> 25:23.190
All right.
25:23.430 --> 25:25.710
And again, we'll carry the cluster health API.
25:25.810 --> 25:26.610
Let's see what happened.
25:30.150 --> 25:32.590
So we can see the cluster status is, in fact, green.
25:32.640 --> 25:33.210
That's good.
25:33.930 --> 25:35.520
So what does cluster status mean?
25:35.730 --> 25:39.180
Well, it actually reflects the worst state of any of the indices that we have in our cluster.
25:39.900 --> 25:41.220
The different options include red.
25:41.730 --> 25:44.850
That means one or more shards of the index is not assigned in the cluster.
25:45.360 --> 25:49.920
This can be caused by various issues at the cluster level, like disjoint nodes or problems with disks
25:49.920 --> 25:50.670
and things like that.
25:51.450 --> 25:56.460
Generally, the red status marks very serious issues, so be prepared for some potential data loss.
25:57.150 --> 25:58.200
It could also be yellow.
25:58.230 --> 26:00.750
In that case, the primary data are not yet impacted.
26:01.080 --> 26:04.500
All the primary shards are okay, but some replica shards are not assigned.
26:05.130 --> 26:09.540
Like, for example, replicas won't be allocated on the same node as the primary shard by design.
26:10.290 --> 26:15.420
This status marks a risk of losing data and green means all shards are well allocated.
26:15.840 --> 26:20.160
However, it doesn't mean that the data is safely replicated as a single node cluster, since with a
26:20.160 --> 26:22.560
single shard index it would be green as well.
26:23.280 --> 26:26.670
So now let's create an index with one primary shard and one replica.
26:27.510 --> 26:30.210
We'll do that with curl request.
26:30.450 --> 26:42.000
Put local host 9200 slash test what's called the index test backslash slash data raw curly with the
26:42.000 --> 26:42.900
following settings.
26:43.620 --> 26:55.590
Curly bracket number of groups of shards will be set to one and the number of replicas will be set to
26:55.590 --> 26:56.220
one as well. 26:56.820 --> 26:57.630 Close everything out.
26:58.350 --> 27:03.570
All right, so suddenly our cluster will turn yellow because our worst performing index, the only one
27:03.570 --> 27:05.040
we have, is also yellow.
27:05.310 --> 27:06.900
Let's check our health again.
27:08.370 --> 27:08.630
Yep.
27:08.640 --> 27:09.270
Now we're yellow.
27:10.350 --> 27:15.510
Now you can also check the shards assignment with the Cat Shards API and see what's going on there.
27:15.540 --> 27:20.550
So let's say curl localhost 9200 slash underscore cat slash shards.
27:20.910 --> 27:23.620
Question mark v aha.
27:24.660 --> 27:27.180
So we can see that we have unassigned shards here.
27:28.470 --> 27:32.190
Or if you want a more descriptive information, you can use the cluster allocation.
27:32.190 --> 27:36.750
Explain API, which provides an explanation as to why the individual shards were not allocated.
27:36.960 --> 27:43.680
To do that will say Karl local host 9200 cluster allocation.
27:44.660 --> 27:46.260
Explain pretty.
27:49.390 --> 27:52.810
And that tells you very explicitly what's going on in our case, as I mentioned before.
27:53.140 --> 27:57.670
The reason is due to the allocation of the data replica to the same node being disallowed, since it
27:57.670 --> 28:01.090
makes no sense from a resiliency perspective, you wouldn't have a replica on the same node.
28:01.090 --> 28:01.930
That's that's silly.
28:02.740 --> 28:03.790
So how would you resolve this?
28:03.820 --> 28:04.930
Well, we have two options.
28:05.350 --> 28:08.620
One would be to remove the replica shard, which is not a real solution.
28:08.620 --> 28:10.870
But if you need the actual status, it will work out.
28:11.440 --> 28:14.530
Or you could add another node on which the shards could be reallocated.
28:14.830 --> 28:16.210
So let's take that second route.
28:18.120 --> 28:23.280
So to simulate the following failures, I actually have two different nodes running on the same host
28:23.280 --> 28:27.510
here, and setting that up is kind of involved and we're going to do that later in the course as we
28:27.510 --> 28:28.410
go into failover.
28:28.830 --> 28:32.490
So for now, I just want you to watch and not actually try to follow on yourself.
28:32.670 --> 28:34.350
So I've already done some of the grunt work here.
28:34.650 --> 28:39.750
Basically, you need to set up a separate system to a unit file for the second node and a server configuration
28:39.750 --> 28:40.800
and stuff like that.
28:40.800 --> 28:43.480
So just watch from this point on.
28:43.500 --> 28:43.800
Okay.
28:44.640 --> 28:48.570
So anyway, let's start by reviewing the main configuration file of that second note that I've already
28:48.570 --> 28:52.560
set up and will ensure that it will join the same cluster with our existing nodes.
28:52.560 --> 28:59.130
So let's say sudo nano etsi Elasticsearch dash node two is where I put that. 29:05.690 --> 29:07.460 All right, so we have the same cluster name.
29:08.030 --> 29:10.130
We're calling our node here Node two.
29:10.790 --> 29:14.540
And we can see that our seed hosts is set to a loopback address, hopefully.
29:17.870 --> 29:18.090
Yep.
29:18.470 --> 29:21.950
And we can see that our master knows consists of node one and node two.
29:23.110 --> 29:29.770
Let's go ahead and exit out of here and start that second node sudo system control start Elasticsearch
29:30.460 --> 29:32.940
that's node two dot service.
29:35.310 --> 29:35.640
Okay.
29:35.640 --> 29:38.790
So at this point I have started up a second node on the same VM.
29:38.790 --> 29:42.960
Again, there's quite a bit of configuration behind making that happen, so just watch for this part
29:42.960 --> 29:43.200
of it.
29:43.860 --> 29:47.220
So now that we have a second node spun up, we should be back in a green status.
29:47.220 --> 29:48.120
So let's check.
29:48.240 --> 29:50.010
Let's say kernel dashed silence.
29:50.970 --> 29:51.780
Local host.
29:53.260 --> 29:56.080
200 slash underscore cluster slash.
29:56.080 --> 29:56.500
Health.
29:57.190 --> 29:57.580
Pretty.
29:59.190 --> 30:00.780
And we'll just grep for the status line. 30:02.380 --> 30:03.610 And our status is green.
30:03.640 --> 30:04.000
Great.
30:04.840 --> 30:08.740
Okay, so we've resolved her issue and the replica shards were automatically reallocated.
30:08.740 --> 30:09.160
Perfect.
30:10.120 --> 30:11.590
So let's continue with this example.
30:11.590 --> 30:16.660
To simulate the red cluster state, let's start by removing the index and creating it again, but this
30:16.660 --> 30:19.390
time with only two primary shards and no replica.
30:19.390 --> 30:21.610
And we'll quickly see why this is a bad idea.
30:21.970 --> 30:31.200
So first of all, delete the one that we have with Curl Bash Dash request delete local host 9200 slash
30:31.210 --> 30:40.750
test and I will recreate it with curl request put localhost 9200 slash test backslash.
30:42.720 --> 30:52.380
And dash, dash data, dash raw, quick curly settings will set the number of shards.
30:54.510 --> 31:03.660
Two, one, two, two, rather four because we have two notes to work with and number of replicas to
31:03.660 --> 31:04.140
zero.
31:05.160 --> 31:06.750
So this seems like a pretty bad idea.
31:06.780 --> 31:09.900
You know, we have our shard split across two nodes, but no backups anywhere.
31:11.720 --> 31:13.100
All right, but so far, so good.
31:13.130 --> 31:14.840
You know, it's at least storing it.
31:15.170 --> 31:17.750
Let's check the shards of salmon to see what's actually going on here. 31:17.960 --> 31:24.980 Carol, local host, 9000 underscore cat slash shards, verbose.
31:26.420 --> 31:26.750
Okay.
31:27.110 --> 31:31.280
So we can see that each primary shard is on a different node, which follows the standard allocation
31:31.280 --> 31:33.860
rules set at the cluster level and at the index level.
31:34.340 --> 31:35.930
And you likely know where we're heading.
31:36.650 --> 31:41.120
So imagine the situation where some network issue emerges and your cluster splits up, resulting in
31:41.120 --> 31:46.070
disabled node communication, or even worse, some disk malfunctions leading to the improper functioning
31:46.070 --> 31:46.550
of a node.
31:47.240 --> 31:50.090
Now, the easiest way to simulate this is to just stop one of our nodes.
31:50.360 --> 31:51.680
So let's go ahead and kill No.
31:51.680 --> 31:56.480
Two with a pseudo slash spin slash system control.
31:57.590 --> 31:59.750
Stop Elasticsearch Dash No.
31:59.750 --> 32:00.980
Two dot service.
32:03.020 --> 32:04.010
And down it goes.
32:04.430 --> 32:07.230
So now if we check our status again to do.
32:10.570 --> 32:11.860
We are now in red status.
32:12.700 --> 32:13.390
That's a bad thing.
32:13.840 --> 32:17.110
So now let's check the explain API to learn more about what's going on.
32:17.230 --> 32:19.270
Curl local host.
32:19.990 --> 32:24.040
A200 slash underscore cluster slash allocation.
32:25.640 --> 32:27.080
Explain pretty.
32:29.940 --> 32:30.500
All right.
32:30.510 --> 32:35.220
So we cannot allocate it because a previous copy of the primary chart existed but can no longer be found
32:35.220 --> 32:36.210
on the nodes in the cluster.
32:36.390 --> 32:37.630
Well, that tells you what's going on.
32:37.650 --> 32:38.610
It's pretty well described.
32:39.030 --> 32:40.680
A node left as we have turned it off.
32:41.040 --> 32:46.320
But in the real world that has various potential causes and no valid shard copy can be found in the
32:46.320 --> 32:48.780
cluster, in which case we're missing data.
32:49.440 --> 32:54.060
Unfortunately, there's no easy solution to this scenario, as we do not have any replicas and there's
32:54.060 --> 32:55.500
no way we could remake our data.
32:56.520 --> 33:00.450
So firstly, if you are dealing with some network problems, try to thoroughly inspect what could go
33:00.450 --> 33:06.210
wrong like a misconfiguration of firewalls and inspect it as a priority, since data cannot consistently
33:06.210 --> 33:07.380
be indexed in this state.
33:08.280 --> 33:12.690
Now, depending on the document routing, many indexing requests can be pointed toward the missing shard
33:12.690 --> 33:13.770
and end up timing out. 33:14.460 --> 33:17.050 For example, this to try to insert a document and see what happens.
33:17.070 --> 33:18.200
Curl request.
33:19.560 --> 33:29.190
Post local host 200 slash test underscore doc data raw and we'll just say a message.
33:31.360 --> 33:31.930
It's data.
33:35.720 --> 33:37.310
And this should lead to an exception.
33:39.600 --> 33:42.780
And after about 30 seconds or so, it finally timed out on me.
33:43.560 --> 33:48.480
Now, secondly, if no possible solution was found, the only option left to get the index to work properly
33:48.480 --> 33:49.920
may be to allocate a new shard.
33:50.400 --> 33:54.990
But be aware that even if the lost node will come back afterwards, the new shard will just overwrite
33:54.990 --> 33:56.700
it because it is in a newer state.
33:57.480 --> 34:00.720
Now we can allocate a new shard with the cluster reroute API.
34:00.840 --> 34:05.700
So here we will allocate one for the test index on the node dash one that operates correctly.
34:06.210 --> 34:08.580
Note that we have to explicitly accept data loss.
34:08.700 --> 34:16.770
So curl request post local host 9200 slash underscore cluster slash reroute.
34:18.210 --> 34:19.620
And we want pretty results.
34:20.770 --> 34:22.990
Backslash did all.
34:24.330 --> 34:26.580
Quick curly commands.
34:27.850 --> 34:29.170
Actually, there's going to be a square bracket.
34:29.170 --> 34:29.830
We have a list of them.
34:31.610 --> 34:32.840
And curly brackets.
34:35.370 --> 34:38.400
Allocate empty primary.
34:41.990 --> 34:43.040
Index test.
34:44.650 --> 34:45.190
Shard.
34:45.820 --> 34:52.990
This one node is node one and except data loss.
34:54.950 --> 34:55.610
We'll be true.
34:58.230 --> 34:59.340
Cause everything out.
35:01.240 --> 35:01.930
I think that's right.
35:05.310 --> 35:05.730
All right.
35:05.940 --> 35:08.880
And afterwards, we should no longer experience timeouts during indexing.
35:10.170 --> 35:11.130
All right, so we're done.
35:11.130 --> 35:15.570
But we just need to restore everything from our backup now, because we did do some pretty invasive
35:15.570 --> 35:16.570
stuff to our index here.
35:16.590 --> 35:20.580
So we're back at the point where you should be following along if you were following along earlier.
35:20.610 --> 35:20.880
Okay.
35:20.880 --> 35:24.780
We need to restore from that back up and make sure we're not left with any lingering issues that we
35:24.780 --> 35:25.590
might have introduced.
35:25.800 --> 35:29.520
So we're going to restore all of our original indices that we backed up earlier. 35:29.940 --> 35:31.860 Before we can do that, we need to do some cleaning up.
35:32.490 --> 35:36.390
So first, we need to make sure that the repository path is registered again in the Elasticsearch dot.
35:36.990 --> 35:39.120
As we've done some changes to it during the exercise.
35:39.750 --> 35:43.380
So let's go ahead and reference our stored config file that we squirreled away at the start of the lesson,
35:44.040 --> 35:45.090
and we'll put that back.
35:47.010 --> 35:48.960
So let's see, go back to our home directory.
35:48.960 --> 35:49.860
I think that's where we put it.
35:50.760 --> 35:50.970
Yep.
35:50.970 --> 35:52.320
There's Elasticsearch that Lyonel.
35:52.320 --> 35:54.180
So let's go ahead and move that back into position.
35:54.270 --> 35:55.530
Sudo move Elasticsearch.
35:55.530 --> 35:58.590
So I am going to see Elasticsearch.
36:01.080 --> 36:03.480
All right, let's double check that it's there and looks correct.
36:07.440 --> 36:09.860
Well, I'm the wrong user sudo.
36:09.900 --> 36:13.320
Listen to me and we'll just go ahead and edit it directly at c.
36:13.830 --> 36:14.700
Elasticsearch.
36:15.180 --> 36:18.210
Elasticsearch why?
36:18.240 --> 36:18.510
Email.
36:19.800 --> 36:20.790
Make sure things look normal. 36:21.480 --> 36:21.690 All right.
36:21.690 --> 36:22.920
Things are back to how we started.
36:23.190 --> 36:24.300
We have node one.
36:25.400 --> 36:28.370
We still have the path to repo set to home student backups.
36:28.370 --> 36:32.930
That's very important so we can restore that backup memory lock is commented out again.
36:33.860 --> 36:36.800
Everything looks like it's back to default settings, so that's good.
36:37.730 --> 36:38.020
All right.
36:38.030 --> 36:38.420
Looks good.
36:38.420 --> 36:38.870
Looks good.
36:39.260 --> 36:43.430
Now, we do need to make sure that Elasticsearch has permission to read that configuration file we just
36:43.430 --> 36:44.150
restored first.
36:44.150 --> 36:51.590
So let's go to the SC folder and do a pseudo Alice start.
So now we should be able to restart our main node.
37:26.660 --> 37:27.200
Like so.
37:31.050 --> 37:31.320
Right.
37:32.250 --> 37:36.390
So now we can reregister our repository again to make sure it's ready to provide the backup data.
37:36.780 --> 37:47.130
Curl request put will host and a few hundred slash underscore snapshot slash backup dash repo backslash
37:47.790 --> 37:48.420
data all.
37:51.080 --> 37:52.640
Type is filesystem system.
37:54.150 --> 37:54.870
Settings.
37:58.170 --> 37:58.830
Location.
38:00.120 --> 38:02.640
Home Student Backups.
38:03.060 --> 38:03.780
Backup dash.
38:03.780 --> 38:04.170
Repo.
38:07.900 --> 38:08.350
All right.
38:09.010 --> 38:13.390
And we can check the available snapshots in the repository with a simple cash request to our backup 38:13.390 --> 38:13.780 repo.
38:13.780 --> 38:16.630
And we should see our snapshot one waiting to be restored.
38:17.260 --> 38:18.310
Curl Local Host.
38:18.340 --> 38:21.370
Note both underscore cat snapshots.
38:22.600 --> 38:23.320
Flash Backup.
38:23.320 --> 38:23.920
Dash Repo.
38:26.060 --> 38:26.360
All right.
38:27.290 --> 38:28.190
It's a success.
38:28.580 --> 38:32.600
Now, to prevent any rights during the restore process, we need to make sure that all of our indices
38:32.600 --> 38:33.170
are closed.
38:33.380 --> 38:38.300
So, however, as of Elasticsearch eight, they've made it not that easy.
38:38.330 --> 38:43.040
You actually need to disable a safety feature that prevents you from closing all of your indices at
38:43.040 --> 38:43.730
once together.
38:44.240 --> 38:48.050
So let's fire up our editor and edit our Elasticsearch Dynamo file.
38:48.440 --> 38:54.380
And we're looking for the setting action dot destructive underscore requires, underscore name, uncomment
38:54.380 --> 38:56.420
that and let it be set to false.
39:04.890 --> 39:10.320
And after that, we're going to stop and restart the service again to pick that change up.
39:25.200 --> 39:34.050
Curl request post will host a 200 slash underscore all underscore close. 39:36.330 --> 39:39.240 And finally we can restore a backup with Colonel request.
39:40.050 --> 39:43.710
Post localhost 9000 snapshot.
39:45.240 --> 39:45.780
Backup.
39:46.290 --> 39:46.620
Dash.
39:46.620 --> 39:48.930
Repo slash snapshot.
39:48.960 --> 39:49.650
Dash one.
39:51.860 --> 39:53.390
Slash underscore restore.
39:56.540 --> 39:58.430
And it took it slow after a few seconds.
39:58.430 --> 40:01.490
If we check our indices, we should see all the original data back in place.
40:02.210 --> 40:05.930
Curl Local Host 9200 Slasher and Score Cat Slash Indices.
40:08.510 --> 40:10.880
And there's our original Shakespeare index, for example.
40:10.880 --> 40:11.930
So, yeah.
40:12.080 --> 40:12.950
Things have been restored.
40:13.460 --> 40:13.760
Great.
40:13.760 --> 40:18.260
So now that you're armed with foundational knowledge and various commands on troubleshooting your Elasticsearch
40:18.260 --> 40:22.940
cluster, the last piece of advice is to stay positive even when things are not working out.
40:23.480 --> 40:26.690
It's part of and parcel to being an Elasticsearch engineer.
Learn Python Programming for Beginners: Best Step-by-Step Guide for Coding with Python, Great for Kids and Adults. Includes Practical Exercises on Data Analysis, Machine Learning and More.
The Advanced Roblox Coding Book: An Unofficial Guide, Updated Edition: Learn How to Script Games, Code Objects and Settings, and Create Your Own World!