

00:07.030 --> 00:12.280

Elasticsearch is a complex piece of software by itself, but complexity is further
increased when you

00:12.280 --> 00:14.410

spin up multiple instances to form a cluster.

00:15.280 --> 00:18.040

This complexity comes with the risk of things going wrong.

00:18.730 --> 00:22.660

In this lecture, we're going to explore some common issues that you're likely to
encounter on your

00:22.660 --> 00:23.770

Elasticsearch journey.

00:24.550 --> 00:27.550

There are plenty more potential issues than we can squeeze into this lesson.

00:27.550 --> 00:33.820

So let's focus on the most prevalent ones, mainly related to node setup,
cluster formation, and

00:33.820 --> 00:34.690

the cluster state.

00:38.440 --> 00:44.290

The potential Elasticsearch issues can be categorized according to the
Elasticsearch lifecycle node

00:44.290 --> 00:44.740


00:45.790 --> 00:49.060

Potential issues include the installation and initial startup.

00:49.570 --> 00:53.710

The issues can differ significantly depending on how you run your cluster, like
whether it's a local

00:53.710 --> 00:56.920

installation, running on containers, or a cloud service, etc.

00:57.880 --> 01:02.860

In this lesson, we'll follow the process of a local setup and focus specifically on
bootstrap checks,

01:02.860 --> 01:04.960

which are very important when starting a node up.

01:06.130 --> 01:07.750

Discovery and cluster formation.

01:08.650 --> 01:13.060

This category covers issues related to the discovery process when the nodes need to
communicate with

01:13.060 --> 01:15.220

each other to establish a cluster relationship.

01:16.000 --> 01:21.010

This may involve problems during the initial bootstrapping of the cluster, nodes
not joining the cluster,

01:21.010 --> 01:22.750

and problems with master elections.

01:24.090 --> 01:25.650

Indexing data and sharding.

01:26.400 --> 01:29.040

This includes issues related to index settings and mapping.

01:29.220 --> 01:33.540

But as this is covered in other lectures, we'll just touch upon how sharding issues
are reflected in

01:33.540 --> 01:34.350

the cluster state.

01:35.580 --> 01:41.070

Searching. Search, being the ultimate step of the setup journey, can raise issues
related to queries that

01:41.070 --> 01:44.460

return less relevant results or issues related to search performance.

01:44.910 --> 01:47.160

This topic is covered in another lecture in this course.

01:50.550 --> 01:54.990

Now that we have some initial background of potential issues with Elasticsearch,
let's go one by one.

01:54.990 --> 01:59.550

Using a practical approach, we'll expose the pitfalls and show how to overcome them.

02:01.180 --> 02:06.280

So before we start messing up our cluster to simulate real-world issues, let's back
up our existing

02:06.280 --> 02:06.850


02:07.210 --> 02:08.320

This will have two benefits.

02:08.470 --> 02:12.580

After we're done, we can get back to where we ended up and just continue on in the
course and we'll

02:12.580 --> 02:16.510

better understand the importance of backing up to prevent data loss while

02:17.020 --> 02:18.700

First, we need to set up our repository.

02:18.850 --> 02:23.350

So let's open up our elasticsearch.yml file using your favorite editor.

02:23.800 --> 02:24.270

I like nano.

02:24.280 --> 02:25.780

Let's see.

02:26.080 --> 02:27.040


02:28.150 --> 02:29.260


02:29.710 --> 02:30.310

dot yml.

02:35.870 --> 02:39.290

And we want to make sure we have a registered repository path on our machine.

02:39.300 --> 02:41.270

So we're looking for the path.repo setting.

02:42.650 --> 02:43.630

Don't think there's one in here.

02:45.990 --> 02:47.190

Let's go ahead and add one then.

02:52.510 --> 02:53.440

path.repo, colon,

02:55.430 --> 02:56.090

square bracket,

02:56.420 --> 03:00.590

/home/student/backups. That should do the job.

03:01.550 --> 03:01.910

All right.

03:02.210 --> 03:04.310

Control-O, Enter, Control-X.

03:04.430 --> 03:05.180

So that's saved.

03:06.290 --> 03:10.340

And we might want to save a copy of this config file now as well so we can get back
to it at the end

03:10.340 --> 03:10.880

of the lesson.

03:10.880 --> 03:20.090

So let's make a copy. We'll say sudo cp /etc/elasticsearch/elasticsearch.yml
and

03:20.090 --> 03:25.460

let's just copy that into our home directory and that way we can just copy that
back when we're done

03:25.460 --> 03:27.440

if we need to restore any of those settings later on.

03:28.370 --> 03:28.840


03:28.850 --> 03:33.020

So we need to make sure that the directory exists that we're going to be storing
that repository into

03:33.050 --> 03:34.780

and that Elasticsearch can write into it.

03:34.790 --> 03:39.320

So let's say mkdir -p /home/student/backups.

03:42.110 --> 03:48.020

And we'll change the group on that to elasticsearch, like so: sudo chgrp elasticsearch

03:48.770 --> 03:49.970

/home/student/backups.

03:52.940 --> 03:54.620

And finally, make it writable:

03:55.040 --> 03:59.420

sudo chmod g+w /home/student/backups.
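
As a rough sketch of the directory preparation just described, the three steps can be wrapped in one shell function. The function name `prepare_repo_dir` is made up for illustration; the path and group in the usage comment are the ones used in this lesson, and on a real system you would most likely need root privileges to change the group.

```shell
# Hypothetical helper: create a snapshot repository directory and make it
# writable for the given group (the lesson uses the "elasticsearch" group).
prepare_repo_dir() {
  dir="$1"
  group="$2"
  mkdir -p "$dir" &&          # create the directory (and parents) if missing
    chgrp "$group" "$dir" &&  # hand the directory over to the group
    chmod g+w "$dir"          # allow group members to write into it
}

# Usage with the lesson's values (run from a root shell):
# prepare_repo_dir /home/student/backups elasticsearch
```

Chaining the commands with `&&` means a failed step stops the rest, so you never end up with a directory that exists but is not writable by Elasticsearch without noticing.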

04:02.580 --> 04:06.170

And we need to restart Elasticsearch to pick up that configuration change we made.

04:06.180 --> 04:13.530

So let's say sudo /bin/systemctl stop elasticsearch.service.

04:19.250 --> 04:20.330

And we'll restart it.

04:24.650 --> 04:24.980


04:24.980 --> 04:30.590

So now we can register the new repository to Elasticsearch at the path we
configured with a curl request.

04:31.110 --> 04:35.870

PUT localhost:9200,

04:36.440 --> 04:40.940

slash _snapshot slash backup-repo, backslash.

04:43.240 --> 04:43.670


04:43.750 --> 04:44.290


04:45.470 --> 04:59.240

Quote, curly brace: type will be filesystem, and the settings will have a location of
/home/student/backups,

05:00.140 --> 05:05.570

slash backup-repo. Close everything out, and it looks like it took.

05:06.080 --> 05:10.430

Now we can initiate the snapshot process to do the backup with a curl request.

05:10.700 --> 05:14.540

PUT localhost:9200/_snapshot,

05:17.080 --> 05:21.780

slash backup-repo, and we'll call it snapshot-1.

05:24.440 --> 05:25.460

So it looks like that worked.

05:25.610 --> 05:29.910

We can check the status of that with a simple GET request, again with curl.

05:30.620 --> 05:33.980

GET localhost:9200/_snapshot,

05:35.060 --> 05:35.900

slash backup-repo,

05:37.190 --> 05:38.030

slash snapshot-1.

05:39.800 --> 05:40.670

And we'll make it pretty.

05:43.640 --> 05:44.420

Looks like it worked.

05:44.510 --> 05:48.830

It says the state was SUCCESS.
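
The three requests dictated above can be sketched as small shell functions around curl. This is only an illustration under a few assumptions: Elasticsearch answers on localhost:9200 without authentication, and the function names (`snapshot_url`, `register_repo`, `create_snapshot`, `snapshot_status`) are invented for this sketch. The repository type for a shared filesystem location is `fs`.

```shell
# Assumption: Elasticsearch listens on localhost:9200 without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the URL of a repository, or of a snapshot inside that repository.
snapshot_url() {
  if [ -n "${2:-}" ]; then
    printf '%s/_snapshot/%s/%s' "$ES_URL" "$1" "$2"
  else
    printf '%s/_snapshot/%s' "$ES_URL" "$1"
  fi
}

# Register a shared-filesystem ("fs") repository.
# The location has to sit under a directory listed in path.repo.
register_repo() {
  curl -s -X PUT "$(snapshot_url "$1")" \
    -H 'Content-Type: application/json' \
    -d "{\"type\": \"fs\", \"settings\": {\"location\": \"$2\"}}"
}

# Start a snapshot of the cluster in the given repository.
create_snapshot() {
  curl -s -X PUT "$(snapshot_url "$1" "$2")"
}

# Show the snapshot details; look for "state" : "SUCCESS" in the output.
snapshot_status() {
  curl -s -X GET "$(snapshot_url "$1" "$2")?pretty"
}

# Usage with the lesson's names:
# register_repo   backup-repo /home/student/backups/backup-repo
# create_snapshot backup-repo snapshot-1
# snapshot_status backup-repo snapshot-1
```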

05:49.070 --> 05:49.580

All right, cool.

05:50.090 --> 05:50.600

Very good.

05:50.630 --> 05:54.230

Now that we have our data backed up, we can now proceed to nuke our cluster.

05:54.770 --> 05:55.730

So let's get started.

05:56.360 --> 05:58.130

Well, let's recap on the basics about logs.

05:58.400 --> 06:00.710

So we'll start by looking at the Elasticsearch logs.

06:01.160 --> 06:04.470

Their location will depend on the path.logs setting in your elasticsearch

06:04.670 --> 06:05.240

dot yml.

06:05.540 --> 06:10.640

By default they are found in /var/log/elasticsearch/, whatever your cluster name
is, dot log.

06:11.390 --> 06:15.680

So basic tailing commands come in handy to monitor the logs in real time.

06:15.680 --> 06:17.870

And say we want to keep an eye on these logs off to the side.

06:18.230 --> 06:23.030

I'm actually going to start a different terminal window here, so let's go ahead and
start a new Telnet

06:23.030 --> 06:23.690

client here.

06:31.940 --> 06:34.470

It would help if I typed in my password correctly.

06:34.490 --> 06:34.940

There we go.

06:36.320 --> 06:36.710

All right.

06:36.950 --> 06:38.870

And let's see where those logs live.

06:39.140 --> 06:41.690

Those are going to be in /var/log/elasticsearch.

06:44.250 --> 06:47.160

So our account has insufficient rights to actually read these logs.

06:47.250 --> 06:49.260

Now there are various options to solve this.

06:49.410 --> 06:54.450

For example, a valid group assignment of your Linux user or one generally simpler
approach is to provide

06:54.450 --> 06:57.570

the user sudo permission to run a shell as the elasticsearch user.

06:58.200 --> 07:02.130

We can do this by editing the sudoers file using visudo under root.

07:02.280 --> 07:04.290

So let's just say sudo visudo.

07:11.710 --> 07:13.420

And we will add the following line.

07:16.410 --> 07:17.460

I'll enter it at the bottom here.

07:19.170 --> 07:21.060

Our username,

07:21.840 --> 07:24.030

ALL equals, parenthesis,

07:24.090 --> 07:26.270

elasticsearch, parenthesis,

07:27.120 --> 07:28.500

NOPASSWD, colon,

07:29.640 --> 07:31.710

ALL. That should do it.

07:31.890 --> 07:32.820

So Control-O,

07:33.330 --> 07:34.110

Control-X.

07:35.530 --> 07:39.250

So after we've done that, we can run the following command to launch a new shell as
the Elasticsearch

07:39.250 --> 07:42.430

user: sudo -su elasticsearch.

07:44.440 --> 07:44.810


07:45.220 --> 07:46.360

So now we should have the permissions.

07:46.360 --> 07:48.970

We need to actually look at these logs, so let's try that again.

07:49.780 --> 07:50.740

cd /var

07:50.770 --> 07:52.000

/log/elasticsearch.

07:53.080 --> 07:53.750

That's better.

07:54.250 --> 07:59.080

And now we can do things like tail -n 100 to look at the last 100 lines in
this log file.

07:59.560 --> 08:03.790

And our cluster name is actually elasticsearch, so that's elasticsearch.log, because we haven't changed

08:04.420 --> 08:05.110

And there you have it.

08:05.770 --> 08:08.260

Or sometimes you just want to look for error messages, right?

08:08.260 --> 08:15.550

So for example, we could look at the last 500 log lines and pipe that into grep for
error and that

08:15.550 --> 08:16.670

would just show us any errors.

08:17.140 --> 08:20.170

Fortunately, we don't have any because our cluster is healthy, so that's cool.

08:20.680 --> 08:25.690

And sometimes it can also be useful to grab a few surrounding log lines with the
context parameter because

08:25.690 --> 08:28.990

the messages and stack traces can be multi-line sometimes.

08:28.990 --> 08:34.870

So we could say, for example, cat elasticsearch.log, pipe, grep bootstrap,

08:36.260 --> 08:36.560


08:36.560 --> 08:40.610

--context=3 to get the three surrounding lines for each hit there.

08:41.390 --> 08:45.530

So for example, here we have a bootstrap hit and the three lines before and after
it as well.

08:46.040 --> 08:48.350

So those are some useful tricks for looking at the logs.
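
The log tricks above can be sketched as two small helpers. The function names are invented for this example, and the log path in the usage comments assumes the default cluster name, as in this lesson. `grep -C` is the short form of the `--context` option mentioned above.

```shell
# Hypothetical helpers for scanning an Elasticsearch log file.

# Print the ERROR lines found in the last N lines of the log.
# Arguments: <log file> <number of lines>
recent_errors() {
  tail -n "$2" "$1" | grep 'ERROR'
}

# Print every line matching a pattern plus N lines of context on each side,
# which helps with multi-line messages and stack traces.
# Arguments: <log file> <pattern> <context lines>
with_context() {
  grep -C "$3" "$2" "$1"
}

# Usage (run as a user that can read the log directory):
# recent_errors /var/log/elasticsearch/elasticsearch.log 500
# with_context  /var/log/elasticsearch/elasticsearch.log bootstrap 3
```

Note that `recent_errors` returns a non-zero exit status when nothing matches, which on a healthy cluster is the good outcome.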

08:48.710 --> 08:49.220

All right.

08:49.220 --> 08:51.440

So let's start talking about bootstrap checks.

08:51.440 --> 08:53.150

We'll go back to our primary terminal here.

08:53.900 --> 08:58.580

Bootstrap checks are pre-flight validations performed during a node start, which
ensure that your node

08:58.580 --> 09:00.320

can reasonably perform its functions.

09:00.830 --> 09:03.950

There are two modes which determine the execution of bootstrap checks.

09:04.580 --> 09:10.190

Development mode is when you bind your node only to a loopback address (localhost), or
with an explicit

09:10.190 --> 09:12.860

discovery.type of single-node.

09:13.400 --> 09:18.050

Bootstrap checks are not enforced in development mode (failures only show up as warnings
in the log), and then production mode is when you bind

09:18.050 --> 09:24.440

your node to a non-loopback address like, thus making it reachable by other

09:24.800 --> 09:26.960

This is the mode where bootstrap checks are executed.

09:27.620 --> 09:31.520

Let's see them in action because when the checks don't pass, it can become tedious
work to find out

09:31.520 --> 09:32.300

what's going on.

09:33.680 --> 09:38.450

So one of the first system settings recommended by Elastic is to disable heap

09:39.110 --> 09:43.670

This makes sense because Elasticsearch is highly memory intensive and you don't
want to load your memory

09:43.670 --> 09:44.510

data from disk.

09:45.110 --> 09:46.370

There are two options for this.

09:46.670 --> 09:49.940

One is to remove swap files entirely or minimize swappiness.

09:50.540 --> 09:54.080

This is the preferred option but requires considerable intervention as the root

09:54.650 --> 09:59.150

Or we can add the bootstrap.memory_lock parameter in the elasticsearch dot,

09:59.570 --> 10:01.080

So let's try that second option.

10:01.160 --> 10:09.470

Let's go ahead and open our main configuration file with sudo nano /etc/elasticsearch/

10:09.650 --> 10:11.930

elasticsearch.yml.

10:13.780 --> 10:16.330

And we'll go ahead and find the bootstrap.memory_lock setting.

10:19.490 --> 10:21.740

And uncomment that to allow it to be true.

10:23.840 --> 10:26.960

Write that out and quit and let's go ahead and restart our service.

10:27.200 --> 10:30.830

So sudo systemctl stop elasticsearch.service.

10:32.880 --> 10:33.930

And let's restart it.

10:37.760 --> 10:42.200

And after a short wait, we should see some indication of what's happening.

10:44.650 --> 10:45.070

All right.

10:45.070 --> 10:47.800

So, yeah, we actually got an error as a result of doing that.

10:47.800 --> 10:50.110

So let's check our logs and find out what happened.

10:50.650 --> 10:54.010

So let's go spelunking through here and see what went wrong.

10:54.580 --> 10:57.970

Just got to hit the up arrow here to do a fresh tail of my log.

11:00.280 --> 11:01.360

And there we have it.

11:01.360 --> 11:06.100

So there's our error, and it says: bootstrap checks failed, memory locking requested
for elasticsearch

11:06.100 --> 11:06.520


11:06.520 --> 11:07.930

But memory is not locked.

11:08.560 --> 11:10.160

But didn't we just lock it before?

11:10.840 --> 11:11.650

Well, not really.

11:11.650 --> 11:15.370

We just requested the lock, but it didn't actually get locked, so we hit the memory

11:15.370 --> 11:16.360

bootstrap check here.

11:17.140 --> 11:21.490

Now, the easy way to fix this in our case is to allow locking with an override to
our systemd unit

11:21.490 --> 11:22.480

file like this.

11:22.810 --> 11:24.580

So let's go back to our other window here.

11:25.270 --> 11:27.430

sudo systemctl

11:28.030 --> 11:30.670

edit elasticsearch.service.

11:32.980 --> 11:36.880

And we're going to put in the following config parameter here: [Service],

11:39.640 --> 11:43.120

LimitMEMLOCK=infinity.
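
For reference, the override typed into the editor above amounts to a two-line drop-in file. The sketch below writes the same content from a script; the helper name `write_memlock_override` is made up, and the target directory in the usage comment is the standard location where `systemctl edit` stores overrides for this unit.

```shell
# Hypothetical helper: write a systemd drop-in that lifts the memory-lock limit,
# which is what bootstrap.memory_lock: true needs in order to succeed.
# Argument: <drop-in directory>
write_memlock_override() {
  dropin_dir="$1"
  mkdir -p "$dropin_dir" &&
    printf '%s\n' '[Service]' 'LimitMEMLOCK=infinity' > "$dropin_dir/override.conf"
}

# On a real host (as root), followed by a reload and a restart:
# write_memlock_override /etc/systemd/system/elasticsearch.service.d
# systemctl daemon-reload
# systemctl restart elasticsearch.service
```

After the restart, a request such as `curl 'localhost:9200/_nodes?filter_path=**.mlockall&pretty'` should report whether the lock was actually obtained.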

11:47.640 --> 11:48.120

All right.

11:48.270 --> 11:49.980

And let's try spinning that up again.

11:57.460 --> 11:59.170

And this time it should be okay.

12:03.170 --> 12:03.500

All right.

12:03.500 --> 12:04.400

Looks like success.

12:05.930 --> 12:06.380


12:06.650 --> 12:08.390

So let's talk about heap settings next.

12:08.780 --> 12:13.310

Now, if you start playing with the JVM settings in the jvm.options file, which
you will likely

12:13.310 --> 12:17.450

need to do because by default these settings are set too low for actual production

12:17.960 --> 12:20.480

You may face a similar problem as we just did.

12:21.320 --> 12:21.980

So how is that?

12:22.310 --> 12:27.170

Well, by setting the initial heap size lower than the max size, which is actually
quite usual in the

12:27.170 --> 12:27.890

world of Java.

12:28.490 --> 12:32.240

Let's open up that options file and lower the initial heap size to see what's going
to happen.

12:32.780 --> 12:38.540

So sudo nano /etc/elasticsearch/

12:38.840 --> 12:40.430

jvm.options.

12:43.740 --> 12:46.980

And let's go ahead and change these memory settings here.

12:49.180 --> 12:50.500

I'm going to comment out the original ones

12:50.500 --> 12:59.380

so I can go back to them later, and we'll set some new ones: -Xms500m
and -Xmx1

12:59.380 --> 12:59.890


13:01.540 --> 13:01.810

All right.

13:01.810 --> 13:03.340

So we've lowered the initial heap size.

13:03.700 --> 13:05.110

Let's go ahead and save this setting.

13:07.470 --> 13:09.150

And we'll restart our service again.

13:11.920 --> 13:12.420

Stop it.

13:13.790 --> 13:14.630

And I'll start it.

13:17.170 --> 13:18.970

And we'll see what happens as it spins up.

13:23.220 --> 13:24.360

Well, looks like we had an error.

13:24.370 --> 13:26.800

So let's go back to our logs and see what's going on.

13:26.850 --> 13:30.750

So back to the other window, and I'll hit the up arrow just to tail the last 100
lines again.

13:32.440 --> 13:32.920

All right.

13:32.930 --> 13:33.760

Well, there we have it.

13:34.060 --> 13:34.960

Error, bootstrap,

13:34.960 --> 13:35.560

node validation

13:35.560 --> 13:37.060

exception: bootstrap checks failed,

13:37.060 --> 13:39.820

initial heap size not equal to maximum heap size.

13:40.330 --> 13:42.910

So that's telling us pretty explicitly what the problem was there.

13:43.690 --> 13:47.800

Now, generally speaking, this problem is also related to memory locking, where the
need to increase

13:47.800 --> 13:51.400

the heap size during program operation may have undesired consequences.

13:52.120 --> 13:56.770

So remember to set those numbers to equal values, and for the actual values, follow
the recommendations

13:56.770 --> 14:01.960

by Elastic, which in short is lower than 32 gigabytes and up to half of the
available RAM.

14:02.080 --> 14:04.150

Let's go ahead and change those back before we forget.

14:08.670 --> 14:09.180


14:09.510 --> 14:15.270

So we'll set that back to one gig for both, and I'll just use Control-K to get rid
of those lines and

14:15.270 --> 14:18.600

Control-O to save and Control-X, so we should be back in action.

14:19.110 --> 14:20.970

Let's try starting it up again, just to be sure.

14:25.460 --> 14:25.790

All right.

14:25.790 --> 14:27.110

That time it started successfully.

14:28.190 --> 14:31.370

So let's talk about some other system checks you may want to perform when things go

14:31.610 --> 14:36.290

There are many other bootstrap checks on the runtime platform and its settings,
including a file descriptor

14:36.290 --> 14:41.750

check, a maximum number of threads check, a maximum size virtual memory check, and
many others.

14:42.410 --> 14:46.280

You should definitely browse through their descriptions in the docs, but as we're
running the official

14:46.280 --> 14:51.170

Debian distribution that comes with a predefined systemd unit file, most of these
issues are resolved

14:51.170 --> 14:52.850

for us in the unit file, among others.

14:53.270 --> 14:56.630

We can check that unit file to see the individual parameters that get configured.

14:56.900 --> 14:58.760

Let's take a look at that unit file to see what's in it.

14:59.040 --> 15:02.630

We can say sudo cat /usr/lib/systemd/

15:02.630 --> 15:05.990

system/elasticsearch.service.

15:07.900 --> 15:08.200

All right.

15:09.010 --> 15:12.250

So just take a look at the different things that you have at your disposal here.

15:12.400 --> 15:16.390

All sorts of things that could go wrong, but by default, they should be okay in our

15:18.160 --> 15:22.480

So just remember that if you run the Elasticsearch binary on your own, you will
need to take care of

15:22.480 --> 15:23.410

these settings as well.

15:24.730 --> 15:28.270

Now, the last check we'll run is the one that will carry us nicely to the next
section of the lesson

15:28.270 --> 15:29.270

dealing with clustering.

15:29.290 --> 15:33.790

But before we dive in, let's see which configuration parameters
Elasticsearch checks during

15:33.790 --> 15:36.550

its startup with the discovery configuration check.

15:37.270 --> 15:41.740

There are three key parameters which govern the cluster formation and discovery

15:41.980 --> 15:43.870

Let's pull up our yml file to take a look.

15:44.710 --> 15:51.080

sudo nano /etc/elasticsearch/elasticsearch.yml.

15:52.690 --> 15:52.960

All right.

15:52.960 --> 15:57.550

So one is discovery.seed_hosts; it should be down here.

16:00.680 --> 16:00.930


16:01.610 --> 16:06.200

Now, this is a list of ideally all the master eligible nodes in the cluster that we
want to join and

16:06.200 --> 16:07.820

draw the last cluster state from.

16:08.330 --> 16:12.850

Now there's also a discovery.seed_providers setting that you could
set here as well,

16:12.860 --> 16:16.760

and that would allow you to provide the seed hosts list in the form of a file that
gets reloaded on

16:16.760 --> 16:20.840

any change instead of specifying it within the configuration file itself.

16:21.500 --> 16:24.530

Also, let's look at the cluster.initial_master_nodes setting here.

16:25.070 --> 16:29.540

This is a list of the node names (not hostnames) for the very first master

16:30.230 --> 16:34.190

So before all of these join and vote, the cluster setup won't be completed.

16:35.330 --> 16:39.470

But what if you don't want to form any cluster, but rather just want to run in a
small single node

16:39.470 --> 16:39.830


16:40.070 --> 16:43.130

Well, you might think you could just eliminate these settings in the yml file.

16:44.150 --> 16:44.470


16:45.200 --> 16:46.040

But no, that won't work.

16:46.130 --> 16:50.180

After starting up, you would hit another bootstrap error, since at least one of
those parameters needs

16:50.180 --> 16:52.490

to be set to pass a bootstrap check.
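
The rule just described (at least one of the three discovery settings has to be present) can be checked with a one-line grep before you restart. This is a rough sketch with an invented helper name; it only looks at uncommented top-level keys written in the dotted form used in this lesson.

```shell
# Hypothetical helper: succeed if the file sets at least one of the three
# discovery settings on an uncommented line.
# Argument: <path to elasticsearch.yml>
has_discovery_config() {
  grep -Eq '^[[:space:]]*(discovery\.seed_hosts|discovery\.seed_providers|cluster\.initial_master_nodes)[[:space:]]*:' "$1"
}

# Usage:
# has_discovery_config /etc/elasticsearch/elasticsearch.yml ||
#   echo "no discovery settings: expect the discovery configuration check to fail"
```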

16:52.970 --> 16:56.390

So we're going to go ahead and put those back because you can't actually get away
with that.

16:56.930 --> 17:00.680

So let's see why this is and dive deeper into troubleshooting the discovery

17:01.310 --> 17:02.540

First, I'll exit out of here.

17:04.520 --> 17:06.560

And let's shut down our cluster before we forget.

17:12.900 --> 17:14.040

Just stop the service.

17:15.030 --> 17:15.420

All right.

17:16.470 --> 17:20.850

So after we've successfully passed the bootstrap checks and started up our node for
the first time,

17:20.850 --> 17:23.910

the next phase in its lifecycle is the discovery process.

17:24.540 --> 17:28.440

Now, to simulate the formation of a brand new cluster, we're going to need a clean

17:28.710 --> 17:33.420

So we need to remove all the data of the node and thus lose all previous cluster
state information.

17:33.450 --> 17:35.580

That's why we backed everything up to a snapshot earlier.

17:36.120 --> 17:39.390

Now, remember, this is really just an experiment. In a real production setup,

17:39.690 --> 17:42.090

there would be very few reasons to do this.

17:42.480 --> 17:45.870

I'm going to go to this other window here where I'm logged in as the elasticsearch
user because I'm

17:46.290 --> 17:48.090

going to need its permissions to do this stuff.

17:48.750 --> 17:52.500

rm -rf /var/lib/elasticsearch

17:53.980 --> 17:54.750

slash star.

17:56.100 --> 17:56.430

All right.

17:56.430 --> 17:58.320

We blew away our entire node there.

17:58.950 --> 18:04.020

So now let's imagine a situation where we already had a cluster and we just want
the node to join in.

18:04.650 --> 18:10.110

So we need to make sure the cluster name is correct and linked to some seed host
either by IP or hostname

18:10.110 --> 18:10.530

and port.

18:11.400 --> 18:14.160

So let's go ahead and open up our yml file.

18:15.330 --> 18:17.520

We'll use vim because that's what's installed under this account.

18:18.030 --> 18:22.380

That's /etc/elasticsearch/elasticsearch.yml.

18:24.600 --> 18:26.910

All right, so we need to make sure that we have a cluster name.

18:28.860 --> 18:29.070


18:29.070 --> 18:31.140

I'll hit i to go into insert mode, and now I can edit it.

18:32.740 --> 18:36.370

We'll change my-application to lecture-cluster.

18:36.730 --> 18:37.720

It would help if I typed it right.

18:40.190 --> 18:42.380

And we need to set our discovery.seed_hosts.

18:45.460 --> 18:46.010


18:52.170 --> 19:00.240

There they are, and we'll change that to 127.0.0.1:9301. Now this is just a
demonstration, so we're

19:00.240 --> 19:01.320

using a loopback address.

19:01.350 --> 19:06.420

Normally you put a hostname or an IP here and the actual transport port of one or
more of your nodes

19:06.420 --> 19:07.080

in the cluster.

19:08.820 --> 19:12.360

And just to force the failure that we're interested in, I'm going to comment out
this line for the

19:12.360 --> 19:15.600

initial master nodes, and that way it's not going to be able to reach the master.

19:15.630 --> 19:17.310

We'll see what happens when we hit that failure.

19:18.240 --> 19:21.300

Let's go ahead and hit Escape, colon, wq,

19:21.630 --> 19:23.280

exclamation point, to write and quit.

19:24.090 --> 19:25.590

And now let's start up our service.

19:35.260 --> 19:35.560

All right.

19:35.560 --> 19:36.870

It looks like it started successfully.

19:36.880 --> 19:39.850

Let's check our root endpoint to see if it really is running.

19:39.850 --> 19:40.300


19:40.450 --> 19:43.120

curl localhost:9200.

19:46.550 --> 19:47.330

All right.

19:47.600 --> 19:54.110

So we did get a nice response with various details here, but something is missing:
the cluster UUID.

19:55.010 --> 19:57.200

This means that our cluster is not actually formed.

19:57.360 --> 20:02.090

And we can confirm this by checking the cluster state with the cluster health API.

20:02.670 --> 20:03.590

Let's say curl.

20:04.340 --> 20:07.070

localhost:9200, slash, underscore,

20:07.070 --> 20:08.330

cluster, slash, health.

20:12.240 --> 20:15.090

And after about 30 seconds of waiting, we'll get an exception.

20:19.230 --> 20:20.340

Indeed we did: master

20:20.340 --> 20:21.450

not discovered exception.
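
The two checks above can be scripted. The sketch below reads the root endpoint's JSON from standard input and decides whether a cluster has formed. It assumes that an unformed cluster reports the placeholder value `_na_` for `cluster_uuid` (treat that as an assumption to verify on your version), and the function names are invented for this example.

```shell
# Hypothetical helpers: inspect the JSON returned by the root endpoint.

# Extract the cluster_uuid value from JSON on standard input.
cluster_uuid_from_json() {
  sed -n 's/.*"cluster_uuid"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Succeed only if a real cluster UUID is present.
# Assumption: "_na_" (or a missing value) means no cluster has formed yet.
cluster_formed() {
  uuid=$(cluster_uuid_from_json)
  [ -n "$uuid" ] && [ "$uuid" != "_na_" ]
}

# Usage against the lesson's node:
# curl -s localhost:9200 | cluster_formed || echo "cluster has not formed yet"
# curl -s 'localhost:9200/_cluster/health?pretty'   # may time out with a master-not-discovered error
```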

20:21.870 --> 20:26.760

All right, let's tail our logs and see that the node didn't discover any master and
will continue the

20:26.760 --> 20:27.740

discovery process.

20:27.750 --> 20:30.780

So let's check our logs and see what happened.

20:31.560 --> 20:35.850

Let's look at the past 500 lines here, and that's lecture-cluster

20:35.850 --> 20:37.920

dot log this time because we changed the cluster name.

20:40.220 --> 20:42.620

That's the relevant message here, master not discovered.

20:42.770 --> 20:46.880

This node has not previously joined a bootstrapped cluster, and
cluster.initial_master_nodes is empty on

20:46.880 --> 20:49.280

this node, so it's going to continue

20:49.280 --> 20:53.030

discovery on 127.0.0.1:9301 from the hosts providers.

20:53.660 --> 20:58.970

But yeah, that's basically telling us that we had a problem actually electing a
master because we didn't

20:58.970 --> 21:01.220

list any master nodes and it couldn't find any. Makes sense.

21:01.220 --> 21:01.460


21:02.750 --> 21:07.070

So these issues are going to be very similar when forming a new cluster and we can
simulate that in

21:07.070 --> 21:09.560

our environment with the cluster.initial_master_nodes setting.

21:09.830 --> 21:12.290

So again, let's make sure there's no previous data on our node.

21:12.680 --> 21:13.970

We'll go ahead and blow away that.

21:14.360 --> 21:15.980

Let's stop the service before we forget, huh?

21:16.610 --> 21:18.590

So back to this other side here.

21:20.700 --> 21:21.540

Stop service.

21:22.560 --> 21:23.670

Now we're going to blow away.

21:24.150 --> 21:25.530

Var Lib Elasticsearch again.

21:26.800 --> 21:27.370

Like so.

21:29.690 --> 21:30.080

All right.

21:30.080 --> 21:32.720

And now we can edit our yml file again.

21:37.490 --> 21:41.240

And now we're going to go back to make sure our cluster name is still a lecture

21:41.240 --> 21:43.010

And now we're going to set our initial master nodes.

21:45.560 --> 21:47.660

So it was complaining before that we had an empty list there.

21:47.660 --> 21:54.800

So let's give it a list. I'll go to insert mode, and now we can edit this line,
uncomment it, and

21:54.800 --> 21:59.120

we'll set it to the list of node-1, node-2, and node-3.

22:04.580 --> 22:06.080

So let's go ahead and hit Escape,

22:06.320 --> 22:11.720

colon, wq, exclamation point, to write and quit, and we'll restart the node again.

22:13.340 --> 22:14.240

Start the service.

22:16.120 --> 22:17.460

And see what happens this time.

22:20.290 --> 22:20.610

All right.

22:20.620 --> 22:22.030

Looks like it went okay.

22:22.030 --> 22:23.800

But again, let's check and make sure.

22:23.830 --> 22:24.490

Let's hit the root

22:24.490 --> 22:24.940

endpoint.

22:27.750 --> 22:29.880

Still, we have no cluster UUID.

22:29.910 --> 22:32.430

So we didn't actually join a cluster; that failed.

22:32.460 --> 22:33.750

And if we do a health check again.

22:37.560 --> 22:39.330

We'll have to wait 30 seconds for that to time out.

22:41.460 --> 22:41.820

All right.

22:41.820 --> 22:42.630

Same deal: master

22:42.630 --> 22:43.320

not discovered.

22:43.380 --> 22:45.690

Let's check the logs again to see what happened this time.

22:46.320 --> 22:48.540

So we'll just tail those last 500 lines again.

22:49.640 --> 22:54.200

And we're going to look for something about discovering master-eligible nodes to

22:58.080 --> 22:59.490

Probably should have grepped for WARN, huh?

23:05.380 --> 23:06.160

This looks interesting.

23:07.480 --> 23:07.900

All right.

23:07.900 --> 23:09.640

node-1, not discovered yet.

23:10.660 --> 23:12.370

This node must discover master-eligible nodes

23:12.370 --> 23:14.660

node-1, node-2, and node-3 to bootstrap a cluster.

23:14.680 --> 23:15.850

We only discovered node-1.

23:16.630 --> 23:19.270

So, yeah, you can't just specify nodes that don't exist there.

23:20.260 --> 23:24.310

All right, so we have performed some experiments here, so you'll need to use your
imagination to complete

23:24.310 --> 23:24.880

the picture.

23:25.000 --> 23:29.080

Now, in a real production scenario, there are many reasons why this problem often

23:29.650 --> 23:34.030

Since we're dealing with a distributed system, many external factors such as
network communication

23:34.030 --> 23:36.940

come into play and may cause the nodes to be unable to reach each other.

23:37.000 --> 23:40.640

So the problem might not just be that I listed a bunch of fictitious hosts there.

23:40.780 --> 23:44.860

It might be that those are valid hosts, but they can't be reached for some reason.
To resolve these

23:44.860 --> 23:45.280


23:45.370 --> 23:46.870

You need to triple check all your settings.

23:47.230 --> 23:49.420

So again, let's go back into vim.

23:50.440 --> 23:54.190

We need to check the cluster name, so that all the nodes are joining or forming the
right cluster.

23:54.910 --> 24:00.520

The node names: any mistype in the node names can cause invalidity for the master
elections. And the seed

24:00.520 --> 24:04.120

hostnames, IPs, and ports, down here somewhere.

24:05.970 --> 24:10.570

Got to make sure those all have valid seed hosts linked and that the ports are
actually the configured

24:10.570 --> 24:10.960


24:11.710 --> 24:14.680

We need to check connectivity between the nodes and the firewall settings.

24:14.800 --> 24:19.600

So use telnet or similar tools to inspect your network and make sure it's open for
communication between

24:19.600 --> 24:22.390

the nodes, the transport layer and the ports especially.

24:23.170 --> 24:24.250

Also check SSL

24:24.250 --> 24:29.080

and TLS. Communication encryption is a vast topic and we're not going to touch that
here, but it's a

24:29.080 --> 24:33.790

usual source of trouble: invalid certificates, untrusted certificate
authorities, and things like

24:33.790 --> 24:34.090


24:34.840 --> 24:38.320

Also be aware that there are special requirements on the certs when encrypting node

24:38.320 --> 24:39.370

to node communication.

24:40.900 --> 24:44.800

All right, the last thing we're going to explore is the relationship between the
shard allocation and

24:44.800 --> 24:47.440

cluster state as these two things are tightly related.

24:48.010 --> 24:52.240

But first, we need to change the elasticsearch.yml configuration to let our
node successfully

24:52.240 --> 24:53.470

form a single node cluster.

24:53.950 --> 24:59.770

So back in our configuration file here, let's just set the initial master as the
node itself and start

24:59.770 --> 25:00.340

the service.

25:01.580 --> 25:03.170

So take out node two and node

25:03.170 --> 25:05.300

three, and just hit I. I had to go to insert mode.

25:06.410 --> 25:07.910

Forgot I was in vim there for a second.

25:09.420 --> 25:09.890


25:10.140 --> 25:12.060

Colon WQ exclamation point.

25:12.270 --> 25:12.980

We wrote that out.

25:12.990 --> 25:15.600

So now let's restart our service yet again.

25:17.540 --> 25:17.990

Stop it.

25:19.590 --> 25:20.040


25:22.680 --> 25:23.190

All right.

25:23.430 --> 25:25.710

And again, we'll query the cluster health API.

25:25.810 --> 25:26.610

Let's see what happened.

25:30.150 --> 25:32.590

So we can see the cluster status is, in fact, green.

25:32.640 --> 25:33.210

That's good.
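
As a rough sketch of that health check outside of curl, the request and the status lookup could look like this in Python. The helper names are invented for illustration, and the base URL assumes a local, unsecured node on port 9200 as in this lesson.

```python
import json
import urllib.request


def health_request(base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) a GET request for the cluster health API."""
    return urllib.request.Request(f"{base_url}/_cluster/health", method="GET")


def extract_status(response_body: str) -> str:
    """Pull the 'status' field (green, yellow or red) out of a health response."""
    return json.loads(response_body)["status"]
```

To actually send it, something like `urllib.request.urlopen(health_request())` would do, passing the decoded response body to `extract_status`.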

25:33.930 --> 25:35.520

So what does cluster status mean?

25:35.730 --> 25:39.180

Well, it actually reflects the worst state of any of the indices that we have in
our cluster.

25:39.900 --> 25:41.220

The different options include red.

25:41.730 --> 25:44.850

That means one or more shards of the index are not assigned in the cluster.

25:45.360 --> 25:49.920

This can be caused by various issues at the cluster level, like disjoint nodes or
problems with disks

25:49.920 --> 25:50.670

and things like that.

25:51.450 --> 25:56.460

Generally, the red status marks very serious issues, so be prepared for some
potential data loss.

25:57.150 --> 25:58.200

It could also be yellow.

25:58.230 --> 26:00.750

In that case, the primary data are not yet impacted.

26:01.080 --> 26:04.500

All the primary shards are okay, but some replica shards are not assigned.

26:05.130 --> 26:09.540

Like, for example, replicas won't be allocated on the same node as the primary
shard by design.

26:10.290 --> 26:15.420

This status marks a risk of losing data. And green means all shards are well allocated.

26:15.840 --> 26:20.160

However, it doesn't mean that the data is safely replicated, since a single node
cluster with a

26:20.160 --> 26:22.560

single shard index would be green as well.
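
The rules just described can be condensed into a small model. This is a simplified sketch of the logic, not Elasticsearch's own code: each shard copy is reduced to two booleans, and the function names are made up for illustration.

```python
from typing import Iterable, Tuple

# Each shard copy is described as (is_primary, is_assigned).
ShardCopy = Tuple[bool, bool]


def index_status(shards: Iterable[ShardCopy]) -> str:
    """Red if any primary is unassigned, yellow if only replicas are, else green."""
    status = "green"
    for is_primary, is_assigned in shards:
        if is_assigned:
            continue
        if is_primary:
            return "red"
        status = "yellow"
    return status


def cluster_status(index_statuses: Iterable[str]) -> str:
    """The cluster takes the worst status found among its indices."""
    order = {"green": 0, "yellow": 1, "red": 2}
    return max(index_statuses, key=order.__getitem__, default="green")
```

Note that `index_status([(True, True)])` is green even though there is no replica at all, which is exactly the caveat above: green says every configured shard is assigned, not that the data is replicated.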

26:23.280 --> 26:26.670

So now let's create an index with one primary shard and one replica.

26:27.510 --> 26:30.210

We'll do that with a curl request.

26:30.450 --> 26:42.000

PUT localhost 9200 slash test, we'll call the index test, backslash, dash dash data
dash raw, curly, with the

26:42.000 --> 26:42.900

following settings.

26:43.620 --> 26:55.590

Curly bracket, number of shards will be set to one and the number of
replicas will be set to

26:55.590 --> 26:56.220

one as well.
26:56.820 --> 26:57.630
Close everything out.
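
Written out, the request dictated above might look like the following Python sketch. The function name is hypothetical and the base URL assumes the local node from this lesson. The request is only built here, not sent.

```python
import json
import urllib.request


def create_index_request(index: str, primaries: int, replicas: int,
                         base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a PUT request that creates an index with the given shard layout."""
    body = {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        }
    }
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Calling `create_index_request("test", 1, 1)` gives the one primary, one replica layout used here.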

26:58.350 --> 27:03.570

All right, so suddenly our cluster will turn yellow because our worst performing
index, the only one

27:03.570 --> 27:05.040

we have, is also yellow.

27:05.310 --> 27:06.900

Let's check our health again.

27:08.370 --> 27:08.630


27:08.640 --> 27:09.270

Now we're yellow.

27:10.350 --> 27:15.510

Now you can also check the shard assignment with the cat shards API and see what's
going on there.

27:15.540 --> 27:20.550

So let's say curl localhost 9200 slash underscore cat slash shards.

27:20.910 --> 27:23.620

Question mark v aha.

27:24.660 --> 27:27.180

So we can see that we have unassigned shards here.

27:28.470 --> 27:32.190

Or if you want more descriptive information, you can use the cluster allocation

27:32.190 --> 27:36.750

explain API, which provides an explanation as to why the individual shards were not allocated.

27:36.960 --> 27:43.680

To do that, we'll say curl localhost 9200 slash underscore cluster slash allocation

27:44.660 --> 27:46.260

slash explain, question mark pretty.

27:49.390 --> 27:52.810

And that tells you very explicitly what's going on. In our case, as I mentioned before,

27:53.140 --> 27:57.670

the reason is that allocating the replica to the same node is
disallowed, since it

27:57.670 --> 28:01.090

makes no sense from a resiliency perspective. You wouldn't have a replica on the
same node.

28:01.090 --> 28:01.930

That's silly.
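
To pick the unassigned shards out of the cat shards output programmatically, a sketch like the one below could help. It assumes the verbose output starts with the index, shard, prirep and state columns, and the sample text in the usage note is hand-written for illustration, not captured from a real cluster. The second constant is simply the allocation explain path used above.

```python
from typing import List, Tuple

# Path of the allocation explain API, relative to the node's base URL.
ALLOCATION_EXPLAIN_PATH = "/_cluster/allocation/explain?pretty"


def unassigned_shards(cat_output: str) -> List[Tuple[str, int, str]]:
    """Return (index, shard number, p/r) for every UNASSIGNED row of _cat/shards?v."""
    lines = [line for line in cat_output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Assumes these four columns come first, since unassigned rows have
    # empty trailing columns (docs, store, ip, node).
    cols = {name: header.index(name) for name in ("index", "shard", "prirep", "state")}
    found = []
    for line in lines[1:]:
        fields = line.split()
        if fields[cols["state"]] == "UNASSIGNED":
            found.append((fields[cols["index"]],
                          int(fields[cols["shard"]]),
                          fields[cols["prirep"]]))
    return found
```

Fed a header line followed by a started primary row and a row reading `test 0 r UNASSIGNED`, it returns `[("test", 0, "r")]`, which matches the yellow state described here: the replica of shard 0 has nowhere to go.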

28:02.740 --> 28:03.790

So how would you resolve this?

28:03.820 --> 28:04.930

Well, we have two options.

28:05.350 --> 28:08.620

One would be to remove the replica shard, which is not a real solution.

28:08.620 --> 28:10.870

But if you need the actual status, it will work out.

28:11.440 --> 28:14.530

Or you could add another node on which the shards could be reallocated.

28:14.830 --> 28:16.210

So let's take that second route.
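
For reference, the first option, dropping the replica, is not demonstrated in this lesson, but it could be sketched as an index settings update like this. The function name is made up and the base URL assumes the local node. The request is only built, not sent.

```python
import json
import urllib.request


def set_replicas_request(index: str, replicas: int,
                         base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a PUT request that changes number_of_replicas on an existing index."""
    body = {"index": {"number_of_replicas": replicas}}
    return urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

With `set_replicas_request("test", 0)` the index would have nothing left to allocate and would report green, at the cost of having no redundancy at all.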

28:18.120 --> 28:23.280

So to simulate the following failures, I actually have two different nodes running
on the same host

28:23.280 --> 28:27.510

here, and setting that up is kind of involved and we're going to do that later in
the course as we

28:27.510 --> 28:28.410

go into failover.

28:28.830 --> 28:32.490

So for now, I just want you to watch and not actually try to follow along yourself.

28:32.670 --> 28:34.350

So I've already done some of the grunt work here.

28:34.650 --> 28:39.750

Basically, you need to set up a separate systemd unit file for the second node
and a separate configuration

28:39.750 --> 28:40.800

and stuff like that.

28:40.800 --> 28:43.480

So just watch from this point on.

28:43.500 --> 28:43.800


28:44.640 --> 28:48.570

So anyway, let's start by reviewing the main configuration file of that second node
that I've already

28:48.570 --> 28:52.560

set up, and we'll ensure that it will join the same cluster as our existing node.

28:52.560 --> 28:59.130

So let's say sudo nano slash etc slash elasticsearch dash node dash 2, which is where I put that.
29:05.690 --> 29:07.460
All right, so we have the same cluster name.

29:08.030 --> 29:10.130

We're calling our node here Node two.

29:10.790 --> 29:14.540

And we can see that our seed hosts are set to a loopback address, hopefully.

29:17.870 --> 29:18.090


29:18.470 --> 29:21.950

And we can see that our master nodes consist of node one and node two.

29:23.110 --> 29:29.770

Let's go ahead and exit out of here and start that second node: sudo systemctl
start elasticsearch

29:30.460 --> 29:32.940

dash node dash 2 dot service.

29:35.310 --> 29:35.640


29:35.640 --> 29:38.790

So at this point I have started up a second node on the same VM.

29:38.790 --> 29:42.960

Again, there's quite a bit of configuration behind making that happen, so just
watch for this part

29:42.960 --> 29:43.200

of it.

29:43.860 --> 29:47.220

So now that we have a second node spun up, we should be back in a green status.

29:47.220 --> 29:48.120

So let's check.

29:48.240 --> 29:50.010

Let's say curl dash dash silent

29:50.970 --> 29:51.780

localhost

29:53.260 --> 29:56.080

9200 slash underscore cluster slash.

29:56.080 --> 29:56.500


29:57.190 --> 29:57.580


29:59.190 --> 30:00.780

And we'll just grep for the status line.
30:02.380 --> 30:03.610
And our status is green.

30:03.640 --> 30:04.000


30:04.840 --> 30:08.740

Okay, so we've resolved our issue and the replica shards were automatically allocated.

30:08.740 --> 30:09.160


30:10.120 --> 30:11.590

So let's continue with this example.

30:11.590 --> 30:16.660

To simulate the red cluster state, let's start by removing the index and creating
it again, but this

30:16.660 --> 30:19.390

time with only two primary shards and no replica.

30:19.390 --> 30:21.610

And we'll quickly see why this is a bad idea.

30:21.970 --> 30:31.200

So first of all, delete the one that we have with curl dash dash request DELETE
localhost 9200 slash

30:31.210 --> 30:40.750

test, and I will recreate it with curl dash dash request PUT localhost 9200 slash test.

30:42.720 --> 30:52.380

And dash dash data dash raw, quote, curly, settings, we'll set the number of shards

30:54.510 --> 31:03.660

to one... to two, rather, because we have two nodes to work with, and number
of replicas to zero.

31:03.660 --> 31:04.140


31:05.160 --> 31:06.750

So this seems like a pretty bad idea.

31:06.780 --> 31:09.900

You know, we have our shards split across two nodes, but no backups anywhere.

31:11.720 --> 31:13.100

All right, but so far, so good.

31:13.130 --> 31:14.840

You know, it's at least storing it.

31:15.170 --> 31:17.750

Let's check the shard assignment to see what's actually going on here.

31:17.960 --> 31:24.980

Curl localhost 9200 slash underscore cat slash shards, verbose.

31:26.420 --> 31:26.750


31:27.110 --> 31:31.280

So we can see that each primary shard is on a different node, which follows the
standard allocation

31:31.280 --> 31:33.860

rules set at the cluster level and at the index level.

31:34.340 --> 31:35.930

And you likely know where we're heading.

31:36.650 --> 31:41.120

So imagine the situation where some network issue emerges and your cluster splits
up, resulting in

31:41.120 --> 31:46.070

disabled node communication, or even worse, some disk malfunctions leading to the
improper functioning

31:46.070 --> 31:46.550

of a node.

31:47.240 --> 31:50.090

Now, the easiest way to simulate this is to just stop one of our nodes.

31:50.360 --> 31:51.680

So let's go ahead and kill node

31:51.680 --> 31:56.480

two with a sudo slash bin slash systemctl

31:57.590 --> 31:59.750

stop elasticsearch dash node

31:59.750 --> 32:00.980

dash 2 dot service.

32:03.020 --> 32:04.010

And down it goes.

32:04.430 --> 32:07.230

So now, if we check our status again...

32:10.570 --> 32:11.860

We are now in red status.

32:12.700 --> 32:13.390

That's a bad thing.

32:13.840 --> 32:17.110

So now let's check the explain API to learn more about what's going on.

32:17.230 --> 32:19.270

Curl localhost

32:19.990 --> 32:24.040

9200 slash underscore cluster slash allocation

32:25.640 --> 32:27.080

slash explain, question mark pretty.

32:29.940 --> 32:30.500

All right.

32:30.510 --> 32:35.220

So we cannot allocate it because a previous copy of the primary shard existed but
can no longer be found

32:35.220 --> 32:36.210

on the nodes in the cluster.

32:36.390 --> 32:37.630

Well, that tells you what's going on.

32:37.650 --> 32:38.610

It's pretty well described.

32:39.030 --> 32:40.680

A node left as we have turned it off.

32:41.040 --> 32:46.320

But in the real world that has various potential causes and no valid shard copy can
be found in the

32:46.320 --> 32:48.780

cluster, in which case we're missing data.

32:49.440 --> 32:54.060

Unfortunately, there's no easy solution to this scenario, as we do not have any
replicas and there's

32:54.060 --> 32:55.500

no way we could remake our data.

32:56.520 --> 33:00.450

So firstly, if you are dealing with some network problems, try to thoroughly
inspect what could go

33:00.450 --> 33:06.210

wrong like a misconfiguration of firewalls and inspect it as a priority, since data
cannot consistently

33:06.210 --> 33:07.380

be indexed in this state.

33:08.280 --> 33:12.690

Now, depending on the document routing, many indexing requests can be pointed
toward the missing shard

33:12.690 --> 33:13.770

and end up timing out.
33:14.460 --> 33:17.050
For example, let's try to insert a document and see what happens.

33:17.070 --> 33:18.200

Curl dash dash request

33:19.560 --> 33:29.190

POST localhost 9200 slash test slash underscore doc, dash dash data dash raw, and we'll just say a

33:31.360 --> 33:31.930

It's data.

33:35.720 --> 33:37.310

And this should lead to an exception.

33:39.600 --> 33:42.780

And after about 30 seconds or so, it finally timed out on me.
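
Why only some requests time out can be illustrated with a toy routing model. In broad strokes, a document's routing value (its ID by default) is hashed and reduced modulo the number of primary shards. The sketch below swaps in CRC32 from the standard library purely for illustration, which is not the hash Elasticsearch actually uses, and the function names are made up.

```python
import zlib
from typing import Iterable, Set


def shard_for(routing_key: str, num_primaries: int) -> int:
    """Toy routing: hash the key and take it modulo the primary shard count.

    CRC32 is a stand-in here; real shard numbers will differ.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primaries


def affected_fraction(doc_ids: Iterable[str], num_primaries: int,
                      missing_shards: Set[int]) -> float:
    """Fraction of documents whose target shard is currently missing."""
    ids = list(doc_ids)
    if not ids:
        return 0.0
    hit = sum(1 for doc_id in ids if shard_for(doc_id, num_primaries) in missing_shards)
    return hit / len(ids)
```

With two primaries and one of them gone, `affected_fraction` over a large set of IDs should land somewhere near one half in this model: those are the writes that hang until the timeout, while the rest still succeed.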

33:43.560 --> 33:48.480

Now, secondly, if no possible solution was found, the only option left to get the
index to work properly

33:48.480 --> 33:49.920

may be to allocate a new shard.

33:50.400 --> 33:54.990

But be aware that even if the lost node comes back afterwards, the new shard
will just overwrite

33:54.990 --> 33:56.700

it because it is in a newer state.

33:57.480 --> 34:00.720

Now we can allocate a new shard with the cluster reroute API.

34:00.840 --> 34:05.700

So here we will allocate one for the test index on node dash 1, which operates correctly.

34:06.210 --> 34:08.580

Note that we have to explicitly accept data loss.

34:08.700 --> 34:16.770

So curl dash dash request POST localhost 9200 slash underscore cluster slash reroute.

34:18.210 --> 34:19.620

And we want pretty results.

34:20.770 --> 34:22.990

Backslash, dash dash data dash raw.

34:24.330 --> 34:26.580

Quote, curly, commands.

34:27.850 --> 34:29.170

Actually, there's going to be a square bracket.

34:29.170 --> 34:29.830

We have a list of them.

34:31.610 --> 34:32.840

And curly brackets.

34:35.370 --> 34:38.400

Allocate empty primary.

34:41.990 --> 34:43.040

Index test.

34:44.650 --> 34:45.190


34:45.820 --> 34:52.990

The node is node dash 1, and accept underscore data underscore loss

34:54.950 --> 34:55.610

will be true.

34:58.230 --> 34:59.340

Close everything out.

35:01.240 --> 35:01.930

I think that's right.

35:05.310 --> 35:05.730

All right.

35:05.940 --> 35:08.880

And afterwards, we should no longer experience timeouts during indexing.
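
Put together, the reroute call dictated above might be built like this. It is a sketch under assumptions: the function name is invented, and the shard number is left as a parameter, since it has to come from your own cat shards output.

```python
import json
import urllib.request


def allocate_empty_primary_request(index: str, shard: int, node: str,
                                   base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a POST to the cluster reroute API that forces an empty primary.

    This discards whatever data the lost shard copy held, which is why the
    API insists on accept_data_loss being set explicitly.
    """
    body = {
        "commands": [
            {
                "allocate_empty_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "accept_data_loss": True,
                }
            }
        ]
    }
    return urllib.request.Request(
        f"{base_url}/_cluster/reroute?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The commands value is a list, as noted above, so several shards could be handled in one call by appending more command objects.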

35:10.170 --> 35:11.130

All right, so we're done.

35:11.130 --> 35:15.570

But we just need to restore everything from our backup now, because we did do some
pretty invasive

35:15.570 --> 35:16.570

stuff to our index here.

35:16.590 --> 35:20.580

So we're back at the point where you should be following along if you were
following along earlier.

35:20.610 --> 35:20.880


35:20.880 --> 35:24.780

We need to restore from that backup and make sure we're not left with any
lingering issues that we

35:24.780 --> 35:25.590

might have introduced.

35:25.800 --> 35:29.520

So we're going to restore all of our original indices that we backed up earlier.
35:29.940 --> 35:31.860
Before we can do that, we need to do some cleaning up.

35:32.490 --> 35:36.390

So first, we need to make sure that the repository path is registered again in
elasticsearch.yml,

35:36.990 --> 35:39.120

as we've made some changes to it during the exercise.

35:39.750 --> 35:43.380

So let's go ahead and reference our stored config file that we squirreled away at
the start of the lesson,

35:44.040 --> 35:45.090

and we'll put that back.

35:47.010 --> 35:48.960

So let's see, go back to our home directory.

35:48.960 --> 35:49.860

I think that's where we put it.

35:50.760 --> 35:50.970


35:50.970 --> 35:52.320

There's elasticsearch.yml.

35:52.320 --> 35:54.180

So let's go ahead and move that back into position.

35:54.270 --> 35:55.530

Sudo mv elasticsearch

35:55.530 --> 35:58.590

dot yml to slash etc slash elasticsearch.

36:01.080 --> 36:03.480

All right, let's double check that it's there and looks correct.

36:07.440 --> 36:09.860

Well, I'm the wrong user. Sudo.

36:09.900 --> 36:13.320

Listen to me. And we'll just go ahead and edit it directly: slash etc slash

36:13.830 --> 36:14.700


36:15.180 --> 36:18.210

elasticsearch dot yml.

36:18.240 --> 36:18.510


36:19.800 --> 36:20.790

Make sure things look normal.
36:21.480 --> 36:21.690
All right.

36:21.690 --> 36:22.920

Things are back to how we started.

36:23.190 --> 36:24.300

We have node one.

36:25.400 --> 36:28.370

We still have path dot repo set to slash home slash student slash backups.

36:28.370 --> 36:32.930

That's very important so we can restore that backup. Memory lock is commented out.

36:33.860 --> 36:36.800

Everything looks like it's back to default settings, so that's good.

36:37.730 --> 36:38.020

All right.

36:38.030 --> 36:38.420

Looks good.

36:38.420 --> 36:38.870

Looks good.

36:39.260 --> 36:43.430

Now, we do need to make sure that Elasticsearch has permission to read that
configuration file we just

36:43.430 --> 36:44.150

restored first.

36:44.150 --> 36:51.590

So let's go to the etc folder and do a sudo ls dash

36:51.610 --> 36:53.960

la elasticsearch.

36:55.640 --> 36:56.310

See we have.

36:56.820 --> 36:57.070


36:57.090 --> 36:58.980

So we can see that it's owned by the root group.

36:59.010 --> 36:59.820

We need to change that.

36:59.970 --> 37:01.910

So, sudo chgrp.

37:02.250 --> 37:03.150

37:04.650 --> 37:05.780

37:09.670 --> 37:10.810


37:12.400 --> 37:13.210

dot yml.

37:15.130 --> 37:15.970

Check that again.

37:16.480 --> 37:17.290

All right, that looks better.

37:17.290 --> 37:19.510

So now we should be able to restart our main node.

37:26.660 --> 37:27.200

Like so.

37:31.050 --> 37:31.320


37:32.250 --> 37:36.390

So now we can reregister our repository again to make sure it's ready to provide
the backup data.

37:36.780 --> 37:47.130

Curl dash dash request PUT localhost 9200 slash underscore snapshot slash backup
dash repo, backslash,

37:47.790 --> 37:48.420

dash dash data dash raw.

37:51.080 --> 37:52.640

Type is fs, for file system.

37:54.150 --> 37:54.870


37:58.170 --> 37:58.830


38:00.120 --> 38:02.640

Home Student Backups.

38:03.060 --> 38:03.780

Backup dash.

38:03.780 --> 38:04.170


38:07.900 --> 38:08.350

All right.
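
The repository registration can be sketched the same way. The function name is hypothetical, and the location is left as a parameter. It has to sit under the path.repo directory configured in elasticsearch.yml, so pass whatever directory you used when the backup was taken.

```python
import json
import urllib.request


def register_fs_repo_request(repo: str, location: str,
                             base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a PUT request that registers a shared file system snapshot repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return urllib.request.Request(
        f"{base_url}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

In this lesson the repository is called backup-repo, so the call would be `register_fs_repo_request("backup-repo", location)` with your own backup directory as the location.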

38:09.010 --> 38:13.390

And we can check the available snapshots in the repository with a simple cat
request to our backup repo.
38:13.390 --> 38:13.780

38:13.780 --> 38:16.630

And we should see our snapshot one waiting to be restored.

38:17.260 --> 38:18.310

Curl localhost

38:18.340 --> 38:21.370

9200 slash underscore cat slash snapshots

38:22.600 --> 38:23.320

slash backup

38:23.320 --> 38:23.920

dash repo.

38:26.060 --> 38:26.360

All right.

38:27.290 --> 38:28.190

It's a success.

38:28.580 --> 38:32.600

Now, to prevent any writes during the restore process, we need to make sure that
all of our indices

38:32.600 --> 38:33.170

are closed.

38:33.380 --> 38:38.300

So, however, as of Elasticsearch eight, they've made it not that easy.

38:38.330 --> 38:43.040

You actually need to disable a safety feature that prevents you from closing all of
your indices at

38:43.040 --> 38:43.730

once together.

38:44.240 --> 38:48.050

So let's fire up our editor and edit our elasticsearch.yml file.

38:48.440 --> 38:54.380

And we're looking for the setting action dot destructive underscore requires
underscore name. Uncomment

38:54.380 --> 38:56.420

that and let it be set to false.

39:04.890 --> 39:10.320

And after that, we're going to stop and restart the service again to pick that
change up.

39:25.200 --> 39:34.050

Curl dash dash request POST localhost 9200 slash underscore all slash underscore close.
39:36.330 --> 39:39.240
And finally we can restore our backup with curl dash dash request

39:40.050 --> 39:43.710

POST localhost 9200 slash underscore snapshot

39:45.240 --> 39:45.780


39:46.290 --> 39:46.620


39:46.620 --> 39:48.930

Repo slash snapshot.

39:48.960 --> 39:49.650

Dash one.

39:51.860 --> 39:53.390

Slash underscore restore.

39:56.540 --> 39:58.430

And it took it. So after a few seconds,

39:58.430 --> 40:01.490

if we check our indices, we should see all the original data back in place.

40:02.210 --> 40:05.930

Curl localhost 9200 slash underscore cat slash indices.

40:08.510 --> 40:10.880

And there's our original Shakespeare index, for example.

40:10.880 --> 40:11.930

So, yeah.

40:12.080 --> 40:12.950

Things have been restored.
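
The close and restore steps from this part of the lesson could be sketched as two request builders. The names are made up for illustration, and the requests are only built, not sent. Closing everything through _all only works once action.destructive_requires_name is set to false, as described above.

```python
import urllib.request


def close_all_request(base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a POST request that closes every index via the _all wildcard."""
    return urllib.request.Request(f"{base_url}/_all/_close", method="POST")


def restore_request(repo: str, snapshot: str,
                    base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a POST request that restores the named snapshot from a repository."""
    return urllib.request.Request(
        f"{base_url}/_snapshot/{repo}/{snapshot}/_restore", method="POST"
    )
```

For the names used in this lesson, that would be `restore_request("backup-repo", "snapshot-1")`, followed by a look at the cat indices output to confirm the data is back.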

40:13.460 --> 40:13.760


40:13.760 --> 40:18.260

So now that you're armed with foundational knowledge and various commands on
troubleshooting your Elasticsearch

40:18.260 --> 40:22.940

cluster, the last piece of advice is to stay positive even when things are not
working out.

40:23.480 --> 40:26.690

It's part and parcel of being an Elasticsearch engineer.
