
WEBVTT

00:07.030 --> 00:12.280


Elasticsearch is a complex piece of software by itself, but complexity is further
increased when you

00:12.280 --> 00:14.410


spin up multiple instances to form a cluster.

00:15.280 --> 00:18.040


This complexity comes with the risk of things going wrong.

00:18.730 --> 00:22.660


In this lecture, we're going to explore some common issues that you're likely to
encounter on your

00:22.660 --> 00:23.770


Elasticsearch journey.

00:24.550 --> 00:27.550


There are plenty more potential issues than we can squeeze into this lesson.

00:27.550 --> 00:33.820


So let's focus on the most prevalent ones, mainly related to node setup, cluster
formation, and

00:33.820 --> 00:34.690


the cluster state.

00:38.440 --> 00:44.290


Potential Elasticsearch issues can be categorized according to the Elasticsearch
lifecycle. The first category is node

00:44.290 --> 00:44.740


setup.

00:45.790 --> 00:49.060


Potential issues include the installation and initial startup.

00:49.570 --> 00:53.710


The issues can differ significantly depending on how you run your cluster, like
whether it's a local

00:53.710 --> 00:56.920


installation, running on containers, or a cloud service, etc.

00:57.880 --> 01:02.860


In this lesson, we'll follow the process of a local setup and focus specifically on
bootstrap checks,

01:02.860 --> 01:04.960


which are very important when starting a node up.

01:06.130 --> 01:07.750


Discovery and cluster formation.

01:08.650 --> 01:13.060


This category covers issues related to the discovery process when the nodes need to
communicate with
01:13.060 --> 01:15.220
each other to establish a cluster relationship.

01:16.000 --> 01:21.010


This may involve problems during the initial bootstrapping of the cluster, nodes
not joining the cluster,

01:21.010 --> 01:22.750


and problems with master elections.

01:24.090 --> 01:25.650


Indexing data and sharding.

01:26.400 --> 01:29.040


This includes issues related to index settings and mapping.

01:29.220 --> 01:33.540


But as this is covered in other lectures, we'll just touch upon how sharding issues
are reflected in

01:33.540 --> 01:34.350


the cluster state.

01:35.580 --> 01:41.070


Searching, being the ultimate step of the setup journey, can raise issues
related to queries that

01:41.070 --> 01:44.460


return less relevant results or issues related to search performance.

01:44.910 --> 01:47.160


This topic is covered in another lecture in this course.

01:50.550 --> 01:54.990


Now that we have some initial background of potential issues with Elasticsearch,
let's go one by one.

01:54.990 --> 01:59.550


Using a practical approach will expose the pitfalls and show how to overcome them.

02:01.180 --> 02:06.280


So before we start messing up our cluster to simulate real world issues, let's back
up our existing

02:06.280 --> 02:06.850


indices.

02:07.210 --> 02:08.320


This will have two benefits.

02:08.470 --> 02:12.580


After we're done, we can get back to where we left off and just continue on in the
course, and we'll

02:12.580 --> 02:16.510


better understand the importance of backing up to prevent data loss while
troubleshooting.
02:17.020 --> 02:18.700
First, we need to set up our repository.

02:18.850 --> 02:23.350


So let's open up our Elasticsearch YAML file using your favorite editor.

02:23.800 --> 02:24.270


I like nano.

02:24.280 --> 02:25.780


Nano, let's see... etc.

02:26.080 --> 02:27.040


Elasticsearch.

02:28.150 --> 02:29.260


Elasticsearch.

02:29.710 --> 02:30.310


Dot yml.

02:35.870 --> 02:39.290


And we want to make sure we have a registered repository path on our machine.

02:39.300 --> 02:41.270


So we're looking for the path.repo setting.

02:42.650 --> 02:43.630


I don't think there's one in here.

02:45.990 --> 02:47.190


Let's go ahead and add one then.

02:52.510 --> 02:53.440


path.repo.

02:55.430 --> 02:56.090


Square bracket.

02:56.420 --> 03:00.590


/home/student/backups. That should do the job.
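For reference, the line being added to elasticsearch.yml would look roughly like this (the path is the one used in the lecture; adjust it to your own machine):

```bash
# Add the snapshot repository path to /etc/elasticsearch/elasticsearch.yml
# path.repo: ["/home/student/backups"]
sudo nano /etc/elasticsearch/elasticsearch.yml
```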

03:01.550 --> 03:01.910


All right.

03:02.210 --> 03:04.310


Control-O, Enter, Control-X.

03:04.430 --> 03:05.180


So that saved.

03:06.290 --> 03:10.340


And we might want to save a copy of this config file now as well so we can get back
to it at the end

03:10.340 --> 03:10.880


of the lesson.

03:10.880 --> 03:20.090


So let's make a copy; we'll say sudo cp, that's /etc/elasticsearch/elasticsearch.yml,
and

03:20.090 --> 03:25.460


let's just copy that into our home directory and that way we can just copy that
back when we're done

03:25.460 --> 03:27.440


if we need to restore any of those settings later on.
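The copy command being typed here is roughly:

```bash
# Keep a copy of the current config so we can restore it at the end of the lesson.
sudo cp /etc/elasticsearch/elasticsearch.yml ~/
```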

03:28.370 --> 03:28.840


Okay.

03:28.850 --> 03:33.020


So we need to make sure that the directory exists that we're going to be storing
that repository into

03:33.050 --> 03:34.780


and that Elasticsearch can write into it.

03:34.790 --> 03:39.320


So let's say mkdir dash p /home/student/backups.

03:42.110 --> 03:48.020


And we'll change the group on that to elasticsearch, like so: sudo chgrp
elasticsearch.

03:48.770 --> 03:49.970


/home/student/backups.

03:52.940 --> 03:54.620


And finally, make it writable.

03:55.040 --> 03:59.420


sudo chmod g plus w /home/student/backups.
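Taken together, the three commands from this step are roughly:

```bash
# Create the backup directory and make it writable by the elasticsearch group (example path).
sudo mkdir -p /home/student/backups
sudo chgrp elasticsearch /home/student/backups
sudo chmod g+w /home/student/backups
```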

04:02.580 --> 04:06.170


And we need to restart Elasticsearch to pick up that configuration change we made.

04:06.180 --> 04:13.530


So let's say sudo systemctl stop elasticsearch.service.

04:19.250 --> 04:20.330


And we'll restart it.
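The restart amounts to:

```bash
# Restart the node so it picks up the new path.repo setting.
sudo systemctl stop elasticsearch.service
sudo systemctl start elasticsearch.service
```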

04:24.650 --> 04:24.980


Okay.

04:24.980 --> 04:30.590


So now we can register the new repository to Elasticsearch at the path we
configured, with a curl request.

04:31.110 --> 04:35.870


Put localhost:9200.

04:36.440 --> 04:40.940


Underscore snapshot slash backup dash repo backslash.

04:43.240 --> 04:43.670


Data.

04:43.750 --> 04:44.290


Raw.

04:45.470 --> 04:59.240


Quote, curly brace; the type will be filesystem and the settings will have a location
of /home/student/backups/

05:00.140 --> 05:05.570


backup-repo. Close everything out, and it looks like it took.
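Written out, the repository registration request looks roughly like this (the Content-Type header is added because Elasticsearch expects it for JSON bodies):

```bash
# Register a filesystem ("fs") snapshot repository at the configured path.
curl --request PUT 'localhost:9200/_snapshot/backup-repo' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "type": "fs",
    "settings": { "location": "/home/student/backups/backup-repo" }
  }'
```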

05:06.080 --> 05:10.430


Now we can initiate the snapshot process to do the backup with a curl request.

05:10.700 --> 05:14.540


Put localhost:9200 slash underscore snapshot.

05:17.080 --> 05:21.780


backup-repo, and we'll call it snapshot-1.
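The snapshot request itself is simply:

```bash
# Snapshot the cluster's indices into the backup-repo repository.
curl --request PUT 'localhost:9200/_snapshot/backup-repo/snapshot-1'
```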

05:24.440 --> 05:25.460


So it looks like that worked.

05:25.610 --> 05:29.910


We can check the status of that with a simple GET request, with curl.

05:30.620 --> 05:33.980


Get localhost:9200 slash underscore snapshot.

05:35.060 --> 05:35.900


Back up repo.

05:37.190 --> 05:38.030


Snapshot one.

05:39.800 --> 05:40.670


And we'll make it pretty.
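That status check is roughly:

```bash
# Check the snapshot; the "state" field should read SUCCESS once it finishes.
curl 'localhost:9200/_snapshot/backup-repo/snapshot-1?pretty'
```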

05:43.640 --> 05:44.420


Looks like it worked.

05:44.510 --> 05:48.830


It says the state was SUCCESS.

05:49.070 --> 05:49.580


All right, cool.

05:50.090 --> 05:50.600


Very good.

05:50.630 --> 05:54.230


Now that we have our data backed up, we can now proceed to nuke our cluster.

05:54.770 --> 05:55.730


So let's get started.

05:56.360 --> 05:58.130


Well, let's recap on the basics about logs.
05:58.400 --> 06:00.710
So we'll start by looking at the Elasticsearch logs.

06:01.160 --> 06:04.470


Their location will depend on the path.logs setting in your elasticsearch

06:04.670 --> 06:05.240


dot yml.

06:05.540 --> 06:10.640


By default they are found in /var/log/elasticsearch/, in a file named whatever your
cluster name is, dot log.

06:11.390 --> 06:15.680


So basic tailing commands come in handy to monitor the logs in real time.

06:15.680 --> 06:17.870


And so, say we want to keep an eye on these logs off to the side.

06:18.230 --> 06:23.030


I'm actually going to start a different terminal window here, so let's go ahead and
start a new Telnet

06:23.030 --> 06:23.690


client here.

06:31.940 --> 06:34.470


It would help if I typed in my password correctly.

06:34.490 --> 06:34.940


There we go.

06:36.320 --> 06:36.710


All right.

06:36.950 --> 06:38.870


And let's see where those logs live.

06:39.140 --> 06:41.690


Those are going to be in /var/log/elasticsearch.

06:44.250 --> 06:47.160


So our account has insufficient rights to actually read these logs.

06:47.250 --> 06:49.260


Now there are various options to solve this.

06:49.410 --> 06:54.450


For example, a valid group assignment for your Linux user; or one generally simpler
approach is to give

06:54.450 --> 06:57.570


the user sudo permission to run a shell as the elasticsearch user.

06:58.200 --> 07:02.130


We can do this by editing the sudoers file using visudo as the root user.

07:02.280 --> 07:04.290


So let's just say sudo visudo.

07:11.710 --> 07:13.420


And we will add the following line.

07:16.410 --> 07:17.460


We'll enter it at the bottom here.

07:19.170 --> 07:21.060


How about username?

07:21.840 --> 07:24.030


All equals parentheses.

07:24.090 --> 07:26.270


Elasticsearch parentheses.

07:27.120 --> 07:28.500


No password.

07:29.640 --> 07:31.710


ALL. That should do it.
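The sudoers entry being added looks roughly like this (the username is an example; use your own account):

```bash
sudo visudo
# Then append at the bottom:
# student ALL=(elasticsearch) NOPASSWD: ALL
```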

07:31.890 --> 07:32.820


So Control-O.

07:33.330 --> 07:34.110


Control X.

07:35.530 --> 07:39.250


So after we've done that, we can run the following command to launch a new shell as
the Elasticsearch

07:39.250 --> 07:42.430


user: sudo, dash s, dash u, elasticsearch.
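That spoken command most likely corresponds to:

```bash
# Launch a new shell running as the elasticsearch user.
sudo -s -u elasticsearch
```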

07:44.440 --> 07:44.810


Cool.

07:45.220 --> 07:46.360


So now we should have the permissions.

07:46.360 --> 07:48.970


We need to actually look at these logs, so let's try that again.

07:49.780 --> 07:50.740


cd /var

07:50.770 --> 07:52.000


/log/elasticsearch.

07:53.080 --> 07:53.750


That's better.

07:54.250 --> 07:59.080


And now we can do things like tail dash n 100 to look at the last 100 lines in
this log file.

07:59.560 --> 08:03.790


And our log file is actually elasticsearch.log, because we haven't changed the
cluster name.

08:04.420 --> 08:05.110


And there you have it.
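For example:

```bash
# Show the last 100 lines of the cluster log (default cluster name is "elasticsearch").
tail -n 100 /var/log/elasticsearch/elasticsearch.log
```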

08:05.770 --> 08:08.260


Or sometimes you just want to look for error messages, right?

08:08.260 --> 08:15.550


So for example, we could look at the last 500 log lines and pipe that into grep for
error and that

08:15.550 --> 08:16.670


would just show us any errors.
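Something along these lines:

```bash
# Filter the last 500 log lines down to error messages.
tail -n 500 /var/log/elasticsearch/elasticsearch.log | grep -i error
```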

08:17.140 --> 08:20.170


Fortunately, we don't have any because our cluster is healthy, so that's cool.

08:20.680 --> 08:25.690


And sometimes it can also be useful to grab a few surrounding log lines with the
context parameter because

08:25.690 --> 08:28.990


the messages and stack traces can be multi-line sometimes.

08:28.990 --> 08:34.870


So we could say, for example, cat elasticsearch.log, pipe to grep bootstrap.

08:36.260 --> 08:36.560


Dash.

08:36.560 --> 08:40.610


Dash context equals three to get the three surrounding lines for each hit there.

08:41.390 --> 08:45.530


So for example, here we have a bootstrap hit and the three lines before and after
it as well.
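That command is roughly:

```bash
# Print each "bootstrap" hit with three lines of context before and after it.
cat /var/log/elasticsearch/elasticsearch.log | grep --context=3 bootstrap
```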

08:46.040 --> 08:48.350


So those are some useful tricks for looking at the logs.

08:48.710 --> 08:49.220


All right.

08:49.220 --> 08:51.440


So let's start talking about bootstrap checks.

08:51.440 --> 08:53.150


We'll go back to our primary terminal here.

08:53.900 --> 08:58.580


Bootstrap checks are pre-flight validations performed during a node start, which
ensure that your node

08:58.580 --> 09:00.320


can reasonably perform its functions.

09:00.830 --> 09:03.950


There are two modes which determine the execution of bootstrap checks.

09:04.580 --> 09:10.190


Development mode is when you bind your node only to a loopback address localhost or
with an explicit

09:10.190 --> 09:12.860


discovery type of single dash node.

09:13.400 --> 09:18.050


No bootstrap checks are performed in development mode. Production mode, then, is
when you bind

09:18.050 --> 09:24.440


your node to a non-loopback address like 0.0.0.0, thus making it reachable by other
nodes.

09:24.800 --> 09:26.960


This is the mode where bootstrap checks are executed.

09:27.620 --> 09:31.520


Let's see them in action because when the checks don't pass, it can become tedious
work to find out

09:31.520 --> 09:32.300


what's going on.

09:33.680 --> 09:38.450


So one of the first system settings recommended by elastic is to disable heap
swapping.

09:39.110 --> 09:43.670


This makes sense because Elasticsearch is highly memory intensive and you don't
want your in-memory

09:43.670 --> 09:44.510


data swapped out to disk.

09:45.110 --> 09:46.370


There are two options for this.

09:46.670 --> 09:49.940


One is to remove swap files entirely or minimize swappiness.

09:50.540 --> 09:54.080


This is the preferred option but requires considerable intervention as the root
user.

09:54.650 --> 09:59.150


Or we can add the bootstrap.memory_lock parameter in elasticsearch.yml.

09:59.570 --> 10:01.080


So let's try that second option.

10:01.160 --> 10:09.470


Let's go ahead and open our main configuration file with sudo nano
/etc/elasticsearch/
10:09.650 --> 10:11.930
elasticsearch.yml.

10:13.780 --> 10:16.330


And we'll go ahead and find the bootstrap.memory_lock setting.

10:19.490 --> 10:21.740


And uncomment that so it's set to true.
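The setting being enabled is:

```bash
sudo nano /etc/elasticsearch/elasticsearch.yml
# Uncomment so it reads:
# bootstrap.memory_lock: true
```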

10:23.840 --> 10:26.960


Write that out and quit and let's go ahead and restart our service.

10:27.200 --> 10:30.830


So sudo systemctl stop elasticsearch.service.

10:32.880 --> 10:33.930


And let's restart it.

10:37.760 --> 10:42.200


And after a short wait, we should see some indication of what's happening.

10:44.650 --> 10:45.070


All right.

10:45.070 --> 10:47.800


So, yeah, we actually got an error as a result of doing that.

10:47.800 --> 10:50.110


So let's check our logs and find out what happened.

10:50.650 --> 10:54.010


So let's go spelunking through here and see what went wrong.

10:54.580 --> 10:57.970


Just got to hit the up arrow here to do a fresh tail of my log.

11:00.280 --> 11:01.360


And there we have it.

11:01.360 --> 11:06.100


So there's our error and it says bootstrap checks failed memory locking requested
for Elasticsearch

11:06.100 --> 11:06.520


process.

11:06.520 --> 11:07.930


But memory is not locked.

11:08.560 --> 11:10.160


But didn't we just lock it before?

11:10.840 --> 11:11.650


Well, not really.

11:11.650 --> 11:15.370


We just requested the lock, but it didn't actually get locked, so we hit the memory
lock
11:15.370 --> 11:16.360
bootstrap check here.

11:17.140 --> 11:21.490


Now, the easy way to fix this in our case is to allow locking in an override to
our systemd unit

11:21.490 --> 11:22.480


file like this.

11:22.810 --> 11:24.580


So let's go back to our other window here.

11:25.270 --> 11:27.430


Sudo systemctl.

11:28.030 --> 11:30.670


Edit elasticsearch dot service.

11:32.980 --> 11:36.880


And we're going to put in the following config parameter here, under Service:

11:39.640 --> 11:43.120


LimitMEMLOCK equals infinity.
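The override being created is roughly:

```bash
# Allow the elasticsearch process to lock memory via a systemd override.
sudo systemctl edit elasticsearch.service
# Then add:
# [Service]
# LimitMEMLOCK=infinity
```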

11:47.640 --> 11:48.120


All right.

11:48.270 --> 11:49.980


And let's try spinning that up again.

11:57.460 --> 11:59.170


And this time it should be okay.

12:03.170 --> 12:03.500


All right.

12:03.500 --> 12:04.400


Looks like success.

12:05.930 --> 12:06.380


Okay.

12:06.650 --> 12:08.390


So let's talk about heap settings next.

12:08.780 --> 12:13.310


Now, if you start playing with the JVM settings in the jvm.options file, which
you will likely

12:13.310 --> 12:17.450


need to do, because by default these settings are set too low for actual production
usage.

12:17.960 --> 12:20.480


You may face a similar problem as we just did.

12:21.320 --> 12:21.980


So how is that?
12:22.310 --> 12:27.170
Well, by setting the initial heap size lower than the max size, which is actually
quite usual in the

12:27.170 --> 12:27.890


world of Java.

12:28.490 --> 12:32.240


Let's open up that option file and lower the initial heap size to see what's going
to happen.

12:32.780 --> 12:38.540


So sudo nano /etc/elasticsearch/jvm.options.

12:38.840 --> 12:40.430


jvm.options.

12:43.740 --> 12:46.980


And let's go ahead and change these memory settings here.

12:49.180 --> 12:50.500


I'm going to comment out the original ones,

12:50.500 --> 12:59.380


so I can go back to them later, and we'll set some new ones: Xms 500 megabytes
and Xmx one

12:59.380 --> 12:59.890


gigabyte.
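The deliberately mismatched heap settings look like this:

```bash
sudo nano /etc/elasticsearch/jvm.options
# Comment out the originals and add (mismatched on purpose to trigger the check):
# -Xms500m
# -Xmx1g
```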

13:01.540 --> 13:01.810


All right.

13:01.810 --> 13:03.340


So we've lowered the initial heap size.

13:03.700 --> 13:05.110


Let's go ahead and save this setting.

13:07.470 --> 13:09.150


And we'll restart our service again.

13:11.920 --> 13:12.420


Stop it.

13:13.790 --> 13:14.630


And I'll start it.

13:17.170 --> 13:18.970


And we'll see what happens as it spins up.

13:23.220 --> 13:24.360


Well, looks like we had an error.

13:24.370 --> 13:26.800


So let's go back to our logs and see what's going on.

13:26.850 --> 13:30.750


So back to the other window, and I'll hit the up arrow just to tail the last hundred
lines again.

13:32.440 --> 13:32.920


All right.

13:32.930 --> 13:33.760


Well, there we have it.

13:34.060 --> 13:34.960


Error: bootstrap

13:34.960 --> 13:35.560


node validation

13:35.560 --> 13:37.060


exception, bootstrap checks failed.

13:37.060 --> 13:39.820


Initial heap size not equal to maximum heap size.

13:40.330 --> 13:42.910


So that's telling us pretty explicitly what the problem was there.

13:43.690 --> 13:47.800


Now, generally speaking, this problem is also related to memory locking, where the
need to increase

13:47.800 --> 13:51.400


the heap size during program operations may have undesired consequences.

13:52.120 --> 13:56.770


So remember to set those numbers to equal values and for the actual values, follow
the recommendations

13:56.770 --> 14:01.960


by elastic, which in short is lower than 32 gigabytes and up to half of the
available RAM memory.

14:02.080 --> 14:04.150


Let's go ahead and change those back before we forget.

14:08.670 --> 14:09.180


Yeah.

14:09.510 --> 14:15.270


So we'll set that back to one gig for both and I'll just use Control K to get rid
of those lines and

14:15.270 --> 14:18.600


Control-O to save and Control-X, so we should be back to the original options.

14:19.110 --> 14:20.970


Let's try starting it up again, just to be sure.

14:25.460 --> 14:25.790


All right.

14:25.790 --> 14:27.110


That time it started successfully.

14:28.190 --> 14:31.370


So let's talk about some other system checks you may want to perform when things go
wrong.

14:31.610 --> 14:36.290


There are many other bootstrap checks on the runtime platform and its settings,
including a file descriptors

14:36.290 --> 14:41.750


check, a maximum number of threads check, a maximum size virtual memory check, and
many others.

14:42.410 --> 14:46.280


You should definitely browse through their descriptions in the docs, but as we're
running the official

14:46.280 --> 14:51.170


Debian distribution that comes with a predefined systemd unit file, most of these
issues are resolved

14:51.170 --> 14:52.850


for us in the unit file, among others.

14:53.270 --> 14:56.630


We can check that unit file to see the individual parameters that get configured.

14:56.900 --> 14:58.760


Let's take a look at that unit file to see what's in it.

14:59.040 --> 15:02.630


We can say sudo cat /usr/lib/systemd/

15:02.630 --> 15:05.990


system/elasticsearch.service.
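That is:

```bash
# Review the packaged systemd unit file and its preconfigured limits.
sudo cat /usr/lib/systemd/system/elasticsearch.service
```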

15:07.900 --> 15:08.200


All right.

15:09.010 --> 15:12.250


So just take a look at the different things that you have at your disposal here.

15:12.400 --> 15:16.390


All sorts of things that could go wrong, but by default, they should be okay in our
installation.

15:18.160 --> 15:22.480


So just remember that if you run the Elasticsearch binary on your own, you will
need to take care of

15:22.480 --> 15:23.410


these settings as well.

15:24.730 --> 15:28.270


Now, the last check we'll run is the one that will carry us nicely to the next
section of the lesson

15:28.270 --> 15:29.270


dealing with clustering.

15:29.290 --> 15:33.790


But before we dive in, let's see what configuration parameters Elasticsearch checks
during

15:33.790 --> 15:36.550


its startup with the discovery configuration check.

15:37.270 --> 15:41.740


There are three key parameters which govern the cluster formation and discovery
process.

15:41.980 --> 15:43.870


Let's pull up our YAML file to take a look.

15:44.710 --> 15:51.080


Sudo nano /etc/elasticsearch/elasticsearch.yml.

15:52.690 --> 15:52.960


All right.

15:52.960 --> 15:57.550


So one is discovery.seed_hosts; it should be down here.

16:00.680 --> 16:00.930


Yep.

16:01.610 --> 16:06.200


Now, this is a list of ideally all the master eligible nodes in the cluster that we
want to join and

16:06.200 --> 16:07.820


draw the latest cluster state from.

16:08.330 --> 16:12.850


Now there's also a discovery.seed_providers setting that you could set here as well,

16:12.860 --> 16:16.760


and that would allow you to provide the seed hosts lists in the form of a file that
gets reloaded on

16:16.760 --> 16:20.840


any change instead of specifying it within the configuration file itself.

16:21.500 --> 16:24.530


Also, let's look at the cluster.initial_master_nodes setting here.

16:25.070 --> 16:29.540


This is a list of the node names, not hostnames for the very first master
elections.

16:30.230 --> 16:34.190


Until all of these join and vote, the cluster setup won't be completed.
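For reference, the cluster formation settings in elasticsearch.yml look roughly like this (the values here are purely illustrative):

```bash
sudo nano /etc/elasticsearch/elasticsearch.yml
# discovery.seed_hosts: ["host1:9300", "host2:9300"]
# discovery.seed_providers: file
# cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
```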

16:35.330 --> 16:39.470


But what if you don't want to form any cluster, but rather just want to run in a
small single node
16:39.470 --> 16:39.830
setup?

16:40.070 --> 16:43.130


Well, you might think you could just eliminate these settings in the YAML file.

16:44.150 --> 16:44.470


Right.

16:45.200 --> 16:46.040


But no, that won't work.

16:46.130 --> 16:50.180


After starting up, you would hit another bootstrap error, since at least one of
those parameters needs

16:50.180 --> 16:52.490


to be set to pass a bootstrap check.

16:52.970 --> 16:56.390


So we're going to go ahead and put those back because you can't actually get away
with that.

16:56.930 --> 17:00.680


So let's see why this is and dive deeper into troubleshooting the discovery
process.

17:01.310 --> 17:02.540


First, I'll exit out of here.

17:04.520 --> 17:06.560


And let's shut down our cluster before we forget.

17:12.900 --> 17:14.040


Just stop the service.

17:15.030 --> 17:15.420


All right.

17:16.470 --> 17:20.850


So after we've successfully passed the bootstrap checks and started up our node for
the first time,

17:20.850 --> 17:23.910


the next phase in its lifecycle is the discovery process.

17:24.540 --> 17:28.440


Now, to simulate the formation of a brand new cluster, we're going to need a clean
node.

17:28.710 --> 17:33.420


So we need to remove all the data of the node and thus lose all previous cluster
state information.

17:33.450 --> 17:35.580


That's why we backed everything up to a snapshot earlier.

17:36.120 --> 17:39.390


Now, remember, this is really just an experiment; in a real production setup,

17:39.690 --> 17:42.090


there would be very few reasons to do this.

17:42.480 --> 17:45.870


I'm going to go to this other window here where I'm logged in as the elasticsearch
user, because I'm

17:46.290 --> 17:48.090


going to need its permissions to do this stuff.

17:48.750 --> 17:52.500


rm -rf /var/lib/elasticsearch

17:53.980 --> 17:54.750


slash star.
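Run from the shell we opened as the elasticsearch user, that is (lab environment only; this destroys the node's data):

```bash
rm -rf /var/lib/elasticsearch/*
```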

17:56.100 --> 17:56.430


All right.

17:56.430 --> 17:58.320


We blew away our entire node there.

17:58.950 --> 18:04.020


So now let's imagine a situation where we already had a cluster and we just want
the node to join in.

18:04.650 --> 18:10.110


So we need to make sure the cluster name is correct and linked to some seed host
either by IP or hostname

18:10.110 --> 18:10.530


and port.

18:11.400 --> 18:14.160


So let's go ahead and open up our YAML file.

18:15.330 --> 18:17.520


We'll use vim because that's what's installed under this account.

18:18.030 --> 18:22.380


That's /etc/elasticsearch/elasticsearch.yml.

18:24.600 --> 18:26.910


All right, so we need to make sure that we have a cluster name.

18:28.860 --> 18:29.070


Hit

18:29.070 --> 18:31.140


I to go into insert mode, and now I can edit it.

18:32.740 --> 18:36.370


We'll change my-application to lecture-cluster.

18:36.730 --> 18:37.720


It would help if I typed it right.
18:40.190 --> 18:42.380
And we need to set our discovery.seed_hosts.

18:45.460 --> 18:46.010


Do.

18:52.170 --> 19:00.240


There they are, and we'll change that to 127.0.0.1:9301. Now this is just a
demonstration, so we're

19:00.240 --> 19:01.320


using a loopback address.

19:01.350 --> 19:06.420


Normally you put a hostname or an IP here and the actual transport port of one or
more of your nodes

19:06.420 --> 19:07.080


in the cluster.

19:08.820 --> 19:12.360


And just to force the failure that we're interested in, I'm going to comment out
this line for the

19:12.360 --> 19:15.600


initial master nodes, and that way it's not going to be able to reach the master.

19:15.630 --> 19:17.310


We'll see what happens when we hit that failure.

19:18.240 --> 19:21.300


Let's go ahead and hit escape colon WQ.

19:21.630 --> 19:23.280


Exclamation point to write and quit.

19:24.090 --> 19:25.590


And now let's start up our service.

19:35.260 --> 19:35.560


All right.

19:35.560 --> 19:36.870


It looks like it started successfully.

19:36.880 --> 19:39.850


Let's check our root endpoint to see if it really is running.

19:39.850 --> 19:40.300


Okay.

19:40.450 --> 19:43.120


Curl localhost:9200.
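That is:

```bash
# Hit the root endpoint; the cluster UUID shows as "_na_" until a cluster has formed.
curl localhost:9200
```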

19:46.550 --> 19:47.330


All right.

19:47.600 --> 19:54.110


So we did get a nice response with various details here, but something is missing:
the cluster UUID.

19:55.010 --> 19:57.200


This means that our cluster is not actually formed.

19:57.360 --> 20:02.090


And we can confirm this by checking the cluster state with the cluster health API.

20:02.670 --> 20:03.590


Let's say curl.

20:04.340 --> 20:07.070


localhost:9200 slash underscore.

20:07.070 --> 20:08.330


Cluster slash health.
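That is:

```bash
# This will hang for about 30 seconds and then fail with master_not_discovered_exception.
curl localhost:9200/_cluster/health
```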

20:12.240 --> 20:15.090


And after about 30 seconds of waiting, we'll get an exception.

20:19.230 --> 20:20.340


Indeed we did: master

20:20.340 --> 20:21.450


not discovered exception.

20:21.870 --> 20:26.760


All right, let's tail our logs and see that the node didn't discover any master and
will continue the

20:26.760 --> 20:27.740


discovery process.

20:27.750 --> 20:30.780


So let's check our logs and see what happened.

20:31.560 --> 20:35.850


Let's look at the past 500 lines here, and now that's lecture cluster

20:35.850 --> 20:37.920


dot log this time, because we changed the cluster name.

20:40.220 --> 20:42.620


That's the relevant message here, master not discovered.

20:42.770 --> 20:46.880


This node has not previously joined a bootstrapped cluster, and cluster.initial_master
nodes is empty on

20:46.880 --> 20:49.280


this node so it's going to continue.

20:49.280 --> 20:53.030


discovery on 127.0.0.1:9301 from the hosts providers.

20:53.660 --> 20:58.970


But yeah, that's basically telling us that we had a problem actually electing a
master because we didn't
20:58.970 --> 21:01.220
list any master nodes and it couldn't find any. Makes sense.

21:01.220 --> 21:01.460


Right?

21:02.750 --> 21:07.070


So these issues are going to be very similar when forming a new cluster and we can
simulate that in

21:07.070 --> 21:09.560


our environment with the cluster.initial_master_nodes setting.

21:09.830 --> 21:12.290


So again, let's make sure there's no previous data on our node.

21:12.680 --> 21:13.970


We'll go ahead and blow that away.

21:14.360 --> 21:15.980


Let's stop the service before we forget, huh?

21:16.610 --> 21:18.590


So back to this other side here.

21:20.700 --> 21:21.540


Stop service.

21:22.560 --> 21:23.670


Now we're going to blow away.

21:24.150 --> 21:25.530


/var/lib/elasticsearch again.

21:26.800 --> 21:27.370


Like so.

21:29.690 --> 21:30.080


All right.

21:30.080 --> 21:32.720


And now we can edit our YAML file again.

21:37.490 --> 21:41.240


And now we're going to go back to make sure our cluster name is still a lecture
cluster.

21:41.240 --> 21:43.010


And now we're going to set our initial master nodes.

21:45.560 --> 21:47.660


So it was complaining before that we had an empty list there.

21:47.660 --> 21:54.800


So let's give it a list. I'll go to insert mode, and now we can edit this line,
uncomment it, and

21:54.800 --> 21:59.120


we'll set it to the list of node-1, node-2, and node-3.
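So the relevant lines in elasticsearch.yml now look roughly like this (node-2 and node-3 are deliberately fictitious):

```bash
# cluster.name: lecture-cluster
# discovery.seed_hosts: ["127.0.0.1:9301"]
# cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
```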
22:04.580 --> 22:06.080
So let's go ahead and hit escape.

22:06.320 --> 22:11.720


Colon WQ exclamation point to write and quit, and we'll restart the node again.

22:13.340 --> 22:14.240


Start the service.

22:16.120 --> 22:17.460


And see what happens this time.

22:20.290 --> 22:20.610


All right.

22:20.620 --> 22:22.030


Looks like it went okay.

22:22.030 --> 22:23.800


But again, let's check and make sure.

22:23.830 --> 22:24.490


Let's hit the root

22:24.490 --> 22:24.940


endpoint.

22:27.750 --> 22:29.880


Still we have no cluster UUID.

22:29.910 --> 22:32.430


So we didn't actually form a cluster; that failed.

22:32.460 --> 22:33.750


And if we do a health check again.

22:37.560 --> 22:39.330


We'll have to wait 30 seconds for that to time out.

22:41.460 --> 22:41.820


All right.

22:41.820 --> 22:42.630


Same deal: master

22:42.630 --> 22:43.320


not discovered.

22:43.380 --> 22:45.690


Let's check the logs again to see what happened this time.

22:46.320 --> 22:48.540


So we'll just tail those last 500 lines again.

22:49.640 --> 22:54.200


And we're going to look for something about discovering master eligible nodes.
22:58.080 --> 22:59.490
Probably should have grepped for WARN, huh?

23:05.380 --> 23:06.160


This looks interesting.

23:07.480 --> 23:07.900


All right.

23:07.900 --> 23:09.640


Master not discovered yet.

23:10.660 --> 23:12.370


This node must discover master-eligible nodes

23:12.370 --> 23:14.660


node-1, node-2, and node-3, to bootstrap a cluster.

23:14.680 --> 23:15.850


We only discovered node one.

23:16.630 --> 23:19.270


So, yeah, you can't just specify nodes that don't exist there.

23:20.260 --> 23:24.310


All right, so we have performed some experiments here, so you'll need to use your
imagination to complete

23:24.310 --> 23:24.880


the picture.

23:25.000 --> 23:29.080


Now, in a real production scenario, there are many reasons why this problem often
appears.

23:29.650 --> 23:34.030


Since we're dealing with a distributed system, many external factors such as
network communication

23:34.030 --> 23:36.940


come into play and may cause the nodes to be unable to reach each other.

23:37.000 --> 23:40.640


So the problem might not just be that I listed a bunch of fictitious hosts there.

23:40.780 --> 23:44.860


It might be that those are valid hosts, but they can't be reached for some reason.
To resolve these

23:44.860 --> 23:45.280


issues,

23:45.370 --> 23:46.870


you need to triple-check all your settings.

23:47.230 --> 23:49.420


So again, let's go back into vim.

23:50.440 --> 23:54.190


We need to make sure of the cluster name, so all the nodes are joining or forming the
right cluster.

23:54.910 --> 24:00.520


The node names: a mistype in the node names can cause invalidity for the master
elections. And the seed

24:00.520 --> 24:04.120


hostnames, IPs, and ports down here somewhere.

24:05.970 --> 24:10.570


Got to make sure those all have valid seed hosts linked and that the ports are
actually the configured

24:10.570 --> 24:10.960


ones.

24:11.710 --> 24:14.680


We need to check connectivity between the nodes and the firewall settings.

24:14.800 --> 24:19.600


So use telnet or similar tools to inspect your network and make sure it's open for
communication between

24:19.600 --> 24:22.390


the nodes, the transport layer and the ports especially.

24:23.170 --> 24:24.250


Also check SSL

24:24.250 --> 24:29.080


and TLS. Communication encryption is a vast topic and we're not going to touch that
here, but it's a

24:29.080 --> 24:33.790


usual source of troubles: invalid certificates and untrusted certificate
authorities and things like

24:33.790 --> 24:34.090


that.

24:34.840 --> 24:38.320


Also be aware that there are special requirements on the certs when encrypting node-

24:38.320 --> 24:39.370


to-node communication.

24:40.900 --> 24:44.800


All right, the last thing we're going to explore is the relationship between the
shard allocation and

24:44.800 --> 24:47.440


cluster state as these two things are tightly related.

24:48.010 --> 24:52.240


But first, we need to change the elasticsearch.yml configuration to let our
node successfully

24:52.240 --> 24:53.470


form a single node cluster.

24:53.950 --> 24:59.770


So back in our configuration file here, let's just set the initial master as the
node itself and start

24:59.770 --> 25:00.340


the service.

25:01.580 --> 25:03.170


So let's take out node-2 and node-

25:03.170 --> 25:05.300


3, and just hit I to go to insert mode.

25:06.410 --> 25:07.910


Forgot I was in vim there for a second.

25:09.420 --> 25:09.890


Escape.

25:10.140 --> 25:12.060


Colon WQ exclamation point.

25:12.270 --> 25:12.980


We wrote that out.

25:12.990 --> 25:15.600


So now let's restart our service yet again.

25:17.540 --> 25:17.990


Stop it.

25:19.590 --> 25:20.040


Start it.

25:22.680 --> 25:23.190


All right.

25:23.430 --> 25:25.710


And again, we'll query the cluster health API.

25:25.810 --> 25:26.610


Let's see what happened.

25:30.150 --> 25:32.590


So we can see the cluster status is, in fact, green.

25:32.640 --> 25:33.210


That's good.

25:33.930 --> 25:35.520


So what does cluster status mean?

25:35.730 --> 25:39.180


Well, it actually reflects the worst state of any of the indices that we have in
our cluster.

25:39.900 --> 25:41.220


The different options include red.

25:41.730 --> 25:44.850


That means one or more shards of the index is not assigned in the cluster.

25:45.360 --> 25:49.920


This can be caused by various issues at the cluster level, like disjoint nodes or
problems with disks

25:49.920 --> 25:50.670


and things like that.

25:51.450 --> 25:56.460


Generally, the red status marks very serious issues, so be prepared for some
potential data loss.

25:57.150 --> 25:58.200


It could also be yellow.

25:58.230 --> 26:00.750


In that case, the primary data are not yet impacted.

26:01.080 --> 26:04.500


All the primary shards are okay, but some replica shards are not assigned.

26:05.130 --> 26:09.540


Like, for example, replicas won't be allocated on the same node as the primary
shard by design.

26:10.290 --> 26:15.420


This status marks a risk of losing data and green means all shards are well
allocated.

26:15.840 --> 26:20.160


However, it doesn't mean that the data is safely replicated; a single-node
cluster with a

26:20.160 --> 26:22.560


single-shard index would be green as well.

26:23.280 --> 26:26.670


So now let's create an index with one primary shard and one replica.

26:27.510 --> 26:30.210


We'll do that with a curl request.

26:30.450 --> 26:42.000


Put localhost:9200 slash test (we'll call the index test), backslash, dash dash data
raw, curly, with the

26:42.000 --> 26:42.900


following settings.

26:43.620 --> 26:55.590


Curly bracket: the number of shards will be set to one and the number of
replicas will be set to

26:55.590 --> 26:56.220


one as well.
26:56.820 --> 26:57.630
Close everything out.
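Put together, that request is roughly:

```bash
# Create a "test" index with one primary shard and one replica.
curl --request PUT 'localhost:9200/test' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "settings": { "number_of_shards": 1, "number_of_replicas": 1 } }'
```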

26:58.350 --> 27:03.570


All right, so suddenly our cluster will turn yellow because our worst performing
index, the only one

27:03.570 --> 27:05.040


we have, is also yellow.

27:05.310 --> 27:06.900


Let's check our health again.

27:08.370 --> 27:08.630


Yep.

27:08.640 --> 27:09.270


Now we're yellow.

27:10.350 --> 27:15.510


Now you can also check the shards assignment with the Cat Shards API and see what's
going on there.

27:15.540 --> 27:20.550


So let's say curl localhost 9200 slash underscore cat slash shards.

27:20.910 --> 27:23.620


Question mark v. Aha.
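That is:

```bash
# List shards; the replica shows as UNASSIGNED on a single-node cluster.
curl 'localhost:9200/_cat/shards?v'
```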

27:24.660 --> 27:27.180


So we can see that we have unassigned shards here.

27:28.470 --> 27:32.190


Or if you want more descriptive information, you can use the cluster allocation

27:32.190 --> 27:36.750


Explain API, which provides an explanation as to why the individual shards were not
allocated.

27:36.960 --> 27:43.680


To do that we'll say curl localhost:9200, underscore cluster, allocation.

27:44.660 --> 27:46.260


Explain pretty.
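That is:

```bash
# Ask Elasticsearch to explain why a shard is unassigned.
curl 'localhost:9200/_cluster/allocation/explain?pretty'
```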

27:49.390 --> 27:52.810


And that tells you very explicitly what's going on in our case, as I mentioned
before.

27:53.140 --> 27:57.670


The reason is that allocation of the replica to the same node is disallowed, since it

27:57.670 --> 28:01.090


makes no sense from a resiliency perspective; you wouldn't have a replica on the
same node.

28:01.090 --> 28:01.930


That's just silly.

28:02.740 --> 28:03.790


So how would you resolve this?

28:03.820 --> 28:04.930


Well, we have two options.

28:05.350 --> 28:08.620


One would be to remove the replica shard, which is not a real solution.

28:08.620 --> 28:10.870


But if you just need the green status, it will work out.

28:11.440 --> 28:14.530


Or you could add another node on which the shards could be reallocated.

28:14.830 --> 28:16.210


So let's take that second route.

28:18.120 --> 28:23.280


So to simulate the following failures, I actually have two different nodes running
on the same host

28:23.280 --> 28:27.510


here, and setting that up is kind of involved and we're going to do that later in
the course as we

28:27.510 --> 28:28.410


go into failover.

28:28.830 --> 28:32.490


So for now, I just want you to watch and not actually try to follow on yourself.

28:32.670 --> 28:34.350


So I've already done some of the grunt work here.

28:34.650 --> 28:39.750


Basically, you need to set up a separate systemd unit file for the second node
and a separate configuration

28:39.750 --> 28:40.800


and stuff like that.

28:40.800 --> 28:43.480


So just watch from this point on.

28:43.500 --> 28:43.800


Okay.

28:44.640 --> 28:48.570


So anyway, let's start by reviewing the main configuration file of that second node
that I've already

28:48.570 --> 28:52.560


set up, and we'll ensure that it will join the same cluster as our existing node.

28:52.560 --> 28:59.130


So let's say sudo nano /etc/elasticsearch-node-2, which is where I put that.
29:05.690 --> 29:07.460
All right, so we have the same cluster name.

29:08.030 --> 29:10.130


We're calling our node here Node two.

29:10.790 --> 29:14.540


And we can see that our seed hosts is set to a loopback address, hopefully.

29:17.870 --> 29:18.090


Yep.

29:18.470 --> 29:21.950


And we can see that our master nodes consist of node-1 and node-2.

29:23.110 --> 29:29.770


Let's go ahead and exit out of here and start that second node: sudo systemctl
start elasticsearch

29:30.460 --> 29:32.940


dash node-2 dot service.
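Roughly (the unit name assumes the second node's service was installed as elasticsearch-node-2):

```bash
sudo systemctl start elasticsearch-node-2.service
```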

29:35.310 --> 29:35.640


Okay.

29:35.640 --> 29:38.790


So at this point I have started up a second node on the same VM.

29:38.790 --> 29:42.960


Again, there's quite a bit of configuration behind making that happen, so just
watch for this part

29:42.960 --> 29:43.200


of it.

29:43.860 --> 29:47.220


So now that we have a second node spun up, we should be back in a green status.

29:47.220 --> 29:48.120


So let's check.

29:48.240 --> 29:50.010


Let's say curl dash dash silent.

29:50.970 --> 29:51.780


Local host.

29:53.260 --> 29:56.080


9200 slash underscore cluster slash.

29:56.080 --> 29:56.500


Health.

29:57.190 --> 29:57.580


Pretty.

29:59.190 --> 30:00.780


And we'll just grep for the status line.
30:02.380 --> 30:03.610
And our status is green.
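That check is:

```bash
# Confirm the second node brought the cluster back to green.
curl --silent 'localhost:9200/_cluster/health?pretty' | grep status
```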

30:03.640 --> 30:04.000


Great.

30:04.840 --> 30:08.740


Okay, so we've resolved our issue and the replica shards were automatically
reallocated.

30:08.740 --> 30:09.160


Perfect.

30:10.120 --> 30:11.590


So let's continue with this example.

30:11.590 --> 30:16.660


To simulate the red cluster state, let's start by removing the index and creating
it again, but this

30:16.660 --> 30:19.390


time with only two primary shards and no replica.

30:19.390 --> 30:21.610


And we'll quickly see why this is a bad idea.

30:21.970 --> 30:31.200


So first of all, delete the one that we have with curl dash dash request DELETE
localhost:9200 slash

30:31.210 --> 30:40.750


test, and I will recreate it with a curl request: PUT localhost:9200 slash test,
backslash.

30:42.720 --> 30:52.380


And dash dash data dash raw, quote, curly, settings: we'll set the number of shards

30:54.510 --> 31:03.660


to one... to two, rather, because we have two nodes to work with, and the number
of replicas to

31:03.660 --> 31:04.140


zero.
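Together, that's roughly:

```bash
# Delete the test index, then recreate it with two primaries and no replicas (risky on purpose).
curl --request DELETE 'localhost:9200/test'
curl --request PUT 'localhost:9200/test' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "settings": { "number_of_shards": 2, "number_of_replicas": 0 } }'
```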

31:05.160 --> 31:06.750


So this seems like a pretty bad idea.

31:06.780 --> 31:09.900


You know, we have our shard split across two nodes, but no backups anywhere.

31:11.720 --> 31:13.100


All right, but so far, so good.

31:13.130 --> 31:14.840


You know, it's at least storing it.

31:15.170 --> 31:17.750


Let's check the shard assignment to see what's actually going on here.
31:17.960 --> 31:24.980
Curl localhost:9200, underscore cat slash shards, verbose.

31:26.420 --> 31:26.750


Okay.

31:27.110 --> 31:31.280


So we can see that each primary shard is on a different node, which follows the
standard allocation

31:31.280 --> 31:33.860


rules set at the cluster level and at the index level.

31:34.340 --> 31:35.930


And you likely know where we're heading.

31:36.650 --> 31:41.120


So imagine the situation where some network issue emerges and your cluster splits
up, resulting in

31:41.120 --> 31:46.070


disabled node communication, or even worse, some disk malfunctions leading to the
improper functioning

31:46.070 --> 31:46.550


of a node.

31:47.240 --> 31:50.090


Now, the easiest way to simulate this is to just stop one of our nodes.

31:50.360 --> 31:51.680


So let's go ahead and kill node

31:51.680 --> 31:56.480


2 with a sudo /bin/systemctl

31:57.590 --> 31:59.750


stop elasticsearch dash node

31:59.750 --> 32:00.980


2 dot service.
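That is (unit name as assumed above):

```bash
# Stop the second node to simulate a failure.
sudo /bin/systemctl stop elasticsearch-node-2.service
```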

32:03.020 --> 32:04.010


And down it goes.

32:04.430 --> 32:07.230


So now if we check our status again...

32:10.570 --> 32:11.860


We are now in red status.

32:12.700 --> 32:13.390


That's a bad thing.

32:13.840 --> 32:17.110


So now let's check the explain API to learn more about what's going on.

32:17.230 --> 32:19.270


Curl local host.

32:19.990 --> 32:24.040


9200 slash underscore cluster slash allocation.

32:25.640 --> 32:27.080


Explain pretty.

32:29.940 --> 32:30.500


All right.

32:30.510 --> 32:35.220


So we cannot allocate it because a previous copy of the primary shard existed but
can no longer be found

32:35.220 --> 32:36.210


on the nodes in the cluster.

32:36.390 --> 32:37.630


Well, that tells you what's going on.

32:37.650 --> 32:38.610


It's pretty well described.

32:39.030 --> 32:40.680


A node left as we have turned it off.

32:41.040 --> 32:46.320


But in the real world that has various potential causes and no valid shard copy can
be found in the

32:46.320 --> 32:48.780


cluster, in which case we're missing data.

32:49.440 --> 32:54.060


Unfortunately, there's no easy solution to this scenario, as we do not have any
replicas and there's

32:54.060 --> 32:55.500


no way we could remake our data.

32:56.520 --> 33:00.450


So firstly, if you are dealing with some network problems, try to thoroughly
inspect what could go

33:00.450 --> 33:06.210


wrong like a misconfiguration of firewalls and inspect it as a priority, since data
cannot consistently

33:06.210 --> 33:07.380


be indexed in this state.

33:08.280 --> 33:12.690


Now, depending on the document routing, many indexing requests can be pointed
toward the missing shard

33:12.690 --> 33:13.770


and end up timing out.
33:14.460 --> 33:17.050
For example, let's try to insert a document and see what happens.

33:17.070 --> 33:18.200


Curl request.

33:19.560 --> 33:29.190


Post localhost:9200 slash test slash underscore doc, data raw, and we'll just say a
message.

33:31.360 --> 33:31.930


It's data.
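The indexing attempt is roughly (the document body is just an example):

```bash
# This request will hang and eventually time out because its target shard is missing.
curl --request POST 'localhost:9200/test/_doc' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "message": "its data" }'
```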

33:35.720 --> 33:37.310


And this should lead to an exception.

33:39.600 --> 33:42.780


And after about 30 seconds or so, it finally timed out on me.

33:43.560 --> 33:48.480


Now, secondly, if no possible solution was found, the only option left to get the
index to work properly

33:48.480 --> 33:49.920


may be to allocate a new shard.

33:50.400 --> 33:54.990


But be aware that even if the lost node comes back afterwards, the new shard
will just overwrite

33:54.990 --> 33:56.700


it because it is in a newer state.

33:57.480 --> 34:00.720


Now we can allocate a new shard with the cluster reroute API.

34:00.840 --> 34:05.700


So here we will allocate one for the test index on node-1, which operates
correctly.

34:06.210 --> 34:08.580


Note that we have to explicitly accept data loss.

34:08.700 --> 34:16.770


So curl request post local host 9200 slash underscore cluster slash reroute.

34:18.210 --> 34:19.620


And we want pretty results.

34:20.770 --> 34:22.990


Backslash, dash dash data raw.

34:24.330 --> 34:26.580


Quote, curly, commands.

34:27.850 --> 34:29.170


Actually, there's going to be a square bracket.

34:29.170 --> 34:29.830


We have a list of them.

34:31.610 --> 34:32.840


And curly brackets.

34:35.370 --> 34:38.400


Allocate empty primary.

34:41.990 --> 34:43.040


Index test.

34:44.650 --> 34:45.190


Shard.

34:45.820 --> 34:52.990


This one; node is node-1, and accept data loss

34:54.950 --> 34:55.610


will be true.

34:58.230 --> 34:59.340


Close everything out.

35:01.240 --> 35:01.930


I think that's right.
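Assembled, the reroute command looks roughly like this (the shard number is whichever one the explain API reported as unassigned):

```bash
# Force-allocate an empty primary on the surviving node; accept_data_loss acknowledges
# that the shard's previous contents are gone for good.
curl --request POST 'localhost:9200/_cluster/reroute?pretty' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "commands": [
      { "allocate_empty_primary": {
          "index": "test", "shard": 1, "node": "node-1", "accept_data_loss": true } }
    ]
  }'
```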

35:05.310 --> 35:05.730


All right.

35:05.940 --> 35:08.880


And afterwards, we should no longer experience timeouts during indexing.

35:10.170 --> 35:11.130


All right, so we're done.

35:11.130 --> 35:15.570


But we just need to restore everything from our backup now, because we did do some
pretty invasive

35:15.570 --> 35:16.570


stuff to our index here.

35:16.590 --> 35:20.580


So we're back at the point where you should be following along if you were
following along earlier.

35:20.610 --> 35:20.880


Okay.

35:20.880 --> 35:24.780


We need to restore from that back up and make sure we're not left with any
lingering issues that we

35:24.780 --> 35:25.590


might have introduced.

35:25.800 --> 35:29.520


So we're going to restore all of our original indices that we backed up earlier.
35:29.940 --> 35:31.860
Before we can do that, we need to do some cleaning up.

35:32.490 --> 35:36.390


So first, we need to make sure that the repository path is registered again in the
elasticsearch.yml,

35:36.990 --> 35:39.120


as we've made some changes to it during the exercise.

35:39.750 --> 35:43.380


So let's go ahead and reference our stored config file that we squirreled away at
the start of the lesson,

35:44.040 --> 35:45.090


and we'll put that back.

35:47.010 --> 35:48.960


So let's see, go back to our home directory.

35:48.960 --> 35:49.860


I think that's where we put it.

35:50.760 --> 35:50.970


Yep.

35:50.970 --> 35:52.320


There's elasticsearch.yml.

35:52.320 --> 35:54.180


So let's go ahead and move that back into position.

35:54.270 --> 35:55.530


Sudo mv elasticsearch

35:55.530 --> 35:58.590


dot yml to /etc/elasticsearch.
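That is:

```bash
# Put the saved config back in place.
sudo mv ~/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml
```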

36:01.080 --> 36:03.480


All right, let's double check that it's there and looks correct.

36:07.440 --> 36:09.860


Well, I'm the wrong user; sudo,

36:09.900 --> 36:13.320


let's see, and we'll just go ahead and edit it directly at /etc/

36:13.830 --> 36:14.700


Elasticsearch.

36:15.180 --> 36:18.210


elasticsearch dot

36:18.240 --> 36:18.510


yml.

36:19.800 --> 36:20.790


Make sure things look normal.
36:21.480 --> 36:21.690
All right.

36:21.690 --> 36:22.920


Things are back to how we started.

36:23.190 --> 36:24.300


We have node one.

36:25.400 --> 36:28.370


We still have the path to repo set to home student backups.

36:28.370 --> 36:32.930


That's very important so we can restore that backup. Memory lock is commented out
again.

36:33.860 --> 36:36.800


Everything looks like it's back to default settings, so that's good.

36:37.730 --> 36:38.020


All right.

36:38.030 --> 36:38.420


Looks good.

36:38.420 --> 36:38.870


Looks good.

36:39.260 --> 36:43.430


Now, we do need to make sure that Elasticsearch has permission to read that
configuration file we just

36:43.430 --> 36:44.150


restored first.

36:44.150 --> 36:51.590


So let's go to the /etc folder and do a sudo ls, that's

36:51.610 --> 36:53.960


dash la, elasticsearch.

36:55.640 --> 36:56.310


See what we have.

36:56.820 --> 36:57.070


Yeah.

36:57.090 --> 36:58.980


So we can see that it's owned by the root group.

36:59.010 --> 36:59.820


We need to change that.

36:59.970 --> 37:01.910


So, sudo chgrp.

37:02.250 --> 37:03.150


Elasticsearch.
37:04.650 --> 37:05.780
Elasticsearch.

37:09.670 --> 37:10.810


Elasticsearch.

37:12.400 --> 37:13.210


Dot yml.

37:15.130 --> 37:15.970


Check that again.

37:16.480 --> 37:17.290


All right, that looks better.

37:17.290 --> 37:19.510


So now we should be able to restart our main node.

37:26.660 --> 37:27.200


Like so.

37:31.050 --> 37:31.320


Right.

37:32.250 --> 37:36.390


So now we can reregister our repository again to make sure it's ready to provide
the backup data.

37:36.780 --> 37:47.130


Curl request PUT localhost:9200 slash underscore snapshot slash backup
dash repo, backslash

37:47.790 --> 37:48.420


data raw.

37:51.080 --> 37:52.640


Type is filesystem.

37:54.150 --> 37:54.870


Settings.

37:58.170 --> 37:58.830


Location.

38:00.120 --> 38:02.640


/home/student/backups.

38:03.060 --> 38:03.780


Backup dash.

38:03.780 --> 38:04.170


Repo.

38:07.900 --> 38:08.350


All right.

38:09.010 --> 38:13.390


And we can check the available snapshots in the repository with a simple cat
request to our backup
38:13.390 --> 38:13.780
repo.

38:13.780 --> 38:16.630


And we should see our snapshot one waiting to be restored.

38:17.260 --> 38:18.310


Curl Local Host.

38:18.340 --> 38:21.370


Colon 9200, underscore cat, snapshots.

38:22.600 --> 38:23.320


Slash backup.

38:23.320 --> 38:23.920


Dash Repo.

38:26.060 --> 38:26.360


All right.

38:27.290 --> 38:28.190


It's a success.

38:28.580 --> 38:32.600


Now, to prevent any writes during the restore process, we need to make sure that
all of our indices

38:32.600 --> 38:33.170


are closed.

38:33.380 --> 38:38.300


However, as of Elasticsearch 8, they've made it not that easy.

38:38.330 --> 38:43.040


You actually need to disable a safety feature that prevents you from closing all of
your indices at

38:43.040 --> 38:43.730


once.

38:44.240 --> 38:48.050


So let's fire up our editor and edit our elasticsearch.yml file.

38:48.440 --> 38:54.380


And we're looking for the setting action.destructive_requires_name; uncomment

38:54.380 --> 38:56.420


that and set it to false.
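The setting is:

```bash
sudo nano /etc/elasticsearch/elasticsearch.yml
# Uncomment and set:
# action.destructive_requires_name: false
```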

39:04.890 --> 39:10.320


And after that, we're going to stop and restart the service again to pick that
change up.

39:25.200 --> 39:34.050


Curl request POST localhost:9200 slash underscore all slash underscore close.
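That is:

```bash
# Close all indices so nothing writes to them during the restore.
curl --request POST 'localhost:9200/_all/_close'
```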
39:36.330 --> 39:39.240
And finally we can restore our backup with a curl request.

39:40.050 --> 39:43.710


Post localhost:9200, underscore snapshot.

39:45.240 --> 39:45.780


Backup.

39:46.290 --> 39:46.620


Dash.

39:46.620 --> 39:48.930


Repo slash snapshot.

39:48.960 --> 39:49.650


Dash one.

39:51.860 --> 39:53.390


Slash underscore restore.
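Put together:

```bash
# Restore everything from the snapshot taken at the start of the lesson.
curl --request POST 'localhost:9200/_snapshot/backup-repo/snapshot-1/_restore'
```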

39:56.540 --> 39:58.430


And it took. So after a few seconds,

39:58.430 --> 40:01.490


if we check our indices, we should see all the original data back in place.

40:02.210 --> 40:05.930


Curl localhost:9200 slash underscore cat slash indices.

40:08.510 --> 40:10.880


And there's our original Shakespeare index, for example.

40:10.880 --> 40:11.930


So, yeah.

40:12.080 --> 40:12.950


Things have been restored.

40:13.460 --> 40:13.760


Great.

40:13.760 --> 40:18.260


So now that you're armed with foundational knowledge and various commands on
troubleshooting your Elasticsearch

40:18.260 --> 40:22.940


cluster, the last piece of advice is to stay positive even when things are not
working out.

40:23.480 --> 40:26.690


It's part and parcel of being an Elasticsearch engineer.
