10,000 FT Overview 2 - Storage, Databases, Migration & Analytics Subtitles

Okay.
Hello Cloud Gurus and welcome to this lecture.

This lecture we're going to continue
our 10,000 foot overview of the AWS platform
and we're going to start off with storage.
So, hopefully you've all had a little bit of a break
since the last lecture and we'll just dive right in.
Okay, so moving on to storage.
Now, storage consists of four different components
currently in the platform.
So the very first one is S3,
and this is almost as old as AWS itself.
Not quite, but it is one of the bedrocks
of the AWS platform.
And it certainly comes up a lot in the solutions
architect associate and the developer associate exam.
So S3, what is it?
Just think of it as a virtual disk in the cloud
where you can store objects.
And what do I mean by objects?
Well really I'm talking about files.
I'm talking about like word documents
or PowerPoint documents or pictures
or movies or text files, et cetera.
So, it's a place where you can store objects.
What you don't use S3 for is things like,
a place to install a database or a place to install
an application, maybe a computer game.
It's not that type of storage.
This is called object-based storage.
If you wanted to install basically a computer game
or you wanted to install a database,
you'd need block-based storage.
We'll come onto that in a little bit.
So the easiest way to remember S3
is it's a place to put objects in the cloud.
And actually Dropbox was one
of the very first startups to utilize S3.
And they started storing their objects on S3
and they still use it an awful lot today.
Dropbox actually stores all the metadata
inside their own data centers.
So, metadata is just basically data about data.
So what's the file name?
When was it uploaded?
When was it last changed?
That sort of thing.
But the actual objects themselves still exist in S3 today.
And we'll cover off S3 in an entire section of the course.
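
To make the split between objects and metadata a bit more concrete, here's a minimal sketch in Python. It isn't the S3 API: `ObjectStore`, `put` and `get` are made-up names, and a plain dictionary stands in for the bucket. The only point is that the bytes live under a key in one place, while the data about the data is kept separately.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Metadata:
    """Data about the data: name, size, and upload time."""
    file_name: str
    size_bytes: int
    uploaded_at: datetime


class ObjectStore:
    """Hypothetical in-memory stand-in for an object store (not the S3 API)."""

    def __init__(self):
        self._objects = {}   # key -> raw bytes (the objects themselves)
        self.metadata = {}   # key -> Metadata (kept separately, like an index)

    def put(self, key, body):
        # Whole objects are written and replaced as a unit.
        self._objects[key] = body
        self.metadata[key] = Metadata(key, len(body), datetime.now(timezone.utc))

    def get(self, key):
        return self._objects[key]


store = ObjectStore()
store.put("reports/q1.txt", b"net profit: 42")
print(store.get("reports/q1.txt"))
print(store.metadata["reports/q1.txt"].size_bytes)
```

Notice there's no way to edit a few bytes in the middle of an object here. You put or get the whole thing, which is roughly why this kind of storage doesn't suit a database or an installed application.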
Moving on, we've got Glacier.
Glacier is a place in which you archive
your files from S3 off.
So Glacier basically is where,
let's say you've got a requirement,
from the FSA or from, you know, a regulatory body,
government regulatory body,
and you've got to store your files for seven years.
You don't need instant access to those files.
Maybe you can wait for up to, you know, four, five hours
to retrieve them, then you'd archive them off to Glacier.
So, Glacier is used for data archival.
It's extremely low cost, I mean, unbelievably low cost,
and it's basically a place where you store your files
for compliance reasons or for whatever reason you want.
But it means that you can't access them immediately.
It does take around three to four hours to retrieve them.
And again, we're gonna cover Glacier off
in another section of the course.
We then have EFS.
Now EFS is relatively new, it came out last year.
It's called Elastic File System.
And basically when we were talking about S3,
S3 is where you store objects,
Elastic File System is file-based storage
and you can share it.
And we'll look at that in the EC2 section of the course.
But essentially it's a place where you could install
your databases, for example.
You could install applications, and you can actually share
that volume with multiple virtual machines.
And we'll look at that in the EC2 section of the course.
By the way, for those who are wondering what S3 stands for,
it's just Simple Storage Service, S, S, S.
Which is why it's called S3.
And finally we have Storage Gateway.
Now Storage Gateway is a way of connecting up S3
to your on-premise data center or to your headquarters.
It's normally a virtual machine
that you install on-prem.
So you get a virtual machine image
and then it communicates with S3.
Storage Gateway can come up a little bit in
the Solutions Architect Associate exam, doesn't come up
in the developer exam at all to my knowledge.
And it comes up all the time
in the SysOps Administrator exam.
So, we'll cover off Storage Gateway
later on in the course as well.
So those are your four different storage options.
Just remember that S3 is always for object-based storage.
You never wanna install a database or, you know,
a game or something on there, an application on S3.
For that you would either use EFS or you'd use EBS.
We don't actually have EBS on here yet
because it's not considered a storage service.
EBS is Elastic Block Store and that's just basically
a virtual disk that you attach to your EC2 instances.
And again, we're gonna cover that off in the, you know,
EC2 section of the course.
So, let's move on and we're moving onto databases
and this is a fundamental part of both these,
all three associate exams, really.
So, let's start with RDS.
RDS is Relational Database Service.
It consists of a number of different database technologies.
So we've got MySQL, we've got PostgreSQL,
we've got MariaDB, we've got SQL Server, we've got Oracle.
And then we have a new beta database technology
called Aurora.
And Aurora actually comes in two different flavors.
One's MySQL and, newly announced at re:Invent 2016,
we have PostgreSQL as well,
and we have an entire section on RDS in the course.
RDS, in particular comes up a lot in
the solutions architect and SysOps administrator exams.
It doesn't feature too heavily on the developer exam.
What does feature heavily in the developer exam is DynamoDB.
So, RDS is a relational database.
DynamoDB is a non-relational database and we'll explore
the differences between the two later on in the course.
But DynamoDB is basically a NoSQL database.
It's really, really, really scalable,
and it's got really, really super high performance.
And basically if you are getting away from traditional
databases and using NoSQL databases for your applications,
DynamoDB is where you want to start.
We have a 19-hour course on DynamoDB, which will take you
from zero to hero if you want to become the DynamoDB guru.
And DynamoDB is essential for the developer associate exam.
You need to know DynamoDB inside out
in order to pass that exam.
Moving onto Redshift.
Redshift is Amazon's data warehousing solution.
So, basically when you have a whole bunch of different data,
think of big data, you wanna store it in a warehouse
and then only query it as and when you need to run reports.
So, it's not good to run reports on your production database
'cause it's gonna slow your production database down.
Instead you wanna basically transfer a copy
of your production database over into Redshift.
And then in Redshift you can run queries on that data.
So it might be, I wanna know what the net profit
is for toasters in Asia Pacific,
if you are a retailer for example.
Those types of queries you'd run using Redshift.
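
Just to illustrate that pattern, here's a rough sketch using Python's built-in `sqlite3` module. This is not Redshift: two in-memory SQLite databases stand in for the production database and the warehouse, and the `sales` table and its rows are made up for the example. The idea is only that the report runs against the copy, not against production.

```python
import sqlite3

# Two separate in-memory databases: one plays "production", one plays the "warehouse".
production = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

schema = "CREATE TABLE sales (product TEXT, region TEXT, revenue REAL, cost REAL)"
production.execute(schema)
warehouse.execute(schema)

# Made-up sample rows, purely for illustration.
production.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("toaster", "Asia Pacific", 100.0, 60.0),
        ("toaster", "Asia Pacific", 50.0, 30.0),
        ("toaster", "Europe", 80.0, 50.0),
        ("kettle", "Asia Pacific", 40.0, 25.0),
    ],
)

# Transfer a copy of the production data into the warehouse.
rows = production.execute("SELECT product, region, revenue, cost FROM sales").fetchall()
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# The reporting query runs against the copy, so production is left alone.
(net_profit,) = warehouse.execute(
    "SELECT SUM(revenue - cost) FROM sales WHERE product = ? AND region = ?",
    ("toaster", "Asia Pacific"),
).fetchone()
print(net_profit)
```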
So we will cover that in the theory section.
You don't really need to be hands on
for Redshift in any of the associate courses.
You will need to know Redshift inside out
for the big data specialty cert, however.
Good news is we have an entire course
dedicated to Redshift and we also have
the big data specialty cert as well.
And then finally we have ElastiCache.
And ElastiCache basically is a way
of caching your data in the cloud.
So, imagine that you've got a shop front
that's frequently visited and you have your
top 10 selling items and it might be,
let's say it's a vacuum cleaner, for example.
So, every person who comes to your site
sees the top 10 selling items
and a lot of them click on this vacuum cleaner.
And vacuum cleaner price never changes,
the images never change, the data around it never changes.
So why take that data out of your database
when you can actually just cache it
and you can cache it using ElastiCache.
If you design your architecture that way,
it means that you take a load off your database
and it means that that data will be returned much quicker
than if you're pulling it from your database.
And we'll cover that off in a little bit more detail
later on in the course.
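
Here's a small sketch of that idea, often called a cache-aside read. It's only an illustration under some assumptions: a plain Python dictionary plays the part of the cache, and `SlowDatabase` and `get_item` are hypothetical names, not part of any AWS library.

```python
class SlowDatabase:
    """Hypothetical stand-in for the real database; counts how often it is hit."""

    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def fetch(self, item_id):
        self.reads += 1
        return self._rows[item_id]


def get_item(item_id, cache, database):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    if item_id in cache:
        return cache[item_id]
    item = database.fetch(item_id)
    cache[item_id] = item  # keep it for the next visitor
    return item


database = SlowDatabase({"vacuum-cleaner": {"name": "Vacuum cleaner", "price": 199}})
cache = {}  # a dict playing the role of the cache

for _ in range(1000):
    item = get_item("vacuum-cleaner", cache, database)

print(database.reads)
```

A thousand visitors look at the vacuum cleaner, but the database is only read once. A real cache would also need some way to expire or refresh entries when the data does change, which this sketch leaves out.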
ElastiCache comes up mostly in the developer exam,
but it can also come up in the solutions architect exam.
Certainly in scenario questions where you've got to take
a load off your database 'cause it's starting to fail.
You know, what different mechanisms can you use to do that?
Well, one of the answers is to use ElastiCache.
Okay, so let's move onto migration services.
And the first migration service is Snowball.
So to understand a little bit of history here,
Snowball started out as Import/Export.
Where you could send a disk or a whole bunch of disks,
to Amazon.
They connect those disks up either using IDE or SATA
or whatever it is that they were using to connect
and then they would basically transfer the contents
of those disks either to S3 or even to services like EBS.
And EBS is basically a virtual disk for your EC2 instances.
We're gonna cover that off in a lot of detail
coming up in the course.
So what happened was it was basically
a nightmare to manage as you could imagine.
You're getting all these different disks
of different shapes, sizes, manufacturers.
So they released Snowball.
Snowball was a way of doing this at the enterprise level.
This is how you can move terabytes of data into the cloud.
So, Snowball is this sort of briefcase size appliance.
Traditionally, it consisted just of storage,
and you connect that up and then basically you'd load
your terabytes of data onto your Snowball appliance
and then send it back to Amazon.
And you'd basically just be charged the setup fee
and then a daily rate.
Now, at re:Invent 2016, they announced Snowball Edge
and basically they've taken the concept a bit further.
So instead of it just being an appliance that allows you
to transfer storage, what they've done is they've,
you know, added compute capacity to it.
So you can actually use Snowball Edge,
basically it's a piece of an Amazon Web Services data center
that you can take on-prem.
So, you can actually have your own AWS, on-premise,
you know, piece of kit.
So that's really cool.
I've actually ordered one,
it should arrive in about 10 days from now.
So we will have a lecture on that.
Where Snowball comes up is really gonna be around
the solutions architect associate exam
as well as the SysOps associate exam.
Doesn't come up in the developer associate exam much at all.
Moving on to DMS.
This is Database Migration Service.
So this allows you to migrate your on-premise databases
to the AWS cloud and you can also use it to migrate
databases that are inside AWS cloud over to either
other regions or into things like Redshift, et cetera.
Now you can do this with your production databases.
You're not going to have any downtime.
Now, what's really cool with this technology
is that you don't have to stay with the database
that you're migrating from.
So let's say you've got an Oracle database that's,
you know, in house, costs you a lot of money
in licensing fees and you wanna move up to the cloud.
You can actually take your Oracle database
and migrate this over to Aurora, for example.
And basically DMS will handle the whole conversion process.
So it frees you from the licensing fees of Oracle.
Now, Amazon announced this at re:Invent in 2015.
As you can imagine, Oracle were not very happy.
And there have been quite a few fights ever since.
Oracle actually sent a fleet of Teslas to re:Invent 2016
to give guests free lifts back to their hotel
from the re:Invent conference.
And it was quite funny actually.
And the Teslas all had Oracle branding saying
don't get stuck in, you know, one particular cloud.
Make sure you diversify.
So, there's been an ongoing feud between Amazon and Oracle.
But DMS is a really, really great service.
And I don't know if you guys have ever worked
as solutions architects before,
but Oracle licensing is hideously expensive.
So, DMS is a fantastic service.
You can migrate your production databases up to AWS.
You have no downtime.
It's using replication,
and you can actually convert your different databases.
So the technologies that are supported are Oracle, SQL Server,
MySQL, Aurora, PostgreSQL, and SAP ASE.
So, it's great technology
and I'm going to have a lecture on it.
We won't do any labs
but we'll just do it from a theoretical point of view.
It's not yet in the solutions architect associate
or SysOps administrator associate exam,
but I would expect it to be added in 2017,
because it is a fantastic service for migration.
Speaking of great services for migration, we also have SMS,
which stands for Server Migration Service.
And basically this does exactly the same
as Database Migration Service,
but instead of targeting databases,
this targets virtual machines,
specifically the VMware virtual machines.
So if you've got VMware virtual machines
that are running on-premise,
you can use Server Migration Service to basically replicate
these virtual machines up to the AWS cloud
and you can do 50 concurrently.
Again, this does not yet feature in any of the exams,
but it's a great thing to know and we'll just have
a theoretical lecture on it later on in the course.
Moving on to analytics, let's start with Athena.
Athena basically allows you to run SQL queries on S3.
It's a brand new service
that was announced at re:Invent 2016.
So, let's imagine you've got a whole bunch of CSV files
or JSON files in your S3 buckets.
You can actually run SQL queries on those files.
So it's kinda like turning your flat files into,
you know, searchable databases, essentially.
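
If you want a feel for what running SQL over a flat file looks like, here's a rough sketch using only Python's standard library. To be clear, this is not how Athena works under the hood: Athena queries the files where they sit in S3, whereas this sketch loads a made-up CSV into an in-memory SQLite table first. The column names and rows are assumptions for the example.

```python
import csv
import io
import sqlite3

# A flat CSV file, as it might sit in a bucket (contents are made up).
csv_text = """user,status,bytes
alice,200,512
bob,404,0
alice,200,2048
carol,500,128
"""

# Parse the CSV and convert the numeric columns from text to integers.
records = [
    (row["user"], int(row["status"]), int(row["bytes"]))
    for row in csv.DictReader(io.StringIO(csv_text))
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE requests (user TEXT, status INTEGER, bytes INTEGER)")
db.executemany("INSERT INTO requests VALUES (?, ?, ?)", records)

# An ordinary SQL query over what started life as a flat file.
result = db.execute(
    "SELECT user, SUM(bytes) FROM requests WHERE status = 200 GROUP BY user"
).fetchall()
print(result)
```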
Athena's still very brand new,
doesn't yet feature in any of the exams.
Moving onto Elastic MapReduce.
This is used for big data processing.
We will cover this off
in the solutions architect associate course.
It doesn't come up in the developer associate course
and it can only come up a tiny little bit
in the SysOps administrator course.
Really, you just need to understand what EMR is
at a high level and then also how you access it.
So we will cover this off later.
But Elastic MapReduce is basically used
to process large amounts of data.
So this might be things like log analysis or web indexing
or perhaps you'd try to analyze the financial markets.
And it's using a framework called Hadoop.
And there are other frameworks available, including
Apache Spark, Apache HBase, Presto or Flink.
So EMR is used for big data.
That's all you really need to remember.
It features very, very heavily
in the big data specialty cert.
We do cover it off in a lot more detail in that particular
course. For the solutions architect associate course,
I think that's really it actually.
You just need to know what it is at a high level
and how you can access it.
So we will have a quick lecture
on that later on in the course.
Okay, so moving on to CloudSearch.
And I'm also gonna throw in Elasticsearch here
'cause they're very similar, slightly different products
but very similar in terms of what they do.
So, if you need to create a search engine for your website
or for your application, you know,
you can use either CloudSearch or Elasticsearch.
CloudSearch is a fully managed service
that's provided by AWS, whereas Elasticsearch,
it's a service that's using an open source framework,
but essentially it allows you to create search
capabilities within your website or application.
Again, this doesn't come up much
in any of the associate exams.
We used to use CloudSearch ourselves,
but then we moved over to Algolia.
Algolia is fantastic.
Check out their website.
If you want to, you know, get off AWS and use a third party,
I'd definitely recommend Algolia.
It is lightning fast.
Okay, so let's move on to Kinesis.
So what is Kinesis?
Well, Kinesis actually does come up
in the solutions architect associate exam a fair bit.
It also comes up a lot in the big data specialty exam.
So basically Kinesis is a way of streaming
and analyzing real-time data at massive scale.
And so you can capture and store terabytes of data per hour.
And you'd use this for things like financial transactions, where
you might be wanting to analyze the market
or even things like social media streams.
So perhaps you have a sentiment analysis app.
Perhaps you want to understand the sentiment
of your particular product or of your company
or even of an election, you can use Kinesis
to analyze social media feeds and basically pull in
people's Twitter data, people's Facebook data, et cetera,
at terabytes of data per hour.
And then you can run real time analysis on this.
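
To give a flavor of what analyzing a stream as it arrives means, here's a toy sketch in plain Python. It doesn't use Kinesis at all: a list of made-up posts plays the stream, the word lists are tiny placeholders rather than a real sentiment model, and `score` and `rolling_sentiment` are hypothetical helper names.

```python
from collections import deque

# Tiny made-up word lists; a real sentiment model would be far richer.
POSITIVE = {"love", "great", "fantastic"}
NEGATIVE = {"hate", "awful", "broken"}


def score(text):
    """Return +1 per positive word and -1 per negative word in the text."""
    words = text.lower().split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def rolling_sentiment(stream, window=3):
    """Yield the average score over the last `window` records as each one arrives."""
    recent = deque(maxlen=window)
    for record in stream:
        recent.append(score(record))
        yield sum(recent) / len(recent)


posts = [
    "I love this product",
    "great service and fantastic support",
    "my order arrived broken",
    "awful I hate it",
]
averages = list(rolling_sentiment(posts))
print(averages)
```

The key thing is that each record is processed as it comes in and only a small window is kept in memory, rather than storing everything first and analyzing it afterwards.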
Now in terms of what you need to know
for the associate exam, basically you just need to know,
you know, what it is that it does.
When you do the big data specialty exam,
you're gonna need to know Kinesis in a bit more detail
and we deep dive into it, in that particular course.
Moving on to Data Pipeline.
Data Pipeline is just, as it sorta says on the tin,
it's a service that allows you to move data
from one place to another.
I will do a quick lab on it.
It doesn't come up too much in the SA associate course
apart from knowing, you know, what it is.
But it does come up in the solutions architect professional.
So, Data Pipeline, we will do a lab on it
and basically you can move data from, let's say S3,
which is where you store all your objects.
Perhaps you wanna move data from there into DynamoDB
or perhaps you wanna move data from DynamoDB over to,
you know, into S3.
So, you can set up data pipeline jobs that do exactly that.
Moving on to QuickSight.
So QuickSight doesn't come up
in any of the exams whatsoever.
It was announced at re:Invent 2015,
and basically it's a business analytics tool.
It helps you create visualizations and rich sort
of dashboards for your data that exist in AWS.
So, it can analyze data in S3, in DynamoDB, in RDS,
in Redshift, et cetera.
And it allows you to build out these really cool dashboards,
you know, for that data.
But again, it doesn't come up.
Maybe we'll have a course on it later on
if there is sufficient demand.
Okay, so we're almost 20 minutes into this lecture.
So what we're gonna do now is take a break.
Again, go make yourself a coffee.
Go make yourself a tea.
If you're starting to feel overwhelmed,
seriously, don't worry.
We're gonna cover everything in detail
as we go throughout the course.
And remember, you don't need to know
every single service going into the exam.
It's good to have a high level understanding
of what each one is.
But really you just need to know
a few key services in depth.
So that's it from me guys.
If you have any questions, please let me know.
If not, go have a break and feel free
to move on to the next lecture when you're ready.
Thanks.
