
Accessing Hadoop Data Using Hive: Hive Configuration

1
Hi. Welcome to Accessing Hadoop Data Using Hive. In this lesson we will discuss
Hive's configuration.

2
After completing this lesson, you should be able to:
Make basic Hive configuration changes,
Configure the Hive metastore,
Make changes to Hive's logging configuration,
And locate and clean Hive's temp directories.

3
There are multiple ways to configure Hive, along with a large number of configuration
variables we can tweak. By default, Hive gets its configuration from the hive-default.xml
file stored in Hive's conf directory.
There are three ways to go about configuring Hive.
You can pass configuration settings directly to the Hive command shell, as seen in the
first example here. These settings last for the duration of that session.
You can also change Hive's configuration from within the Hive CLI.
Finally, you can directly edit the hive-site.xml file, where changes are persisted
across all Hive sessions.
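Since the slide examples are not reproduced in this transcript, here is a minimal sketch of the three approaches. The property hive.cli.print.current.db is used purely as a stand-in for whatever setting you want to change, and the exact option syntax may vary slightly by Hive version.

    # 1. Pass a setting on the command line (lasts only for that session)
    hive --hiveconf hive.cli.print.current.db=true

    # 2. Change a setting from within the Hive CLI
    hive> SET hive.cli.print.current.db=true;

    # 3. Persist a setting for all sessions in conf/hive-site.xml
    <property>
      <name>hive.cli.print.current.db</name>
      <value>true</value>
    </property>
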

4
There is a long list of variables that can be used to configure Hive. Here is a listing of
some interesting ones. We recommend you look at Hive's documentation to see the
complete listing of configuration variables, as you never know which ones may come in
handy for your own projects. You may also want to pause this video so you can examine
some of the variables in this table.

5
If a metastore is not configured in Hive, then by default a Derby database is used. Hive
can also be configured to use a more robust storage solution, such as DB2 or MySQL.
There are three different ways to set up a Hive metastore: embedded, local, and remote.
The embedded metastore is not generally used for production, as only one process can
connect to the metastore at a time.
The local metastore allows each Hive client to open a connection to the datastore and
make SQL queries against it.
The remote metastore is the one you would most likely use in a production environment.
In the remote metastore scenario, all Hive clients make connections to the metastore
server, which then queries the actual datastore (for example, MySQL) for the metadata.
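As an illustration of the remote case, a client typically points at the metastore server through the hive.metastore.uris property in hive-site.xml. The host name and port below are placeholders, not values from the slide.

    <property>
      <name>hive.metastore.uris</name>
      <!-- Thrift endpoint of the metastore server; host and port are placeholders -->
      <value>thrift://metastore-host.example.com:9083</value>
    </property>
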

6
Let's take a look at how to configure a local JDBC MySQL metastore with Hive. In
hive-site.xml we would have the following properties defined. The first property contains
the URL to the MySQL database. We are also telling MySQL to create the database if it
does not exist.
The second property holds the MySQL driver information. In order to get this metastore to
work, you will need to put the MySQL JDBC driver in the Hive lib folder.
The third property contains the MySQL user name credential, and the final property listed
contains the password credential.
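Because the slide itself is not reproduced here, the following is a sketch of what those four properties might look like in hive-site.xml. The host name, database name, user name, and password are placeholders.

    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <!-- createDatabaseIfNotExist=true asks MySQL to create the database if it is missing -->
      <value>jdbc:mysql://mysql-host.example.com/metastore_db?createDatabaseIfNotExist=true</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hiveuser</value>      <!-- placeholder user name -->
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hivepassword</value>  <!-- placeholder password -->
    </property>
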
7
In Hive, Log4j is used for logging. By default, logs are not emitted to the console by the
CLI. However, we can change that. Also by default, the logging level is WARN and logs
are stored in the /tmp/username/hive.log file.
The Log4j configuration can be altered by editing the hive-log4j.properties file in the
conf folder of the Hive installation. This configuration file controls the logging of the
Hive CLI and other local components. The hive-exec-log4j.properties file in the same
folder controls logging inside MapReduce tasks. We can change the logging behavior in
Hive by passing arguments to the hive application as shown on this slide.
The first example shows how you can make Hive emit log information to the console.
The next example shows you how to change just the logging level.
It is also worth noting that Hive stores query logs on a per-session basis in the
/tmp/user.name directory by default. That directory can be altered in hive-site.xml using
the hive.querylog.location property.
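The slide's commands are not reproduced in this transcript, so here is a sketch of the kind of invocations being described. INFO is used only as an example level, and DRFA is the default rolling file appender defined in hive-log4j.properties.

    # Emit log output to the console for this session
    hive --hiveconf hive.root.logger=INFO,console

    # Change just the logging level, keeping the default file appender
    hive --hiveconf hive.root.logger=INFO,DRFA
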

8
Hive uses temporary directories on both the Hive client machine and HDFS. Per-query
temporary data sets are stored in temp directories. Hive writes data to the temp location
configured with the hive.exec.scratchdir variable, and THEN moves the data to the target
table.
The Hive client cleans up the temp directories when a query is completed; however, if the
Hive client is terminated abnormally, you may need to go in and clean up the temp space
manually. To do this, you must know which directories Hive is using for temp data.
The HDFS default temp location is /tmp/hive-username and can be changed by editing
the hive.exec.scratchdir configuration variable. The client machine's default temp
location is /tmp/username.
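As an illustration, relocating the HDFS scratch directory and manually cleaning up after an abnormal client exit might look like the following. The path /hive/scratch and the user name are placeholders; confirm the actual locations from your configuration before removing anything.

    <!-- In hive-site.xml: move the HDFS scratch directory (path is a placeholder) -->
    <property>
      <name>hive.exec.scratchdir</name>
      <value>/hive/scratch</value>
    </property>

    # Manually remove leftover temp data after an abnormal client exit
    hadoop fs -rm -r /tmp/hive-username/*    # HDFS side
    rm -rf /tmp/username/*                   # client machine side
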
9
You have now completed this topic. Thank you for watching.
