Hadoop

LINUXJUNKIES
A HDFSCLIENT FOR HADOOP USING THE NATIVE JAVA API,

A TUTORIAL
November 21, 2011 | NPK | addFile, Adminstration, API, block locations, Client, copyFromLocal, copyToLocal, delete, Full code,
getHostnames, Hadoop, hadoop fs, HDFS, Java, mkdir, modification time, Programming, read, Tutorial
I'd like to talk about doing some day-to-day administrative tasks on a Hadoop system. Although the
hadoop fs <commands> can do most of these things, it is still worthwhile to explore the rich
Java API for Hadoop. This post is by no means complete, but it should get you started.
The most basic step is to create an object of this class.
HDFSClient client = new HDFSClient();
Of course, you need to import a bunch of classes. If you are using an IDE like Eclipse, you'll follow
along just fine with the imports below. They should work for all of the code in this post.

    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

1. Copying from Local file system to HDFS.

This copies a local file onto HDFS. The hadoop file system command that does the same is:
hadoop fs -copyFromLocal <local fs> <hadoop fs>
I am not explaining much here, as the comments are quite helpful. Of course, when adding the
configuration files, make sure to point them to your Hadoop installation's location. For mine, it looks like
this:

    Configuration conf = new Configuration();
    conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
    conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
    conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));

    FileSystem fileSystem = FileSystem.get(conf);

This is what the Java API looks like:

    public void copyFromLocal (String source, String dest) throws IOException {

        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);

        Path srcPath = new Path(source);
        Path dstPath = new Path(dest);
        // Check that the destination exists
        if (!(fileSystem.exists(dstPath))) {
            System.out.println("No such destination " + dstPath);
            return;
        }
        // Get the filename out of the file path
        String filename = source.substring(source.lastIndexOf('/') + 1, source.length());

        try{
            fileSystem.copyFromLocalFile(srcPath, dstPath);
            System.out.println("File " + filename + " copied to " + dest);
        }catch(Exception e){
            System.err.println("Exception caught! :" + e);
            System.exit(1);
        }finally{
            fileSystem.close();
        }
    }

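The "get the filename out of the file path" step can be tried on its own, without a cluster. The sketch below is plain Java; the class name PathNames and the method fileName are made up for illustration and are not part of HDFSClient. It relies on lastIndexOf returning -1 when the string contains no '/', so a bare filename comes back unchanged.

```java
// Hypothetical helper for illustration only; not part of the original HDFSClient.
final class PathNames {

    // Returns the text after the last '/', or the whole string when there is no '/'.
    static String fileName(String path) {
        // lastIndexOf returns -1 when '/' is absent, so adding 1 gives index 0.
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(fileName("/home/hadoop/input/data.txt"));   // data.txt
        System.out.println(fileName("data.txt"));                      // data.txt
        // A trailing slash leaves nothing after it, so the result is empty.
        System.out.println(fileName("/home/hadoop/input/").isEmpty()); // true
    }
}
```

Note the last case: a source path that ends in '/' yields an empty filename, so it may be worth rejecting such input before using the result in a message or a destination path.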
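2. Copying files from HDFS to the local file system.

The hadoop fs command is the following.

    hadoop fs -copyToLocal <hadoop fs> <local fs>

Alternatively, with the Java API:

    public void copyFromHdfs (String source, String dest) throws IOException {

        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);

        Path srcPath = new Path(source);
        Path dstPath = new Path(dest);
        // Check that the source exists
        if (!(fileSystem.exists(srcPath))) {
            System.out.println("No such destination " + srcPath);
            return;
        }
        // Get the filename out of the file path
        String filename = source.substring(source.lastIndexOf('/') + 1, source.length());

        try{
            fileSystem.copyToLocalFile(srcPath, dstPath);
            System.out.println("File " + filename + " copied to " + dest);
        }catch(Exception e){
            System.err.println("Exception caught! :" + e);
            System.exit(1);
        }finally{
            fileSystem.close();
        }
    }
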
3. Renaming a file in HDFS.

You can use the mv command in this context.

    hadoop fs -mv <this name> <new name>

    public void renameFile (String fromthis, String tothis) throws IOException{

        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);

        Path fromPath = new Path(fromthis);
        Path toPath = new Path(tothis);
        if (!(fileSystem.exists(fromPath))) {
            System.out.println("No such destination " + fromPath);
            return;
        }
        if (fileSystem.exists(toPath)) {
            System.out.println("Already exists! " + toPath);
            return;
        }
        try{
            boolean isRenamed = fileSystem.rename(fromPath, toPath);
            if(isRenamed){
                System.out.println("Renamed from " + fromthis + " to " + tothis);
            }
        }catch(Exception e){
            System.out.println("Exception :" + e);
            System.exit(1);
        }finally{
            fileSystem.close();
        }
    }

4. Upload or add a file to HDFS

    public void addFile(String source, String dest) throws IOException {

        // Conf object will read the HDFS configuration parameters
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);

        // Get the filename out of the file path
        String filename = source.substring(source.lastIndexOf('/') + 1, source.length());

        // Create the destination path including the filename.
        if (dest.charAt(dest.length() - 1) != '/') {
            dest = dest + "/" + filename;
        } else {
            dest = dest + filename;
        }
        Path path = new Path(dest);
        if (fileSystem.exists(path)) {
            System.out.println("File " + dest + " already exists");
            return;
        }
        // Create a new file and write data to it.
        FSDataOutputStream out = fileSystem.create(path);
        InputStream in = new BufferedInputStream(new FileInputStream(
                new File(source)));
        byte[] b = new byte[1024];
        int numBytes = 0;
        while ((numBytes = in.read(b)) > 0) {
            out.write(b, 0, numBytes);
        }
        // Close all the file descriptors
        in.close();
        out.close();
        fileSystem.close();
    }

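The two mechanical parts of addFile, building the destination path and pumping bytes through a fixed-size buffer, can be exercised without Hadoop. This is only a sketch in plain Java: the class StreamCopy and its methods copy and joinDest are hypothetical names, and in-memory streams stand in for the local file and the HDFS output stream. It assumes Java 7 or later for try-with-resources, which closes the streams even when the copy throws.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration only; not part of the original HDFSClient.
final class StreamCopy {

    // Copies everything from in to out through a 1024-byte buffer; returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int numBytes;
        // read() returns -1 at end of stream
        while ((numBytes = in.read(buffer)) != -1) {
            out.write(buffer, 0, numBytes);
            total += numBytes;
        }
        return total;
    }

    // Appends filename to dest, adding '/' only when dest does not already end with one.
    static String joinDest(String dest, String filename) {
        return dest.endsWith("/") ? dest + filename : dest + "/" + filename;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[3000];
        // try-with-resources closes both streams, even if copy() throws
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            System.out.println(copy(in, out)); // 3000
        }
        System.out.println(joinDest("/user/hadoop", "a.txt"));  // /user/hadoop/a.txt
        System.out.println(joinDest("/user/hadoop/", "a.txt")); // /user/hadoop/a.txt
    }
}
```

Using endsWith instead of charAt(dest.length() - 1) also avoids an exception when dest is an empty string.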
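5. Delete a file from HDFS.

You can use the following to remove a directory or a file:

    hadoop fs -rmr <hdfs path>

If you want to skip the trash as well, use:

    hadoop fs -rmr -skipTrash <hdfs path>

    public void deleteFile(String file) throws IOException {

        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);

        Path path = new Path(file);
        if (!fileSystem.exists(path)) {
            System.out.println("File " + file + " does not exist");
            return;
        }
        // The second argument makes the delete recursive
        fileSystem.delete(new Path(file), true);
        fileSystem.close();
    }
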
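6. Get modification time of a file in HDFS.

If you have any idea on this let me know.

    public void getModificationTime(String source) throws IOException{

        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);
        Path srcPath = new Path(source);

        if (!(fileSystem.exists(srcPath))) {
            System.out.println("No such destination " + srcPath);
            return;
        }
        // Get the filename out of the file path
        String filename = source.substring(source.lastIndexOf('/') + 1, source.length());

        FileStatus fileStatus = fileSystem.getFileStatus(srcPath);
        // Milliseconds since the epoch; a long, so it is printed with %d
        long modificationTime = fileStatus.getModificationTime();
        System.out.format("File %s; Modification time : %d %n", filename, modificationTime);

    }
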
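FileStatus.getModificationTime() returns a long holding milliseconds since January 1, 1970 UTC, which is not very readable when printed raw. A minimal sketch of turning such a value into a timestamp follows. It assumes Java 8 or later for java.time (on older JVMs java.text.SimpleDateFormat would play the same role), and the class name ModTime and method format are made up for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not part of the original HDFSClient.
final class ModTime {

    // An Instant has no time zone of its own, so the formatter needs one; UTC is used here.
    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    // Formats a millisecond epoch timestamp as a UTC date and time.
    static String format(long epochMillis) {
        return FORMATTER.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(format(0L));             // 1970-01-01 00:00:00
        System.out.println(format(1321833600000L)); // 2011-11-21 00:00:00
    }
}
```

In the client, the value passed to format would be the long returned by fileStatus.getModificationTime().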
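7. Get the block locations of a file in HDFS.

    public void getBlockLocations(String source) throws IOException{

        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);
        Path srcPath = new Path(source);

        // Check if the file already exists
        if (!(ifExists(srcPath))) {
            System.out.println("No such destination " + srcPath);
            return;
        }
        // Get the filename out of the file path
        String filename = source.substring(source.lastIndexOf('/') + 1, source.length());

        FileStatus fileStatus = fileSystem.getFileStatus(srcPath);
        BlockLocation[] blkLocations = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());

        int blkCount = blkLocations.length;
        System.out.println("File :" + filename + " stored at:");
        for (int i=0; i < blkCount; i++) {
            // One entry per block; each block lists the hosts holding a replica
            String[] hosts = blkLocations[i].getHosts();
            System.out.format("Host %d: %s %n", i, java.util.Arrays.toString(hosts));
        }
    }
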
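One detail worth spelling out: BlockLocation.getHosts() returns a String[], and handing a Java array straight to a %s placeholder prints the array's type and hash code rather than the hostnames. The plain-Java sketch below shows one way around that with Arrays.toString; the class HostsFormat, the method describe, and the host names are invented for illustration.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the original HDFSClient.
final class HostsFormat {

    // Builds one output line for a block: its index plus the hosts holding its replicas.
    static String describe(int blockIndex, String[] hosts) {
        // Arrays.toString renders the elements, e.g. [node1, node2]
        return String.format("Block %d: %s", blockIndex, Arrays.toString(hosts));
    }

    public static void main(String[] args) {
        String[] hosts = {"node1", "node2"}; // made-up host names
        System.out.println(describe(0, hosts)); // Block 0: [node1, node2]
    }
}
```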
8. List all the datanodes in terms of hostnames.

This is neater than looking up the /etc/hosts file on the namenode.

    public void getHostnames () throws IOException{

        Configuration config = new Configuration();
        config.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        config.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        config.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fs = FileSystem.get(config);
        DistributedFileSystem hdfs = (DistributedFileSystem) fs;
        DatanodeInfo[] dataNodeStats = hdfs.getDataNodeStats();
        String[] names = new String[dataNodeStats.length];
        for (int i = 0; i < dataNodeStats.length; i++) {
            names[i] = dataNodeStats[i].getHostName();
            System.out.println((dataNodeStats[i].getHostName()));
        }
    }

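9. Create a new directory in HDFS.

Creating a directory is done as:

    hadoop fs -mkdir <hadoop fs path>

    public void mkdir(String dir) throws IOException {

        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);

        Path path = new Path(dir);
        if (fileSystem.exists(path)) {
            System.out.println("Dir " + dir + " already exists!");
            return;
        }
        fileSystem.mkdirs(path);
        fileSystem.close();
    }
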
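10. Read a file from HDFS

    public void readFile(String file) throws IOException {

        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);

        Path path = new Path(file);
        if (!fileSystem.exists(path)) {
            System.out.println("File " + file + " does not exist");
            return;
        }
        FSDataInputStream in = fileSystem.open(path);
        String filename = file.substring(file.lastIndexOf('/') + 1,
                file.length());
        // The contents are written to a local file of the same name
        OutputStream out = new BufferedOutputStream(new FileOutputStream(
                new File(filename)));
        byte[] b = new byte[1024];
        int numBytes = 0;
        while ((numBytes = in.read(b)) > 0) {
            out.write(b, 0, numBytes);
        }
        in.close();
        out.close();
        fileSystem.close();
    }
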
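11. Checking if a file exists in HDFS

    public boolean ifExists (Path source) throws IOException{

        Configuration config = new Configuration();
        config.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        config.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        config.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem hdfs = FileSystem.get(config);

        boolean isExists = hdfs.exists(source);
        return isExists;
    }
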
I know this is by no means complete, but this is already a rather long post. I hope it is useful. Responses
appreciated!
And here is the complete code for HDFSClient.java. Happy Hadooping!

/*
Feel free to use, copy and distribute this program in any form.
HDFSClient.java
https://linuxjunkies.wordpress.com/
2011
*/
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

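public class HDFSClient {

    public HDFSClient() {
    }

    public static void printUsage(){
        System.out.println("Usage: hdfsclient add" + " <local_path> <hdfs_path>");
        System.out.println("Usage: hdfsclient read" + " <hdfs_path>");
        System.out.println("Usage: hdfsclient delete" + " <hdfs_path>");
        System.out.println("Usage: hdfsclient mkdir" + " <hdfs_path>");
        System.out.println("Usage: hdfsclient copyfromlocal" + " <local_path> <hdfs_path>");
        System.out.println("Usage: hdfsclient copytolocal" + " <hdfs_path> <local_path>");
        System.out.println("Usage: hdfsclient modificationtime" + " <hdfs_path>");
        System.out.println("Usage: hdfsclient getblocklocations" + " <hdfs_path>");
        System.out.println("Usage: hdfsclient gethostnames");
    }
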
    public boolean ifExists (Path source) throws IOException{
        Configuration config = new Configuration();
        config.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        config.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        config.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem hdfs = FileSystem.get(config);
        boolean isExists = hdfs.exists(source);
        return isExists;
    }

    public void getHostnames () throws IOException{
        Configuration config = new Configuration();
        config.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        config.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        config.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fs = FileSystem.get(config);
        DistributedFileSystem hdfs = (DistributedFileSystem) fs;
        DatanodeInfo[] dataNodeStats = hdfs.getDataNodeStats();
        String[] names = new String[dataNodeStats.length];
        for (int i = 0; i < dataNodeStats.length; i++) {
            names[i] = dataNodeStats[i].getHostName();
            System.out.println((dataNodeStats[i].getHostName()));
        }
    }

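    public void getBlockLocations(String source) throws IOException{
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);
        Path srcPath = new Path(source);

        // Check if the file already exists
        if (!(ifExists(srcPath))) {
            System.out.println("No such destination " + srcPath);
            return;
        }
        // Get the filename out of the file path
        String filename = source.substring(source.lastIndexOf('/') + 1, source.length());

        FileStatus fileStatus = fileSystem.getFileStatus(srcPath);
        BlockLocation[] blkLocations = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());

        int blkCount = blkLocations.length;
        System.out.println("File :" + filename + " stored at:");
        for (int i=0; i < blkCount; i++) {
            // One entry per block; each block lists the hosts holding a replica
            String[] hosts = blkLocations[i].getHosts();
            System.out.format("Host %d: %s %n", i, java.util.Arrays.toString(hosts));
        }
    }
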
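    public void getModificationTime(String source) throws IOException{
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);
        Path srcPath = new Path(source);

        if (!(fileSystem.exists(srcPath))) {
            System.out.println("No such destination " + srcPath);
            return;
        }
        // Get the filename out of the file path
        String filename = source.substring(source.lastIndexOf('/') + 1, source.length());

        FileStatus fileStatus = fileSystem.getFileStatus(srcPath);
        // Milliseconds since the epoch; a long, so it is printed with %d
        long modificationTime = fileStatus.getModificationTime();
        System.out.format("File %s; Modification time : %d %n", filename, modificationTime);

    }
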
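    public void copyFromLocal (String source, String dest) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);

        Path srcPath = new Path(source);
        Path dstPath = new Path(dest);
        // Check that the destination exists
        if (!(fileSystem.exists(dstPath))) {
            System.out.println("No such destination " + dstPath);
            return;
        }
        // Get the filename out of the file path
        String filename = source.substring(source.lastIndexOf('/') + 1, source.length());

        try{
            fileSystem.copyFromLocalFile(srcPath, dstPath);
            System.out.println("File " + filename + " copied to " + dest);
        }catch(Exception e){
            System.err.println("Exception caught! :" + e);
            System.exit(1);
        }finally{
            fileSystem.close();
        }
    }
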
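    public void copyToLocal (String source, String dest) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);

        Path srcPath = new Path(source);
        Path dstPath = new Path(dest);
        // Check that the source exists
        if (!(fileSystem.exists(srcPath))) {
            System.out.println("No such destination " + srcPath);
            return;
        }
        // Get the filename out of the file path
        String filename = source.substring(source.lastIndexOf('/') + 1, source.length());

        try{
            fileSystem.copyToLocalFile(srcPath, dstPath);
            System.out.println("File " + filename + " copied to " + dest);
        }catch(Exception e){
            System.err.println("Exception caught! :" + e);
            System.exit(1);
        }finally{
            fileSystem.close();
        }
    }
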
    public void renameFile (String fromthis, String tothis) throws IOException{
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);

        Path fromPath = new Path(fromthis);
        Path toPath = new Path(tothis);
        if (!(fileSystem.exists(fromPath))) {
            System.out.println("No such destination " + fromPath);
            return;
        }
        if (fileSystem.exists(toPath)) {
            System.out.println("Already exists! " + toPath);
            return;
        }
        try{
            boolean isRenamed = fileSystem.rename(fromPath, toPath);
            if(isRenamed){
                System.out.println("Renamed from " + fromthis + " to " + tothis);
            }
        }catch(Exception e){
            System.out.println("Exception :" + e);
            System.exit(1);
        }finally{
            fileSystem.close();
        }
    }

    public void addFile(String source, String dest) throws IOException {
        // Conf object will read the HDFS configuration parameters
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);

        // Get the filename out of the file path
        String filename = source.substring(source.lastIndexOf('/') + 1, source.length());

        // Create the destination path including the filename.
        if (dest.charAt(dest.length() - 1) != '/') {
            dest = dest + "/" + filename;
        } else {
            dest = dest + filename;
        }
        Path path = new Path(dest);
        if (fileSystem.exists(path)) {
            System.out.println("File " + dest + " already exists");
            return;
        }
        // Create a new file and write data to it.
        FSDataOutputStream out = fileSystem.create(path);
        InputStream in = new BufferedInputStream(new FileInputStream(
                new File(source)));
        byte[] b = new byte[1024];
        int numBytes = 0;
        while ((numBytes = in.read(b)) > 0) {
            out.write(b, 0, numBytes);
        }
        // Close all the file descriptors
        in.close();
        out.close();
        fileSystem.close();
    }

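    public void readFile(String file) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);

        Path path = new Path(file);
        if (!fileSystem.exists(path)) {
            System.out.println("File " + file + " does not exist");
            return;
        }
        FSDataInputStream in = fileSystem.open(path);
        String filename = file.substring(file.lastIndexOf('/') + 1,
                file.length());
        // The contents are written to a local file of the same name
        OutputStream out = new BufferedOutputStream(new FileOutputStream(
                new File(filename)));
        byte[] b = new byte[1024];
        int numBytes = 0;
        while ((numBytes = in.read(b)) > 0) {
            out.write(b, 0, numBytes);
        }
        in.close();
        out.close();
        fileSystem.close();
    }
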
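    public void deleteFile(String file) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);

        Path path = new Path(file);
        if (!fileSystem.exists(path)) {
            System.out.println("File " + file + " does not exist");
            return;
        }
        // The second argument makes the delete recursive
        fileSystem.delete(new Path(file), true);
        fileSystem.close();
    }
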
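    public void mkdir(String dir) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop/conf/mapred-site.xml"));
        FileSystem fileSystem = FileSystem.get(conf);

        Path path = new Path(dir);
        if (fileSystem.exists(path)) {
            System.out.println("Dir " + dir + " already exists!");
            return;
        }
        fileSystem.mkdirs(path);
        fileSystem.close();
    }
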
    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            printUsage();
            System.exit(1);
        }

        HDFSClient client = new HDFSClient();

        if (args[0].equals("add")) {
            if (args.length < 3) {
                System.out.println("Usage: hdfsclient add <local_path> " + "<hdfs_path>");
                System.exit(1);
            }
            client.addFile(args[1], args[2]);
        } else if (args[0].equals("read")) {
            if (args.length < 2) {
                System.out.println("Usage: hdfsclient read <hdfs_path>");
                System.exit(1);
            }
            client.readFile(args[1]);
        } else if (args[0].equals("delete")) {
            if (args.length < 2) {
                System.out.println("Usage: hdfsclient delete <hdfs_path>");
                System.exit(1);
            }
            client.deleteFile(args[1]);
        } else if (args[0].equals("mkdir")) {
            if (args.length < 2) {
                System.out.println("Usage: hdfsclient mkdir <hdfs_path>");
                System.exit(1);
            }
            client.mkdir(args[1]);
        }else if (args[0].equals("copyfromlocal")) {
            if (args.length < 3) {
                System.out.println("Usage: hdfsclient copyfromlocal <from_local_path> <to_hdfs_path>");
                System.exit(1);
            }
            client.copyFromLocal(args[1], args[2]);
        } else if (args[0].equals("rename")) {
            if (args.length < 3) {
                System.out.println("Usage: hdfsclient rename <old_hdfs_path> <new_hdfs_path>");
                System.exit(1);
            }
            client.renameFile(args[1], args[2]);
        }else if (args[0].equals("copytolocal")) {
            if (args.length < 3) {
                System.out.println("Usage: hdfsclient copytolocal <from_hdfs_path> <to_local_path>");
                System.exit(1);
            }
            client.copyToLocal(args[1], args[2]);
        }else if (args[0].equals("modificationtime")) {
            if (args.length < 2) {
                System.out.println("Usage: hdfsclient modificationtime <hdfs_path>");
                System.exit(1);
            }
            client.getModificationTime(args[1]);
        }else if (args[0].equals("getblocklocations")) {
            if (args.length < 2) {
                System.out.println("Usage: hdfsclient getblocklocations <hdfs_path>");
                System.exit(1);
            }
            client.getBlockLocations(args[1]);
        } else if (args[0].equals("gethostnames")) {
            client.getHostnames();
        }else {
            printUsage();
            System.exit(1);
        }
        System.out.println("Done!");
    }
}
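
The chain of else-if branches in main repeats the same argument-count check for every command. As a possible alternative, here is a sketch that keeps the required counts in one table. The class CommandArity and the method hasEnoughArgs are hypothetical names, and the counts are inferred from the usage strings (command name plus its listed paths) rather than taken from the original code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the original HDFSClient.
final class CommandArity {

    // Total number of arguments each command needs, the command name included.
    // These counts are an assumption read off the usage strings.
    static final Map<String, Integer> REQUIRED = new LinkedHashMap<>();
    static {
        REQUIRED.put("add", 3);
        REQUIRED.put("read", 2);
        REQUIRED.put("delete", 2);
        REQUIRED.put("mkdir", 2);
        REQUIRED.put("copyfromlocal", 3);
        REQUIRED.put("rename", 3);
        REQUIRED.put("copytolocal", 3);
        REQUIRED.put("modificationtime", 2);
        REQUIRED.put("getblocklocations", 2);
        REQUIRED.put("gethostnames", 1);
    }

    // True when args names a known command and carries at least the arguments it needs.
    static boolean hasEnoughArgs(String[] args) {
        if (args.length < 1) {
            return false;
        }
        Integer needed = REQUIRED.get(args[0]);
        return needed != null && args.length >= needed;
    }

    public static void main(String[] args) {
        System.out.println(hasEnoughArgs(new String[] {"add", "/tmp/a.txt"}));          // false
        System.out.println(hasEnoughArgs(new String[] {"read", "/user/hadoop/a.txt"})); // true
        System.out.println(hasEnoughArgs(new String[] {"unknown"}));                    // false
    }
}
```

With a check like this at the top of main, each branch would only need to call the matching client method.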
26 THOUGHTS ON A HDFSCLIENT FOR HADOOP USING THE NATIVE JAVA API,

A TUTORIAL
ABHISHEK SAGAR SAYS:
I have the following queries:

1. The Hadoop framework automatically splits a file into blocks and distributes the blocks across cluster nodes.
My question is: can I control which cluster machine a block goes to through the Java API? That is, I want
to assign a cluster machine to a file block based on some criteria.
2. Also, instead of specifying a file name stored on HDFS as the input to a MapReduce job, can I
specify a block as the input to the MapReduce job? That is, can I specify the input at block level instead of
file level?
Please, somebody help me!!
April 22, 2012 at 3:14 pm Reply
ROOKIE SAYS:
1. Hadoop framework automatically split the file into blocks and distribute the blocks on cluster
nodes. My Question is , Can i control the block to go to which cluster machine through java API ?
That is , i want to assign a cluster machine to a file block based on some criteria ?
I have heard of setting the replication level for an individual block on HDFS. And may I know the
reason why you need to keep all the blocks of an individual file on one machine? That is possible
with a replication factor of one. Besides, keeping everything on a single machine does not help in
parallelism. Hope you already know how blocks are stored as per disk and then to the same rack and
then perhaps another data center.
I have not tried this Java API, but you should look at org.apache.hadoop.fs.BlockLocation. In the
constructor, you can specify the hosts. I hope this suits your need. Let me know if you find some
other way to do it.
2. Also , instead of specifying the input as the file name stored on HDFS to MapReduce job , can i
specify the block as an input to MapReduce job , that is , can i specify the input at block level instead
of file level.
Again, never read about it. Is this for some research work? Look into these in the documentation:
org.apache.hadoop.fs.s3.Block
You get the block id from this API, maybe you can put this to use.
April 23, 2012 at 2:53 am Reply
SALMAKHALIL SAYS:
Hi,
Did you find a way to answer your question: how blocks are stored per disk, then on the
same rack, and then perhaps in another data center?
Thanks in Advance,
Salma
December 1, 2012 at 10:34 pm Reply
RIZWAN MIAN SAYS:

I am curious to know why you want these operations:

Can i control the block to go to which cluster machine through java API ?
can i specify the block as an input to MapReduce job
January 25, 2015 at 6:22 pm Reply
SANDEEP SAYS:
Hi, some of them are throwing errors like:

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Access
denied for user SReddy. Superuser privilege is required

How do I run them as user HDFS?
July 23, 2012 at 7:17 pm Reply
ROOKIE SAYS:
You should run it as a user who has access to HDFS, or with appropriate permissions. This is
working code, tested on Hadoop 0.20.2.
August 6, 2012 at 4:28 pm Reply
PUNEET PANT SAYS:

There is a correction in listing 2. It should be fileSystem.copyToLocalFile(srcPath, dstPath), and even the
method name should be in accordance.
July 24, 2012 at 1:23 pm Reply
ROOKIE SAYS:
Yes, thank you Puneet for pointing out. Will make a correction.
August 6, 2012 at 4:31 pm Reply
JAVA TUTORIAL SAYS:

Good way of describing, and nice post to obtain data about my presentation subject matter, which I am
going to deliver in academy.
April 26, 2013 at 8:15 am Reply
NPK SAYS:
Thank you. Good to know that it was helpful.

May 3, 2013 at 11:30 am Reply
RAMESHCHARYKOTHA SAYS:
Thank you. Nice program.

February 27, 2014 at 5:33 pm Reply
SOMESH MANDA SAYS:

Nice Article. I am in US and like to talk to you. Please email me your contact info to
someshmnda@gmail.com
June 15, 2014 at 12:52 pm Reply
VIGNESHWARAN SAYS:
Nice article.
Hadoop Training in Chennai
July 11, 2014 at 4:11 am Reply
VERSION COMPATIBILITY ISSUE WITH TM PACKAGE SAYS:

[...] Actually the -copyFromLocal function inside the hdfs in Hadoop is normally
written in Java program [...]
September 5, 2014 at 11:01 am Reply
ISA SAYS:
Nicely covered basic topics. Thank you for your work.

September 26, 2014 at 6:47 am Reply
BCC SAYS:
Your article is very good, but I'm having a problem. My error is the following:
DEBUG NativeCodeLoader Trying to load the custom-built native-hadoop library
DEBUG NativeCodeLoader Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
DEBUG NativeCodeLoader java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
WARN NativeCodeLoader Unable to load native-hadoop library for your platform using builtin-java classes where applicable
DEBUG JniBasedUnixGroupsMappingWithFallback Falling back to shell based
DEBUG JniBasedUnixGroupsMappingWithFallback Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
DEBUG Shell Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:302)
at org.apache.hadoop.util.Shell.(Shell.java:327)
at org.apache.hadoop.util.StringUtils.(StringUtils.java:78)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
at org.apache.hadoop.security.Groups.(Groups.java:77)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:257)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:234)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:749)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:734)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:607)
at org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:2748)
at org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:2740)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2606)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at br.com.proativa.pisemd.business.HadoopBO.createHomeDirectory(HadoopBO.java:25)
at br.com.proativa.pisemd.business.UserBO.salvar(UserBO.java:57)
at br.com.proativa.pisemd.controller.UserBean.save(UserBean.java:73)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.sun.el.parser.AstValue.invoke(AstValue.java:234)
at com.sun.el.MethodExpressionImpl.invoke(MethodExpressionImpl.java:297)
at javax.faces.event.MethodExpressionActionListener.processAction(MethodExpressionActionListener.java:153)
at javax.faces.event.ActionEvent.processListener(ActionEvent.java:88)
at javax.faces.component.UIComponentBase.broadcast(UIComponentBase.java:772)
at javax.faces.component.UICommand.broadcast(UICommand.java:300)
at javax.faces.component.UIViewRoot.broadcastEvents(UIViewRoot.java:775)
at javax.faces.component.UIViewRoot.processApplication(UIViewRoot.java:1267)
at com.sun.faces.lifecycle.InvokeApplicationPhase.execute(InvokeApplicationPhase.java:82)
at com.sun.faces.lifecycle.Phase.doPhase(Phase.java:101)
at com.sun.faces.lifecycle.LifecycleImpl.execute(LifecycleImpl.java:118)
at javax.faces.webapp.FacesServlet.service(FacesServlet.java:312)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.primefaces.webapp.filter.FileUploadFilter.doFilter(FileUploadFilter.java:95)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
October 13, 2014 at 4:05 pm Reply
RIZWAN MIAN SAYS:

@BCC
You need to make sure you have the appropriate libraries in your class path. In Hortonworks Sandbox 2.1,
these are the jars located in:
/usr/lib/hadoop/
/usr/lib/hadoop/client
January 25, 2015 at 6:26 pm Reply
BESANT TECHNOLOGIES SAYS:

reviews-complaints-testimonials
December 3, 2014 at 8:34 am Reply
SARFRAZ KHAN SAYS:

It's a very nice tutorial for beginners.

January 1, 2015 at 6:04 am Reply
HADOOP JAVA TUTORIAL - PDFENTER.COM SAYS:

[...] distribute the blocks on cluster nodes. My Question is , Can i control the block to go to [...]
Download A HDFSClient for Hadoop using the native JAVA API, a [...]
January 16, 2015 at 12:30 am Reply
JAHEER SAYS:
Thanks For Program.!!!!!!!

February 5, 2015 at 5:28 am Reply
RIADH SAYS:
First, thank you for this tutorial. I have a problem and would appreciate your help. Storing a file to a
remote Hadoop cluster produces the error below. From the terminal it works and the file is copied
successfully, but not when using the Java API:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /text10.txt could only be replicated to
0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in
this operation.
March 9, 2015 at 11:39 pm Reply
KEVIN SAYS:
It's really nice work. Thanks!! I suppose this type of Hadoop client has to be exported as a jar file and run
in the Hadoop environment as hadoop jar. Can this be done in a Java environment directly? When I tried, the
file system API invoked the local fs only and not HDFS.
SACHIN SHARMA SAYS:

Thanks, your program is extremely helpful. Can I run this code from a remote machine (which is not
part of the Hadoop cluster)? If yes, what file path should I give when I am calling the
getBlockLocations method?
VENKATESWARA RAO GANGULA SAYS:

I am new to Hadoop. This is awesome.
Now I have the best view.
Thank you so much.
July 6, 2015 at 7:33 am Reply
NPK SAYS:
Good to know. You're welcome!

August 3, 2015 at 3:37 am Reply