
Hadoop WordCount Compilation errors related with OutputCollector, setInputPath, setOutputPath


If you have tried the Hadoop WordCount sample job found in many older tutorials, you may have hit compilation errors like the ones below:

Older Code:

package org.myorg;

import java.io.Exception;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {
    public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(MapClass.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        conf.setInputPath(new Path(args[1]));
        conf.setOutputPath(new Path(args[2]));
        JobClient.runJob(conf);
    }
}

Compiling this code produces the following errors:

WordCount.java:2: error: cannot find symbol
import java.io.Exception;
              ^
  symbol:   class Exception
  location: package java.io
WordCount.java:14: error: cannot find symbol
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                                                                                                                   ^
  symbol:   class IOException
  location: class MapClass
WordCount.java:25: error: cannot find symbol
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                                                                                                                               ^
  symbol:   class IOException
  location: class Reduce
WordCount.java:44: error: cannot find symbol
conf.setInputPath(new Path(args[1]));
    ^
  symbol:   method setInputPath(Path)
  location: variable conf of type JobConf
WordCount.java:45: error: cannot find symbol
conf.setOutputPath(new Path(args[2]));
    ^
  symbol:   method setOutputPath(Path)
  location: variable conf of type JobConf
Note: WordCount.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
5 errors

Two separate problems are mixed together here. First, the line "import java.io.Exception;" is simply a typo in the old sample: there is no java.io.Exception class in any Java version (the code means java.io.IOException). Second, the setInputPath and setOutputPath methods were removed from JobConf, so the remaining errors appear whenever you compile against a 0.20.x or newer Hadoop distribution. In my case, I was using 0.20.203.1:

C:\Azure\Java>C:\Apps\java\openjdk7\bin\javac -classpath c:\Apps\dist\hadoop-core-0.20.203.1-SNAPSHOT.jar -d . WordCount.java

To solve this problem, you need to move the code to the new org.apache.hadoop.mapreduce API, as below:

package org.myorg;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AvkashWordCount {
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        // Note: the parameter must be Iterable, not Iterator. With Iterator this
        // method would not override Reducer.reduce and Hadoop would never call it.
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJarByClass(AvkashWordCount.class);
        job.setJobName("avkashwordcountjob");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(AvkashWordCount.Map.class);
        job.setCombinerClass(AvkashWordCount.Reduce.class);
        job.setReducerClass(AvkashWordCount.Reduce.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
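One subtle pitfall with the new API deserves a note: the reducer's second parameter must be Iterable, not Iterator. A reduce method declared with Iterator is a separate overload that does not override Reducer.reduce, so the framework silently calls the inherited identity reduce and your sums never happen. The following plain-Java sketch (no Hadoop on the classpath; the Base/Child names are just for illustration) shows the mechanics:

```java
import java.util.Iterator;

public class Main {
    static class Base {
        // The "framework" method, analogous to Reducer.reduce(key, Iterable, context).
        String run(Iterable<Integer> values) { return "base"; }
    }

    static class Child extends Base {
        // Overload, NOT an override: the parameter type differs (Iterator vs Iterable),
        // so dynamic dispatch on Base.run never reaches this method.
        String run(Iterator<Integer> values) { return "child"; }
    }

    public static void main(String[] args) {
        Base b = new Child();
        Iterable<Integer> vals = java.util.List.of(1, 2, 3);
        // The call resolves to Base.run(Iterable), which Child never overrode.
        System.out.println(b.run(vals)); // prints "base"
    }
}
```

Adding @Override to the reduce method makes the compiler reject the wrong signature instead of letting the job run with the identity reducer.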

The code above compiles and runs against Hadoop 0.20.x and later distributions.
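If you want to sanity-check the map/reduce logic itself without a cluster, the same tokenize-and-sum computation can be sketched in plain Java. This is only a local sketch (the wordCount helper is mine, not part of Hadoop); the real job distributes the same two steps across mappers and reducers:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class Main {
    // Map step: tokenize each line and emit (word, 1).
    // Reduce step: sum the 1s per word. Here both are collapsed into one pass.
    static Map<String, Integer> wordCount(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount(new String[]{"hello world", "hello hadoop"});
        System.out.println(counts.get("hello")); // prints 2
        System.out.println(counts.get("world")); // prints 1
    }
}
```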

Categories: Hadoop
  1. Ab
    February 25, 2013 at 6:51 am

    Thanks!!!!!!!!

  2. konstantina
    November 18, 2013 at 2:15 am

    I am dealing with the same problem… for every package it complains that it does not exist, and for every class it shows “cannot find symbol”.

    I executed the command : javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d Wordcount_classes WordCount.java

    and I still get the same errors.

    I’m using Hadoop 1.2.1 and Pseudo-Distributed Operation.

    If the jar file of the wordcount project is created by eclipse, is it necessary to include the .java file in the command?
    Also, instead of “hadoop-core-12.1.jar”, maybe I should put more jar files?

    If anyone is using this same Hadoop edition, can you tell me please:

    1) I am running the above command being in the folder that Hadoop was installed. I tried to do this being in the lib folder, to see if some .jar files are recognized, but there is still the same issue.
    Am I correct to execute commands from this mode (hadoop’s installation)?

    2)If the classpath contains only the core.jar, how will the rest of the libraries be recognized?

  3. nisha
    April 7, 2014 at 10:12 am

    I am a fresher in MapReduce programming and have not managed to execute even a single MapReduce job in Ubuntu. What are the steps to compile and run a MapReduce program successfully in Ubuntu?
    I have installed hadoop-2.2.0 in Ubuntu 12.04 under /usr/local/hadoop, and the path of javac is /usr/lib/jvm/java-7-oracle/bin/javac. When I try to run the simple WordCount MapReduce program it shows errors like “package org.apache.hadoop.fs does not exist”, “package org.apache.hadoop.conf does not exist”, “package org.apache.hadoop.io does not exist”, “package org.apache.hadoop.mapred does not exist”, “package org.apache.hadoop.util does not exist”.

    How can I fix this error? Also, where can I find this SNAPSHOT.jar file?
