
Thursday, July 9, 2009

Installing Hadoop

A while back I entered Trend Micro's cloud computing design contest. I didn't end up building much, but I did get to learn this software. In a nutshell, Hadoop takes a program you have written and distributes its execution across multiple servers. Being short on cash, I don't have that many servers, so for now I installed everything on my own laptop. Here are my installation notes:

Environment: Ubuntu 9.04, Hadoop 0.20 (the latest release at the time of writing)

Required packages:

Java 1.6

sudo apt-get install sun-java6-jdk

The JDK is installed under /usr/lib/jvm/java-6-sun.

At this point you can check /etc/jvm; it should look something like this:

# /etc/jvm
#
# This file defines the default system JVM search order. Each
# JVM should list their JAVA_HOME compatible directory in this file.
# The default system JVM is the first one available from top to
# bottom.

/usr/lib/jvm/java-6-sun
/usr/lib/jvm/java-gcj
/usr/lib/jvm/ia32-java-1.5.0-sun
/usr/lib/jvm/java-1.5.0-sun
/usr
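With the JDK in place, you can export JAVA_HOME now and confirm the path, since the Hadoop configuration below will need the same value (a minimal sketch; the path matches the sun-java6-jdk install location above):

```shell
# Point JAVA_HOME at the Sun JDK installed above; Hadoop's
# conf/hadoop-env.sh will need the same path later on.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
echo "$JAVA_HOME"
```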
ssh and rsync

sudo apt-get install ssh rsync

These are needed because the Hadoop daemons communicate with each other over SSH.

Adding a Hadoop User

sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop

Configuring SSH

su - hadoop
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

This sets up passwordless SSH login for the hadoop user.
You can then run ssh localhost to confirm it works.
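As a non-interactive check (my own addition, not part of the original steps), BatchMode makes ssh fail outright instead of falling back to a password prompt, so it tells you immediately whether the key setup took effect:

```shell
# With BatchMode=yes, ssh exits with an error if the key set up
# above is not accepted, rather than prompting for a password.
ssh -o BatchMode=yes localhost 'echo SSH key login OK'
```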

Install Hadoop

Download the Hadoop 0.20.0 tarball into /opt, then:

cd /opt
sudo tar zxvf hadoop-0.20.0.tar.gz
sudo mv hadoop-0.20.0 hadoop
sudo chown -R hadoop:hadoop hadoop

Alternatively, you can install the prepackaged deb from Cloudera: http://www.cloudera.com/hadoop-deb
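Whichever install route you take, it is convenient to put the Hadoop scripts on the hadoop user's PATH (an optional step of my own; HADOOP_HOME is just a variable name I chose, added e.g. to ~/.bashrc):

```shell
# Convenience only: lets you type "hadoop" and "start-all.sh"
# instead of the full /opt/hadoop/bin/ prefix every time.
export HADOOP_HOME=/opt/hadoop
export PATH="$PATH:$HADOOP_HOME/bin"
echo "$HADOOP_HOME"
```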

Configuration

/opt/hadoop/conf/hadoop-env.sh

Set the Java path:

export JAVA_HOME=/usr/lib/jvm/java-6-sun

/opt/hadoop/conf/hadoop-site.xml

Edit the contents:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
<name>hadoop.tmp.dir</name>
<value>/your/path/to/hadoop/tmp/dir/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system.  A URI whose
scheme and authority determine the FileSystem implementation.  The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class.  The uri's authority is used to
determine the host, port, etc. for a FileSystem.</description>
</property>

<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at.  If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>

<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>

</configuration>
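One caveat: 0.20 deprecates the single hadoop-site.xml in favour of three files (core-site.xml, hdfs-site.xml, mapred-site.xml). The single file above still works, but logs a deprecation warning. If you want to follow the new layout, the properties split out roughly like this (a sketch showing core-site.xml only):

```shell
# Under the split layout, fs.default.name and hadoop.tmp.dir go in
# core-site.xml; dfs.replication goes in hdfs-site.xml and
# mapred.job.tracker in mapred-site.xml.
cat > /opt/hadoop/conf/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
EOF
```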

Formatting the name node

Format the NameNode:

/opt/hadoop/bin/hadoop namenode -format

Start Hadoop:

/opt/hadoop/bin/start-all.sh
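After start-all.sh you can check that the daemons actually came up with jps, which ships with the Sun JDK (on a single-node setup all five daemons run on the one machine):

```shell
# Expect NameNode, DataNode, SecondaryNameNode, JobTracker and
# TaskTracker to appear in the list once startup has finished.
jps
```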

Hadoop Web Interfaces

Hadoop provides web interfaces (the ports are defined in conf/hadoop-default.xml):

http://localhost:50030/ - web UI for MapReduce job tracker(s)

http://localhost:50060/ - web UI for task tracker(s)

http://localhost:50070/ - web UI for HDFS name node(s)
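As a final smoke test, you can run one of the bundled example jobs against the config files; these commands follow the stock 0.20 quickstart, with the jar name matching the 0.20.0 tarball:

```shell
# Copy some input into HDFS, run the example "grep" job,
# and read the result back out.
/opt/hadoop/bin/hadoop fs -put /opt/hadoop/conf input
/opt/hadoop/bin/hadoop jar /opt/hadoop/hadoop-0.20.0-examples.jar grep input output 'dfs[a-z.]+'
/opt/hadoop/bin/hadoop fs -cat output/*
```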

Reference:

Running Hadoop On Ubuntu Linux (Single-Node Cluster)