Thursday, July 9, 2009

Installing Hadoop

A while back I took part in Trend Micro's cloud computing design contest. I didn't end up building much, but I did learn this piece of software. In a nutshell, it takes a program you've written and distributes it across multiple servers. Being short on cash, though, I don't have that many servers, so for now I've installed it on my own laptop. Below are my notes from the installation:

Environment: Ubuntu 9.04, Hadoop (the latest release at the time of writing is 0.20)

Packages to install:

Java 1.6

sudo apt-get install sun-java6-jdk

The JDK is installed under /usr/lib/jvm/java-6-sun.

At this point you can check /etc/jvm; it should look something like this:

# /etc/jvm
#
# This file defines the default system JVM search order. Each
# JVM should list their JAVA_HOME compatible directory in this file.
# The default system JVM is the first one available from top to
# bottom.

/usr/lib/jvm/java-6-sun
/usr/lib/jvm/java-gcj
/usr/lib/jvm/ia32-java-1.5.0-sun
/usr/lib/jvm/java-1.5.0-sun
/usr
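To confirm which JVM actually wins as the system default, you can resolve the java binary through its symlink chain. This quick sanity check is my own addition, not part of the original steps:

```shell
# Resolve the default java binary through the Debian/Ubuntu alternatives
# symlinks. If the Sun JDK sits first in the search order above, this
# should end up under /usr/lib/jvm/java-6-sun.
JAVA_BIN=$(command -v java || true)
if [ -n "$JAVA_BIN" ]; then
    OUT=$(readlink -f "$JAVA_BIN")
else
    OUT="java not installed"
fi
echo "$OUT"
```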
ssh, rsync

sudo apt-get install ssh rsync

Hadoop nodes need SSH to communicate with each other.

Adding a Hadoop User

sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop

Setting up SSH

su - hadoop
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

This enables SSH logins without a password.
You can then run ssh localhost to check that it works.
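The key-generation commands above can be made a bit more defensive. The sketch below is my own variant, not from the original post: it only generates a key if none exists, avoids appending a duplicate key, and fixes the permissions sshd insists on. Run it as the hadoop user (after su - hadoop):

```shell
# Passwordless SSH for the local hadoop account: generate an RSA key only
# if one is not already present, then authorize it for localhost logins.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    ssh-keygen -t rsa -P "" -f "$HOME/.ssh/id_rsa"
fi
# Append the public key only if it is not already authorized.
grep -qF "$(cat "$HOME/.ssh/id_rsa.pub")" "$HOME/.ssh/authorized_keys" 2>/dev/null \
    || cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"   # sshd rejects group/world-writable files
```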

Install Hadoop

cd /opt
sudo tar zxvf hadoop-0.20.0.tar.gz
sudo mv hadoop-0.20.0 hadoop
sudo chown -R hadoop:hadoop hadoop

Alternatively, you can install the pre-packaged deb from http://www.cloudera.com/hadoop-deb

Configuration

/opt/hadoop/conf/hadoop-env.sh

Set the Java path:

export JAVA_HOME=/usr/lib/jvm/java-6-sun

/opt/hadoop/conf/hadoop-site.xml

Edit its contents as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
<name>hadoop.tmp.dir</name>
<value>/your/path/to/hadoop/tmp/dir/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system.  A URI whose
scheme and authority determine the FileSystem implementation.  The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class.  The uri's authority is used to
determine the host, port, etc. for a FileSystem.</description>
</property>

<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at.  If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>

<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>

</configuration>
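If you script your setup, the same file can be generated with a here-document. This is a sketch of my own: it writes to a scratch directory so the real configuration is untouched, and the property descriptions are trimmed; point CONF_DIR at /opt/hadoop/conf for actual use.

```shell
# Write the single-node hadoop-site.xml shown above via a here-document.
# CONF_DIR is a scratch directory here; set it to /opt/hadoop/conf for real.
# The quoted 'EOF' keeps ${user.name} literal so Hadoop expands it itself.
CONF_DIR=$(mktemp -d)
cat > "$CONF_DIR/hadoop-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/your/path/to/hadoop/tmp/dir/hadoop-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
echo "wrote $CONF_DIR/hadoop-site.xml"
```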

Formatting the name node

Format the NameNode:

/opt/hadoop/bin/hadoop namenode -format
Start Hadoop:
/opt/hadoop/bin/start-all.sh
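Once start-all.sh returns, you can verify that the daemons actually came up with jps, which ships with the Sun JDK. This check is my own addition, guarded so it degrades gracefully if jps is not on the PATH:

```shell
# List running Java processes; a healthy single-node setup shows NameNode,
# DataNode, SecondaryNameNode, JobTracker and TaskTracker.
if command -v jps >/dev/null 2>&1; then
    RUNNING=$(jps)
else
    RUNNING="jps not found"
fi
echo "$RUNNING"
```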

Hadoop Web Interfaces

Hadoop provides web interfaces (the ports are defined in conf/hadoop-default.xml):

http://localhost:50030/ - web UI for MapReduce job tracker(s)

http://localhost:50060/ - web UI for task tracker(s)

http://localhost:50070/ - web UI for HDFS name node(s)
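A quick way to confirm the three UIs listed above are actually listening is to probe the ports. This is my own check, assuming curl is installed; each port only answers after start-all.sh has run:

```shell
# Probe each web UI port and report whether it answered.
PROBE=""
for port in 50030 50060 50070; do
    if curl -s -o /dev/null --max-time 2 "http://localhost:$port/"; then
        PROBE="$PROBE port $port: up;"
    else
        PROBE="$PROBE port $port: down;"
    fi
done
echo "$PROBE"
```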

Reference:

Running Hadoop On Ubuntu Linux (Single-Node Cluster)

3 comments:

  1. Hello, we took part as well.
    We weren't selected, but I wonder if there's a chance to exchange notes?

  2. I wasn't selected either...
    and I didn't really build anything, QQ

  3. The contest may be over,

    but I did pick up some thoughts along the way, which I'm still sorting out.

    A while ago I left a comment on another blog.
    My own is at

    He also took part in the contest; so far you two seem to be the only blogs I've found that wrote about it.

    I figured that even within the same school, or even the same department, people don't necessarily share the same interests (though admittedly the prize money was most of the reason I entered).

    It would be a shame for it to just end here.