Category: APPLICATIONS > MACHINE LEARNING / BIG DATA
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
Hadoop is available as a module. The module file sets up the necessary environment variables for Hadoop and provides two commands, cluster-start and cluster-stop, to start and stop a Hadoop cluster with a minimum of 3 nodes.
IMPORTANT: By default, HDFS is set up on local SSD storage, and its contents are purged once the job finishes.
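Because the local HDFS is wiped when the job ends, any results you need must be copied back to persistent storage before the cluster is stopped. A minimal sketch (the output and destination paths are illustrative placeholders, not part of the module):

```shell
# Pull job output from the scratch HDFS back to persistent storage before teardown.
# /user/$USER/output and $HOME/hadoop-results are placeholder paths.
hdfs dfs -get /user/$USER/output ./results   # copy an HDFS directory to local disk
cp -r ./results $HOME/hadoop-results         # keep a copy on persistent storage
cluster-stop                                 # stop the cluster only after saving data
```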
#PBS -A your-account-number
#PBS -j oe
#PBS -l nodes=6
#PBS -l walltime=1:00:00

module load hadoop/2.5.0

# start the hadoop cluster with one name node,
# one secondary name node plus resource manager and job history manager,
# four data nodes plus node managers
cluster-start

# hive example
wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
unzip ml-100k.zip
cd ml-100k

cat << _EOF_ > hive-script.sql
CREATE TABLE u_data (
  userid INT,
  movieid INT,
  rating INT,
  unixtime STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH './u.data'
OVERWRITE INTO TABLE u_data;

SELECT COUNT(*) FROM u_data;
_EOF_

hive -f hive-script.sql

# stop hadoop cluster
cluster-stop
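The same cluster lifecycle also works for plain MapReduce jobs. As a sketch, the stock wordcount example shipped with Hadoop could be run between cluster-start and cluster-stop; the jar path below follows the standard Hadoop 2.x distribution layout, so verify it under your $HADOOP_HOME:

```shell
# Assumes the cluster is already running (cluster-start) and $HADOOP_HOME is set
# by the module. The examples jar path matches the usual Hadoop 2.5.0 layout.
hdfs dfs -mkdir -p /user/$USER/wc-in
hdfs dfs -put ./u.data /user/$USER/wc-in     # stage input into HDFS
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar \
    wordcount /user/$USER/wc-in /user/$USER/wc-out
hdfs dfs -cat /user/$USER/wc-out/part-r-00000   # inspect the word counts
```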
To use Hadoop configuration files tailored to your application, redirect the configuration directory after loading the hadoop module:

export HADOOP_CONF_DIR=/path/to/your/configuration/files

User-specific configuration files should follow the same format as the provided template files at $HADOOP_HOME/etc/hadoop/template.
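One way to start is to copy the provided templates into a writable directory and edit them there; the directory name below is an illustrative placeholder, not something the module requires:

```shell
# Copy the module's template configs into a user-writable directory and point
# Hadoop at it. $HOME/my-hadoop-conf is a placeholder name.
module load hadoop/2.5.0
mkdir -p $HOME/my-hadoop-conf
cp $HADOOP_HOME/etc/hadoop/template/* $HOME/my-hadoop-conf/
# edit e.g. $HOME/my-hadoop-conf/mapred-site.xml as needed, then:
export HADOOP_CONF_DIR=$HOME/my-hadoop-conf
cluster-start   # the cluster now reads your customized configuration
```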
This package has the following support level: Supported