CentOS7.2 上安装 Cloudera

 

Cloudera 安装总结。

Cloudera 常用链接

cloudera 安装文档 PDF HTML

cloudera 管理文档 PDF HTML

一、CentOS7.2 系统设置(所有集群内主机都需要设置)

1. 关闭 SELinux

getenforce命令检查 SELinux 是否已禁用

$ getenforce
Disabled

修改 SELinux 配置文件

$ sudo vim /etc/selinux/config
SELINUX=disabled

2. 关闭防火墙

sudo systemctl stop firewalld
sudo systemctl disable firewalld

3. 修改 hosts 文件和 hostname 文件

此文件必须群集内所有主机都一致,可以在 master 主机上配置好,然后 scp 到其他 slave 主机

$ sudo vim /etc/hosts
192.168.31.160   master
192.168.31.161   slave1
192.168.31.162   slave2
$ sudo scp /etc/hosts slave1:/etc/hosts
$ sudo scp /etc/hosts slave2:/etc/hosts

# 确保hostname命令的的主机名与hosts中本机的主机名一致
$ sudo vim /etc/hostname
master

$ hostnamectl

4. 设置静态 IP

$ sudo vim /etc/sysconfig/network-scripts/ifcfg-eno
BOOTPROTO="static"
ONBOOT="yes"
IPADDR=192.168.31.160
GATEWAY=192.168.31.1
DNS1=192.168.31.1

5. 设置时间同步

$ sudo yum install -y ntp
$ sudo systemctl enable ntpd
$ sudo systemctl enable ntpdate
$ sudo vim /etc/ntp.conf
server time1.aliyun.com

$ sudo ntpdate time1.aliyun.com
$ timedatectl

6. 安装 CDH 支持的 oracle jdk

卸载系统自带的 openjdk

rpm -qa | grep --color openjdk
sudo yum remove -y java-1.7.0-openjdk-headless.x86_64 java-1.7.0-openjdk.x86_64 java-1.8.0-openjdk-headless.x86_64 java-1.8.0-openjdk.x86_64

oracle下载 jdk 并安装

# 安装oracle jdk1.8
$ sudo yum install -y jdk-8u144-linux-x64.rpm

7. 调整内核参数

$ sudo sysctl vm.swappiness=0
$ sudo vim /etc/sysctl.conf
vm.swappiness=0

# 使参数生效
$ sudo sysctl -p

# CentOS7.2需要修改/usr/lib/tuned下面的文件,否则开机会动态调整vm.swappiness参数。
$ grep -R 'vm.swappiness' *
latency-performance/tuned.conf:vm.swappiness=10
throughput-performance/tuned.conf:vm.swappiness=10
virtual-guest/tuned.conf:vm.swappiness = 30

# 修改virtual-guest/tuned.conf中的参数
$ sudo vim /usr/lib/tuned/virtual-guest/tuned.conf
vm.swappiness=0

8. 禁止透明大页面预先分配

$ sudo sh -c "echo never > /sys/kernel/mm/transparent_hugepage/defrag"
$ sudo sh -c "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
$ sudo vim /etc/rc.local
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# /etc/rc.local是/etc/rc.d/rc.local的符号链接,修改rc.local为可执行
sudo chmod +x /etc/rc.d/rc.local

9. 重启机器

sudo reboot

二、安装 Cloudera Manager Server 的主机设置

0. 下载 CM 安装所需 RPM 文件和 parcel 文件

mv CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel.sha{1,}

1. 为 yum 源添加 cloudera-manager.repo 文件

CM Archive下载cloudera-manager.repo文件,修改里面的 baseurl 对应到你所安装的版本(我这里的版本是 5.7.6),同时把gpgcheck=1改为gpgcheck=0,如果不修改的话,cloudera-manager-installer.bin 安装时会自动把已经安装好的 cloudera rpm 包在线升级到最新版本,gpgkey 那行可以删掉。

$ vim cloudera-manager.repo
$ sudo cp cloudera-manager.repo /etc/yum.repos.d
[cloudera-manager]
# Packages for Cloudera Manager, Version 5, on RedHat or CentOS 7 x86_64
name = Cloudera Manager
baseurl = http://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.7.6/
gpgcheck = 0

检查在 yum 源是否可以找到 cloudera 相关的包

yum list | grep cloudera

2. 将 parcel 文件放入/opt/cloudera/parcel-repo

将下载好的 CDH 文件(parcel、parcel.sha、manifest.json)移到/opt/cloudera/parcel-repo 目录,如果此步没做,在 Cloudera Manager 进行群集安装时,系统会去网上下载 parcel 文件,此文件大小在 1.4GB 左右

sudo mkdir -p /opt/cloudera
sudo mv ~/cdh /opt/cloudera/parcel-repo

3. 安装 Cloudera Manager 的所有 RPM

解压下载好的 CM5.7.6 压缩包

tar xvzf cm5.7.6-centos7.x86_64

进入解压后的 cm 目录,找到 rpm 文件,然后使用 yum 安装,yum 会自动安装相关依赖包

cd cm/5/RPMS/x86_64
sudo yum localinstall --nogpgcheck -y cloudera-manager-agent-*.rpm cloudera-manager-server-*.rpm cloudera-manager-daemons-*.rpm

注意:如果不使用内置的 PostgreSQL 数据库,则不需要安装 cloudera-manager-server-db 的 RPM 包。

4. 删除 db.properties 文件

这里不使用内置数据库
$ sudo rm -f /etc/cloudera-scm-server/db.properties

5. 执行 installer.bin 安装文件

如果前面的 RPMS 包都已安装,并且 cloudera-manager.repo 文件配置正确,则这一步会很快完成(1 分钟左右)
$ sudo ./cloudera-manager-installer.bin

6. 查看 Cloudera Manager 的服务状态

sudo service --status-all

7. 如果某个 Cloudera 服务没启动,就重启一下该服务

不使用内置数据库,则不用执行
$ sudo systemctl restart cloudera-scm-server-db

sudo systemctl restart cloudera-scm-server
sudo systemctl restart cloudera-scm-agent

8. 查看 7180 端口是否打开

Cloudera Manager Server 使用 7180 端口,重启服务后要等几分钟(有时候需要 5 分钟左右)才能看到 7180 端口

watch sudo netstat -tulpn

使用浏览器访问 Master 服务器的 ip:7180,就可以进入 Cloudera Manager 的 Web 配置界面

三、集群中其它主机上安装 Cloudera Manager Agent

  1. 为 yum 源添加 cloudera repo 文件,内容与 Master 主机一样
  2. 只安装 cloudera-manager-agent 和 cloudera-manager-daemons 两个 RPM 包
sudo yum localinstall --nogpgcheck -y cloudera-manager-{agent,daemons}-*.rpm

四、主机角色分配

  • Master hosts:运行 Hadoop 的主要进程,例如 HDFS NameNode 和 YARN Resource Manager.
  • Utility hosts:运行集群中的非主要进程,例如 Cloudera Manager 和 Hive Metastore
  • Edge hosts:一般作为集群中客户端的访问节点来启动一些任务。
  • Worker hosts:主要运行 DataNodes 和其它一些分布式进程,如 Impalad。
集群规模 Master hosts Utility hosts Edge hosts Worker hosts
小规模 NameNode YARN ResourceManager JobHistory ServerZooKeeper Impala StateStore Kudu Master Secondary NameNode Cloudera Manager Cloudera Manager Management Service Hive Metastore HiveServer2 Impala Catalog Hue Oozie Flume Gateway configuration   DataNode NodeManager Impalad Kudu tablet server

五、数据库配置

官方数据库设置文档

1、安装 MariaDB 数据库

  • 查看 CDH 版本支持的 MariaDB 数据库版本(这里选择 10.2 版本)
  • 设置 MariaDB
# 移除旧的InnoDB日志文件
$ sudo service mariadb stop
$ mv /var/lib/mysql/ib_logfile{0,1} /tmp
$ sudo vim /etc/my.cnf.d/server.cnf
[mysqld]
sql_mode=STRICT_ALL_TABLES

transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
# symbolic-links = 0

key_buffer = 16M
key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1

max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M

#log_bin should be on a disk with enough free space. Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your system
#and chown the specified folder to the mysql user.
log_bin=/var/lib/mysql/mysql_binary_log

binlog_format = mixed

read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M

# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit  = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

MYSQL 官方下载 mysql 的 jdbc,在所有需要连接 MariaDB 的主机上复制一份到/usr/share/java/mysql-connector-java.jar

2、需要数据库的服务

服务名 说明
Cloudera Manager Contains all the information about services you have configured and their role assignments, all configuration history, commands, users, and running processes. This relatively small database (< 100 MB) is the most important to back up.
Oozie Server Contains Oozie workflow, coordinator, and bundle data. Can grow very large.
Sqoop Server Contains entities such as the connector, driver, links and jobs. Relatively small.
Activity Monitor Contains information about past activities. In large clusters, this database can grow large. Configuring an Activity Monitor database is only necessary if a MapReduce service is deployed.
Reports Manager Tracks disk utilization and processing activities over time. Medium-sized.
Hive Metastore Server Contains Hive metadata. Relatively small.
Hue Server Contains user account information, job submissions, and Hive queries. Relatively small.
Sentry Server Contains authorization metadata. Relatively small.
Cloudera Navigator Audit Server Contains auditing information. In large clusters, this database can grow large.
Cloudera Navigator Metadata Server Contains authorization, policies, and audit report metadata. Relatively small.

3、创建 Cloudera Manager 数据库

sudo /usr/share/cmf/schema/scm_prepare_database.sh mysql -h <mysql-server> -u root -p[password] --scm-host <cm-server> scm scm scm

4、根据需要创建以下数据库

角色 数据库名 用户名 密码
Activity Monitor(如果使用 MapReduce 服务) amon amon amon
Reports Manager rman rman rman
Hive Metastore Server metastore hive hive
Sentry Server sentry sentry sentry
Cloudera Navigator Audit Server nav nav nav
Cloudera Navigator Metadata Server navms navms navms
# 连入mysql
mysql -u root -p
-- 创建aman数据库
create database amon default character set utf8;
grant all on amon.* to 'amon'@'%' identified by 'amon';

-- 创建rman数据库
create database rman default character set utf8;
grant all on rman.* to 'rman'@'%' identified by 'rman';

-- 创建hive数据库
create database metastore default character set utf8;
grant all on metastore.* to 'hive'@'%' identified by 'hive';

5、创建 Oozie 数据库

create database oozie default character set utf8;
grant all on oozie.* to 'oozie'@'localhost' identified by 'oozie';
grant all on oozie.* to 'oozie'@'%' identified by 'oozie';

复制 mysql jdbc 文件到/opt/cloudera/parcels/CDH/lib/ooize/lib

6、创建 Hue 数据库

create database hue default character set utf8 default collate utf8_general_ci;
grant all on hue.* to 'hue'@'%' identified by 'hue';
select * from information_schema.schemata;