Cloudera Installation Summary
Useful Cloudera links
Cloudera installation guide (PDF, HTML)
Cloudera administration guide (PDF, HTML)
I. CentOS 7.2 system settings (required on every host in the cluster)
1. Disable SELinux
Check whether SELinux is already disabled with the getenforce command
$ getenforce
Disabled
Edit the SELinux configuration file
$ sudo vim /etc/selinux/config
SELINUX=disabled
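The same edit can be scripted, which helps when repeating it on every host. This is a minimal sketch that assumes the stock config layout, where the mode sits on a line starting with `SELINUX=`; the function name `set_selinux_disabled` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: force the SELinux mode line to "disabled".
# Defaults to /etc/selinux/config; pass another path to try it on a copy.
set_selinux_disabled() {
  local conf="${1:-/etc/selinux/config}"
  # "^SELINUX=" matches only the mode line, never "SELINUXTYPE=".
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
}

# Usage (as root): set_selinux_disabled
```

The file change only takes effect after the reboot in step 9; until then, `sudo setenforce 0` switches the running system to permissive mode.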
2. Disable the firewall
sudo systemctl stop firewalld
sudo systemctl disable firewalld
3. Edit the hosts and hostname files
/etc/hosts must be identical on every host in the cluster. You can set it up on the master host and then scp it to the slave hosts.
$ sudo vim /etc/hosts
192.168.31.160 master
192.168.31.161 slave1
192.168.31.162 slave2
$ sudo scp /etc/hosts slave1:/etc/hosts
$ sudo scp /etc/hosts slave2:/etc/hosts
# Make sure the name printed by the hostname command matches this host's name in /etc/hosts
$ sudo vim /etc/hostname
master
$ hostnamectl
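The rule in the comment above can be checked on each host with a short awk test. This is a sketch, not part of any Cloudera tooling; `hostname_in_hosts` is a hypothetical name, and it accepts the name as either the primary name or an alias on a non-comment line.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if NAME appears as a host name or alias
# (exact match, not a substring) on a non-comment line of a hosts file.
# Usage: hostname_in_hosts [NAME] [HOSTS_FILE]
hostname_in_hosts() {
  local name="${1:-$(hostname)}"
  local file="${2:-/etc/hosts}"
  # Field 1 is the IP address; fields 2..NF are the names mapped to it.
  awk -v n="$name" '
    $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$file"
}

# Usage on each host: hostname_in_hosts && echo "hostname OK"
```

On CentOS 7, `sudo hostnamectl set-hostname master` updates /etc/hostname and the running host name in one step.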
4. Configure a static IP address
$ sudo vim /etc/sysconfig/network-scripts/ifcfg-eno
BOOTPROTO="static"
ONBOOT="yes"
IPADDR=192.168.31.160
GATEWAY=192.168.31.1
DNS1=192.168.31.1
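To produce the same lines for each host, a small generator can be used; assuming all three hosts share the gateway and DNS server, only IPADDR differs between them. The function `ifcfg_static` is hypothetical and emits exactly the five settings listed above; pick the ifcfg file that matches your interface name as shown by `ip addr`.

```shell
#!/usr/bin/env bash
# Hypothetical generator for the static-IP settings shown above.
# Arguments: IP address, gateway, DNS server (all required).
ifcfg_static() {
  local ip="$1" gw="$2" dns="$3"
  if [ -z "$ip" ] || [ -z "$gw" ] || [ -z "$dns" ]; then
    echo "usage: ifcfg_static IP GATEWAY DNS" >&2
    return 1
  fi
  cat <<EOF
BOOTPROTO="static"
ONBOOT="yes"
IPADDR=$ip
GATEWAY=$gw
DNS1=$dns
EOF
}

# Usage for slave1: ifcfg_static 192.168.31.161 192.168.31.1 192.168.31.1
```

Merge the output into the existing ifcfg file rather than overwriting it, since that file also carries entries such as DEVICE and UUID, then apply it with `sudo systemctl restart network`.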
5. Set up time synchronization
$ sudo yum install -y ntp
$ sudo systemctl enable ntpd
$ sudo systemctl enable ntpdate
$ sudo vim /etc/ntp.conf
server time1.aliyun.com
$ sudo ntpdate time1.aliyun.com
$ timedatectl
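Appending the `server` line by hand is easy to do twice when the steps are repeated; the sketch below only adds it if it is missing. `ensure_ntp_server` is a hypothetical helper, and it only compares the first two fields of each line (`server <host>`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure ntp.conf contains "server <host>" once.
# Usage: ensure_ntp_server time1.aliyun.com [/etc/ntp.conf]
ensure_ntp_server() {
  local server="$1" conf="${2:-/etc/ntp.conf}"
  # Exit 0 from awk if a line already reads "server <host> ...".
  if awk -v s="$server" '$1 == "server" && $2 == s { found = 1 } END { exit !found }' "$conf"; then
    return 0  # already present, nothing to do
  fi
  echo "server $server" >> "$conf"
}
```

ntpdate refuses to run while ntpd holds the NTP port, so do the one-off sync first, for example `sudo ntpdate time1.aliyun.com && sudo systemctl start ntpd`.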
6. Install a CDH-supported Oracle JDK
Remove the OpenJDK packages that ship with the system
rpm -qa | grep --color openjdk
sudo yum remove -y java-1.7.0-openjdk-headless.x86_64 java-1.7.0-openjdk.x86_64 java-1.8.0-openjdk-headless.x86_64 java-1.8.0-openjdk.x86_64
Download the JDK from Oracle and install it
# Install Oracle JDK 1.8
$ sudo yum install -y jdk-8u144-linux-x64.rpm
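After installing the RPM, it is worth confirming that `java` now resolves to the Oracle JDK and not to a leftover OpenJDK. The check relies on the first line of `java -version`, which reads `java version "1.8.0_..."` for Oracle builds and `openjdk version "..."` for OpenJDK; the function name `is_oracle_jdk8` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: read `java -version` output on stdin and succeed
# only for an Oracle JDK 1.8 ('java version "1.8.0_NNN"').
# OpenJDK prints 'openjdk version ...' and is rejected.
is_oracle_jdk8() {
  grep -q '^java version "1\.8\.'
}

# java -version writes to stderr, so redirect it into the pipe:
#   java -version 2>&1 | is_oracle_jdk8 && echo "Oracle JDK 8 OK"
```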
7. Tune kernel parameters
$ sudo sysctl vm.swappiness=0
$ sudo vim /etc/sysctl.conf
vm.swappiness=0
# Apply the settings
$ sudo sysctl -p
# On CentOS 7.2 you also need to edit the files under /usr/lib/tuned; otherwise tuned resets vm.swappiness at boot.
$ cd /usr/lib/tuned
$ grep -R 'vm.swappiness' *
latency-performance/tuned.conf:vm.swappiness=10
throughput-performance/tuned.conf:vm.swappiness=10
virtual-guest/tuned.conf:vm.swappiness = 30
# Change the value in virtual-guest/tuned.conf (the profile in use here; check yours with tuned-adm active)
$ sudo vim /usr/lib/tuned/virtual-guest/tuned.conf
vm.swappiness=0
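Both files above use the same `key = value` syntax but with different spacing, so one helper can cover them. A sketch with a hypothetical `set_swappiness` function: it rewrites an existing `vm.swappiness` line in place and appends one only when none exists. The append branch is meant for /etc/sysctl.conf; in tuned.conf the key already sits under its `[sysctl]` section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set vm.swappiness in a sysctl-style file, whether the
# existing line is written "vm.swappiness=10" or "vm.swappiness = 30".
# Appends the setting if the key is not present yet.
set_swappiness() {
  local value="$1" file="$2"
  if grep -qE '^[[:space:]]*vm\.swappiness[[:space:]]*=' "$file"; then
    sed -i -E "s/^[[:space:]]*vm\.swappiness[[:space:]]*=.*/vm.swappiness=$value/" "$file"
  else
    echo "vm.swappiness=$value" >> "$file"
  fi
}

# Usage (as root):
#   set_swappiness 0 /etc/sysctl.conf
#   set_swappiness 0 /usr/lib/tuned/virtual-guest/tuned.conf
#   sysctl -p && cat /proc/sys/vm/swappiness
```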
8. Disable transparent huge pages
$ sudo sh -c "echo never > /sys/kernel/mm/transparent_hugepage/defrag"
$ sudo sh -c "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
$ sudo vim /etc/rc.local
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# /etc/rc.local is a symlink to /etc/rc.d/rc.local; make rc.local executable so it runs at boot
sudo chmod +x /etc/rc.d/rc.local
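After the reboot, the kernel shows the active THP mode in brackets, for example `always madvise [never]`. A hypothetical `thp_is_never` check based on that format:

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if "never" is the bracketed (active)
# choice in a THP control file such as .../transparent_hugepage/enabled.
thp_is_never() {
  local file="$1"
  grep -q '\[never\]' "$file"
}

# Usage after the reboot in step 9:
#   for f in /sys/kernel/mm/transparent_hugepage/enabled /sys/kernel/mm/transparent_hugepage/defrag; do
#     if thp_is_never "$f"; then echo "$f: never"; else echo "$f: still active"; fi
#   done
```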
9. Reboot the machine
sudo reboot
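II. Setting up the Cloudera Manager Server host
0. Download the RPM and parcel files needed to install CM
- Download cloudera-manager-installer.bin from the CM Archive
- Download the CM 5.7.6 tarball (it contains all the RPMs) from the CM Archive
- From the CDH Archive, download the parcel files for your OS version. There are three files; for CentOS 7.2 they are CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel, CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel.sha1 and manifest.json
- After downloading, rename CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel.sha1 to CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel.sha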
mv CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel.sha{1,}
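Before moving the parcel into place, the download can be checked against the renamed `.sha` file, which holds the parcel's SHA-1 hash as a hex string. A sketch, assuming GNU coreutils `sha1sum`; `verify_parcel` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical check: compare a parcel's SHA-1 with the hash stored in
# "<parcel>.sha" (first whitespace-separated field of that file).
# Returns 0 on match, 1 on mismatch, 2 if either file is missing.
verify_parcel() {
  local parcel="$1"
  local shafile="${parcel}.sha"
  [ -f "$parcel" ] && [ -f "$shafile" ] || return 2
  local expected actual
  expected=$(awk '{ print tolower($1); exit }' "$shafile")
  actual=$(sha1sum "$parcel" | awk '{ print $1 }')
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage:
#   verify_parcel CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel && echo "checksum OK"
```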
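1. Add the cloudera-manager.repo file to yum's repositories
Download cloudera-manager.repo from the CM Archive, change its baseurl to point at the version you are installing (5.7.6 here), and change gpgcheck=1 to gpgcheck=0. Without these changes, cloudera-manager-installer.bin upgrades the Cloudera RPMs you have already installed to the latest version over the network. The gpgkey line can be deleted.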
$ vim cloudera-manager.repo
$ sudo cp cloudera-manager.repo /etc/yum.repos.d
[cloudera-manager]
# Packages for Cloudera Manager, Version 5, on RedHat or CentOS 7 x86_64
name = Cloudera Manager
baseurl = http://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.7.6/
gpgcheck = 0
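The edited repo file can also be generated, which makes it easy to pin a different CM version. This sketch hard-codes the baseurl layout from the example above; `cm_repo_file` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical generator for the pinned repo file shown above.
# Argument: the CM version to pin, e.g. 5.7.6.
cm_repo_file() {
  local version="$1"
  if [ -z "$version" ]; then
    echo "usage: cm_repo_file VERSION" >&2
    return 1
  fi
  cat <<EOF
[cloudera-manager]
# Packages for Cloudera Manager, Version 5, on RedHat or CentOS 7 x86_64
name = Cloudera Manager
baseurl = http://archive.cloudera.com/cm5/redhat/7/x86_64/cm/${version}/
gpgcheck = 0
EOF
}

# Usage: cm_repo_file 5.7.6 | sudo tee /etc/yum.repos.d/cloudera-manager.repo
```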
Check that yum can find the Cloudera packages
yum list | grep cloudera
2. Put the parcel files in /opt/cloudera/parcel-repo
Move the downloaded CDH files (parcel, parcel.sha, manifest.json) into /opt/cloudera/parcel-repo. If you skip this step, Cloudera Manager downloads the parcel, which is about 1.4 GB, from the internet during cluster installation.
sudo mkdir -p /opt/cloudera
sudo mv ~/cdh /opt/cloudera/parcel-repo
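A quick sanity check that the directory now holds what Cloudera Manager expects (a `.parcel`, its `.parcel.sha`, and `manifest.json`). `check_parcel_repo` is hypothetical and only checks that the files exist, not their contents.

```shell
#!/usr/bin/env bash
# Hypothetical check: the repo directory must contain manifest.json and at
# least one *.parcel file, each with a matching *.parcel.sha next to it.
check_parcel_repo() {
  local dir="${1:-/opt/cloudera/parcel-repo}"
  local found=0 p
  [ -f "$dir/manifest.json" ] || { echo "missing manifest.json" >&2; return 1; }
  for p in "$dir"/*.parcel; do
    [ -e "$p" ] || continue  # the glob matched nothing
    found=1
    [ -f "$p.sha" ] || { echo "missing $(basename "$p").sha" >&2; return 1; }
  done
  [ "$found" -eq 1 ] || { echo "no .parcel file in $dir" >&2; return 1; }
}
```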
3. Install all the Cloudera Manager RPMs
Extract the downloaded CM 5.7.6 tarball
tar xvzf cm5.7.6-centos7.tar.gz
Go into the extracted cm directory, locate the RPM files and install them with yum, which installs their dependencies automatically
cd cm/5/RPMS/x86_64
sudo yum localinstall --nogpgcheck -y cloudera-manager-agent-*.rpm cloudera-manager-server-*.rpm cloudera-manager-daemons-*.rpm
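Note: if you are not using the embedded PostgreSQL database, you do not need to install the cloudera-manager-server-db RPM.
4. Delete the db.properties file
This setup does not use the embedded database.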
$ sudo rm -f /etc/cloudera-scm-server/db.properties
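5. Run cloudera-manager-installer.bin
If the RPMs above are already installed and cloudera-manager.repo is configured correctly, this step finishes quickly (about 1 minute).
$ chmod u+x cloudera-manager-installer.bin
$ sudo ./cloudera-manager-installer.bin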
6. Check the status of the Cloudera Manager services
sudo service --status-all
7. If a Cloudera service has not started, restart it
Skip the first command if you are not using the embedded database:
$ sudo systemctl restart cloudera-scm-server-db
sudo systemctl restart cloudera-scm-server
sudo systemctl restart cloudera-scm-agent
8. Check that port 7180 is open
Cloudera Manager Server listens on port 7180. After a restart it can take several minutes (sometimes about 5) before port 7180 shows up.
watch sudo netstat -tulpn
Open the master server's IP on port 7180 (http://<master-ip>:7180) in a browser to reach the Cloudera Manager web configuration interface.
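Instead of watching netstat by hand (netstat comes from the net-tools package, which a minimal CentOS 7 install may lack), the wait can be scripted. This sketch uses bash's built-in `/dev/tcp` redirection and coreutils `timeout`; `wait_for_port` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try to open a TCP connection to HOST:PORT once per
# second, for up to LIMIT attempts. Returns 0 as soon as one succeeds.
# Usage: wait_for_port HOST PORT [LIMIT]
wait_for_port() {
  local host="$1" port="$2" limit="${3:-600}" tries=0
  while [ "$tries" -lt "$limit" ]; do
    # Each attempt is capped at 3 s so a filtered port cannot hang the loop.
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
    tries=$((tries + 1))
  done
  return 1
}

# Usage: wait_for_port 192.168.31.160 7180 600 && echo "CM web UI is up"
```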
III. Installing the Cloudera Manager Agent on the other hosts in the cluster
- Add the Cloudera repo file to yum, with the same contents as on the master host
- Install only the cloudera-manager-agent and cloudera-manager-daemons RPMs
sudo yum localinstall --nogpgcheck -y cloudera-manager-{agent,daemons}-*.rpm
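With more than a couple of workers, the two bullets above can be looped over SSH. This is a sketch under assumptions not in the original: passwordless SSH from the master, passwordless sudo on the workers, and the agent RPMs in a local directory you pass in. `install_agents` and `run` are hypothetical; `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the command when DRY_RUN is set, else run it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: copy the repo file and the two agent RPMs to each
# host, then install them there with yum.
# Usage: install_agents RPM_DIR HOST...
install_agents() {
  local rpm_dir="$1"; shift
  local h
  for h in "$@"; do
    run scp /etc/yum.repos.d/cloudera-manager.repo "$rpm_dir"/cloudera-manager-agent-*.rpm "$rpm_dir"/cloudera-manager-daemons-*.rpm "$h:/tmp/"
    run ssh "$h" "sudo cp /tmp/cloudera-manager.repo /etc/yum.repos.d/ && sudo yum localinstall --nogpgcheck -y /tmp/cloudera-manager-agent-*.rpm /tmp/cloudera-manager-daemons-*.rpm"
  done
}

# Usage: DRY_RUN=1 install_agents cm/5/RPMS/x86_64 slave1 slave2
```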
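IV. Host role assignment
- Master hosts: run the main Hadoop processes, such as the HDFS NameNode and the YARN ResourceManager.
- Utility hosts: run the cluster's supporting (non-master) processes, such as Cloudera Manager and the Hive Metastore.
- Edge hosts: client access points to the cluster, typically used to launch jobs.
- Worker hosts: mainly run DataNodes and other distributed processes such as Impalad.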
Cluster size | Master hosts | Utility hosts | Edge hosts | Worker hosts |
---|---|---|---|---|
Small | NameNode, YARN ResourceManager, JobHistory Server, ZooKeeper, Impala StateStore, Kudu Master | Secondary NameNode, Cloudera Manager, Cloudera Manager Management Service, Hive Metastore, HiveServer2, Impala Catalog, Hue, Oozie, Flume, Gateway configuration | (same host as the utility roles) | DataNode, NodeManager, Impalad, Kudu tablet server |
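V. Database configuration
1. Install MariaDB
- Check which MariaDB versions your CDH release supports (version 10.2 is used here)
- Configure MariaDB
# Move the old InnoDB log files aside (to /tmp), since innodb_log_file_size is changed below
$ sudo service mariadb stop
$ sudo mv /var/lib/mysql/ib_logfile{0,1} /tmp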
$ sudo vim /etc/my.cnf.d/server.cnf
[mysqld]
sql_mode=STRICT_ALL_TABLES
transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
# symbolic-links = 0
key_buffer = 16M
key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1
max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M
#log_bin should be on a disk with enough free space. Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your system
#and chown the specified folder to the mysql user.
log_bin=/var/lib/mysql/mysql_binary_log
binlog_format = mixed
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M
[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
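To double-check the important values before restarting MariaDB, a hypothetical `cnf_get` helper can read one key from one section of the file. It is a sketch that matches the key exactly as written, so `transaction-isolation` and `transaction_isolation` count as different keys here even though MariaDB accepts both spellings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY inside [SECTION] of a
# my.cnf-style file, ignoring comment lines and spaces around "=".
# Prints nothing if the key is absent from that section.
cnf_get() {
  local file="$1" section="$2" key="$3"
  awk -v sec="[$section]" -v key="$key" '
    /^[ \t]*[#;]/ { next }                        # skip comments
    /^[ \t]*\[/   { in_sec = ($1 == sec); next }  # section header
    in_sec {
      eq = index($0, "=")
      if (eq == 0) next
      k = substr($0, 1, eq - 1); v = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) { print v; exit }
    }
  ' "$file"
}

# Usage: cnf_get /etc/my.cnf.d/server.cnf mysqld transaction-isolation
```

Once MariaDB is running again, `SHOW VARIABLES LIKE 'tx_isolation';` shows the value the server actually uses.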
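Download the MySQL JDBC driver from the official MySQL site and copy it to /usr/share/java/mysql-connector-java.jar on every host that needs to connect to MariaDB.
2. Services that need a database
Service | Description |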
---|---|
Cloudera Manager | Contains all the information about services you have configured and their role assignments, all configuration history, commands, users, and running processes. This relatively small database (< 100 MB) is the most important to back up. |
Oozie Server | Contains Oozie workflow, coordinator, and bundle data. Can grow very large. |
Sqoop Server | Contains entities such as the connector, driver, links and jobs. Relatively small. |
Activity Monitor | Contains information about past activities. In large clusters, this database can grow large. Configuring an Activity Monitor database is only necessary if a MapReduce service is deployed. |
Reports Manager | Tracks disk utilization and processing activities over time. Medium-sized. |
Hive Metastore Server | Contains Hive metadata. Relatively small. |
Hue Server | Contains user account information, job submissions, and Hive queries. Relatively small. |
Sentry Server | Contains authorization metadata. Relatively small. |
Cloudera Navigator Audit Server | Contains auditing information. In large clusters, this database can grow large. |
Cloudera Navigator Metadata Server | Contains authorization, policies, and audit report metadata. Relatively small. |
3. Create the Cloudera Manager database
sudo /usr/share/cmf/schema/scm_prepare_database.sh mysql -h <mysql-server> -u root -p[password] --scm-host <cm-server> scm scm scm
4. Create the following databases as needed
Role | Database | User | Password |
---|---|---|---|
Activity Monitor (only if the MapReduce service is used) | amon | amon | amon |
Reports Manager | rman | rman | rman |
Hive Metastore Server | metastore | hive | hive |
Sentry Server | sentry | sentry | sentry |
Cloudera Navigator Audit Server | nav | nav | nav |
Cloudera Navigator Metadata Server | navms | navms | navms |
# Connect to MySQL (MariaDB)
mysql -u root -p
-- Create the amon database
create database amon default character set utf8;
grant all on amon.* to 'amon'@'%' identified by 'amon';
-- Create the rman database
create database rman default character set utf8;
grant all on rman.* to 'rman'@'%' identified by 'rman';
-- Create the Hive metastore database
create database metastore default character set utf8;
grant all on metastore.* to 'hive'@'%' identified by 'hive';
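The table lists six databases, but the statements above cover only three. A hypothetical generator that emits the same `create`/`grant` pair for every `database user password` row, using the values from the table:

```shell
#!/usr/bin/env bash
# Rows copied from the table above: database, user, password.
CM_DBS='amon amon amon
rman rman rman
metastore hive hive
sentry sentry sentry
nav nav nav
navms navms navms'

# Hypothetical generator: read "db user password" rows on stdin and print
# the create/grant statements in the same form as the SQL above.
gen_cm_sql() {
  local db user pass
  while read -r db user pass; do
    [ -n "$db" ] || continue  # skip blank lines
    printf "create database %s default character set utf8;\n" "$db"
    printf "grant all on %s.* to '%s'@'%%' identified by '%s';\n" "$db" "$user" "$pass"
  done
}

# Usage: printf '%s\n' "$CM_DBS" | gen_cm_sql | mysql -u root -p
```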
5. Create the Oozie database
create database oozie default character set utf8;
grant all on oozie.* to 'oozie'@'localhost' identified by 'oozie';
grant all on oozie.* to 'oozie'@'%' identified by 'oozie';
Copy the MySQL JDBC jar to /opt/cloudera/parcels/CDH/lib/oozie/lib
6. Create the Hue database
create database hue default character set utf8 default collate utf8_general_ci;
grant all on hue.* to 'hue'@'%' identified by 'hue';
-- Verify that the databases were created
select * from information_schema.schemata;