Based on netseek's PDF guide; environment: CentOS 6, 64-bit
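- Nagios installation steps
- 1 Before installing, make sure you have root privileges on the machine.
- Confirm that the following packages are already installed on your Linux system before continuing:
- Apache
- GCC compiler
- GD library and its development library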
- yum -y install httpd gcc glibc glibc-common gd gd-devel
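Before running yum it can help to see which of the required packages are actually absent. The sketch below is one way to do that; `missing_packages` is a hypothetical helper name, and the query command is passed in as a parameter (on CentOS it would be `rpm -q`) so the function does not depend on rpm being present.

```shell
# Print every package from the list that the query command reports as absent.
# $1 is the query command (assumed to be "rpm -q" on CentOS), the remaining
# arguments are package names.
missing_packages() {
    query=$1
    shift
    for pkg in "$@"; do
        # $query is left unquoted on purpose so "rpm -q" splits into two words
        $query "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Usage on CentOS 6:
#   missing_packages "rpm -q" httpd gcc glibc glibc-common gd gd-devel
```

An empty result means all listed packages are installed and the yum step can be skipped.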
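- 2
- Create the nagios account
- /usr/sbin/useradd nagios && passwd nagios
- Create a group named nagcmd, used for running external commands from the web interface,
- and add both the nagios user and the apache user to it (-a appends the group instead of replacing the user's existing supplementary groups)
- /usr/sbin/groupadd nagcmd
- /usr/sbin/usermod -a -G nagcmd nagios
- /usr/sbin/usermod -a -G nagcmd apache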
- 3
- Download Nagios and the plugin package
- Download the Nagios and Nagios plugins packages (visit http://www.nagios.org/download/ for the latest versions)
- cd /usr/local/src
- wget http://nchc.dl.sourceforge.net/sourceforge/nagios/nagios-3.0.6.tar.gz
- wget http://nchc.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.13.tar.gz
- 4
- Compile and install Nagios
- cd /usr/local/src
- tar zxvf nagios-3.0.6.tar.gz
- cd nagios-3.0.6
- ./configure --with-command-group=nagcmd --prefix=/usr/local/nagios
- make all
- make install
- make install-init
- make install-config
- make install-commandmode
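- To verify that the program was installed correctly, change to the installation path (here /usr/local/nagios) and check that the five directories etc, bin, sbin, share and var exist. If they do, the program was installed correctly. Their roles in brief:
- bin: the Nagios executables (nagios is the main program)
- etc: configuration files
- sbin: CGI programs used by the web interface
- share: web pages
- var: log files and other runtime data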
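The directory check described above can be scripted. This is a minimal sketch; `check_nagios_dirs` is a hypothetical helper name, and the default prefix is the one used in this guide.

```shell
# Report which of the five expected Nagios directories are missing under a prefix.
# Returns 0 when all of them exist, 1 otherwise.
check_nagios_dirs() {
    prefix=${1:-/usr/local/nagios}
    status=0
    for d in etc bin sbin share var; do
        if [ ! -d "$prefix/$d" ]; then
            echo "missing: $prefix/$d"
            status=1
        fi
    done
    return $status
}

# Usage:
#   check_nagios_dirs /usr/local/nagios && echo "install looks complete"
```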
- 5
- Compile and install the Nagios plugins (nagios-plugins)
- cd /usr/local/src
- tar zxvf nagios-plugins-1.4.13.tar.gz
- cd nagios-plugins-1.4.13
- ./configure --with-nagios-user=nagios --with-nagios-group=nagios --prefix=/usr/local/nagios
- make && make install
- Verify:
- ls /usr/local/nagios/libexec
- This lists the installed plugin files; all plugins are installed under the libexec directory
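- 6 Configure the web interface
- Method 1: run make install-webconf while installing Nagios
- Create a user named nagiosadmin for logging in to the Nagios web interface. Remember the password you set; you will need it shortly.
- htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
- Restart the Apache service for the settings to take effect.
- service httpd restart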
- Method 2: append the following to the end of httpd.conf:
- #for nagios
- ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
- <Directory "/usr/local/nagios/sbin">
- Options ExecCGI
- AllowOverride None
- Order allow,deny
- Allow from all
- AuthName "Nagios Access"
- AuthType Basic
- AuthUserFile /usr/local/nagios/etc/htpasswd
- Require valid-user
- </Directory>
- Alias /nagios /usr/local/nagios/share
- <Directory "/usr/local/nagios/share">
- Options None
- AllowOverride None
- Order allow,deny
- Allow from all
- AuthName "Nagios Access"
- AuthType Basic
- AuthUserFile /usr/local/nagios/etc/htpasswd
- Require valid-user
- </Directory>
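- htpasswd -c /usr/local/nagios/etc/htpasswd test
- New password: (enter 123456)
- Re-type new password: (enter the password again)
- Adding password for user test
- View the contents of the authentication file
- less /usr/local/nagios/etc/htpasswd
- test:OmWGEsBnoGpIc (the part before the colon is the user name test; the part after it is the encrypted password)
- This example adds a user named test, so cgi.cfg must be changed to authorize the test user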
- vi /usr/local/nagios/etc/cgi.cfg
- authorized_for_system_information=test
- authorized_for_configuration_information=test
- authorized_for_system_commands=test
- authorized_for_all_services=test
- authorized_for_all_hosts=nagiosadmin,test
- authorized_for_all_service_commands=test
- authorized_for_all_host_commands=test
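Editing seven directives by hand is easy to get wrong, so the change can also be scripted. The sketch below is an assumption-laden variant of the step above: instead of replacing the existing value it appends the user to every `authorized_for_*` line that does not already list it. `grant_cgi_user` is a hypothetical helper name; it prints the modified file to stdout and leaves the original untouched.

```shell
# Append a user to every authorized_for_* directive of a cgi.cfg-style file.
# $1 = user name, $2 = path to the file. Output goes to stdout.
grant_cgi_user() {
    awk -v user="$1" -F= '
        /^authorized_for_[a-z_]+=/ {
            # split the current value into names and look for the user
            n = split($2, names, ",")
            found = 0
            for (i = 1; i <= n; i++) if (names[i] == user) found = 1
            if (!found) {
                sep = (n > 0) ? "," : ""
                $0 = $0 sep user
            }
        }
        { print }
    ' "$2"
}

# Usage (review the result before replacing the real file):
#   grant_cgi_user test /usr/local/nagios/etc/cgi.cfg > /tmp/cgi.cfg.new
```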
- 7
- Start Nagios
- Add Nagios to the service list so that it starts automatically at boot
- chkconfig --add nagios
- chkconfig nagios on
- Verify the Nagios sample configuration files
- /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
- You may see output like this:
- Nagios 3.0.6
- Copyright (c) 1999-2008 Ethan Galstad (http://www.nagios.org)
- Last Modified: 12-01-2008
- License: GPL
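- Error: Cannot open main configuration file '/usr/local/‐' for reading!
- Granting permissions on the file did not help; restarting the nagios service directly got it started. (The '‐' in the reported path suggests that a typographic hyphen was pasted in place of the ASCII '-' before v, so Nagios probably took it as the configuration file name; retyping the option by hand is worth trying first.)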
- Nagios 3.0.6 starting... (PID=2821)
- Local time is Thu Feb 16 14:24:25 CST 2012
- Bailing out due to one or more errors encountered in the configuration files. Run Nagios from the command line with the -v option to verify your config before restarting. (PID=2821)
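Commands copied from PDFs often carry a typographic hyphen (U+2010) where an ASCII "-" is expected, and the shell then treats the option as an ordinary argument. The sketch below detects that character in a string before you run it; `has_typographic_hyphen` is a hypothetical helper name, and it checks for U+2010 only, not for other dash-like characters.

```shell
# Succeed (exit 0) if the given text contains U+2010 HYPHEN.
has_typographic_hyphen() {
    hyphen=$(printf '\342\200\220')   # the three UTF-8 bytes of U+2010
    # byte-wise fixed-string match, independent of the current locale
    printf '%s' "$1" | LC_ALL=C grep -qF -- "$hyphen"
}

# Usage: paste the command between single quotes
#   has_typographic_hyphen 'chkconfig --add nagios' && echo "retype the hyphens"
```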
- If no errors are reported, start the Nagios service
- service nagios start
- service httpd start
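- 8 Put SELinux into permissive mode (running this one command is enough)
- setenforce 0
- To make the change permanent, edit the setting in /etc/selinux/config and reboot.
- Alternatively, instead of disabling SELinux or changing its mode permanently, keep it in enforcing/targeted mode and label the CGI and web files so that httpd may use them:
- chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/
- chcon -R -t httpd_sys_content_t /usr/local/nagios/share/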
- 9
- Test
- Open http://localhost/nagios/ and log in with user name test and password 123456
- 10 How to monitor a remote host
- 1 On the monitored host
- Add the user
- useradd nagios
- Set its password
- passwd nagios
- Install the Nagios plugins
- wget http://nchc.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.13.tar.gz
- tar zxvf nagios-plugins-1.4.13.tar.gz
- cd nagios-plugins-1.4.13
- ./configure
- make
- make install
- chown nagios:nagios /usr/local/nagios/
- chown -R nagios:nagios /usr/local/nagios/libexec/
- 2 Steps for installing NRPE (it must be installed on both the monitoring server and the monitored host)
- tar -zxvf nrpe-2.8.1.tar.gz
- cd nrpe-2.8.1
- ./configure
- make all
- make install-plugin
- make install-daemon
- make install-daemon-config
- 3 vim /usr/local/nagios/etc/nrpe.cfg
- #allowed_hosts=127.0.0.1
- allowed_hosts=127.0.0.1,192.168.1.130
- (192.168.1.130 is the address of the monitoring server)
- Edit /etc/hosts.allow and add the monitoring server's IP
- echo 'nrpe:192.168.1.130' >> /etc/hosts.allow
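A quick way to confirm that the monitoring server really is listed in `allowed_hosts` is sketched below. `nrpe_allows` is a hypothetical helper name; it assumes a plain comma-separated list of addresses without spaces or network masks, as in the example above.

```shell
# Succeed if the given IP appears in the active (uncommented) allowed_hosts line.
# $1 = path to nrpe.cfg, $2 = IP address to look for.
nrpe_allows() {
    grep '^allowed_hosts=' "$1" | tail -n 1 | cut -d= -f2 | tr ',' '\n' | grep -qxF "$2"
}

# Usage on the monitored host:
#   nrpe_allows /usr/local/nagios/etc/nrpe.cfg 192.168.1.130 || echo "not allowed"
```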
- 4 Start the service
- /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
- Test whether the NRPE service is working (use 127.0.0.1 for the test, not localhost)
- /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
- NRPE v2.8.1
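- 5 Test from the monitoring server (192.168.1.130); output like the following means it works
- /etc/init.d/iptables stop
- (or, instead of stopping the firewall, add a rule that allows the monitoring server to collect data from the monitored host; NRPE listens on TCP port 5666 by default)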
- /usr/local/nagios/libexec/check_nrpe -H 192.168.1.129
- NRPE v2.8.1
- Then, on the monitoring server
- 1 vim /usr/local/nagios/etc/objects/129.cfg with the following content
- define host{
- use linux-server
- host_name 129
- alias 129
- address 192.168.1.129
- }
- define service{
- use generic-service
- host_name 129
- service_description load
- check_command check_nrpe!check_load
- # with custom arguments
- #check_command check_nrpe!check_load!6.0,5.0,4.0!15.0,8.0,6.0
- }
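When several hosts have to be added, the host file above can be generated rather than typed. The sketch below prints the same two definitions for a given name and address; `make_host_cfg` is a hypothetical helper name, and the `linux-server` and `generic-service` templates are the ones used in this guide.

```shell
# Print a host definition plus a load service that is checked through NRPE.
# $1 = host_name (also used as alias), $2 = IP address.
make_host_cfg() {
    cat <<EOF
define host{
    use         linux-server
    host_name   $1
    alias       $1
    address     $2
}
define service{
    use                  generic-service
    host_name            $1
    service_description  load
    check_command        check_nrpe!check_load
}
EOF
}

# Usage on the monitoring server:
#   make_host_cfg 129 192.168.1.129 > /usr/local/nagios/etc/objects/129.cfg
```

The generated file still has to be registered with a cfg_file line in nagios.cfg.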
- vim /usr/local/nagios/etc/nagios.cfg and add the following
- # Definitions for monitoring 192.168.1.129
- cfg_file=/usr/local/nagios/etc/objects/129.cfg
- vim /usr/local/nagios/etc/objects/commands.cfg
- # 'check_nrpe ' command definition
- define command{
- command_name check_nrpe
- command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
- }
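- Reload Nagios on the monitoring server
- service nagios reload
- Open http://192.168.1.130/nagios and you should see that host 129 has been added
- Monitoring swap with Nagios
- On the monitored host, add the following to /usr/local/nagios/etc/nrpe.cfg
- vim /usr/local/nagios/etc/nrpe.cfg
- command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
- Restart the NRPE service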
- [root@localhost libexec]# ps -ef | grep nrpe
- nagios 2332 1 0 14:24 ? 00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
- root 2373 28887 0 14:25 pts/0 00:00:00 grep nrpe
- kill -9 2332
- /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
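Looking up the PID by eye works, but the lookup can also be scripted so the restart is repeatable. The sketch below extracts the NRPE PID from `ps -ef` output; `nrpe_pids` is a hypothetical helper name, and it assumes the standard `ps -ef` column layout, where the command is the 8th column.

```shell
# Read `ps -ef` output on stdin and print the PID of every process whose
# command is the nrpe binary. The "grep nrpe" line is skipped because its
# command column is grep, not nrpe.
nrpe_pids() {
    awk '$8 ~ /\/nrpe$/ || $8 == "nrpe" { print $2 }'
}

# Usage on the monitored host (a plain kill sends SIGTERM, which is usually
# enough; kill -9 as in the steps above is the fallback):
#   ps -ef | nrpe_pids | xargs -r kill
#   /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
```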
- On the monitoring server
- Add the following to /usr/local/nagios/etc/objects/commands.cfg
- # check_swap command definition
- define command{
- command_name check_swap
- command_line $USER1$/check_swap -w $ARG1$ -c $ARG2$
- }
- Then add the following to the host file
- vim /usr/local/nagios/etc/objects/129.cfg
- define service{
- use generic-service
- host_name 129
- service_description swap
- check_command check_nrpe!check_swap
- }
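- Restart the nagios and httpd services
- service nagios restart
- service httpd restart
- Monitoring disk usage with Nagios
- On the monitored host, add the following to /usr/local/nagios/etc/nrpe.cfg
- vim /usr/local/nagios/etc/nrpe.cfg
- command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /
- (without a % sign, check_disk reads 20 and 10 as free space in MB, the plugin's default unit; write 20% and 10% for percentages)
- Restart the NRPE service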
- [root@localhost libexec]# ps -ef | grep nrpe
- nagios 2332 1 0 14:24 ? 00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
- root 2373 28887 0 14:25 pts/0 00:00:00 grep nrpe
- kill -9 2332
- /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
- On the monitoring server
- Add the following to /usr/local/nagios/etc/objects/commands.cfg
- define command{
- command_name check_disk
- command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
- }
- Then add the following to the host file
- vim /usr/local/nagios/etc/objects/129.cfg
- define service{
- use generic-service
- host_name 129
- service_description disk
- check_command check_nrpe!check_disk
- }
- Restart the nagios and httpd services
- service nagios restart
- service httpd restart
- Monitoring memory with Nagios
- The memory check script is as follows
- ######################################
- #!/bin/bash
- # check memory script
- TOTAL=`free -m | head -2 |tail -1 |gawk '{print $2}'`
- USED=`free -m | head -2 |tail -1 |gawk '{print $3}'`
- FREE=`free -m | head -2 |tail -1 |gawk '{print $4}'`
- # to calculate free percent
- # use the expression free * 100 / total
- FREETMP=`expr $FREE \* 100`
- PERCENT=`expr $FREETMP / $TOTAL`
- echo "$TOTAL MB Total Memory"
- echo "$USED MB Used Memory"
- echo "$FREE MB ($PERCENT%) Free Memory"
- exit 0
- ######################################
- On the monitored host, add the following to /usr/local/nagios/etc/nrpe.cfg
- vim /usr/local/nagios/etc/nrpe.cfg
- command[check_mem]=/usr/local/nagios/libexec/check_mem -w 150 -c 200
- Copy the check_mem script to /usr/local/nagios/libexec/ and make it executable
- chmod +x /usr/local/nagios/libexec/check_mem
- chown nagios:nagios /usr/local/nagios/libexec/check_mem
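Note that the script as listed ignores the -w and -c options and always exits 0, so Nagios will always report OK. The sketch below shows how the thresholds could be turned into the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). `mem_state` is a hypothetical helper name, and reading the thresholds as used megabytes is an assumption; the original does not say what 150 and 200 refer to.

```shell
# Map used memory to a Nagios status line and exit code.
# $1 = used MB, $2 = warning threshold, $3 = critical threshold
# (assumption: both thresholds are in used megabytes).
mem_state() {
    used=$1 warn=$2 crit=$3
    if [ "$used" -ge "$crit" ]; then
        echo "CRITICAL - ${used} MB used"
        return 2
    elif [ "$used" -ge "$warn" ]; then
        echo "WARNING - ${used} MB used"
        return 1
    fi
    echo "OK - ${used} MB used"
    return 0
}

# Usage inside check_mem, after USED has been computed and -w/-c parsed
# into WARN and CRIT:
#   mem_state "$USED" "$WARN" "$CRIT"; exit $?
```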
- Restart the NRPE service
- [root@localhost libexec]# ps -ef | grep nrpe
- nagios 2332 1 0 14:24 ? 00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
- root 2373 28887 0 14:25 pts/0 00:00:00 grep nrpe
- kill -9 2332
- /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
- On the monitoring server
- Add the following to /usr/local/nagios/etc/objects/commands.cfg
- define command{
- command_name check_mem
- command_line $USER1$/check_mem -w $ARG1$ -c $ARG2$
- }
- Then add the following to the host file
- vim /usr/local/nagios/etc/objects/129.cfg
- define service{
- use generic-service
- host_name 129
- service_description memory
- check_command check_nrpe!check_mem
- }
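- Restart the nagios and httpd services
- service nagios restart
- service httpd restart
- Monitoring HTTP availability with Nagios
- Nothing needs to be done on the monitored host (check_http does not go through NRPE)
- On the monitoring server
- /usr/local/nagios/etc/objects/commands.cfg already defines the check_http command, so nothing needs to be added there either
- Then add the following to the host file
- vim /usr/local/nagios/etc/objects/129.cfg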
- define service{
- use generic-service
- host_name 129
- service_description http
- check_command check_http ; note: plain check_http here, not the check_nrpe!check_http form
- }
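- Restart the nagios and httpd services
- service nagios restart
- service httpd restart
- Troubleshooting: httpd was installed with yum, so the default document root is /var/www/html
- Running the following command to test
- /usr/local/nagios/libexec/check_http -I 192.168.1.129
- gives this warning
- HTTP WARNING: HTTP/1.1 403 Forbidden
- The cause is that there is no file under /var/www/html
- cd /var/www/html
- echo 123 > index.html
- After a short while the Nagios check returns OK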
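- Monitoring MySQL availability with Nagios
- On the monitored host, log in to the database and grant access to the monitoring server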
- mysql> grant all privileges on *.* to xxxxx@192.168.1.130 identified by '123456';
- Query OK, 0 rows affected (0.09 sec)
- mysql> flush privileges;
- Query OK, 0 rows affected (0.08 sec)
- On the monitoring server
- Add the following to /usr/local/nagios/etc/objects/commands.cfg
- # check_mysql command definition
- define command{
- command_name check_mysql
- command_line $USER1$/check_mysql -H $HOSTADDRESS$ -P $ARG1$ -u $ARG2$ -p $ARG3$
- # Note: the corresponding line in liuyu's PDF has a problem
- }
- Then add the following to the host file
- vim /usr/local/nagios/etc/objects/129.cfg
- define service{
- use generic-service
- host_name 129
- service_description mysql
- check_command check_mysql!3306!xxxx!123456 ; port, user, password, matching $ARG1$ to $ARG3$ in the command definition (the host comes from $HOSTADDRESS$); note: this is not the check_nrpe!... form
- notifications_enabled 0
- }
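- Restart the nagios and httpd services
- service nagios restart
- service httpd restart
- Monitoring Tomcat availability with Nagios
- Nothing needs to be done on the monitored host (check_tcp!8080 does not go through NRPE)
- On the monitoring server
- /usr/local/nagios/etc/objects/commands.cfg already defines the check_tcp command, so nothing needs to be added there either
- Then add the following to the host file
- vim /usr/local/nagios/etc/objects/hong221.cfg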
- define service{
- use generic-service
- host_name hong221
- service_description tomcat
- check_command check_tcp!8080!xxxxx
- }
- To test manually, run the following command
- [root@nagios objects]# /usr/local/nagios/libexec/check_tcp -H xxxxx -p 8080
- TCP OK - 0.141 second response time on port 8080|time=0.141140s;;;0.000000;10.000000
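- Restart the nagios and httpd services
- service nagios restart
- service httpd restart
- The service then shows up on the monitoring page of the monitoring server
- Configuring Nagios alerts to a 139 mailbox
- Fixing mail sent with the mail command not arriving at the 139 mailbox
- tail -f /var/log/maillog shows the following errors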
- Feb 21 17:20:49 localhost postfix/qmgr[2072]: A296612227F: from=, size=700, nrcpt=1 (queue active)
- Feb 21 17:20:49 localhost sendmail[2275]: q1L9KmDa002275: to=xxxxx@139.com, ctladdr=root (0/0), delay=00:00:01, xdelay=00:00:00, mailer=relay, pri=30221, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (Ok: queued as A296612227F)
- Feb 21 17:20:49 localhost postfix/smtpd[2276]: disconnect from localhost.localdomain[127.0.0.1]
- Feb 21 17:20:50 localhost postfix/smtp[2280]: A296612227F: to=, relay=mx1.mail.139.com[221.176.9.178]:25, delay=0.53, delays=0.05/0.01/0.24/0.23, dsn=5.0.0, status=bounced (host mx1.mail.139.com[221.176.9.178] said: 550 985a4f43618db72-3c5de Mail rejected (in reply to end of DATA command))
- Feb 21 17:20:50 localhost postfix/cleanup[2279]: 43FB812227E: message-id=<20120221092050.43FB812227E@localhost.localdomain>
- Feb 21 17:20:50 localhost postfix/qmgr[2072]: 43FB812227E: from=<>, size=2697, nrcpt=1 (queue active)
- Feb 21 17:20:50 localhost postfix/bounce[2281]: A296612227F: sender non-delivery notification: 43FB812227E
- Feb 21 17:20:50 localhost postfix/qmgr[2072]: A296612227F: removed
- I was advised that the problem was the hostname (localhost.localdomain), which may lead 139 Mail to treat the message as spam
- [root@nagios objects]# cat /etc/sysconfig/network
- NETWORKING=yes
- #HOSTNAME=localhost.localdomain
- HOSTNAME=nagios.localdomain
- [root@nagios objects]# cat /etc/hosts
- 192.168.1.130 nagios.localdomain nagios # Added by NetworkManager
- 127.0.0.1 localhost.localdomain localhost
- ::1 nagios.localdomain nagios localhost6.localdomain6 localhost6
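- So I changed the hostname to an arbitrary name and rebooted the server; after that mail worked and the 139 mailbox received the messages
- Nagios configuration for service alerts
- On the monitoring server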
- vim /usr/local/nagios/etc/objects/contacts.cfg
- define contact{
- contact_name nagiosadmin ; Short name of user
- use generic-contact ; Inherit default values from generic-contact template (defined above)
- alias Nagios Admin ; Full name of user
- service_notification_period 24x7
- host_notification_period 24x7
- service_notification_options w,u,c,r
- host_notification_options d,u,r
- service_notification_commands notify-service-by-email
- host_notification_commands notify-host-by-email
- email xxxxx@139.com ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ****** (put the address that should receive alerts here; a 139 mailbox is an ops essential)
- }
- define contactgroup{
- contactgroup_name admins
- alias Nagios Administrators
- members nagiosadmin
- }
- Then restart the nagios service
- service nagios restart
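- Note: in the host configuration file, only services with the following setting send alerts when something goes wrong
- notifications_enabled 1 (1 enables alerts, 0 disables them)
- Note: when registering the 139 mailbox, choose the long format for SMS messages
- and set the mail-arrival notification to 24 hours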
- vim templates.cfg
- define service{
- name generic-service ; The 'name' of this service template
- active_checks_enabled 1 ; Active service checks are enabled
- passive_checks_enabled 1 ; Passive service checks are enabled/accepted
- parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
- obsess_over_service 1 ; We should obsess over this service (if necessary)
- check_freshness 0 ; Default is to NOT check service 'freshness'
- notifications_enabled 1 ; Service notifications are enabled
- event_handler_enabled 1 ; Service event handler is enabled
- flap_detection_enabled 1 ; Flap detection is enabled
- failure_prediction_enabled 1 ; Failure prediction is enabled
- process_perf_data 1 ; Process performance data
- retain_status_information 1 ; Retain status information across program restarts
- retain_nonstatus_information 1 ; Retain non-status information across program restarts
- is_volatile 0 ; The service is not volatile
- check_period 24x7 ; The service can be checked at any time of the day
- max_check_attempts 3 ; Re-check the service up to 3 times in order to determine its final (hard) state
- normal_check_interval 10 ; Check the service every 10 minutes under normal conditions
- retry_check_interval 2 ; Re-check the service every two minutes until a hard state can be determined
- contact_groups admins ; Notifications get sent out to everyone in the 'admins' group
- notification_options w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
- notification_interval 10 ; Re-notify about service problems every 10 minutes (this sets how often a repeat alert is sent)
- notification_period 24x7 ; Notifications can be sent out at any time
- register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
- }
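- Nagios troubleshooting
- 1 After adding a monitoring item for a new host (for example load on 129),
- clicking Scheduling Queue -> 129 load shows Status Information: CHECK_NRPE: Socket timeout after 10 seconds
- Checks
- 1 First, on the monitoring server, run
- /usr/local/nagios/libexec/check_nrpe -H 192.168.1.129
- and see whether it returns the NRPE version number
- Then check whether iptables is blocking the connection
- 2 Check the file permissions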
- cd /usr/local/nagios/etc/objects
- [root@localhost objects]# ll
- total 52
- -rw-r--r-- 1 root root 314 Feb 16 15:58 129.cfg
- -rwxrwxrwx 1 nagios nagios 7856 Feb 16 16:06 commands.cfg
- -rwxrwxrwx 1 nagios nagios 2166 Feb 16 13:58 contacts.cfg
- -rwxrwxrwx 1 nagios nagios 5403 Feb 16 13:58 localhost.cfg
- -rwxrwxrwx 1 nagios nagios 3124 Feb 16 13:58 printer.cfg
- -rwxrwxrwx 1 nagios nagios 3293 Feb 16 13:58 switch.cfg
- -rwxrwxrwx 1 nagios nagios 10812 Feb 16 13:58 templates.cfg
- -rwxrwxrwx 1 nagios nagios 3209 Feb 16 13:58 timeperiods.cfg
- -rwxrwxrwx 1 nagios nagios 4007 Feb 16 13:58 windows.cfg
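- Check whether the newly added host file is readable and writable by the nagios user. If it is not, change its owner and mode to match the other files (for example chown nagios:nagios 129.cfg, then chmod to the same mode), giving: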
- [root@localhost objects]# ll
- total 52
- -rwxrwxrwx 1 nagios nagios 314 Feb 16 15:58 129.cfg
- -rwxrwxrwx 1 nagios nagios 7856 Feb 16 16:06 commands.cfg
- -rwxrwxrwx 1 nagios nagios 2166 Feb 16 13:58 contacts.cfg
- -rwxrwxrwx 1 nagios nagios 5403 Feb 16 13:58 localhost.cfg
- -rwxrwxrwx 1 nagios nagios 3124 Feb 16 13:58 printer.cfg
- -rwxrwxrwx 1 nagios nagios 3293 Feb 16 13:58 switch.cfg
- -rwxrwxrwx 1 nagios nagios 10812 Feb 16 13:58 templates.cfg
- -rwxrwxrwx 1 nagios nagios 3209 Feb 16 13:58 timeperiods.cfg
- -rwxrwxrwx 1 nagios nagios 4007 Feb 16 13:58 windows.cfg