Nagios3

Prerequisites

Configuration

On the host monitored by Nagios

apt-get install nagios-plugins nagios-nrpe-server

/etc/nagios/nrpe.cfg

allowed_hosts=127.0.0.1,<IP of the monitoring server>
# you can place your config snippets into nrpe.d/
#include_dir=/etc/nagios/nrpe.d/

/etc/nagios/nrpe_local.cfg

Add commands that are not included in nrpe.cfg:
# Monitor the free disk space of all partitions, with thresholds given in %
command[check_disk_all]=/usr/lib/nagios/plugins/check_disk -w 30% -c 10%
command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 50% -c 10%

Restart nagios-nrpe-server

/etc/init.d/nagios-nrpe-server restart

Individual settings on the Nagios server

How to verify

contactgroups.cfg

define contactgroup{
      contactgroup_name        server-admins
      alias                    Server Administrators
      members                  popowa, nagios_watcher
}

server-admins is set up to receive mail whenever any host or any service goes down (see the default template in services.cfg). Every user listed under members in contactgroups.cfg must already be defined in contacts.cfg.

contacts.cfg

List individual contacts in alphabetical order.

define contact{
      contact_name                     popowa
      alias                            Aya
      host_notifications_enabled           1
      service_notifications_enabled        1
      service_notification_period     24x7
      host_notification_period        24x7
      service_notification_options    w,u,c,r
      host_notification_options       d,u,r
      service_notification_commands   notify-service-by-email
      host_notification_commands      notify-host-by-email
      email                           <PC mail address>
      pager                           <mobile mail address>
      can_submit_commands              1
}

The contact_name above can be tied to a user in /etc/nagios3/htaccess.user, so that each contact sees only the hosts and services they are a contact for.

hostgroups.cfg

define hostgroup{
      hostgroup_name           aws
      alias                    AWS Servers
      members                  serverA, serverB
}

Hosts defined in hosts.cfg can be grouped.
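
One thing a host group makes possible is attaching a single service definition to every member. A minimal sketch, assuming the aws group above plus the generic-service template and check_ssh command that appear in services.cfg:

```cfg
define service{
      use                      generic-service
      hostgroup_name           aws          ; applies to every member: serverA and serverB
      service_description      SSH
      check_command            check_ssh
}
```

If a member already has its own SSH service defined by host_name, keep only one of the two definitions.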

hosts.cfg

Hosts are listed in alphabetical order by host name.

define host{
      host_name                        serverA          ; server name
      alias                            AWS/clientA/web  ; description
      address                          XXX.XXX.XXX.XXX  ; IP
      #parents                         YamahaRTX3000    ; parent host of the server (usually a router, LB, etc.)
      check_command                    check-host-alive
      # The command that decides whether the host is alive (check-host-alive does a check_http against the IP)
      check_interval                   0
      # Whether to run the command above as a regular alive check.
      # (This setting makes the check run even without an entry in services.cfg;
      # it is not needed if services.cfg already covers it.)
      retry_interval                   1
      max_check_attempts               5
      check_period                     24x7             ; a time period defined in timeperiods.cfg
      process_perf_data                0
      retain_nonstatus_information     0
      contact_groups                   server-admins    ; who to notify
      notification_interval            30
      notification_period              24x7
      notification_options             d,u,r
      }

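This is the list of servers. When a server has several IPs tied to it, it is better to add each IP as a separate host. That way you can tell, for example, when the server itself is up but a virtual IP is down.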
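
As a sketch of that advice, a virtual IP can get its own host entry next to the physical server. The host name serverA-vip and the address placeholder are hypothetical; the remaining values mirror the serverA definition above.

```cfg
define host{
      host_name                        serverA-vip       ; hypothetical name for the virtual IP
      alias                            AWS/clientA/web virtual IP
      address                          YYY.YYY.YYY.YYY   ; the virtual IP, not the server's own address
      check_command                    check-host-alive
      max_check_attempts               5
      check_period                     24x7
      contact_groups                   server-admins
      notification_interval            30
      notification_period              24x7
      notification_options             d,u,r
      }
```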

servicegroups.cfg

Services defined for hosts in services.cfg can be grouped by service.

define servicegroup{
      servicegroup_name        ping-service
      alias                    Ping Services
      members                  serverA,PING,serverB,PING
}

Add entries to members as pairs of host name and service name, in the form {xxx,ooo}.
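
Each pair has to match an existing service: the second element is the service_description of a service defined for the host named by the first. A sketch of a service that the serverA,PING entry above would refer to, assuming the generic-service template from services.cfg:

```cfg
define service{
      use                      generic-service
      host_name                serverA
      service_description      PING         ; the name referenced in members
      check_command            check_ping!100.0,20%!500.0,60%
}
```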

services.cfg

Configuration file for monitoring the services on each host.

generic-service (service template)

define service{
       name                            generic-service        ; template name
       notifications_enabled           1       ; notifications on
       event_handler_enabled           1       ; event handler on
       process_perf_data               1       ; process performance data
       retain_status_information       1       ; retain status information
       retain_nonstatus_information    1       ; retain non-status information
       is_volatile                     0       ; normal (non-volatile) service check
       max_check_attempts              3       ; number of check attempts
       normal_check_interval           5       ; check every 5 minutes
       retry_check_interval            1       ; retry interval
       check_period                    24x7    ; check 24x7
       notification_interval           240     ; re-notify every 240 minutes while the problem persists
       notification_period             24x7    ; notify 24x7
       notification_options            w,c,r   ; notify on Warning, Critical and Recovery
       contact_groups                  server-admins
       register                        0       ; template only, not a real service
}

An actual service check looks like this:

define service {
      use                      generic-service
      host_name                serverA
      service_description      SSH
      check_command            check_ssh
}

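To send notifications for the service above to other contacts as well, add lines such as

    contacts	test-user, server-a-user
    contact_groups	client-a-admins

to the definition. When use selects generic-service, the service inherits the settings of generic-service, so alert mail is sent to server-admins.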
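
One caveat worth checking against your own setup: in Nagios 3, a contact_groups value written in the service normally replaces the one inherited from the template instead of adding to it. Prefixing the value with + asks Nagios to append it. A sketch, where client-a-admins, test-user and server-a-user are the example names from above:

```cfg
define service{
      use                      generic-service
      host_name                serverA
      service_description      SSH
      check_command            check_ssh
      contacts                 test-user, server-a-user
      contact_groups           +client-a-admins   ; keep server-admins from generic-service and add this group
}
```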

SSH check

check_command check_ssh

Ping check

check_command check_ping!100.0,20%!500.0,60%

HTTP check by IP

check_command check_http

HTTP check for a virtual domain

This one needs additional configuration.

check_command check_vhost!www.server-a.com
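
check_vhost is not among the commands used elsewhere on this page, so it has to be defined. One possible definition, offered as an assumption rather than the definition actually in use here, passes the virtual host name to check_http with -H and connects to the host's IP with -I:

```cfg
define command{
      command_name    check_vhost
      command_line    /usr/lib/nagios/plugins/check_http -H $ARG1$ -I $HOSTADDRESS$
}
```

With this in place, check_vhost!www.server-a.com requests www.server-a.com from the address configured for the host.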

POP check

check_command            check_tcp!110!5!10
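
What the second and third arguments mean depends on how check_tcp is defined on the Nagios server. A sketch of a definition that would accept all three, assuming 5 and 10 are meant as the warning and critical response times in seconds:

```cfg
define command{
      command_name    check_tcp
      command_line    /usr/lib/nagios/plugins/check_tcp -H $HOSTADDRESS$ -p $ARG1$ -w $ARG2$ -c $ARG3$
}
```

If a check_tcp command is already defined elsewhere, adjust that definition instead of adding a second one with the same name.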

SMTP check

check_command            check_tcp!25!5!10

Proxy check

check_command            check_tcp!3128!5!10

NRPE: disk check

check_command check_nrpe_1arg!check_disk_all
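
Put together with the check_disk_all command added to nrpe_local.cfg on the monitored host, a complete service entry might look like the sketch below. It assumes a check_nrpe_1arg command is already defined on the Nagios server and hands its single argument to the remote NRPE daemon as the command name; the service_description is a hypothetical label.

```cfg
define service{
      use                      generic-service
      host_name                serverA
      service_description      DISK         ; hypothetical description
      check_command            check_nrpe_1arg!check_disk_all
}
```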

NRPE: load check

check_command            check_nrpe_1arg!check_load

timeperiods.cfg

Defines the time periods during which monitoring runs.

define timeperiod{
       timeperiod_name 24x7
       alias           24 Hours A Day, 7 Days A Week
       sunday          00:00-24:00
       monday          00:00-24:00
       tuesday         00:00-24:00
       wednesday       00:00-24:00
       thursday        00:00-24:00
       friday          00:00-24:00
       saturday        00:00-24:00
       }

When an error occurs