High-Availability Cluster Principles in Detail
Resource stickiness:
Resource constraints (Constraint):
Colocation constraint (colocation):
whether resources can run on the same node
score:
positive: the resources may run together
negative: the resources must not run together
Location constraint (location), score:
positive: the resource prefers this node
negative: the resource prefers to stay away from this node
Order constraint (order):
defines the order in which resources are started or stopped
e.g. vip, ipvs
ipvs –> vip
-inf: negative infinity
inf: positive infinity
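In the crm shell that later heartbeat/pacemaker versions provide (covered below), the three constraint types can be sketched roughly as follows; the resource and constraint names here are illustrative, not from the notes:

```
# colocation: ipvs must run on the same node as vip (inf = must; -inf = never)
colocation ipvs-with-vip inf: ipvs vip
# location: vip prefers node1 with score 100
location vip-on-node1 vip 100: node1.magedu.com
# order: start vip first, then ipvs
order vip-before-ipvs inf: vip ipvs
```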
Resource fencing:
node level: STONITH
resource level:
e.g. an FC SAN switch can deny a node access at the storage-resource level
STONITH:
split-brain: when cluster nodes cannot reliably obtain each other's status information, the cluster splits into partitions ("split brain")
one consequence: nodes fight over the shared storage
active/active: high availability
High-Availability Cluster Principles: Shared Storage
IDE (ATA): ~130MB/s
SATA: 600MB/s
7200rpm
IOPS: ~100
SCSI: 320MB/s
SAS:
15000rpm
IOPS: ~200
USB 3.0: ~400MB/s
Mechanical disks:
random read/write
sequential read/write
Solid-state disks:
IDE, SCSI: parallel interfaces
SATA, SAS, USB: serial interfaces
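The IOPS figures above are why the random/sequential distinction matters; a back-of-the-envelope comparison, assuming 4KB per random I/O:

```shell
# random throughput = IOPS * I/O size; compare with the sequential interface rate
hdd_random_kbps=$(( 100 * 4 ))   # 7200rpm: ~100 IOPS * 4KB = 400 KB/s
sas_random_kbps=$(( 200 * 4 ))   # 15000rpm SAS: ~200 IOPS * 4KB = 800 KB/s
echo "${hdd_random_kbps} KB/s random vs ~130 MB/s sequential"
echo "${sas_random_kbps} KB/s random vs ~320 MB/s sequential"
```

A few hundred KB/s of random I/O against hundreds of MB/s sequential on the same spindle is the gap that shared-storage and SSD designs are working around.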
DAS:
Direct Attached Storage
attached directly to the mainboard bus (BUS)
access granularity: block
NAS:
Network Attached Storage
file server: file-level access
SAN:
Storage Area Network
FC SAN
IP SAN: iSCSI
SCSI: Small Computer System Interface
High-Availability Cluster Principles: Multi-Node Clusters
crm: makes applications that are not themselves highly available behave as HA services; the resource manager itself is just a script.
Resource stickiness: how strongly a resource prefers a given node, defined via score
Resource constraints:
location: how strongly a resource prefers a node
colocation: dependencies between resources
order: the order in which actions are taken on resources
Heartbeat v1: ships with its own resource manager
haresources
Heartbeat v2: ships with two resource managers
haresources
crm
Heartbeat v3: the crm resource manager was split out into an independent project, Pacemaker
Resource Type:
primitive: primary resource; runs on only one node at any given time
clone: can run on multiple nodes
group: groups several primitives together; normally contains only primitives
master/slave: e.g. drbd, which runs on exactly two nodes
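A rough crm-shell sketch of these resource types, using the web resources discussed later in these notes (all names are illustrative, and the clone/ms lines reference primitives whose definitions are omitted):

```
primitive vip ocf:heartbeat:IPaddr params ip=172.16.100.1
primitive webserver lsb:httpd
group webservice vip webserver   # group: primitives started together, in order
clone dlm-clone dlm              # clone: one copy per node
ms ms-drbd drbd-data             # master/slave: e.g. drbd on two nodes
```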
RA: Resource Agent
RA Classes:
Legacy heartbeat v1 RA
LSB (/etc/rc.d/init.d/) Linux Standard Base
OCF (Open Cluster Framework)
pacemaker
linbit (drbd)
STONITH: manages hardware STONITH devices
Fencing levels:
node level
STONITH
resource level
FC SAN switch
STONITH devices
1、Power Distribution Units (PDU)
Power Distribution Units are an essential element in managing power capacity and functionality for critical network, server and data center equipment. They can provide remote load monitoring of connected equipment and individual outlet power control for remote power recycling.
2、Uninterruptible Power Supplies (UPS)
A stable power supply provides emergency power to connected equipment by supplying power from a separate source in the event of utility power failure.
3、Blade Power Control Devices
If you are running a cluster on a set of blades, then the power control device in the blade enclosure is the only candidate for fencing. Of course, this device must be capable of managing single blade computers.
4、Lights-out Devices
Lights-out devices (IBM RSA, HP iLO, Dell DRAC) are becoming increasingly popular and may even become standard in off-the-shelf computers. However, they are inferior to UPS devices, because they share a power supply with their host (a cluster node). If a node stays without power, the device supposed to control it would be just as useless. In that case, the CRM would continue its attempts to fence the node indefinitely while all other resource operations would wait for the fencing/STONITH operation to complete.
5、Testing Devices
Testing devices are used exclusively for testing purposes. They are usually more gentle on the hardware. Once the cluster goes into production, they must be replaced with real fencing devices.
stonithd
stonithd is a daemon which can be accessed by local processes or over the network. It accepts the commands which correspond to fencing operations: reset, power-off, and power-on. It can also check the status of the fencing device.
The stonithd daemon runs on every node in the CRM HA cluster. The stonithd instance running on the DC node receives a fencing request from the CRM. It is up to this and other stonithd programs to carry out the desired fencing operation.
STONITH Plug-ins
For every supported fencing device there is a STONITH plug-in which is capable of controlling said device. A STONITH plug-in is the interface to the fencing device.
On each node, all STONITH plug-ins reside in /usr/lib/stonith/plugins (or in /usr/lib64/stonith/plugins for 64-bit architectures). All STONITH plug-ins look the same to stonithd, but are quite different on the other side reflecting the nature of the fencing device.
Some plug-ins support more than one device. A typical example is ipmilan (or external/ipmi) which implements the IPMI protocol and can control any device which supports this protocol.
Heartbeat: UDP port 694
HA Cluster: Installing and Configuring heartbeat. Two nodes: 172.16.100.6 and 172.16.100.7
vip: 172.16.100.1
1. The two nodes must be able to communicate with each other.
2. Configure the hostname: hostname node1.magedu.com (verify with uname -n)
To make it permanent: vim /etc/sysconfig/network, set HOSTNAME=node1.magedu.com
3. Set up SSH mutual trust (passwordless key-based login in both directions).
4. Configure hostname resolution:
vim /etc/hosts
172.16.100.6 node1.magedu.com node1
172.16.100.7 node2.magedu.com node2
Disable iptables.
5. Synchronize time:
ntpdate 172.16.0.1
service ntpd stop
chkconfig ntpd off
To keep the clocks in sync afterwards: crontab -e
*/5 * * * * /sbin/ntpdate 172.16.0.1 &> /dev/null
scp /var/spool/cron/root node2:/var/spool/cron/
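Steps 3–5 above, sketched as commands; the node names and IPs are the ones used in these notes, the ssh lines are shown but commented out since they need a live peer, and `hosts.sample` stands in for /etc/hosts:

```shell
# 4. name resolution: entries BOTH nodes need in /etc/hosts
cat > hosts.sample <<'EOF'
172.16.100.6   node1.magedu.com node1
172.16.100.7   node2.magedu.com node2
EOF

# 3. SSH mutual trust: run on node1, then mirror on node2
#    ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
#    ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2.magedu.com

# 5. the resync crontab entry, kept identical on both nodes
cron_line='*/5 * * * * /sbin/ntpdate 172.16.0.1 &> /dev/null'
echo "$cron_line"
```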
epel
heartbeat – Heartbeat subsystem for High-Availability Linux
heartbeat-devel – Heartbeat development package
heartbeat-gui – Provides a gui interface to manage heartbeat clusters
heartbeat-ldirectord – Monitor daemon for maintaining high availability resources; auto-generates ipvs rules and health-checks backend real servers for HA ipvs setups
heartbeat-pils – Provides a general plugin and interface loading library
heartbeat-stonith – Provides an interface to Shoot The Other Node In The Head
http://dl.fedoraproject.org/pub/epel/5/i386/repoview/letter_h.group.html
Three configuration files:
1. key file, mode 600: authkeys
2. heartbeat service configuration file: ha.cf
3. resource management configuration file:
haresources
vim authkeys
vim ha.cf
logfacility local0
keepalive 1
node node1.magedu.com
node node2.magedu.com
ping 172.16.0.1
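The contents of authkeys are not spelled out above; a minimal sketch (the secret string is a placeholder, any of crc/md5/sha1 works, and the file must be mode 600 or heartbeat refuses to start):

```
auth 1
1 sha1 some-shared-secret
```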
vim haresources (make sure httpd is not running, and chkconfig httpd off)
node1.magedu.com IPaddr::172.16.100.1/16/eth0 httpd
Browse to 172.16.100.1.
Simulate a failure of 172.16.100.6; browsing to 172.16.100.1 now shows the page from node2.magedu.com.
mkdir /web/htdocs -pv
vim /etc/exports
/web/htdocs 172.16.0.0/16(ro)
service nfs restart
showmount -e 172.16.100.10 shows the export as expected
Stop the service and test the mount: mount 172.16.100.10:/web/htdocs /mnt
ls /mnt shows index.html
umount /mnt
Edit haresources: node1.magedu.com IPaddr::172.16.100.1/16/eth0 Filesystem::172.16.100.10:/web/htdocs::/var/www/html::nfs httpd
scp haresources node2:/etc/ha.d/
tail -f /var/log/messages
HA Cluster: Managing Resources with heartbeat's crm
RA classes:
OCF
pacemaker
linbit
LSB
Legacy Heartbeat V1
STONITH
RA: Resource Agent
manages resources on behalf of the cluster
LRM: Local Resource Manager
DC (Designated Coordinator): TE (Transition Engine), PE (Policy Engine)
CRM: Cluster Resource Manager
haresource (heartbeat v1)
crm, haresource (heartbeat v2)
pacemaker (heartbeat v3)
rgmanager (RHCS)
provides the underlying platform through which non-ha-aware applications are driven
crmd: management API (GUI, CLI)
web service (three resources): vip, httpd, filesystem
Resource Type:
primitive (native)
group
clone
STONITH
Cluster Filesystem (dlm: Distributed Lock Manager)
master/slave: drbd
Resource stickiness: whether a resource prefers to stay on its current node
positive: willing to stay
negative: wants to leave
Resource constraints:
location: location constraint; colocation: colocation constraint; order: order constraint
heartbeat:
authkeys
ha.cf
node
bcast, mcast, ucast
haresource
HA prerequisites:
1. time synchronization; 2. SSH mutual trust; 3. the hostname must match the output of uname -n and resolve via /etc/hosts
CIB: Cluster Information Base
XML format
crm –> pacemaker
crm respawn|on (ha.cf directive that enables the v2 CRM)
mcast eth0 225.0.100.19 694 1 0
How multicast works
Multicast packets use class D IP addresses as destinations, ranging from 224.0.0.0 to 239.255.255.255. A class D address must never appear in the source-address field of an IP packet. In unicast transmission, a packet travels along a route from source address to destination address, hop by hop through the IP network. In IP multicast, however, the destination is not a single address but a group, the group address. All receivers join the group, and once they have joined, data sent to the group address flows to them immediately; every member of the group receives the packets. Membership is dynamic: a host may join or leave a multicast group at any time.
Multicast group types
A multicast group may be permanent or temporary. Some multicast group addresses are officially assigned; these are called permanent multicast groups. What is permanent is the group's IP address, not its membership, which can change; a permanent group may have any number of members, even zero. Multicast addresses not reserved for permanent groups are available to temporary groups.
224.0.0.0–224.0.0.255: reserved multicast addresses (permanent group addresses); 224.0.0.0 itself is never assigned, and the rest are used by routing protocols;
224.0.1.0–224.0.1.255: public multicast addresses, usable on the Internet;
224.0.2.0–238.255.255.255: user-available multicast addresses (temporary group addresses), valid network-wide;
239.0.0.0–239.255.255.255: administratively scoped multicast addresses, valid only within a defined local scope.
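The four ranges above can be checked mechanically; a small sketch (the function name and the label strings are my own, not standard terms):

```shell
# classify a dotted-quad address into the multicast ranges listed above
mcast_class() {
  oldIFS=$IFS; IFS=.; set -- $1; IFS=$oldIFS   # split into four octets
  if [ "$1" -lt 224 ] || [ "$1" -gt 239 ]; then
    echo "not a multicast address"
  elif [ "$1" -eq 224 ] && [ "$2" -eq 0 ] && [ "$3" -eq 0 ]; then
    echo "reserved (permanent)"
  elif [ "$1" -eq 224 ] && [ "$2" -eq 0 ] && [ "$3" -eq 1 ]; then
    echo "public (Internet)"
  elif [ "$1" -eq 239 ]; then
    echo "administratively scoped (local)"
  else
    echo "temporary (user-available)"
  fi
}

mcast_class 224.0.0.5     # reserved (permanent): OSPF routers
mcast_class 225.0.100.19  # temporary (user-available)
mcast_class 239.1.2.3     # administratively scoped (local)
```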
Commonly used reserved multicast addresses:
224.0.0.0 base address (reserved)
224.0.0.1 all hosts (including all routers)
224.0.0.2 all multicast routers
224.0.0.3 unassigned
224.0.0.4 DVMRP routers
224.0.0.5 OSPF routers
224.0.0.6 OSPF designated routers (DR)
224.0.0.7 ST routers
224.0.0.8 ST hosts
224.0.0.9 RIP-2 routers
224.0.0.10 EIGRP routers
224.0.0.11 mobile agents
224.0.0.12 DHCP servers/relay agents
224.0.0.13 all PIM routers
224.0.0.14 RSVP encapsulation
224.0.0.15 all CBT routers
224.0.0.16 designated SBM
224.0.0.17 all SBMs
224.0.0.18 VRRP
When Ethernet carries a unicast IP packet, the destination MAC address is the receiver's MAC. When it carries a multicast packet, the destination is no longer a single receiver but a group of indeterminate membership, so a multicast MAC address is used, derived from the multicast IP address. IANA (Internet Assigned Numbers Authority) specifies that the high 24 bits of a multicast MAC address are 0x01005e and the low 23 bits are the low 23 bits of the multicast IP address.
Since only 23 of the low 28 bits of an IP multicast address are mapped into the MAC, 32 different IP multicast addresses map to the same MAC address.
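The IP-to-MAC mapping just described can be computed directly; a small sketch (the function name is illustrative):

```shell
# multicast MAC = fixed high 24 bits 01:00:5e + low 23 bits of the IP
ip_to_mcast_mac() {
  oldIFS=$IFS; IFS=.; set -- $1; IFS=$oldIFS   # split into four octets
  # mask the second octet to 7 bits: only 23 of the 28 group bits survive
  printf '01:00:5e:%02x:%02x:%02x\n' $(( $2 & 127 )) "$3" "$4"
}

ip_to_mcast_mac 225.0.100.19    # 01:00:5e:00:64:13
ip_to_mcast_mac 225.128.100.19  # same MAC: 32 IPs share each MAC address
```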
HA Cluster: Highly Available MySQL Based on heartbeat and NFS