Tuesday, March 30, 2010

Sample Exercise to Create a High Availability Service Group in VCS:

Sample Process Script:
Copy and Save the below script under your mount point (Eg: /mohi/loopy)

#!/bin/ksh
 
# Loopy script for VCS class.
#############################
#
# $1 is Service Group name
# $0 is name of shell script being executed
#
 
while true
do
        echo `date` ${1} Loopy is alive  >> ${0}out
        sleep 4
        echo `date` ${1} Loopy is still alive  >> ${0}out
        sleep 4
done
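
When the Process resource defined later in this exercise starts the script, it is effectively invoked as shown below, and its output accumulates in a file next to the script (a sketch, assuming the script was saved as /mohi/loopy; run it by hand only as a quick test):

# /bin/sh /mohi/loopy mohisg &
# tail -f /mohi/loopyout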



Service Group Name: mohisg
Participating Nodes: sys11 and sys12

Resources:

Network Resources:
NIC Resource Name: mohinic
IP Resource Name: mohiip

Disk Resources:
DiskGroup Name: mohidg
Volume Name: mohivol
Mount Point: mohimount

Process Resource:
Process Name: mohiprocess


Creating Service Group.
+++++++++++++++++
# haconf -makerw
# hagrp -add mohisg
# hagrp -modify mohisg SystemList sys11 0 sys12 1
# hagrp -modify mohisg AutoStartList sys11
# hagrp -display mohisg
# haconf -dump
# view /etc/VRTSvcs/conf/config/main.cf
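
At this point the group definition written to main.cf should look roughly like this (a sketch; attribute formatting can vary slightly between VCS versions):

group mohisg (
        SystemList = { sys11 = 0, sys12 = 1 }
        AutoStartList = { sys11 }
        )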


Adding a NIC resource to a service group
+++++++++++++++++++++++++++
# hares -add mohinic NIC mohisg
# hares -modify mohinic Critical 0
# hares -modify mohinic Device hme0
# hares -modify mohinic NetworkHosts 192.168.1.11
# hares -modify mohinic Enabled 1
# hares -state mohinic


Adding an IP resource
++++++++++++++++
# hares -add mohiip IP mohisg
# hares -modify mohiip Critical 0
# hares -modify mohiip Device hme0
# hares -modify mohiip Address 192.168.1.92
# hares -modify mohiip Enabled 1
# hares -online mohiip -sys sys11
# hares -state mohiip
# hastatus -sum
# haconf -dump


Adding a DiskGroup resource
++++++++++++++++++++++
# hares -add mohidg DiskGroup mohisg
# hares -modify mohidg Critical 0
# hares -modify mohidg DiskGroup mohidg
# hares -modify mohidg Enabled 1
# hares -online mohidg -sys sys11
# hares -state mohidg
# vxdg list | grep mohidg
# haconf -dump
# view main.cf


Adding a Volume resource
++++++++++++++++++
# hares -add mohivol Volume mohisg
# hares -modify mohivol Critical 0
# hares -modify mohivol Volume mohivol
# hares -modify mohivol DiskGroup mohidg
# hares -modify mohivol Enabled 1
# hares -display mohivol
# hares -online mohivol -sys sys11
# hares -state mohivol
# vxprint -g mohidg
# haconf -dump


Adding a Mount resource
+++++++++++++++++++++++
# hares -add mohimount Mount mohisg
# hares -modify mohimount Critical 0
# hares -modify mohimount MountPoint /mohi
# hares -modify mohimount BlockDevice /dev/vx/dsk/mohidg/mohivol
# hares -modify mohimount FSType vxfs
# hares -modify mohimount FsckOpt %-y
# hares -modify mohimount Enabled 1
# hares -display mohimount
# hares -online mohimount -sys sys11
# hares -state mohimount
# haconf -dump
# view main.cf


Adding a Process Resource
++++++++++++++++++++
# hares -add mohiprocess Process mohisg
# hares -modify mohiprocess Critical 0
# hares -modify mohiprocess PathName /bin/sh
# hares -modify mohiprocess Arguments "/mohi/loopy mohisg"
# hares -modify mohiprocess Enabled 1
# hares -display mohiprocess
# hares -online mohiprocess -sys sys11
# hares -state mohiprocess

# ps -ef | grep loopy
# haconf -dump
# view main.cf


Linking Resources in the service group
+++++++++++++++++++++++++++++
# hares -link mohiip mohinic
# hares -link mohivol mohidg
# hares -link mohimount mohivol
# hares -link mohiprocess mohiip
# hares -link mohiprocess mohimount
# hares -dep | grep mohisg
# haconf -dump -makero
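
Once the links are in place, the dependency section at the bottom of the mohisg definition in main.cf should read roughly as follows (a sketch based on the links created above):

        mohiip requires mohinic
        mohivol requires mohidg
        mohimount requires mohivol
        mohiprocess requires mohiip
        mohiprocess requires mohimount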

Testing the Service Group
++++++++++++++++++++

# hastatus -sum
# hagrp -switch mohisg -to sys12
# hagrp -state mohisg
# hagrp -switch mohisg -to sys11
# hagrp -state mohisg

Things to Remember

Service Group: a collection of related resources that are managed and failed over as a unit
Resource: an individual hardware or software entity that VCS controls (NIC, IP address, disk group, volume, mount, process, and so on)
Resource Type: the class that resources of the same kind belong to (all Mount resources, for example)
Agent: the process that manages all resources of a given type (online, offline, and monitor)
Service Group Online: resources are brought online from child to parent (for mohisg above: mohinic and mohidg first, then mohiip and mohivol, then mohimount, and finally mohiprocess)
Service Group Offline: resources are taken offline from parent to child (the reverse order)

LLT Files
/etc/llthosts
/etc/llttab

GAB Files:
/etc/gabtab

Manipulating Service Groups:
1. hagrp -offline AppSG -sys S1 -localclus --> Take AppSG offline only on node S1
2. hagrp -offline OracleSG -any --> Take OracleSG offline on all systems where it is online
3. hagrp -online AppSG -sys S2 -localclus --> Bring AppSG online on node S2
4. hagrp -switch AppSG -to S1  -->  Move AppSG to node S1

Manipulating Resources:
1. hares -offline Oralistener -sys S3  -->  Take the Oralistener resource offline on node S3
2. hares -online ipres -sys S2  -->  Bring the ipres resource online on node S2

Handling VCS services:
haconf -dump -makero --> write the in-memory configuration to main.cf on disk and make the configuration read-only

hastop -all --> stop VCS on all nodes and take the applications (service groups) offline

hastop -all -force --> stop VCS on all nodes but leave the applications running
hastop -local --> stop VCS on the local node only

Useful Commands

SERVICE GROUPS AND RESOURCE OPERATIONS:
Configuring service groups
hagrp -add|-delete|-online|-offline group_name

Modifying resources
hares -add res_name type group
hares -online|-offline res_name -sys system_name

Modifying agents
haagent -start|-stop agent_name -sys system_name

BASIC CONFIGURATION OPERATIONS:
Service Groups
hagrp -modify group_name attribute_name value
hagrp -list
hagrp -value group_name attribute_name

Resources
hares -modify res_name attribute_name value
hares -link res_name res_name


Agents
haagent -display agent_name -sys system_name
hatype -modify type_name attribute_name value

VCS ENGINE OPERATIONS:
Starting had
hastart -force|-stale
hasys -force system_name

Stopping had
hastop -local|-all|-force|-evacuate
hastop -sys system_name

Adding Users
hauser -add user_name

STATUS AND VERIFICATION:
Group Status/Verification
hagrp -display group_name | -state group_name | -resources group_name

Resources Status/Verification
hares -display res_name
hares -list
hares -probe res_name -sys system_name

Agents Status/Verification
haagent -list
haagent -display agent_name -sys system_name
ps -ef | grep agent_name

VCS Status
hastatus -group group_name
LLT Status/Verification
lltconfig -a list
lltstat|lltshow|lltdump

GAB Status/Verification
gabconfig -a
gabdiskhb -l

COMMUNICATION:
Starting and Stopping LLT
lltconfig -U
lltconfig -c
lltconfig -a list

Starting and Stopping GAB
gabconfig -c -n seed_number (eg: gabconfig -c -n 2)
gabconfig -U

Administering Group Services
hagrp -clear|-flush group_name -sys system_name
hagrp -switch group_name -to system_name

Administering Resources
hares -clear|-probe res_name -sys system_name

Administering Agents
haagent -list
haagent -display agent_name -sys system_name

Verify Configuration
hacf -verify /etc/VRTSvcs/conf/config

VCS Concepts

VCS is built on three components: LLT, GAB, and HAD.


LLT (Low-Latency Transport)

Veritas uses a high-performance, low-latency protocol for cluster communications. LLT runs directly on top of the Data Link Provider Interface (DLPI) layer over Ethernet and has several major functions:

· sending and receiving heartbeats
· monitoring and transporting network traffic over multiple network links to every active system within the cluster
· load-balancing traffic over multiple links
· maintaining the state of communication
· providing a nonroutable transport mechanism for cluster communications.

Group membership services/Atomic Broadcast (GAB)

GAB provides the following:

· Group Membership Services - GAB maintains the overall cluster membership by way of its Group Membership Services function. Heartbeats are used to determine whether a system is an active member, joining, or leaving the cluster. GAB determines the position of each system within the cluster.

· Atomic Broadcast - Cluster configuration and status information is distributed dynamically to all systems within the cluster using GAB's Atomic Broadcast feature. Atomic Broadcast ensures that all active systems receive all messages, for every resource and service group in the cluster. Atomic means that either every system receives an update or, if any system fails to receive it, the change is rolled back on all systems.

High Availability Daemon (HAD)

HAD tracks all changes within the cluster configuration and resource status by communicating with GAB. Think of HAD as the manager of the resource agents. A companion daemon called hashadow monitors HAD, and if HAD fails, hashadow attempts to restart it. Likewise, if hashadow dies, HAD will restart it. HAD maintains the cluster state information. HAD uses the main.cf file to build the cluster information in memory and is also responsible for updating the configuration in memory.
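
A quick way to confirm that both daemons are running (a sketch; the binaries normally live under /opt/VRTSvcs/bin, but verify the paths on your release):

# ps -ef | egrep 'had|hashadow' | grep -v grep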

VCS architecture

Putting the above together:

· Agents monitor resources on each system and provide status to HAD on the local system
· HAD on each system sends status information to GAB
· GAB broadcasts configuration information to all cluster members
· LLT transports all cluster communications to all cluster nodes
· HAD on each node takes corrective action, such as failover, when necessary

Service Groups


There are three types of service groups:
· Failover - The service group runs on one system at any one time.
· Parallel - The service group can run simultaneously on more than one system at any time.
· Hybrid - A hybrid service group is a combination of a failover service group and a parallel service group used in VCS 4.0 replicated data clusters, which are based on Veritas Volume Replicator.

When a service group appears to be suspended while being brought online you can flush the service group to enable corrective action. Flushing a service group stops VCS from attempting to bring resources online or take them offline and clears any internal wait states.
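
For example, if a group hangs while coming online on one node, you could flush it there and then retry (a sketch reusing the mohisg group and sys11 node from the exercise above):

# hagrp -flush mohisg -sys sys11
# hagrp -online mohisg -sys sys11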

Resources

Resources are objects that correspond to hardware or software entities. VCS controls resources through these actions:
· Bringing resource online (starting)
· Taking resource offline (stopping)
· Monitoring a resource (probing)

When you link a parent resource to a child resource, the dependency becomes a component of the service group configuration. You can view the dependencies at the bottom of the main.cf file.
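
In main.cf these links appear as "requires" lines at the end of the group definition, for example (hypothetical resource names):

        app_ip requires app_nic
        app_mount requires app_vol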

Proxy Resource

A proxy resource allows multiple service groups to monitor the same network interface. This reduces the network traffic that would result from having multiple NIC resources in different service groups monitoring the same interface.

Example for Proxy Resource:
Proxy PreProd_proxy (
        Critical = 0
        TargetResName = PreProd_MultiNICB
        )

Phantom Resource

The phantom resource is used to report the actual status of a service group that consists of only persistent resources. A service group shows an online status only when all of its nonpersistent resources are online. Therefore, if a service group has only persistent resources (network interface), VCS considers the group offline, even if the persistent resources are running properly. By adding a phantom resource, the status of the service group is shown as online.

Example for Phantom:
Phantom Phantom_NIC (
        )

Configuration
VCS configuration is fairly simple. The three configurations to worry about are LLT, GAB, and VCS resources.

LLT
LLT configuration requires two files: /etc/llttab and /etc/llthosts. llttab contains information on node-id, cluster membership, and heartbeat links. It should look like this:
# llttab -- low-latency transport configuration file

# this sets our node ID, must be unique in cluster
set-node 0

# set the heartbeat links
link hme1 /dev/hme:1 - ether - -
# link-lowpri is for public networks
link-lowpri hme0 /dev/hme:0 - ether - -

# set cluster number, must be unique
set-cluster 0

start
The "link" directive should only be used for private links. "link-lowpri" is better suited to public networks used for heartbeats, as it uses less bandwidth. VCS requires at least two heartbeat signals (although one of these can be a communication disk) to function without complaints.
The "set-cluster" directive tells LLT which cluster to listen to. The llttab needs to end in "start" to tell LLT to actually run.
The second file is /etc/llthosts. This file is just like /etc/hosts, except that instead of mapping IP addresses to hostnames, it maps LLT node numbers (as set with set-node) to hostnames. You need this file for VCS to start. It should look like this:
0       daldev05
1       daldev06

GAB
GAB requires only one configuration file, /etc/gabtab. This file lists the number of nodes in the cluster and also, if there are any communication disks in the system, configuration for them. Ex:
/sbin/gabconfig -c -n2
tells GAB to start and expect 2 hosts in the cluster. To specify VCS communication disks:
/sbin/gabdisk -a /dev/dsk/cXtXdXs2 -s 16 -p a
/sbin/gabdisk -a /dev/dsk/cXtXdXs2 -s 144 -p h
/sbin/gabdisk -a /dev/dsk/cYtYdYs2 -s 16 -p a
/sbin/gabdisk -a /dev/dsk/cYtYdYs2 -s 144 -p h
-a specifies the disk, -s specifies the start block for each communication region, and -p specifies the port to use, "a" being the GAB seed port and "h" the VCS port. The ports are the same as the network ports used by LLT and GAB, but are simulated on a disk.

VCS
The VCS configuration file(s) are in /etc/VRTSvcs/conf/config. The two most important files are main.cf and types.cf. I like to set $VCSCONF to that directory to make my life easier. main.cf contains the actual VCS configuration for Clusters, Groups, and Resources, while types.cf contains C-like prototypes for each possible Resource.
The VCS configuration language is very similar to C, but all you are doing is defining variables. Comments are "//" (if you try to use #'s, you'll be unhappy with the result), and you can use "include" statements if you want to break up your configuration to make it more readable. One file you must include is types.cf.
In main.cf, you need to specify a Cluster definition:
cluster iMS ( )
You can specify variables within this cluster definition, but for the most part, the defaults are acceptable. Cluster variables include maximum number of groups per cluster, link monitoring, log size, maximum number of resources, maximum number of types, and a list of user names for the GUI that you will never use and shouldn't install.
You then need to specify the systems in the cluster:
system daldev05 ( )
system daldev06 ( )
These systems must be in /etc/llthosts for VCS to start.
You can also specify SNMP settings for VCS:
snmp vcs (
        Enabled = 1
        IPAddr = 0.0.0.0
        TrapList = { 1 = "A new system has joined the VCS Cluster",
            2 = "An existing system has changed its state",
            3 = "A service group has changed its state",
            4 = "One or more heartbeat links has gone down",
            5 = "An HA service has done a manual restart",
            6 = "An HA service has been manually idled",
            7 = "An HA service has been successfully started" }
        )
IPAddr is the IP address of the trap listener. Enabled defaults to 0, so you need to include this if you want VCS to send traps. You can also specify a list of numerical traps; listed above are the VCS default traps.
Each cluster can have multiple Service Group definitions. The most basic Service Group looks like this:
group iMS5a (
        SystemList = { daldev05, daldev06 }
        AutoStartList = { daldev05 }
        )
You can also set the following variables (not a complete list):
·    FailOverPolicy - you can set which policy is used to determine which system to fail over to, choose from Priority (numerically based on node-id), Load (system with the lowest system load gets failover), or RoundRobin (system with the least number of active services is chosen).
·    ManualOps - whether VCS allows manual (CLI) operation on this Group
·    Parallel - indicates whether the service group is parallel or failover
Inside each Service Group you need to define Resources. These are the nuts and bolts of VCS. A full description of the bundled Resources can be found in the Install Guide and a full description of the configuration language can be found in the User's Guide.
Here are a couple of Resource examples:
    NIC networka (
        Device = hme0
        NetworkType = ether
        )

    IP logical_IPa (
        Device = hme0
        Address = "10.10.30.156"
        )
The first line begins with a Resource type (e.g. NIC or IP) and then a globally unique name for that particular resource. Inside the paren block, you can set the variables for each resource.
Once you have set up resources, you need to build a resource dependency tree for the group. The syntax is "parent_resource requires child_resource." A dependency tree for the above resources would look like this:
logical_IPa requires networka
The dependency tree tells VCS which resources need to be started before other resources can be activated. In this case, VCS knows that the NIC hme0 has to be working before resource logical_IPa can be started. This works well with things like volumes and volume groups; without a dependency tree, VCS could try to mount a volume before importing the disk group. VCS deactivates all VCS-controlled resources when it shuts down, so all virtual interfaces (resource type IP) are unplumbed and volumes are unmounted/exported at VCS shutdown.
Once the configuration is built, you can verify it by running /opt/VRTSvcs/bin/hacf -verify /etc/VRTSvcs/conf/config, and then you can start VCS by running /opt/VRTSvcs/bin/hastart.

Commands and Tasks
Here are some important commands in VCS. They are in /opt/VRTSvcs/bin unless otherwise noted. It's a good idea to set your PATH to include that directory.
Manpages for these commands are all installed in /opt/VRTS/man.
·    hastart starts VCS using the current seeded configuration.
·    hastop stops VCS. -all stops it on all VCS nodes in the cluster, -force keeps the service groups up but stops VCS, -local stops VCS on the current node, and -sys systemname stops VCS on a remote system.
·    hastatus shows VCS status for all nodes, groups, and resources. It waits for new VCS status, so it runs forever unless you run it with the -summary option.
·    /sbin/lltstat shows network statistics (for only the local host) much like netstat -s. Using the -nvv option shows detailed information on all hosts on the network segment, even if they aren't members of the cluster.
·    /sbin/gabconfig sets the GAB configuration just like in /etc/gabtab. /sbin/gabconfig -a shows the current GAB port status. Output should look like this:
daldev05 # /sbin/gabconfig -a
GAB Port Memberships
===============================================================
Port a gen f6c90005 membership 01                             
Port h gen 3aab0005 membership 01                             
The last digits in each line are the node IDs of the cluster members. Any mention of "jeopardy" ports means there's a problem with that node in the cluster.
·    haclus displays information about the VCS cluster. It's not particularly useful because there are other, more detailed tools you can use.
·    hasys controls information about VCS systems. hasys -display shows each host in the cluster and its current status. You can also use it to add, delete, or modify systems in the cluster.
·    hagrp controls Service Groups. It can offline, online (or swing) groups from host to host. This is one of the most useful VCS tools.
·    hares controls Resources. This is the finest granular tool for VCS, as it can add, remove, or modify individual resources and resource attributes.
Here are some useful things you can do with VCS:
Activate VCS: run "hastart" on one system. All members of the cluster will use the seeded configuration. All the resources come up.
Swing a whole Group administratively:
Assuming the system you're running GroupA on is sysa, and you want to swing it to sysb
hagrp -switch GroupA -to sysb
Turn off a particular resource (say, ResourceA on sysa):
hares -offline ResourceA -sys sysa
In a failover Group, you can only online a resource on the system on which the group is online, so if ResourceA is a member of GroupA, you can only bring ResourceA online on the system that is running GroupA. To online a resource:
hares -online ResourceA -sys sysa
If you get a fault on any resource or group, you need to clear the Fault on a system before you can bring that resource/group up on it. To clear faults:
hagrp -clear GroupA
hares -clear ResourceA

Caveats
Here are some tricks for VCS:
VCS likes to have complete control of all its resources. It brings up all its own virtual interfaces, so don't bother to do that in your init scripts. VCS also likes to have complete control of all the Veritas volumes and groups, so you shouldn't mount them at boot. VCS will fail to mount a volume unless it is responsible for importing the Volume Group; if you import the VG and then start VCS, it will fail after about 5 minutes and drop the volume without cleaning the FS. So make sure all VCS-controlled VG's are exported before starting VCS.
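
In practice that means unmounting and deporting any VCS-controlled disk groups before starting the engine, roughly like this (a sketch with a hypothetical disk group and mount point):

# umount /appfs        (unmount any file systems built on the disk group)
# vxdg deport appdg    (deport the disk group so VCS can import it itself)
# hastart
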
Resource and Group names have no scope in VCS, so each must be a unique identifier or VCS will fail to load your new configuration. There is no equivalent to perl's my or local. VCS is also very case sensitive, so all Types, Groups, Resources, and Systems must be the same every time. To make matters worse, most of the VCS bundled types use random capitalization to try to fool you. Copy and paste is your friend.
Make sure to create your resource dependency tree before you start VCS, or you will cause problems for the whole cluster.
The default time-out for LLT/GAB communication is 15 seconds. If VCS detects that a system has been down on all communication channels for 15 seconds, it fails all of that system's service groups over to another system.
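
If you need a different peer-inactivity window, LLT exposes it as a timer in /etc/llttab; the value is in hundredths of a second. Treat the exact default and syntax on your release as something to verify, but the directive looks roughly like this (a sketch):

set-timer peerinact:1600
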
If you use Veritas VM, VCS can't manage volumes in rootdg, so what I do is encapsulate the root disk into rootdg and create new volumes in their own VCS-managed VGs. Don't put VCS and non-VCS volumes in the same VG.
Don't let VCS manage non-virtual interfaces. I did this in testing: if you fail a real interface, VCS will unplumb it and fail it over as a virtual interface on the failover system; then when you try to swing it back, it will fail.
Notes on how the configuration is loaded
Because VCS doesn't have any determination of primary/slave for the cluster, VCS needs to determine who has the valid configuration for the cluster. As far as I can tell (because of course it's not documented), this is how it works: When VCS starts, GAB waits a predetermined timeout for the number of systems in /etc/gabtab to join the cluster. At this point, all the systems in the cluster compare local configurations, and the system with the newest config tries to load it. If it's invalid, it pulls down the second newest valid config. If it is valid, all the systems in VCS load that config.
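
If every node ends up with a stale configuration (for example after an unclean shutdown), the usual recovery is to verify the on-disk main.cf on the node holding the good copy and then force the engine to build from it, using the commands covered earlier (a sketch; verify the exact procedure for your VCS version):

# hacf -verify /etc/VRTSvcs/conf/config    (check main.cf syntax on the node with the good copy)
# hastart                                  (on each node; with all configs stale they wait in a stale admin-wait state)
# hasys -force daldev05                    (tell the cluster to build its configuration from daldev05's main.cf)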

VCS Intro

Veritas Cluster Server:
Veritas Cluster Server is the industry's leading cross-platform clustering solution for minimizing application downtime. Through central management tools, automated failover, features to test disaster recovery plans without disruption, and advanced failover management based on server capacity, Cluster Server allows IT managers to maximize resources by moving beyond reactive recovery to proactive management of application availability.

System Requirements:
Solaris
Solaris 9 & 10 on SPARC
Solaris 10 on x64

AIX
AIX 5.3
AIX 6.1

LINUX
Red Hat Enterprise Linux (RHEL) 5 on x86 and IBM system p
Novell SUSE Linux Enterprise Server (SLES) 10 & 11 on x86 & IBM system p
Oracle Enterprise Linux (OEL) 5 on x86

HP-UX
HP-UX 11i version 1/2/3

Windows (Requires Veritas Storage Foundation)
Windows Server 2003 SP2 (x86): Web Edition
Windows Server 2003 SP2 (x86, x64, IA64): Standard, Enterprise, Datacenter Editions
Windows Server 2003 R2 SP2 (x86, x64): Standard, Enterprise, Datacenter Editions
Windows Server 2008 SP1 or SP2 (x86, x64): Web, Standard (without Hyper-V or in guest), Enterprise (without Hyper-V or in guest), Datacenter (without Hyper-V or in guest) Editions
Windows Server 2008 for Itanium-based Systems SP1 or SP2 (IA64)
Windows Server 2008 R2 (x64): Web, Standard (without Hyper-V or in guest), Enterprise (without Hyper-V or in guest), Datacenter (without Hyper-V or in guest) Editions
Windows Server 2008 R2 for Itanium-based Systems (IA64)
Windows 7 (x86, x64): (use with SFWHA client components)
Windows Vista SP1 or SP2 (x86, x64): Ultimate, Enterprise, Business Editions (use with SFWHA client components)
Windows XP SP2 or SP3 (x86, x64): (use with SFWHA client components)
Veritas Cluster Server for Windows (Standalone)
Windows Server 2003 SP2 (x86, x64, IA64): Standard, Enterprise, Datacenter Editions
Windows Server 2003 R2 SP2 (x86, x64): Standard, Enterprise, Datacenter Editions
Windows 7 (x86, x64): (use with SFWHA client components)
Windows Vista SP1 or SP2 (x86, x64): Ultimate, Enterprise, Business Editions (use with SFWHA client components)
Windows XP SP2 or SP3 (x86, x64): (use with SFWHA client components)

VMware
VMware ESX 3.0, 3.0.1, 3.0.2, 3.5, Virtual Center