VMware Kernel Return Codes

From time to time I find myself combing through VMware log files in an effort to diagnose various failure events. This is certainly not my favorite task, so to make the process a little less painful I decided to extract the vmkernel return codes from the VMware open source libraries and place an easily accessible tabled version of them on my blog. While it's not a very interesting blog entry it does have a useful purpose. I promise to get back to some more interesting entries soon. You can also use the console command vmkerrcode -l if it's handy.

This posted list is only for ESX 3.5.x; vSphere has different error codes and they do not map to ESX 3.5.x.
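
As a quick example of how I typically use the table, a rough way to pull the raw status codes out of the vmkernel log on the service console and tally them before looking them up below (or in the vmkerrcode -l output) is something like the following; the log path is the usual ESX 3.5 service console location and the grep pattern is just a sketch:

grep -o '0xbad00[0-9a-f][0-9a-f]' /var/log/vmkernel | sort | uniq -c
vmkerrcode -l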

Regards,

Mike

 

0         Success
0xbad0001 Failure
0xbad0002 Would block
0xbad0003 Not found
0xbad0004 Busy
0xbad0005 Already exists
0xbad0006 Limit exceeded
0xbad0007 Bad parameter
0xbad0008 Metadata read error
0xbad0009 Metadata write error
0xbad000a I/O error
0xbad000b Read error
0xbad000c Write error
0xbad000d Invalid name
0xbad000e Invalid handle
0xbad000f No such SCSI adapter
0xbad0010 No such target on adapter
0xbad0011 No such partition on target
0xbad0012 No filesystem on the device
0xbad0013 Memory map mismatch
0xbad0014 Out of memory
0xbad0015 Out of memory (ok to retry)
0xbad0016 Out of resources
0xbad0017 No free handles
0xbad0018 Exceeded maximum number of allowed handles
0xbad0019 No free pointer blocks (deprecated)
0xbad001a No free data blocks (deprecated)
0xbad001b Corrupt RedoLog
0xbad001c Status pending
0xbad001d Status free
0xbad001e Unsupported CPU
0xbad001f Not supported
0xbad0020 Timeout
0xbad0021 Read only
0xbad0022 SCSI reservation conflict
0xbad0023 File system locked
0xbad0024 Out of slots
0xbad0025 Invalid address
0xbad0026 Not shared
0xbad0027 Page is shared
0xbad0028 Kseg pair flushed
0xbad0029 Max async I/O requests pending
0xbad002a Minor version mismatch
0xbad002b Major version mismatch
0xbad002c Already connected
0xbad002d Already disconnected
0xbad002e Already enabled
0xbad002f Already disabled
0xbad0030 Not initialized
0xbad0031 Wait interrupted
0xbad0032 Name too long
0xbad0033 VMFS volume missing physical extents
0xbad0034 NIC teaming master valid
0xbad0035 NIC teaming slave
0xbad0036 NIC teaming regular VMNIC
0xbad0037 Abort not running
0xbad0038 Not ready
0xbad0039 Checksum mismatch
0xbad003a VLan HW Acceleration not supported
0xbad003b VLan is not supported in vmkernel
0xbad003c Not a VLan handle
0xbad003d Couldn’t retrieve VLan id
0xbad003e Connection closed by remote host, possibly due to timeout
0xbad003f No connection
0xbad0040 Segment overlap
0xbad0041 Error parsing MPS Table
0xbad0042 Error parsing ACPI Table
0xbad0043 Failed to resume VM
0xbad0044 Insufficient address space for operation
0xbad0045 Bad address range
0xbad0046 Network is down
0xbad0047 Network unreachable
0xbad0048 Network dropped connection on reset
0xbad0049 Software caused connection abort
0xbad004a Connection reset by peer
0xbad004b Socket is not connected
0xbad004c Can’t send after socket shutdown
0xbad004d Too many references: can’t splice
0xbad004e Connection refused
0xbad004f Host is down
0xbad0050 No route to host
0xbad0051 Address already in use
0xbad0052 Broken pipe
0xbad0053 Not a directory
0xbad0054 Is a directory
0xbad0055 Directory not empty
0xbad0056 Not implemented
0xbad0057 No signal handler
0xbad0058 Fatal signal blocked
0xbad0059 Permission denied
0xbad005a Operation not permitted
0xbad005b Undefined syscall
0xbad005c Result too large
0xbad005d Pkts dropped because of VLAN (support) mismatch
0xbad005e Unsafe exception frame
0xbad005f Necessary module isn’t loaded
0xbad0060 No dead world by that name
0xbad0061 No cartel by that name
0xbad0062 Is a symbolic link
0xbad0063 Cross-device link
0xbad0064 Not a socket
0xbad0065 Illegal seek
0xbad0066 Unsupported address family
0xbad0067 Already connected
0xbad0068 World is marked for death
0xbad0069 No valid scheduler cell assignment
0xbad006a Invalid cpu min
0xbad006b Invalid cpu minLimit
0xbad006c Invalid cpu max
0xbad006d Invalid cpu shares
0xbad006e Cpu min outside valid range
0xbad006f Cpu minLimit outside valid range
0xbad0070 Cpu max outside valid range
0xbad0071 Cpu min exceeds minLimit
0xbad0072 Cpu min exceeds max
0xbad0073 Cpu minLimit less than cpu already reserved by children
0xbad0074 Cpu max less than cpu already reserved by children
0xbad0075 Admission check failed for cpu resource
0xbad0076 Invalid memory min
0xbad0077 Invalid memory minLimit
0xbad0078 Invalid memory max
0xbad0079 Memory min outside valid range
0xbad007a Memory minLimit outside valid range
0xbad007b Memory max outside valid range
0xbad007c Memory min exceeds minLimit
0xbad007d Memory min exceeds max
0xbad007e Memory minLimit less than memory already reserved by children
0xbad007f Memory max less than memory already reserved by children
0xbad0080 Admission check failed for memory resource
0xbad0081 No swap file
0xbad0082 Bad parameter count
0xbad0083 Bad parameter type
0xbad0084 Dueling unmaps (ok to retry)
0xbad0085 Inappropriate ioctl for device
0xbad0086 Mmap changed under page fault (ok to retry)
0xbad0087 Operation now in progress
0xbad0088 Address temporarily unmapped
0xbad0089 Invalid buddy type
0xbad008a Large page info not found
0xbad008b Invalid large page info
0xbad008c SCSI LUN is in snapshot state
0xbad008d SCSI LUN is in transition
0xbad008e Transaction ran out of lock space or log space
0xbad008f Lock was not free
0xbad0090 Exceed maximum number of files on the filesystem
0xbad0091 Migration determined a failure by the VMX
0xbad0092 VSI GetList handler overflow
0xbad0093 Invalid world
0xbad0094 Invalid vmm
0xbad0095 Invalid transaction
0xbad0096 Transient file system condition, suggest retry
0xbad0097 Number of running VCPUs limit exceeded
0xbad0098 Invalid metadata
0xbad0099 Invalid page number
0xbad009a Not in executable format
0xbad009b Unable to connect to NFS server
0xbad009c The NFS server does not support MOUNT version 3 over TCP
0xbad009d The NFS server does not support NFS version 3 over TCP
0xbad009e The mount request was denied by the NFS server. Check that the export exists and that the client is permitted to mount it
0xbad009f The specified mount path was not a directory
0xbad00a0 Unable to query remote mount point’s attributes
0xbad00a1 NFS has reached the maximum number of supported volumes
0xbad00a2 Out of nice memory
0xbad00a3 VMotion failed to start due to lack of cpu or memory resources
0xbad00a4 Cache miss
0xbad00a5 Error induced when stress options are enabled
0xbad00a6 Maximum number of concurrent hosts are already accessing this resource
0xbad00a7 Host doesn’t have a journal
0xbad00a8 Lock rank violation detected
0xbad00a9 Module failed
0xbad00aa Unable to open slave if no master pty
0xbad00ab Not IOAble
0xbad00ac No free inodes
0xbad00ad No free memory for file data
0xbad00ae No free space to expand file or meta data
0xbad00af Unable to open writer if no fifo reader
0xbad00b0 No underlying device for major,minor
0xbad00b1 Memory min exceeds memSize
0xbad00b2 No virtual terminal for number
0xbad00b3 Too many elements for list
0xbad00b4 VMM<->VMK shared area mismatch
0xbad00b5 Failure during exec while original state already lost
0xbad00b6 vmnixmod kernel module not loaded
0xbad00b7 Invalid module
0xbad00b8 Address is not aligned on page boundary
0xbad00b9 Address is not mapped in address space
0xbad00ba No space to record a message
0xbad00bb No space left on PDI stack
0xbad00bc Invalid exception handler
0xbad00bd Exception not handled by exception handler
0xbad00be Can’t open sparse/TBZ files in multiwriter mode
0xbad00bf Transient storage condition, suggest retry
0xbad00c0 Storage initiator error
0xbad00c1 Timer initialization failed
0xbad00c2 Module not found
0xbad00c3 Socket not owned by cartel
0xbad00c4 No VSI handler found for the requested node
0xbad00c5 Invalid mmap protection flags
0xbad00c6 Invalid chunk size for contiguous mmap
0xbad00c7 Invalid MPN max for contiguous mmap
0xbad00c8 Invalid mmap flag on contiguous mmap
0xbad00c9 Unexpected fault on pre-faulted memory region
0xbad00ca Memory region cannot be split (remap/unmap)
0xbad00cb Cache Information not available
0xbad00cc Cannot remap pinned memory
0xbad00cd No cartel group by that name
0xbad00ce SPLock stats collection disabled
0xbad00cf Boot image is corrupted
0xbad00d0 Branched file cannot be modified
0xbad00d1 Name is reserved for branched file
0xbad00d2 Unlinked file cannot be branched
0xbad00d3 Maximum kernel-level retries exceeded
0xbad00d4 Optimistic lock acquired by another host
0xbad00d5 Object cannot be mmapped
0xbad00d6 Invalid cpu affinity
0xbad00d7 Device does not contain a logical volume
0xbad00d8 No space left on device
0xbad00d9 Invalid vsi node ID
0xbad00da Too many users accessing this resource
0xbad00db Operation already in progress
0xbad00dc Buffer too small to complete the operation
0xbad00dd Snapshot device disallowed
0xbad00de LVM device unreachable
0xbad00df Invalid cpu resource units
0xbad00e0 Invalid memory resource units
0xbad00e1 IO was aborted
0xbad00e2 Memory min less than memory already reserved by children
0xbad00e3 Memory min less than memory required to support current consumption
0xbad00e4 Memory max less than memory required to support current consumption
0xbad00e5 Timeout (ok to retry)
0xbad00e6 Reservation Lost
0xbad00e7 Cached metadata is stale
0xbad00e8 No fcntl lock slot left
0xbad00e9 No fcntl lock holder slot left
0xbad00ea Not licensed to access VMFS volumes
0xbad00eb Transient LVM device condition, suggest retry
0xbad00ec Snapshot LV incomplete
0xbad00ed Medium not found
0xbad00ee Maximum allowed SCSI paths have already been claimed
0xbad00ef Filesystem is not mountable
0xbad00f0 Memory size exceeds memSizeLimit
0xbad00f1 Disk lock acquired earlier, lost
0x2bad0000 Generic service console error

Protecting ESX VMFS Stores with Automation

Some time ago I shared some interesting information about VMFS volumes that I found using direct analysis in my blog entry named Understanding VMFS volumes. This spawned some discussions on the VMware Community forums, and it became apparent that an automated backup of the critical VMFS info could be useful in the event of an undesirable security event that impacts system availability. By creating a simple backup script process we gain the ability to recover much more quickly from such events. In this howto guide we will enable this process with a cron job using the existing /etc/cron.daily/ job location directory. We simply need to copy an automation script to this location and it will run daily. Or if your change rate is less frequent, maybe the /etc/cron.weekly location is more suitable.

I have provided a simple script named vmfs-bu, linked here, which creates VMFS header file backups using dd for all the VMFS volumes visible on an ESX host. The script will output its backups to the /var/log directory. Here is a sample of what the backups look like from one of my lab hosts.

-rw-r--r--    1 root     root      2097152 Mar 28 21:01 vmfs-header-backup-sda.hex
-rw-r--r--    1 root     root      2097152 Mar 28 21:01 vmfs-header-backup-sdb.hex
-rw-r--r--    1 root     root      2097152 Mar 28 21:01 vmfs-header-backup-sdc.hex
-rw-r--r--    1 root     root      2097152 Mar 28 21:01 vmfs-header-backup-sdd.hex
-r--------    1 root     root      8388608 Mar 28 21:01 vmfs-metadata-sda-vh.sf.bu
-r--------    1 root     root      8388608 Mar 28 21:01 vmfs-metadata-sdb-vh.sf.bu
-r--------    1 root     root      8388608 Mar 28 21:01 vmfs-metadata-sdc-vh.sf.bu
-r--------    1 root     root      8388608 Mar 28 21:01 vmfs-metadata-sdd-vh.sf.bu
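
If you would rather roll your own than grab the linked script, a minimal sketch of the same idea is shown below. It assumes the service console sees its SCSI LUNs as /dev/sd* block devices and only captures the 2MB volume header area; the linked vmfs-bu script also backs up the .vh.sf metadata files keyed by device name, which the sketch omits for brevity.

#!/bin/bash
# Minimal sketch of a VMFS header backup script - place in /etc/cron.daily/ (e.g. as vmfs-bu)
# Assumes the VMFS LUNs are visible as /dev/sd* block devices on the service console.
for dev in /dev/sd[a-z]; do
    name=`basename $dev`
    # capture the first 2MB of the device, which covers the VMFS volume header region
    dd if=$dev of=/var/log/vmfs-header-backup-$name.hex bs=1M count=2 2>/dev/null
done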

Also be sure that the script permissions are configured as executable using the following.

chmod 755 /etc/cron.daily/vmfs-bu

With these simple easy steps we can create a proactive failsafe for VMFS recovery in the event of those undesirable moments we hope will never occur.

In addition to this basic method we can extend its functionality to recovery from undesirable VM deletions by automating the experimental vmfs-undelete tool. Before I demonstrate this functionality I must advise you that this tool may have some negative impacts on certain configurations. The vmfs-undelete tool is a set of python scripts that basically read the vmdks' block allocations and output them to an archive file. In order to read the vmdks the scripts will invoke VM snapshots, and this will not work if a current snapshot exists. This is not a big issue since you should not be leaving any snapshots around anyway as a best practice. However, this action can negatively impact VCB or other backup agents that require snapshotting to grant vmdk access. Thus if you do not carefully time the script invocation around other snapshot dependent activities it could cause failures of those functions.

With this important information considered we can proceed to look at how to invoke the additional protection feature. The scripts that were originally developed by VMware are designed to be user interactive and cannot be used as originally coded, therefore I have modified them in order to provision some basic automation. You can access the modified scripts named vmfs-undelete-auto-script and menuauto.py here.

The modified menuauto.py script needs to be placed within the /usr/lib/vmware/python2.2/site-packages/vmware/undeletemods directory. While I could have just modified the existing menu.py script, it is subject to change, so this method prevents potential conflict issues. The vmfs-undelete-auto-script script location is optional and it can be placed wherever you find appropriate. I chose to place it in the /root directory. The script requires a single argument which directs its output location with a path and file name. Since there is a potential for conflict with other snapshot based services the script should be invoked using a cron job outside of the daily or other predefined jobs. This cron job can be implemented using the crontab facility. Here is an example of how to create it while logged in as root.

crontab -e

This command will create a cron entry in /var/spool/cron/root and will load vi as an editor. You will need to create the following cron details to invoke the vmfs-undelete script. You can find more info on cron editing at http://en.wikipedia.org/wiki/Cron.

05 22 * * * python /root/vmfs-undelete-auto-script /var/log/vmfs-undelete-archive

This entry will execute at 10:05 PM every day. Again, be sure to set the security of the script file as executable as shown here.

chmod 755 /root/vmfs-undelete-auto-script

When the script executes it can take several minutes to complete.

Hope you found this information to be useful.

Regards,

Mike

Awarded VMware vExpert Status

While I was away at the OpenSolaris Storage Summit in San Francisco a really nice email appeared in my inbox which put a grateful smile on my face. John Troyer and his team felt that I was worthy of the vExpert award. I'm honoured to receive the recognition, and thank you to the judging panel members.

Congrats to all the other recipients as well.

Multi Protocol Storage Provisioning with COMSTAR

 COMSTAR is a new breed of open source storage product available to the world. What was traditionally a closed and proprietary storage capability is now available to our open source communities. With OpenSolaris and COMSTAR the ability to freely provision virtual storage services over very mature high end protocols on standard commodity server hardware is now a reality. High performance transports are integral within the feature sets of COMSTAR and Sun’s open source portfolio of projects. The COMSTAR product is revolutionary in its method of provisioning storage virtualization and transport services to storage resource consumers.
COMSTAR provisions virtualized SCSI block storage over multiple SCSI transport protocols. While this function class is not new to us the ease of implementation using COMSTAR certainly is. All the complexities of using a multi protocol target services platform are cleaned up. It is simple to use and facilitates advanced high performance storage provisioning at the block level.
The services within this product have multiple common storage provisioning applications. One very interesting application is a storage gateway server, and this blog demonstrates how to build a Fibre Channel (FC) storage gateway using the COMSTAR service layers, as well as how to provision some additional features using the target services.

 

COMSTAR FC Gateway Architecture by Mike La Spina

In this example instance we are re-provisioning an existing storage system with an OpenSolaris COMSTAR configuration running on a commodity white box, which functions as a storage server head that can compress, scrub, thin provision, replicate, snapshot and clone the existing block storage attachments. The example FC based storage could also be a JBOD FC array directly attached to the OpenSolaris storage head if we so desired, or any of many other commonly available SCSI attachment methods. The objective here is to extend and enhance any block storage system with high performance transports and virtualization features. Of course we could also formalize the white box to an industrial strength host once we are satisfied that the proof of concept is mature and optimal.

The reality is that many older existing FC storage systems are installed without these features, primarily due to their excessive licensing costs. And even when these features are available, their use is probably restricted to like proprietary systems, thus obsolescing the entire lot of any useful future functionality. But what if you could re-purpose an older storage system to act as a DR store, a backup cache system or maybe a test and development environment? With today's economy this is, from a cost perspective, very attractive and can be accomplished with very little risk on the investment side.

One of the possible applications for this flexible storage service is the re-provisioning of existing LUN's from an existing system to newer more flexible SCSI transport protocols. This is particularly useful when we need to re-target the existing storage system from FC to iSCSI or the like. We can begin by exploring this functionality and explain how COMSTAR can provide us with this service.

First we need to understand the high level functionality of the COMSTAR service layers. Virtual LUN’s on COMSTAR are provisioned with a service layer named the LU provider. This layer maps backing stores of various types to a storage GUID assignment and additionally defines other properties like the LUN ID and size dimensions. This layer allows us to carve out the available block storage devices that are accessible on our OpenSolaris storage host. For example if we attached an FC Initiator to an external storage system we can then map the accessible SCSI block devices to the LU provider layer and then present this virtualized LUN to the other COMSTAR service layers for further processing.

Once we have defined the LU’s we can present this storage resource to the SCSI Target Mode Framework Service (STMF) layer which acts as the storage gate keeper. At this layer we define which clients (initiators) can connect to the LU’s based on Membership of Target Groups and Host Groups that are assigned logical views of the LU(s). The STMF layer routes the defined LU(s) as SCSI targets over a multiprotocol interface connection pool to a Port Provider. Port Providers are the protocol connection service instances which can be the likes of FC, iSCSI, SAS, iSER, FCoE and so on. 

With these COMSTAR basics in mind let us begin by diving into some of the details of how this can be applied.  

Sun has detailed how to set up COMSTAR at dlc.sun.com so there is no need to re-invent the wheel here.

Just as a note SXCE snv_103 and up integrate the COMSTAR FC and iSCSI port provider code. With the COMSTAR software components and FC target setup we can demonstrate the re-provisioning of an existing FC based storage server. Since I don’t have the luxury of having a proprietary storage server at home I will emulate this storage using an additional COMSTAR white box to act as the FC storage target to be re-provisioned.

On the existing FC target system we need to create Raid0 arrays of three disks each, which will total up to a set of six trios. We will use these six non-fault tolerant disk groups as vdevs for a ZFS raidz2 group. This will allow us to create fault tolerant arrays from the existing storage server. The reasons for the sets of three Raid0 groupings are to reduce the possibility of reaching the LUN maximums of the proprietary storage system, and to avoid eroding performance by layering Raid5 groups. As well we can tolerate a disk failure in two of the trios since we have raidz2 across the Raid0 trio groups. Additionally using these Raid0 disk groups actually lowers the array failure probability rate. For example if a second disk were to fail in a single Raid0 set there would be no additional impact to the other trios, thus reducing the overall failure rate.

To create the emulated FC storage system I have defined the following 16G ZFS sparse volumes respectively named trio1 through trio6 each as a representation of the 3 disk Raid0 spanned LUN on a source storage host named ss1. 

root@ss1:~# zfs create sp1/gw
root@ss1:~# zfs create -s -V 16G sp1/gw/trio1
root@ss1:~# zfs create -s -V 16G sp1/gw/trio2
root@ss1:~# zfs create -s -V 16G sp1/gw/trio3
root@ss1:~# zfs create -s -V 16G sp1/gw/trio4
root@ss1:~# zfs create -s -V 16G sp1/gw/trio5
root@ss1:~# zfs create -s -V 16G sp1/gw/trio6

Once these mockup volumes are created they are then defined as backing stores using the sbdadm utility as follows.

root@ss1:~# sbdadm create-lu /dev/zvol/rdsk/sp1/gw/trio1

Created the following LU:

              GUID                    DATA SIZE           SOURCE
--------------------------------  -------------------  ----------------
600144f01eb3862c0000494b55cd0001      17179803648      /dev/zvol/rdsk/sp1/gw/trio1

All the backing stores were added to the LU provider service layer to which in turn were assigned to the STMF service layer. Here we can see the automatically generated GUID’s that are assigned to the ZFS backing stores.

root@ss1:~# sbdadm list-lu

Found 6 LU(s)

              GUID                    DATA SIZE           SOURCE
--------------------------------  -------------------  ----------------
600144f01eb3862c0000494b56000006      17179803648      /dev/zvol/rdsk/sp1/gw/trio6
600144f01eb3862c0000494b55fd0005      17179803648      /dev/zvol/rdsk/sp1/gw/trio5
600144f01eb3862c0000494b55fa0004      17179803648      /dev/zvol/rdsk/sp1/gw/trio4
600144f01eb3862c0000494b55f80003      17179803648      /dev/zvol/rdsk/sp1/gw/trio3
600144f01eb3862c0000494b55f50002      17179803648      /dev/zvol/rdsk/sp1/gw/trio2
600144f01eb3862c0000494b55cd0001      17179803648      /dev/zvol/rdsk/sp1/gw/trio1

A host group was defined named GW1 and respectively these LU GUID’s were added to the GW1 host group as LU views assigning LUN 0 to 5.

Just as a note, the group names are case sensitive.

root@ss1:~# stmfadm create-hg GW1

Here we assign each GUID a LUN value on the GW1 host group with the -n parameter.

root@ss1:~# stmfadm add-view -h GW1 -n 0 600144F01EB3862C0000494B55CD0001
root@ss1:~# stmfadm add-view -h GW1 -n 1 600144F01EB3862C0000494B55F50002
root@ss1:~# stmfadm add-view -h GW1 -n 2 600144F01EB3862C0000494B55F80003
root@ss1:~# stmfadm add-view -h GW1 -n 3 600144F01EB3862C0000494B55FA0004
root@ss1:~# stmfadm add-view -h GW1 -n 4 600144F01EB3862C0000494B55FD0005
root@ss1:~# stmfadm add-view -h GW1 -n 5 600144F01EB3862C0000494B56000006

With the LU's now available in a host group view we can add the COMSTAR re-provisioning gateway server's FC wwn's to this host group, and the storage will become available as a resource on the re-provisioning gateway server named ss2. We need to obtain the wwn from the gateway server using the fcinfo hba-port command.
root@ss2:~# fcinfo hba-port
HBA Port WWN: 210000e08b100163
        Port Mode: Initiator
        Port ID: 10300
        OS Device Name: /dev/cfg/c8
        Manufacturer: QLogic Corp.
        Model: QLA2300
        Firmware Version: 03.03.27
        FCode/BIOS Version:  BIOS: 1.47;
        Serial Number: not available
        Driver Name: qlc
        Driver Version: 20080617-2.30
        Type: N-port
        State: online
        Supported Speeds: 1Gb 2Gb
        Current Speed: 2Gb
        Node WWN: 200000e08b100163
        NPIV Not Supported

Using the stmfadm utility we add the gateway server’s wwn address to the GW1 host group. 
root@ss1:~# stmfadm add-hg-member -g GW1 wwn.210000e08b100163

Once added to ss1 we can see that it is indeed available and online. 
root@ss1:~# stmfadm list-target -v

Target: wwn.2100001B320EFD58
    Operational Status: Online
    Provider Name     : qlt
    Alias             : qlt2,0
    Sessions          : 1
        Initiator: wwn.210000E08B100163
            Alias: :qlc1
            Logged in since: Fri Dec 19 01:47:07 2008

The cfgadm command will scan for the newly available LUN’s and now we can access the emulated (aka boat anchor) storage system using our gateway server ss2. Of course we could also set up more initiators and access it over a multipath connection.  

cfgadm -a

root@ss2:~# format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
       0. c0t600144F01EB3862C0000494B55CD0001d0 <DEFAULT cyl 2086 alt 2 hd 255 sec 63>
          /scsi_vhci/disk@g600144f01eb3862c0000494b55cd0001

       1. c0t600144F01EB3862C0000494B55F50002d0 <DEFAULT cyl 2086 alt 2 hd 255 sec 63>
          /scsi_vhci/disk@g600144f01eb3862c0000494b55f50002

       2. c0t600144F01EB3862C0000494B55F80003d0 <DEFAULT cyl 2086 alt 2 hd 255 sec 63>
          /scsi_vhci/disk@g600144f01eb3862c0000494b55f80003

       3. c0t600144F01EB3862C0000494B55FA0004d0 <DEFAULT cyl 2086 alt 2 hd 255 sec 63>
          /scsi_vhci/disk@g600144f01eb3862c0000494b55fa0004

       4. c0t600144F01EB3862C0000494B55FD0005d0 <DEFAULT cyl 2086 alt 2 hd 255 sec 63>
          /scsi_vhci/disk@g600144f01eb3862c0000494b55fd0005

       5. c0t600144F01EB3862C0000494B56000006d0 <DEFAULT cyl 2086 alt 2 hd 255 sec 63>
          /scsi_vhci/disk@g600144f01eb3862c0000494b56000006

Now that we have some FC LUN connections configured from the storage system to be re-provisioned, we can create a ZFS based pool which grants us the ability to carve out the block storage in a virtual manner. As discussed previously we will use double parity raid, a.k.a. raidz2, to provide a higher level of availability with the zpool create raidz2 option command.

root@ss2:~# zpool create gwrp1 raidz2 c0t600144F01EB3862C0000494B55CD0001d0 c0t600144F01EB3862C0000494B55F50002d0 c0t600144F01EB3862C0000494B55F80003d0 c0t600144F01EB3862C0000494B55FA0004d0 c0t600144F01EB3862C0000494B55FD0005d0 c0t600144F01EB3862C0000494B56000006d0

A quick status check reveals all is well with the ZFS pool.

root@ss2:~# zpool status gwrp1
  pool: gwrp1
 state: ONLINE
 scrub: none requested
config:

        NAME                                       STATE     READ WRITE CKSUM
        gwrp1                                      ONLINE       0     0     0
          raidz2                                   ONLINE       0     0     0
            c0t600144F01EB3862C0000494B55CD0001d0  ONLINE       0     0     0
            c0t600144F01EB3862C0000494B55F50002d0  ONLINE       0     0     0
            c0t600144F01EB3862C0000494B55F80003d0  ONLINE       0     0     0
            c0t600144F01EB3862C0000494B55FA0004d0  ONLINE       0     0     0
            c0t600144F01EB3862C0000494B55FD0005d0  ONLINE       0     0     0
            c0t600144F01EB3862C0000494B56000006d0  ONLINE       0     0     0

Let’s carve out some of this newly created pool as a 32GB sparse volume. The -p option creates the full path if it does not currently exist.


root@ss2:~# zfs create -p -s -V 32G gwrp1/stores/lun0
 

root@ss2:~# zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
gwrp1                        220K  62.6G  38.0K  /gwrp1
gwrp1/stores                67.9K  62.6G  36.0K  /gwrp1/stores
gwrp1/stores/lun0           32.0K  62.6G  32.0K  -

With a slice of the pool created we can now assign a GUID within the LU Provider layer using the sbdadm utility.

root@ss2:~# sbdadm create-lu /dev/zvol/rdsk/gwrp1/stores/lun0

Created the following LU:

              GUID                    DATA SIZE           SOURCE
--------------------------------  -------------------  ----------------
600144f07ed404000000496813070001      34359672832      /dev/zvol/rdsk/gwrp1/stores/lun0

The LU Provider layer can also provision sparse based storage. However in this case the ZFS backing store is already thin provisioned. If this were a physical disk backing store it would be prudent to use the LU Provider’s sparse/thin provisioning feature. At this point we are ready to create the STMF Host Group and View that will be used to demonstrate a real world example of the multi protocol capability with the COMSTAR OpenStorage ss2 host. In this case I will use VMware ESX as a storage consumer. To reflect the host group type we will name it ESX1 and then we need to add a view for the LU GUID of the virtualized storage.

root@ss2:~# stmfadm create-hg ESX1

root@ss2:~# stmfadm add-view -h ESX1 -n 1 600144f07ed404000000496813070001

root@ss2:~# stmfadm list-view -l 600144F07ED404000000496813070001
View Entry: 0
    Host group   : ESX1
    Target group : All
    LUN          : 1

With a view defined for the VMware hosts let’s add an ESX host FC HBA wwn membership to the defined ESX1 host group. We need to retrieve the wwn from the VMware server using either the console or a Virtual Infrastructure Client GUI. Personally I like the console esxcfg-info tool, however if it’s an ESXi host then the GUI will serve the info just as well.

 

VMware Screen shot WWN by Mike La Spina

[root@vh1 root]# esxcfg-info -s | grep 'Adapter WWN'
                     |----Adapter WWNN............................20:00:00:e0:8b:01:f7:e2

root@ss2:~# stmfadm add-hg-member -g ESX1 wwn.210000e08b01f7e2

And the result of this change, after we issue a rescan on vmhba1 and create a VMFS volume named ss2-cstar-zs0.0 with the re-provisioned storage, is reflected here.

VMware Screen shot VMFS volume by Mike La Spina

This crafted storage is now a thinly provisioned VMFS store that can deliver replication, snapshots, cloning and advanced error detection, and it can also be re-platformed to a new storage system at a later date thanks to ZFS's hardware autonomy. The storage server is very attractive as it creates a level of future proofing and insulates the storage consumers from proprietary vendor lock in. But that's not the best part of this example. Let's say you wish to provide different tiers of connectivity services for your storage consumers. For example we could attach a development or test environment using the iSCSI protocol while the more critical environments use FC or FCoE based protocols.
So let's look at how we can add a second SCSI transport protocol to this interesting configuration.
Just as a note, the new iSCSI port provider is a kernel based implementation and has superior performance to its predecessor, the iscsitgt user land implementation.

To add the iSCSI protocol we need to enable the iscsi/target port provider service.

root@ss2:~# svcadm enable iscsi/target
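
Just to confirm the port provider service actually came online, a quick check with the standard SMF status command should show it in the online state:

root@ss2:~# svcs iscsi/target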

Now we need to create an iSCSI target and an iSCSI initiator definition so that we can add the iSCSI initiator to the ESX1 host group. As well we should define a target portal group so we can control which host IP(s) will service this target.

root@ss2:~# itadm create-tpg 2 10.0.0.1

root@ss2:~# itadm create-target

root@ss2:~# itadm create-target -n iqn.1986-03.com.sun:02:ss2.0 -t 2
Target iqn.1986-03.com.sun:02:ss2.0 successfully created

By default the iqn will be created as a member of the All targets group.

If we left out the parameters the itadm utility would create an iqn GUID and use the default target portal group of 1. And yes, for those familiar with the predecessor iscsitadm utility, we can now specify an iqn name at the command line.

At this point we need to define the initiator iqn to the iSCSI port provider service and, if required, additionally secure it using CHAP. We need to retrieve the VMware initiator iqn name from either the Virtual Infrastructure Client GUI or the console command line. Just as a note, if we did not specify a host group when we defined our view, the default would allow any initiator, FC, iSCSI or otherwise, to connect to the LU; this may have a purpose but it is generally a bad practice to allow in most configurations. Once created, the initiator is added to the ESX1 host group, thus enabling our second access protocol to the same LU.

[root@vh1 root]# esxcfg-info -s | grep 'iqn'
         |----ISCSI Name........................................iqn.1998-01.com.vmware:vh1.1
         |----ISCSI Alias.......................................iqn.1998-01.com.vmware:vh1.1

root@ss2:~# itadm create-initiator iqn.1998-01.com.vmware:vh1.1

root@ss2:~# stmfadm add-hg-member -g ESX1 iqn.1998-01.com.vmware:vh1.1
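
At this point the ESX1 host group should contain both the FC wwn and the iSCSI iqn members, which we can quickly verify with stmfadm's verbose host group listing:

root@ss2:~# stmfadm list-hg -v ESX1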

After adding the ss2 iSCSI interface IP to VMware’s Software iSCSI initiator we now have a multipath multiprotocol connection to our COMSTAR storage host.

 VMware iqn example By Mike La Spina

VMware mpath example by Mike La Spina

This is simply the most functional and advanced Open Source storage product in the world today. Here we have commodity white boxes serving advanced storage protocols in my home lab; can you imagine what could be done with Data Center class server hardware and Fishworks? You can begin to see the advantages of this future proof platform. With protocols like FCoE, Infiniband and iSER (iSCSI without the TCP session overhead) already working in COMSTAR, the Sun software engineers and the OpenSolaris community are crafting outstanding storage products.
Hope you found this blog to be interesting.
Regards,

Mike

Understanding VMFS volumes

Understanding VMFS volumes is an important element within VMware ESX environments. When storage issues surface we need to correctly evaluate the VMFS volume states and apply the appropriate corrective actions to remediate undesirable storage events. VMFS architecture is not publicly available and this certainly adds to the challenge when we need to correct a volume configuration or change issue. So let's begin to look at the components of a VMFS from what I have been able to decrypt using direct analysis.

All VMFS volume partitions will have a partition ID value of fb. Running fdisk can identify any partitions that are flagged as VMFS as shown here.

[root@vh1 ]# fdisk -lu /dev/sdc

Disk /dev/sdc: 274.8 GB, 274877889536 bytes
255 heads, 63 sectors/track, 33418 cylinders, total 536870878 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdc1           128 536860169 268430021   fb  Unknown

What’s important to note here is the sector size = 512 and the starting/ending blocks.


Many VMFS volume configuration elements are visible in the /vmfs mount folder. Within this directory there are two subdirectories, the volumes directory and the devices directory. The volumes directory provisions the mount point location and the devices directory holds configuration elements. Within the devices directory there are several subdirectories, of which I can explain the disks and lvm folders; the others are not known to me outside of theory only.

A key part of a VMFS volume is its UUID (aka Universally Unique Identifier) and as the name suggests it is used to ensure uniqueness when more than one volume is in use. The UUID is generated on the initial ESX host that created the VMFS volume based on the UUID creation standards. You can determine which ESX host the initial VMFS volume was created on by referring to the last 6 bytes of the UUID. This value is the same as the last six bytes of the ESX host's system UUID found in the /etc/vmware/esx.conf file.
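
For example, a quick way to line the two values up from the service console is to pull the system UUID entry out of esx.conf and compare its trailing bytes against the volume UUIDs listed under /vmfs/volumes; this is just a rough check, not a formal procedure:

[root@vh1 root]# grep -i uuid /etc/vmware/esx.conf
[root@vh1 root]# ls -l /vmfs/volumes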

By far one of the most critical elements on a VMFS volume is the GUID. The GUID is integral within the volume because it is used to form the vml path (aka virtual multipath link). The GUID is stored within the VMFS volume header and begins at address 0x10002E.
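
Knowing the offset, we can read those GUID bytes directly with hexdump, much like the fuller header dump shown further below; here /dev/sdf1 is the same VMFS partition used in the later examples:

[root@vh1 root]# hexdump -C -s 0x10002E -n 16 /dev/sdf1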

The format of the GUID can vary based on different implementations of SCSI transport protocols but generally you will see some obvious length variances of the vml path identifiers which stem from the use of T11 and T10 Standard SCSI address formats like EUI-64, and NAA 64. Regardless of those variables there are components outside of the GUID within the vml that we should take notice of. The vml construct contains references to the LUN and partition values and these are useful to know about. The following illustrates where these elements appear in some real examples.

When we issue an ls -l from the /vmfs/devices/disks directory the following info is observed.

vmhba#:Target:LUN:Partition -> vml.??_LUN_??_GUID:Partition

                       LUN     GUID                         PARTITION
                       ^       ^                            ^
vmhba0:1:0:0  -> vml.02000000005005076719d163d844544e313436
vmhba0:1:0:1  -> vml.02000000005005076719d163d844544e313436:1
vmhba32:1:3:0 -> vml.0200030000600144f07ed404000000496ff8cd0003434f4d535441
vmhba32:1:3:1 -> vml.0200030000600144f07ed404000000496ff8cd0003434f4d535441:1

As well, issuing ls -l on the /vmfs/volumes directory lists the VMFS UUID's and the link names, which are what we see displayed in the GUI client. In this example we will follow the UUID shown below and the volume named ss2-cstar-zs0.2.

ss2-cstar-zs0.2 -> 49716cd8-ebcbbf9a-6792-000d60d46e2e

Additionally we can use esxcfg-vmhbadevs -m to list the vmhba, dev and UUID associations.

[root@vh1 ]#  esxcfg-vmhbadevs -m
vmhba0:1:0:1    /dev/sdd1                        48a3b0f3-736b896e-af8f-00025567144e
vmhba32:1:3:1   /dev/sdf1                        49716cd8-ebcbbf9a-6792-000d60d46e2e

As you can see we indeed have different GUID lengths in this example. We can also see that the vmhba device is linked to a vml construct, and this is how the kernel defines paths to a visible SCSI LUN. The vml path hosts the LUN ID, GUID and partition number information and this is also stored in the volume's VMFS header. As well the header contains a UUID signature, but this is not the VMFS UUID.

If we use hexdump as illustrated below we can see these elements in the VMFS header directly.


[root@vh1 root]# hexdump -C -s 0x100000 -n 800 /dev/sdf1
00100000  0d d0 01 c0 03 00 00 00  10 00 00 00 02 16 03 00  |                | <- LUN ID
00100010  00 06 53 55 4e 20 20 20  20 20 43 4f 4d 53 54 41  |  SUN     COMSTA| <- Target Label
00100020  52 20 20 20 20 20 20 20  20 20 31 2e 30 20 60 01  |R         1.0 ` | <- LUN GUID
00100030  44 f0 7e d4 04 00 00 00  49 6f f8 cd 00 03 43 4f  |D ~     Io    CO|
00100040  4d 53 54 41 00 00 00 00  00 00 00 00 00 00 00 00  |MSTA            |
00100050  00 00 00 00 00 00 00 00  00 00 00 02 00 00 00 fc  |                | <- Volume Size
00100060  e9 ff 18 00 00 00 01 00  00 00 8f 01 00 00 8e 01  |                |
00100070  00 00 91 01 00 00 00 00  00 00 00 00 10 01 00 00  |                |
00100080  00 00 d8 6c 71 49 b0 aa  97 9b 6c 2f 00 0d 60 d4  |   lqI    l/  ` |
00100090  6e 2e 6e 89 19 fb a6 60  04 00 a7 ce 20 fb a6 60  |n n    `       `|
001000a0  04 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |                |
001000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |                |
*
00100200  00 00 00 f0 18 00 00 00  90 01 00 00 00 00 00 00  |                |
00100210  01 00 00 00 34 39 37 31  36 63 64 38 2d 36 30 37  |    49716cd8-607| <- SEG UUID in ASCII
00100220  35 38 39 39 61 2d 61 64  31 63 2d 30 30 30 64 36  |5899a-ad1c-000d6|
00100230  30 64 34 36 65 32 65 00  00 00 00 00 00 00 00 00  |0d46e2e         |
00100240  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |                |
00100250  00 00 00 00 d8 6c 71 49  9a 89 75 60 1c ad 00 0d  |     lqI  u`    | <- SEG UUID
00100260  60 d4 6e 2e 01 00 00 00  e1 9c 19 fb a6 60 04 00  |` n          `  |
00100270  00 00 00 00 8f 01 00 00  00 00 00 00 00 00 00 00  |                |
00100280  8e 01 00 00 00 00 00 00  64 cc 20 fb a6 60 04 00  |        d    `  |
00100290  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |                |
001002a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |                |

 

In addition to the VMFS header block we have the hidden metadata files of the volume, which you can list using ls -al. The vh.sf file contains the UUID of the VMFS store and any member segment info. (I would presume the name vh stands for Volume Header … ;D)

[root@vh1 ]# hexdump -C -s 0x200000 -n 256 /vmfs/volumes/49716cd8-ebcbbf9a-6792-000d60d46e2e/.vh.sf
00200000  5e f1 ab 2f 04 00 00 00  1f d8 6c 71 49 9a bf cb  |^   /  lqI      | <- VMFS UUID
00200010  eb 92 67 00 0d 60 d4 6e  2e 02 00 00 00 73 73 32  |  g  ` n     ss2| <- Volume Name
00200020  2d 63 73 74 61 72 2d 7a  73 30 2e 32 00 00 00 00  |-cstar-zs0.2    |
00200030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |                |
*
00200090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 02 00  |                |
002000a0  00 00 00 10 00 00 00 00  00 d8 6c 71 49 01 00 00  |          lqI   |
002000b0  00 d8 6c 71 49 9a 89 75  60 1c ad 00 0d 60 d4 6e  |  lqI  u`    ` n| <- SEG UUID
002000c0  2e 01 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |                |
002000d0  00 00 00 01 00 20 00 00  00 00 00 01 00 00 00 00  |                |
002000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |                |

And of course we can not leave out the partition entry block data for the device.

hexdump -C -n 256 /dev/sdf

00000000  fa b8 00 10 8e d0 bc 00  b0 b8 00 00 8e d8 8e c0  |                |
00000010  fb be 00 7c bf 00 06 b9  00 02 f3 a4 ea 21 06 00  |   |         !  |
00000020  00 be be 07 38 04 75 0b  83 c6 10 81 fe fe 07 75  |    8 u        u|
00000030  f3 eb 16 b4 02 b0 01 bb  00 7c b2 80 8a 74 01 8b  |         |   t  |
00000040  4c 02 cd 13 ea 00 7c 00  00 eb fe 00 00 00 00 00  |L     |         |
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |                |
*
000001b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 02  |                |
000001c0  03 00 fb fe ff ff 80 00  00 00 72 ef bf 5d 00 00  |          r  ]  | Type Start End
000001d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |                |

With this detailed information it is possible to solve some common security issues with VMware stores like volume deletion and unintentional LUN ID changes.

Recently VMware added a somewhat useful command line tool named vmfs-undelete, which exports metadata to a recovery log file that can restore vmdk block addresses in the event of deletion. It's a simple tool; at present it's experimental, unsupported and not available on ESXi. The tool of course demands that you were proactive and ran its backup function in order to use it. Well, I think this falls well short of what we need here. What if you have no previous backups of the VMFS configuration? We really need to know what to look for and how to correct it, and that's exactly why I created this blog.

The volume deletion event is quite easy to fix, and that's simply because the VMFS volume header is not actually deleted. The partition block data is what gets trashed, and you can just about get away with murder when it comes to recreating that part. Within the first 128 sectors is the piece we need to fix. One method is to create a store with the same storage volume and then block copy the partition table to a file, which can then be block copied to the deleted device's partition data blocks, and this will fix the issue.

For example, we create a new VMFS store on the same storage backing with the same LUN size as the original and it shows up as a LUN with a device name of /dev/sdd. We can use esxcfg-vmhbadevs -m to find it if required.

The deleted device name was /dev/sdc

We use the dd command to block copy the partition table from the new device, either to a file or even directly to the deleted device in this case.

Remember to back it up first!

dd if=/dev/sdc of=/var/log/part-backup-sdc-1st.hex  bs=512 count=1

then issue

dd if=/dev/sdd of=/dev/sdc bs=512 count=1

or 

dd if=/dev/sdd of=/var/log/part-backup-sdd.hex bs=512 count=1

dd if=/var/log/part-backup-sdd.hex of=/dev/sdc bs=512 count=1

I personally like using a file to perform this function as it becomes a future backup element which you can move to a safe location. The file can actually be edited with other utilities, e.g. hexedit, to provide more flexibility. Additionally you could use fdisk to directly edit the partition table and provide the correct start and end addresses. This is something you should only do if you are well versed in its usage.
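
If you do go the fdisk route, the steps basically amount to recreating a single partition with the same boundaries and setting its type back to fb. A rough outline, using the example geometry from the fdisk -lu output earlier (start sector 128, end sector 536860169), would be the following; treat it as a sketch and only attempt it on a device you have backed up:

fdisk -u /dev/sdc
  n   - create a new primary partition, number 1, first sector 128, last sector 536860169
        (the boundaries reported earlier by fdisk -lu)
  t   - change the partition type to fb (the VMFS partition ID)
  w   - write the table and exit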

As an additional level of protection we could even include making backups of the vh.sf metadata file and the VMFS header.

cp /vmfs/volumes/49716cd8-ebcbbf9a-6792-000d60d46e2e/.vh.sf /var/log/vh.sf.bu

dd if=/dev/sdc of=/var/log/vmfsheader-bu-sdc.hex bs=512 count=4096

This would grant the ability for support to examine the exact details of the VMFS configuration and potentially allow recovery from more complex issues.

One of the most annoying security events is when a VMFS LUN gets changed inadvertently. If a VMFS volume LUN ID changes and is presented to an ESX host then the presented volume will be treated as a potential snapshot LUN. If this event occurs and the ESX server's advanced LVM parameter settings are at default, the ESX host will not mount the volume. This behaviour is to prevent the possibility of corruption and downing the host, since it cannot determine which VM metadata inventory is correct.

If you are aware that the LUN ID has changed then the best course of action is to re-establish the correct LUN ID at the storage server first and rescan the affected vmhba’s. This is important because if you need to resignature the VMFS volume it will also require that the VM’s be imported back into inventory. Virtual Center logging and other various settings will be lost when this action is performed. This is a result of now having an incorrect UUID between the metadata, mount location and the vmx file UUID value.

If the storage change cannot be reverted back then a VMFS resignature method is the only option for reprovisioning a VMFS volume mount.

This is invoked by setting LVM.DisallowSnapshotLun = 0 and LVM.EnableResignature = 1, and these should be reverted back once the VMFS resignature operation is complete.
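
These settings can be flipped from the VI client advanced settings or from the service console with esxcfg-advcfg, followed by a rescan of the affected vmhba; the vmhba number below is just an example, and remember to revert the values afterwards as noted above:

esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLun
esxcfg-advcfg -s 1 /LVM/EnableResignature
esxcfg-rescan vmhba1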

Regards,

Mike

A centrally based method for patching ESX3 VMWare Servers

I have updated my ESX servers manually many times and I find the process, to say the least, "annoying", so I decided to change it to an http based method with a modified patch configuration. I found that it really works well.

I did some searching prior to settling on a method and found this blog, which is quite good.

http://virtrix.blogspot.com/2007/03/vmware-autopatching-your-esx-host.html

I felt it does have some issues and I wanted to avoid running custom scripts on the server side. So here is what I built for my patch management solution.

Using the standard tools on the ESX3 server of cron jobs and esxupdate I created the following on my servers.

Define a new cron entry for running esxupdate every first Sunday of the month.
I chose to create a separate entry avoiding the esx installed ones for safety reasons. 

Edit the file /etc/crontab and add the line in bold as shown.

[root@yourserver /]nano /etc/crontab

SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/

# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly
30 3 1 * 0 root run-parts /etc/cron.esxupdate

Create a new directory for the esxupdate cron run part as follows.

[root@yourserver /]mkdir /etc/cron.esxupdate

Create a new command file for the esxupdate cron run part and add the text lines as follows.

[root@yourserver /]nano /etc/cron.esxupdate/esxpatch

esxupdate -r http://yourhttpserver.fq.dns:8088/esx3.all update

Set the file as executable using chmod

chmod 755 /etc/cron.esxupdate/esxpatch

Allow outgoing traffic from the server using the esxcfg-firewall script as follows.

[root@yourserver /]esxcfg-firewall --allowoutgoing
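
Before waiting for the first scheduled run it is worth sanity checking things from the service console. The URL below is the same hypothetical repo host defined in the cron command file, and esxupdate query simply lists what is already installed on the host:

[root@yourserver /]wget -q -O /dev/null http://yourhttpserver.fq.dns:8088/esx3.all/headers/header.info && echo repository reachable
[root@yourserver /]esxupdate query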

Create an http server to host the patch repository.
I built an IIS server on my Virtual Center 3 server on port 8088 as follows

Install IIS (Be sure to secure it).

Create a patch folder named C:\VMWare\Patches and set your IIS home directory as follows.

 IIS example

 
You will need to add the following content types from a right mouse click-> properties of the IIS server definition within the Computer Manager mmc as shown.

Content types .hdr and .info need to be present.

 IIS Mime Example

 
Download the patches you require and extract them to c:\vmware\patches in the usual manner for repo based deployment. For example I downloaded the available esx-upgrade-from-esx3-3.0.2-61618.tar.gz upgrade file and extracted it to c:\vmware\patches which leaves a directory named 61818 in the http distro folder.

Rename this directory to esx3.all

Now this is where things get interesting, as we are going to customize this patch package to include the latest additional patches that are available.

VMWare is using the installation and update functionality of Redhat rpm, yum and esxupdate scripts, which means we can modify the repo source by conforming to the existing configuration rules. This configuration of the repo is based on xml tags, of which I modified three tag element types.

1) info

The descriptor xml file found in the root of the update package which contains the descriptions of various patches in the package and looks like the following.

<descriptor version="1.0">
  <vendor>VMware, Inc.</vendor>
  <product>VMware ESX Server</product>
  <release>3.0.2-62488</release>
  <releasedate>Fri Nov 2 05:01:31 PDT 2007</releasedate>
  <summary>3.0.2 Update 1 of VMware ESX Server</summary>
  <description>This is 3.0.2 Update 1 rollup to Nov 2nd,2007 of VMware ESX Server.
It contains:
3.0.2-52542  Full 3.0.2 release of VMware ESX Server
ESX-1001724  Security bugs fixed in vmx rpm.
ESX-1001735  To update tzdata rpm.

ESX-1002424  VMotion RARP broadcast to multiple vmnic.
ESX-1002425  VMware-hostd-esx 3.0.2-62488
ESX-1002429  Path failback issue with EMC iSCSI array.
</description>

This xml file is used by the esxupdate script and can be modified to support additional patches. The parts we need to add are some descriptor text element data so that we know what it covers. I added ESX-1002424, 25 and 29 to the descriptor tag and edited the heading's date to reflect its current value.

2) rpms

The second element type I changed was the rpmlist tag. It looks like the following which is only a partial.

  <rpmlist>
    <rpm arch="i386" rel="8.37.15" ver="2.4">kernel-utils</rpm>
    <rpm arch="i686" rel="47.0.1.EL.62488" ver="2.4.21">kernel-vmnix</rpm>
    <rpm arch="i386" rel="66" ver="1.2.7">krb5-libs</rpm>
    <rpm arch="i386" rel="11" ver="1.1.1">krbafs</rpm>
    <rpm arch="i386" rel="3vmw" ver="1.1.22.15">kudzu</rpm>
    <rpm arch="i386" rel="70RHEL3" ver="0.1">laus-libs</rpm>
    <rpm arch="i386" rel="12" ver="378">less</rpm>
    <rpm arch="i386" rel="52542" ver="3.0.2">VMware-esx-perftools</rpm>
    <rpm arch="i386" rel="52542" ver="3.0.2">VMware-esx-uwlibs</rpm>
    <rpm arch="i386" rel="62488" ver="3.0.2">VMware-esx-vmx</rpm>
    <rpm arch="i386" rel="61818" ver="3.0.2">VMware-esx-vmkernel</rpm>
    <rpm arch="i386" rel="52542" ver="3.0.2">VMware-esx-vmkctl</rpm>
    <rpm arch="i386" rel="55869" ver="3.0.2">VMware-esx-tools</rpm>
    <rpm arch="i386" rel="52542" ver="3.0.2">VMware-esx-srvrmgmt</rpm>
    <rpm arch="i386" rel="52542" ver="3.0.2">VMware-esx-scripts</rpm>
  </rpmlist>
When you extract a newer esx3 patch it will contain its own descriptor, rpmlist tags and so on, and you can update this major upgrade list to include those new elements. You need to replace the old tag info if it exists, which it usually will. For example the bold entry above for VMware-esx-vmkernel needs to be completely updated by removing the old tag values like rel="61818" and inserting the new value of rel="62488". I simply searched for the tag name VMware-esx-vmkernel, deleted the line and inserted the updated one. I will eventually write a script to do the config edits, but for now these hacks will do the trick.

3) nodeps

The third element I changed will not normally be necessary, but in this case there is a bug in this upgrade package that did not account for an rpm option value of -U, which means upgrade an rpm if present. The descriptor included a tag named nodeps which I don't think works well when an rpm of the same version already exists, so we need to remove the tag altogether for previous base installs above version esx 3.0.0.

I deleted these tags for my config.

<nodeps>
<rpm>kbd</rpm>
<rpm>nfs-utils</rpm>
</nodeps>


In addition to the descriptor xml file edits we will need to copy the new rpm files into the esx3.all folder, remove the old ones, edit the header.info file found in the headers sub folder, copy the new hdr files to the headers folder and remove the old ones in the headers folder.

For example in adding ESX-1002424 I copied

From c:\vmware\patches\esx-1002424

Files

VMware-esx-apps-3.0.2-62488.i386.rpm
VMware-esx-vmkernel-3.0.2-62488.i386.rpm
VMware-esx-vmx-3.0.2-62488.i386.rpm

To c:\vmware\patches\esx3.all

From c:\vmware\patches\esx-1002424\headers

Files

VMware-esx-apps-0-3.0.2-62488.i386.hdr
VMware-esx-vmkernel-0-3.0.2-62488.i386.hdr
VMware-esx-vmx-0-3.0.2-62488.i386.hdr

To c:\vmware\patches\esx3.all\headers

You should delete any older file versions matching the prefix portions of the newer files to avoid confusion. It will still function if you leave the old ones but it's not a best practice.

The last step is to edit the header.info file: search for the text lines matching the rpm file prefix and replace the info lines with the newer ones found in the updated patch's header.info file.

For example search for text VMware-esx-vmx in the master header.info file which looks like the following partial example.

0:openssh-3.6.1p2-33.30.13vmw.i386=openssh-3.6.1p2-33.30.13vmw.i386.rpm
0:net-snmp-libs-5.0.9-2.30E.20.i386=net-snmp-libs-5.0.9-2.30E.20.i386.rpm
0:libtermcap-2.0.8-35.i386=libtermcap-2.0.8-35.i386.rpm
0:VMware-esx-vmx-3.0.2-62488.i386=VMware-esx-vmx-3.0.2-62488.i386.rpm
0:acl-2.2.3-1.i386=acl-2.2.3-1.i386.rpm
0:dev-3.3.12.3-1.i386=dev-3.3.12.3-1.i386.rpm
0:ethtool-1.8-3.3.i386=ethtool-1.8-3.3.i386.rpm

Replace it completely with the newer header.info text entry.

I tested my configuration on several base install versions and it was nice to see that on an esx 3.0.2-61818 host it only downloaded and installed the additional patches that were added to the master repo esx3.all.

You should keep in mind that based on the cron schedule your VM's may need to go down on that host, so you must plan those outage times and gracefully shut down where applicable.

Till next time …  happy esx patching.

Regards,

Mike

Provisioning Disaster Recovery with ZFS, iSCSI and VMware

OpenSolaris, ZFS, iSCSI and VMware are a great combination for provisioning Disaster Recovery (DR) systems at exceptionally low cost. There are some fundamentally well suited features of ZFS and VMFS volumes that provide a relatively simple and very efficient recovery process for VMware hosted, non-zero RPO, crash consistent recovery based environments. In this weblog I will demonstrate this capability and provide some step by step howto's for replicating a ZFS, iSCSI and VMFS VMware based environment securely over a WAN (or whatever you may have) to a single ESXi remote server hosting a child OpenSolaris VM which provisions ZFS and iSCSI VMFS LUN's to the parent ESXi host. The concept is to wrap the DR services into a single low cost self contained DR box that can be expanded out in the event of an actual DR incident, while allowing for regular testing and validation processing without the high costs normally associated with standby DR systems. As one would expect this method becomes a very appealing solution for small to medium businesses who would normally abstain from DR provisioning activity due to the inherently high cost and complexity of DR.

The following diagram illustrates this DR system architecture.

 

 Disaster Recovery Architecture by Mike La Spina

When we have VMFS volumes backed by iSCSI based ZFS targets we gain the powerful replication capability of the ZFS send and receive commands. This ZFS feature procures the ability to send an entire VMFS volume by way of a raw iSCSI target ZFS backing store. And once sent initially, we can base all subsequent sends on a delta of change from a previous send snapshot; these are respectively referred to as snapshot deltas. Thus if we initially snapshot an iSCSI backing store and send the stream to a remote ZFS file system, we can then send all the changed object data from that previous snapshot point to the current snapshot point and whatever else may be in between those snapshots. The result is a constant update of VMFS changes from the source ZFS file system to the remote ZFS file system, which can be completely different hardware. This ZFS hardware autonomy gift allows us to provision a much lower cost system on the DR remote side to host the VMFS volumes. For example the target system which is presented in this weblog is an IBM x3500 and the source is a SUN X4500 detailed in a previous blog.
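
As a rough illustration of the mechanics, with purely illustrative pool, dataset, account and host names, and assuming the dedicated service account on the DR side has the required ZFS privileges, the initial baseline and a later delta send look something like this:

# initial baseline - snapshot the iSCSI backing store and send the full stream to the DR host
zfs snapshot sp1/iscsi/lun0@repl-1
zfs send sp1/iscsi/lun0@repl-1 | ssh replsvc@drhost zfs receive sp2/iscsi/lun0
# subsequent runs only send the delta between the previous and the current snapshot
zfs snapshot sp1/iscsi/lun0@repl-2
zfs send -i repl-1 sp1/iscsi/lun0@repl-2 | ssh replsvc@drhost zfs receive sp2/iscsi/lun0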

There are some important considerations that should be kept in mind when we use snapshots to create any DR process. One of the most important areas to consider is the data change rate on the VMFS volumes that are to be included in the DR send/receive process. When we have VMware servers or VM’s that have low memory allocations (a.k.a. over committed memory) or application behaviors that swap to disk frequently, we will observe high volumes of what I call disk noise, or disk data change that has no permanent value. High amounts of disk noise will consume more storage and bandwidth on both systems when snapshots are present. In cases where the disk noise reaches a rate of 1GB/Day or more per volume it would be prudent to isolate the noise sources on a VMFS volume that will not be part of the replication strategy. You could for example create a VMFS LUN for swap and temp files on the local ESX host which can be ignored in the replication scope. Another important area is the growth rate of the overall storage, which may require routine pruning of older snapshots to reduce the total consumption of disk. For example if we have high change rates from database sources which cannot be isolated, we can at monthly intervals destroy all but one of the last month’s snapshots to conserve the available storage on both systems. This method still provisions a good DR process, provides a level of continuous data protection (CDP) and is similar to a grandfather/father/son preservation scheme.
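As an illustration of that pruning idea, expired snapshots can simply be destroyed on both the source and target systems; the snapshot labels below are hypothetical.

# zfs list -H -t snapshot -o name -r rp1/iscsi/lun0
# pfexec zfs destroy rp1/iscsi/lun0@10-02-2008-23:45
# pfexec zfs destroy rp1/iscsi/lun0@10-03-2008-23:45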

Since we are handling valuable information we must use secure methods to access and handle the data transfers. This can be provisioned by using ssh and dedicated service accounts that will perform this one specific function. ZFS send and receive functions use an inherently secure approach by employing ssh as a transport tunnel when transmitting storage data to the target ZFS file system. This is just what we need to provision a secure exchange to a DR environment. Conversely we could use IPSec, but this would be significantly more complex to achieve and complexity is not a good thing when short implementation time is a priority. With this explanation wrapped up in our minds let’s begin some of the detailed tasks that are required to procure this real world DR solution.

ESXi Server  

The DR VMware server is based on the free ESXi product and with this we can encapsulate the entire DR functionality in one server hardware platform. Within the ESXi engine we need to install and configure an OpenSolaris 5.11 snv_98 or higher VM using VMFS as the storage provider. This ESXi server configuration consists of a single SATA boot LUN and this LUN also stores the OpenSolaris iSCSI VM. In addition to the boot LUN we will create the ZFS iSCSI stores on several additional SATA disks that are presented to the OpenSolaris VM as separate VMFS datastores which we will use to create large vmdk’s. The virtual vmdk disks will be assigned as vdev’s for our receiving ZFS zpool. Talk about rampant layering. At this point we have an OpenSolaris VM defined on a hardware platform on which OpenSolaris would normally never run natively. You have got to love what you can do with VMware virtualization. By the way when SUN’s xVM product is more mature it could provision the same functionality with native ZFS provisioning and that alone really is worth a talk, but let’s continue our focus on this platform for now.

There are many configuration options available on the network provisioning side of our ESXi host. In this case VLAN’s are definitely a solid choice for this application and are my preferred approach to controlling iSCSI data flow. We initially only need to provide iSCSI access for the local OpenSolaris VM as this will provision a virtual SAN to the parent ESXi host. The parent ESXi host needs to be able to mount the iSCSI target LUN’s that were available in the production environment and validate that the DR process works. In the event of DR activation we would need to add external ESXi hosts, and VLAN’s provide both locally isolated iSCSI networks and easy expansion if these services are required externally, all without the need to purchase external switch hardware for the system until it is required. Thus within the ESXi host we need to define a VLAN for the iSCSI SAN and an isolated VLAN for production VM validations, and finally we need to define the replication and management network which can optionally use a VLAN or be untagged depending on your environment.

This virtualized DR environment grants advanced capabilities to perform rich system tests at commodity prices. Very attractive indeed. For example you can now clone the replicated VMFS LUN’s on the DR engine and, with a little Solaris SMF iSCSI target service magic, provision the clone as a duplicated ESX environment which does not impact the ongoing replication. As well we have network isolation and virtualization that allows the environment to exist in a closed, fully functional, remotely accessible world. This world can also be extended out as a production mirror test environment with dynamic revert back in time and repeat functionality.

There are many possible ESXi network and disk configurations that would meet the DR server’s requirements. At the minimum we should provision the following elements.

  •  Provision a bootable single separate SATA disk with a minimum of 16G available for the VMFS LUN that will store the OpenSolaris iSCSI VM.
  •  Provision a minimum of three (optimally six) additional SATA disks or more if required as VMFS LUN’s to host the ZFS zpool vdev’s with vmdk’s.
  •  Provision a minimum of two 1Gb Ethernet adaptors, teamed would be preferable if more are available.
  •  Define vSwitch0 with a VLAN tagged VM Network portgroup to connect the replication side of the OpenSolaris iSCSI VM and a Service Console portgroup to manage the ESXi host.
  •  Define vSwitch1 with a VLAN tagged iSCSI VM kernel portgroup to service the iSCSI data plane and also define a VM Network portgroup on the same VLAN to connect with the target interface of the OpenSolaris iSCSI VM.
  •  Define the required isolated VLAN tagged portgroups on vSwitch0, named identically to their production counterparts, and use a separate VLAN numbering set for them for isolation.
  •  Define the OpenSolaris VM with one adapter connected to the production network portgroup and one adapter attached to the iSCSI data plane portgroup to serve the iSCSI target IP.

Here is an example of what the VM disk assignments should look like.

VMFS LUN example by Mike La Spina

Once the ESXi server is successfully built and the Opensolaris iSCSI VM is installed and functional we can create the required elements for enabling ZFS replication.

Create Service Accounts

On the systems that will act as replication partners create zfsadm ID’s as service accounts using the provided commands.

# useradd -s /usr/bin/bash -d /export/home/zfsadm -P 'ZFS File System Management' zfsadm
# mkdir /export/home/zfsadm
# mkdir /export/home/zfsadm/backup
# cp /etc/skel/* /export/home/zfsadm
# echo PATH=/usr/bin:/usr/sbin:/usr/ucb:/etc:. > /export/home/zfsadm/.profile
# echo export PATH >> /export/home/zfsadm/.profile
# chown -R zfsadm /export/home/zfsadm
# passwd zfsadm

Note the parameter -P 'ZFS File System Management'. This grants the account an RBAC profile association to administratively manage our ZFS file system, unlike root which is much too powerful and is all too often used by many of us.
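You can confirm the assignment with the profiles command; it should list ZFS File System Management among the account's rights profiles.

# profiles zfsadm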

The next step is to generate some crypto keys for ssh connectivity. We start this with a login as the newly created zfsadm user and run a secure shell locally to ensure a .ssh directory and key files are created in the home directory of the zfsadm user. Note this directory is normally hidden.

# ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 0c:aa:64:72:84:b5:04:1c:a2:d0:42:8e:9f:4e:09:9d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Password:
# exit

Now that we have the .ssh directory we can create a crypto key pair and configure a relatively secure login without the need to enter a password for the remote host using this account.

Do not enter a passphrase, it needs to be blank.

# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/export/home/zfsadm/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /export/home/zfsadm/.ssh/id_dsa.
Your public key has been saved in /export/home/zfsadm/.ssh/id_dsa.pub.
The key fingerprint is:
bf:58:7b:97:8d:b5:d2:31:26:14:4c:9f:ce:72:a7:20 zfsadm@ss1

The id_dsa file should not be exposed outside of this directory as it contains the private key of the pair; only the public key file id_dsa.pub needs to be exported. Now that our key pair is generated we need to append the public portion of the key pair to a file named authorized_keys2.

# cat $HOME/.ssh/id_dsa.pub >> $HOME/.ssh/authorized_keys2

Repeat all the Create Service Accounts steps and crypto key steps on the remote server as well.

We will use the Secure Copy command to place the public key file in each opposing host’s zfsadm home directory so that when the ssh tunnel is started the remote host can decrypt the encrypted connection request, completing the tunnel which is generated with the private part of the pair. This is why we must protect the private part of the pair from exposure. Granted, we have also defined an additional layer of security here by using a dedicated user for the ZFS send activity, but it is still very important that the private key is secured properly. It is not necessary to back it up as you can regenerate the pair if required.

From the local server here named ss1 (The remote server is ss2)

# scp $HOME/.ssh/id_dsa.pub ss2:$HOME/.ssh/ss1.pub
Password:
id_dsa.pub      100% |**********************************************|   603       00:00
# scp ss2:$HOME/.ssh/id_dsa.pub $HOME/.ssh/ss2.pub
Password:
id_dsa.pub      100% |**********************************************|   603       00:00
# cat $HOME/.ssh/ss2.pub >> $HOME/.ssh/authorized_keys2

And on the remote server ss2

# ssh ss2
password:
# cat $HOME/.ssh/ss1.pub >> $HOME/.ssh/authorized_keys2
# exit

This completes the trusted key secure login configuration and you should be able to secure shell from either system to the other without a password prompt using the zfsadm account. To further limit security exposure we could employ IP address restrictions and also enable a firewall, but this is beyond the scope of this blog.
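A quick way to validate the trust is to run a remote command from each side as the zfsadm user; it should complete without any password prompt. The host name output shown here is simply what you would expect from the example hosts used in this post.

# ssh zfsadm@ss2 hostname
ss2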

 Target Pool and ZFS rights

As a prerequisite you need to create the receiving zpool on the target to allow the ZFS sends to occur. The receiving zpool name should be the same as the source to allow ease in the re-serving of iSCSI targets. Earlier we granted the “ZFS File System Management” profile to this zfsadm user. This RBAC profile allows us to run a pfexec command which pre-checks what profiles the user is assigned and then executes appropriately based on this assignment. The bonus here is you do not have to create granular rights assignments to the ZFS file system.

On the target server create your receiving zpool.

# zpool create rp1 <your vdev’s>
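With the pool in place it is worth a quick sanity check that the zfsadm account really can drive ZFS through pfexec before wiring up the automation; the throwaway file system below is purely illustrative.

# pfexec zfs create rp1/pfexec-test
# pfexec zfs destroy rp1/pfexec-test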

Create a Cron Job

Using a cron job we will invoke our ZFS snapshot and send tasks to the target host with the execution of a bash script named zfs-daily.sh. We need to use the crontab command to create a job that will execute it as the zfsadm user; no other user except root can access this job, and that’s a good thing considering it has the ability to shell to another host!

As root add the zfsadm user name to the /etc/cron.d/cron.allow file.

# echo zfsadm >> /etc/cron.d/cron.allow
# crontab -e zfsadm
59 23 * * * ./zfs-daily.sh zfs-daily.rpl

Hint: crontab uses vi – http://www.kcomputing.com/kcvi.pdf  “vi cheat sheet”

The key sequence would be hit “i” and key in the line then hit “esc :wq” and to abort “esc :q!”

Be aware of the timezone the cron service runs under; you should check it and adjust it if required. Here is an example of what’s required to set it.

# pargs -e `pgrep -f /usr/sbin/cron`

8550:   /usr/sbin/cron
envp[0]: LOGNAME=root
envp[1]: _=/usr/sbin/cron
envp[2]: LANG=en_US.UTF-8
envp[3]: PATH=/usr/sbin:/usr/bin
envp[4]: PWD=/root
envp[5]: SMF_FMRI=svc:/system/cron:default
envp[6]: SMF_METHOD=start
envp[7]: SMF_RESTARTER=svc:/system/svc/restarter:default
envp[8]: SMF_ZONENAME=global
envp[9]: TZ=PST8PDT

Let’s change it to CST6CDT

# svccfg -s system/cron:default setenv TZ CST6CDT

Also the default environment path for cron may cause some script “command not found” issues, check for a path and adjust it if required.

# cat /etc/default/cron
#
# Copyright 1991 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#
#pragma ident   “%Z%%M% %I%     %E% SMI”
CRONLOG=YES

This one has no default path, add the path using echo.

# echo PATH=/usr/bin:/usr/sbin:/usr/ucb:/etc:. >> /etc/default/cron
# svcadm refresh cron
# svcadm restart cron
 

Create Snapshot Replication Script

Here is the link for the zfs-daily.sh replication script. You will need to grant exec rights to this file, e.g.

# chmod 755 zfs-daily.sh 

The replication script needs to live in the zfsadm home directory /export/home/zfsadm. At this point I only have the one script built, but other ones are in the works, like a grandfather/father/son snapshot rollup script. The first run of the script can take a considerable amount of time depending on the available bandwidth and size of the VMFS LUNs. This cron job runs at midnight and took 6 hours over 100MB’s of bandwidth the first time and less than 5 min thereafter. A secondary script that runs hourly and is rolled up at day’s end would be beneficial. I will get around to that one and the grandfather/father/son script later.
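Since the script itself lives behind that link, here is a minimal sketch of the approach zfs-daily.sh takes: snapshot each source LUN, do a full send on the first pass and an incremental send from the previous snapshot thereafter, all tunneled over ssh as the zfsadm user, and finish by syncing the SMF configuration backup directory. The replication list file format, host name and snapshot labels are assumptions on my part; treat it as an outline rather than the actual script.

#!/usr/bin/bash
# zfs-daily.sh - illustrative outline of the daily snapshot and replication run
# Usage: zfs-daily.sh <replication list file>  (one ZFS file system per line, e.g. rp1/iscsi/lun0)
TARGET=ss2                          # remote replication partner
STAMP=`date +%m-%d-%Y-%H:%M`        # snapshot label, e.g. 10-10-2008-23:45

for FS in `cat $1`
do
   # most recent existing snapshot, empty on the very first run
   LAST=`zfs list -H -t snapshot -o name -s creation -r $FS | tail -1`
   pfexec zfs snapshot $FS@$STAMP
   if [ -z "$LAST" ]; then
      # first pass - send the entire backing store
      pfexec zfs send $FS@$STAMP | ssh zfsadm@$TARGET pfexec zfs receive $FS
   else
      # subsequent passes - send only the delta from the previous snapshot
      pfexec zfs send -i $LAST $FS@$STAMP | ssh zfsadm@$TARGET pfexec zfs receive $FS
   fi
done

# keep the exported SMF iSCSI configuration in sync as well
scp /export/home/zfsadm/backup/* zfsadm@$TARGET:/export/home/zfsadm/backup/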

At this point we have an automated DR process that provides a form of CDP. But we do not have a way to access it so we need to perform some additional steps. In order for VMware to use the relocated VMFS iSCSI targets we need to reinstate some critical configuration info that was stored on the source Service Management Facility (SMF) repository. Within the iscsitgtd service properties we have the Network Address Authority (NAA) value which is named GUID in the properties list. This value is very important, when a VMFS is initialized the NAA is written to the VMFS volume header and this will need to be redefined on the DR target so that VMware will recognize the data store as available. If the NAA on the header and target do not match, the volume will not be visible to the DR VMware ESXi host. To protect this configuration info we need to export it from the source host and send it to the target host.

Export SMF iSCSI configuration

The iscsitgtd service configuration elements can be easily exported using the following command.

# svccfg export iscsitgt > /export/home/zfsadm/backup/ss1-iscsitgt.xml
 

Once exported to the backup directory we can Secure Copy this directory to the target system and this directory may also contain other useful info like installation instructions and so forth.

# scp ss1:/export/home/zfsadm/backup/* ss2:/export/home/zfsadm/backup/

This scp directory copy can be added to the zfs-daily.sh crontab script after it is performed once manually, as the first run requires an interactive key signature trust authorization; alternately it can be done manually after a configuration change occurs. I prefer the automated method so it is included in the script.

SMF iscsitgt import and iSCSI configuration details

To import the production service we would issue the following commands.

# svcadm disable iscsitgt
# svccfg delete iscsitgt
# svccfg import /export/home/zfsadm/backup/ss1-iscsitgt.xml

Importing the iscsitgt service configuration is a simple task but it does have some elements that will be problematic if they are left unchecked. For example, iSCSI Target Portal Group Tag values are included with the export/import function and thus you may need to change the portal group values to correct discovery failures when the IP addresses are different on the target system. Another potential issue is leaving the existing SMF config in place and then importing the new one on top of it. This is not a best practice as you may create an invalid SMF configuration for the iscsitgt service with elements that are orphaned out, etc. The SMF properties will have the backing store path from the source server, and if the target server does not have the same zpool name this will need to be fixed. And lastly, make sure you have the same iscsitgtd version on each end since there are potential changes between the versions.

You will also need to add the ESXi software initiator to the iSCSI target(s) on the receiving server and grant access with an acl entry and chap info if used.

# iscsitadm create initiator --iqn iqn.1998-01.com.vmware:vh0.0 vh0.0
# iscsitadm modify target --acl vh0.0 ss1-zstore0

To handle a TPGT configuration change it’s simply a matter of re-adding them with the iscsitadm utility as demonstrated here, or possibly deleting the ones that are not correct.

# iscsitadm create tpgt 1
# iscsitadm modify tpgt -i 10.10.0.1 1
# iscsitadm modify tpgt -i 10.10.0.2 1
# iscsitadm modify target -p 1 ss1-zstore0

To delete a tpgt that is not correct is very straightforward.

# iscsitadm delete target -p 1 ss1-zstore0
# iscsitadm delete tpgt -A 1

Where 10.10.0.1 and 10.10.0.2 are the target interfaces that should participate in portal group 1 and ss1-zstore0 is the target alias. In some cases you may have to remove the tpgt altogether. The backing store is editable as well as many other SMF properties. To change a backing store value in the SMF we use the svccfg command as follows.

Here is an example of listing all the backing stores and then changing /dev/zvol/rdsk/sp2/iscsi/lun0 so it’s on zpool sp1 instead of sp2.

# svcadm enable iscsitgt
# svccfg -s iscsitgt listprop | grep backing-store

param_dr-zstore0_0/backing-store                astring  /dev/zvol/rdsk/sp2/iscsi/lun0
param_dr-zstore0_1/backing-store                astring  /dev/zvol/rdsk/sp1/iscsi/lun1

# svccfg -s iscsitgt setprop param_dr-zstore0_0/backing-store=/dev/zvol/rdsk/sp1/iscsi/lun0
# svccfg -s iscsitgt listprop | grep backing-store

param_dr-zstore0_0/backing-store                astring  /dev/zvol/rdsk/sp1/iscsi/lun0
param_dr-zstore0_1/backing-store                astring  /dev/zvol/rdsk/sp1/iscsi/lun1

Changing the backing store value is instrumental if you wish to mount the VMFS LUN’s to provision system validation or online testing. However, do not attach the file system from the active replicated ZFS backing store to the ESXi server for validation or testing, as it will fail any additional replications once it is modified outside of the active replication stream. You must first create a clone of a chosen snapshot and then modify the backing store to use this new backing store path. This method will present a read/write clone through the iscsitgt service and will have the same iqn names, so no reconfiguration would be required to create different time windows into the data stores or revert to a previous point.

Here is an example of how  this would be accomplished.

# zfs create sp1/iscsi/clones
# zfs clone sp1/iscsi/lun0@10-10-2008-23:45 sp1/iscsi/clones/lun0
# svcadm refresh iscsitgt
# svcadm restart iscsitgt

To change to a different snapshot time you would simply need to destroy or rename  the current clone and replace it with a new or renamed clone of an existing snapshot on the same clone backing store path.

# zfs destroy sp1/iscsi/clones/lun0
# zfs clone sp1/iscsi/lun0@10-11-2008-23:45 sp1/iscsi/clones/lun0
# svccfg -s iscsitgt setprop param_dr-zstore0_0/backing-store=/dev/zvol/rdsk/sp1/iscsi/clones/lun0
# svcadm refresh iscsitgt
# svcadm restart iscsitgt

VMware Software iSCSI configuration

The ESXi iSCSI software configuration is quite straightforward. In this architecture we need to place an interface of the OpenSolaris iSCSI target host on vSwitch1, which is where we defined the iSCSI-Net0 VM kernel network. To do this we create a VM Network portgroup on the same VLAN ID as the iSCSI VM kernel interface.

Here is an example of what this configuration looks like.

DR Net example by Mike La Spina

For more detail on how to configure the iSCSI VM interfaces see this blog: http://blog.laspina.ca/ubiquitous/running_zfs_over_iscsi_as; in this case you would not need to define an aggregate since there is only one interface for the iSCSI vSAN.

The final step in the configuration is to define a discovery target on the iSCSI software configuration panel and then rescan the vmhba for new devices.

Hopefully this blog was of interest for you.

Til next time….

Mike

Running ZFS over iSCSI as a VMware vmfs store

The first time I looked at ZFS it totally floored me. This is a file system that has changed storage system rules as we currently know them and continues to do so. It is with no doubt the best architecture to date and now you can use it for your VMware stores.

Previously I had explored using it for a VMware store but ran into many issues which were real show stoppers. Like the VPD page response issue which made VMware see only one usable iSCSI store. But things are soon to be very different when Sun releases the snv_93 or above to all. I am currently using the unreleased snv_93 iscsitgt code and it works with VMware in all the ways you would want. Many thanks to the Sun engineers for adding NAA support on the iSCSI target service. With that being said let me divulge the details and behaviors of the first successful X4500 ZFS iSCSI VMware implementation in the real world.

Let’s look at the architectural view first.

X4500 iSCSI Architecture by Mike La Spina

 
The architecture uses a best practice approach consisting of completely separated physical networks for the iSCSI storage data plane. All components have redundant power and network connectivity. The iSCSI storage backplane is configured with an aggregate and is VLAN’d off from the server management network. Within the physical HP 2900’s an inter-switch ISL connection is defined but is not critical. This allows for more available data paths if additional interfaces were assigned on the ESX host side.
The Opensolaris aggregate and network components are configured as follows:

For those of you using Indiana: by default nwam is enabled and it needs to be disabled and the physical network service enabled.

svcadm disable svc:/network/physical:nwam
svcadm enable svc:/network/physical:default

The aggregate is defined using the data link adm utility but first any bindings need to be cleared by unplumbing the interfaces.

e.g. ifconfig e1000g0 unplumb

Once cleared the assignment of the physical devices is possible using the following commands

dladm create-aggr -d e1000g0 -d e1000g1 -P L2,L3 1
dladm create-aggr -d e1000g2 -d e1000g3 -P L2,L3 2

Here we have set the policy allowing layer 2 and 3 and defined two aggregates, aggr1 and aggr2. We can now define the VLAN based interfaces, shown here as VLAN 500 instances 1 and 2, respective of the aggr instances. You just need to apply the following formula for defining the VLAN interface.

(Adaptor Name) + vlan * 1000 + (Adaptor Instance)

ifconfig aggr500001 plumb up 10.1.0.1 netmask 255.255.0.0
ifconfig aggr500002 plumb up 10.1.0.2 netmask 255.255.0.0

To persist the network configuration on boot you will need to create hostname files and hosts entries for the services to apply on startup.

echo ss1.iscsi1 > /etc/hostname.aggr500001
echo ss1.iscsi2 > /etc/hostname.aggr500002

Edit /etc/hosts to have the following host entries.

::1 localhost
127.0.0.1 ss1.local localhost loghost
10.0.0.1 ss1 ss1.domain.name
10.1.0.1 ss1.iscsi1
10.1.0.2 ss1.iscsi2

On the HP switches it’s a simple static trunk definition on ports 1 and 2 using the following at the CLI.

trunk 1-2 trk1 trunk 

Once all the networking components are up and running and persistent, it’s time to define the ZFS store and iSCSI targets. I chose to include both mirrored and raidz pools. I needed to find and organize the cxtxdx device names using the cfgadm command, or you could issue a format command as well to see the controller, target and disk names if you’re not using an X4500. I placed the raidz devices across controllers to improve I/O and distribute the load. It would not be prudent to place one array on a single SATA controller. So here is what it ends up looking like from the ZFS command view.

zpool create -f rp1 raidz1 c4t0d0 c4t6d0 c5t4d0 c8t2d0 c9t1d0 c10t1d0
zpool add rp1 raidz1 c4t1d0 c4t7d0 c5t5d0 c8t3d0 c9t2d0 c10t2d0
zpool add rp1 raidz1 c4t2d0 c5t0d0 c5t6d0 c8t4d0 c9t3d0 c10t3d0
zpool add rp1 raidz1 c4t3d0 c5t1d0 c5t7d0 c8t5d0 c9t5d0 c11t0d0
zpool add rp1 raidz1 c4t4d0 c5t2d0 c8t0d0 c8t6d0 c9t6d0 c11t1d0
zpool add rp1 raidz1 c4t5d0 c5t3d0 c8t1d0 c8t7d0 c10t0d0 c11t2d0
zpool add rp1 spare c11t3d0
zpool create -f mp1 mirror c10t4d0 c11t4d0
zpool add mp1 mirror c10t5d0 c11t5d0
zpool add mp1 mirror c10t6d0 c11t6d0
zpool add mp1 spare c9t7d0

It only takes seconds to create terabytes of storage, wow it truly is a thing of beauty (geek!). Now it’s time to define a few pools and stores in preparation for the creation of the iSCSI targets. I chose to create units of 750G since VMware would not perform well with much more than that. This is somewhat dependent on the size of the VMs and type of I/O, but generally an ESX host will serve a wide mix so I try to keep it to a reasonable size or it ends up with SCSI reservation issues (that’s a bad thing chief).

You must also consider I/O block size before creating a ZFS store; this is not something that can be changed later, so now is the time. It’s done by adding -b 64K to the zfs create command. I chose to use 64k for the block size which aligns with the VMware default allocation size, thus optimizing performance. The -s option enables a sparse volume feature, aka thin provisioning. In this case the space was available but it is my favorite way to allocate storage.

zfs create rp1/iscsi
zfs create -s -b 64K -V 750G rp1/iscsi/lun0
zfs create -s -b 64K -V 750G rp1/iscsi/lun1
zfs create -s -b 64K -V 750G rp1/iscsi/lun2
zfs create -s -b 64K -V 750G rp1/iscsi/lun3
zfs create mp1/iscsi
zfs create -s -b 64K -V 750G mp1/iscsi/lun0

Originally I wanted to build the ESX hosts using a local disk but thanks to some bad IBM x346 engineering I could not use the QLA4050C and an integrated Adaptec controller together on the ESX host server hardware. So I decided to give boot from iSCSI a go, thus here is the boot LUN definition that I used for it. The original architectural design requires local disk to prevent an ESX host failure in the event of an iSCSI path outage.

zfs create rp1/iscsi/boot
zfs create -s -V 16G rp1/iscsi/boot/esx1

Now that the ZFS stores are complete we can create the iSCSI targets for the ESX hosts to use. I have named the target alias to reflect something about the storage system which makes it easier to work with. I also created an iSCSI configuration store so we can persist the iSCSI targets on reboots. (This may now be included with Opensolaris Indiana but I have not tested it)

mkdir /etc/iscsi/config
iscsitadm modify admin --base-directory /etc/iscsi/config
iscsitadm create target -u 0 -b /dev/zvol/rdsk/rp1/iscsi/lun0 ss1-zrp1
iscsitadm create target -u 1 -b /dev/zvol/rdsk/rp1/iscsi/lun1 ss1-zrp1
iscsitadm create target -u 2 -b /dev/zvol/rdsk/rp1/iscsi/lun2 ss1-zrp1
iscsitadm create target -u 3 -b /dev/zvol/rdsk/rp1/iscsi/lun3 ss1-zrp1
iscsitadm create target -b /dev/zvol/rdsk/mp1/iscsi/lun0 ss1-zmp1
iscsitadm create target -b /dev/zvol/rdsk/rp1/iscsi/boot/esx1 ss1-esx1-boot

Most blog examples of enabling targets show the ZFS command line method of setting shareiscsi=on. This works well for a new iqn, but if you want to allocate additional LUNs under that iqn then you need to use this -b backing store method.
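For comparison, the property based method looks like the line below; it is convenient for a quick single LUN target, though as noted you give up control of how additional LUNs line up under the iqn. Treat it as an aside rather than part of this build.

zfs set shareiscsi=on mp1/iscsi/lun0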

Now that we have some targets you should be able to list them using:

iscsitadm list target

Notice that we only see one iqn for ss1-zrp1; you can use the -v option to show all the LUN’s if required.

Target: ss1-zrp1
iSCSI Name: iqn.1986-03.com.sun:02:eb9c3683-9b2d-ccf4-8ae0-85c7432f3ef6.ss1-zrp1
Connections: 2
Target: ss1-zmp1
iSCSI Name: iqn.1986-03.com.sun:02:36fd5688-7521-42bc-b65e-9f777e8bfbe6.ss1-zmp1
Connections: 2
Target: ss1-esx1-boot
iSCSI Name: iqn.1986-03.com.sun:02:d1ecaed7-459a-e4b1-a875-b4d5df72de40.ss1-esx1-boot
Connections: 2

It would be prudent to create some target initiator entries to allow authorization control of what initiator iqn’s can connect to a particular target.
This is an important step. It will create the ability to use CHAP or at least only allow named iqn’s to connect to that target. iSNS also provides a similar service.

iscsitadm create initiator --iqn iqn.2000-04.com.qlogic:qla4050c.esx1.1 esx1.1
iscsitadm create initiator --iqn iqn.2000-04.com.qlogic:qla4050c.esx1.2 esx1.2

Now we can assign these initiators to a target and then the target will only accept those initiators. You can also add CHAP authentication as well, but that’s beyond the scope of this blog.

iscsitadm modify target --acl esx1.1 ss1-esx1-boot
iscsitadm modify target --acl esx1.2 ss1-esx1-boot
iscsitadm modify target --acl esx1.1 ss1-zrp1
iscsitadm modify target --acl esx1.2 ss1-zrp1
iscsitadm modify target --acl esx1.1 ss1-zmp1
iscsitadm modify target --acl esx1.2 ss1-zmp1

In order to boot from the target LUN we need to configure the QLA4050C boot feature. You must do this from the ESX host using the Ctrl-Q sequence during the boot cycle. It is simply a matter of entering the primary boot target IP, setting the mode to manual and entering the iqn exactly as it was listed by the iscsitadm list target command. e.g.

iqn.1986-03.com.sun:02:d1ecaed7-459a-e4b1-a875-b4d5df72de40.ss1-esx1-boot

Once the iqn is entered the ESX host software can be installed and configured.
Till next time….

Automated VMware Tools upgrades using your VC and Perl

I was upgrading a VI 2.0.2/3.0.2 system to VI 2.5/3.5 and found that upgrading VMware Tools was becoming a real pain. So I decided to look for a better way. I first did the usual search hoping to find a quickie but no luck. I searched the VMware forums to find that everyone was using the MSI package in a manner external to the VI system. This did not address any Linux or Unix systems so here is what I came up with to address the lack of automation.

First I installed the VMware VI perl Tool Kit on the VC.

I changed the NTFS rights on the Tool Kit directory to a dedicated admin user ID and SYSTEM only. I then populated the visdk.rc config file in the same directory with the following parms.

VI_PROTOCOL=https
VI_SERVER=theVCserverFQDN
VI_SERVICEPATH=/sdk
VI_USERNAME=adminid
VI_PASSWORD=thepassword

I also added the following ENV variable to the SYSTEM Context.

System variable screenshot

I wrote a short simple Perl script to make use of the power of the VC and have it take care of the task using its own built-in capability. Frankly I just don’t see why VMware did not include a scheduled task item to handle this job in the VC.

#!/usr/bin/perl -w
#
# Requires a text file named VMToolsUpgradeList.txt
# containing the VM display names, one per line
# Author: Mike La Spina
#   Date: May 24, 2008
# upgradetools.pl --host TheVC

use strict;
use warnings;
use VMware::VIRuntime;

my %opts = (
   'host' => {
      type => "=s",
      help => "Host is the VIM API service target",
      required => 0,
   },
);

Opts::add_options(%opts);
Opts::parse();
Opts::validate();
Util::connect();

# get all VMs with out of date tools
my $vm_views = Vim::find_entity_views(view_type => 'VirtualMachine',
                  filter => { 'guest.toolsStatus' => 'toolsOld' });

foreach (@$vm_views) {
      my $nameVM = $_->name;
      #print $nameVM . "\n";
      open FILE, "<", "VMToolsUpgradeList.txt" or die $!;
      my @lines = <FILE>;
      close FILE;
      foreach my $vmname (@lines) {
         chomp $vmname;
         if ($nameVM eq $vmname) {
            $_->UpgradeTools_Task();
            print "Issued VMware Tools upgrade task on " . $nameVM . "\n";
         };
      };
}
print "Tools Upgrade scan completed\n";
Util::disconnect();

The script requires a text file which I populated with all of the VM names that I wanted to upgrade VMware Tools on. It’s noted in the script.

I created a Windows scheduled task to run the Perl script at 3:30AM and set it up to log in with the dedicated adminID and password.

I can now add any VM names to the VMToolsUpgradeList.txt and schedule the event to run after hours.
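As a simple illustration, the list file is just one VM display name per line and the scheduled task runs the script against the VC; the VM names and the command line below are placeholders for whatever your environment uses.

VMToolsUpgradeList.txt:
web01
app02
db03

perl upgradetools.pl --host TheVC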

There are no issues with MSI deployments or Linux installs etc.

Till next time…