Posts Tagged ‘Oracle’

Oracle Clusterware as a 3rd party HA framework

Friday, June 12th, 2009

Oracle has begun to push its Clusterware as a third-party HA framework. This article walks through a quick example of how to use it that way. Treat this post as a quick guide; it is by no means a full-scale one.

This article assumes you have installed Oracle Clusterware following one of the few links and guides available on the net. This quick-guide applies to both Clusterware 10 and Clusterware 11.

We will discuss the method of adding an additional NFS service on Linux.

To do so, you will need shared storage, assuming the goal of the exercise is to supply clients with consistent NFS-based storage services. I prefer OCFS2 as the file system for shared disks. This goes well with Oracle Clusterware, as this cluster framework does not handle disk mounts very well. Unless you write (or find) an agent which makes sure that every mount and umount behaves correctly (you wouldn’t want file system corruption, would you?), you will probably prefer to do the same. Not having to manage disk mount actions both saves time on planned failover and guarantees storage safety. If you have not placed your CRS and Vote on OCFS2, you will need to install OCFS2 from here and here, and then configure it. We will not discuss OCFS2 configuration in this post.

We will need to assume the following prerequisites:

  • Service-related IP address: 1.2.3.4. Netmask 255.255.255.248. This IP must be in the same subnet as your public network card.
  • Shared Storage: Formatted to OCFS2, and mounted on both nodes on /shared
  • Oracle Clusterware installed and working
  • Cluster node names are “node1” and “node2”
  • Have $CRS_HOME point to your CRS installation
  • Have $CRS_HOME/bin in your $PATH
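The subnet requirement in the first prerequisite is easy to get wrong with a narrow netmask such as 255.255.255.248. Here is a minimal bash sketch for checking it. The function names and the node address 1.2.3.2 used in the example are my own assumptions, not part of Clusterware.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if two addresses fall in the same network under the given netmask.
same_subnet() {
    local ip1 ip2 mask
    ip1=$(ip_to_int "$1")
    ip2=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( ip1 & mask )) -eq $(( ip2 & mask )) ]
}
```

For example, "same_subnet 1.2.3.4 1.2.3.2 255.255.255.248" succeeds, while the same check against 1.2.3.9 fails, because the /29 network ends at 1.2.3.7.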

We need to create the service-related IP resource first. I recommend having an entry in /etc/hosts for this IP address on both nodes. Assuming the public NIC is eth0, the commands would be:

crs_profile -create nfs_ip -t application -a $CRS_HOME/bin/usrvip -o oi=eth0,ov=1.2.3.4,on=255.255.255.248
crs_register nfs_ip

Now you will need to set running permissions for the oracle user. In my case, the user name is actually “oracle”:

crs_setperm nfs_ip -o root
crs_setperm nfs_ip -u user:oracle:r-x

Test that you can start the service as the oracle user:

crs_start nfs_ip

Now we need to set up NFS, starting with the NFS daemon itself. Edit /etc/exports and add a line such as this:

/shared *(rw,no_root_squash,sync)

Make sure that nfs service is disabled during startup:

chkconfig nfs off
chkconfig nfslock off

Now it is time to set up Oracle Clusterware for the task:

crs_profile -create share_nfs -t application -B /etc/init.d/nfs -d "Shared NFS" -r nfs_ip -a sharenfs.scr -p favored -h "node1 node2" -o ci=30,ft=3,fi=12,ra=5
crs_register share_nfs

Deal with permissions:

crs_setperm share_nfs -o root
crs_setperm share_nfs -u user:oracle:r-x

Fix the “sharenfs.scr” script. First, find it. It should reside in $CRS_HOME/crs/scripts if everything is OK. If not, you will be able to find it in $CRS_HOME using find.

Edit the “sharenfs.scr” script and modify the following variables, which are defined near the beginning of the script:

PROBE_PROCS="nfsd"
START_APPCMD="/etc/init.d/nfs start"
START_APPCMD2="/etc/init.d/nfslock start"
STOP_APPCMD="/etc/init.d/nfs stop"
STOP_APPCMD2="/etc/init.d/nfslock stop"

Copy the modified script file to the other node. Verify this script has execution permissions on both nodes.
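To make the role of these variables clearer, here is a hypothetical skeleton of a Clusterware action script. It is not the contents of the shipped sharenfs.scr. It only illustrates that Clusterware calls the script with start, stop or check, and that a zero exit status means success. The function names and the use of pgrep are my own assumptions.

```shell
#!/bin/bash
# Hypothetical skeleton of a Clusterware action script (NOT the shipped sharenfs.scr).
# The variable values mirror the ones edited in the real script.
PROBE_PROCS="nfsd"
START_APPCMD="/etc/init.d/nfs start"
START_APPCMD2="/etc/init.d/nfslock start"
STOP_APPCMD="/etc/init.d/nfs stop"
STOP_APPCMD2="/etc/init.d/nfslock stop"

# Succeed only when every process named in PROBE_PROCS is running.
probe_procs() {
    local proc
    for proc in $PROBE_PROCS; do
        pgrep -x "$proc" >/dev/null || return 1
    done
    return 0
}

# Dispatch on the action that Clusterware passes as the first argument.
handle_action() {
    case "$1" in
        start) $START_APPCMD && $START_APPCMD2 ;;
        stop)  $STOP_APPCMD; $STOP_APPCMD2 ;;
        check) probe_procs ;;
        *)     echo "usage: {start|stop|check}" >&2; return 1 ;;
    esac
}
```

A real script would end with a call such as handle_action "$1" followed by exit $?, so that Clusterware receives the exit status.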

Start the service as the oracle user:

crs_start share_nfs

Test the service. The following command should return the export path:

showmount -e 1.2.3.4

Relocate the service and test again:

crs_relocate -f share_nfs
showmount -e 1.2.3.4

Done. You now have an HA NFS service on top of the Oracle Clusterware framework.

I used this web page as a reference. I thank its author for the great work!

Persistent raw devices for Oracle RAC with iSCSI

Saturday, December 6th, 2008

If you’re into Oracle RAC over iSCSI, you should be rather content: this configuration is simple and supported. However, with some iSCSI target devices, device order and naming are not consistent between the two Oracle nodes.

The simple solutions are to use OCFS2 labels or ASM. However, if you decide to place your voting disks and cluster registry disks on raw devices, you will face a problem.

iSCSI on RHEL5:

There are a few guides, but the simple method is this:

  1. Configure mapping in your iSCSI target device
  2. Start the iscsid and iscsi services on your Linux
    • service iscsi start
    • service iscsid start
    • chkconfig iscsi on
    • chkconfig iscsid on
  3. Run “iscsiadm -m discovery -t st -p target-IP”
  4. Run “iscsiadm -m node -L all”
  5. Edit /etc/iscsi/send_targets and add to it the IP address of the target for automatic login on restart

You need to configure partitioning according to the requirements.

If you set up OCFS2 volumes for the voting and cluster registry disks, there should not be a problem as long as you use labels. However, if you require raw volumes, you need to change udev to create your raw devices for you.

On a system with persistent disk naming, follow this process. On a system with changing disk names (names are different after every reboot), the process can become a bit more complex.

First, detect your scsi_id for each device. While names might change upon reboots, scsi_ids do not.

scsi_id -g -u -s /block/sdc

Replace sdc with the device name you are looking for. Notice that /block/sdc is a reference to /sys/block/sdc.

Use the scsi_id generated by that to create the raw devices. Edit /etc/udev/rules.d/50-udev.rules and find line 298. Add a line below with the following contents:

KERNEL=="sd*[0-9]", ENV{ID_SERIAL}=="14f70656e66696c000000000004000000010f00000e000000", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n", ACTION=="add", RUN+="/bin/raw /dev/raw/raw%n %N"

Things to notice:

  1. The ENV{ID_SERIAL} value is the same scsi_id obtained earlier
  2. This line will create a raw device in /dev/raw for each partition, named “raw” followed by the partition number
  3. If you want to differentiate between two (or more) disks, change the name from raw to an adequate name, like “crsa”, “crsb”, etc., for example:

KERNEL=="sd*[0-9]", ENV{ID_SERIAL}=="14f70656e66696c000000000005000000010f00000e000000", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n", ACTION=="add", RUN+="/bin/raw /dev/raw/crs%n %N"

Following these changes, run “udevtrigger” to reload the rules. Be advised that “udevtrigger” might reset network connection.
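With more than a couple of disks, typing these rules by hand invites mistakes. The sketch below prints a rule of the same shape for a given scsi_id and device name prefix. The function name make_raw_rule is hypothetical, and the output still has to be pasted into the rules file manually.

```shell
#!/bin/bash
# Print a udev rule binding each partition of the disk with the given scsi_id
# to /dev/raw/<prefix><partition number>. The prefix defaults to "raw".
make_raw_rule() {
    local serial="$1" prefix="${2:-raw}"
    # %%n and %%N are printf escapes for the literal udev substitutions %n and %N.
    printf 'KERNEL=="sd*[0-9]", ENV{ID_SERIAL}=="%s", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%%n", ACTION=="add", RUN+="/bin/raw /dev/raw/%s%%n %%N"\n' \
        "$serial" "$prefix"
}
```

For example, make_raw_rule 14f70656e66696c000000000005000000010f00000e000000 crs prints the second rule shown above.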

Raw devices for Oracle on RedHat (RHEL) 5

Tuesday, October 21st, 2008

There is a major confusion among DBAs regarding how to setup raw devices for Oracle RAC or Oracle Clusterware. This confusion is caused by the turn RedHat took in how to define raw devices.

Raw devices are actually a manifestation of character devices pointing to block devices. Character devices are non-buffered, so they act as FIFO, and have no OS cache, which is why Oracle likes them so much for Clusterware CRS and voting.

On other Unix types, there are commonly two device nodes for each disk device: a block device (e.g. /dev/dsk/c0t0d0s1) and a character device (e.g. /dev/rdsk/c0t0d0s1). This is not the case for Linux, and thus a special “raw”, aka character, device has to be defined for each partition we want to participate in the cluster, either as CRS or voting disk.

On RHEL4, raw devices were set up easily using the simple and coherent file /etc/sysconfig/rawdevices, which included an internal example. On RHEL5 this is not the case, and you have to customize the udev subsystem in a rather poorly documented way.

Check out the source of this information, at this entry about raw devices. I will add it here, anyhow, with a slight explanation:

1. Add to /etc/udev/rules.d/60-raw.rules:

ACTION=="add", KERNEL=="sdb1", RUN+="/bin/raw /dev/raw/raw1 %N"

2. To set permission (optional, but required for Oracle RAC!), create a new /etc/udev/rules.d/99-raw-perms.rules containing lines such as:

KERNEL=="raw[1-2]", MODE="0640", GROUP="oinstall", OWNER="oracle"

Notice this:

  1. The raw-perms.rules file name has to begin with the number 99, which defines its order during rules apply, so that it will be used after all other rules take place. Using lower numbers might cause permissions to be incorrect.
  2. The following permissions have to apply:
    • OCR Device(s): root:oinstall, mode 0640
    • Voting device(s): oracle:oinstall, mode 0666
  3. You don’t have to use raw devices for ASM volumes on Linux, as the ASMLib library is very effective and easier to manage.
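After a reboot it is worth confirming that udev really applied the ownership and mode listed above. A minimal sketch, assuming GNU stat (as shipped with RHEL), follows. The function name check_perms is my own.

```shell
#!/bin/bash
# Succeed if the file's owner:group and octal mode match the expected values.
# Returns 2 if the path cannot be examined at all.
check_perms() {
    local path="$1" want_owner="$2" want_mode="$3"
    local actual
    actual=$(stat -c '%U:%G %a' "$path") || return 2
    [ "$actual" = "$want_owner $want_mode" ]
}
```

For example, check_perms /dev/raw/raw1 root:oinstall 640 would verify an OCR device, assuming raw1 is the OCR device on your system.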

    Oracle RAC with EMC iSCSI Storage Panics

    Tuesday, October 14th, 2008

    I have had a system panicking when running the following configuration:

    • RedHat RHEL 4 Update 6 (4.6) 64bit (x86_64)
    • Dell PowerEdge servers
    • Oracle RAC 11g with Clusterware 11g
    • EMC iSCSI storage
    • EMC PowerPath
    • Vote and Registry LUNs are accessible as raw devices
    • Data files are accessible through ASM with ASMLib

    During reboots or shutdowns, the system used to panic almost before the actual power cycle. Unfortunately, I do not have a screen capture of the panic…

    Tracing the problem, it seems that iSCSI, PowerIscsi (EMC PowerPath for iSCSI) and networking services are being brought down before “killall” service stops the CRS.

    The service file init.crs was never executed with a “stop” flag by the regular start-stop handling of services, because it never left a lock file (for example, in /var/lock/subsys). Its presence in /etc/rc.d/rc6.d and /etc/rc.d/rc0.d was therefore merely for show.
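The behaviour described above can be imitated with a simplified version of the K-script loop in /etc/rc.d/rc. This is my own reduction of that logic, not the real file. The directories are passed as parameters so it can be tried on scratch directories.

```shell
#!/bin/bash
# Simplified imitation of the K-script loop in /etc/rc.d/rc (not the real file).
# Prints the name of every service that would be called with "stop".
list_services_to_stop() {
    local rcdir="$1" lockdir="$2" script subsys
    for script in "$rcdir"/K*; do
        [ -e "$script" ] || continue
        subsys=${script##*/K??}      # strip the directory and the "Knn" prefix
        # Without a lock file, the service is assumed not to be running and is skipped.
        [ -f "$lockdir/$subsys" ] || [ -f "$lockdir/$subsys.init" ] || continue
        echo "$subsys"
    done
}
```

With a K96init.crs link present but no init.crs lock file in the lock directory, the function prints nothing for it, which matches what I saw during shutdown.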

    I have solved it by changing /etc/init.d/init.crs script a bit:

    • On “Start” action, touch a file called /var/lock/subsys/init.crs
    • On “Stop” action, remove a file called /var/lock/subsys/init.crs

    Also, although I’m not sure about its necessity, I have changed init.crs script SYSV execution order in /etc/rc.d/rc0.d and /etc/rc.d/rc6.d from wherever it was (K96 in one case and K76 on another) to K01, so it would be executed with the “stop” parameter early during shutdown or reboot cycle.

    It solved the problem, although future upgrades of Oracle Clusterware may overwrite init.crs, so this change will have to be re-applied.

    Oracle ASM and EMC PowerPath

    Wednesday, May 28th, 2008

    Setting up Oracle ASM disks is rather simple, and the procedure can be easily obtained from here, for example. This is nice and pretty, and works well for most environments.

    EMC PowerPath creates meta devices which utilize the underlying paths, as the Linux SCSI layer sees them, without hiding them (unlike IBM’s RDAC, for example). This results in the ability to view and access each LUN either through the PowerPath meta device (/dev/emcpower*) or through the underlying SCSI disk device (/dev/sd*). You can obtain the existing paths of a single meta device by running the command

    powermt display dev=emcpowera

    where ‘emcpowera’ is an example. It can be any of your power meta devices. You will see the underlying SCSI devices.

    During startup, Oracle ASM (startup script: /etc/init.d/oracleasm) scans all block devices for ASM headers. On a system with many LUNs, this can take a while (half an hour, and sometimes much more). Not only that, but since ASM scans the available block devices in a semi-random order, chances are very high that a /dev/sd* device will be used instead of the /dev/emcpower* block device. This degrades performance where an active-active configuration has been set for PowerPath (because PowerPath will not be used). Moreover, a failure of that specific link will make the LUN inaccessible through that path, regardless of any other existing paths to the LUN.

    To "set things right", you need to edit /etc/sysconfig/oracleasm, and exclude all ’sd’ devices from ASM scan.
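As a sketch, the relevant fragment of /etc/sysconfig/oracleasm could look like the following. ORACLEASM_SCANORDER and ORACLEASM_SCANEXCLUDE are the two scan settings in that file. The values are my assumption for a PowerPath-only setup, so adjust them if some ASM disks legitimately live on plain sd devices.

```shell
# /etc/sysconfig/oracleasm (fragment)
# Scan PowerPath pseudo-devices first ...
ORACLEASM_SCANORDER="emcpower"
# ... and skip every device whose name starts with "sd".
ORACLEASM_SCANEXCLUDE="sd"
```

The values are matched against the beginning of device names, so "sd" covers sda, sdb and so on.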

    To verify that you’re actually using the right block device:

    /etc/init.d/oracleasm listdisks

    Select any one of the DG disks, and then

    /etc/init.d/oracleasm querydisk DATA1
    Disk "DATA1" is a valid ASM disk on device [120, 6]

    The numbers are the major and minor of the block device. You can easily find the device through this command:

    ls -la /dev/ | grep MAJOR | grep MINOR

    In our example, the MAJOR will be 120, and the MINOR will be 6. The result would look like a single block device.

    If you’re using EMC PowerPath, your block device major would be 120 and around that number. If you’re (mistakenly) using one of the underlying paths, your major would be 8 and nearby numbers. If you’re using Linux LVM, your major would be around the number 253. The expected result, when using EMC PowerPath is always with major of 120 – always using the /dev/emcpower* devices.
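When checking many disks, the major and minor numbers can be pulled out of the querydisk output automatically. The sketch below does that, and maps the major number using only the values quoted above. Both function names are hypothetical, and the mapping is deliberately narrow: it does not know the "nearby" majors.

```shell
#!/bin/bash
# Extract "MAJOR MINOR" from a querydisk line such as:
#   Disk "DATA1" is a valid ASM disk on device [120, 6]
parse_asm_device() {
    sed -n 's/.*\[\([0-9][0-9]*\), *\([0-9][0-9]*\)\].*/\1 \2/p' <<< "$1"
}

# Map a major number to a device family, using only the values quoted in the text.
classify_major() {
    case "$1" in
        120) echo "powerpath" ;;
        8)   echo "scsi-disk" ;;
        253) echo "lvm" ;;
        *)   echo "unknown" ;;
    esac
}
```

Feeding the example line above to parse_asm_device yields "120 6", and classify_major 120 reports "powerpath".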

    This also decreases the boot time rather dramatically.

    HP EVA bug – Snapshot removed through sssu is still there

    Friday, May 2nd, 2008

    This is an interesting bug I have encountered:

    The output of an sssu command should look like this:

    EVA> DELETE STORAGE “\Virtual Disks\Linux\oracle\SNAP_ORACLE”

    EVA>

    It still leaves the snapshot (SNAP_ORACLE in this case) visible, until the web interface is used to press on “Ok”.

    This happened to me on HP EVA with HP StorageWorks Command View EVA 7.0 build 17.

    When the same delete command is given a second time, it looks like this:

    EVA> DELETE STORAGE “\Virtual Disks\Linux\oracle\SNAP_ORACLE”

    Error: Error cannot get object properties. [ Deletion completed]

    EVA>

    When this command is given for a non-existing snapshot, it looks like this:

    EVA> DELETE STORAGE “\Virtual Disks\Linux\oracle\SNAP_ORACLE”

    Error: \Virtual Disks\Linux\oracle\SNAP_ORACLE not found

    So I run the removal command twice (scripted) on an sssu session without “halt_on_errors”. This removes the snapshots correctly.
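The doubled command can be generated rather than typed. This sketch only prints the DELETE STORAGE lines. How they are fed into an sssu session (manager, system and credentials selection) depends on your setup and is not shown here. The function name is my own.

```shell
#!/bin/bash
# Print the DELETE STORAGE command twice for each snapshot path given.
# The second copy is the one that produces "[ Deletion completed]".
emit_double_delete() {
    local path
    for path in "$@"; do
        printf 'DELETE STORAGE "%s"\n' "$path"
        printf 'DELETE STORAGE "%s"\n' "$path"
    done
}
```

Remember that the session must not use “halt_on_errors”, since the second command reports an error by design.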

    Incorrect dependencies for installation of packages on AIX 5.3

    Wednesday, December 5th, 2007

    Following an upgrade of AIX 5.3 to level 07, with SP1 technology upgrade, I had encountered a problem installing a package required for Oracle 11g – rsct.basic.rte 2.4.8.0

    This rsct.basic.rte package requires rsct.basic.rte version 2.4.0.0 from the AIX CD. However, to install that one, I was required to install xlC.aix61 version 9.0.0.1, which should not be here, and following that, bos.rte version 6.0.0.0, which should be part of AIX 6.x.

    Some elaboration on the bos family of packages – bos stands for Base Operating System. rte stands for RunTime Environment. It means that bos.rte version 6.0.0.0 would be the base operating system runtime components of AIX version 6. This was far from my desire, as you cannot replace the system’s bos.rte package…

    I have attempted to force installation of the baseline version of rsct.* from the cd, by running the command

    installp -aF -d /dev/cd0 rsct.basic.rte

    but to no avail. I removed all rsct.* packages (this time I used smit), and still I was unable to install the baseline package rsct.basic.rte, since it had dependencies from AIX 6…

    I was able to solve it using the following method:

    1. Installed all bos.adt baseline packages missing, using the following command

    installp -aXg -d /dev/cd0 bos.adt

    2. Extracted the combined package of SP1 technology update, and upgrade package to a specific directory

    3. Copied the contents of baseline rsct packages from the cdrom to that same directory mentioned above:

    mount /mnt/cdrom
    cp /mnt/cdrom/installp/ppc/rsct.* ./

    4. Created new .toc file

    inutoc .

    5. Installed (and succeeded this time) rsct.basic.rte. This time all dependencies were fulfilled

    installp -aXg -d . rsct.basic.rte

    6. Updated the entire level back to the latest os level

    smit update_all

    This worked fine, and I write it down for the next sucker who would be required to fulfill an impossible requirement in order to install one small package.

    HP MSA1000 controller failover

    Tuesday, March 27th, 2007

    HP MSA1000 is an entry-level disk storage array capable of communicating via different types of interfaces, such as SCSI and FC, and can allow FC failover. This FC failover, however, is controller failover and not path failover. It means that if the primary controller fails entirely, the backup controller will “kick in”. However, if a multipath-capable client loses its primary interface, there is no guarantee that communication with the disks through the backup controller will work.

    The symptom I have encountered was that the secondary path, while exposing the disks (while the primary path was down for one of the servers) to the server, did not allow any SCSI I/O operations. This prevented the Linux server’s SCSI layer from accessing the disks. So they did appear when doing “cat /proc/scsi/scsi“, however, they were not detected using, for example, “fdisk -l“, and the system logs got filled with “SCSI Error” messages.

    About a month ago, after almost two years, a new firmware update has been released (can be found here). Two versions exist – Active/Passive and Active/Active.

    I have upgraded the MSA1000 storage device.

    After installing the Active/Active firmware upgrade (Linux users, notice: you must have X to run the “msa1500flash” utility), and after power cycling the MSA1000 device, things started to look good.

    I have tested performance with a person on-site disconnecting fiber connections on-demand, and it worked great. About 2-5 seconds failover time.

    Since this system runs Oracle RAC, and it uses OCFS2, I had to update the failed-node timeout to be 31 seconds (per this Oracle’s OCFS site, which includes some really good tips).

    So real High Availability can be achieved after upgrading MSA1000 firmware.

    A note about Startup/Shutdown scripts – dbora

    Thursday, January 11th, 2007

    Per the last post in this thread, I have created a startup script for an Oracle setup I’ve had.

    The script is rather simple – you “su -” to the Oracle user, and you just start the DB. Same goes for shutting it down. I have tested it and it worked well.

    Due to the customer’s demands, we’ve relocated the DB data to an NFS share on NetApp. Afterwards, reboots didn’t go quite well. It seemed that the DB never went down correctly, and that we’ve managed, somehow, to umount the NFS volumes without shutting it down.

    After a short investigation, it was revealed that /etc/rc.d/rc, which controls the startup and shutdown of services on a system, checks to verify that there is a lock file with the same name as the startup service. In our case – we should have created an empty file in /var/lock/subsys/ named dbora during the startup of our dbora script.

    After changing the script to do so (and to remove the file during shutdown), things started working correctly.
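A minimal sketch of the resulting script body is below. It is not my actual dbora script. The start and stop commands are placeholders that assume Oracle's dbstart and dbshut helpers are in the oracle user's PATH, and the three variables can be overridden from the environment for a dry run.

```shell
#!/bin/bash
# Hypothetical dbora-style init script body. LOCKFILE carries the script's name so
# that /etc/rc.d/rc will call it with "stop". START_CMD and STOP_CMD are placeholders.
LOCKFILE="${LOCKFILE:-/var/lock/subsys/dbora}"
START_CMD="${START_CMD:-su - oracle -c dbstart}"
STOP_CMD="${STOP_CMD:-su - oracle -c dbshut}"

dbora() {
    case "$1" in
        start)
            # Create the lock file only if the database actually started.
            $START_CMD && touch "$LOCKFILE"
            ;;
        stop)
            $STOP_CMD
            rm -f "$LOCKFILE"
            ;;
        *)
            echo "usage: dbora {start|stop}" >&2
            return 1
            ;;
    esac
}
```

In a real init script the last line would call dbora "$1".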

    Troubleshooting weird networking problem

    Wednesday, August 9th, 2006

    Problem as follows: A Linux server is connected to a 1Gb/s LAN using 1Gb/s interface.

    I was told that SSH to the machine fails with “socket error” when done from Windows/Putty. One of the tests was done using Linux/ssh client, and it went fine. A switch was replaced, and other methods of detection showed weird results.

    When I got on site, I started with the usual procedure: ifconfig (to see that there are no TX or RX errors), dmesg, checking /var/log/messages, and ethtool. All produced the results expected when everything is working fine. I even switched network interfaces (using the 2nd Ethernet port on the server), but to no avail.

    The actual results looked a bit different – clients were unable to connect to the server using SSH for the first time (in general), but were able to connect the next time. You can’t run your Oracle server on such a setup…

    I escalated my tests to tcpdump, which showed only part of the information expected, and produced too much junk to extract anything useful from it.

    Using remote desktop from another server to a client’s desktop, we encountered that same problem: first-time failure, and then success. And then it hit me! On another desktop (it was the third or fourth), I looked at the output of “arp -a” (Windows Desktop) right after the first failure, and saw that the MAC address associated with the server’s IP was the wrong one. Some other machine on the network had this same IP address. Changing the Linux server’s IP address to a free one solved everything, as it seems, and resulted in a fine working server, and some free time devoted to hunting down the renegade spoofing machine.
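Had I collected the ARP tables from a few desktops up front, the conflict would have been obvious. The sketch below reads "IP MAC" pairs, one per line, and prints every IP address seen with more than one MAC address. The input format and the function name are my own assumptions. Extracting the pairs from "arp -a" output is left out, since its layout differs between Windows and Linux.

```shell
#!/bin/bash
# Read "IP MAC" pairs (one per line) on stdin and print every IP address that
# was seen with more than one distinct MAC address.
find_ip_conflicts() {
    awk '
        {
            ip = $1
            mac = tolower($2)
            gsub(/-/, ":", mac)      # Windows prints aa-bb-..., Linux prints aa:bb:...
            if (!((ip, mac) in seen)) {
                seen[ip, mac] = 1
                count[ip]++
            }
        }
        END {
            for (ip in count)
                if (count[ip] > 1)
                    print ip
        }
    ' | sort
}
```

An IP address that shows up in the output is claimed by at least two network cards.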