Posts Tagged ‘storage devices’

Hot-resize disks on Linux

Monday, April 6th, 2020

After extensive investigation, I came to the conclusion that a full guide describing the procedure for online disk resize on Linux (especially – expanding disks) was missing. I have created a guide for RHEL5/6/7/8 (it works the same for Centos, OEL, ScientificLinux and other RHEL-based Linux systems) which takes into account the following four scenarios:

  • Expanding a disk where there is a filesystem directly on disk (no partitioning used)
  • Expanding a disk where there is LVM PV directly on disk (no partitioning used)
  • Expanding a disk where there is a filesystem on partition (a single partition taking all the disk’s space)
  • Expanding a disk where there is an LVM PV on partition (a single partition taking all the disk’s space)

All four scenarios were tested with and without multipath (device-mapper-multipath). Notes about GPT versus MBR partition layouts are also included. The purpose is to provide a complete guideline for hot-extending disks.

This document does not describe the process of extending disks on the storage/virtualisation/NAS/whatever end. Updating the storage client configuration to refresh the disk topology might differ between Linux versions and storage communication methods – iSCSI, FC, FCoE, AoE, local virtualised disk (VMware/KVM/Xen/XenServer/HyperV) and so on. Each connectivity/OS combination might require a different refresh method on the client. In this lab, I use iSCSI with the iSCSI software initiator.

The Lab

A storage server running Linux (Centos 7) with the targetcli tools exports a 5GB (or more) LUN through iSCSI to Linux clients running Centos5, Centos6, Centos7 and Centos8, with the latest updates (5.11, 6.10, 7.7, 8.1). See some interesting insights on iSCSI target disk expansion using Linux LIO (the targetcli command line) in my previous post.

The iSCSI clients all see the disk as ‘/dev/sda’ block device. When using LVM, the volume group name is tempvg and the logical volume name is templv. When using multipath, the mpath name is mpatha. On some systems the mpath partition would appear as mpatha1 and on others as mpathap1.

The iSCSI client disks/partitions were set up like this:

Centos5:

* Filesystem on disk

mkfs.ext3 /dev/sda
mount /dev/sda /mnt

* LVM on disk

pvcreate /dev/sda
vgcreate tempvg /dev/sda
lvcreate -l 100%FREE -n templv tempvg
mkfs.ext3 /dev/tempvg/templv
mount /dev/tempvg/templv /mnt

* Filesystem on partition

parted -s /dev/sda "mklabel msdos mkpart primary 1 -1"
mkfs.ext3 /dev/sda1
mount /dev/sda1 /mnt

* LVM on partition

parted -s /dev/sda "mklabel msdos mkpart primary 1 -1 set 1 lvm on"
pvcreate /dev/sda1
vgcreate tempvg /dev/sda1
lvcreate -l 100%FREE -n templv tempvg
mkfs.ext3 /dev/tempvg/templv
mount /dev/tempvg/templv /mnt

Centos6:

* Filesystem on disk

mkfs.ext4 /dev/sda
mount /dev/sda /mnt

* LVM on disk

pvcreate /dev/sda
vgcreate tempvg /dev/sda
lvcreate -l 100%FREE -n templv tempvg
mkfs.ext4 /dev/tempvg/templv
mount /dev/tempvg/templv /mnt

* Filesystem on partition

parted -s /dev/sda "mklabel msdos mkpart primary 1 -1"
mkfs.ext4 /dev/sda1
mount /dev/sda1 /mnt

* LVM on partition

parted -s /dev/sda "mklabel msdos mkpart primary 1 -1 set 1 lvm on"
pvcreate /dev/sda1
vgcreate tempvg /dev/sda1
lvcreate -l 100%FREE -n templv tempvg
mkfs.ext4 /dev/tempvg/templv
mount /dev/tempvg/templv /mnt

Centos7/8:

* Filesystem on disk

mkfs.xfs /dev/sda
mount /dev/sda /mnt

* LVM on disk

pvcreate /dev/sda
vgcreate tempvg /dev/sda
lvcreate -l 100%FREE -n templv tempvg
mkfs.xfs /dev/tempvg/templv
mount /dev/tempvg/templv /mnt

* Filesystem on partition

parted -a optimal -s /dev/sda "mklabel msdos mkpart primary 1 -1"
mkfs.xfs /dev/sda1
mount /dev/sda1 /mnt

* LVM on partition

parted -a optimal -s /dev/sda "mklabel msdos mkpart primary 1 -1 set 1 lvm on"
pvcreate /dev/sda1
vgcreate tempvg /dev/sda1
lvcreate -l 100%FREE -n templv tempvg
mkfs.xfs /dev/tempvg/templv
mount /dev/tempvg/templv /mnt

Some variations might exist. For example, use of ‘GPT’ partition layout would result in a parted command like this:

parted -s /dev/sda "mklabel gpt mkpart ' ' 1 -1"

Also, for multipath devices, replace the block device /dev/sda with /dev/mapper/mpatha, like this:

parted -a optimal -s /dev/mapper/mpatha "mklabel msdos mkpart primary 1 -1"

There are several common tasks, such as expanding filesystems: for XFS, use xfs_growfs <mount target>; for ext3 and ext4, use resize2fs <device path>. The same goes for LVM expansion: pvresize <device path>, followed by the lvextend command, followed by the filesystem-expansion command noted above.
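As a minimal sketch, that chain for the 'LVM on disk' lab scenario on Centos 7/8 (tempvg/templv mounted on /mnt, as defined above) looks like this:

pvresize /dev/sda # grow the PV to the new disk size
lvextend -l +100%FREE /dev/tempvg/templv # hand the new space to the LV
xfs_growfs /mnt # grow the filesystem (use resize2fs for ext3/ext4)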

The document layout

The document will describe the client commands for each OS, sorted by action. The process is as follows (a condensed end-to-end example for one scenario is sketched right after the list):

  • Expand the visualised storage layout (the LUN has already been expanded on the storage side; now the OS needs to pick up the change)
  • (if in use) Expand the multipath device
  • (if partitioned) Expand the partition
  • Expand the LVM PV
  • Expand the filesystem
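A condensed sketch of the whole flow for one scenario – Centos 7, XFS directly on a partition, no multipath – using only commands shown later in this document:

iscsiadm -m node -R # refresh the OS view of the expanded LUN
fdisk /dev/sda # delete and recreate sda1 from the same starting point
partx -u /dev/sda # re-read the partition table online
xfs_growfs /mnt # grow the filesystem (no LVM in this scenario)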

Actions

For each OS/scenario/multipath combination, we will format and mount the relevant block device, and attempt an online expansion.

Operations following disk expansion

Expanding the visualised storage layout

For iSCSI, it works quite the same for all OS versions. For other transport types, actions might differ.

iscsiadm -m node -R

Expanding multipath device

If using a multipath device (device-mapper-multipath), an update to the multipath device layout is required. Run the following command (for all OSes):

multipathd -k"resize map mpatha"

Expanding the partition (if disk partitions are in use)

This is the complicated part. Both the capabilities and the commands in use differ greatly between operating system versions.

Centos 5/6

Online expansion of a partition is impossible, except when device-mapper-multipath is used, in which case we force the multipath device to refresh its paths and recreate the device. This will result in an I/O error if only a single path is defined. For a non-multipath setup, an umount and re-mount is required, because the disk partition layout cannot be re-read while the disk is in use.

Without Multipath

fdisk /dev/sda # Delete and recreate the partition from the same starting point
partprobe # Run when the disk is not mounted, or else it will not refresh the partition size

With Multipath

fdisk /dev/mapper/mpatha # Delete and recreate the partition from the same starting point
partprobe
multipathd -k"reconfigure" # Sufficient for Centos 6
multipathd -k"remove path sda" # Required for Centos 5
multipathd -k"add path sda" # Required for Centos 5
# Repeat for all sub-paths of the expanded device

Centos 7/8

Without Multipath

fdisk /dev/sda # Delete and recreate the partition from the same starting point. Sufficient for Centos 8
partx -u /dev/sda # Required for Centos 7

With Multipath

fdisk /dev/mapper/mpatha # Delete and recreate the partition from the same starting point. Sufficient for Centos 8
kpartx -u /dev/mapper/mpatha # partx can also be used

Expanding LVM PV and LV

pvresize DEVICE

The device can be /dev/sda ; /dev/sda1 ; /dev/mapper/mpatha ; /dev/mapper/mpathap1 ; /dev/mapper/mpatha1 – according to the disk layout and LVM choice. Then extend the logical volume:

lvextend -l +100%FREE /dev/tempvg/templv

Expanding filesystem

For ext3fs and ext4fs
resize2fs DEVICE
Device can be /dev/sda ; /dev/sda1 ; /dev/mapper/mpatha ; /dev/mapper/mpathap1 ; /dev/mapper/mpatha1 – according to the disk layout and LVM choice.
For xfs
xfs_growfs /mnt

Additional Considerations

MBR vs GPT

On most Linux versions (for Centos – up to and including version 7), the fdisk command is incapable of handling the GPT partition layout. If using a GPT partition layout, gdisk is recommended, if it exists for the OS. If not, parted is a decent, although somewhat limited, alternative.
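A quick way to check which label a disk carries before choosing a tool is to ask parted for the partition table type (the exact wording of the output line may differ between parted versions):

parted -s /dev/sda print # look for the 'Partition Table' (or 'Disk label type') line – 'msdos' means MBR, 'gpt' means GPT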

The gdisk command can also convert a partition layout (at your own risk, of course) from MBR to GPT and vice versa. This is very useful for avoiding large data migrations where a legacy MBR partition layout was used on disks which are to be expanded beyond the 2TB limit.

The GPT backup table is located at the end of the disk, so when extending a GPT disk, the GPT backup table needs to be repaired. Based on my lab tests – it is impossible to both extend the partition and repair the GPT backup table location in a single call to gdisk. Two runs are required – one to fix the GPT backup table, and then – after the changes were saved – another to extend the partition.
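A hedged sketch of this two-pass approach (the interactive option letters are from gdisk's menus – verify them on your gdisk version before relying on them):

gdisk /dev/sda # first run: x (expert menu), e (relocate backup data structures to the end of the disk), w (write and exit)
gdisk /dev/sda # second run: d (delete), n (recreate from the same starting sector, default last sector), w (write and exit)
partx -u /dev/sda # ask the kernel to re-read the partition table (Centos 7)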

Storage transport

I have demonstrated the use of the iSCSI software initiator on Linux. Different storage transports exist – each may require its own method of 'notifying' the OS of a changed storage layout. See RedHat's article about disk resizing (RHN access required), which explains how to refresh the storage transport for various combinations of transports and RHEL versions and sub-versions.

targetcli extend fileio backend

Friday, April 3rd, 2020

I am working on an article which will describe the procedures required to extend a LUN on Linux storage clients, with and without the use of multipath (device-mapper-multipath) and with and without partitioning (I tend to partition storage disks, even when this is not strictly required). It will also deal with migration from MBR to GPT partition layout as part of this process.

During my lab experiments, I have created a dedicated Linux storage machine for this purpose. This is not my first, of course, and not likely my last either; however, one of the challenges I had to confront was how to extend – or resize in general – an iSCSI LUN from the storage point of view. This is not as straightforward as one might expect.

My initial setup:

  • Centos 7 or later is used.
  • Using targetcli command-line (meaning – using LIO mechanism).
  • I am using ZFS for the purpose of easily allocating block devices and files on filesystems. This is not a must – LVM will do just as well.
  • targetcli is using automatic saveconfig (default configuration).

I will not go over the whole process of setting up and running an iSCSI target server. You can find this in many guides around the web. So, skipping that – we have one Linux machine providing three LUNs to another Linux machine over iSCSI, currently using a single network link.

Now comes the interesting part – if I want to expand/resize my LUN on the storage, there are several branches of possibilities.

Assuming we are using the ‘block’ backstore – there is nothing complicated about it – just extend the logical volume, or the ZFS volume, and you’re done with that. Here is an example:

LVM:

lvextend -L +1G /dev/storageVG/lun1

ZFS:

zfs set volsize=11G storage/lun1 # volsize should be the final size

Extremely simple. From this point on, LIO will know of the updated sizes, and will notify any relevant party. The clients, of course, will need to rescan the iSCSI storage and adapt according to the methods in use (see my comment at the beginning of this post about my project).
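On the client side, with the open-iscsi software initiator used in this lab, the rescan is simply:

iscsiadm -m node -R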

It is just as simple when using the 'fileio' backstore with a block device. Although this is not the most recommended setup, it allows for a (by default) more aggressive write-back cache, and might reduce disk load. If this is how your backstore is defined (fileio + block device) – the same procedure applies as before – extend the block device, and everyone is notified about it.

It becomes harder when using a real file as the 'fileio' backstore. By default, fileio will create a new file when defined, or use an existing one. It will use thin provisioning by default, which means it will not have exact knowledge of the file's size. Extending or shrinking the file itself would have no impact (apart from risking data corruption).
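For illustration – assuming a backing file at /storage/lun1.img (a hypothetical path; adjust to your naming) – you can compare what actually sits on disk with what LIO believes:

ls -lsh /storage/lun1.img # apparent file size vs. blocks actually allocated (thin/sparse file)
targetcli ls /backstores/fileio # the size LIO currently advertises for the fileio backstore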

Documentation about how to do this is non-existent. I have investigated it, and came to the following conclusions:

  • It is a dangerous procedure, so do it at your own risk!
  • It will result in a short IO failure, because we will need to restart the target.service service

This is how it goes. Follow this short list and you shall win (a sketch of the commands follows the list):

  • Calculate the desired size in bytes.
  • Back up the file /etc/target/saveconfig.json
  • Edit the file and locate the desired LUN – you can identify it by its file name/path
  • Change the 'size' value from the current size to the desired size
  • Restart the target.service service
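A hedged sketch of these steps, assuming the LUN is backed by a file such as /storage/lun1.img (a hypothetical path – adjust names and the target size to your setup):

echo $((15 * 1024 * 1024 * 1024)) # desired size in bytes (15GB in this example)
cp /etc/target/saveconfig.json /etc/target/saveconfig.json.bak # keep a backup copy
vi /etc/target/saveconfig.json # find the fileio object by its file path and change its "size" value to the number above
systemctl restart target.service # expect a short IO outage here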

During the service restart all IO will fail, and client applications might get IO errors. It should be faster than the default iSCSI retransmission timeout, but this is not guaranteed. If using multipath (especially with the queue_if_no_path flag), the likelihood of this affecting your iSCSI clients is nearly zero. Make sure you test this on a non-production environment first, of course.
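To check whether your clients actually have that flag active, look for it in the multipath feature line on the iSCSI client:

multipath -ll | grep -i queue_if_no_path # a match means IO will be queued rather than failed while paths are down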

Hope it helps.

IBM DS3400 expand Logical Drive (LUN)

Wednesday, May 6th, 2009

I have always liked the IBM DS series management suite. I have claimed before that your first storage (and with it, I assume, your way of thinking about storage abstraction) is your favorite storage. I have been using Storage Manager 9 for years now, even before it was 9 (I think it was 7 then), and way before it became a 10-something version.

SM9 is the tool to manage the DS4xxx storage series (previously the FAStT series, if you can remember…); however, their “smaller” brothers, the DS3xxx series, arrived with something else claimed to be “Storage Manager” as well. It began as SM2, and I was certainly surprised to see it under the name “Storage Manager 10” today.

It seems as if someone wanted to make life easy, so everything is hidden behind a messy-looking dashboard. Finding the so-well-known abstraction logic is almost impossible for me, no matter how many times I use this disaster of a product. I can’t complain about the storage devices themselves – they are rather good, and although IBM has recently had some firmware and stability issues with some of its storage products, I can’t say I hate this device. Storage Manager 2 (or 10, today), however, is something I prefer to avoid.

After all this ranting and crying, I can share this piece of technical information. Storage Manager 10 (2, whatever) does not allow a Logical Drive to be expanded. You can expand the RAID array – no problem, just add some more disks to it – however, the free space created in that specific array will not be available for your reallocation pleasure: you cannot expand any of your existing Logical Drives with it.

Google pointed me to that specific post which enlightened me with a new understanding of the DS series – SM10, while being nothing more than a limited GUI for the storage-handicapped IT person, does not completely disable the storage device’s abilities. The device can expand LUNs, so SM10 can allow it, secretly.

You need to know what you’re doing. I will copy/paste the contents of that blog entry for future use:

Expanding LUNs on a DS3000 series disk system:

First make sure you have enough space in your array. Next go to the script editor in the Storage Manager gui. Choose script editor. Type in a command like this:

set logicalDrive ["FINKOPAS01-BOOT"] addCapacity=50GB;

This would add 50GB’s to the disk “FINKOPAS01-BOOT”.

Now choose run command from the menu. After the disk system has expanded your LUN you should see the extra capacity in the disk management console window.

*NOTE*: This information was originally found here: http://www.systemx.co.nz/xcelerator/?q=node/99

The target link is dead there, unfortunately, however, the “secret” information is available for all.

Do not forget the semicolon at the end of the line. It won’t work otherwise. IBM… 🙂

HP EVA SSSU and fixed LUN WWID

Monday, July 14th, 2008

Linux works perfectly well with multiple storage links using dm-multipath. Not only that, but HP has released their own spawn of dm-multipath, which is optimized (or so they claim – but, in any case, well configured) to work with EVA and MSA storage devices.

This is great; however, what do you do when mapping volume snapshots through dm-multipath? For each new snapshot, you enjoy a new WWID, which will map to a new “mpath” name, or a raw WWID (if “user_friendly_names” is not set). This can, and will, wreak havoc on remote scripts. On each reboot, the new and shiny snapshot will acquire a new name, making scripting a hellish experience.

For the time being I have not tested ext3 labels. I suspect that using labels will fail, as the dm-multipath overlay device does not hide the underlying sd devices, and thus the system might detect the same label more than once – once for each underlying device, and once for the dm-multipath overlay.

A solution which is both elegant and useful is to fix the snapshots’ WWID through a small alteration to the SSSU command. Append a string such as this to the snap create command:

WORLD_WIDE_LUN_Name="6300-0000-0000-0000-0010-0000"

Don’t use the numbers supplied here – “invent” your own 🙂

Mind that you must use dashes, or else the command will fail.

Doing so will allow you to always use the same WWID for the snapshots, and thus – save tons of hassle after system reboot when accessing snapshots through dm-multipath.

Fabric Mess, or how to do things right

Tuesday, May 29th, 2007

When a company is about to relocate to a new floor or a new building, that is the time when the little piles of dirt swept under the rug come back to haunt you.

In several companies I have been to, I have stressed the need for an ordered environment. This is as valid for networking and hardware serial numbers as it is for FC hardware, but when it comes to FC, things always get somewhat more complicated, and when the fit hits the shan, the time you need to track down a single faulty cable or a dark link LED is time you need for other things.

I can sum it up in a single sentence – keep your SAN tidy.

Unless you have planned your entire future SAN deployment ahead (and this can mean planning years and years ahead), your SAN environment will grow over time. Unlike the network, where short cable disconnections have only a small influence on the overall status of a server (which allows you to tidy up network cables on the fly after rush hours, when traffic volume is low, without downtime), tidying up FC cables is a matter for a planned downtime – and show me the high-level manager who would approve at least 30 minutes of downtime, with some risk (as there always is when changing cables), just for "tidying up"…

So, your SAN looks like this (and this case can be considered quite good):

Cable length is always an issue

You can track down cables, but it requires time, and time is at a premium when there is a problem or when changes are about to take place. Wait – which server is connected to switch1 port 12? Dunno?

The magic is, like most magic tricks, not magical at all. Keep track of every cable, every path, every detail. You will not be sorry.

I have found that a spreadsheet with the following columns does the work for me, and I have been to some quite large SAN sites:

1. Switch name and port

2. Server name (if the server has multiple FC ports, add 0, 1, etc. Pick a fixed convention for directions – for example, 0 is the leftmost port when you look directly at the back of the server). The same goes for storage devices.

3. SAN zone or VSAN, if applicable.

4. Patch panel port. If you go through several patch panels, write down all of them, one after the other.

5. Server’s port PWWN

On another spreadsheet I have the following information:

1. Server name

2. Storage ports accessible to it (using the same convention as mentioned above)

3. LUN ID on the storage described above.

If you keep these two spreadsheets up to date, you will always be able to find your way around your own SAN. Maintaining the spreadsheets is the actual wizardry in all this.

Additional tip – if your company relocates to another building, and your SAN is pretty much fixed and known, hiring a person to manufacture made-to-length FC cables per device can be one of the greatest things you could do. If every cable has its exact required length, your SAN environment will look much better. This is a tip for the lucky ones; most of us are not lucky enough to either afford such a person or to relocate with a never-changing SAN environment.