Posts Tagged ‘hot resize’

Extend /boot from within a Linux system

Saturday, March 27th, 2021

This is a tricky one. In order to resize /boot, which is commonly the first partition, you need to push forward the beginning of the next partition. This is not an easy task, especially if you are not using LVM – then you have to use external partitioning modification tools, like PQMagic (if it still exists, who knows?), or other such offline tools.

However, if you are using LVM, there is a (complex) trick to it. We need to evict some of the first few PEs, resize the partition to begin at a new location, and then re-sign (and restore) the LVM meta-data in a way that reflects the relative change in data block positions (aka – the new PEs). To get a better grasp of LVM and its meta-data, I recommend you read my article here.

Also, and this is an important note – you cannot change an open (in-use) partition on systems prior to RHEL 8 (on which I have not tested my solution just yet) – meaning – you can change the partition layout on the disk, but the kernel will not refresh that information and will not act on it until reboot.

If you have not tried this before, or not sure about all the details in this post, I urge you to use a VM for testing purposes. A failure in this process might leave your data inaccessible, and you do not want that.

So, we have a complex set of tasks:

  • If there is some empty space somewhere on the LVM PV, migrate the first X blocks out.
  • Export the LVM meta-data so we can edit it afterwards
  • Recreate the partition (delete and recreate) with a new starting location
  • (here comes the tricky part) – sign the partition’s updated beginning with LVM meta-data, with the updated relative block locations.

Assumptions:

  • The disk partition layout is /boot as /dev/sda1 and LVM PV on /dev/sda2
  • The LVM VG name is ‘VG’
  • We are using a modern dracut-capable system, such as RHEL/CentOS version 6 and above (not tested on version 8 yet)
  • We use basic (msdos) partition layout and not GPT

Clear 500MB for further use, if not enough free space in PV:

In order to do so, we will need 500MB of free space in our PV. If space is an issue, you can easily free up space from the swap LV: stop swap, reduce the LV size, re-sign it with ‘mkswap’ and start swap again. That is the process in a nutshell, and I will not go further into it.
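A minimal sketch of that swap shrink, assuming the swap LV is called ‘lvswap’ inside our ‘VG’ volume group (adjust names and sizes to your setup):

swapoff /dev/VG/lvswap                # stop using the swap LV
lvreduce -f -L -500M /dev/VG/lvswap   # shrink the LV by 500MB
mkswap /dev/VG/lvswap                 # re-sign the smaller LV as swap
swapon /dev/VG/lvswap                 # start using it again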

Move the first 500MB out of the beginning of the PV:

We need to do some math. The size of a single PE is defined in the LVM VG settings. By default it is 4MB today, and it can be checked using the ‘vgdisplay’ command – look for the field ‘PE Size’. So 500MB is 125 PEs, and our command would be:

pvmove --alloc anywhere /dev/sda2:0-124

This will migrate the first 125 PEs (positions 0 through 124) somewhere else in the VG.
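If you want to double-check the math first, you can query the PE size. A quick sketch, assuming the VG is named ‘VG’:

vgdisplay VG | grep 'PE Size'     # e.g. "PE Size    4.00 MiB"
echo $(( 500 / 4 ))               # 125 PEs are needed to cover 500MB with 4MB PEs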

Export LVM meta-data to a file, and edit it for future handling:

vgcfgbackup -f /tmp/vg-orig.txt VG 

This command will create a file called /tmp/vg-orig.txt which will contain the original VG meta-data copy. We will clone this file and edit it:

cp /tmp/vg-orig.txt /tmp/vg.txt

Now comes the more complex part. We need to adjust the meta-data file to reflect the relative change in block locations. We will edit the new /tmp/vg.txt file. First – find the block describing ‘pv0’, which is the first PV in your VG (and maybe the only one), and verify that ‘pv0’ is the correct device by checking the ‘device’ directive in this block.
Now comes the harder part – each LV block in the meta-data file has a sub-section describing disk segments. These blocks describe the relative location of the LV on the PVs. I have already pointed at my article describing the meta-data file and how to read it. The task is to find the ‘stripes’ directive in each LV sub-segment, and reduce the starting PE listed there – in our case by 125. It needs to be done for all LVs which reside on our ‘pv0’ – one after the other. An example would look like this:

lvswap { ### Another LV
			id = "E3Ei62-j0h6-cGu5-w9OB-l9tU-0Qf5-f09bvh"
			status = ["READ", "WRITE", "VISIBLE"]
			flags = []
			creation_host = "localhost.localdomain"
			creation_time = 1594157749	# 2019-01-01 08:42:29 +0000
			segment_count = 1

			segment1 {
				start_extent = 0  ### The first LE of the LV. More on LEs – later ###
				extent_count = 94	# 2.9375 Gigabytes

				type = "striped"
				stripe_count = 1	# linear

				stripes = [
					# Was: "pv0", 1813. Now:
					"pv0", 1688 ### reduced 125 PEs ###
				]
			}
		}

Copy the resulting file /tmp/vg.txt (after double-checking it!) to /boot. We will use /boot later on to re-sign the PV meta-data.
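For example (assuming /boot is mounted in its usual place):

cp /tmp/vg.txt /boot/vg.txt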

Recreate the partition:

Another tricky part. You cannot simply resize a partition – or at least not without the tool (parted or fdisk, depending on your OS version) attempting to resize the over-layer and failing to do so. Most tools do not allow changing the size of a partition at all, so we will need to delete and recreate the partition layout. Depending on whether you are using a GPT or msdos partition table, your tools might vary; in this case I handle only the msdos partition layout, so the tools will be in accordance. Other tools apply for a GPT layout, and the process, in general, will work on GPT as well.

So – we will back up the partition layout before we change it. The command ‘sfdisk’ will allow us to do so:

sfdisk -d /dev/sda > /boot/original-disk-layout.txt

I am leaving quite a lot of stuff on the /boot partition, because this partition is not a member of the LVM volume group, and will remain mostly unaffected during our process. You can use an external USB disk, or any other non-LVM partition, as long as you verify you can access it from within the boot process, directly from initrd/initramfs, or dracut. /boot is, commonly, accessible from within the boot process.

Now we modify the partition layout. To do so, I recommend documenting the original start points of the two interesting partitions – /boot (usually /dev/sda1) and our PV (in this example: /dev/sda2). I prefer using ‘sector’ units. An example would be:

parted -s /dev/sda "unit s p"

It is common, for modern Linux systems, to have /boot starting at sector 2048 (which is 1MB into the disk). This is due to block alignment, which I will not discuss here. The interesting part is the size of a sector (commonly 512 bytes, but it can be 4K for ‘advanced format’ disks), so we will be able to calculate the new partitions' starting positions and sizes.
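It is convenient to capture the start and end sectors in shell variables before changing anything. Here is a sketch of one way to do it (my addition, not part of the original procedure), using parted's machine-readable output for partition 2 (the PV):

# grab the start/end sectors of partition 2, stripped of the trailing 's'
StartOfPart=$(parted -sm /dev/sda unit s print | awk -F: '$1=="2" {gsub(/s/,"",$2); print $2}')
EndOfPart=$(parted -sm /dev/sda unit s print | awk -F: '$1=="2" {gsub(/s/,"",$3); print $3}')
echo "Partition 2: ${StartOfPart}s -> ${EndOfPart}s"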

Now, using ‘parted’ we need to remove the 2nd partition (in my example – it might vary on your setup) and recreate it at a new location – 125 PEs further, or 500MB further, or 1024000 sectors ahead. So, if our starting sector is 411648(s), then we will have to create the partition starting at sector 1435648 (=411648+1024000), with the original ending location. Don’t forget to set this partition to LVM. Assuming you have saved the starting point of the partition in the variable StartOfPart, and the original ending in EndOfPart, your command would look like this:

parted -s /dev/sda "unit s rm 2 mkpart primary $(( StartOfPart + 1024000 )) ${EndOfPart} set 2 lvm on"

Now, we need to recreate the /boot partition (partition #1 in my example) with the new size. Again – we need to document its beginning and its end, and recreate it. Assuming the same variables now hold the original start and end of partition #1, the command would look like this:

parted -s /dev/sda "unit s rm 1 mkpart primary ${StartOfPart} $(( EndOfPart + 1024000 - 1 )) set 1 boot on"

The kernel will not update the new partition sizes because they are in use. We will need a reboot; however – when we reboot (do not do that just yet), we will no longer have access to our LVM. This is because it will not have meta-data anymore, and we will need to recreate it.

Prepare a script called vgrecover.sh, to be placed in /boot, which will hold the following lines:

#!/bin/sh
sed -i 's/locking_type = 4/locking_type = 0/g' /etc/lvm/lvm.conf
lvm pvcreate -u ${PVID} --restorefile /mnt/vg.txt /dev/sda2
lvm vgcfgrestore -f /mnt/vg.txt VG

You need to save the PV UUID of /dev/sda2 and replace ${PVID} in this script with it. This is the field ‘PV UUID’ in the output of the command:

pvdisplay /dev/sda2

Some more explanations: the device in our example is /dev/sda2 (change it to match your device name), and the VG name is ‘VG’ (again – change to match your setup). This script needs to be placed on /boot and made executable.
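A short sketch of this preparation (assuming the script was written to /tmp/vgrecover.sh):

pvs --noheadings -o pv_uuid /dev/sda2   # the PV UUID to substitute for ${PVID} in the script
cp /tmp/vgrecover.sh /boot/vgrecover.sh
chmod +x /boot/vgrecover.sh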

Before our reboot:

We need to verify the following files exist on our /boot (a quick check follows this list):

  • vg.txt
  • vgrecover.sh
  • original-disk-layout.txt
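A quick way to verify all three are in place:

ls -l /boot/vg.txt /boot/vgrecover.sh /boot/original-disk-layout.txt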

If any of these files is missing, you will not be able to boot, you will not be able to recover your system, and you will not be able to access the data there ever again!

I also recommend you keep your original-disk-layout.txt file somewhere external. If you have made a partitioning mistake and changed the beginning of /boot, you will not have access to /boot and all its files, and having this file elsewhere (on an external disk, for example) will help you recover the partition layout quickly and with no frustration.

Now comes another risky part: reboot and get into the recovery shell used by GRUB. See my article here to understand how to enter the recovery shell. If you have a different OS version, your boot arguments might differ. An external boot media (like a RHEL/CentOS recovery boot, or an Ubuntu live image) could also suffice to complete the task, but it is preferable to use the GRUB recovery console, to reduce the chance of some unknown automatic task or detection process doing stuff for you.

We need to break the boot sequence in the pre-mount phase. We will get a minimal shell, in which we need to run the following commands:

mkdir /mnt
mount /dev/sda1 /mnt
/mnt/vgrecover.sh

We are mounting /dev/sda1 (our /boot) on /mnt, which we have just created. Then we call the vgrecover.sh script we created before. It will use LVM recovery commands to re-sign the PV on /dev/sda2, and then recover the VG meta-data using our modified meta-data file, describing the new relative positions of the LVs.

When done, assuming no problems happened there, just umount /mnt and reboot. The system should boot up successfully; however, /boot will not have the designated size just yet.

Extending /boot :

The partition /dev/sda1 is of the updated size now; however, the filesystem is not. You can verify that using ‘fdisk -l /dev/sda’ or ‘parted -s /dev/sda unit s p’ or any other such command. If this is not the case, recheck your process.

Extending the filesystem depends on the type of filesystem. You can run ‘df -hPT /boot’ to identify the filesystem type. If it is XFS, use the command:

xfs_growfs /boot

If the filesystem is of type ext3 or ext4, use

resize2fs /dev/sda1

Other filesystems will require different tools, and since I cannot cover them all, I leave it to you. This is an online process, and as soon as it is over, the new size will show in the ‘df’ command.

Recovery:

If, for some reason, the disk partitioning or PV re-signing failed, and the system cannot boot, you can use the original-disk-layout.txt file in /boot to recover the original disk layout. Boot into GRUB rescue mode as shown above, and run:

mkdir /mnt
mount /dev/sda1 /mnt
sfdisk -f /dev/sda < /mnt/original-disk-layout.txt

If your /boot is inaccessible, and the file original-disk-layout.txt was kept on an external storage, you can use a live Ubuntu, or any other live system to run the ‘sfdisk’ command as shown above to recover /dev/sda original partitioning layout.

Bottom line:

This is a possible, although complex, task, and you should practice it on a VM, with disk snapshots before you attempt to kill production servers. Leave me a comment if it worked, or if there is anything I need to add or correct in this post. Thanks, and good luck!

targetcli extend fileio backend

Friday, April 3rd, 2020

I am working on an article which will describe the procedures required to extend LUN on Linux storage clients, with and without use of multipath (device-mapper-multipath) and with and without partitioning (I tend to partition storage disks, even when this is not exactly required). Also – it will deal with migration from MBR to GPT partition layout, as part of this process.

During my lab experiments, I have created a dedicated Linux storage machine for this purpose. This is not my first, of course, and not likely my last either; however, one of the challenges I had to confront was how to extend, or resize in general, an iSCSI LUN from the storage point of view. This is not as straightforward as one might have expected.

My initial setup:

  • CentOS 7 or later is used.
  • Using targetcli command-line (meaning – using LIO mechanism).
  • I am using ZFS for the purpose of easily allocating block devices and files on filesystems. This is not a must – LVM can do just as well.
  • targetcli is using automatic saveconfig (default configuration).

I will not go over the whole process of setting up and running an iSCSI target server. You can find this in so many guides around the web, such as this and that, as well as so many more. So, skipping that – we have a Linux machine providing three LUNs to another Linux machine over iSCSI. Currently – using a single network link.

Now comes the interesting part – if I want to expand/resize my LUN on the storage, there are several branches of possibilities.

Assuming we are using the ‘block’ backstore – there is nothing complicated about it – just extend the logical volume, or the ZFS volume, and you’re done with that. Here is an example:

LVM:

lvextend -L +1G /dev/storageVG/lun1

ZFS:

zfs set volsize=11G storage/lun1 # volsize should be the final size

Extremely simple. From this point on, LIO will know of the updated sizes, and will notify any relevant party. The clients, of course, will need to rescan the iSCSI storage, and adapt according to the methods in use (see my comment at the beginning of this post about my project).

It is just as simple when using the ‘fileio’ backstore on top of a block device. Although this is not the most recommended setup, it allows (by default) a more aggressive write-back cache, and might reduce disk load. If this is how your backstore is defined (fileio + block device) – the same procedure applies as before – extend the block device, and everyone is notified about it.

It becomes harder when using a real file as the ‘fileio’ backstore. By default, fileio will create a new file when defined, or use an existing one. It will use thin provisioning by default, which means it does not rely on exact knowledge of the file’s size. Extending or shrinking the file, besides risking data corruption, would have no effect on the exported LUN size.

Documentation about how to do this is non-existent. I have investigated it, and came to the following conclusions:

  • It is a dangerous procedure, so do it at your own risk!
  • It will result in a short IO failure, because we will need to restart target.service

This is how it goes. Follow this short list and you shall win:

  • Calculate the desired size in bytes.
  • Back up the file /etc/target/saveconfig.json
  • Edit the file and identify the desired LUN – you can identify it by the backing file name/path
  • Change the specified size to the desired size
  • Restart target.service (a sketch of these steps follows this list)
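A minimal sketch of these steps – the backing file path /storage/lun1.img and the 16GiB target size are illustrative assumptions, not values from my lab:

NEW_SIZE=$(( 16 * 1024 * 1024 * 1024 ))    # desired size in bytes
cp /etc/target/saveconfig.json /etc/target/saveconfig.json.bak
# in the editor: find the fileio storage object whose backing file is
# /storage/lun1.img and change its "size" value to the number above
vi /etc/target/saveconfig.json
# the short IO interruption happens here:
systemctl restart target.service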

During the service restart all IO will fail, and client applications might get IO errors. The restart should be faster than the default iSCSI retransmission timeout, but this is not guaranteed. If using multipath (especially with the queue_if_no_path flag), the likelihood of this affecting your iSCSI clients is nearly zero. Make sure you test this on a non-production environment first, of course.

Hope it helps.

Reduce Oracle ASM disk size

Tuesday, January 21st, 2020

I have had a system with an Oracle ASM diskgroup from which I needed some space. The idea was to release some disk space, reduce the size of the existing partitions, add a new partition and use it – all online(!)

This is a grocery list of tasks to do, with some explanation integrated into it.

Overview

  • Reduce ASM diskgroup size
  • Drop a disk from ASM diskgroup
  • Repartition disk
  • Delete and recreate ASM label on disk
  • Add disk to ASM diskgroup

Assuming the diskgroup is constructed of five disks, called SSDDATA01 to SSDDATA05. All my examples will be based on that. The current size is 400GB and I will reduce it to somewhat below the target size of the partition – let’s say, to 356000M

Reduce ASM diskgroup size

alter diskgroup SSDDATA resize all size 356000M;

This command resizes all the disks in the diskgroup to below the target partition size, which will allow us to reduce the partition itself later on.
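To verify the new sizes took effect, you can query the diskgroup, for example with asmcmd, run as the grid/ASM owner (a quick check of my own, not part of the original flow):

asmcmd lsdg SSDDATA           # diskgroup totals
asmcmd lsdsk -k -G SSDDATA    # per-disk sizes within the diskgroup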

Drop a disk from the ASM diskgroup

alter diskgroup SSDDATA drop disk 'SSDDATA01';

We will need to know which physical disk this is. You can look at the device major/minor numbers in /dev/oracleasm/disks and compare them with /dev/sd* or with /dev/mapper/*
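For example, to match the ASM disk to its backing device by major/minor (the disk name and the 8, 16 pair below are illustrative):

ls -l /dev/oracleasm/disks/SSDDATA01           # note the major, minor pair, e.g. 8, 16
ls -l /dev/sd* /dev/mapper/* | grep '8, *16 '  # find the block device with the same pair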

It is imperative that the correct disk is identified (or else…) and that the disk is indeed no longer in use by ASM before you touch it.

Reduce partition size

As the root user, you should run something like this (assuming a single partition on the disk):

parted -a optimal /dev/sdX
(parted) rm 1
(parted) mkpart primary 1 375GB
(parted) mkpart primary 375GB -1
(parted) quit

This will remove the first (and only) partition – but not its data – and recreate it with the same beginning but a smaller size. The remaining space will be used for an additional partition.

We will need to refresh the disk layout on all cluster nodes. Run on all nodes:

partprobe /dev/sdX

Remove and recreate ASM disk label

As root, run the following commands on a single node:

oracleasm deletedisk SSDDATA01
oracleasm createdisk SSDDATA01 /dev/sdX1

You might want to run the following command on all other nodes, as root:

oracleasm scandisks

Add a disk to ASM diskgroup

Because the disk is now of a different size, adding it without a specific size argument would not be possible. This is the correct command to perform this task:

alter diskgroup SSDDATA add disk 'ORCL:SSDDATA01' size 356000M;

To save time, however, we can add this disk and drop another at the same time:

alter diskgroup SSDDATA add disk 'ORCL:SSDDATA01' size 356000M drop disk 'SSDDATA02';

In general – if you have enough space on your ASM diskgroup – it is recommended to add/drop multiple disks in a single command. It allows for a faster operation, with less repeated data migration. Save time – save effort.

The last disk will be re-added, without dropping any other disk, of course.

I hope it helps 🙂

Hot resize Multipath Disk – Linux

Friday, August 19th, 2011

This post is for the users of the great dm-multipath system in Linux, who encounter a major availability problem when attempting a resize of mpath devices (and their partitions), and find themselves scheduling a reboot.

This document is based on a document created by IBM called "Hot Resize Multipath Storage Volume on Linux with SVC", and its contents are good for any other storage. However - it does not cover the procedure required in case of a partition on the mpath device (for example - an mpath1p1 device).

I will demonstrate with only two paths, but, once you understand this process, it can be used for any number of paths to a device.

I do not explain how to reduce a LUN's size, but the apt reader will be able to derive a method from this document. I, for myself, try to avoid shrinking LUNs as much as I can. I prefer backup, LUN recreation, and then restore. In many cases - it's just faster.

So - back to our topic - first - increase the size of your LUN on the storage.

Now, you need to collect the paths used for your mpath device. Check this example:

mpath1 (360a980005033644b424a6276516c4251) dm-2 NETAPP,LUN
[size=200G][features=1 queue_if_no_path][hwhandler=0][rw]
_ round-robin 0 [prio=4][active]
_ 2:0:0:0 sdc 8:32  [active][ready]
_ round-robin 0 [prio=1][enabled]
_ 1:0:0:0 sdb 8:16  [active][ready]

The underlying path devices – sdb and sdc in this example – are the ones we will need to change. Let's get their current size:

blockdev --getsz /dev/sdb
419430400

Keep this number somewhere safe. We can (and should!) assume that sdc has the same value; otherwise, this is not the same exact path.
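A quick sanity check that both paths report the same size (sdb and sdc being this example's path devices):

blockdev --getsz /dev/sdb /dev/sdc    # both numbers should be identical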

Collect this info for the partition as well. It will be smaller by a tiny bit:

blockdev --getsz /dev/sdb1
419424892

Keep this number as well.

Now we need to reread the current (storage-based) size parameters of the devices. We will run

blockdev --rereadpt /dev/sdb
blockdev --rereadpt /dev/sdc

Now, our reported size will be different:

blockdev --getsz /dev/sdb
734003200

Of course, the partition size will not change. We will deal with it later. Keep the updated values as well. Meanwhile, the multipath device still holds the disks with their original size values, so running 'multipath -ll' will not reveal any size change. Not yet.

We now need to create an editable dmsetup map. Use the following command to create two files, org and cur, containing this map:

dmsetup table mpath1 | tee org cur
0 419424892 multipath 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:32 128 round-robin 0 1 1 8:16 128

An important part - explaining some of these values. The map shows the device's size in blocks - 419424892. It shows some parameters, the path group info (0 2 1), and both sub-devices - sdc being 8:32 and sdb being 8:16. Try 'ls -la /dev/sdb' to see the major and minor numbers. At this point, if you are not familiar with majors and minors, I would recommend you do some reading about them. Not mandatory, but it will make your life here safer.

We need to delete one of the paths, so we can refresh it. I have decided to remove sdb first:

multipathd -k"del path sdb"

Now, running the multipath command, we will get:

mpath1 (360a980005033644b424a6276516c4251) dm-2 NETAPP,LUN
[size=200G][features=1 queue_if_no_path][hwhandler=0][rw]
_ round-robin 0 [prio=4][active]
_ 2:0:0:0 sdc 8:32  [active][ready]

Only one path. Good. We will need to edit the 'cur' file created earlier to reflect the new settings we are to introduce:

0 419424892 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:32 128

The only path group left is the one containing 'sdc' (8:32), and since one group is gone, the group count in the map changed from 2 to 1 (as there is only a single path group now!)

We need to reload multipath with these settings:

dmsetup suspend mpath1; dmsetup reload mpath1 cur; dmsetup resume mpath1

The correct response for this line is 'ok'. We suspend mpath1, reload it and then resume it. It is best done in a single line, as this process freezes IO on the device for a short period of time, and we prefer that period to be as short as possible.

Now, as /dev/sdb is no longer part of the multipath-managed devices, we can modify it. I usually use 'fdisk' - deleting the old partition, and recreating it in the new size - but you must make sure, if your device requires LUN alignment, that you recreate the partition from the same start point. I will dedicate a post to LUN alignment some time, but not at this particular time. Just a hint - if you're not sure, run fdisk in expert mode and get a printout of your partition table (fdisk /dev/sdb, then x, then p). If your partition starts at 128 or 64, it is aligned. If not (usually for large LUNs - at 63), you are not aligned, and you should either worry about it (but not now), or not care at all.
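A sketch of that fdisk sequence (assuming a single partition whose original start sector we keep unchanged):

fdisk /dev/sdb
# inside fdisk: p - print and note the current start sector of /dev/sdb1
#               d - delete partition 1 (the data itself is not touched)
#               n - recreate it: primary, number 1, the SAME first sector as before,
#                   and the default (new) last sector
#               w - write the table and exit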

Back to our task.

We need to grab the size of the newly created partition, for later use. Write it down somewhere.

blockdev --getsz /dev/sdb1
733993657

Following the partition recreation, we need to introduce the device to the multipath daemon. We do this by:

multipathd -k"add path sdb"

followed by immediately removing the remaining device:

multipathd -k"del path sdc"

We need to have our 'cur' file updated, so we can release the device for use. This time, we update both the size section with the new size, and the remaining path. Now, the file looks like this:

0 734003200 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:16 128

As mentioned before - the large number is the new size of the block device. The number of path groups is one (1), and the device is 'sdb', which is 8:16. Save this modified file, and run:

dmsetup suspend mpath1; dmsetup reload mpath1 cur; dmsetup resume mpath1

Running the command 'multipath -ll' you will get the real size of the device.

mpath1 (360a980005033644b424a6276516c4251) dm-2 NETAPP,LUN
[size=350G][features=1 queue_if_no_path][hwhandler=0][rw]
_ round-robin 0 [prio=1][active]
_ 1:0:0:0 sdb 8:16  [active][ready]

We will need to reread the partition layout of /dev/sdc. The quickest way is by running:

partprobe

This should do it. We can now add it back in:

multipathd -k"add path sdc"

and then run

multipath

(which should result in all the available paths, and the correct size).

Our last task is to update the partition size. The partition, normally, is called mpath1p1, so we need to read its parameters. Let's keep them in a file:

dmsetup table mpath1p1 | tee partorg partcur

We should now edit the newly created file 'partcur' with the new size. You should not change anything else. Originally, it looked like this:

0 419424892 linear 253:2 128

and it was modified to look like this:

0 733993657 linear 253:2 128

Notice that the size is the one obtained from /dev/sdb1 (!!!) and not from /dev/sdb.

We need to reload the device. Again - attempt to do it in a single line, or else you will freeze IO for a long time (which might cause things to crash):

dmsetup suspend mpath1p1; dmsetup reload mpath1p1 partcur; dmsetup resume mpath1p1

Do not mistake mpath1 for mpath1p1.

Our device is updated, our paths are online. We are happy. All that is left to do is to resize the filesystem online. With ext3, this is done like this:

resize2fs /dev/mapper/mpath1p1

The mount will increase in size online, and all that is left for us is to wait for it to complete, and then go home.

I hope this helps. It helped me.