Splitting an archive and combining it later on the fly

By etzion, 18/07/2007

Many of us use tar (often with gzip or bzip2) for archiving. The result is usually a single large file, often too large, and extracting from it or splitting it takes real effort.

This post shows a small script that splits an archive into slices, and then how to extract the data directly from those slices.

Let’s assume we have a directory called ./Data. To archive it with tar and gzip, run:

tar czf /tmp/Data.tar.gz Data

For verbose output (although it could slow things down a bit), add the flag ‘v’.

Now we have a file called /tmp/Data.tar.gz.

Let’s split it into slices of 10 MB each:

cd /tmp
mkdir -p slices
i=1 # Our counter
skip=0 # This is the offset (in MB). Will be used later
chunk=10 # Slice size in MB
size=$((chunk * 1024 * 1024)) # And in bytes
file=Data.tar.gz # Name of the tar.gz file we slice
while true ; do
  # Zero-pad numbers lower than 10 so the slices sort in order
  if [ "$i" -lt 10 ]; then
    j=0${i}
  else
    j=${i}
  fi
  dd if="${file}" of="slices/${file}.slice${j}" bs=1M count="${chunk}" skip="${skip}"
  # Just to view the files with our own eyes
  ls -s "slices/${file}.slice${j}"
  # A slice shorter than a full chunk means we reached the end of the archive
  if [ "$(wc -c < "slices/${file}.slice${j}")" -lt "${size}" ]; then
    echo "Done"
    break
  fi
  let i=$i+1
  let skip=$skip+$chunk
done

This breaks the tar.gz file into slices with running numbers appended to their names. It assumes the number of slices will not exceed 99; you can extend the script to handle three-digit numbers. The sequence is important for later. Stay tuned 🙂
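If GNU coreutils is available, its split command can do the same slicing in one line. This is a swapped-in alternative, not the dd loop above: -b sets the slice size, -d asks for numeric suffixes (a GNU option) and -a sets the suffix width, so -a 3 covers the three-digit case. Note that split starts numbering at 000 rather than 01. To keep the sketch runnable anywhere, it fills a throwaway Data.tar.gz with 25 MB of random bytes under a temporary directory; that demo setup is an assumption, so point it at your real archive instead.

```shell
#!/bin/bash
# Sketch: slice a file with GNU split instead of a dd loop.
# Assumes GNU coreutils (split -d is a GNU extension).
set -e
work=$(mktemp -d)                 # throwaway working directory (demo assumption)
cd "$work"
file=Data.tar.gz                  # stands in for the real archive
head -c $((25 * 1024 * 1024)) /dev/urandom > "$file"   # 25 MB of demo data
mkdir -p slices
# 10 MB slices named Data.tar.gz.slice000, Data.tar.gz.slice001, ...
split -b 10M -d -a 3 "$file" "slices/${file}.slice"
ls -l slices
```

With 25 MB of input this yields two full 10 MB slices and a final 5 MB one, and the three-digit suffixes keep the glob order correct well past 99 slices.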

OK, so we have a set of files with numerical suffixes which, combined, contain our data. We can test their combined integrity:

cd /tmp/slices
file=Data.tar.gz
# The glob expands in sorted order, which is why the zero-padded numbering matters
for i in "${file}".slice*; do
  cat "$i"
done > ../Data1.tar.gz

This will allow us to compare /tmp/Data.tar.gz and /tmp/Data1.tar.gz. I tend to use md5sum for such tasks:

cd /tmp
md5sum Data.tar.gz
d74ba284a454301d85149ec353e45bb7 Data.tar.gz
md5sum Data1.tar.gz
d74ba284a454301d85149ec353e45bb7 Data1.tar.gz

They are identical. Great. We can remove Data1.tar.gz; we don’t need it anymore.
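The check above writes a second full-size copy of the archive just to hash it. A minimal sketch that avoids this streams the concatenated slices straight into md5sum and compares the result with the original's checksum, assuming GNU md5sum and the slice naming used above. So that it runs standalone, it first builds a small demo archive and slices it with GNU split in a temporary directory; those demo steps and names are assumptions, so drop them when working with the real /tmp files.

```shell
#!/bin/bash
# Sketch: verify slices against the original without writing a joined copy.
set -e
work=$(mktemp -d)                          # demo directory (assumption)
cd "$work"
file=Data.tar.gz
mkdir -p Data slices
head -c 300000 /dev/urandom > Data/blob    # demo content (assumption)
tar czf "$file" Data
split -b 100k -d -a 2 "$file" "slices/${file}.slice"   # demo slices (GNU split)

# The actual check: hash the original and the concatenated slices.
orig=$(md5sum < "$file" | awk '{print $1}')
joined=$(cat slices/"${file}".slice* | md5sum | awk '{print $1}')
if [ "$orig" = "$joined" ]; then
  echo "Slices match the original"
else
  echo "Checksum mismatch" >&2
  exit 1
fi
```

A byte-by-byte comparison with `cat slices/"${file}".slice* | cmp - "$file"` would work just as well and skips hashing entirely.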

To recover the contents of the slices without first combining them into one file (which takes time and disk space), we can pipe them directly into tar:

cd /tmp/slices
file=Data.tar.gz
(for i in "${file}".slice*; do
  cat "$i"
done) | tar xzvf -

This will extract the contents of the joined archive to the current directory.
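The same pipe works for other tar operations. In this sketch, which assumes GNU or BSD tar and the slice names used above, swapping x for t lists the archive's contents without writing anything, and -C extracts into a directory other than the current one. As before, the demo archive, the split-based slicing and the restore directory name are assumptions added only so the block runs on its own.

```shell
#!/bin/bash
# Sketch: list and extract a sliced tar.gz on the fly, without joining it first.
set -e
work=$(mktemp -d)                          # demo directory (assumption)
cd "$work"
file=Data.tar.gz
mkdir -p Data slices restore               # "restore" is a hypothetical target dir
echo "hello" > Data/readme.txt             # demo content (assumption)
head -c 200000 /dev/urandom > Data/blob
tar czf "$file" Data
split -b 50k -d -a 2 "$file" "slices/${file}.slice"   # demo slices (GNU split)

# List the contents straight from the slices (nothing is written to disk).
cat slices/"${file}".slice* | tar tzf -

# Extract into ./restore instead of the current directory.
cat slices/"${file}".slice* | tar xzf - -C restore
```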

This is all for today. Happy moving of data 🙂

2 Comments

PKHG says:

09/08/2008 at 10:27 am

The numbering of files gets *.$i

but later the . is MISSING!

admin says:

11/08/2008 at 12:21 am

True.
I have fixed it.

Thanks for commenting!

Ez
