LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - Distributions > Slackware
Slackware This Forum is for the discussion of Slackware Linux.

03-18-2015, 03:20 AM   #1
lazardo
Member
Registered: Feb 2010
Location: SD Bay Area
Posts: 196
raid1 recovery w bitmap + PAR2


Code:
recovery = 49.9% (724112960/1448225792) finish=1.0min speed=11321520K/sec
The above is from an ancient Shuttle SN21G5 (with the problematic NVIDIA MCP51 chipset) recovering from an as-yet-unidentified RAID member dropout. The system was born as a roll-your-own media server nearly 10 years ago and contains 872GB of ripped DVD-quality video, audio CD transcodes and a large image gallery.

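And yes, mdadm is showing an effective recovery rate of 11.3GB/s, thanks to the internal write-intent bitmap: only the regions marked dirty since the member dropped out need to be resynced. The whole process of recovering the 1.4TB took about 100 seconds on Slackware64 14.0 with a 3.2.45 kernel, RAID10 in the f2 (far, 2 copies) layout.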
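The post doesn't show the mdadm side. As a hedged sketch (not necessarily the exact commands used here), adding an internal bitmap to an existing array and putting a dropped member back usually looks like the following. `/dev/md0` and `/dev/sdb1` are placeholders, and the hypothetical `run` wrapper only prints the commands unless `DRYRUN=0`.

```shell
#!/bin/bash
# Sketch only: typical mdadm steps for a write-intent bitmap and a
# returning member. ARRAY and MEMBER are hypothetical placeholders.
# DRYRUN=1 (the default) only prints the commands; DRYRUN=0 runs them.
ARRAY=${ARRAY:-/dev/md0}
MEMBER=${MEMBER:-/dev/sdb1}
DRYRUN=${DRYRUN:-1}

run() {
    if [ "$DRYRUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One-time: store a write-intent bitmap in the members' superblocks
run mdadm --grow "$ARRAY" --bitmap=internal

# After a member drops out and comes back: re-add it to its old slot;
# md can then normally resync only the regions the bitmap marked dirty
run mdadm "$ARRAY" --re-add "$MEMBER"
```

Afterwards, `mdadm --detail /dev/md0` should list an internal intent bitmap.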

After many hours and much angst looking at the current ZoL (ZFS on Linux) and btrfs bug lists, I could not pull the trigger on conversion, so I stuck with ext4, added the bitmaps and went with PAR2 for bitrot protection.
  • At 2% coverage, the tradeoff is about 2% disk space, e.g., American_Beauty_1999.MP4 is 1967MB and its par2 files 41MB. And at an 11GB/s recovery rate, I couldn't care less if bitmapped writes are a bit slower.
  • The source VOBs are on a separate machine, also with 2% PAR2 recovery files. Note that VOBs compress by about 2.5%, so compressing them gives good protection at roughly net-zero disk usage (pigz and a few cores help too).
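The 2% figures can be sanity-checked with a little arithmetic. This is a sketch with made-up helper names, and it assumes the PAR2 percentage applies to the size actually stored (the compressed size, for the VOBs). The gap between the ~39MB computed and the 41MB observed is plausibly PAR2's own packet and checksum overhead.

```shell
#!/bin/bash
# Back-of-envelope numbers for the two bullets above (hypothetical helpers).
# Assumption: PAR2 redundancy is taken on the stored (compressed) size.

# par2_estimate SIZE_MB PERCENT -> recovery data in MB, ignoring PAR2 overhead
par2_estimate() {
    awk -v s="$1" -v p="$2" 'BEGIN { printf "%.1f\n", s * p / 100 }'
}

# net_ratio COMPRESS_PCT PAR2_PCT -> (compressed file + recovery data) / original
net_ratio() {
    awk -v c="$1" -v p="$2" 'BEGIN { printf "%.4f\n", (1 - c / 100) * (1 + p / 100) }'
}

par2_estimate 1967 2    # prints 39.3 (vs. 41MB observed)
net_ratio 2.5 2         # prints 0.9945, i.e. slightly under the original size
```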

Here's the quick script I used to create the PAR2 recovery sets and MD5 checksums. Run a small test first to see whether it meets your needs, or even works; it has not been cleaned up.

I used the parallel (TBB) build of par2, http://slackbuilds.org/repository/14...r2cmdline-tbb/, which is why the script has rudimentary simultaneous job management.

Cheers,

[Update] The root cause was a marginal SATA cable; old logs show over a year of sporadic SATA resets. The above recovery was a simple hot-add of the same partition back into the array. A subsequent scrub took 7.07 hours on an otherwise idle system, so in this case the bitmap effectively cut rebuild time by a factor of about 250 (100 vs. 25.5K seconds).
Code:
#!/bin/bash

PARLOC=/tmp/PAR2		# par2 file repository
MAXJOBS=3			# more disk I/O than horsepower
PERCENT=2			# par2 redundancy, 2% s/b bitrot safe

################################################################

MAXMEM=2048			# max per instance par2 mem (MB)
GAP=0.2				# seconds sleep between starts

# Split total RAM (MB) across concurrent jobs and CPUs
NUMCPU=$( awk '/^processor/{ ++NUMCPU }; END{print NUMCPU}' /proc/cpuinfo )
RAWMEM=$( awk '/^MemTotal:/{ print int($2/1000); exit }' /proc/meminfo )
RAWMEM=$(( RAWMEM / MAXJOBS / NUMCPU ))

# PAR2MEM = largest power of two <= RAWMEM, capped at MAXMEM
for ((i=2; i<=MAXMEM; i*=2)); do
	if [ $((RAWMEM % i)) -eq $RAWMEM ]; then	# true once i > RAWMEM
		PAR2MEM=$(( i / 2 ))
		break
	fi
done
PAR2MEM=${PAR2MEM:-$MAXMEM}

################################################################

case "$1" in
	-h) echo "use: crc [-fp] file [ file ... ] - creates par2 recovery set"; exit
		;;
	-f|-force) FORCE="1"; shift	# NOTE: parsed but not used below
		;;
	-p) shift; PARLOC="$1"; shift
		;;
	*)
		;;
esac

FORCE=${FORCE:-0}

# append /PAR2 unless the path already contains PAR2
[ "$PARLOC" == "${PARLOC%%PAR2*}" ] && PARLOC=$PARLOC/PAR2
[ ! -d "$PARLOC" ] && mkdir -pv "$PARLOC"

shopt -s nullglob	# an empty par2.* glob now counts as 0 jobs, not 1

echo "start $(date)"
for i in "$@"; do
	(( CNT++ ))
	j=$( basename "$i" )

	# Wait for a free slot; each running job owns a par2.<name> lock dir
	while true; do
		sleep $GAP
		JOBS=( "$PARLOC"/par2.* )
		if [ ${#JOBS[@]} -lt $MAXJOBS ]; then
			mkdir "$PARLOC/par2.$j"
			break
		fi
	done

	[ $(( CNT % 5 )) -eq 0 ] && echo ""
	echo -n "$i,"

	(
	if ! ionice -c 3 nice par2 create -r$PERCENT -m$PAR2MEM "$PARLOC/$j.par2" "$i" > /dev/null 2>&1; then
		logger -st "$0" "par2 create failed for $i"
	else
		# first line is the source file, then its .par2 files
		md5sum "$i" "$PARLOC/$j".*par2 > "$PARLOC/$j.md5"
		MD5=$( head -1 "$PARLOC/$j.md5" )

		cd "$PARLOC" &&
		tar cpf - "$j".*{md5,par2} > "$j.${MD5:0:5}.par2.tar" && rm "$j".*{md5,par2}
	fi

	# absolute path: the failure branch never cd's into $PARLOC
	rmdir "$PARLOC/par2.$j"
	) &
done

wait	# let the last background jobs finish before reporting
echo -e "\nfinish $(date)"
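The script only creates the bundles. A hypothetical companion check (not part of the original) could pull the saved MD5 back out of a `NAME.XXXXX.par2.tar` bundle and re-verify the source file. It assumes GNU tar and md5sum, and because md5sum recorded the path exactly as it was given to the script, it must be run from the same working directory (or the files must have been given with absolute paths).

```shell
#!/bin/bash
# Hypothetical companion to the crc script: re-check a source file against
# the MD5 stored in its NAME.XXXXX.par2.tar bundle.
# md5sum recorded the path as it was passed to crc, so run this from the
# same working directory (or use absolute paths when creating the bundles).
crc_check() {
    local tarball=$1 name
    name=$(basename "$tarball")
    name=${name%.*.par2.tar}    # strip the ".XXXXX.par2.tar" suffix
    # Line 1 of NAME.md5 is the source file; later lines cover the .par2 files
    tar xOf "$tarball" "$name.md5" | head -n 1 | md5sum -c --quiet -
}
```

A non-zero exit status means the file no longer matches its checksum; the .par2 files in the same tarball are what `par2 repair` would then work from (that step is not shown).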

Last edited by lazardo; 03-20-2015 at 01:31 AM. Reason: update
 
04-12-2015, 05:47 AM   #2
unSpawn
Moderator
Registered: May 2001
Posts: 29,393
Blog Entries: 55
Thanks for posting your update, off of the 0-reply list now.


//NTLB
 
02-15-2016, 08:45 PM   #3
lazardo
Member
Registered: Feb 2010
Location: SD Bay Area
Posts: 196
Original Poster
Quote:
Originally Posted by lazardo
...
After many hours and much angst looking at ZoL/btrfs current buglists I could not pull the trigger on conversion and so stuck with ext4, added the bitmaps and went with PAR2 for bitrot protection.
End of scene:

After full recovery I went ahead with btrfs on one disk and duplicated the files from ext4. After almost a year, two kernel updates and one btrfs-progs update, there were zero failures, corruption or other incidents on either disk; however, it was write performance that finally triggered replacing btrfs with ext4+PAR2:

Streaming writes from ext4 -> btrfs averaged just under 49MB/s while ext4 -> ext4 averaged 111MB/s even during ext4 lazy initialization.
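The post doesn't say how these figures were measured. One rough way to reproduce a streaming-write comparison (a sketch, not the author's method) is to time a large copy plus a flush into a directory on each candidate filesystem. `copy_rate` is a hypothetical helper; `sync` flushes every filesystem, so run it on an otherwise idle box, and use a source file that isn't already cached.

```shell
#!/bin/bash
# Rough streaming-copy throughput: copy SRC into DSTDIR, flush to disk,
# and print decimal MB/s. Hypothetical helper; GNU stat/date assumed.
copy_rate() {
    local src=$1 dstdir=$2 bytes t0 t1
    bytes=$(stat -c %s "$src")          # file size in bytes
    t0=$(date +%s.%N)
    cp -- "$src" "$dstdir"/ || return 1
    sync                                # count the write-back, not just page cache
    t1=$(date +%s.%N)
    awk -v b="$bytes" -v a="$t0" -v z="$t1" \
        'BEGIN { if (z <= a) { print "n/a"; exit } printf "%.1f\n", b / 1e6 / (z - a) }'
}
```

For example, `copy_rate /mnt/ext4/big.mkv /mnt/btrfs/scratch` against `copy_rate /mnt/ext4/big.mkv /mnt/ext4b/scratch` (hypothetical paths). The post's own numbers put ext4 at roughly 2.3 times the btrfs rate (111 / 49).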

Cheers,
 
All times are GMT -5.