Slackware for Raspberry Pi Zero W?

leeeoooooo · 09-09-2019, 04:29 AM

Greetings!

I have a Raspberry Pi Zero W that I would like to install Slackware on. Debian just doesn't do it for me.

I was looking at SARPi, but it doesn't support installation on the RPi Zero W.

I would like to have the most complete and current release of Slackware.

I understand that, since it has a Broadcom BCM2835, it supports hard float (VFPv2), and that it is internally a 32-bit processor with 64-bit data lines.
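One way to confirm that the kernel actually sees the VFP unit is to look at the "Features" line of /proc/cpuinfo. The sketch below is a hypothetical helper (the name `cpu_has_vfp` is made up for illustration); it takes an optional path so it can also be run against a saved copy of cpuinfo.

```shell
# Hypothetical helper: report whether /proc/cpuinfo (or a copy passed as $1)
# advertises a VFP floating-point unit.
cpu_has_vfp() {
    cpuinfo=${1:-/proc/cpuinfo}
    # The Features line lists flags such as "half thumb fastmult vfp edsp ...".
    # grep -w matches "vfp" only as a whole word, so "vfpv3" alone does not count.
    if grep -i '^Features' "$cpuinfo" | grep -qw 'vfp'; then
        echo yes
    else
        echo no
    fi
}
```

On a Pi Zero W the Features line should include `vfp`, so running `cpu_has_vfp` there is expected to print `yes`.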

What would you recommend?

glorsplitz · 09-09-2019, 07:41 PM

Maybe look here Slackware 14.1 on Raspberry Pi Zero
or here Slackware ARM on the Raspberry Pi 1

leeeoooooo · 09-11-2019, 05:40 AM

Thanks glorsplitz!

Those links look to be very helpful. I missed those when I searched the forums.

glorsplitz · 09-11-2019, 07:51 PM

post back if you get anywhere

gus3 · 09-12-2019, 09:39 PM

If you do get Slackware ARM 14.1 to work on your RPi Zero, you can get a small boost by following the instructions at https://mindplusplus.wordpress.com/2...-raspberry-pi/ (updated to reflect Slackware ARM 14.1 instead of -current). It will rebuild glibc using "-mfloat-abi=softfp"; this lets glibc use the VFP floating-point coprocessor, internally to glibc itself, without breaking the soft-float ABI that other programs and libraries are expecting.

abga · 09-13-2019, 03:59 PM

@leeeoooooo

If you follow the "Manual installation method" instructions from the link glorsplitz suggested:
https://docs.slackware.com/howtos:ha...rm:raspberrypi
you'll get the Slackware 14.2 ARM SoftFloat working on your Pi Zero W. I own a bunch of Pi Zeros (without WiFi) on which I installed and used Slackware 14.2 and can confirm that the "Manual installation method" works.
Note that the Start and End entries in your fdisk -l output will differ; adapt the rest of the instructions accordingly.
The /opt/vc libs are compiled for HardFloat, so they're useless under SoftFloat Slackware on your Pi Zero W. If you need them, you have to get the source files and compile them yourself. You can follow the "___VC-USERLAND___START" section from here:
https://www.linuxquestions.org/quest...on-4175612537/
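To check whether a given library or binary was built for the hard-float ABI, binutils' `readelf -A` prints the ARM build attributes. The helper below is a hypothetical sketch that reads that output on stdin; `Tag_ABI_VFP_args: VFP registers` marks the hard-float calling convention, while telling soft and softfp apart via `Tag_FP_arch` is only a heuristic.

```shell
# Hypothetical helper: guess the float ABI of an ARM ELF file from the
# "readelf -A" output given on stdin.
#   hard   -> FP arguments are passed in VFP registers
#   softfp -> VFP instructions used (Tag_FP_arch present), soft calling convention
#   soft   -> neither attribute found
float_abi_from_attrs() {
    attrs=$(cat)
    case $attrs in
        *"Tag_ABI_VFP_args: VFP registers"*) echo hard ;;
        *"Tag_FP_arch: VFP"*|*"Tag_FP_arch: FP"*) echo softfp ;;
        *) echo soft ;;
    esac
}
```

For example, `readelf -A /opt/vc/lib/libbcm_host.so | float_abi_from_attrs` (the library name is just an example of what lives under /opt/vc/lib) should report `hard` for the stock Raspbian libs.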

If you use the latest Raspbian image to extract the kernel and firmware, your WiFi/Bluetooth chip should also work properly.
The CPU from your Pi Zero, as you correctly mentioned, does support hard float (VFP), but Slackware 14.2, for compatibility considerations, is only SoftFloat.
Slackware ARM -current is HardFloat, but it's compiled for ARMv7 (and above) only, so it's useless for your ARMv6 Pi Zero W.
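A quick way to tell which branch a board can run is the machine name reported by `uname -m`: a 32-bit kernel on a BCM2835 board (Pi 1, Zero, Zero W) reports `armv6l`, while a Pi 2 or later reports `armv7l` or similar. The function below is a hypothetical sketch of that mapping, following the branch requirements described in this thread.

```shell
# Hypothetical helper: map a machine name (default: "uname -m") to the
# Slackware ARM branch(es) that can run on it. Assumes a 32-bit ARM kernel.
slackarm_branch_for() {
    machine=${1:-$(uname -m)}
    case $machine in
        armv6*)        echo "14.2 (soft-float)" ;;   # BCM2835: RPi 1, Zero, Zero W
        armv7*|armv8*) echo "14.2 (soft-float) or -current (hard-float)" ;;
        *)             echo "unknown: $machine" ;;
    esac
}
```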

You can find some extra hints for your Slackware 14.2 ARM installation in this post:
https://www.linuxquestions.org/quest...9/#post5846280

I suggest getting the smaller Raspbian Buster Lite; you only need the kernel, modules, firmware and VideoCore (/opt/vc) libs:
https://www.raspberrypi.org/downloads/raspbian/

Grab the latest slack-14.2-miniroot*.xz from here:
ftp://ftp.slackware.uk/slackwarearm/...irootfs/roots/
- the root password is contained in the corresponding slack-14.2-miniroot_details.txt

You might want to add a swap partition (not covered in the Slackware doc):

Code:

# format the partition with mkswap, then add this line to /etc/fstab - replace X with the actual partition number
/dev/mmcblk0pX   swap             swap        defaults         0   0
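If you script the installation, the fstab step can be made safe to re-run. The function below is a hypothetical sketch: it appends the swap line in the same column layout as above, but only when no entry for that device exists yet. The `mkswap`/`swapon` step stays manual.

```shell
# Hypothetical helper: add a swap entry for DEVICE to an fstab file
# (default /etc/fstab) unless the device is already listed.
# Format and enable the partition first, e.g.:
#   mkswap /dev/mmcblk0pX && swapon /dev/mmcblk0pX
add_swap_entry() {
    dev=$1
    fstab=${2:-/etc/fstab}
    # awk exits 0 only if some line has the device as its first field.
    if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$fstab"; then
        echo "already present: $dev"
    else
        printf '%-16s %-16s %-11s %-16s 0   0\n' "$dev" swap swap defaults >> "$fstab"
        echo "added: $dev"
    fi
}
```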

- I'd recommend setting swappiness to 1; add the following to your /etc/rc.d/rc.S:

Code:

echo 1 > /proc/sys/vm/swappiness
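The same "don't add it twice" idea applies to the swappiness line. This is a hypothetical sketch that appends the line to rc.S (or another boot script given as the first argument) only if the exact line is not already there.

```shell
# Hypothetical helper: make sure a boot script sets vm.swappiness,
# without appending the same line twice.
ensure_swappiness() {
    rcfile=${1:-/etc/rc.d/rc.S}
    value=${2:-1}
    line="echo $value > /proc/sys/vm/swappiness"
    if grep -qF "$line" "$rcfile"; then
        echo "unchanged"
    else
        printf '%s\n' "$line" >> "$rcfile"
        echo "appended"
    fi
}
# To apply it right away as well: echo 1 > /proc/sys/vm/swappiness
```

Note that a pre-existing line with a different value is left in place; since both lines run in order at boot, the later one wins.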

From here you can download the Slackware ARM 14.2 packages:
ftp://ftp.slackware.uk/slackwarearm/...4.2/slackware/
- I usually install Slackware ARM from a USB flash drive containing the Slackware package tree, which I mount after booting the miniroot. This is how I download the tree:

Code:

rsync --exclude '*/source/*' --delete -Pavv ftp.slackware.uk::slackwarearm/slackwarearm-14.2/ .
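Once the USB drive is mounted in the miniroot, the packages still have to be installed series by series. The function below is a hypothetical dry-run sketch: it only prints `upgradepkg --install-new` commands (a standard Slackware pkgtools invocation) for the chosen series, so you can review the list before running it. The layout `TREE/slackware/<series>/*.txz` is an assumption based on the tree fetched above.

```shell
# Hypothetical helper: print (do not run) the commands that would install
# every package of the given series from a local Slackware ARM tree.
# Assumes the layout TREE/slackware/<series>/*.txz.
plan_install() {
    tree=$1; shift
    for series in "$@"; do
        for pkg in "$tree"/slackware/"$series"/*.txz; do
            [ -e "$pkg" ] || continue   # series missing or empty: skip it
            echo "upgradepkg --install-new $pkg"
        done
    done
}
```

For example, `plan_install /mnt/usb a ap d l n > plan.txt`, then review plan.txt and run it with `sh plan.txt`. Keep the mount point free of spaces, since the printed commands are not quoted.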

@gus3
According to my investigation and conclusions from here:
https://www.linuxquestions.org/quest...ml#post5753118
the softfp might create more overhead instead of performance improvements. Can you prove different?

gus3 · 09-13-2019, 08:15 PM

Quote:

Originally Posted by abga

According to my investigation and conclusions from here:
https://www.linuxquestions.org/quest...ml#post5753118
the softfp might create more overhead instead of performance improvements. Can you prove different?

Yes, I can prove different.

First, the kernel is completely unaffected by userspace's hard- or soft-float ABI. The kernel's only concern w.r.t. FP coprocessors is to save/restore the FPU state on task switch. The kernel does not use the FPU otherwise. If an ARM processor has no FPU, the kernel saves/restores nothing, but that's a decision made when the kernel is built, not at runtime.

As for userspace: Slackware ARM up to 14.1 was soft-float, putting all FP arguments on the program stack before calling a subroutine. The subroutine then extracted the FP arguments from the stack, did whatever calculations were necessary, then put a FP result (if any) onto the stack before returning. This is what happens when you pass "-mfloat-abi=soft" or "-mfloat-abi=softfp" as a gcc option. But that's where the similarity ends.

If the float-abi is "soft", gcc avoids all FPU co-processor instructions, generating calls to routines in glibc that emulate an FPU in pure ARM code instead. The infrastructure for all this is included in the glibc source code. So every FADDS, FADDD, FMULS, FMULD turns into "push, push, call, (fetch op1 from stack, fetch op2 from stack, emulate FPU op, store result in stack, return), pop result". That's a minimum of nine instructions per emulated FPU instruction, likely many more to carry out the actual emulation. A hypotenuse calculation gets fairly hairy, with two multiplications, an addition, and a square root.

If the float-abi is "softfp", glibc gets built with actual FPU instructions, obviating all that emulation infrastructure. FP values are still passed back and forth between routines via the stack, but internally to a routine, calculations are carried out without emulation. So instead of several thousand pure ARM instructions to calculate a hypotenuse, it's reduced to just a few: "push x, push y, call, (fetch op1, FMUL, fetch op2, FMUL, FADD, push sum-of-squares, call sqrt, (fetch square, ... Newton-Raphson algorithm here(*)... store root, return), fetch result, store as final result, return), pop hypotenuse". Note that FMUL and FADD correspond to the actual VFP/Neon co-processor instructions, not calls to emulation code.

It still conforms to the "soft" ABI, so it doesn't look any different to outside code. Internally, the "softfp" code uses far fewer instructions, and more silicon in parallel, to do the same work.

(*)Newton-Raphson, in an optimized form, will itself call an external log10() function a couple times per iteration... but remember, that routine also uses VFP/Neon directly, where possible! Otherwise, N-R uses multiplication, division, subtraction, and absolute values, all of which are supplied directly in VFP/Neon.

EDIT: I didn't have the ARM ARM available to look this up as I composed the above, but VFP and Neon do support square roots directly, both single- and double-precision. So there's no need to call a Newton-Raphson algorithm at all. This makes the hypotenuse function in softfp much more direct: fetch, FMUL, fetch, FMUL, FADD, FSQRT, store result, return. 7 simple steps, 10 ARM instructions. Even better than my sub-optimal concoction from last night.
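For reference, these are the GCC options that select each of the three float ABIs discussed here for the Pi Zero's ARMv6 core with its VFPv2 unit. The helper name is hypothetical. One detail worth keeping in mind: under the AAPCS, the soft and softfp ABIs pass FP arguments and results in core registers (r0-r3) first and only spill to the stack when those run out; the hard ABI uses the VFP registers instead.

```shell
# Hypothetical helper: print the GCC flags for a given float ABI on ARMv6 + VFP.
armv6_cflags() {
    case $1 in
        soft)   printf '%s\n' "-march=armv6 -mfloat-abi=soft" ;;              # no VFP instructions at all
        softfp) printf '%s\n' "-march=armv6 -mfpu=vfp -mfloat-abi=softfp" ;;  # VFP inside functions, soft calling convention
        hard)   printf '%s\n' "-march=armv6 -mfpu=vfp -mfloat-abi=hard" ;;    # FP arguments in VFP registers
        *)      echo "usage: armv6_cflags soft|softfp|hard" >&2; return 1 ;;
    esac
}
```

As an experiment, assuming an ARM cross-compiler such as `arm-linux-gnueabi-gcc` and a small test file with a hypotenuse function, `arm-linux-gnueabi-gcc $(armv6_cflags soft) -O2 -S hypot_test.c` should produce calls to libgcc helpers like `__aeabi_dmul`, while the softfp and hard variants should contain VFP instructions such as `vmul.f64` or `vmla.f64` and `vsqrt.f64`.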

abga · 09-14-2019, 12:03 AM

Quote:

Originally Posted by gus3

Yes, I can prove different.
...
If the float-abi is "softfp", glibc gets built with actual FPU instructions, obviating all that emulation infrastructure. FP values are still passed back and forth between routines via the stack, but internally to a routine, calculations are carried out without emulation. So instead of several thousand pure ARM instructions to calculate a hypotenuse, it's reduced to just a few: "push x, push y, call, (fetch op1, FMUL, fetch op2, FMUL, FADD, push sum-of-squares, call sqrt, (fetch square, ... Newton-Raphson algorithm here(*)... store root, return), fetch result, store as final result, return), pop hypotenuse". Note that FMUL and FADD correspond to the actual VFP/Neon co-processor instructions, not calls to emulation code.

Thank you for your very informative analysis. I really appreciate your effort & insights.
I wish we could have had this discussion in the appropriate thread:
https://www.linuxquestions.org/quest...987/page2.html
instead of polluting this Pi Zero W installation thread.

When I asked you for proof, I was expecting some empirical performance measurements, because in the same question I referenced an older post where I had come to this conclusion:

Quote:

softfp might create more overhead than simple soft and I'm still not sure why it was created in the first place

And it was based on this article (also available in the older post):
https://wiki.debian.org/ArmHardFloat...#A.22softfp.22
Stating:

Quote:

The caveat is that copying data from integer to floating point registers incurs a pipeline stall for each register passed (rN->fN) or a memory read for stack items. This has noticeable performance implications in that a lot of time is spent in function prologue and epilogue copying data back and forth to FPU registers. This could be 20 cycles or more.

followed by some more details and explanations.

Should you choose to reply, let's move to the appropriate thread I mentioned in the beginning. Thanks again for your explanations.

gus3 · 09-14-2019, 01:51 PM

Point taken. Yes, I did some empirical testing on the math routines involved, and yes, trig and log functions ran visibly faster with "softfp" than with just "soft", without having to re-compile the test harness. If the speed-up had been negligible, or even questionable, I wouldn't have posted the article. (I no longer have a 2835-based RPi running Slackware, so I can't give you the speed figures, sorry.)

However, compared to the latest link you provide, that's frankly apples vs. oranges. Gcc's '-mfloat-abi=hard' uses a totally different ABI, one that isn't compatible with "soft" or "softfp". And yes, it is the fastest running FP library, short of hand-coding it in assembly. The "hard" ABI passes FP values directly in the VFP/Neon registers, instead of through the stack. This means no (or at least *fewer*) memory accesses when calling a floating-point subroutine, and no memory access to return a bare FP value. That's also why it's incompatible with "soft" and "softfp".

abga · 09-14-2019, 05:15 PM

Thanks again for your inputs. The last link I provided, again:
https://wiki.debian.org/ArmHardFloat...#A.22softfp.22
contains the "softfp" section, and that's what I was pointing at (and quoted). softfp is the subject of our discussion, so it's not "apples vs. oranges".
I take your word for the performance improvements you measured with softfp, but I'm not really convinced. I fear that the overhead of "copying data from integer to floating point registers" could cancel out (or even outweigh) the gains from using the available HW VFP instructions.
I do still own a bunch of 2835 Pi Zeros but I gradually moved them all to an armv6 HardFloat distro after failing to get a toolchain for recompiling the whole Slackware:
https://www.linuxquestions.org/quest...v6-4175612701/
I use these boards solely for Kodi (multimedia), so I don't really need a full distro for that purpose; I now just maintain (compile and update) the applications I need to expose the system to the Internet (ssl, curl and co.). Between Slackware SoftFloat and the new HardFloat distro, I only noticed a slight performance improvement in Kodi's OSD display, which is snappier. I haven't noticed any other subjective performance improvements.

I tried to reference these last 3 posts and continue the discussion in the more appropriate slackware-arm-faq-soft-float-and-hard-float thread, but I couldn't: that thread looks locked. Weird!
https://www.linuxquestions.org/quest...987/page2.html

I'll stop here and apologize to leeeoooooo for the slightly off-topic discussion.

gus3 · 09-14-2019, 08:03 PM

Um, I'm still not making it clear, apparently.

Slackware ARM 14.1, as built, didn't use the HW VFP, since it was built with '-mfloat-abi=soft'. The parameter passing during function call/return already went through the stack. So the memory-access penalty for the function arguments is there, just as much as with 'softfp'. Add to that the much larger code base needed to provide FP emulation, and you're looking at a massive slow-down at the machine level.

The overhead of copying from integer to FP registers is more than offset by the speedup from VFP/Neon. That "overhead" is a pittance compared to FP emulation.

abga · 09-14-2019, 09:17 PM

Clear now!

leeeoooooo · 09-19-2019, 09:46 AM

Thanks so much for the helpful discussion.

I'm a -current guy, so I was really hoping the hard-float kernel would work for me on this little thing.

I haven't started my attempt yet but I'm convinced that I now have the best instructions and explanation of the issues involved.

I love Slackware, on top of everything else, for the really great support!

Thanks again!!

gus3 · 09-19-2019, 08:10 PM

If I'm understanding the timeline correctly, the last Slackware ARM that will run on an original RPi is 14.2. After that, Stuart Winter started using the following GCC flags for -current:

Code:

-march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard

The first two flags result in code that won't run on the BCM2708/2835, used in the original RPi as well as the RPi Zero.
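You can check this for any individual binary or library: `readelf -A` prints a `Tag_CPU_arch` build attribute, and anything tagged v7 or later will not run on the BCM2835's ARM1176JZF-S (an ARMv6 core). The helper below is a hypothetical sketch that reads the readelf output on stdin; the list of "ARMv6-safe" tags is deliberately conservative and anything unrecognized is reported as unknown.

```shell
# Hypothetical helper: given "readelf -A" output on stdin, say whether the
# binary's Tag_CPU_arch can run on an ARMv6 core such as the BCM2835's.
runs_on_armv6() {
    # Take the value after "Tag_CPU_arch:" (not Tag_CPU_arch_profile).
    arch=$(sed -n 's/^[[:space:]]*Tag_CPU_arch:[[:space:]]*//p' | head -n 1)
    case $arch in
        v4*|v5*|v6|v6K|v6KZ) echo yes ;;
        v7*|v8*)             echo no ;;
        *)                   echo unknown ;;
    esac
}
```

For example, `readelf -A /bin/bash | runs_on_armv6` on a -current system would be expected to print `no`.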

drmozes · 09-20-2019, 07:07 AM

Quote:

Originally Posted by gus3

If I'm understanding the timeline correctly, the last Slackware ARM that will run on an original RPi is 14.2. After that, Stuart Winter started using the following GCC flags for -current:

Code:

-march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard

The first two flags result in code that won't run on the BCM2708/2835, used in the original RPi as well as the RPi Zero.

That's right. I think that -current (to be 15.0) will only run on the RPi2 and upwards.