Opened 3 weeks ago

Closed 8 days ago

Last modified 7 days ago

#70859 closed defect (invalid)

gmp @6.3.0: tests fail when built with clang on Intel only, but pass when assembly is disabled. Forcing ld_classic appears to fix the issue.

Reported by: haberg-1 (Hans Åberg) Owned by: MarcusCalhoun-Lopez (Marcus Calhoun-Lopez)
Priority: Normal Milestone:
Component: ports Version:
Keywords: ventura Cc: eric-j-ason, cooljeanius (Eric Gallager), cjones051073 (Chris Jones), markmentovai (Mark Mentovai)
Port: gmp

Description (last modified by ryandesign (Ryan Carsten Schmidt))

GMP 'make check' fails when built with later versions of Clang, but passes when built with GCC, so that should be the build dependency. Tried on MacOS 14. See:
https://gmplib.org/list-archives/gmp-bugs/2024-June/005505.html
https://gmplib.org/list-archives/gmp-bugs/2024-July/005506.html

Attachments (1)

gmp-6.3.0_libtool-2.4.7.patch (151.2 KB) - added by markmentovai (Mark Mentovai) 9 days ago.


Change History (68)

comment:1 Changed 3 weeks ago by kencu (Ken)

If you download the current gmp software from here:

https://gmplib.org/download/gmp/gmp-6.3.0.tar.xz

decompress it and just build it on a current arm64 Mac system, outside of MacPorts, just with standard tools (Xcode and CLT installed, clang and standard tools used) you get a 100% pass:

========================================
   GNU MP 6.3.0: tests/test-suite.log
========================================

# TOTAL: 8
# PASS:  8
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

I realize, looking at the above message and <https://gmplib.org/list-archives/gmp-bugs/2024-June/005494.html>, there is an alignment error with the x86_64 assembly code that now flags on MacOS as recent MacOS versions are intolerant of unaligned pointers. We have several MacPorts tickets about software trying to force unaligned pointers and Xcode clang / linker rejecting that.

The unaligned pointer issue seemed to be in one assembly file bdiv_q_1.asm. I noticed that file was forcing a low alignment of 8:

https://github.com/gmp-mirror/gmp/blob/14fe69d7f56e00917e9fd9ab616afc798a1af6c1/mpn/x86_64/bdiv_q_1.asm#L137

I wondered if that might be the problem. I haven't as yet tried to fix it though.

So it might be premature to say clang is broken here.

comment:2 Changed 3 weeks ago by kencu (Ken)

there is a newer version of gmp here, not yet released:

https://gmplib.org/download/snapshot/gmp-next/gmp-6.3.0-20240515185115.tar.zst

The arm64 build test summary looks good there too (I included the cxx tests in this test run):

============================================================================
Testsuite summary for GNU MP 6.3.0
============================================================================
# TOTAL: 22
# PASS:  22
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0

I'll try the x86_64 run next.

comment:3 Changed 3 weeks ago by kencu (Ken)

you can force an arm64 Mac into Intel mode like this:

gmp-6.3.0 % arch -arch x86_64 zsh

and from then on, the system thinks it's running on an Intel processor:

% ./configure  --enable-cxx  && make -j 10        
checking build system type... westmere-apple-darwin24.0.0
checking host system type... westmere-apple-darwin24.0.0
checking for a BSD-compatible install... /usr/bin/install -c
...

as per the ticket, there are certainly plenty of errors with the x86_64 build. This was the last bit that printed:

============================================================================
Testsuite summary for GNU MP 6.3.0
============================================================================
# TOTAL: 53
# PASS:  20
# SKIP:  1
# XFAIL: 0
# FAIL:  32
# XPASS: 0
# ERROR: 0
============================================================================
See tests/mpn/test-suite.log
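
With this many failures, it can help to pull just the failing test names out of the console output. A minimal sketch, assuming the usual automake console format where each result is printed as a line such as "FAIL: t-toom22"; the function name and the check.log file name are made up for illustration:

```shell
# Print the names of failing tests from automake-style "make check" output.
# Reads the log on stdin; only lines that start with "FAIL: " are reported,
# so summary lines such as "# FAIL:  32" are ignored.
list_failed_tests() {
  sed -n 's/^FAIL: //p'
}

# Hypothetical usage, assuming the output was saved to check.log:
#   make check 2>&1 | tee check.log
#   list_failed_tests < check.log
```

Comparing that list between two builds (for example with and without --disable-assembly) shows quickly whether the same tests are affected.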

disabling assembly on Intel fixes all of the errors (this was suggested by a gmp developer but I don't see that anyone ever tested it):

% ./configure  --enable-cxx  --disable-assembly && make -j 10
============================================================================
Testsuite summary for GNU MP 6.3.0
============================================================================
# TOTAL: 22
# PASS:  22
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0

So the error is in the way the handwritten gmp assembly files are being handled / compiled by clang. I suspect it is related to the ALIGN(8) above, so I'll see if changing that makes any difference. Here, we might be in tricky territory, however.

I haven't tried a gcc build as yet.

comment:4 Changed 3 weeks ago by kencu (Ken)

I tried changing the ALIGN(8) to ALIGN(16) in two places in mpn/x86_64/bdiv_q_1.asm and that did nothing to improve the test errors.

There are a lot of assembly files, it turns out, that set ALIGN(8) -- perhaps others need fixing, or perhaps we are barking up the wrong tree altogether.

More to sort out.

You can't really set gmp to build with gcc14, as gmp is a build dependency of gcc14. <https://ports.macports.org/port/gcc14/details/>. So really, this has to be properly sorted out by upstream gmp, and just saying "clang is hosed" or "Apple is silly" is not really going to be the solution.

For now -- I would personally lean towards disabling assembly for Intel builds of gmp until they sort it out.

MacPorts has lots of smart people around -- perhaps someone knows enough x86_64 assembly to see what gmp is doing to make these errors happen.

comment:5 Changed 3 weeks ago by kencu (Ken)

Now unfortunately it is currently difficult to build with gcc in MacPorts (or Homebrew, I believe) targeting a non-native arch. The gcc compilers installed by either package manager only compile for the native arch, and current gcc cross-compilers have not been properly set up.

So there is no way for me to build gmp using gcc on this arm64 Mac as an Intel build to test it.

I would have to build a custom intel cross compiling gcc to do that -- which is actually not all that hard, just have to do it manually.

Let me see if any of my Intel machines can run a new enough MacOS to demonstrate the problem. They are generally running OpenCore now, but even with that, I don't think I can get to a new enough OS to show this issue.

comment:6 Changed 3 weeks ago by ryandesign (Ryan Carsten Schmidt)

Description: modified (diff)
Owner: set to MarcusCalhoun-Lopez
Status: new → assigned
Summary: GMP build with GCC, not Clang → gmp @6.3.0: tests fail when built with clang

comment:7 Changed 3 weeks ago by kencu (Ken)

Summary: gmp @6.3.0: tests fail when built with clang → gmp @6.3.0: tests fail when built with clang on Intel only

comment:8 Changed 3 weeks ago by kencu (Ken)

Summary: gmp @6.3.0: tests fail when built with clang on Intel only → gmp @6.3.0: tests fail when built with clang on Intel only, but pass when assembly is disabled

comment:9 Changed 3 weeks ago by kencu (Ken)

As a data point, I tried a quick test on 10.6 Intel using clang-15 to build gmp, including the assembly files, and it was a 100% pass there.

============================================================================
Testsuite summary for GNU MP 6.3.0
============================================================================
# TOTAL: 22
# PASS:  22
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0

comment:10 Changed 2 weeks ago by haberg-1 (Hans Åberg)

You can set gmp to depend on gcc14 which now installs; cf. #70866: The GMP developers primarily focus on GCC, so it is safest to have as a dependency.

comment:11 Changed 2 weeks ago by eric-j-ason

Cc: eric-j-ason added

comment:12 Changed 2 weeks ago by kencu (Ken)

as above, #70859#comment:4 gmp is a build dep of gcc-14, so as it stands now, you can’t do that.

Last edited 2 weeks ago by kencu (Ken) (previous) (diff)

comment:13 Changed 2 weeks ago by kencu (Ken)

the handwritten intel assembly files in gmp need to be updated.

comment:14 Changed 2 weeks ago by haberg-1 (Hans Åberg)

Such recursive dependencies may require recursive builds (I have done such by hand). The GMP developers do not have much incentive to do Clang workarounds, so don't expect a fix to the assembly code anytime soon.

It might be possible to build GCC against a build-only GMP, itself built using Clang with the assembly code turned off (assuming 'make check' passes), and then build the public GMP with GCC and assembly turned on. The assembly is only there to boost performance on certain platforms, so it should make no difference other than that the GCC build becomes somewhat slower.

comment:15 Changed 2 weeks ago by kencu (Ken)

they will have to fix gmp as the primary xcode compiler needs to be able to build it. someone who has a current MacOS system and knows how to debug needs to help them.

we can’t build it with gcc for many reasons. Deps is one. No universal gcc builds is another.

gmp is a core project…it won’t take long once someone skilled gets involved, which is likely to be soon.

until then, simply disable asm on Intel. current compilers optimize so well the handwritten assembly most likely adds little anyway.

comment:16 Changed 2 weeks ago by haberg-1 (Hans Åberg)

Fixing GMP for regular Clang is difficult and time-consuming, which is why they do not do it, and it gets worse with Apple Clang, a hacked and old version, which is entirely off the list: GCC is the primary compiler for GNU projects; everything else is an extra.

A public version of GMP with assembly turned off means that one cannot recommend using it, as it is used in numerics requiring high performance.

comment:17 in reply to:  1 Changed 11 days ago by eric-j-ason

Replying to kencu:

I realize, looking at the above message and <https://gmplib.org/list-archives/gmp-bugs/2024-June/005494.html>, there is an alignment error with the x86_64 assembly code that now flags on MacOS as recent MacOS versions are intolerant of unaligned pointers.

Would the code even work with such an error present?

comment:18 Changed 11 days ago by haberg-1 (Hans Åberg)

It may be a Clang issue, its developers ditching the C/C++ standards in favor of promoting mainstream programming, whereas the GMP developers are focused on optimizations that do not fall into that picture.

You might take up the issue on the GMP bugs list: https://gmplib.org/mailman/listinfo/gmp-bugs

comment:19 Changed 11 days ago by haberg-1 (Hans Åberg)

I am checking on the GCC Help list if they have some suggestions: https://gcc.gnu.org/pipermail/gcc-help/2024-September/143751.html

comment:20 Changed 11 days ago by haberg-1 (Hans Åberg)

There is a download_prerequisites script, that will build and link GMP statically as a part of the GCC build: https://gcc.gnu.org/wiki/InstallingGCC

Also see: https://gcc.gnu.org/pipermail/gcc-help/2024-September/143754.html

comment:21 Changed 11 days ago by cooljeanius (Eric Gallager)

Cc: cooljeanius added

comment:22 Changed 10 days ago by kencu (Ken)

Let me see if I can update one of my Intel machines to a version that shows the error. If I can, perhaps we can sort out what the issue is with their assembly files.

In the modern world of compiler optimizations, it is not a given that their assembly is actually any faster than the optimized fallback code. You’d have to benchmark it to know for sure.

comment:23 Changed 10 days ago by kencu (Ken)

macports already knows how to make a gcc with an embedded gmp, by the way.

We just don’t want to do that for the mainstream gcc ports.

And building the gmp port with gcc would mean it can’t be universal, which we don’t want either.

best to sort this out properly, as the whole world needs gmp to build properly with the primary macos compiler, not just macports.

comment:24 Changed 10 days ago by kencu (Ken)

in the meantime, disabling asm on Intel builds for the newest systems is a pretty trivial fix.

I know what you mean about a possible speed issue..

it looks like you can benchmark gmp like this:

https://gmplib.org/gmpbench

would you like to do that with asm (gcc built) and without asm (clang built) and compare them so we can see what the cost really is?

Last edited 10 days ago by kencu (Ken) (previous) (diff)

comment:25 Changed 10 days ago by kencu (Ken)

just as another data point, the gcc10-bootstrap port does not build on arm64 Sequoia. So that pathway is unavailable.

comment:26 Changed 10 days ago by kencu (Ken)

MacOS 12.7 Intel with Xcode clang (1400.x) and asm enabled passes all tests.

 % clang -v
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: x86_64-apple-darwin21.6.0
Last edited 10 days ago by kencu (Ken) (previous) (diff)

comment:27 Changed 10 days ago by kencu (Ken)

on MacOS 12.7 Intel, building with clang-18 and asm enabled passes all tests.

comment:28 Changed 10 days ago by kencu (Ken)

(aside for later: might be an idea to try the classic linker on Sequoia.)

comment:29 Changed 10 days ago by kencu (Ken)

it looks like using the classic linker might be the fix, for now.

If we do this:

export LDFLAGS='-Wl,-ld_classic'

then I can build and run gmp on a current system in Intel mode, with all the assembly enabled, and it passes all the tests.

Why this is the fix is a question to be answered later. I still think it will turn out to be an alignment thing, but smarter people than me can sort it out.

Please give that a try on your machine as well. It's pretty easy to add that flag to the gmp build.

This is a much better fix than trying to build it with gcc.

(NB. Up until 3 days ago, gcc14 was configured to always use ld_classic -- so it may be that the reason gcc worked was because it was using ld_classic).

Last edited 10 days ago by kencu (Ken) (previous) (diff)

comment:30 Changed 10 days ago by kencu (Ken)

Summary: gmp @6.3.0: tests fail when built with clang on Intel only, but pass when assembly is disabled → gmp @6.3.0: tests fail when built with clang on Intel only, but pass when assembly is disabled. Forcing ld_classic appears to fix the issue.

comment:31 Changed 10 days ago by kencu (Ken)

I have been trying to show that gcc builds gmp incorrectly when using the standard linker too, just like clang does...

but no matter what I do, I can't seem to force gcc14 to use the standard linker. It always uses ld_classic, eg:

libtool: link: gcc-mp-14 -O2 -pedantic -fomit-frame-pointer -m64 -mtune=nehalem -march=nehalem -o .libs/t-toom6h t-toom6h.o  ../../tests/.libs/libtests.a /Users/cunningh/gmp-6.3.0/.libs/libgmp.dylib ../../.libs/libgmp.dylib
ld: warning: -ld_classic is deprecated and will be removed in a future release

I thought gcc had been changed recently to no longer use ld_classic -- that is what the commits said

https://github.com/macports/macports-ports/commits/master/lang/gcc14/Portfile

but there have been some fixups and refixups in gcc and I'm not sure just now what it is doing.

what I do know is that these gcc14 versions:

% port -v installed gcc14 libgcc14
The following ports are currently installed:
  gcc14 @14.2.0_3+stdlib_flag (active) requested_variants='' platform='darwin 23' archs='x86_64' date='2024-09-30T22:25:34-0700'
  libgcc14 @14.2.0_3+stdlib_flag (active) requested_variants='' platform='darwin 23' archs='x86_64' date='2024-09-30T22:25:29-0700'

which are current, are using ld_classic as above.

comment:32 Changed 10 days ago by kencu (Ken)

Oh, I bet the CLTs and Xcode on the buildbots haven't been updated to 16 yet.

That's why we're still using ld_classic.

Let me build gcc from source.

comment:33 Changed 10 days ago by haberg-1 (Hans Åberg)

It may be the removal from GCC of the ld option -ld_classic that causes the 'make check' fails, as this option is deprecated on MacOS 15. See: https://trac.macports.org/ticket/70951

With 'make check' tests on gmp-6.3.0 using both port clang-18 and gcc14, on MacOS 15, the fails are the same.

I have reported this on the GMP Bugs list: https://gmplib.org/list-archives/gmp-bugs/2024-October/005537.html

comment:34 Changed 10 days ago by cjones051073 (Chris Jones)

The changes here

https://github.com/macports/macports-ports/commit/2453011ee18c25153b716a2ae42bed85ed52752a

only remove the explicit reference to the classic linker option in the MacPorts port for Xcode 16 or newer. Internally, GCC still knows about the option and uses it, and removing that will require an upstream GCC fix.

Last edited 10 days ago by cjones051073 (Chris Jones) (previous) (diff)

comment:35 Changed 10 days ago by kencu (Ken)

OK, confirmed. This issue has absolutely nothing to do with building with either gcc or clang, and is 100% related to ld_classic vs ld_prime.

If gcc14 is built to use the new linker, and gmp is built with gcc14, then gmp fails every bit as badly as it does when built with clang and the new linker:

/bin/sh ../../libtool  --tag=CC   --mode=link gcc-mp-14  -O2 -pedantic -fomit-frame-pointer -m64 -mtune=nehalem -march=nehalem -no-install  -o t-gcdext_1 t-gcdext_1.o ../../tests/libtests.la ../../libgmp.la 
libtool: warning: '-no-install' is ignored for westmere-apple-darwin23.6.0
libtool: warning: assuming '-no-fast-install' instead
libtool: link: gcc-mp-14 -O2 -pedantic -fomit-frame-pointer -m64 -mtune=nehalem -march=nehalem -o .libs/t-gcdext_1 t-gcdext_1.o  ../../tests/.libs/libtests.a /Users/cunningh/gmp-6.3.0/.libs/libgmp.dylib ../../.libs/libgmp.dylib
/Applications/Xcode.app/Contents/Developer/usr/bin/make  check-TESTS
PASS: t-asmtype
PASS: t-aors_1
../../test-driver: line 107: 36828 Segmentation fault: 11  "$@" > $log_file 2>&1
FAIL: t-divrem_1
PASS: t-mod_1
../../test-driver: line 107: 36866 Segmentation fault: 11  "$@" > $log_file 2>&1
FAIL: t-fat
PASS: t-get_d
PASS: t-instrument
PASS: t-iord_u
PASS: t-mp_bases
PASS: t-perfsqr
PASS: t-scan
PASS: logic
../../test-driver: line 107: 37022 Illegal instruction: 4  "$@" > $log_file 2>&1
FAIL: t-toom22
../../test-driver: line 107: 37041 Illegal instruction: 4  "$@" > $log_file 2>&1
FAIL: t-toom32
../../test-driver: line 107: 37060 Segmentation fault: 11  "$@" > $log_file 2>&1
FAIL: t-toom33
../../test-driver: line 107: 37079 Segmentation fault: 11  "$@" > $log_file 2>&1
FAIL: t-toom42
../../test-driver: line 107: 37098 Illegal instruction: 4  "$@" > $log_file 2>&1
FAIL: t-toom43
../../test-driver: line 107: 37117 Illegal instruction: 4  "$@" > $log_file 2>&1
FAIL: t-toom44
../../test-driver: line 107: 37136 Segmentation fault: 11  "$@" > $log_file 2>&1
FAIL: t-toom52
../../test-driver: line 107: 37155 Illegal instruction: 4  "$@" > $log_file 2>&1
FAIL: t-toom53
../../test-driver: line 107: 37174 Illegal instruction: 4  "$@" > $log_file 2>&1
FAIL: t-toom54
../../test-driver: line 107: 37193 Illegal instruction: 4  "$@" > $log_file 2>&1
FAIL: t-toom62
../../test-driver: line 107: 37212 Illegal instruction: 4  "$@" > $log_file 2>&1
FAIL: t-toom63
../../test-driver: line 107: 37231 Segmentation fault: 11  "$@" > $log_file 2>&1
FAIL: t-toom6h
../../test-driver: line 107: 37250 Illegal instruction: 4  "$@" > $log_file 2>&1
FAIL: t-toom8h
PASS: t-toom2-sqr
PASS: t-toom3-sqr
PASS: t-toom4-sqr
../../test-driver: line 107: 37326 Segmentation fault: 11  "$@" > $log_file 2>&1
FAIL: t-toom6-sqr
../../test-driver: line 107: 37345 Segmentation fault: 11  "$@" > $log_file 2>&1
FAIL: t-toom8-sqr
../../test-driver: line 107: 37364 Illegal instruction: 4  "$@" > $log_file 2>&1
FAIL: t-div
../../test-driver: line 107: 37383 Segmentation fault: 11  "$@" > $log_file 2>&1
FAIL: t-mul
../../test-driver: line 107: 37402 Segmentation fault: 11  "$@" > $log_file 2>&1
FAIL: t-mullo
../../test-driver: line 107: 37421 Segmentation fault: 11  "$@" > $log_file 2>&1
FAIL: t-sqrlo
../../test-driver: line 107: 37440 Trace/BPT trap: 5       "$@" > $log_file 2>&1
FAIL: t-mulmod_bnm1
../../test-driver: line 107: 37459 Trace/BPT trap: 5       "$@" > $log_file 2>&1
FAIL: t-sqrmod_bnm1
PASS: t-mulmid
../../test-driver: line 107: 37497 Segmentation fault: 11  "$@" > $log_file 2>&1
FAIL: t-mulmod_bknp1
../../test-driver: line 107: 37516 Segmentation fault: 11  "$@" > $log_file 2>&1
FAIL: t-sqrmod_bknp1
SKIP: t-addaddmul
../../test-driver: line 107: 37554 Illegal instruction: 4  "$@" > $log_file 2>&1
FAIL: t-hgcd
../../test-driver: line 107: 37573 Illegal instruction: 4  "$@" > $log_file 2>&1
FAIL: t-hgcd_appr
../../test-driver: line 107: 37592 Segmentation fault: 11  "$@" > $log_file 2>&1
FAIL: t-matrix22
../../test-driver: line 107: 37611 Trace/BPT trap: 5       "$@" > $log_file 2>&1
FAIL: t-invert
../../test-driver: line 107: 37630 Illegal instruction: 4  "$@" > $log_file 2>&1
FAIL: t-bdiv
../../test-driver: line 107: 37649 Illegal instruction: 4  "$@" > $log_file 2>&1
FAIL: t-fib2m
PASS: t-broot
PASS: t-brootinv
PASS: t-minvert
../../test-driver: line 107: 37725 Segmentation fault: 11  "$@" > $log_file 2>&1
FAIL: t-sizeinbase
PASS: t-gcd_11
PASS: t-gcd_22
PASS: t-gcdext_1
============================================================================
Testsuite summary for GNU MP 6.3.0
============================================================================
# TOTAL: 53
# PASS:  20
# SKIP:  1
# XFAIL: 0
# FAIL:  32
# XPASS: 0
# ERROR: 0
============================================================================
See tests/mpn/test-suite.log
Please report to gmp-bugs@gmplib.org (see https://gmplib.org/manual/Reporting-Bugs.html)
============================================================================
make[5]: *** [test-suite.log] Error 1
make[4]: *** [check-TESTS] Error 2
make[3]: *** [check-am] Error 2
make[2]: *** [check-recursive] Error 1
make[1]: *** [check-recursive] Error 1
make: *** [check] Error 2
cunningh@macpro gmp-6.3.0 % 
Last edited 10 days ago by kencu (Ken) (previous) (diff)

comment:36 Changed 10 days ago by cjones051073 (Chris Jones)

Cc: cjones051073 added

comment:37 Changed 10 days ago by kencu (Ken)

Cc: cjones051073 removed

This of course is a well known issue:

https://www.scivision.dev/xcode-ld_classic/

So the fix for gmp is to use ld_classic, whatever the compiler, and then the Intel assembly will be properly done.

We can stop talking about forcing builds with gcc.

I will push this through.

comment:38 Changed 10 days ago by kencu (Ken)

Cc: cjones051073 added

comment:39 in reply to:  34 Changed 10 days ago by kencu (Ken)

Replying to cjones051073:

Internally, GCC still knows about the option and uses it, and removing that will require an upstream GCC fix.

A change we certainly hope nobody suggests to upstream or tries to implement any time soon, until all these projects get sorted out with the new linker.

comment:40 in reply to:  33 ; Changed 10 days ago by kencu (Ken)

Replying to haberg-1:

It may be the removal from GCC of the ld option -ld_classic that causes the 'make check' fails,

No, it's not that.

The new linker is doing something different with the Intel assembly than ld_classic did, and that is what is making "make check" fail.

Edit: Sorry, I misunderstood you. Yes, it is exactly that gcc14 now doesn't use ld_classic that is now causing "make check" to fail when gmp is built with gcc14. Exactly that.

Last edited 10 days ago by kencu (Ken) (previous) (diff)

comment:41 Changed 10 days ago by cjones051073 (Chris Jones)

My point was more that when Apple completely removes the classic option (and now that they have officially deprecated it, I would say it's on the cards from Xcode 17 onwards), GCC will have to adapt at that point. The sooner they start planning for this, the better.

comment:42 in reply to:  40 Changed 10 days ago by cjones051073 (Chris Jones)

Replying to kencu:

Replying to haberg-1:

It may be the removal from GCC of the ld option -ld_classic that causes the 'make check' fails,

No, it's not that.

The new linker is doing something different with the Intel assembly than ld_classic did, and that is what is making "make check" fail.

Edit: Sorry, I misunderstood you. Yes, it is exactly that gcc14 now doesn't use ld_classic that is now causing "make check" to fail when gmp is built with gcc14. Exactly that.

Perhaps turning it off for Xcode 16 already is a bit early. I could be persuaded to roll back a bit the change in GCC14 and only limit it for Xcode 17 or newer.

comment:43 Changed 10 days ago by kencu (Ken)

indeed -- hopefully folks like us can get gmp and similar projects up to speed.

Unfortunately right now gmp etc consider this everyone else's problem to fix -- and maybe it is. I don't know exactly what is causing the new linker to generate these errors, or whose error it is to fix.

comment:44 Changed 10 days ago by cjones051073 (Chris Jones)

... But then we would be back to having to live with the linker warning, which causes issues in itself in some cases.

comment:45 Changed 10 days ago by kencu (Ken)

Indeed so -- we need someone like Jeremy around here with close ties to Apple to sort out whether this is a linker bug or a gmp bug...

comment:46 Changed 10 days ago by kencu (Ken)

I don't even know where to usefully report this. Opening RADARs is always such a black hole, it seems...

comment:47 Changed 10 days ago by cjones051073 (Chris Jones)

Jeremy has sadly been MIA for awhile now.

comment:48 Changed 10 days ago by cjones051073 (Chris Jones)

I can almost guarantee you filing an Apple radar about a linker issue specific to GMP (GPL-3 code) will go absolutely nowhere.

comment:49 Changed 10 days ago by cjones051073 (Chris Jones)

By the way, isn't blacklisting Xcode clang 16 and above, and just falling back to a MacPorts clang build (18, say), also a viable workaround for now?

comment:50 Changed 10 days ago by kencu (Ken)

I think it's not the compiler at all.

Just the linker that is chosen, whatever the compiler might be.

(The asm files are just assembled anyway, and passed to the linker, and the compiler is doing very little here).

comment:51 Changed 10 days ago by cjones051073 (Chris Jones)

True, but if, by blacklisting Xcode clang and using a MacPorts version, you also sidestep the Xcode-provided linker and use the one for that clang build, you bypass the issue. Did you not say above that GMP builds fine if you use clang-18? What linker is actually used in that case?

comment:52 Changed 9 days ago by kencu (Ken)

You are correct that a linker is installed with all our recent clang ports. However, it is not used, by default at least. I believe that even on the most recent llvm builds that linker is not considered "ready for prime time". I believe the new xcode linker (which is causing our troubles here) is a fork of that project, however.

The linker used by default by macports-clang versions continues to be the one pointed to by the ld64 port with its shim in ${prefix}/bin/ld, and most often that points to ld64 +xcode to pick up the xcode-supplied linker.

I did mention that on MacOS 12.7 gmp built without troubles using clang-18 to build it. MacOS 12.7 uses the xcode linker, which at that stage of things is equivalent to "ld_classic". I was using that example more to support the idea that it wasn't something with newer clangs that was causing the build errors, it was something else (like the linker).

comment:53 Changed 9 days ago by markmentovai (Mark Mentovai)

Cc: markmentovai added

comment:54 Changed 9 days ago by markmentovai (Mark Mentovai)

(MacPorts-specific: This is a message that I’m trying to post to gmp-bugs@…, but it hasn’t landed there yet. There is nothing wrong with any compiler, either in Xcode or MacPorts. There is a bug in Apple’s new linker and it can occur using any compiler, but it’s not a bug that gmp needs to suffer, and it’s possible to avoid the bug without opting for the deprecated linker.)

If you read nothing else, read this:

gmp-6.3.0 ships libtool-2.4.6 (2015-02-16). Update to libtool-2.4.7 (2022-03-17) to solve this problem.

Details:

There does appear to be a bug in Apple’s new linker (ld-new or ld-prime) when targeting x86_64, producing a Mach-O dynamic library (clang -dynamiclib), and using the flat namespace option (-flat_namespace). I observed this as a variety of crashes in make check. I investigated t-bdiv raising SIGILL in particular:

% lldb tests/mpn/.libs/t-bdiv
(lldb) target create "tests/mpn/.libs/t-bdiv"
Current executable set to '…/gmp-6.3.0.build/tests/mpn/.libs/t-bdiv' (x86_64).
(lldb) env DYLD_LIBRARY_PATH=.libs
(lldb) run
Process 19802 launched: '…/gmp-6.3.0.build/tests/mpn/.libs/t-bdiv' (x86_64)
Process 19802 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x00000001000de806 libgmp.10.dylib`__gmpn_sub_n + 3
Target 0: (t-bdiv) stopped.
(lldb) disassemble
libgmp.10.dylib`:
    0x1000de803 <+0>: jmpq   *0x11ecf(%rip)            ; (void *)0x00000001000a1a00: __gmpn_sub_n
(lldb) disassemble -s 0x1000de803 -e 0x1000de80f
libgmp.10.dylib`:
    0x1000de803 <+0>: jmpq   *0x11ecf(%rip)            ; (void *)0x00000001000a1a00: __gmpn_sub_n

libgmp.10.dylib`:
    0x1000de809 <+0>: jmpq   *0x11ed1(%rip)            ; (void *)0x00000001000a1aae: __gmpn_sub_nc

With the fault address at 0x1000de806 falling partway through the instruction at 0x1000de803, this certainly would be a bad instruction. This code was assembled from https://gmplib.org/repo/gmp-6.3/file/62abbaeaab13/mpn/x86_64/core2/aors_n.asm, which at the bottom of the file has __gmpn_sub_nc jumping to within (but not the beginning of) __gmpn_sub_n. Duplicating that structure in a reduced testcase:

% cat ts_x86-64.s    
.text
.globl _F
.p2align 4, 0x90
_F:
  movl $1, %eax
Lcommon:
  shll %eax
  retq

.globl _G
.p2align 4, 0x90
_G:
  movl $2, %eax
  jmp Lcommon
% cat tc.c
int F();
int G();

int main(int argc, char* argv[]) {
  return G();
}

The problem is easily reproduced:

% clang -dynamiclib -flat_namespace -o libt.dylib ts_x86-64.s
% clang -o t tc.c libt.dylib
% ./t
zsh: segmentation fault  ./t

This dylib is small enough to observe what’s going on inside directly:

% objdump -d libt.dylib  

libt.dylib: file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000000000f80 <_F>:
     f80: b8 01 00 00 00               movl $1, %eax
     f85: d1 e0                         shll %eax
     f87: c3                           retq
     f88: 0f 1f 84 00 00 00 00 00       nopl (%rax,%rax)

0000000000000f90 <_G>:
     f90: b8 02 00 00 00               movl $2, %eax
     f95: e9 05 00 00 00               jmp 0xf9f

Disassembly of section __TEXT,__stubs:

0000000000000f9a <__stubs>:
     f9a: ff 25 60 00 00 00             jmpq *96(%rip)               ## 0x1000

The jump at 0xf95 is bad: 0xf9f is a bad jump target. As before, that address lies within another instruction (in this case, the last byte of the instruction at 0xf9a). In fact, that’s the very last byte of the section:

% otool -l libt.dylib
[…]
Section
  sectname __stubs
   segname __TEXT
      addr 0x0000000000000f9a
      size 0x0000000000000006
[…]
Section
  sectname __unwind_info
   segname __TEXT
      addr 0x0000000000000fa0
      size 0x0000000000000058
[…]

The jump at 0xf95 should target 0xf85, or _F + 0x5 (the Lcommon label). For some reason, the linker created a stub for this jump (which itself shouldn’t be necessary) and then, instead of arranging for the stub to resolve and jump to _F + 0x5, jumped to offset 0x5 within the stub.
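
The address arithmetic can be checked by hand. A small sketch using only the numbers from the objdump and otool output above; the variable names are made up for illustration. It relies on the x86-64 rule that a `jmp rel32` target is the address of the next instruction plus the signed 32-bit displacement:

```shell
# Addresses taken from the objdump / otool output of libt.dylib.
jmp_addr=0xf95    # address of the 5-byte "jmp rel32" (e9 05 00 00 00) in _G
jmp_len=5         # one opcode byte plus a 4-byte displacement
disp=0x5          # displacement the linker actually encoded
lcommon=0xf85     # intended target: Lcommon, five bytes into _F at 0xf80
stubs_addr=0xf9a  # start of the __stubs section
stubs_size=0x6    # size of the __stubs section

next_ip=$(( $jmp_addr + $jmp_len ))               # address after the jmp
actual_target=$(( $next_ip + $disp ))             # where the jmp really goes
stubs_last=$(( $stubs_addr + $stubs_size - 1 ))   # last byte of __stubs
wanted_disp=$(( $lcommon - $next_ip ))            # displacement needed for Lcommon

printf 'actual target:  0x%x\n' "$actual_target"
printf 'last stub byte: 0x%x\n' "$stubs_last"
printf 'wanted disp:    %d (0x%x as rel32)\n' "$wanted_disp" $(( $wanted_disp & 0xffffffff ))
```

The first two values come out equal (0xf9f), which is the "offset 0x5 within the stub" case, while a correct link would have needed a negative displacement back into _F.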

This is a clear bug in the linker, and I’ll report it to Apple, but don’t know that anyone could expect much traction.

That doesn’t need to be the end of the story. There’s another concern here: this bug only occurs with -flat_namespace. gmp shouldn’t need -flat_namespace, and in fact it’s undesirable to enable it. It’s coming into this build from configure, via aclocal.m4, having been included from libtool.m4. In libtool-2.4.6, which gmp-6.3.0 is using, that’s https://git.savannah.gnu.org/cgit/libtool.git/tree/m4/libtool.m4?h=v2.4.6#n1070. In particular, it intends to enable -flat_namespace only on very early Mac OS X versions (pre-10.4, in the PowerPC-only era). But the case that we’d like to hit, assuming MACOSX_DEPLOYMENT_TARGET is unset (as it normally would be), doesn’t match $host on a modern macOS system, because the Darwin version has marched past 20, while the pattern only contemplates versions up to 19.
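
To make the pattern problem concrete, here is a simplified reconstruction of that check as a shell function. It is a sketch from my reading of the linked libtool.m4, not a verbatim copy: the function name is made up, the `$wl` prefixes are dropped, an empty first argument stands in for an unset MACOSX_DEPLOYMENT_TARGET, and the darwin19 host triple is a hypothetical example (the darwin24 one is from the configure output earlier in this ticket).

```shell
# Simplified reconstruction of the libtool-2.4.6 Darwin check (not verbatim).
# $1: MACOSX_DEPLOYMENT_TARGET (empty means unset), $2: host triple.
# Prints the "allow undefined" linker flags the check would select, if any.
lt246_allow_undefined() {
  case "${1:-10.0},$2" in
    10.0,*86*-darwin8* | 10.0,*-darwin[91]*)
      # Darwin 8 on x86, Darwin 9, and Darwin 10 through 19.
      echo '-undefined dynamic_lookup' ;;
    10.[012][,.]*)
      # Meant for Mac OS X 10.0 to 10.2; also catches the "10.0" default
      # when the host pattern above did not match.
      echo '-flat_namespace -undefined suppress' ;;
    10.*)
      echo '-undefined dynamic_lookup' ;;
  esac
}

lt246_allow_undefined "" x86_64-apple-darwin19.6.0      # matches darwin1*
lt246_allow_undefined "" westmere-apple-darwin24.0.0    # falls through to flat_namespace
lt246_allow_undefined 14.0 westmere-apple-darwin24.0.0  # no arm matches, prints nothing
```

Darwin 20 and later start with "darwin2", which the `[91]` bracket does not cover, so an unset deployment target lands in the arm intended for 10.0 to 10.2.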

https://git.savannah.gnu.org/cgit/libtool.git/commit/m4/libtool.m4?id=9e8c882517082fe5755f2524d23efb02f1522490, in libtool-2.4.7, modernizes this check in libtool, and with that in use, does not enable -flat_namespace in this situation. Upgrading libtool in gmp to that version will fix this problem. I ran autoreconf --install with autoconf-2.69, automake-1.15, and libtool-2.4.7, and observed a clean make check on macOS 14.7 x86_64 (nehalem-apple-darwin23.6.0)/Xcode 15.4 and macOS 15.0 x86_64 (nehalem-apple-darwin24.0.0)/Xcode 16.0. In both cases, the linker is ld-new/ld-prime (no -ld_classic).
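
For comparison, the modernized check can be sketched the same way. This is my paraphrase of the libtool-2.4.7 logic from memory of that commit, with the m4 quoting removed, so treat the exact patterns as an assumption and verify them against the link above. The function name lt247_allow_undefined is made up, an empty first argument stands in for an unset MACOSX_DEPLOYMENT_TARGET, and the host triples are example values.

```shell
#!/bin/sh
# Hypothetical helper: prints the "allow undefined" linker flags that the
# libtool-2.4.7 darwin check would pick (as I understand that check).
#   $1 = MACOSX_DEPLOYMENT_TARGET ("" is treated as unset here)
#   $2 = host triple
lt247_allow_undefined() {
  case $1,$2 in
    10.[012],*|,*powerpc*-darwin[5-8]*)
      # Only very old deployment targets or old PowerPC hosts.
      echo "-flat_namespace -undefined suppress" ;;
    *)
      # Everything else, regardless of how high the Darwin version is.
      echo "-undefined dynamic_lookup" ;;
  esac
}

lt247_allow_undefined "" nehalem-apple-darwin23.6.0
lt247_allow_undefined "" powerpc-apple-darwin7.9.0
```

The design difference is that the newer check lists the old systems explicitly and lets everything else fall through to the default, so it does not need updating each time the Darwin version increases.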

Version 0, edited 9 days ago by markmentovai (Mark Mentovai)

comment:55 Changed 9 days ago by haberg-1 (Hans Åberg)

Changed 9 days ago by markmentovai (Mark Mentovai)

comment:56 Changed 9 days ago by markmentovai (Mark Mentovai)

Applying gmp-6.3.0_libtool-2.4.7.patch is an “easy” way to update gmp-6.3.0 to use libtool-2.4.7 without having to fiddle with autotools.

comment:57 Changed 9 days ago by cjones051073 (Chris Jones)

Could you see if adding

use_autoconf        yes

helps, instead of that patch? If memory serves, that triggers a rerun of the autoconf utility before the build.

comment:58 in reply to:  57 Changed 8 days ago by markmentovai (Mark Mentovai)

Replying to cjones051073:

Could you see if adding

use_autoconf        yes

helps, instead of that patch? If memory serves, that triggers a rerun of the autoconf utility before the build.

MacPorts’ gmp package is unaffected, because it always builds with MACOSX_DEPLOYMENT_TARGET set, which even under the older libtool doesn’t cause -flat_namespace to be used. (Incidentally, it also causes -Wl,-undefined,dynamic_lookup to not be specified, but gmp doesn’t actually require this, so it’s fine.)

This bug as I understand it is about building gmp outside of MacPorts:

GMP 'make check' fails when built with later versions of Clang, but passes when built with GCC, so that should be the build dependency. Tried on MacOS 14. See:

https://gmplib.org/list-archives/gmp-bugs/2024-June/005505.html

https://gmplib.org/list-archives/gmp-bugs/2024-July/005506.html

This refers to make check and not MacPorts’ port check, and seems to mean that when building outside of MacPorts, a failure was observed with clang but not with gcc. This would have been before 2453011ee18c and 771b2dab4689, so MacPorts gcc would have been using ld -ld_classic.

We now understand:

  • The problem is in the linker, not the compiler. The bug can occur with any compiler, Xcode’s or MacPorts’. It can also be avoided with any compiler by forcing the use of ld -ld_classic, although that’s not the best solution.
  • The linker bug occurs using ld-new/ld-prime targeting x86_64 and using -flat_namespace. It’s a bug in gmp’s build system that -flat_namespace is used at all, and this bug can be fixed by gmp picking up new build dependencies (in particular, libtool) that already fixed this bug a couple of years ago.
  • MacPorts’ own build of gmp is not affected by the bug, because MACOSX_DEPLOYMENT_TARGET is set during its build. There is no reason to fear any particular compiler when MacPorts builds gmp.

Given the new understanding, in light of how this bug was originally filed, I think that it should be closed with no action.

I left the patch file here to “show work”, but given the above, don’t think MacPorts needs to take it. use_autoconf/use_autoreconf might not work in this instance anyway, since (as is typical for autotools-based projects) gmp seems tied to specific versions of autoconf and automake, and those are not the current versions in MacPorts. In order to regenerate these files, I had to use autoconf269 and automake115, and place symbolic links from ${prefix}/share/automake115/aclocal to ../../aclocal for a variety of libtool’s m4 files. That’s why I called it fiddly.

comment:59 Changed 8 days ago by kencu (Ken)

Resolution: invalid
Status: assigned → closed

OK, this looks sorted.

We had established there is no need to force any particular compiler, and the problem was with the new linker.

Mark elegantly determined that -flat_namespace was the killer, and that because we set MACOSX_DEPLOYMENT_TARGET in MacPorts, -flat_namespace is not going to be added to our MacPorts builds.

I can certainly confirm that there is no "flat namespace" in the macports builds:

/bin/sh ../libtool  --tag=CC   --mode=link clang  -O2 -pedantic -fomit-frame-pointer -m64 -mtune=nehalem -march=nehalem -no-install  -o libtests.la  memory.lo misc.lo refmpf.lo refmpn.lo refmpq.lo refmpz.lo spinner.lo trace.lo amd64call.lo amd64check.lo ../libgmp.la 

and upstream has updated libtool already so that users playing with this outside of macports should see their "make check" working right with the next release.
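
One way to repeat this kind of check on a whole build log is a plain text search. This is a minimal sketch: the helper name count_flat_namespace is made up, and it only counts log lines that mention the flag, so it says nothing about what the linker actually did.

```shell
#!/bin/sh
# Hypothetical helper: reads a build log on stdin and prints how many
# lines mention -flat_namespace (directly or as -Wl,-flat_namespace).
count_flat_namespace() {
  # grep -c prints 0 but exits non-zero when nothing matches,
  # so force a zero exit status for use in pipelines.
  grep -c -e '-flat_namespace' || true
}

# Example with two made-up link lines; only the first uses the flag.
printf '%s\n' \
  'clang -dynamiclib -Wl,-flat_namespace -Wl,-undefined -Wl,suppress -o libt.dylib t.o' \
  'clang -O2 -o libtests.la memory.lo misc.lo ../libgmp.la' |
  count_flat_namespace
```

In practice the input would be captured build output, for example by piping the output of make into the function. A count of 0 is the result described above for the MacPorts build.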

Case closed!

Of the closing options -- the only one that really fits is "invalid" as this never affected macports builds anyway, in the end.

If someone doesn't like "invalid", feel free to reclose it with whatever you want.

comment:60 Changed 8 days ago by cjones051073 (Chris Jones)

Ok. I am a bit confused why this ticket was then created in the first place if there isn’t a problem with macports build of gmp. That certainly is not clear from the original submission or discussion.

Closing as invalid is perfectly reasonable in this case.

comment:61 Changed 8 days ago by kencu (Ken)

We didn't know that the macports builds were unaffected until the details about flat_namespace came forth.

Only once it was known that flat_namespace was the key piece of the puzzle, and also knowing that macports doesn't use flat_namespace due to -- well, essentially lucky reasons, basically -- we realized macports was unaffected and we had dodged this bullet.

comment:62 Changed 8 days ago by haberg-1 (Hans Åberg)

According to the GMP Bugs list, the issue can be fixed by using recent libtool, which has been done, but not pushed in a release: https://gmplib.org/list-archives/gmp-bugs/2024-October/005540.html

comment:63 Changed 8 days ago by haberg-1 (Hans Åberg)

You can use the latest snapshot for now in the original setup (compiling GMP with Clang and linking GCC against it), as all GMP 'make check' tests pass with both gcc14 and clang-18.

https://gmplib.org/download/snapshot/gmp-next/gmp-6.3.0-20240515185115.tar.zst

https://gmplib.org/download/snapshot/gmp-next/

comment:64 Changed 8 days ago by kencu (Ken)

We don't need to worry about doing that, though, because the make check error does not show up in MacPorts builds even with the existing release, because of the way MacPorts builds have been configured.

You can see this for yourself, if you like, by running:

sudo port -v test gmp
Last edited 8 days ago by kencu (Ken)

comment:65 Changed 8 days ago by haberg-1 (Hans Åberg)

Anyway, tests pass without that special configuration.

comment:66 Changed 8 days ago by kencu (Ken)

Exactly. Everything is fine.

comment:67 Changed 7 days ago by haberg-1 (Hans Åberg)

There is an informative description of the -flat_namespace option on the GMP Bugs list: https://gmplib.org/list-archives/gmp-bugs/2024-October/005543.html
