Debugging on FreeBSD

I recently switched several smaller service VMs from Gentoo Linux to FreeBSD because I did not want to spend so much time keeping a whole zoo of VMs up to date. I like FreeBSD because it is quite similar to Gentoo: it allows for easy installation of packages from source. It also lets you install precompiled packages when you don't need them customized, which saves me a lot of time.

On one of these VMs I run OpenLDAP as a directory service (for centralized user and group management). Recently the daemon has been crashing, seemingly at random, with syslog messages like these:

Feb 18 17:00:00 aveta kernel: pid 28337 (slapd), uid 389: exited on signal 11
Feb 19 00:00:00 aveta kernel: pid 38057 (slapd), uid 389: exited on signal 11

By default, no core dumps are created for binaries that use setuid/setgid to drop their root privileges. You can check whether they are enabled via sysctl:

# sysctl kern.sugid_coredump
kern.sugid_coredump: 0

To enable:

# sysctl kern.sugid_coredump=1
kern.sugid_coredump: 0 -> 1
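
A value set this way only lasts until the next reboot. FreeBSD reads /etc/sysctl.conf at boot, so the setting can be persisted there. A minimal sketch, assuming a POSIX shell; the helper name persist_sysctl is made up for this example and is not a system tool:

```shell
# persist_sysctl FILE NAME VALUE
# Hypothetical helper: append NAME=VALUE to FILE unless FILE already
# contains a line that sets NAME. An existing entry is left unchanged,
# even if it carries a different value.
persist_sysctl() {
    file=$1 name=$2 value=$3
    # Compare the part before "=" exactly, so the dots in NAME are
    # not treated as regex wildcards.
    if awk -F= -v n="$name" '$1 == n { found = 1 } END { exit !found }' \
        "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s=%s\n' "$name" "$value" >> "$file"
}

# Usage (as root): persist_sysctl /etc/sysctl.conf kern.sugid_coredump 1
```

Calling it twice with the same name adds only one line, so it is safe to rerun.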

The next setting to look at is kern.corefile, which specifies where core dumps are stored. The default is a file named binary.core (where binary is the name of the program that crashed) in the working directory of the process. OpenLDAP uses / as its working directory and, having dropped its privileges, is not able to write there, so no core dump is created at all. To fix this we ask the system to place the core dump into /var/tmp:

# sysctl kern.corefile=/var/tmp/%N.core
kern.corefile: %N.core -> /var/tmp/%N.core
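
The %N in the pattern is replaced by the process name. core(5) documents further specifiers, among them %P for the process ID and %U for the user ID, so a pattern such as /var/tmp/%N.%P.core should keep a later crash from overwriting an earlier dump. The following sketch only imitates that substitution in the shell to show what a pattern expands to; the function expand_corefile is invented for illustration, and the kernel's own expansion handles more cases:

```shell
# expand_corefile PATTERN NAME PID UID
# Illustration only: mimic how %N, %P and %U in a kern.corefile
# pattern are filled in. NAME must not contain "/" or "&", because
# it is pasted into a sed replacement.
expand_corefile() {
    printf '%s\n' "$1" |
        sed -e "s/%N/$2/g" -e "s/%P/$3/g" -e "s/%U/$4/g"
}

expand_corefile '/var/tmp/%N.core' slapd 46871 389
expand_corefile '/var/tmp/%N.%P.core' slapd 46871 389
```

The first call prints /var/tmp/slapd.core, the second /var/tmp/slapd.46871.core.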

Then we wait for the service to crash again ...

Feb 20 19:00:04 aveta kernel: Failed to write core file for process slapd (error 14)
Feb 20 19:00:04 aveta kernel: pid 46871 (slapd), uid 389: exited on signal 11

# ls -la /var/tmp/*.core
-rw-------  1 ldap  wheel  18350080 Feb 20 19:00 /var/tmp/slapd.core

It seems there was a problem writing the core file, but let us try gdb anyway:

# gdb /usr/local/libexec/slapd
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
(gdb) core /var/tmp/slapd.core
Core was generated by `slapd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/lib/libldap_r-2.4.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libldap_r-2.4.so.2
Reading symbols from /usr/local/lib/liblber-2.4.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/liblber-2.4.so.2
Reading symbols from /usr/local/lib/libltdl.so.7...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libltdl.so.7
Reading symbols from /lib/libcrypt.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypt.so.5
Reading symbols from /usr/lib/libssl.so.7...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libssl.so.7
Reading symbols from /lib/libcrypto.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.7
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /usr/local/libexec/openldap/back_mdb-2.4.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/local/libexec/openldap/back_mdb-2.4.so.2
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x00000008028134f0 in mdb_search () from /usr/local/libexec/openldap/back_mdb-2.4.so.2
[New Thread 802780800 (LWP 100201/slapd)]
[New Thread 802780400 (LWP 100200/slapd)]
[New Thread 802780000 (LWP 100189/slapd)]
[New Thread 80277fc00 (LWP 100188/slapd)]
[New Thread 80277f800 (LWP 100187/slapd)]
[New Thread 80277f400 (LWP 100186/slapd)]
[New Thread 80277f000 (LWP 100185/slapd)]
[New Thread 802407800 (LWP 100184/slapd)]
[New Thread 802406400 (LWP 100154/slapd)]
(gdb) thread apply all bt

Thread 9 (Thread 802406400 (LWP 100154/slapd)):
#0  0x000000080181c8cc in __error () from /lib/libthr.so.3
Cannot access memory at address 0x7fffffffeb98

Thread 8 (Thread 802407800 (LWP 100184/slapd)):
#0  0x0000000801b71b7a in select () from /lib/libc.so.7
Cannot access memory at address 0x7fffffbfd688

Thread 7 (Thread 80277f000 (LWP 100185/slapd)):
#0  0x000000080181c8cc in __error () from /lib/libthr.so.3
Cannot access memory at address 0x7fffff3fcc08

Thread 6 (Thread 80277f400 (LWP 100186/slapd)):
#0  0x000000080181c8cc in __error () from /lib/libthr.so.3
Cannot access memory at address 0x7ffffebfbc08

Thread 5 (Thread 80277f800 (LWP 100187/slapd)):
#0  0x000000080181c8cc in __error () from /lib/libthr.so.3
Cannot access memory at address 0x7ffffe3fac08

Thread 4 (Thread 80277fc00 (LWP 100188/slapd)):
#0  0x000000080181c8cc in __error () from /lib/libthr.so.3
Cannot access memory at address 0x7ffffdbf9c08

Thread 3 (Thread 802780000 (LWP 100189/slapd)):
#0  0x000000080181c8cc in __error () from /lib/libthr.so.3
Cannot access memory at address 0x7ffffd3f8c08

Thread 2 (Thread 802780400 (LWP 100200/slapd)):
#0  0x00000008028134f0 in mdb_search () from /usr/local/libexec/openldap/back_mdb-2.4.so.2
Cannot access memory at address 0x7ffffcbf7818

Thread 1 (Thread 802780800 (LWP 100201/slapd)):
#0  0x000000080181c8cc in __error () from /lib/libthr.so.3
Cannot access memory at address 0x7ffffc3f59d8
#0  0x00000008028134f0 in mdb_search () from /usr/local/libexec/openldap/back_mdb-2.4.so.2

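So gdb is not able to make much of this core dump: the binaries carry no debugging symbols, every thread's stack is unreadable, and all we learn is that the crashing thread was inside mdb_search() in the back_mdb backend.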

A few (seemingly unrelated) updates and a few reboots (including the host system) later, I have yet to see another crash. If it happens again I will pick the problem up from here, probably with a debug build of OpenLDAP.

So to sum it up: no, I don't have a solution to this problem and I have no idea what "error 14" means. Feel free to contact me if you have any further ideas.
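
One possible lead on "error 14": assuming the number in the kernel message is an ordinary errno value (an assumption, the message itself does not say so), it can be looked up in the errno header, which on FreeBSD is /usr/include/sys/errno.h. There, 14 is EFAULT ("Bad address"). That might fit the unreadable memory gdb complains about, but it does not explain the cause. The function name errno_lookup is made up for this sketch:

```shell
# errno_lookup NUMBER [HEADER]
# Print the "#define" line for errno NUMBER from HEADER.
# The default path is where FreeBSD keeps its errno definitions.
errno_lookup() {
    awk -v n="$1" '$1 == "#define" && $3 == n' \
        "${2:-/usr/include/sys/errno.h}"
}

# Usage: errno_lookup 14
```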