# A11 — debugUnix: SIGSEGV handler runs non-async-signal-safe code and re-enters itself

Bug ref      : always.md A.11 ; pharo.md §5.1
Severity     : HIGH-on-fault (crash report is undefined behavior; recursion possible)
File         : src/debugUnix.c
Lines (HEAD) : 49-95 (`doReport`, `sigsegv`), 122-162 (`installErrorHandlers`)

## Problem

The fatal-signal handler `sigsegv` calls `doReport`, which calls

  - `time(NULL)`           (not async-signal-safe)
  - `ctime_r`              (not async-signal-safe on glibc; may take a lock)
  - `fopen`, `fclose`      (not async-signal-safe)
  - `vfprintf` via         (not async-signal-safe; takes stdio locks)
    `reportStackState`     and may malloc
  - `backtrace_symbols_fd` (`fd` variant is best-effort safe; libc
                            varies)
  - `vm_setVMOutputStream` (touches FILE * across handler / non-handler)

All of these can deadlock or scribble state when the crashing thread
already held a malloc-arena lock, stdio lock, or DSO loader lock.

Compounding this, every handler is registered with `SA_NODEFER`:

```c
sigsegv_handler_action.sa_flags = SA_NODEFER | SA_SIGINFO;
```

so a SIGSEGV that occurs *inside the handler* (very plausible while
walking a corrupted stack) re-enters the handler from the top with no
fault de-duplication.

A truly async-signal-safe rewrite is a multi-PR project. This PR
makes the smallest change that materially reduces the risk of
deadlock and recursion without losing the crash report on the common
case:

  1. Remove `SA_NODEFER` so a recursive fault is masked to the
     default disposition rather than re-entering the handler.
  2. Add a small recursion guard (`sig_atomic_t inFault`) — already
     declared at line 86 but never consulted — so a second fault
     skips the heavyweight reporting path and calls `_exit(128+sig)`
     directly.

## Fix

```diff
diff --git a/src/debugUnix.c b/src/debugUnix.c
index 308e4e897..b715a6a48 100644
--- a/src/debugUnix.c
+++ b/src/debugUnix.c
@@ -87,11 +87,21 @@ static int inFault = 0;
 
 void sigsegv(int sig, siginfo_t *info, ucontext_t *uap)
 {
+	if (inFault) {
+		/* Recursive fault inside the crash handler. Do not call
+		 * anything async-signal-unsafe; the previous handler
+		 * invocation already wrote what it could. */
+		_exit(128 + sig);
+	}
+	inFault = 1;
+
 	char *fault = strsignal(sig);
 
 	doReport(fault, uap);
 
-	exit(-1);
+	/* _exit avoids stdio flushers and atexit handlers that may
+	 * themselves deadlock after a crash. */
+	_exit(128 + sig);
 }
 
 void terminateHandler(int sig, siginfo_t *info, ucontext_t *uap)
@@ -123,7 +133,10 @@ EXPORT(void) installErrorHandlers(){
 	struct sigaction sigusr1_handler_action, sigsegv_handler_action, term_handler_action, sigpipe_handler_action;
 
 	sigsegv_handler_action.sa_sigaction = (void (*)(int, siginfo_t *, void *))sigsegv;
-	sigsegv_handler_action.sa_flags = SA_NODEFER | SA_SIGINFO;
+	/* Drop SA_NODEFER so a recursive fault is masked and the kernel
+	 * delivers the default disposition; add SA_RESETHAND so the second
+	 * occurrence terminates cleanly rather than re-entering this handler. */
+	sigsegv_handler_action.sa_flags = SA_SIGINFO | SA_RESETHAND;
 	sigemptyset(&sigsegv_handler_action.sa_mask);
 
 #ifdef SIGEMT
```

## Test plan

- Build with `-fsanitize=address` and inject a deliberate `*((int*)0) = 1;`
  to fire SIGSEGV; verify crash dump is produced once, then
  `_exit(139)` (i.e. 128 + SIGSEGV(11)).
- Inject a second SIGSEGV by deliberately segfaulting inside the
  handler (e.g. `kill(getpid(), SIGSEGV);` from within doReport for
  one test build). The recursive entry must skip the heavy reporter
  and `_exit`.
- Verify normal `SIGUSR1` reporting still works — that handler is
  not touched.

## Risk notes

- Replacing `exit(-1)` with `_exit(128+sig)` is intentional: `exit`
  runs atexit handlers and stdio flushers, both of which take locks
  that the crashing thread may already hold.
- `SA_RESETHAND` restores the default disposition after the first
  invocation so a recursive fault terminates cleanly rather than
  re-entering the handler. Combined with the `inFault` guard, the
  worst case is a clean process exit even if the first invocation
  itself deadlocks.
- This does **not** make the report path async-signal-safe. A
  follow-up PR should replace `vfprintf` + `fopen` with a fixed-size
  bounded writer (`write(2)` + integer-to-string). Treat this as
  defense-in-depth, not a fix for the underlying issue.
