Nix hanging during system rebuild (Solution Included)

Just a funny story about me and NixOS. To be clear, if you’re looking for a fix for your own NixOS system, I wouldn’t recommend looking for advice here. The error is so absurd and niche that it’s probably not what you’re looking for.

The error

$ rb # alias of sudo nixos-rebuild switch --flake ...
building the system configuration...
^C

zsh nixos on  main [$!]              took 1h44m36s 
$

Well… actually, there’s no error message. But thanks to starship’s Command Duration module, we can see it ran for almost two hours… and if I hadn’t stopped it manually, it would probably have run forever (literally).

No high CPU usage, no high RAM usage, and no high disk usage, which ruled out the causes most commonly suggested online for slow rebuilds. So… I didn’t know what to do.

I rolled back to a previous generation and the problem was “solved.” The system could rebuild again! Yay!

But after switching to the new configuration and THEN rebuilding… yeah, the problem wasn’t solved. I tried updating, downgrading, and doing all kinds of shenanigans to my flake.lock. Maybe it was a bad Nix update? (I highly doubt they’d ever release a version of Nix that can’t even rebuild its own system, which is probably the most common use of Nix.) But no… the problem persisted.

The luckiest error message ever

A few days ago, I was messing around with playerctl and created some scripts to bind to sxhkd. Yes, not even music listening is free of the ~~blessing~~ curse of shell script automation and hotkey-heavy workflows.

I wanted to show a notification on every song or status change, so I made a little test for that.

#!/bin/sh
#test

playerctl metadata --follow --format "{{status}} {{title}}" | while read -r line; do
    ~/.local/bin/music_status.sh
done

Can you guess what this script was named? Exactly… test. I mean, I was just testing whether it worked, so the name made sense in the moment. If you’re familiar with any Unix-like system, you might be facepalming right now, because you can already see where the error lies.

Ah, the error message, right!

/home/zekar/.local/bin/test: 2: playerctl: not found
/home/zekar/.local/bin/test: 2: playerctl: not found
updating GRUB 2 menu...
activating the configuration...
setting up /etc...
reloading user units for zekar...
restarting sysinit-reactivation.target
the following new units were started: NetworkManager-dispatcher.service
Done. The new configuration is /nix/store/[...]

Luckily, I had only recently installed playerctl, and the generation I rolled back to didn’t have it installed, so by PURE LUCK I remembered the test script I had made. (The script and the rebuild problems didn’t even start around the same time, so… I had no idea they were related!!) If playerctl had been installed in that generation too, I wouldn’t have gotten that error message: the script would presumably have just sat there following playerctl forever, which is the silent hang at the top of the post.

Apparently, somewhere during the rebuild, the test binary is called or something like that… Then I asked myself, “Why is it calling my test script???”. Finally, it hit me. THE test COMMAND ALREADY EXISTS, I WAS PROBABLY OVERRIDING IT!
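The general mechanism can be reproduced in a throwaway directory. This is only a sketch of PATH shadowing, not a reconstruction of what nixos-rebuild does internally (I don’t know which step called test). The temporary directory and the stand-in script are hypothetical; the point is that a shell uses its builtin test, while anything that looks the command up through PATH (here, env) gets whatever file comes first.

```shell
#!/bin/sh
# Sketch: an executable named "test" early in PATH shadows the real one
# for callers that resolve the command via PATH rather than a shell builtin.

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

# Hypothetical stand-in for my script: it ignores its arguments and announces itself.
printf '#!/bin/sh\necho "my test script ran"\n' > "$demo_dir/test"
chmod +x "$demo_dir/test"

# Inside a shell, `test` is a builtin, so the file in PATH is never consulted.
shadow_builtin() {
    PATH="$demo_dir:$PATH" sh -c 'test 1 -eq 1 && echo "builtin test ran"'
}

# `env` searches PATH and runs the first `test` it finds, which is now the script.
shadow_path() {
    PATH="$demo_dir:$PATH" env test 1 -eq 1
}

shadow_builtin
shadow_path
```

The first call prints "builtin test ran" and the second prints "my test script ran", even though both were asked to evaluate the same expression.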

To be honest, I have no idea how it was able to finish the rebuild if the test command failed, so it probably messed up some of its conditional logic near the end. I could research it in more depth, but it’s probably not worth it lol.
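One piece of this can at least be checked in isolation: in plain sh, a pipeline’s exit status is the status of its last command, and a while loop that never runs its body exits with 0. So a script shaped like mine reports success even when the first command is missing. The sketch below assumes nothing about the rebuild itself; no_such_player_cmd is a made-up name standing in for the missing playerctl.

```shell
#!/bin/sh
# Sketch: the exit status of a pipeline is the status of its LAST command.
# no_such_player_cmd is a hypothetical stand-in for the missing playerctl.

pipeline_status() {
    # Run in a fresh sh, the same interpreter the script's "#!/bin/sh" line asks for.
    sh -c 'no_such_player_cmd metadata --follow 2>/dev/null | while read -r line; do
        echo "got: $line"
    done'
    # Print the status that the caller of such a script would see.
    echo "$?"
}

pipeline_status
```

This prints 0, so whatever called my test script would have seen it "succeed" no matter what it was actually asked to check.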

I changed my script’s name to something else and the problem went away.

Key takeaway: Don’t name your scripts after already existing programs… I mean, it’s obvious, but when it’s done by accident you don’t even notice (especially because I had almost forgotten that test is the same as [, and of course, [ is far more common in scripts and stuff).
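A quick way to catch this kind of collision is to list every executable in a personal bin directory whose name also means something elsewhere. This is a rough sketch, not a polished tool: find_shadowing is a name I made up, it assumes the directory is written in PATH exactly as it is passed in, and it relies on command -v, which also reports shell builtins such as test.

```shell
#!/bin/sh
# Sketch: print the names of executables in a directory that would still resolve
# to something else (another PATH entry or a shell builtin) without that directory.

find_shadowing() {
    dir=$1
    # PATH with $dir removed, so lookups show what each name would mean otherwise.
    other_path=$(printf '%s' "$PATH" | tr ':' '\n' | grep -vxF -e "$dir" | paste -sd: -)
    for file in "$dir"/*; do
        # Skip anything that is not a regular, executable file.
        [ -f "$file" ] && [ -x "$file" ] || continue
        name=${file##*/}
        # The subshell keeps the modified PATH from leaking into the caller.
        if (PATH=$other_path; command -v "$name") >/dev/null 2>&1; then
            printf '%s\n' "$name"
        fi
    done
}

find_shadowing "$HOME/.local/bin"
```

Run against a directory containing a script called test, it prints test; scripts with names nothing else uses are left out.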