Nix hanging during system rebuild (Solution Included)
Just a funny story about me and NixOS. To be clear, if you’re looking for a solution to your own NixOS problem, I wouldn’t recommend looking for advice here. The error is so absurd and niche that it’s probably not what you’re looking for.
The error
$ rb # alias of sudo nixos-rebuild switch --flake ...
building the system configuration...
^C
zsh nixos on main [$!] took 1h44m36s
$
Well… actually, there’s no error message. But thanks to starship’s Command Duration module, we can see it took almost two hours… and if I hadn’t stopped it manually, it would probably have run forever (literally).
No high CPU usage, no high RAM usage, and no high disk usage, which was the usual explanation on the internet for long rebuild times. So… I didn’t know what to do.
I rolled back to a previous generation and the problem was “solved.” The system could rebuild again! Yay!
But after switching to the new configuration and THEN rebuilding… yeah, the problem wasn’t solved. I tried updating, downgrading, and doing all kinds of shenanigans to my flake.lock. Maybe it was a bad Nix update? (I highly doubt they’d ever release a version of Nix that can’t even rebuild its own system, probably the most common use of Nix.) But no… the problem persisted.
The luckiest error message ever
A few days ago, I was messing around with playerctl and created some scripts to bind to hotkeys with sxhkd. Yes, not even music listening is free of the blessing… er, curse of shell script automation and hotkey-heavy workflows.
I wanted to show a notification on every song or status change, so I made a little test for that.
#!/bin/sh
#test
# Print "<status> <title>" whenever the player changes, and run the notifier each time
playerctl metadata --follow --format "{{status}} {{title}}" | while read -r line; do
    ~/.local/bin/music_status.sh
done
Can you guess what I named this script? Exactly… test.
I mean, I was just testing whether it worked, so the name made sense in the moment. If you’re familiar with any Unix-based system, you might already be facepalming, because you see where the error lies.
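To see the collision directly, here is a minimal POSIX sh sketch that walks every directory in `$PATH` and prints each executable named `test`, in lookup order; the first line printed is what a PATH-based lookup would pick. `list_path_matches` is a hypothetical helper made up for this example, not a standard command (in bash, `type -a test` gives similar output and also mentions the builtin).

```shell
#!/bin/sh
# Hypothetical helper: list every executable named "$1" in $PATH, in lookup order.
# The first line printed is what a PATH-based lookup (exec, env, ...) would run.
list_path_matches() {
    name=$1
    old_ifs=$IFS
    IFS=:
    set -f              # no globbing while splitting $PATH
    for dir in $PATH; do
        # An empty PATH entry means the current directory
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    set +f
    IFS=$old_ifs
}

list_path_matches test
```

If `~/.local/bin` comes before the system directories in your `PATH`, your own script is the first line of that list.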
Ah, the error message, right!
/home/zekar/.local/bin/test: 2: playerctl: not found
/home/zekar/.local/bin/test: 2: playerctl: not found
updating GRUB 2 menu...
activating the configuration...
setting up /etc...
reloading user units for zekar...
restarting sysinit-reactivation.target
the following new units were started: NetworkManager-dispatcher.service
Done. The new configuration is /nix/store/[...]
Luckily, I had only recently installed playerctl, and the generation I rolled back to didn’t have it, so by PURE LUCK that error reminded me of the test script I had made. (Writing the script and rebuilding didn’t even happen around the same time, so… I had no idea they were related!!) If I had had playerctl installed from the beginning, I wouldn’t have gotten that error message at all, just the silent hang from the top of the post.
Apparently, somewhere during the rebuild (judging by the output, right before the GRUB menu update), the test binary gets called or something like that… Then I asked myself, “Why is it calling my test script???” Finally, it hit me. THE test COMMAND EXISTS, AND I WAS PROBABLY OVERRIDING IT!
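Why would a script in `~/.local/bin` take over at all? Inside a shell script, `test` and `[` are normally builtins, so `PATH` is never consulted. But anything that runs `test` as a separate program (through `env`, `xargs`, `find -exec`, or an exec from another language) searches `PATH`, and the first match wins. I don't know which part of the rebuild made that kind of call; the sketch below only reproduces the mechanism, using a throwaway fake `test` in a temporary directory as a stand-in for `~/.local/bin`.

```shell
#!/bin/sh
# Hypothetical reproduction: a fake "test" placed first in PATH (not the real rebuild).
tmp=$(mktemp -d)
cat > "$tmp/test" <<'EOF'
#!/bin/sh
echo "fake test called with: $*"
EOF
chmod +x "$tmp/test"

# Put the fake directory first, like ~/.local/bin on a user's PATH
export PATH="$tmp:$PATH"

# The shell builtin ignores PATH, so this runs the real test
builtin_out=$(test 1 -eq 1 && echo "builtin: true")

# env has no builtins: it searches PATH and finds the fake one first
external_out=$(env test 1 -eq 1)

echo "$builtin_out"
echo "$external_out"
rm -rf "$tmp"
```

Running it prints `builtin: true` and then `fake test called with: 1 -eq 1`: the builtin is untouched, while anything that goes through `PATH` hits the impostor.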
To be honest, I have no idea how it managed to rebuild when the test command failed; it probably messed up some of the conditional logic near the end. I could research it more in depth, but it’s probably not worth it lol.
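One plausible explanation, which is my guess and not something checked against the rebuild's own code: a script's exit status is that of its last command, here the pipeline, whose status is that of its last element, the `while` loop. A `while` loop whose body never runs exits with 0. So with playerctl missing, the fake `test` printed its error and then "succeeded", letting whatever conditional called it carry on. With playerctl installed, `--follow` keeps the pipeline open, so the script never returns, which would match the endless hang. The sketch uses `definitely_not_installed` as a hypothetical stand-in for a missing playerctl.

```shell
#!/bin/sh
# Same shape as the "test" script, but with a command that does not exist.
# "definitely_not_installed" is a hypothetical stand-in for a missing playerctl;
# the shell prints a "not found" error for it on stderr.
definitely_not_installed --follow | while read -r line; do
    echo "never reached: $line"
done
status=$?
# The pipeline's status is the while loop's status: 0, because its body never ran
echo "exit status: $status"
```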
I changed my script’s name to something else and the problem went away.
Key takeaway: Don’t name your scripts after already existing programs… I mean, it’s obvious, but when it’s done by accident you don’t even notice (especially since I had almost forgotten that test is the same as [, and of course, [ is way more common in scripts and stuff).
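A cheap habit that would have caught this: before saving a new script, ask the shell whether the name is already taken. `command -v` is POSIX and reports builtins, functions, aliases, and PATH executables alike. `name_is_free` is a hypothetical helper, and `music_follow_test` is just an example of a name that is unlikely to collide.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if no builtin, function, alias,
# or PATH executable already uses the name.
name_is_free() {
    ! command -v "$1" >/dev/null 2>&1
}

for candidate in test music_follow_test; do
    if name_is_free "$candidate"; then
        echo "$candidate: free"
    else
        echo "$candidate: taken ($(command -v "$candidate"))"
    fi
done

# Why "test" is so easy to forget: [ is the same command, plus a closing bracket
[ -d / ] && test -d / && echo "[ -d / ] and test -d / agree"
```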