I am planning to build a multipurpose home server. It will be a NAS, virtualization host, and have the typical self-hosted services. I want all of these services to have high uptime and be protected from power surges/blackouts, so I will put my server on a UPS.

I also want to run an LLM server on this machine, so I plan to add one or more GPUs and pass them through to a VM. I do not care about high uptime on the LLM server. However, this of course means that I will need a more powerful UPS, which I do not have the space for.
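
To put rough numbers on the sizing problem, here is a minimal sketch. Every wattage in it is a made-up placeholder (I have not measured anything yet), and `ups_load_fraction` is just a helper name I picked for illustration.

```python
# All wattages below are hypothetical placeholders, not measurements.
UPS_CAPACITY_W = 600   # assumed real-power (watt) rating of the existing UPS
BASE_SYSTEM_W = 250    # assumed peak draw of CPU, drives, fans, and so on
GPU_PEAK_W = 350       # assumed peak draw of a single GPU
GPU_COUNT = 2          # assumed number of GPUs


def ups_load_fraction(load_w: float, capacity_w: float) -> float:
    """Return the load as a fraction of UPS capacity (1.0 means fully loaded)."""
    if capacity_w <= 0:
        raise ValueError("capacity_w must be positive")
    return load_w / capacity_w


# Load on the UPS if only the base system is connected to it.
base_only = ups_load_fraction(BASE_SYSTEM_W, UPS_CAPACITY_W)

# Load on the UPS if the GPUs were fully powered through it as well.
with_gpus = ups_load_fraction(
    BASE_SYSTEM_W + GPU_COUNT * GPU_PEAK_W, UPS_CAPACITY_W
)

print(f"Base system only:   {base_only:.0%} of UPS capacity")
print(f"Base system + GPUs: {with_gpus:.0%} of UPS capacity")
```

With placeholder values like these, the base system fits comfortably while the GPUs push the total past the UPS rating, which is the whole reason I want to keep them off the UPS.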

My plan is to get a second power supply to power only the GPUs. I do not want to put this PSU on the UPS. I will turn on the second PSU via an Add2PSU.

In the event of a blackout, this means that the base system will get full power and the GPUs will get power via the PCIe slot, but they will lose the power from the dedicated power plug.

Obviously this will slow down or kill the LLM server, but will this have an effect on the rest of the system?

  • @catloaf@lemm.ee
    English
    16
    5 months ago

    PCIe is absolutely plug and play. Cards have been PnP since the ISA era. You probably meant hot-plug, but it’s hot-pluggable too: https://lwn.net/Articles/767885/

    Any buffered data will sit in the buffer, and eventually be dropped. Any data sent to the buffer while the buffer is full will be dropped. I’m not intimately familiar with communicating with GPUs, but I imagine the only buffers are in the GPU driver (which would either handle the removal or crash) or in the application (which would probably not handle the removal and just crash). Buffering is not really where I would expect to see a problem.

    That said, a GPU disappearing unexpectedly will probably crash your program, if not your whole OS. Physical damage is unlikely, though I definitely wouldn’t recommend connecting two PSUs to one system due to the potential for unexpected… well, potential. Inrush current wouldn’t really be my concern, since it would be pulling from the external PSU which should have plenty of capacity (and over-current protection too, I would hope). And it’s mostly a concern for AC systems, rarely for DC.

      • @catloaf@lemm.ee
        English
        3
        5 months ago

        Server PSUs are designed to be identical and work in parallel (though depending on platform, they can be configured as primary/hot spare, too). I’d be concerned about potential difference in power, especially with two non-matching PSUs. It would probably be fine, but not probably enough for me to trust my stuff to it. They’re just not designed or tested to operate like that, so they may behave unexpectedly.

    • @just_another_person@lemmy.world
      English
      -3
      5 months ago

      You are confusing “plug and play” with “hot swap/plug CAPABLE”. The spec allows for specifically designed hardware to come and go, like ExpressCard, Thunderbolt, or USB4 lane-assigned devices, for example. That’s a feature built for a specific type of hardware to tolerate things like accepting current, or having a carrier chip at least communicating with the PCIE bridge that designates its current status. Almost all of these types of devices are not only designed for this, they are powered by the hardware they are plugged into, allowing that power to be negotiated and controlled by the bridge.

      NOT like a giant GPU that requires its own power supply current and ground.

      But hey, you read it on the Internet and seem to think it’s possible. Go ahead and try it out with your hardware and see what happens.

      • @Omgpwnies@lemmy.world
        English
        4
        5 months ago

        Dude… you’re the one that said PCIE isn’t plug and play, which is incorrect. Plug and play simply means not having to manually assign IRQ/DMA/etc before using the peripheral, instead being handled automatically by the system/OS, as well as having peripherals identify themselves allowing the OS to automatically assign drivers. PCIE is fully plug-and-play compatible via ACPI, and hot swapping is supported by the protocol, if the peripheral also supports it.

        • @just_another_person@lemmy.world
          English
          -5
          5 months ago

          Again… it is not. You can’t just go and hot-swap anything into any PCIE slot. The protocol supports it, but it is not, by any definition, live-swappable by default.

          My speedometer says 200, but my car does not go that fast.

          An egg isn’t an omelet.

          The statement “humans can fly” is technically true, but not without a plane.

          A device that supports hot swap into a compatible and specifically configured slot could be though.

          I can keep going forever with this.

          • @Omgpwnies@lemmy.world
            English
            1
            5 months ago

            Are you slow? Nobody is arguing that you can hot swap a GPU. That’s not what people are correcting you on.

            YOU claimed that PCIE is not PLUG AND PLAY

            “NO. PCIE is not plug and play.”

            That was your comment. It was wrong. You were wrong.

      • @catloaf@lemm.ee
        English
        1
        5 months ago

        Right, it requires device support. And most GPUs won’t support it. But it’s by no means impossible.

        I’ve got some junk hardware at work, I’ll try next time I’m in and let you know.

      • @just_another_person@lemmy.world
        English
        -3
        5 months ago

        You have multiple accounts, and are sadly so consumed with Internet points that you used both of them to downvote when you’re wrong. You’re pathetic. Get a hobby. Maybe learning about hardware!