How would you build a GPU-heavy node?

Scrubbles · 2 years ago

How would you build a GPU-heavy node?

@towerful@programming.dev · edit-2 2 years ago

If you are doing high-bandwidth GPU work, then the PCIe lanes of consumer CPUs are going to be the bottleneck, as they generally only support 16 lanes.
Then there are the Threadrippers, Xeons and all the server/professional-class CPUs that will do 40+ lanes of PCIe.

A lane of PCIe 3.0 is about 1GBps (bytes, not bits).
So, if you know your workload and bandwidth requirements, then you can work from that.
If you don’t need the full 16 lanes per GPU, then a motherboard that supports bifurcation will allow you to run 4 GPUs with 4 lanes each from a CPU that has 16 lanes of PCIe. At PCIe 3.0 speeds, that’s 4GBps per GPU, or 32Gbps.
If it’s just for transcoding, and you are running into the limitations of consumer GPUs (which I think are limited to 3 simultaneous streams), you could get a pro/server GPU like the Nvidia Quadros, which have a fixed amount of encoding resources but no limit on the number of streams they can process (so one might be able to do 300 FPS of 1080p; if your content is 1080p 30fps, that’s 10 streams). From that, you can work out bandwidth requirements and see if you need more than 4 lanes per GPU.
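
The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope helper: the function names are made up for illustration, the 1GBps-per-lane figure is the rough PCIe 3.0 number from this comment, and the 300 FPS encoder throughput is the hypothetical example value, not a measured spec.

```python
# Rough figure from the comment above; real-world throughput varies.
PCIE3_GBYTES_PER_LANE = 1.0  # approx. GB/s per PCIe 3.0 lane (assumption)


def per_gpu_bandwidth(total_lanes: int, gpu_count: int) -> tuple[int, float]:
    """Split lanes evenly across GPUs; return (lanes per GPU, GB/s per GPU)."""
    lanes = total_lanes // gpu_count
    return lanes, lanes * PCIE3_GBYTES_PER_LANE


def max_streams(encoder_fps: int, stream_fps: int) -> int:
    """How many real-time streams fit into an encoder's total frame rate."""
    return encoder_fps // stream_fps


lanes, gbytes = per_gpu_bandwidth(total_lanes=16, gpu_count=4)
# 1 byte = 8 bits, so GB/s * 8 gives Gb/s
print(f"{lanes} lanes per GPU, ~{gbytes:.0f} GB/s (~{gbytes * 8:.0f} Gb/s)")
print(f"{max_streams(300, 30)} streams of 1080p30")
```

Swap in your own lane count, GPU count and frame rates to see whether 4 lanes per GPU is enough for your workload.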

I’m not sure what’s required for AI. I feel like it is similar to crypto mining, massive compute but relatively small amounts of data.

Ultimately, if you think your workload can consume more than 4 lanes per GPU, then you have to think about where that data is coming from. If it’s coming from disk, then you are going to need RAID 0 NVMe storage, which will take up additional PCIe lanes.
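
As a rough way to check a lane budget that includes storage, something like the following could work. It is a sketch under assumptions: the helper name is hypothetical, and it assumes each NVMe drive takes 4 lanes, which is typical but worth checking against your drives and motherboard. It also ignores lanes provided by the chipset.

```python
def lanes_needed(gpu_count: int, lanes_per_gpu: int,
                 nvme_drives: int, lanes_per_nvme: int = 4) -> int:
    """Total CPU PCIe lanes for the GPUs plus the NVMe drives in the array.

    lanes_per_nvme=4 is an assumption (typical for NVMe drives).
    """
    return gpu_count * lanes_per_gpu + nvme_drives * lanes_per_nvme


# Example: 4 GPUs at 8 lanes each, fed by a 2-drive RAID 0 array.
total = lanes_needed(gpu_count=4, lanes_per_gpu=8, nvme_drives=2)
for cpu_lanes in (16, 40):
    verdict = "fits" if total <= cpu_lanes else "does not fit"
    print(f"{total} lanes needed, {cpu_lanes}-lane CPU: {verdict}")
```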

@grue@lemmy.world · 2 years ago

“I’m not sure what’s required for AI. I feel like it is similar to crypto mining, massive compute but relatively small amounts of data.”

If you’re talking about training models, I think it requires both massive compute and massive amounts of data.

Ielisa · 2 years ago

The Nvidia transcode limit is 5 for consumer GPUs these days, and it’s very easy to lift that limit if you need to with https://github.com/keylase/nvidia-patch

@towerful@programming.dev · 2 years ago

5? Holy heck, that’s amazing. I remember helping people who had built streaming rigs to use during the pandemic, and wondering why their production was stuttering and having issues with a bunch of remote callers. Some of that work ended up being CPU bound.
Although, it looks like that patch is for Linux? Not much use if you’re running vMix or some other Windows-only software.
In OP’s case, however, that’s not a problem.

Ielisa · 2 years ago

I think you can get it to work with Windows somehow, but I’ve never needed to try: https://github.com/keylase/nvidia-patch/issues/520