Discussion Compute requirement for Topaz video ai Starlight AI model require clould computing

Hulk · Feb 26, 2025

I don't know if any of you are using Topaz Labs Video AI, but the latest "Starlight" model is so large and compute hungry that it must be run in the cloud. I bought this software about 18 months ago knowing that it was essentially a beta product. I still have not actually used it to make a video look better. I've tried many times but never achieved acceptable results.

Topaz is now telling us instead of continuing to make smaller models that can be run on desktop systems with acceptable quality, they gave gone cloud based to get the best quality possible. Of course you have to buy tokens or credits or whatever to use this. Owners are generally quite upset because we paid for a "local" software and that's not what we got. Topaz tells us that when they optimize the model (someday) and/or desktop hardware gets better they can move Starlight back to desktops. There are still less powerful models available locally.

I am in contact with a Topaz support person and am telling them why not make Video ai a distributed computing application until it can be run locally. You would basically put your compute "out there" for others to use and earn credits for your own project. This way over days or weeks or even months you could leverage your credits instantly to "call in" your compute for your own projects.

I'm curious as to why AI has moved in more of a DC direction? Seems like a good application for it?

dr1337 · Feb 26, 2025

AI takes so much compute and ram that I am not sure it would actually be worth it. Imagine if there are 100 people running topaz-at-home and they all have 16gb GPUs. 1.6TB of Vram does sound like a lot but as soon as you consider a model takes like 200gb, you've only got 8 effective clusters at that point. So if more than 8 people want to prompt around the same time you're gonna see slowdowns or annoying-ish wait times. IRL it probably is much worse than this and idk if you could even really scale across various sized GPUs easily without tons of custom sized models.

Also the sheer bandwidth needed is so high that running the program would definitely have any gigabit ethernet pegged on both upload and download, along side your GPU running at 100%. And considering internet latency there actually might be too much latency for decent performance, as in the models will be more like CPU speeds.

It sounds like a lot of money/time they'd have to spend on developing all these networking technologies and especially making sure they're secure. For not much reward compared to the cloud based systems they've probably already been using to train the AIs the whole time.

Hulk · Feb 26, 2025

I'm not well-versed in these server type systems but this is what the rep told me they are using to run it. How many client based PC's (about) would this level of compute require?

"Currently, Project Starlight runs off of 6 8xh100 machines interconnected with infinitiband and a single h100 GPU is 80GB of VRAM."

yottabit · Feb 27, 2025

Hulk said:
I'm not well-versed in these server type systems but this is what the rep told me they are using to run it. How many client based PC's (about) would this level of compute require?

"Currently, Project Starlight runs off of 6 8xh100 machines interconnected with infinitiband and a single h100 GPU is 80GB of VRAM."

Would never happen. Sounds like the best you could hope for is they provided a docker compute instance or something so you could have the option to orchestrate or rent your own compute rather than being beholden to their license/credit scheme.

That being said I agree it sucks they pulled a 180 on the terms. Hopefully they are true to their word and release a more desktop friendly model later.

It’s probably a necessary direction for them to pivot because there are a lot of desktop friendly (< 24 GB VRAM) video models coming out right now.

soresu · Feb 27, 2025

IMHO this makes the kind of open source optimisation work that DeepSeek is doing for AI so important, because companies are far too willing to just throw moar compute at the problem rather than getting their hands dirty.

igor_kavinski · Feb 27, 2025

Hulk said:
How many client based PC's (about) would this level of compute require?

Main issue would be the bandwidth. Even if everyone involved in the distributed compute had gigabit internet speeds, the problem being worked on would have to be broken into many small parts, then those parts sent to different PCs and then those PCs' GPU's would have to communicate back and forth on how to work together to solve the problem, all over high latency high traffic internet pipes. The communication would slow things down to a crawl. There wouldn't be much use in it if a full result came back in several days or even weeks, assuming everyone involved is dedicated to keeping their PCs on the entire time.

Hulk · Feb 27, 2025

igor_kavinski said:
Main issue would be the bandwidth. Even if everyone involved in the distributed compute had gigabit internet speeds, the problem being worked on would have to be broken into many small parts, then those parts sent to different PCs and then those PCs' GPU's would have to communicate back and forth on how to work together to solve the problem, all over high latency high traffic internet pipes. The communication would slow things down to a crawl. There wouldn't be much use in it if a full result came back in several days or even weeks, assuming everyone involved is dedicated to keeping their PCs on the entire time.

That's exactly what I thought when this idea came to mind. Then I Googled if AI is suitable for DC and of course the AI said, "Yes! Of course it is." I should have realized the AI didn't know what it was talking about.

Hulk · Feb 27, 2025

I have some additional data and it's kind of mind blowing. It takes those 48 h100GPU's 20 minutes to render 10 seconds of video! That's 0.25fps. Or 1 rendered frame every 4 seconds.

Now I wonder if you had each client computer render 1 frame if this could work? Could a client computer render 1 frame in 20 minutes? If so they you'd need 300 client computers to equal the performance of the Topaz server.

yottabit · Feb 27, 2025

Hulk said:
I have some additional data and it's kind of mind blowing. It takes those 48 h100GPU's 20 minutes to render 10 seconds of video! That's 0.25fps. Or 1 rendered frame every 4 seconds.

Now I wonder if you had each client computer render 1 frame if this could work? Could a client computer render 1 frame in 20 minutes? If so they you'd need 300 client computers to equal the performance of the Topaz server.

It depends on if they need those 48 GPUs for the CUDA cores or the memory. If they need that much RAM to fit the model then you’ll never be able to run a single frame on a consumer client. That’s up to 3.8 TB of HBM memory if the H100 are 80 GB each,

I also would be really shocked if they aren’t using temporal data such as motion vectors in their model. This helps with anti aliasing (technique utilized by DLSS) and preserving frame to frame consistency. Splitting it up into too many separate jobs could then hurt consistency.

Hulk · Feb 27, 2025

yottabit said:
It depends on if they need those 48 GPUs for the CUDA cores or the memory. If they need that much RAM to fit the model then you’ll never be able to run a single frame on a consumer client. That’s up to 3.8 TB of HBM memory if the H100 are 80 GB each,

I also would be really shocked if they aren’t using temporal data such as motion vectors in their model. This helps with anti aliasing (technique utilized by DLSS) and preserving frame to frame consistency. Splitting it up into too many separate jobs could then hurt consistency.

I had thought about the interframe (temporal) encoding aspect as well. But I'm thinking they are not based on the enormous encoding time. I have a gut feeling each frame is a picture unto itself. But on second thought they could use temporal analysis to generate B frames. But either way the I frames would have to be generated from scratch and perhaps that could be on one a single client?

I posted the idea on the "Ideas" forum over at Topaz Labs. I'm sure it'll get blown into a million small pieces in a few hours!

IEC · Feb 27, 2025

Hulk said:
I have some additional data and it's kind of mind blowing. It takes those 48 h100GPU's 20 minutes to render 10 seconds of video! That's 0.25fps. Or 1 rendered frame every 4 seconds.

Now I wonder if you had each client computer render 1 frame if this could work? Could a client computer render 1 frame in 20 minutes? If so they you'd need 300 client computers to equal the performance of the Topaz server.

That's a colossally inefficient and wasteful use of compute, then.

Hulk · Feb 27, 2025

IEC said:
That's a colossally inefficient and wasteful use of compute, then.

I'm starting to feel the same way. Even with Starlight I'm seeing "monster faces."
This more like alpha software at this point.
Just feel kind of ripped off for buying the software last year even though at the time I knew I was buying something that was in development. I just didn't know it would move to the cloud and I've have to buy credits to use it!

But yes, the juice ain't worth the squeeze.

igor_kavinski · Feb 27, 2025

AI is getting better but most of the incredible stuff they show off is manually tweaked with dozens or hundreds of man hours involved.

Hulk · Feb 27, 2025

igor_kavinski said:
AI is getting better but most of the incredible stuff they show off is manually tweaked with dozens or hundreds of man hours involved.

Yes exactly. Sometimes a little good old human cleverness is worth 10,000 GPU's and 10,000kWhrs of electricity.

As a mathematical analogy I look at it like this. Suppose you want to find the root for an algebraic expression with no closed form solution. The most dumb and computationally expensive way to do it would be an incremental search. Start at some number you know is below the actual value, or guess is, and then start trying numbers to find solutions (zeros). That the AI way.

Or be smart like Newton and Raphson and use their tangent slope method to quickly hone in on solutions. That's the smart human way.

There ARE plenty of great uses for AI and all of this research is important in figuring that out. It's just that getting there is expensive both in terms of hardware and power.

It's like all of these giant tech corps are rushing to build the biggest, smartest AI system on the planet so they can be the first to ask it the meaning of life, the universe, and everything.

And of course we already know that is 42.

Hulk · Mar 5, 2025

This is pretty funny. Someone over at the Topaz Labs forum jokingly reponded that this is the Topaz Labs server grade hardware running their clould based AI rendering.

igor_kavinski · Mar 5, 2025

Hulk said:
View attachment 118974

Looks like something our company's IT guy would do. Currently, he has a 4U server ON his desk, making that lovely noise anyone would love /s

soresu · Mar 5, 2025

Hulk said:
I had thought about the interframe (temporal) encoding aspect as well. But I'm thinking they are not based on the enormous encoding time. I have a gut feeling each frame is a picture unto itself. But on second thought they could use temporal analysis to generate B frames. But either way the I frames would have to be generated from scratch and perhaps that could be on one a single client?

So you are approaching it from a perspective of video encoding/transcoding rather than classical upscaling?

Interesting idea.

Sort of like neuro symbolic AI hybrid models - but replacing the typical symbolic parts with the framework of a video codec 🤔

(I haunt 2 different AV1 discord servers and occasionally Doom9 forums btw, no professional knowledge of codec design, just a couple of decades reading other people writing about them)

Hulk · Mar 5, 2025

soresu said:
So you are approaching it from a perspective of video encoding/transcoding rather than classical upscaling?

Interesting idea.

Sort of like neuro symbolic AI hybrid models - but replacing the typical symbolic parts with the framework of a video codec 🤔

(I haunt 2 different AV1 discord servers and occasionally Doom9 forums btw, no professional knowledge of codec design, just a couple of decades reading other people writing about them)

I know just enough to get myself in trouble. Had a write a chapter in a book about encoding years ago. What to do? You know, do what you gotta do to get paid. Buy a book, read a book, write a book.

Now that I think about this AI video enhancing some more, I think there are two ways they could be doing it. First, just generate every frame using AI. Brute force.

Or you might be able to use temporal encoding logic to create a B frame between two I frames. Pretty much straight interpolation and you don't need the actual frame you are creating. You might be able to push it to a couple frames if the content was relatively static

P frames on the other hand would be "difference" frame mathematically represented by the actual frame subtracted the frame before it "moved ahead" in time 1 frame. You'd need that original reference frame to create that one.

At the end of the day I think Topaz Video AI is generating each frame "from scratch" but may be smart enough to look at the previous frame and say, "Oh, that's that car from the last frame, I know already know how to creat it." But who really knows what the heck is going on inside an AI inference engine? It's constantly rewiring itself as it "learns."

soresu · Mar 5, 2025

Hulk said:
I know just enough to get myself in trouble. Had a write a chapter in a book about encoding years ago. What to do? You know, do what you gotta do to get paid. Buy a book, read a book, write a book.

Now that I think about this AI video enhancing some more, I think there are two ways they could be doing it. First, just generate every frame using AI. Brute force.

Or you might be able to use temporal encoding logic to create a B frame between two I frames. Pretty much straight interpolation and you don't need the actual frame you are creating. You might be able to push it to a couple frames if the content was relatively static

P frames on the other hand would be "difference" frame mathematically represented by the actual frame subtracted the frame before it "moved ahead" in time 1 frame. You'd need that original reference frame to create that one.

At the end of the day I think Topaz Video AI is generating each frame "from scratch" but may be smart enough to look at the previous frame and say, "Oh, that's that car from the last frame, I know already know how to creat it." But who really knows what the heck is going on inside an AI inference engine? It's constantly rewiring itself as it "learns."

People wonder at AI/ML, but IMHO modern video codecs are insanely complicated and endlessly impressive workhorses of the IT world when you consider how much information is being compressed to get a 4K movie to fit in roughly the capacity of a dual layer DVD.

That's just the base codecs too, once you get into psychovisual optimisations it's a whole extra layer of complexity again.

They always fascinated me from the moment I started following x264 development on Doom9 in the 00s.

Hulk · Mar 5, 2025

I remember in the late '90's when I first became interested in digital video asking some experts on a forum when we'd be able to edit MPEG II on the desktop without dedicated hardware? The unanimous answer was something like in 10 years to never. Of course a few years later we were easily handling MPEG II on CPU alone.

About 5 years after that I predicted on this forum that tape will soon be gone and we'll be using flash for storage in cameras. Everyone exploded on me. Never happen. It's too expensive. What about long term storage. You'll have to pry tape out of my cold dead hands. Of course about 3 or 4 years later tape was gone and everybody was quite happy. Old habits/thoughts die hard.

igor_kavinski · Mar 5, 2025

Hulk said:
Of course a few years later we were easily handling MPEG II on CPU alone.

About 5 years after that I predicted on this forum that tape will soon be gone and we'll be using flash for storage in cameras.

You are definitely this

than this:

poke01 · Mar 5, 2025

Hulk said:
Currently, Project Starlight runs off of 6 8xh100 machines interconnected with infinitiband and a single h100 GPU is 80GB of VRAM."

With the release of today’s Mac Studio, it’s doable if you take loan.
You would need 8 512GB Mac Studio M3 Ultras. They cost ~$10K each.

So it would cost you $80K to match 48 h100 VRAM. You would have connect them via thunderbolt 5.

It’s cheaper than what 48 H100s cost.

DrMrLordX · Mar 6, 2025

poke01 said:
With the release of today’s Mac Studio, it’s doable if you take loan.
You would need 8 512GB Mac Studio M3 Ultras. They cost ~$10K each.

You can get old EPYCs or Xeons with 512GB or RAM or more. It won't be fast, but you can load pretty much any size model you want.

soresu · Mar 6, 2025

Hulk said:
I remember in the late '90's when I first became interested in digital video asking some experts on a forum when we'd be able to edit MPEG II on the desktop without dedicated hardware? The unanimous answer was something like in 10 years to never. Of course a few years later we were easily handling MPEG II on CPU alone.

Ah, the heady days of Dennard scaling, le sigh 😭

Hulk · Mar 6, 2025

IIFC it was the PII300 that was the first CPU to be able to playback 480i MPEG II video. Forget about editing, just playing back. Despite our constant demand for more compute we have come a long way.

Discussion Compute requirement for Topaz video ai Starlight AI model require clould computing

Diamond Member

Senior member

Diamond Member

Golden Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Golden Member

Diamond Member

Elite Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Lifer

Diamond Member

Lifer

Diamond Member

Diamond Member