nVidia GT200 Series Thread

Page 9 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

happy medium

Lifer
Jun 8, 2003
14,387
480
126
Originally posted by: superbooga
Originally posted by: nitromullet
This is exactly why you don't have an effective 512 bit bus.

Using your example, say that the maximum number of problems that a 256 bit student can carry from the teacher's desk to their own in one trip is 50 problems. Between the two of them, the students are both carrying 100 problems, but combined they are still only solving the same 50 in one sitting. Now, if these kids were 512 bit, one of them could carry 100 problems at once and solve them in one sitting. This would be 50 additional problems that your two kids haven't even started working on yet.

Nitro, the students solve different problems at the same time.

Let's say a 256 bit student can solve 50 problems in ONE HOUR, and a 512 bit student can solve 100 problems in the same amount of time. If you have 2 256 bit students, then student A can solve the first 50 problems in one hour, and student B can solve the second 50 problems in one hour. You have 100 problems solved in one hour.

Perhaps you are confused by data duplication -- bus width has nothing to do with this. Data duplication just means effective memory size, not bandwidth, is limited. Stored data is duplicated, but accessed data is not. The data is stored at the teacher's desk, not at the student's desk. =)

Now the framebuffer from one of the GPUs has to be written to the framebuffer in the primary GPU, since only that GPU can output to the display. This uses bandwidth, but it is far less than the bandwidth used to render the image, unless we are talking about very high frame rates. This is one reason why SLI usually doesn't help if a single GPU is already getting 200 fps -- combining the frame buffers 200 times a second starts eating up bandwidth.

As an analogy, the school principal (CPU) creates all the problems, and uses the PCI-E bus to deliver the same set of 10000 problems to two teachers. Each teacher has one 256 bit student that can solve 50 problems in one hour. The students solve problems for 23 hours a day, then spend one hour to deliver solved problems from the second teacher's desk to the first teacher's desk.

So in 24 hours, two 256 bit students solve 2300 problems, while a single 512 bit student solves 2400 problems.

Works for me.
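
For anyone who wants to check the arithmetic in the quoted analogy, here is a minimal sketch. The numbers (50 and 100 problems per hour, 23 working hours plus 1 hour of delivery) come straight from the post; the function and variable names are made up for illustration.

```python
# Numbers are taken from the analogy above; names are hypothetical.
def problems_per_day(students, problems_per_hour, working_hours):
    """Total problems solved in a day by identical students working in parallel."""
    return students * problems_per_hour * working_hours

# Two 256 bit students lose one hour a day to delivering results.
two_narrow = problems_per_day(students=2, problems_per_hour=50, working_hours=23)
# One 512 bit student works the full 24 hours.
one_wide = problems_per_day(students=1, problems_per_hour=100, working_hours=24)

scaling = two_narrow / one_wide
print(two_narrow, one_wide, f"{scaling:.1%}")  # 2300 2400 95.8%
```

Under these assumptions the pair reaches about 96% of the single wide student, so the whole argument hinges on how large the delivery hour really is.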

 

BFG10K

Lifer
Aug 14, 2000
22,709
3,000
126
Let's say a 256 bit student can solve 50 problems in ONE HOUR, and a 512 bit student can solve 100 problems in the same amount of time. If you have 2 256 bit students, then student A can solve the first 50 problems in one hour, and student B can solve the second 50 problems in one hour. You have 100 problems solved in one hour.
This example is ignoring the major issue of multi-GPU scaling. You don't have two 256 bit lanes on a single crossbar memory controller, you have two separate GPUs, each with a separate 256 bit bus.

That bandwidth is not shared because it's not a global pool of memory with two busses tied to it.

The only way to "share" it is by attaining perfect multi-GPU scaling which we know never happens. Even the best AFR scaling usually tops out at an 80% or 90% performance gain over a single GPU.

For this reason combining multi-GPU stats will not have the same performance as a single-GPU with the same stats, or in this case two 256 bit GPUs against one 512 bit GPU.

It's like claiming a dual-core 3 GHz is 6 GHz effective. The former can approach "6 GHz" but only if you're running code that constantly loads both cores at 100% and has absolutely no interdependencies. In any other situation the single 6 GHz core will be better.
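
The dual-core comparison can be put into numbers with Amdahl's law. This is only a rough sketch: it assumes performance scales linearly with clock speed and that a fixed fraction of the work can be split across cores. The function name and the sample fractions are illustrative, not measurements.

```python
def effective_ghz(core_ghz, cores, parallel_fraction):
    """Clock speed a single core would need to match the multi-core chip.

    Assumption: throughput is proportional to clock, and only
    `parallel_fraction` of the work can be spread over all cores.
    """
    serial_time = (1.0 - parallel_fraction) / core_ghz
    parallel_time = parallel_fraction / (core_ghz * cores)
    return 1.0 / (serial_time + parallel_time)

for fraction in (0.0, 0.5, 0.8, 1.0):
    print(f"parallel fraction {fraction:.1f}: "
          f"{effective_ghz(3.0, 2, fraction):.2f} GHz effective")
```

Only a parallel fraction of 1.0 reaches the full "6 GHz"; anything less lands between 3 and 6, which is the same shape as multi-GPU scaling falling short of 100%.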
 

Wreckem

Diamond Member
Sep 23, 2006
9,535
1,100
126
Any idea on how big the PCB is going to be?

Will we be able to fit the GTX 280 in most typical mid tower cases? Will it be around the same size as the 8800GTX?
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
that school example has very little to do with reality, or video cards.

I also expect the GX2 to be demolished, even the 260 should annihilate it, and the 280 should just be a beast of pure ownage.

Now the interesting bit, both of those can be tri-SLI'd... maybe we will have maxed out Crysis benches coming in....
 

nitromullet

Diamond Member
Jan 7, 2004
9,031
36
91
Originally posted by: superbooga
Originally posted by: nitromullet
This is exactly why you don't have an effective 512 bit bus.

Using your example, say that the maximum number of problems that a 256 bit student can carry from the teacher's desk to their own in one trip is 50 problems. Between the two of them, the students are both carrying 100 problems, but combined they are still only solving the same 50 in one sitting. Now, if these kids were 512 bit, one of them could carry 100 problems at once and solve them in one sitting. This would be 50 additional problems that your two kids haven't even started working on yet.

Nitro, the students solve different problems at the same time.

Let's say a 256 bit student can solve 50 problems in ONE HOUR, and a 512 bit student can solve 100 problems in the same amount of time. If you have 2 256 bit students, then student A can solve the first 50 problems in one hour, and student B can solve the second 50 problems in one hour. You have 100 problems solved in one hour.

Perhaps you are confused by data duplication -- bus width has nothing to do with this. Data duplication just means effective memory size, not bandwidth, is limited. Stored data is duplicated, but accessed data is not. The data is stored at the teacher's desk, not at the student's desk. =)

Now the framebuffer from one of the GPUs has to be written to the framebuffer in the primary GPU, since only that GPU can output to the display. This uses bandwidth, but it is far less than the bandwidth used to render the image, unless we are talking about very high frame rates. This is one reason why SLI usually doesn't help if a single GPU is already getting 200 fps -- combining the frame buffers 200 times a second starts eating up bandwidth.

As an analogy, the school principal (CPU) creates all the problems, and uses the PCI-E bus to deliver the same set of 10000 problems to two teachers. Each teacher has one 256 bit student that can solve 50 problems in one hour. The students solve problems for 23 hours a day, then spend one hour to deliver solved problems from the second teacher's desk to the first teacher's desk.

So in 24 hours, two 256 bit students solve 2300 problems, while a single 512 bit student solves 2400 problems.

You are mixing up bandwidth with processing power.

The width of the bus is the number of problems the students can carry from the teacher's desk to their own to work on, not the number of problems they can solve per hour. These are independent things, which is why sometimes you will be bandwidth limited, and other times you will be limited by the raw power of the gpu itself.
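
The "bandwidth limited vs. power limited" distinction can be sketched as a simple bottleneck model: a frame takes as long as the slower of moving its data and doing its math. All numbers below are hypothetical and the model ignores overlap, caches and latency, so treat it as an illustration only.

```python
def frame_time_ms(bytes_moved, flops, bandwidth_bytes_s, flops_s):
    """Return (frame time in ms, which resource is the limiter).

    Simplified assumption: memory traffic and shader math each run at
    their peak rate, and the slower of the two sets the frame time.
    """
    memory_ms = bytes_moved / bandwidth_bytes_s * 1000
    compute_ms = flops / flops_s * 1000
    limiter = "bandwidth" if memory_ms > compute_ms else "compute"
    return max(memory_ms, compute_ms), limiter

# Hypothetical GPU: 64 GB/s of memory bandwidth, 500 GFLOP/s of shader power.
GPU = dict(bandwidth_bytes_s=64e9, flops_s=500e9)

heavy_aa = frame_time_ms(bytes_moved=2e9, flops=5e9, **GPU)         # lots of memory traffic
heavy_shaders = frame_time_ms(bytes_moved=0.5e9, flops=20e9, **GPU)  # lots of math
print(heavy_aa, heavy_shaders)
```

The same card ends up bandwidth limited in the first workload and compute limited in the second, which is why bus width and shader power have to be argued separately.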
 

Foxery

Golden Member
Jan 24, 2008
1,709
0
0
Originally posted by: Wreckem
Any idea on how big the PCB is going to be?

Will we be able to fit the GTX 280 in most typical mid tower cases? Will it be around the same size as the 8800GTX?

Manufacturers know how large our cases are. I wouldn't worry about it.

Announcing the Folding@Home client is a nice marketing strategy.
 

Keysplayr

Elite Member
Jan 16, 2003
21,209
50
91
Originally posted by: superbooga
Originally posted by: nitromullet
This is exactly why you don't have an effective 512 bit bus.

Using your example, say that the maximum number of problems that a 256 bit student can carry from the teacher's desk to their own in one trip is 50 problems. Between the two of them, the students are both carrying 100 problems, but combined they are still only solving the same 50 in one sitting. Now, if these kids were 512 bit, one of them could carry 100 problems at once and solve them in one sitting. This would be 50 additional problems that your two kids haven't even started working on yet.

Nitro, the students solve different problems at the same time.

Let's say a 256 bit student can solve 50 problems in ONE HOUR, and a 512 bit student can solve 100 problems in the same amount of time. If you have 2 256 bit students, then student A can solve the first 50 problems in one hour, and student B can solve the second 50 problems in one hour. You have 100 problems solved in one hour.

Perhaps you are confused by data duplication -- bus width has nothing to do with this. Data duplication just means effective memory size, not bandwidth, is limited. Stored data is duplicated, but accessed data is not. The data is stored at the teacher's desk, not at the student's desk. =)

Now the framebuffer from one of the GPUs has to be written to the framebuffer in the primary GPU, since only that GPU can output to the display. This uses bandwidth, but it is far less than the bandwidth used to render the image, unless we are talking about very high frame rates. This is one reason why SLI usually doesn't help if a single GPU is already getting 200 fps -- combining the frame buffers 200 times a second starts eating up bandwidth.

As an analogy, the school principal (CPU) creates all the problems, and uses the PCI-E bus to deliver the same set of 10000 problems to two teachers. Each teacher has one 256 bit student that can solve 50 problems in one hour. The students solve problems for 23 hours a day, then spend one hour to deliver solved problems from the second teacher's desk to the first teacher's desk.

So in 24 hours, two 256 bit students solve 2300 problems, while a single 512 bit student solves 2400 problems.

These two students still have to share 1 textbook.
You may get 100% scaling on a few occasions, but for the most part, 50 to 80% is more in line with reality.
 

superbooga

Senior member
Jun 16, 2001
333
0
0
Originally posted by: nitromullet
You are mixing up bandwidth with processing power.

The width of the bus is the number of problems the students can carry from the teacher's desk to their own to work on, not the number of problems they can solve per hour. These are independent things, which is why sometimes you will be bandwidth limited, and other times you will be limited by the raw power of the gpu itself.

The original argument was whether 9800GX2 has an effective 256 bit or 512 bit bus. It has an effective 512 bit bus.

Bandwidth is a type of throughput, and so is processing power, which we are leaving out in this discussion.

So fine, let's change it so that it takes 10 seconds for the students to CARRY 50 problems to their desk. And we'll leave processing power out of this equation by saying the problems are solved infinitely fast. The stack of 10,000 problems is the memory, and the students are the bus. The students are carrying different problems. The GPUs render different parts of the final image.

I've already tried to make it very clear that the concept of stored data is different from accessed data.

If anything, memory bandwidth is more likely to be under-utilized under SLI.
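
The carrying version of the analogy can be worked through as a small sketch. The stack size (10,000 problems), the 50 problems per trip and the 10 second trip are from the post; the 100 problems per trip for a 512 bit student and the "duplicated" case are added assumptions, included only to show where the two readings of the analogy diverge.

```python
import math

def seconds_to_carry(problems, per_trip, seconds_per_trip=10):
    """Time for one student to carry `problems` to their desk, one trip at a time."""
    return math.ceil(problems / per_trip) * seconds_per_trip

STACK = 10_000

# One 512 bit student, assumed to carry 100 problems per trip.
one_wide = seconds_to_carry(STACK, per_trip=100)
# Two 256 bit students carrying DIFFERENT halves of the stack in parallel.
two_split = seconds_to_carry(STACK // 2, per_trip=50)
# Two 256 bit students each carrying the WHOLE stack (fully duplicated access).
two_duplicated = seconds_to_carry(STACK, per_trip=50)

print(one_wide, two_split, two_duplicated)  # 1000 1000 2000
```

If the two students really fetch different problems, they match the wide student; if they both fetch everything, they take twice as long. Which case applies to a given game is exactly what is being argued.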
 

superbooga

Senior member
Jun 16, 2001
333
0
0
Originally posted by: keysplayr2003
These two students still have to share 1 textbook.

Actually, it's two copies of the same textbook. One student reads the first chapter while the second student reads the second chapter, and then they exchange ideas (framebuffer).
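
To put a rough size on the framebuffer exchange, here is a back-of-the-envelope sketch. It assumes a 1920x1200 image at 4 bytes per pixel and one full copy per displayed frame; real drivers may copy less or use other formats, so the figures are illustrative only, and the function name is made up.

```python
def framebuffer_copy_bytes_per_s(width, height, fps, bytes_per_pixel=4):
    """Bytes per second needed to copy one full framebuffer per frame.

    Assumptions: 32-bit colour (4 bytes per pixel) and a whole-frame
    copy for every displayed frame.
    """
    return width * height * bytes_per_pixel * fps

for fps in (60, 200):
    rate = framebuffer_copy_bytes_per_s(1920, 1200, fps)
    print(f"{fps} fps: {rate / 1e9:.2f} GB/s")
```

Under these assumptions the copy costs about 0.55 GB/s at 60 fps and about 1.84 GB/s at 200 fps, so it grows linearly with frame rate.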
 

Keysplayr

Elite Member
Jan 16, 2003
21,209
50
91
That makes little difference. Ever exchange ideas with someone over lunch? Takes a while.
J/K. hehe
 

Jumpem

Lifer
Sep 21, 2000
10,757
3
81
Waiting on the GTX 280. With an 8800GTX, I am only getting 18-28fps in AoC with everything high at 1920x1200.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
keys, do you have any idea when they'll be releasing the nvidia f@h client? I haven't heard anything concrete yet on it.
 

nitromullet

Diamond Member
Jan 7, 2004
9,031
36
91
Originally posted by: superbooga
Originally posted by: keysplayr2003
These two students still have to share 1 textbook.

Actually, it's two copies of the same textbook. One student reads the first chapter while the second student reads the second chapter, and then they exchange ideas (framebuffer).

That is exactly why it's not an effective 512 bit bus... both students are carrying the entire textbook, but only reading half. It's a theoretical 512 bit bus because in theory the bus carries two books, but effectively it carries the same book twice.
 

Keysplayr

Elite Member
Jan 16, 2003
21,209
50
91
Originally posted by: bryanW1995
keys, do you have any idea when they'll be releasing the nvidia f@h client? I haven't heard anything concrete yet on it.

Basically, the F@H gents would have to download the CUDA SDK and port their executable to run on Nvidia 8 series and later GPUs. I don't know if there are any roadblocks to this solution however, and I do not know the current status of the F@H client. I would be happy to ask about it for you though.

Keys
 

superbooga

Senior member
Jun 16, 2001
333
0
0
Originally posted by: nitromullet
That is exactly why it's not an effective 512 bit bus... both students are carrying the entire textbook, but only reading half. It's a theoretical 512 bit bus because in theory the bus carries two books, but effectively it carries the same book twice.

Poor nitro ... you're still confusing memory size with bandwidth. =)

Under your analogy that the bus "carries" the book, the two students don't carry the same book; but they get the books from two identical libraries. In other words, library = 512 MB of memory, carrying book = 256 x 2 bit bus.

In my analogy, the bus reads the book; it doesn't carry the book. Both students are required to carry the same book, but they don't READ the same chapters at the same time. In other words, book = 512 MB of memory, reading = 256 x 2 bit bus.

Originally posted by: nitromullet
That is exactly why it's not an effective 1024 MB memory... It's a theoretical 512 MB memory because in theory the memory stores two books, but effectively it stores the same book twice.

This is correct.
 

nitromullet

Diamond Member
Jan 7, 2004
9,031
36
91
Originally posted by: superbooga
Originally posted by: nitromullet
That is exactly why it's not an effective 512 bit bus... both students are carrying the entire textbook, but only reading half. It's a theoretical 512 bit bus because in theory the bus carries two books, but effectively it carries the same book twice.

Poor nitro ... you're still confusing memory size with bandwidth. =)

Under your analogy that the bus "carries" the book, the two students don't carry the same book; but they get the books from two identical libraries. In other words, library = 512 MB of memory, carrying book = 256 x 2 bit bus.

In my analogy, the bus reads the book; it doesn't carry the book. Both students are required to carry the same book, but they don't READ the same chapters at the same time. In other words, book = 512 MB of memory, reading = 256 x 2 bit bus.

Originally posted by: nitromullet
That is exactly why it's not an effective 1024 MB memory... It's a theoretical 512 MB memory because in theory the memory stores two books, but effectively it stores the same book twice.

This is correct.

Actually, I'm not confused at all...

I've owned an 8800GTX with a 384-bit bus and a 9800GX2 with dual 256-bit buses, and in bandwidth intensive situations such as WoW at 1920x1200 with 8x super sampling transparency AA the 8800GTX wins. In this situation the GX2 can crawl along at fps in the teens at times, but the core temp barely rises above idle because the gpus are completely bandwidth starved by their two tiny 256 bit buses. Whereas in shader intensive situations, such as Crysis with 0xAA the GX2 wins. The reason: because the 8800GTX has a larger effective memory bus than the 9800GX2 but the GX2 with dual gpus has more raw gpu computational power than the GTX.
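
For reference, peak memory bandwidth follows from bus width times data rate. The sketch below uses memory data rates that are not stated anywhere in this thread (1800 MT/s for the 8800GTX, 2000 MT/s per GPU for the 9800GX2); they are assumed from the published specs as recalled, so treat them as assumptions. It only computes theoretical peaks and says nothing about which "effective" reading is right.

```python
def peak_bandwidth_gb_s(bus_bits, transfers_per_s):
    """Theoretical peak memory bandwidth in GB/s for one GPU."""
    return bus_bits / 8 * transfers_per_s / 1e9

# Assumed data rates, not taken from the thread.
gtx_8800 = peak_bandwidth_gb_s(bus_bits=384, transfers_per_s=1.8e9)
gx2_per_gpu = peak_bandwidth_gb_s(bus_bits=256, transfers_per_s=2.0e9)
gx2_combined = 2 * gx2_per_gpu  # only reachable if both GPUs stay busy on different data

print(f"8800GTX: {gtx_8800:.1f} GB/s")
print(f"9800GX2 per GPU: {gx2_per_gpu:.1f} GB/s, combined: {gx2_combined:.1f} GB/s")
```

On paper each GX2 GPU has less bandwidth than the GTX while the pair combined has more, so the real-world result depends on how well the two GPUs scale in a given game.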
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Isn't the data being "duplicated" in both 512MB memory pools? If that is the case, then we have less memory, 512MB (instead of 1024MB) and "less" bandwidth. Since data is being duplicated, then we are wasting bandwidth.

So the GX2 is only 256bits or a little more, but not 512bits. That's how I understand it, I could be wrong.
 