Originally posted by: Rusin
http://www.pcper.com/images/news/folding4.jpg
Green is the GTX 280, red is the HD3870, grey is the PS3, and the typical CPUs are at the far left.
Originally posted by: superbooga
Originally posted by: nitromullet
This is exactly why you don't have an effective 512 bit bus.
Using your example, say that the maximum number of problems that a 256 bit student can carry from the teacher's desk to their own in one trip is 50 problems. Between the two of them, the students are carrying 100 problems, but combined they are still only solving the same 50 in one sitting. Now, if these kids were 512 bit, one of them could carry 100 problems at once and solve them in one sitting. This would be 50 additional problems that your two kids haven't even started working on yet.
Nitro, the students solve different problems at the same time.
Let's say a 256 bit student can solve 50 problems in ONE HOUR, and a 512 bit student can solve 100 problems in the same amount of time. If you have 2 256 bit students, then student A can solve the first 50 problems in one hour, and student B can solve the second 50 problems in one hour. You have 100 problems solved in one hour.
Perhaps you are confused by data duplication -- bus width has nothing to do with this. Data duplication just means that effective memory size, not bandwidth, is limited. Stored data is duplicated, but accessed data is not. The data is stored at the teacher's desk, not at the student's desk. =)
Now the framebuffer from one of the GPUs has to be written to the framebuffer in the primary GPU, since only that GPU can output to the display. This uses bandwidth, but it is far less than the bandwidth used to render the image, unless we are talking about very high frame rates. This is one reason why SLI usually doesn't help if a single GPU is already getting 200 fps -- combining the frame buffers 200 times a second starts eating up bandwidth.
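A rough sketch of the framebuffer-copy cost described above. The resolution (1920x1200), the 4 bytes per pixel, and the idea that one full frame is copied per combine are illustrative assumptions, not figures from this thread; the function name is hypothetical.

```python
def framebuffer_copy_bandwidth(width, height, bytes_per_pixel, copies_per_second):
    """Bytes per second needed to copy one full frame between GPUs."""
    return width * height * bytes_per_pixel * copies_per_second


# Assumed display settings: 1920x1200 at 32 bits (4 bytes) per pixel.
for copies in (30, 60, 200):
    gb_per_s = framebuffer_copy_bandwidth(1920, 1200, 4, copies) / 1e9
    print(f"{copies:>3} copies/s -> {gb_per_s:.2f} GB/s")
```

Under these assumptions the copy traffic grows linearly with frame rate, which is the effect the post points to at 200 fps.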
As an analogy, the school principal (CPU) creates all the problems, and uses the PCI-E bus to deliver the same set of 10000 problems to two teachers. Each teacher has one 256 bit student that can solve 50 problems in one hour. The students solve problems for 23 hours a day, then spend one hour to deliver solved problems from the second teacher's desk to the first teacher's desk.
So in 24 hours, two 256 bit students solve 2300 problems, while a single 512 bit student solves 2400 problems.
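The arithmetic behind the analogy can be checked with a few lines of Python. The rates and hours come from the post; the function name is just an illustrative helper.

```python
def problems_per_day(students, problems_per_hour, working_hours):
    """Total problems solved in a day by identical students."""
    return students * problems_per_hour * working_hours


# Two 256 bit students lose one hour a day to delivering results.
two_256_bit = problems_per_day(students=2, problems_per_hour=50, working_hours=23)
# One 512 bit student works all 24 hours.
one_512_bit = problems_per_day(students=1, problems_per_hour=100, working_hours=24)

print(two_256_bit, one_512_bit)              # 2300 2400
print(f"{two_256_bit / one_512_bit:.1%}")    # 95.8%
```

In this analogy the pair reaches about 96% of the single 512 bit student, with the gap coming entirely from the one hour of delivery.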
This example is ignoring the major issue of multi-GPU scaling. You don't have two 256 bit lanes on a single crossbar memory controller, you have two separate GPUs, each with a separate 256 bit bus.
Originally posted by: Wreckem
Any idea on how big the PCB is going to be?
Will we be able to fit the GTX 280 in most typical mid tower cases? Will it be around the same size as the 8800GTX?
Originally posted by: nitromullet
You are mixing up bandwidth with processing power.
The width of the bus is the number of problems the students can carry from the teacher's desk to their own to work on, not the number of problems they can solve per hour. These are independent things, and that is the reason why sometimes you will be bandwidth limited, and other times you will be limited by the raw power of the GPU itself.
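One way to picture the distinction between bandwidth and processing power is a minimal sketch where throughput is capped by whichever stage is slower. The function names and the numbers below are hypothetical, chosen only to show both cases.

```python
def effective_rate(carried_per_hour, solvable_per_hour):
    """Problems finished per hour: the slower of the two stages wins."""
    return min(carried_per_hour, solvable_per_hour)


def limiting_factor(carried_per_hour, solvable_per_hour):
    """Name the stage that caps throughput."""
    if carried_per_hour < solvable_per_hour:
        return "bandwidth"
    if carried_per_hour > solvable_per_hour:
        return "processing power"
    return "balanced"


# Hypothetical numbers: first case is bus-limited, second is GPU-limited.
for carried, solvable in ((50, 100), (100, 50)):
    print(carried, solvable,
          effective_rate(carried, solvable),
          limiting_factor(carried, solvable))
```

This is a simplification: real GPUs overlap memory access and computation, so the cap is not always a clean minimum.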
Originally posted by: keysplayr2003
These two students still have to share 1 textbook.
Originally posted by: superbooga
Originally posted by: keysplayr2003
These two students still have to share 1 textbook.
Actually, it's two copies of the same textbook. One student reads the first chapter while the second student reads the second chapter, and then they exchange ideas (framebuffer).
Originally posted by: bryanW1995
keys, do you have any idea when they'll be releasing the nvidia f@h client? I haven't heard anything concrete yet on it.
Originally posted by: nitromullet
That is exactly why it's not an effective 512 bit bus... both students are carrying the entire textbook, but only reading half. It's a theoretical 512 bit bus because in theory the bus carries two books, but effectively it carries the same book twice.
Originally posted by: superbooga
Originally posted by: nitromullet
That is exactly why it's not an effective 512 bit bus... both students are carrying the entire textbook, but only reading half. It's a theoretical 512 bit bus because in theory the bus carries two books, but effectively it carries the same book twice.
Poor nitro ... you're still confusing memory size with bandwidth. =)
Under your analogy that the bus "carries" the book, the two students don't carry the same book; but they get the books from two identical libraries. In other words, library = 512 MB of memory, carrying book = 256 x 2 bit bus.
In my analogy, the bus reads the book; it doesn't carry the book. Both students are required to carry the same book, but they don't READ the same chapters at the same time. In other words, book = 512 MB of memory, reading = 256 x 2 bit bus.
Originally posted by: nitromullet
That is exactly why it's not an effective 1024 MB memory... It's a theoretical 1024 MB memory because in theory the memory stores two books, but effectively it stores the same book twice.
This is correct.
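A small sketch of superbooga's position: with the same data duplicated on every GPU, memory capacity does not add up, but each GPU reads its own copy over its own bus, so bus width does. It assumes every GPU holds a full copy and works on a different part of the data; the function and key names are hypothetical.

```python
def multi_gpu_totals(gpus, memory_mb_per_gpu, bus_bits_per_gpu):
    """Installed vs. effective memory, and combined bus width, for identical GPUs."""
    return {
        "installed_memory_mb": gpus * memory_mb_per_gpu,
        # Every GPU stores the same data, so usable capacity is one GPU's worth.
        "effective_memory_mb": memory_mb_per_gpu,
        # Each GPU reads from its own memory over its own bus at the same time.
        "aggregate_bus_bits": gpus * bus_bits_per_gpu,
    }


# Figures from the thread: two GPUs, 512 MB and a 256 bit bus each.
print(multi_gpu_totals(gpus=2, memory_mb_per_gpu=512, bus_bits_per_gpu=256))
```

This does not account for the framebuffer transfer between GPUs or for imperfect multi-GPU scaling, both of which other posters raise.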