News Finnish start-up proposes GPU-like CPU

marees

Senior member
Apr 28, 2024
Startup Says It Can Make a 100x Faster CPU

Flow Computing aims to boost central processing units with their ‘parallel processing units’

Instead of trying to speed up computation by putting 16 identical CPU cores into, say, a laptop, a manufacturer could put 4 standard CPU cores and 64 of Flow Computing’s so-called parallel processing unit (PPU) cores into the same footprint, and achieve up to 100 times better performance.

Timo Valtonen, CEO and co-founder of Finland-based Flow Computing, and his collaborators laid out their case at the IEEE Hot Chips conference in August.

The PPU provides a speed-up in cases where the computing task is parallelizable, but a traditional CPU isn’t well equipped to take advantage of that parallelism, yet offloading to something like a GPU would be too costly.
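The article does not say how the "up to 100x" figure was derived. As a rough, hypothetical illustration (not Flow Computing's methodology), Amdahl's law shows why any such gain hinges on how much of the task is parallelizable: if a fraction p of the runtime is accelerated by a factor s, the overall speed-up is 1 / ((1 - p) + p / s). The function name and the 1000x factor below are assumptions chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, parallel_speedup: float) -> float:
    """Overall speed-up when only `parallel_fraction` of the runtime is
    accelerated by a factor of `parallel_speedup` (Amdahl's law).
    Hypothetical helper for illustration, not part of any Flow tooling."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if parallel_speedup <= 0:
        raise ValueError("parallel_speedup must be positive")
    serial_fraction = 1.0 - parallel_fraction  # part that stays on the ordinary CPU cores
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


if __name__ == "__main__":
    # Assumed factor: pretend the parallel part runs 1000x faster on the PPU.
    for p in (0.50, 0.90, 0.99, 0.999):
        print(f"parallel fraction {p:.3f}: {amdahl_speedup(p, 1000.0):6.1f}x overall")
```

Even with an unbounded speed-up on the parallel part, the overall gain can never exceed 1 / (1 - p), so a 100x result requires at least 99% of the runtime to be parallelizable; a 90% parallel workload stays below 10x no matter how fast the parallel hardware is.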


To demonstrate the power of the PPU, Forsell and his collaborators built a proof-of-concept FPGA implementation of their design. The team says that the FPGA performed identically to their simulator, demonstrating that the PPU is functioning as expected. The team performed several comparison studies between their PPU design and existing CPUs. “Up to 100x [improvement] was reached in our preliminary performance comparisons assuming that there would be a silicon implementation of a Flow PPU running at the same speed as one of the compared commercial processors and using our microarchitecture,” Flow Computing CTO and co-founder Martti Forsell says.

Now, the team is working on a compiler for their PPU, as well as looking for partners in the CPU production space. They are hoping that a large CPU manufacturer will be interested in their product, so that they could work on a co-design. Their PPU can be implemented with any instruction set architecture, so any CPU could potentially be upgraded.

 

jdubs03

Golden Member
Oct 1, 2013

soresu

Diamond Member
Dec 19, 2014

It sounds a lot like the EDGE ISA, which was designed to use multiple ALUs and compose them into larger 'hyperblocks' as needed.

The idea was to reach teraFLOP-level compute on a CPU, but that was before GPUs started reaching that goal line.

I have my doubts about this.

As others have commented elsewhere, it feels like Soft Machines' VISC.

aka tech acquisition bait.
 

coercitiv

Diamond Member
Jan 24, 2014
One thing to note: if the presentation lacks "AI", even as a reference to compiler tricks, then they may still be trying to do something worthwhile.

Just as a concept, I could see this as a saner way to approach E-core spam in modern CPUs: use as many P cores as makes sense, sprinkle in a few E cores for consumer low-power needs, then invest die area in something that is solely focused on parallel computing. The "ubercompiler" part is curbing all the enthusiasm, though.

  1. Flow systems will provide full binary-level backwards compatibility with the existing software base and tools, at the current performance level, via the frontend.
  2. If current programs are recompiled for the Flow system with our compiler (currently under development), the compiler recognizes patterns that can easily be targeted for backend execution and compiles the code accordingly, leading to increased performance. We plan to port a set of key libraries to utilize Flow Computing so that if a program employs them, it will get a further performance boost.
  3. We aim to provide a tool to help migration by recognizing additional patterns that can potentially be executed in the backend with the help of the programmer.
  4. Full performance and the simplicity of native, natural parallel programming can be achieved if the application is written for the Flow system from the beginning. This greatly simplifies the parallel parts of the program. Utilizing this for future software development (and high school/university education) improves the productivity of software engineering and makes explicit parallel algorithms accessible to average programmers as well.
 

soresu

Diamond Member
Dec 19, 2014
How much more performant would it be vs AMD's unified software stack (or is that just smoke and mirrors too)?
The unified software stack is mainly about reducing the amount of redundant coding for multiple HW architectures to support the same software.

(also why re-unifying the consumer and datacenter GPU µArchs is a priority to reduce R&D redundancy and waste for both AMD and customers)

Chip level interconnect bandwidth and latency will have a lot to do with how effective it is.
 

marees

Senior member
Apr 28, 2024
Are you referring to UDNA?

I am referring to the below:

AMD is working to introduce a Unified AI Software Stack around the end of the calendar year. Simply put, it aims to ensure a performant and optimally accelerated AI experience whether it means tasking your CPU cores, NPU, or GPU(s) with AI workloads.
 