News

PS3 CPU & 360 CPU

Let us start off by showing what Microsoft and Sony released to the public regarding the CPUs in their machines. The press releases came in many different formats and styles, but this is the gist of it.

360 Central processing unit (aka Xenon)
# 90 nm process, 165 million transistors (65 nm process SOI revision in 2007)
# Three symmetrical cores, each one SMT-capable and clocked at 3.2 GHz
# One VMX-128 SIMD unit per core, dual threaded
# 128×128 register file for each hardware thread, 2 sets per VMX unit
# 1 MB L2 cache (lockable by the GPU)
# Dot product performance: 9.6 billion per second (33.6 billion combined with GPU)
# 115 GFLOPS theoretical peak performance
# ROM storing Microsoft private encrypted keys

360 CPU information provided by Microsoft

PlayStation 3 Central-processing unit (aka Cell Broadband Engine)
# PowerPC-base Core @3.2GHz
# 1 VMX vector unit per core
# 512KB L2 cache
# 7 x SPE @3.2GHz
# 7 x 128b 128 SIMD GPRs
# 7 x 256KB SRAM for SPE
# Dot product performance 22.4 billion (51 billion combined with GPU)
# 1 of 8 SPEs reserved for redundancy
# Total floating point performance: 218 GFLOPS

PS3 CPU information provided by Sony

Now before I get into it, I’d like to point out that while both consoles have powerful CPUs, both Sony and Microsoft have played a dirty little numbers game with everyone. These are numbers that most people can easily misinterpret to mean “the one with the highest numbers must be the better of the 2,” and that isn’t how it works at all (at least not all the time). Here is the kicker: both Sony and Microsoft want you to misinterpret the numbers.

Why isn’t a “higher is better” mentality always a safe bet? Simple really: one has to take important things like the architecture into consideration. Concentrating only on the raw numbers without understanding the specifics of how the hardware operates can lead to mistakes like this one: “Midway has a car that can reach a top speed of 180MPH and Australia has a car that can reach a top speed of 90MPH.” Someone looking only at the raw numbers may assume, “This is far too easy. Clearly Midway is going to win because his car goes up to 180MPH.” But did anyone stop to consider that maybe Australia is not only the better driver of the 2, but his car also has quicker acceleration and better braking, and the road they’ll be racing on is dripping wet and packed full of sharp turns, which may prevent the more inexperienced driver from banking on all that raw speed?

Not the best of analogies, but it should teach everyone to be cautious when they see either side throwing around their megahertz, dot products and GFLOPS. I’m not saying the numbers are 100% meaningless, as some of them are actually trustworthy, but it’s getting you all ready for what I’m about to tell you.

Dispelling Some of the Hype

Now there are people that look at the 360 having a triple core processor and the PS3 with the much publicized Cell Processor and start to wonder…

# #1 How in God’s name can the 360 ship with a 3 core processor in November 2005 when 3 or 4 core CPUs aren’t even available to purchase for desktop computers?

# #2 Why didn’t Intel or AMD manufacture and start selling such a processor at the same time or before the Xbox 360 shipped?

# #3 How can the cell have 1 Power PC core and, in addition to that, have 7 SPE, which are basically seven extra processors?

# #4 Everyone knows processors aren’t cheap and when you factor in everything else you need, it’s even more expensive. How can Microsoft get away with charging as low as $299 for the Xbox 360? How can Sony get away with charging as low as $500 for the PS3, when the processors themselves cost 90% of the PS3’s price or cost more than $500?

Marketing talk from Microsoft and Sony: The processors inside these machines are extremely powerful and cutting edge. You literally have a supercomputer in your home, as the Xbox 360 has 1 Teraflop worth of computing power and the PS3 has 2 Teraflops worth of computing power.

TRUTH: Both the 360 and PS3’s CPUs are heavily stripped down compared to what most of us are probably using on our desktop computers to view this article. Both consoles are labeled as 3.2GHz, but they don’t offer performance comparable to that of a typical Athlon 64 3200+, or even better than an Athlon XP 2800+. The CPUs inside the Xbox 360 and PS3 are “In-Order Execution” CPUs with narrow execution cores, whereas what we use in our computers are classified as “Out-of-Order Execution” CPUs with wider execution cores.

The reason they can sell so cheaply is that they are not as robust or complex as what we have inside our computers. The execution model in both the 360 and PS3’s CPUs is similar to what you would see in the original Intel Pentium processor (not the Pentium 2, 3 or 4, but the original). This is because they’ve stripped out the hardware designed to optimize the scheduling of instructions at runtime. As a result, neither the 360’s nor the PS3’s CPU contains an instruction window. Instead, instructions pass through the processor in the order in which they were fetched; hence both are “In-Order Execution” CPUs.

Marketing talk from Microsoft and Sony: Thanks to these multi-core processors developers will be able to multi-thread their games and get significant performance improvements and achieve Artificial Intelligence in games that people previously thought impossible for a videogame. It’ll be as if you’re playing with another living breathing human being.

TRUTH: What is the big deal? How exactly does the fact that both processors are “In-Order Execution” CPUs hurt them? Well, see the 3.2GHz clock speed for both CPUs? The nasty type of game code, full of branches, loops etc., would’ve been greatly sped up by out-of-order execution and a wider execution core. Neither is there to help, so that 3.2GHz actually performs slower than the out-of-order execution CPUs available to desktop computer users.

This is the very reason both the PS3 and Xbox 360 use multiple processors: it is an effort to combat the lack of an instruction window and the narrow execution cores. It gets even better, because the very same code they hope to speed up using parallelism on multiple cores isn’t by any means friendly to parallel programming.

On the other hand, Graphics-related code is great on both these processors, as graphics code is nice and parallelism friendly. There is a reason people consider graphics accelerators to be the poster child for parallelism. As a matter of fact, it’s the most successful form of parallelism the field of computer science has ever witnessed. GPUs are able to get all transistors firing that actually produce a significant real world benefit to the people using the product.

For the CPU to become more like the GPU is the ultimate goal for many, and AMD together with ATI seem to be going for it. The Cell processor is actually one such attempt, but it’s not yet at the level everyone had hoped. (Perhaps it is a bit early, as a Cell-like CPU isn’t on Intel’s to-do list until about 2015.) Long story short, both Microsoft and Sony have given developers more than enough on the graphics side of things, but at the same time are asking developers to do more with less on the aspects of the game unrelated to graphics.

A bit of review

# #1 Both consoles are using in-order execution CPUs that are half the speed of out-of-order execution processors when it comes to running most game code, especially the more troublesome type which contains branches, loops and pointers.

# #2 The very code they’re hoping to get improved performance out of isn't the type to lend itself so easily to multi-threading… to say it's hard would be the understatement of the century.

Here is a bit of what John Carmack, technical director of id Software, has to say about this.

“I do somewhat question whether we might have been better off this generation having an out-of-order main processor, rather than splitting it all up into these multi-processor systems.”

“It’s probably a good thing for us to be getting with the program now, the first generation of titles coming out for both platforms will not be anywhere close to taking full advantage of all this extra capability, but maybe by the time the next generation of consoles roll around, the developers will be a little bit more comfortable with all of this and be able to get more benefit out of it.”

“But it’s not a problem that I actually think is going to have a solution. I think it’s going to stay hard, I don’t think there’s going to be a silver bullet for parallel programming. There have been a lot of very smart people, researchers and so on, that have been working this problem for 20 years, and it doesn’t really look any more promising than it was before.”

Everyone should be aware that while these processors are powerful and a leap over what the current generation consoles had, they aren’t the second coming they were marketed to be. What drives this point home even further is that multi-threaded programming on these CPUs will definitely not be achieved at the snap of a finger; the developers have their work cut out for them.

How is one CPU better than another?

GFLOPS is something that gets thrown around a lot, but it should be clear that the peak theoretical GFLOP numbers for both these CPUs are:
# 115GFLOPS Theoretical Peak Performance for 360 CPU
# 218GFLOPS Theoretical Peak Performance for PS3 CPU

These theoretical peaks will not be achieved in real world performance. What IBM did when testing for theoretical peaks on both CPUs can’t really be considered representative of how the processors would actually perform in real world situations, because the type of testing done is too controlled. It’s much too perfect an environment, and game development involves an unforgiving environment that looks nothing like the one the CPUs were tested under.

The GFLOP numbers for the PS3 were calculated based on 8 running SPE, so the fact that the PS3 uses only 6 SPE for game applications lowers the peak theoretical even further, as the majority of the floating point work on the PS3’s CPU is done via the SPE. Each SPE has a peak theoretical of 25.6GFLOPS, so the total peak theoretical performance for all 6 SPE would be 153.6GFLOPS. But why is that number also not achievable?
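The per-SPE arithmetic can be checked with a short sketch. The 25.6GFLOPS figure and the SPE counts come from the text; the breakdown into 4 SIMD lanes with a multiply-add counted as 2 operations is my assumption about how that per-SPE peak is usually derived, and the function names are made up for illustration.

```python
# Peak-theoretical GFLOPS for the Cell's SPEs, using the figures quoted in the article.
# ASSUMPTION: 25.6 GFLOPS per SPE = 3.2 GHz * 4 SIMD lanes * 2 ops (multiply-add).

CLOCK_GHZ = 3.2
SIMD_LANES = 4      # assumed: four 32-bit floats per 128-bit register
OPS_PER_LANE = 2    # assumed: a multiply-add counts as two floating point ops


def spe_peak_gflops():
    """Peak GFLOPS of a single SPE under the assumptions above."""
    return CLOCK_GHZ * SIMD_LANES * OPS_PER_LANE


def total_spe_peak_gflops(spe_count):
    """Peak GFLOPS for a given number of active SPEs."""
    return spe_peak_gflops() * spe_count


for count in (8, 7, 6):
    print(f"{count} SPE: {total_spe_peak_gflops(count):.1f} GFLOPS peak")
```

Even this best-case number for 6 SPE sits well below the 218GFLOPS headline figure, before any real world inefficiency is taken into account.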

In IBM’s controlled testing environment, their optimized code on 8 SPE only yielded a performance number of 155.5GFLOPS. If it took 8 SPE to achieve that, there is no way 6 will be able to, and that testing was done in a fashion that didn’t model all the complexities of DMA and the memory system. Using a 1Kx1K matrix and 8 SPE they were able to achieve 73.4GFLOPS, but the PS3 uses 6 SPE for games and these tests were done in controlled environments. Going on this information, even 73.4GFLOPS is seemingly out of reach. Sony didn’t necessarily lie about the Cell’s performance, as they made clear the 218GFLOPS was “theoretical,” but just like Microsoft they definitely wanted you to misinterpret these numbers into believing they were achievable.

Even taking all of this into consideration, and knowing the CPUs can’t reach those crazy performance numbers, the PS3’s Cell still comfortably comes out on top in terms of overall floating point capability. It should be known, however, that the available power on the PS3’s Cell will be significantly more difficult to harness than the available power on the 360’s CPU.

It’s also worth mentioning that even the PS2’s CPU had more than twice the GFLOPS of the original Xbox’s CPU, but that didn’t necessarily make it the performance winner. This time around, while the Cell has the GFLOPS advantage, its advantage isn’t quite as big as the one the PS2’s CPU had over the Xbox. This teaches us that there is more than one measure of real world performance.

The PS3’s Cell processor has 1 Power PC core (PPE) similar to the 3 Power PC cores that make up the 360’s 3 core design (without the VMX-128 enhancements available on each of the 360’s cores) and 7 SPE (Synergistic Processing Elements). The 8th is disabled to improve yields. One of the SPE is used to run the PS3’s operating system while the other 6 are available for games. The reason the PS3’s CPU will be significantly more difficult to program for is that the CPU is asymmetric, unlike the 360’s CPU. Because the PS3’s CPU only has 1 PPE compared to the 360’s 3, all game control, scripting, AI and other branch intensive code will need to be crammed into two threads which share a very narrow execution core with no instruction window. The Cell’s SPE will be unable to help out here as they are not as robust; they are not fit for accelerating things such as AI, which is fairly branch intensive, because the SPE lack branch prediction capability entirely.

I’m sure people remember the section detailing how the 360 and PS3’s processors are less robust than the processors we use in our desktop computers, and the consequences of in-order execution. Well, the PS3’s SPE are stripped down even further than the Power PC cores and, as a result, aren’t capable of handling as many different types of code as the 1 Power PC core available on the PS3’s Cell or the 3 Power PC cores available on the 360’s CPU. The problem with being asymmetric is that the method of programming you used to get the most out of the Power PC core on the PS3’s CPU is no longer effective when breaking off tasks for the SPE to work on. Going from the PPE to the SPE on the PS3 requires a different compiler and a different set of tools.

When you come to the realization that the key to making up for the CPUs’ in-order execution is the rather complicated practice of parallel programming, you realize that the CPU being asymmetric and having just a single PPE makes something that was already extremely difficult even more difficult. A developer’s job gets harder still when you factor in that the PS3 has a 512KB L2 cache, which is half the size of the 360 CPU’s 1MB L2 cache. That single PPE the PS3’s CPU has isn’t receiving much help with branches in the cache department.

Microsoft made a better decision from the perspective of the developer; it’s still difficult, but much easier compared to working with the Cell architecture. The 360’s CPU isn’t asymmetric like the PS3’s Cell, and it has 3 PPE as opposed to 1, all 3 of which are robust enough to handle the type of code that only the single PPE can handle on the PS3. When Microsoft says they have three times the general purpose processing power, this is what they mean. Based on the simple fact that the 360 has 3 Power PC cores to the PS3’s 1, more processing power can be dedicated to helping with things such as game control, AI, scripting and other types of branch intensive code.

From the perspective of a developer, the 360 CPU’s biggest advantage is that all 3 of its cores are identical, all run from the same memory pool, and they’re synchronized in addition to being cache coherent. You can just create an extra thread right in your program and have it do some work. This allows the developer to create very nice structures, so if you know how to get the best possible performance out of one core, you know how to get the best possible performance out of all 3, because they operate in perfect sync.
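As a rough illustration of what symmetric cores buy the developer, here is a sketch in Python (not console code; `update_ai`, `update_physics`, `update_audio` and `run_frame` are hypothetical stand-ins). Because every worker is the same kind of thread working against the same memory, splitting up a frame is just a matter of handing tasks to a pool.

```python
from concurrent.futures import ThreadPoolExecutor

# 3 symmetric cores * 2 hardware threads each, per the 360 spec list.
HARDWARE_THREADS = 6


def update_ai(world):
    # Hypothetical task: real AI code would walk the shared world state.
    return ("ai", len(world))


def update_physics(world):
    return ("physics", len(world))


def update_audio(world):
    return ("audio", len(world))


def run_frame(world):
    """Run every subsystem as an ordinary thread over the same shared data."""
    tasks = [update_ai, update_physics, update_audio]
    with ThreadPoolExecutor(max_workers=HARDWARE_THREADS) as pool:
        futures = [pool.submit(task, world) for task in tasks]
        return [future.result() for future in futures]


print(run_frame(["player", "enemy", "crate"]))
```

Nothing about a task has to change depending on which core picks it up, which is the convenience being described.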

Each core on the 360’s processor is capable of running 2 threads (think of it as similar to Hyper-Threading), so the 360’s CPU is capable of handling 6 threads at once. This brings me to a very important advantage of the PS3’s Cell CPU: its concurrency. While the 360’s CPU may be able to handle 6 threads simultaneously, it still only has 3 physical CPU cores, so every 2 threads must share the processing power of a single core. The PS3, on the other hand, has 1 PPE and 6 SPE for games (which are like extra physical processors). If each of the PS3’s 6 SPE used for games is working on a specific task such as collision, cloth physics, animation, water surface simulation or particles, it wouldn’t need to worry about processing power being taken away by another part of the game, because the SPE don’t share processing power.

The only cause for concern would be the 512KB L2 cache being shared by 7 simultaneously running SPE and a PPE, but that’s what developers are for; they work around things like this. In practice, this should allow PS3 games to potentially have more things going on at once than 360 games. Ignoring the difficulties of programming for the PS3’s CPU, it should be known that it is very good at vertex-related operations, because it handles graphics code better than the 360’s CPU. It is also possible that, through good parallelism on the SPE, physics code could run better on the PS3’s CPU thanks to the concurrency advantage.

The 360 CPU however, due to its 3 symmetric General Purpose Cores, is not only much easier to program for than the cell, but having 3 PPE capable of handling things such as AI also means the 360’s CPU will be the better of the 2 CPUs when it comes to AI code. Either way we can look forward to great things from both CPUs in the future.

Before I end off, I’d like to point out a game that in my opinion, from a technical standpoint, is one of the most brilliant uses of the PS3’s CPU. All things considered, such as in-order execution and the other complications of the architecture, Heavenly Sword is quite the standout in nearly every regard: incredible combat animations, awesome group enemy AI, and great physics. At the very least, this is what I gathered from seeing videos of the E3 demo. It’s a reminder that regardless of the challenges, there are developers that are up to them, and it’s only going to get better with time.

The Impact of Blu-Ray

It’s no secret that Blu-Ray is something Sony has been trying to push as a key differentiator between the PS3 and the 360, but I’ll never truly believe the reason Sony has given for making it a standard item inside the PS3. With that said, there is a question that needs to be asked: have we seen any signs of Blu-Ray having a significant impact yet? The answer thus far is no. There has yet to be a PS3 title showcasing anything that isn’t also possible with the 360’s standard DVD. Will PS3 titles have better graphics due to Blu-Ray? It’s highly doubtful. Because its memory is split into separate pools of 256MB of GDDR3 and 256MB of XDR, the PS3 can at best dedicate 256MB of RAM to textures at any given moment, whereas the 360 uses unified memory for a total of 512MB. That alone is a major limiting factor on Blu-Ray’s space advantage.

There are, however, methods to gain the use of more RAM. From e-mail exchanges with developers I found out that it’s possible for the PS3’s GPU (RSX) to texture from the XDR, which some developers have actually done, but there is a penalty because the data has to travel over the Cell’s FlexIO. Another way of going about this is copying from the XDR to the GDDR3 memory, using it kind of like a fast cache, and just proceeding to stream in the content. Even with such methods available, the Xbox 360 just has more memory to work with, especially when you factor in the costs associated with running the operating systems for both consoles, which I’ll get into later.

There are many titles that are currently claiming to already be filling up an entire Blu-Ray disc, but I’d be lying to myself if I said I actually see visual signs of it. There just isn’t anything yet that makes me say “This is a result of having Blu-Ray”. From talking to developers none seem to be concerned with the 360’s disc space and there are some that say they expect disc space to become an issue only if games use a lot of high definition movie content. With the graphical horsepower we have today the need for CG video is dropping significantly, but even so 360 titles like Blue Dragon appear to have a healthy dose of CG video and “The Darkness” reportedly has over 4 hours worth of high def movie footage in addition to the actual game, all on the same disc.

Maybe Blu-Ray will lead to longer games? Blue Dragon for the 360 has over 40 hours worth of gameplay and Elder Scrolls Oblivion has over 120 hours worth of gameplay. Compression also needs to be taken into consideration; it has come a very long way, and it’d be wise not to underestimate the real-time decompression abilities of these new consoles. Blu-Ray doesn’t necessarily mean developers will have no need to compress their game data, because compression can help improve load times. Now am I saying that Blu-Ray is worthless? Of course not!

Naturally PS3 titles due to Blu-Ray will be able to hold more content than 360 games, but is it the type of content that will make or break a gaming experience? How much more exactly before it reaches a state of diminishing returns? Will games be over 400 hours in length to make use of Blu-Ray’s disc space? One thing that escapes most people is that anything that makes it into a game level is taking up space in memory. Now if the PS3 had a gigabyte maybe even 2 gigs worth of ram, for example, then in that case Blu-Ray would end up being a major factor between the 2 machines, but as of now it seems more like a luxury or convenience rather than a necessity.

While I’m still doubtful that Blu-Ray will be anything more than a luxury this generation, I’m on the lookout for any signs that its disc space will be a major factor. Now don’t mistake this for some anti-Blu-Ray rant, because it’s nothing of the sort. I’m simply saying Blu-Ray’s impact hasn’t been that significant yet. It’ll be even harder to see the impact in the near future, as a number of pretty big titles such as Assassin’s Creed, Grand Theft Auto 4 and Resident Evil 5 will be making their way to both the 360 and PS3.

Now between Blu-Ray’s immense amount of storage space and it being a new high definition movie format, which will enable all PS3s to play Blu-Ray movies, there seems to not be a single drawback to the inclusion of Blu-Ray… or is there? Well, there is one drawback. The 360’s 12X DVD drive pulls information off of a DVD nearly twice as fast as the PS3’s 2X Blu-Ray drive does off of a Blu-Ray disc: 16.5 megabytes per second compared to 8.7 megabytes per second.

I found this information regarding the speed of the PS3’s Blu-Ray drive and the 360’s DVD drive while reading a developer’s rant on what he thought about both these consoles. He said both consoles are extremely powerful, but are neglecting something rather important: processing speed continues to increase, GPU performance continues to increase and the amount of available memory is increasing, and yet there have been no similar improvements in how fast data can be read from the disc. He suggests either giving the console 1GB of RAM or coming up with a solution in the future. I’m not immediately counting out Blu-Ray, because it deserves a chance to prove itself, so I’ll still actively be on the lookout for any decisive signs which may make its true impact a bit more clear.
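To put the two drive speeds in perspective, this sketch estimates how long each drive would need to stream a given amount of data, using the 16.5 and 8.7 megabytes per second figures quoted for the 360 and PS3. It assumes sustained peak throughput with no seek time, which real drives will not deliver, so treat the results as best-case numbers.

```python
# Best-case streaming times from disc, using the transfer rates quoted in the article.
# ASSUMPTION: sustained peak rate, no seeks, no compression.

DVD_12X_MB_PER_S = 16.5     # Xbox 360 DVD drive
BLURAY_2X_MB_PER_S = 8.7    # PS3 Blu-Ray drive


def seconds_to_read(megabytes, rate_mb_per_s):
    """Seconds needed to read `megabytes` at a constant transfer rate."""
    return megabytes / rate_mb_per_s


speed_ratio = DVD_12X_MB_PER_S / BLURAY_2X_MB_PER_S

for size_mb in (256, 512):
    dvd = seconds_to_read(size_mb, DVD_12X_MB_PER_S)
    bluray = seconds_to_read(size_mb, BLURAY_2X_MB_PER_S)
    print(f"{size_mb} MB: DVD {dvd:.1f} s, Blu-Ray {bluray:.1f} s")

print(f"DVD drive is {speed_ratio:.2f}x faster")
```

The ratio works out to a little under 2, so filling the same amount of memory takes the Blu-Ray drive almost twice as long under these assumptions.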

The Cost of the Operating System

This will be a much shorter section than the others just showing how much of each console’s resources are allocated to run the operating system.

Starting off with the Xbox 360 here is what we are looking at.

The 360 operating system is constantly running in the background and I’d go into what it offers, but everyone most likely knows by now so I’ll go straight to the resource allocations.

Everything comes at a cost and here are the costs for the 360.
# 32MB of the 512MB of available GDDR3 RAM
# 3% CPU time on Core1 and Core2 (nothing is reserved on Core0)

Microsoft still has room left from what they’ve already reserved for future updates.

Transitioning to the PS3’s Operating system here is what the resource allocation looks like.

Sony of course has decided to match Microsoft by using an operating system that constantly runs in the background, and here is what that involves. Again, I will not be going into what it offers, as most already know by now.

The costs for the PS3’s operating system are as follows:
# 32MB of the 256MB of available GDDR3 memory off the RSX chip
# 64MB of the 256MB of available XDR memory off the Cell CPU
# 1 SPE of 7 constantly reserved
# 1 SPE of 7 able to be "taken" by the OS at a moment’s notice (games have to give it up if requested)
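Subtracting the two sets of reservations shows how much memory each console leaves for games. This is a minimal sketch using only the megabyte figures listed for the 360 and the PS3; the dictionary layout and function names are my own.

```python
# Memory left for games after OS reservations, in MB.
# Each entry is (total, reserved_by_os), taken from the two cost lists in the article.

X360_POOLS = {"GDDR3 (unified)": (512, 32)}
PS3_POOLS = {"GDDR3 (RSX)": (256, 32), "XDR (Cell)": (256, 64)}


def available_for_games(pools):
    """Per-pool memory remaining once the OS reservation is removed."""
    return {name: total - reserved for name, (total, reserved) in pools.items()}


def total_available(pools):
    """Total memory remaining across all pools."""
    return sum(available_for_games(pools).values())


print("360:", available_for_games(X360_POOLS), "total", total_available(X360_POOLS))
print("PS3:", available_for_games(PS3_POOLS), "total", total_available(PS3_POOLS))
```

On these figures the 360 keeps 480MB in one pool while the PS3 keeps 416MB split across two pools.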

Now the thing that probably jumps out at people the most is the fact that the 360 uses far fewer resources to run its operating system. How could this be? No answers were provided on exactly why (nondisclosure agreements suck), and while I may have a couple of technical theories that could explain it, I don’t truly know, so I won’t attempt to. I assume maybe it has something to do with Microsoft’s experience with operating systems. Or it could be the PS3’s browser. Nobody really knows, and the people that do know won’t talk.

RSX (PS3GPU) & Xenos (360GPU)

Alright, let’s get underway. The GPU inside the PS3 is NV47 based, which is another name for the 7800GTX. It has 24 pixel shader pipelines and 8 vertex shader pipelines. It’s capable of 136 shader operations per clock and, according to Sony, it has 256MB of GDDR3 memory at 700MHz and performs 74.8 billion shader operations per second. Sony also said it’s capable of 1.8 teraflops, which I can tell everyone right now with 100% confidence isn’t true (numbers game). I’m not entirely sure of all the little tricks they used to arrive at such an extreme flops number, but rest assured it isn’t a level of performance this GPU will ever really achieve. PC videocards such as the X1900XTX have far more raw horsepower than the GPUs in either console. The X1900XTX pushes a GPU clock speed of up to 650MHz (some have shipped at 675MHz) along with 24 more pixel shader pipelines, and yet it is just over 500GFLOPS. So to even begin entertaining the thought that a less advanced GPU with significantly less raw power could brute force 1.3 teraflops more performance is wishful thinking. There is no cause to be angry at Sony in this case, though, as they are entitled to market their product however they choose. As long as they avoid disturbingly untrue statements about the competition, it’s all fair game as far as I’m concerned.

I’m sure some people are wondering how Sony came to the conclusion that the RSX does 136 shader operations per clock, or even 74.8 billion shader ops per second. Easy:

# The RSX has 24 pixel pipes (each of which performs 5.7 ops): 5.7 ops * 24 pixel pipelines = 136.8 shader ops per clock.

# The RSX is clocked at 550MHz * 136 shader ops per clock = 74,800 million (74,800,000,000) shader ops per second.
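The same two steps can be written out as a small sketch. The pipe count, ops per pipe and clock are the figures from the text; dropping the fractional 0.8 to get 136 simply mirrors what the quoted numbers do, and the helper name is my own.

```python
# Reproducing the RSX shader-ops figures from the numbers quoted in the article.

PIXEL_PIPES = 24
OPS_PER_PIPE = 5.7
CLOCK_MHZ = 550

ops_per_clock_exact = PIXEL_PIPES * OPS_PER_PIPE    # roughly 136.8
ops_per_clock = int(ops_per_clock_exact)            # truncated to 136, as in the quoted figure


def shader_ops_per_second(clock_mhz, ops_per_clock):
    """Shader operations per second for a given core clock in MHz."""
    return clock_mhz * 1_000_000 * ops_per_clock


rsx_ops = shader_ops_per_second(CLOCK_MHZ, ops_per_clock)
print(f"{ops_per_clock} ops/clock -> {rsx_ops / 1e9:.1f} billion ops/s")
```

Because the result scales linearly with the clock, any change to the clock speed changes the headline number by the same proportion.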

There is talk, and even reports from an event in Japan which Sony attended, claiming that the RSX will no longer be 550MHz and will instead be clocked at 500MHz, and that the 256MB of GDDR3 will now run at 650MHz instead of 700. Now there is a lot pointing to this being true, but Sony still hasn’t officially come out and admitted it, so I’m not sure what to think. This is a perfect opportunity to see if we learned how to calculate this stuff.

If the RSX is clocked at 500MHz * 136 shader ops per clock, that would make the new shader operations per second for the RSX 68 billion instead of the original 74.8 billion, weakening the GPU’s performance. I guess we won’t truly find out till the PS3 releases because, if anyone has noticed, Sony has never posted the RSX clockspeed on the official PS3 site, nor did they reiterate it at E3 06. The RSX has 20.8GB/s of video memory bandwidth from the GDDR3 RAM, plus an extra 32GB/s for writing to the system’s main memory. If the RSX can fully utilize the memory system, it can push out 52.8GB/s (20.8 + 32) worth of pixel rendering to memory. The RSX is pretty much a 7800GTX class GPU, in some cases worse and in some cases better; nothing that is really new. The same can’t be said about the 360’s GPU at all.
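As a side note on the RSX numbers, the 20.8GB/s video memory figure lines up with the rumored 650MHz memory clock if one assumes a 128-bit, double data rate memory bus. The bus width and the two transfers per clock are my assumptions, not something stated in the text, and the function name is made up.

```python
# Relating a GDDR3 memory clock to bandwidth.
# ASSUMPTIONS: 128-bit bus, 2 transfers per clock (double data rate).


def memory_bandwidth_gb_per_s(clock_mhz, bus_width_bits=128, transfers_per_clock=2):
    """Peak bandwidth in GB/s (decimal) for a given memory clock in MHz."""
    bytes_per_transfer = bus_width_bits / 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


for clock in (700, 650):
    print(f"{clock} MHz -> {memory_bandwidth_gb_per_s(clock):.1f} GB/s")
```

Under those assumptions the originally announced 700MHz memory would come out higher than 20.8GB/s, which fits the talk of a downgrade.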

Now the 360’s GPU is one impressive piece of work, and I’ll say from the get go that it’s much more advanced than the PS3’s GPU. I’m not sure where to begin, so I’ll start with what Microsoft said about it. Microsoft said Xenos was clocked at 500MHz and that it had 48-way parallel floating-point dynamically-scheduled shader pipelines (48 unified shader units or pipelines) along with a polygon performance of 500 million triangles a second.

Before going any further I’ll clarify this 500 million triangles a second claim. Can the 360’s GPU actually achieve this? Yes it can, BUT there would be no pixels or color at all. It’s the triangle setup rate for the GPU, and it isn’t surprising that the rate is so much higher, given its 48 shader units capable of performing vertex operations, whereas all other released GPUs can only dedicate 8 shader units to vertex operations. The PS3 GPU’s triangle setup rate at 550MHz is 275 million a second, and if it’s clocked at 500MHz it will be 250 million a second. This is just the setup rate; do NOT expect to see games with such an excessive number of polygons, because it won’t happen.
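The setup rates quoted here boil down to triangles per clock cycle, which this sketch works backwards from the stated numbers. It only uses the rates and clocks given in the text; the function names are made up for illustration.

```python
# Triangle setup rates expressed per clock cycle, from the figures quoted in the article.


def triangles_per_clock(triangles_per_second, clock_hz):
    """How many triangles the GPU sets up per clock cycle."""
    return triangles_per_second / clock_hz


def setup_rate(clock_hz, per_clock):
    """Triangles per second for a given clock and per-clock rate."""
    return clock_hz * per_clock


XENOS_PER_CLOCK = triangles_per_clock(500e6, 500e6)   # one triangle every clock
RSX_PER_CLOCK = triangles_per_clock(275e6, 550e6)     # one triangle every two clocks

print(f"Xenos: {XENOS_PER_CLOCK} per clock, RSX: {RSX_PER_CLOCK} per clock")
print(f"RSX at 500MHz: {setup_rate(500e6, RSX_PER_CLOCK) / 1e6:.0f} million/s")
```

Seen this way, the gap is a fixed 2 to 1 per clock, and the rumored RSX clock change only scales the PS3 side of it.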

Microsoft also says it can achieve a pixel fillrate of 16 gigasamples per second. The GPU inside the Xbox 360 is literally an early ATI R600, which when released by ATI for the PC will be a DirectX 10 GPU. Xenos manages to meet many of the requirements that would qualify it as a DirectX 10 GPU, but falls short in others. What I found interesting was that Microsoft said back in 2005 that the 360’s GPU could perform 48 billion shader operations per second. However, Bob Feldstein, VP of engineering for ATI, made it very clear that the 360’s GPU can perform 2 of those shader operations per cycle, so the 360’s GPU is actually capable of 96 billion shader operations per second.

To quote ATI on the 360’s GPU: "On chip, the shaders are organized in three SIMD engines with 16 processors per unit, for a total of 48 shaders. Each of these shaders is comprised of four ALUs that can execute a single operation per cycle, so that each shader unit can execute four floating-point ops per cycle."
# 48 shader units * 4 ops per cycle = 192 shader ops per clock
# Xenos is clocked at 500MHz * 192 shader ops per clock = 96 billion shader ops per second
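ATI's description can be turned into the same arithmetic step by step. This sketch uses only the counts from the quote and the 500MHz clock; the variable names are my own.

```python
# Xenos shader throughput built up from ATI's description.

SIMD_ENGINES = 3
PROCESSORS_PER_ENGINE = 16
ALUS_PER_SHADER = 4            # each ALU executes one operation per cycle
CLOCK_HZ = 500_000_000

shader_units = SIMD_ENGINES * PROCESSORS_PER_ENGINE     # 48 shaders
ops_per_clock = shader_units * ALUS_PER_SHADER          # 192 ops per clock
ops_per_second = ops_per_clock * CLOCK_HZ               # 96 billion ops per second

MICROSOFT_2005_CLAIM = 48_000_000_000
ratio_to_claim = ops_per_second / MICROSOFT_2005_CLAIM

print(f"{shader_units} shaders, {ops_per_clock} ops/clock, "
      f"{ops_per_second / 1e9:.0f} billion ops/s ({ratio_to_claim:.0f}x the 2005 claim)")
```

The result is exactly double Microsoft's earlier 48 billion figure.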

(Did anyone notice that each shader unit on the 360’s GPU doesn’t perform as many ops per pipe as the RSX? The 360’s GPU makes up for it with a superior architecture: many more pipes, which operate more efficiently, along with more bandwidth.)

Did Microsoft just make a mistake or did they purposely misrepresent their GPU to lead Sony on? The 360’s GPU is revolutionary in the sense that it’s the first GPU to use a Unified Shader architecture. According to developers this is as big a change as when the vertex shader was first introduced and even then the inclusion of the vertex shader was merely an add-on not a major change like this. The 360’s GPU also has a daughter die right there on the chip containing 10MB of EDRAM. This EDRAM has a framebuffer bandwidth of 256GB/s which is more than 5 times what the RSX or any GPU for the pc has for its framebuffer (even higher than G80’s framebuffer).

Thanks to the efficiency of the 360 GPU’s unified shader architecture and this 10MB of EDRAM, the GPU is able to achieve 4XFSAA at no performance cost. ATI and Microsoft’s goal was to eliminate memory bandwidth as a bottleneck and they seem to have succeeded. Any PC gamers out there will have noticed that when they turn on things such as AA or HDR, performance goes down. That’s because those features eat bandwidth, so the efficiency of the GPU’s operation decreases as they are turned on. With the 360, HDR and 4XAA simultaneously are like nothing to the GPU with proper use of the EDRAM. The EDRAM contains a 3D logic unit which has 192 floating point unit processors inside. The logic unit is able to exchange data with the 10MB of RAM at 2 terabits a second. Things such as antialiasing, computing Z depths or occlusion culling can happen on the EDRAM without impacting the GPU’s workload.

Xenos writes to this EDRAM for its framebuffer and is connected to it via a 32GB/s link (real-world throughput should be extremely close to this theoretical number because the EDRAM sits right there on the 360 GPU’s daughter die). Don’t forget the EDRAM itself has a bandwidth of 256GB/s. Dividing that 256GB/s by the 32GB/s of the Xenos-to-EDRAM connection shows that Xenos is capable of multiplying its effective bandwidth to the framebuffer by a factor of 8 when processing pixels that make use of the EDRAM, which includes HDR, AA and other things. This leads to a maximum of 32*8=256GB/s which, to say the least, is a very effective way of dealing with bandwidth intensive tasks.
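
Here is the same bandwidth arithmetic as a small Python sketch. It only uses the 32GB/s and 256GB/s figures quoted in this article; the variable names are my own. It also checks that 256GB/s lines up with the "2 terabits a second" figure quoted for the EDRAM logic.

```python
# Figures quoted in the article; names are illustrative only.
BUS_GB_PER_S = 32              # Xenos-to-EDRAM connection
EDRAM_INTERNAL_GB_PER_S = 256  # bandwidth between EDRAM logic and EDRAM memory

# How many times the data sent over the bus can be "multiplied" internally.
multiplier = EDRAM_INTERNAL_GB_PER_S / BUS_GB_PER_S

# Convert the internal bandwidth from gigabytes to gigabits (8 bits per byte).
internal_gbit_per_s = EDRAM_INTERNAL_GB_PER_S * 8

print(f"Effective multiplier: {multiplier:.0f}x")
print(f"Internal bandwidth: {internal_gbit_per_s} Gbit/s "
      f"(about {internal_gbit_per_s / 1000:.1f} Tbit/s)")
```

So the 256GB/s and the 2 terabits per second are the same number expressed in different units.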

For this to be possible, developers need to set up their rendering engine to take advantage of both the EDRAM and the onboard 3D logic. If anyone is confused about why the 32GB/s is being multiplied by 8, it’s because once data has traveled over the 32GB/s bus, the EDRAM logic can process it against the EDRAM memory at a rate of 256GB/s, so for every 32GB/s you send over, 256GB/s gets processed. This puts RSX at a bandwidth disadvantage in comparison to Xenos. Needless to say, the 360 not only has an overabundance of video memory bandwidth, it also has amazing memory saving features. For example, getting 720P with 4XFSAA on a traditional architecture would require 28MB worth of memory. On the 360 only 16MB is required. There are also features in the 360's Direct3D API that let developers fit two 128x128 textures into the space normally required for one, for example. So even with all the memory and all the memory bandwidth, they are still very mindful of how it’s used.
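
Where does a figure like 28MB come from? One plausible way to get there, sketched in Python below, is to assume 32-bit color plus 32-bit depth/stencil for every sample at 1280x720 with 4 samples per pixel. Those per-sample sizes are my assumption, not something stated by Microsoft or ATI, and the helper name `framebuffer_bytes` is made up for this example. I am not attempting to reproduce the 16MB figure for the 360 here.

```python
def framebuffer_bytes(width, height, samples, color_bytes=4, depth_bytes=4):
    """Bytes needed to store color and depth for every sample of every pixel.

    color_bytes and depth_bytes default to 4 (32 bits each), which is an
    assumption made for this sketch.
    """
    return width * height * samples * (color_bytes + depth_bytes)


size = framebuffer_bytes(1280, 720, samples=4)
size_mib = size / (1024 * 1024)

print(f"{size:,} bytes = {size_mib:.3f} MB")
```

Under those assumptions the result lands right around the 28MB quoted above.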

I wasn’t too clear earlier on the difference between the RSX’s dedicated pixel and vertex shader pipelines and the 360’s unified shader architecture. The 360 GPU has 48 unified pipelines, each capable of accepting either pixel or vertex shader operations. With the older dedicated pixel and vertex pipeline architecture that RSX uses, most of the 24 pixel pipes go idle in a vertex heavy situation instead of helping out with vertex work.

On the flip side, in a pixel heavy situation those 8 vertex shader pipelines sit idle and don’t help out the pixel pipes (because they aren’t able to). With the 360’s unified architecture none of the pipes go idle in, say, a vertex heavy situation. All 48 unified pipelines are capable of helping with either pixel or vertex shader operations when needed, so efficiency is greatly improved and so is overall performance. When pipelines are forced to go idle because they lack the capability to help another set of pipelines accomplish its task, it’s detrimental to performance. This inefficient manner is how all current GPUs operate, including the PS3's RSX: the pixel pipes aren't able to help the vertex pipes accomplish a task, or vice versa.

What’s even more impressive about this GPU is that it determines by itself how many pipelines to dedicate to vertex or pixel shader operations at any given time. A programmer is NOT needed to handle any of this; the GPU takes care of it all in the quickest, most efficient way possible.

1080p is not a smart resolution to target in any form this generation, but if 360 developers wanted to get serious about 1080p, they could, thanks to Xenos, actually outperform the PS3 at 1080p. (The less efficient GPU always shows its weaknesses against the competition at higher resolutions, so the best way for the RSX to be competitive is to stick to 720P.) In vertex shader limited situations the 360’s GPU will literally be 6 times faster than RSX, since all 48 pipes can take vertex work instead of just 8. With a unified shader architecture things are much more efficient than previous architectures allowed, which is extremely important. The 360’s GPU, for example, is 95-99% efficient with 4XAA enabled. With traditional architecture there are design related roadblocks that prevent such efficiency.

To avoid such roadblocks, which held back previous hardware, the 360 GPU design team created a complex system of hardware threading inside the chip itself. In this case, each thread is a program associated with the shader arrays. The Xbox 360 GPU can manage and maintain state information on 64 separate threads in hardware. There's a thread buffer inside the chip, and the GPU can switch between threads instantaneously in order to keep the shader arrays busy at all times.
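
To put rough numbers on the idle pipe argument, here is a deliberately simple Python toy model. It is a sketch under big assumptions, not a simulation of either chip: every pipe is treated as doing one unit of work per cycle (which ignores the per-pipe throughput differences mentioned earlier), work is assumed to split perfectly across pipes, and the function names are made up for this example.

```python
def dedicated_cycles(vertex_work, pixel_work, vertex_pipes=8, pixel_pipes=24):
    """Cycles needed when vertex and pixel pipes cannot help each other.

    Both groups run in parallel, so the slower group sets the total time
    while the other group sits idle once it has finished.
    """
    return max(vertex_work / vertex_pipes, pixel_work / pixel_pipes)


def unified_cycles(vertex_work, pixel_work, pipes=48):
    """Cycles needed when every pipe can take either kind of work."""
    return (vertex_work + pixel_work) / pipes


def speedup(vertex_work, pixel_work):
    """How many times faster the unified layout finishes in this toy model."""
    return dedicated_cycles(vertex_work, pixel_work) / unified_cycles(
        vertex_work, pixel_work
    )


# Three workloads, measured in abstract "pipe-cycles" of work.
for label, v, p in [("vertex only", 48, 0), ("pixel only", 0, 48), ("8:24 mix", 8, 24)]:
    print(f"{label}: unified finishes {speedup(v, p):.1f}x faster")
```

In this model the purely vertex limited case is where the "6 times faster" figure falls out (48 pipes against 8), while a workload that happens to match the fixed 8:24 split shows a much smaller gap.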

Want to know why Xenos doesn’t need as much raw horsepower to outperform something like the X1900XTX or the 7900GTX? It makes up for it by being efficient enough to fully achieve its advertised performance numbers, which is an impressive feat. The X1900XTX has a peak pixel fillrate of 10.4 gigasamples a second while the 7900GTX has a peak pixel fillrate of 15.6 gigasamples a second. Neither of them is efficient enough to actually achieve and sustain those peak numbers, but they get away with it since they can bank on all that raw power. The performance winner between the two is actually the X1900XTX despite its lower pixel fillrate (especially at higher resolutions), because it has twice as many pixel pipes and is the more efficient of the 2. It’s a testament to how important efficiency is.

So how exactly can the mere 360 GPU stand up to both of those with only a 128 bit memory interface and 500 MHz? With 4XFSAA enabled, the 360 GPU achieves AND sustains its peak fillrate of 16 gigasamples per second. The combination of the unified shader architecture and the excessive amount of bandwidth gives it the kind of efficiency that allows it to outperform GPUs with far more raw horsepower. I guess it also helps that it’s the single most advanced GPU currently available for purchase anywhere.

A shame Microsoft chose to disable Xenos’ other 16 pipelines. Not many are even aware that the 360’s GPU has the exact same number of pipelines as ATI’s unreleased R600, but to improve yields, keep costs down and make the GPU easier to manufacture, Microsoft chose to disable one of the shader arrays containing 16 pipelines.

Things get even better when you factor in Xenos’ MEMEXPORT ability, which enables “streamout” and opens the door for Xenos to achieve DX10 class functionality. What MEMEXPORT does is expand the graphics pipeline in a more general purpose and programmable manner.

I’ll borrow a quote from Dave Baumann since he explains it rather well. “With the capability to fetch from anywhere in memory, perform arbitrary ALU operations and write the results back to memory, in conjunction with the raw floating point performance of the large shader ALU array, the MEMEXPORT facility does have the capability to achieve a wide range of fairly complex and general purpose operations; basically any operation that can be mapped to a wide SIMD array can be fairly efficiently achieved and in comparison to previous graphics pipelines it is achieved in fewer cycles and with lower latencies. For instance, this is probably the first time that general purpose physics calculation would be achievable, with a reasonable degree of success, on a graphics processor and is a big step towards the graphics processor becoming much more like a vector co-processor to the CPU.”

Conclusion

Hopefully this article has helped to dispel some rumors surrounding the processing power of these two great consoles and demonstrated some of the differences that give them their unique feel. There are many attempts on both sides to distort the numbers or misconstrue their importance, but looking at the features as a whole makes it possible to judge how these consoles will operate overall. While both consoles shine in some areas, they do have their softer spots. Ultimately, the good features of each of these consoles outweigh the bad, and the amount of high quality games being released this winter will give Sony and Microsoft fans alike a lot to be happy about.

posted on Mar 4, 2007

In 1999 the Unreal Tournament series broke out onto the first-person shooter landscape and earned a huge following right out of the gate. The game's combination of vicious, lightning-fast gameplay, colorful, great-looking graphics, surprisingly excellent artificial intelligence, and outstanding multiplayer play made it a smash hit. Some years later, Epic returned with Unreal Tournament 2003, which was powered by all-new technology and noticeably different gameplay--something not all fans appreciated. However, Epic returned the following year with Unreal Tournament 2004, which improved everything on all fronts and added drivable vehicles in the new onslaught game mode, and it ended up being a fantastic game that still enjoys a sizable fan following from a very active community. And all the while, Epic has been making a name for itself with unprecedented support for its fan community, calling out the best fan-made content using Unreal technology in its annual Make Something Unreal contest, which awards prizes to the best fan-made maps and modifications.

So what's next?

How about an all-new game powered by the next generation of Epic's powerful Unreal engine, known as Unreal Engine 3? This time around, the game will be powered by the very same graphically impressive technology we've seen bits and pieces of at this year's and last year's Game Developers Conference in March. The engine will have support for advanced special effects, including high dynamic range lighting and bump offset mapping--which is an advanced form of lighting that can make a completely flat surface appear to have protruding features, like a brick wall built from jagged, uneven stones--and an all-new physics engine powered by Ageia's Novodex technology. "We've never been able to do an avalanche in-game before," says Epic president Mike Capps, referring to both the simulated mountain avalanche in this year's GDC demo and to the sorts of effects you'll see in the game.

Capps explains that beyond the graphics, UT 2007 will also feature improved gameplay, based on feedback from the fans and from Epic's own goals. According to Capps, the studio is "trying to make sure that UT 2007 is a mix of UT 2004 and [the original Unreal Tournament from 1999]," while maintaining the series' focus on multiplayer competition. "We want to own the deathmatch space." Competitive play is a key element in the series' success, so Epic definitely plans to keep head-to-head competition around in the form of deathmatch, team deathmatch, and one-on-one duels, as well as capture the flag. But what about the other modes? "Domination is currently not on the table," was Capps' answer. We'll probably have to bid domination (or "double domination" as it was more recently known) a fond farewell, since it was apparently the least popular multiplayer mode by far. Capps explains that the decision to remove this mode wasn't easy, but the team felt that it had fundamental issues. Since it focused on capturing two control points at opposite ends of a level, players often found themselves losing points while they went after one control point, only to find out that on the other side of the level, they had lost the other control point--something that was more or less completely out of their control.

And what about the popular assault mode, which presents team-based, goal-oriented gameplay, and onslaught mode, which is vehicle-based gameplay centered around capturing a network of control points? Onslaught is currently planned to make its triumphant return as it was in UT 2004, but the two modes will also be the proud parents of an all-new gameplay type: conquest. "Assault is the kind of thing we want to be bringing into conquest...a mix of [Unreal 2's] XMP, assault, and onslaught." Capps describes conquest as a much more evolved version of onslaught with the kind of directed gameplay and the exciting and varied environments you'd expect from assault, which has featured skyscrapers, moving trains, medieval castles, and many other environments.

Where onslaught focused on capturing control points to link up a network into an enemy base, conquest will instead focus on controlling actual territory, which will yield tyridium resources (the same crystalline energy source featured in Unreal Championship 2) when captured. The idea behind conquest is to create huge, expansive levels that take advantage of the engine's new content-streaming technology and use onslaught-style instant transport to jump to hot spots on the map where the action is. Epic apparently feels that one of the greatest strengths of onslaught was the way it could accommodate large groups of players in vehicles but also drive them to congregate around specific areas with the control point system, rather than have them wandering around the map aimlessly. While the overall format of the mode--whether it will be a persistent-world sort of game similar to Sony Online Entertainment's PlanetSide or more of a pick-up-and-play game like onslaught--has not yet been revealed, it should continue to support the variety of gameplay styles that Epic felt onslaught did. That is, like with onslaught, you should be able to approach the game as a team player looking to help your buddies capture a win, by providing covering fire, going after key targets, and generally not goofing off, but you should also be able to focus more on the deathmatch aspects of the mode, using weapons and vehicles to blast enemy players as a base defender. If anything, Epic wants to make conquest a mode that will appeal to a wide variety of players, from hardcore team-shooter fans to deathmatch specialists to everyone in between.

Flak Cannons for Everyone

Epic is already testing prototypes for the game, and apparently one aspect of the game that's receiving heavy scrutiny at the studio's Raleigh, North Carolina, headquarters is its weapon loadout. The final weapon selection is still being tweaked and tuned, so there are very few final details to report. We're told that there aren't any plans to add in any close-range melee beyond a default melee weapon for game balance reasons (that is, we're not looking at any Halo-style pistol-whipping or Unreal Championship-style melee). But join us in a few weeks for a report on the weapons that will be shown at E3, including three Unreal Tournament classics: the combination-firing shock rifle, the rocket launcher, and the flak cannon.

While the final status of all weapons isn't confirmed (Epic is still going over the list of weapons that may or may not make the cut), Capps said in no uncertain terms, "We can't ship the game without a flak cannon." The bright-yellow, insanely powerful short-range cannon/flak grenade launcher is pretty much an Unreal Tournament classic at this point and will definitely be making the cut, untouched (Xbox fans will recall that this was the only weapon that remained as it was in Unreal Championship 2 as well--largely because of how emblematic of the Unreal series the weapon has become). Otherwise, Epic is "trying to find a good balance between the deadliness of [the original Unreal Tournament's weapons] and UT 2004." And what about mutators, the optional gameplay tweaks that have let UT players play games in low gravity or with big heads? Whether all the same mutators from the previous games will make the cut in 2007 remains to be seen, but Capps confides, "We can't ship without instagib"--a mode that's so popular it was even included in the melee-heavy Unreal Championship 2.

When asked about the plan for vehicles in UT 2007, the answer we got was, "more, bigger, more." Yes, if all you want is a re-creation of ONS-Torlan, the now-famous multiplayer level featured in the UT 2004 demo, with that same environment and with those same vehicles, you'll be able to make that happen in UT 2007 using the game's modification tools, which we'll touch on later. But while UT 2004 featured vehicles created by the militaristic Axon Research Corporation (which largely ended up resembling futuristic Earth vehicles, such as tanks and buggies, as well as a few sci-fi vehicles like the manta hovercraft), the new game will feature an all-new line of vehicles from the Necris race--the pale-faced humanoid warriors who have been a part of the UT series since 1999. Details on this new line of vehicles remain under wraps, but the Necris will certainly have much more "outlandish" vehicles that look and handle very differently from those in 2004--and you'll be able to choose to play games using only Necris vehicles (or only Axon vehicles from 2004, or team games pitting one set of vehicles against the other).

The new game will also push even harder on one of the series' strongest, but perhaps not quite as famous, traits--the artificial intelligence used for its "bot characters" in the single-player game. Epic's Steven Polge, who was the AI programmer for the original Unreal Tournament, is the lead designer on UT 2007, and he and the development team are putting more emphasis on voice chat--both how characters react to commands and also what sort of on-the-fly chatter bots will use. Currently, the team hopes to make UT 2007's bots just as chatty as they were before but with more context--for instance, they may call out to you that they've sighted two enemies coming up over the next hill (rather than giving out a more-generic warning that some enemies have been sighted somewhere). The new game will also continue to support the less-well-known voice commands that were featured in UT 2004--if you have a headset, you'll still be able to give vocal commands to your teammates to converge on key points or give you backup.

The game's single-player mode is currently planned to have a similar structure to UT 2004's--a single-player tournament ladder through multiple play modes where different computer-controlled characters challenge you for rankings. Epic acknowledges that multiplayer play is the core of Unreal Tournament, but apparently a great many UT players surveyed stated that they generally played the game offline--this is due in no small part to the excellent computer AI, which has always followed orders well, acted on objectives, and put up a surprisingly good fight (without always resorting to unfair superhuman tactics). The single-player game will even advance the story and timelines of the Unreal universe beyond what was seen in the previous PC and Xbox games; you can expect to see reappearances from key characters like the Earth soldiers-turned-gladiators Malcom, Brock, and Lauren, for instance, now older and wiser.

When asked about Epic's plans to expand outreach to the fan community (beyond the studio's already unprecedented support), Capps replied emphatically, "Yes, we really, really want that." Though nothing has been officially announced, Epic apparently has no plans of letting up on its community support, including its Make Something Unreal contest. Epic is also apparently looking to create built-in functionality for things like clan Web pages, matchmaking, and player stats all in-game (but it plans to run master servers for online play just like it has been all along, rather than rely on third-party applications). Epic aims to "at least have the functionality of Xbox Live," a service that has long offered features such as real-time voice-over-Internet and friends lists and is one that the developer is no stranger to. Epic is even investigating ways to help call out the best fan-created content, possibly as an in-game browser that can automatically queue up downloads (so that even casual fans can find the 10 best maps and mods quickly and easily). Regardless, the plan back at the home office is to keep "giving away lots of cool free stuff" in the form of extra content after launch and possibly to revisit the game at a later time with a special-edition packed with all the postrelease content. Any way you slice it, Unreal Tournament 2007 will have a whole lot to offer: all-new gameplay on a massive scale, an entirely new line of vehicles, expanded community support, and graphics and physics that are truly a generation ahead. Stay tuned to GameSpot for more updates on the game leading up to its release next year.

posted on Jul 27, 2006

The year 2001 marked the release of one of the best uses of the Quake III engine (since Quake III itself), Return to Castle Wolfenstein, and shortly afterward, developer Splash Damage, in collaboration with id Software and Activision, released a free multiplayer add-on, Enemy Territory. The game was extremely well-received by fans, not only because it was free, but also because it really expanded on the original Wolfenstein's multiplayer--a team-based mode that let players choose to play as medics, assault troopers, and engineers, among others, while embarking on specific, directed objectives (like stealing key documents or destroying gun emplacements). And there was that whole "free" thing, which we may have mentioned. Now, the creators of the first Enemy Territory add-on are working on an all-new game, Enemy Territory: Quake Wars.

This new game will at once be very different, but also very similar, to the original Enemy Territory. Perhaps the most obvious (and possibly the most jarring) difference will be the fact that Enemy Territory will be a full-on retail product that id and Splash Damage are supporting with as much of a budget and production resources as they would expend on any other top-tier game. Why put the second Enemy Territory game on store shelves with a price tag? According to id Software's Kevin Cloud, this was a decision that had been made some time ago, and it wasn't a sudden change of heart. Cloud and Splash Damage managing director Paul Wedgwood agreed some time ago to approach the next game as a full-on retail product that will expand upon everything that made the original game great, but will also add lots of new content and features that should hopefully justify the ticket price.

Enemy Territory: Quake Wars will be an online-only multiplayer shooter that will actually take place just before the beginning of Quake II (when the warlike aliens known as the strogg had invaded the Earth). Quake Wars will take place during the initial invasion and will pit two teams against each other: the strogg and a united group of human soldiers known as Earth Defense Force (or EDF, for short). The EDF will be armed with fairly conventional weaponry, like assault rifles and grenade launchers, as well as four-wheeled jeep-class vehicles and tanks. The strogg will possess alien weaponry, and they'll be divided into totally different character classes. And yes, by the way, we did say vehicles.

Many of the missions in Quake Wars will take place in huge outdoor areas in which the fastest mode of transport will be in a vehicle, and thanks to reworked physics that have been appropriated in part from the Doom III engine, vehicles will handle and deform realistically (we watched a demonstration as part of a promotional trailer that showed a jeep skidding off the side of a road, only to have its tires shot off a moment later). And yes, some parts of Doom III technology are being used for Quake Wars. But we can say with complete certainty that Quake Wars does not look like Doom III. This is due at least in part to the huge and surprisingly detailed outdoor areas that are possible, thanks to an all-new "megatexture" mapping technology developed by id Software programming guru John Carmack. The megatexture is essentially one huge, continuous texture map that can stretch all the way to the horizon, without any need for fog or other effects to mask a limited draw distance or texture tiles that repeat and show seams at the edges.

When asked about what kind of game modes Quake Wars would have, Cloud offered that the concept of game modes didn't really apply, since each of the game's missions will play out very differently. Some missions will be night-ops missions that will involve tense, close-quarters battles augmented by the way the game will realistically model the physical properties of objects (walking on steel-plated floors will be noisier than walking on grass, for instance). However, in all missions, players will be rewarded not only for having excellent aim, but also for contributing to the team goals with points that will add to a persistent ranking system. The game will also further encourage teamwork by giving each player specific goals that should help new players figure out how to play their character classes, and should give players of all skill levels a good indication, at a glance, of what needs to be done to help win the match (beyond "hey, I'm bored, I think I'll grab a vehicle that other players need and drive around someplace"). In any case, both Wedgwood and Cloud reaffirmed their dedication to making Enemy Territory a challenging but balanced game that rewards skill and good teamwork, citing their lengthy development and support for the previous game as proof.

Enemy Territory: Quake Wars certainly looks great from a visual standpoint, but it should be interesting to see whether it will be able to go toe-to-toe with heavy-hitter vehicle-based shooters like the soon-to-be-released Battlefield 2. Quake Wars is tentatively scheduled for release sometime next year.

posted on Jul 25, 2006

The year is 2142, and the dawn of a new ice age has thrown the world into a panic. The math is simple and brutal: The soil not covered by ice can only feed a fraction of the Earth's population. Some will live, most will die. Players will choose to fight for one of two military superpowers in an epic battle for survival, the European Union or the newly formed Pan Asian Coalition. Armed with a devastating arsenal of hi-tech assault rifles, cloaking devices and sentry guns, players will also do battle using some of the most imposing vehicles known to man. Massive battle Mechs wage fierce combat on the ground, while futuristic aircraft rule the skies. When facing one of these new behemoths, players will need to use their wits and an arsenal of new countermeasures like EMP grenades to level the playing field.

So what do I think? This is really big news! The Battlefield developers are currently hard at work on "Battlefield 2142", which is scheduled for release this Autumn. As you'll have guessed, this is a departure for the BF series as we're going into the future, something that a lot of you wanted to see happen in the BF universe. They're creating a ton of cool features that you won't have seen in any BF before, including cloaking devices, mines that follow you, awesome new rifles, guns and grenades, and of course, MECHS! They're developing this at DICE Stockholm while the DICE Canada studio will be hard at work on a 1.3 update for Battlefield 2, as well as the Armoured Fury Booster Pack. This is going to be a Battlefield worthy of my money and I can't wait. My only hope is they don't screw this one up with RANKED servers. I for one will no doubt get a Battlefield 2142 Game Server. See you on the Battlefield.

posted on Jul 14, 2006